EP3215963A1 - Data flow windowing and triggering - Google Patents

Data flow windowing and triggering

Info

Publication number
EP3215963A1
Authority
EP
European Patent Office
Prior art keywords
data
window
time
late
windows
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP16741415.0A
Other languages
German (de)
English (en)
French (fr)
Inventor
Tyler Akidau
Robert Bradshaw
Ben Chambers
Craig Chambers
Reuven Lax
Daniel Mills
Frances Perry
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 14/931,006 (US10037187B2)
Application filed by Google LLC
Publication of EP3215963A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/903 Querying
    • G06F 16/90335 Query processing

Definitions

  • This disclosure relates to data flow windowing and triggering.
  • One aspect of the disclosure provides a method for data flow windowing and triggering.
  • The method includes receiving data corresponding to one of streaming data or batch data at data processing hardware, determining, using the data processing hardware, a content of the received data for computation, determining, using the data processing hardware, an event time of the data for slicing the data, and determining a processing time to output results of the received data using the data processing hardware.
  • The method also includes emitting at least a portion of the results of the received data based on the processing time and the event time.
  • Implementations of the disclosure may include one or more of the following optional features.
  • The method includes grouping, using the data processing hardware, the received data into windows based on the event time.
  • The windows may include one of: fixed windows defined by a static time period; sliding windows defined by a time period and a slide period; session windows defined by a timeout gap; or user-defined windows defined by a pair of functions.
  • Each fixed window may be applied across all of the data within the associated time period.
  • Each sliding window may be applied across all of the data within the associated time period and associated with a start time separated from a start time of an immediately successive window by the slide period.
  • Each session window may be applied across a subset of the data occurring within a span of time less than the associated timeout gap.
  • The method includes assigning, using the data processing hardware, a mergeable window for each element of the received data, each element including an associated input timestamp and each mergeable window extending a predefined range of time beyond the input timestamp for the associated window.
  • The method may also include merging, using the data processing hardware, two or more of the mergeable windows belonging to a same key that overlap into a single merged window, and setting, using the data processing hardware, an associated output timestamp for each element to a value greater than or equal to an earliest time in the associated merged window or the associated mergeable window.
  • The single merged window may include an associated range of time greater than the predefined range of time.
  • The method may include grouping, using the data processing hardware, the streaming data into windows and setting, using the data processing hardware, an input timestamp on an element of the streaming data.
  • The method may include determining, using the data processing hardware, that the streaming data includes late streaming data, and one of dropping the late streaming data or allowing the late streaming data by creating a duplicate window in an output for the late streaming data.
  • The method includes grouping, using the data processing hardware, a first subset of the received data into a window, the window defining a sub-event time of the data subset, aggregating, using the data processing hardware, a first result of the first data subset for the window, and determining, using the data processing hardware, a trigger time to emit the first aggregated result of the first data subset.
  • The trigger time may include at least one of: when a watermark reaches an end of the window; every threshold number of seconds of a walltime; after receiving a punctuation record that terminates the window; every threshold number of records; after arbitrary user logic decides to trigger; or after an arbitrary combination of concrete triggers.
  • The method may include discarding, using the data processing hardware, the first aggregated result from use when aggregating results of later subsets of the received data.
  • The method may also include storing a copy of the first aggregated result in a persistent state within memory hardware in communication with the data processing hardware, and refining, by the data processing hardware, a next aggregate result of a later subset with the first aggregated result.
  • The method may further include storing a copy of the first aggregated result in a persistent state within memory hardware in communication with the data processing hardware.
  • The method may include emitting a retraction of the first aggregated result and emitting a combined session result for the window.
  • The method includes receiving, at the data processing hardware, a late data point after grouping the first data subset into the window, the late data point related to the window, and discarding, using the data processing hardware, the late data point.
  • The method may also include receiving, at the data processing hardware, a late data point after grouping the first data subset into the window, the late data point related to the window, and accumulating, using the data processing hardware, the late data point into the window to refine the first aggregated result with the late data point.
  • The method may further include receiving, at the data processing hardware, a late data point after grouping the first data subset into the window, the late data point related to the window, aggregating, using the data processing hardware, a combined result of the first data subset and the late data point, and emitting the combined result.
  • The system includes data processing hardware and memory hardware in communication with the data processing hardware.
  • The memory hardware stores instructions that, when executed on the data processing hardware, cause the data processing hardware to perform operations.
  • The operations include: receiving data corresponding to one of streaming data or batch data; determining a content of the received data for computation; determining an event time of the data for slicing the data; determining a processing time to output results of the received data; and emitting at least a portion of the results of the received data based on the processing time and the event time.
  • The operations further include grouping the received data into windows based on the event time.
  • The windows include one of: fixed windows defined by a static time period; sliding windows defined by a time period and a slide period; session windows defined by a timeout gap; or user-defined windows defined by a pair of functions.
  • Each fixed window may be applied across all of the data within the associated time period.
  • Each sliding window may be applied across all of the data within the associated time period and associated with a start time separated from a start time of an immediately successive window by the slide period.
  • Each session window may be applied across a subset of the data occurring within a span of time less than the associated timeout gap.
  • The operations may further include assigning a mergeable window for each element of the received data, each element including an associated input timestamp and each mergeable window extending a predefined range of time beyond the input timestamp for the associated window.
  • The operations may also include merging two or more of the mergeable windows belonging to a same key that overlap into a single merged window and setting an associated output timestamp for each element to a value greater than or equal to an earliest time in the associated merged window or the associated mergeable window.
  • The single merged window may include an associated range of time greater than the predefined range of time.
  • The operations may further include grouping, using the data processing hardware, the streaming data into windows and setting, using the data processing hardware, an input timestamp on an element of the streaming data.
  • The operations may include determining, using the data processing hardware, that the streaming data comprises late streaming data, and one of: dropping the late streaming data or allowing the late streaming data by creating a duplicate window in an output for the late streaming data.
  • The operations further include grouping a first subset of the received data into a window, the window defining a sub-event time of the data subset, aggregating a first result of the first data subset for the window, and determining a trigger time to emit the first aggregated result of the first data subset.
  • The trigger time may include at least one of: when a watermark reaches an end of the window; every threshold number of seconds of a walltime; after receiving a punctuation record that terminates the window; every threshold number of records; after arbitrary user logic decides to trigger; or after an arbitrary combination of concrete triggers.
  • The operations may include discarding the first aggregated result from use when aggregating results of later subsets of the received data.
  • The operations may also include storing a copy of the first aggregated result in a persistent state within memory hardware in communication with the data processing hardware, and refining a next aggregate result of a later subset with the first aggregated result.
  • The operations may further include storing a copy of the first aggregated result in a persistent state within memory hardware in communication with the data processing hardware.
  • The operations may include emitting a retraction of the first aggregated result and emitting a combined session result for the window.
  • The operations include receiving a late data point after grouping the first data subset into the window, the late data point related to the window, and discarding the late data point.
  • The operations may also include receiving a late data point after grouping the first data subset into the window, the late data point related to the window, and accumulating the late data point into the window to refine the first aggregated result with the late data point.
  • The operations may further include receiving a late data point after grouping the first data subset into the window, the late data point related to the window, aggregating a combined result of the first data subset and the late data point, and emitting the combined result.
  • FIGS. 1A and 1B are schematic views of an example streaming computation system.
  • FIG. 2 is a schematic view of an example windowing Application Programming Interface (API).
  • FIG. 3 is an example of fixed, sliding, and session windows.
  • FIG. 4 is an example plot of a window time domain skew.
  • FIG. 5 is an example of a window merging operation.
  • FIG. 6A is an example plot of window time domain skew for data point inputs.
  • FIG. 6B is an example plot showing an output result within a single global window.
  • FIG. 6C is an example plot showing output results accumulating over regions of processing time.
  • FIG. 6D is an example plot showing output results from independent regions of processing time.
  • FIG. 6E is an example plot showing output results from independent regions of processing time.
  • FIG. 6F is an example plot showing data point inputs grouped within fixed windows and output results emitted from the fixed windows as a watermark advances.
  • FIG. 6G is an example plot showing data point inputs grouped within fixed windows and output results emitted from the fixed windows in successive micro-batches.
  • FIG. 6H is an example plot showing a late data point updating an output result of a fixed window.
  • FIG. 6I is an example plot showing output results based on processing-time-based triggers.
  • FIG. 6J is an example plot showing data point inputs grouped within session windows and combined output results emitted from combined session windows.
  • FIG. 7 is a schematic view of an example computing device executing any systems or methods described herein.
  • Batch data processing is the execution of programs (also known as jobs) on a computer without manual intervention, i.e., without human intervention.
  • The program parameters are predefined via scripts, command-line arguments, control files, or job control language.
  • A program takes a set of data files as input, and then processes the data before producing a set of output files.
  • Batch processing refers to input data collected into batches or sets of records, where each batch is processed as a unit.
  • The output is also a batch that is reused for computations.
  • Chopping up the data stream into finite pieces yields the opportunity to calculate precise results in a streaming fashion.
  • The programmer also has to solve the problem of where to slice up the data stream and when to emit the results.
  • Most streaming systems take the approach of automatically chopping the data stream up into fixed windows based on the time the data arrives in the system (e.g., the programmer requests five-minute windows, and the system buffers up five minutes of data as it is received and then processes the data). This approach has two major downsides.
  • A first downside is that, unlike the event-time-based windows in most batch processing systems, which accurately reflect the times at which events happened, walltime windows reflect only the time at which data arrived in the system.
  • A second downside is that the programmer has no way of generating custom windows for subsets of the data, e.g., per-user sessions that capture bursts of activity for a specific user. So the programmer can only support a subset of the use cases the programmer could in batch.
  • MillWheel (and now WindMill, the Dataflow streaming backend) takes a different approach.
  • MillWheel's API allows the programmer to buffer data in arbitrary ways based on event time, emitting results whenever the programmer deems useful, including after periods of walltime like other systems, but also in a data-driven manner (e.g., receipt of a punctuation record) or after the system believes all data up to a given event time has been received (watermarks/cursors).
  • The programmer can build a streaming data processing system with MillWheel that calculates exact results and completely replaces a batch system generating the same output, but with much lower latency.
  • The big downside of the MillWheel API is that it is very low level. It provides all the right building blocks, but does not abstract them in a way that makes it easy for a programmer to write new computations, or to compose existing libraries to build new computations. Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Flume has a simple and flexible architecture based on streaming data flows. In addition, the Flume architecture is much more high level than the MillWheel architecture, making it very easy to link and compose computational building blocks into something powerful but understandable. However, the batch Flume API does not fit well with the streaming paradigm, because it has no notion of how to chop up unbounded streams of data for processing. Therefore, there is a need for APIs that chop up unbounded streams of data for processing (and the underlying architecture that supports them).
  • A streaming computation system 100 includes an aggregation API 200, a windowing API 300, and a triggers API 400, where each API focuses on a separate portion of the streaming computation process.
  • The aggregation API 200 focuses on what the programmer is computing, for example, a sum or a list of top N values.
  • The windowing API 300 focuses on where (in event time) the programmer chooses to slice up the unbounded stream of data 10 (e.g., fixed windows 330 or sessions 350 (FIG. 3)).
  • The triggers API 400 focuses on when (in processing time) the programmer chooses to emit the aggregate results 20 for a given window of data 10.
  • The aggregation API 200 is essentially the batch API that already exists in Flume.
  • The programmer defines what computation to perform as data 10 comes in, and generates a result 20 in response thereto.
  • The windowing API 300 allows the programmer to define which windows a given datum (from the entered data 10) falls into.
  • The windowing API 300 also allows the programmer to merge windows, which allows the programmer to build up dynamic, data-driven windows like sessions.
  • The triggers API 400 then allows the programmer to define when the aggregate results 20 for a window are emitted.
  • Examples might be: when the watermark has reached the end of the window (the canonical time-based aggregation model in MillWheel); every N seconds of walltime (e.g., for a system that cares more about freshness than completeness in results 20); after receiving a punctuation record that terminates the window; every threshold number of records; after arbitrary user logic decides to trigger; or any arbitrary combination of concrete triggers (e.g., initially when the watermark reaches the end of the window, and then once every minute any time late data 10 behind the watermark arrives, allowing for results 20 to be updated or changed after the fact).
  • The streaming computation system 100 provides implementation clarity, because when implementing a function for one of the three APIs 200, 300, 400, the programmer focuses simply on the specific task at hand (Aggregation, Windowing, or Triggering), which is an improvement over prior systems.
  • The streaming computation system 100 may execute on data processing hardware 710 (FIG. 7) of a computing device 700 (FIG. 7).
  • The streaming computation system 100 provides composability, because the programmer can mix and match functions from the three APIs 200, 300, 400 to get the precise type of computation needed.
  • An aggregation function 210 to compute a sum can be used with a windowing function 310 to build sessions and a trigger function 410 to produce results 20 when the watermark reaches the end of the window.
  • The same aggregation function 210 can be used to calculate sums over fixed windows of time, each containing ten records, just by changing the windowing and trigger functions 310, 410. Therefore, the streaming computation system 100 (which also works in batch mode) allows a programmer to build complex, yet understandable and maintainable, systems that precisely calculate the results 20 that the programmer wants. The programmer can write code using the streaming computation system 100 and allow the system 100 to execute in streaming mode to get low-latency results, or in batch mode to do massive-scale backfills or perform one-off calculations.
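  • As a minimal sketch of this composability (the Sessions, FixedWindows, and AfterWatermark spellings are assumptions for illustration, not the verbatim API; Window.into and Sum.integersPerKey appear elsewhere in this disclosure):

      // Hypothetical composition of the three APIs:
      // what (Sum), where (Sessions), when (AfterWatermark).
      PCollection<KV<String, Integer>> sums = input
          .apply(Window.into(Sessions.withGapDuration(Duration.standardMinutes(30)))  // windowing API 300
                       .triggering(AfterWatermark.pastEndOfWindow()))                 // triggers API 400
          .apply(Sum.integersPerKey());                                               // aggregation API 200
      // Swapping Sessions for FixedWindows (or the watermark trigger for a
      // count-based one) changes where/when results are produced without
      // touching the aggregation function.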
  • The system 100 provides multiple benefits including, but not limited to, decomposition of the streaming computation into three axes of what (aggregation API 200), where in event time (windowing API 300), and when (triggers API 400), with attendant APIs and (non-trivial) implementations, and unification of batch and streaming semantics under one common umbrella.
  • Windowing API 300 - Referring to FIG. 2, the windowing API 300 groups streaming data 10 into finite windows 22 (fixed windows 330, sliding windows 340, and sessions 350 (FIG. 3)) for further processing and aggregation.
  • The windowing API 300 may also group streaming data 10 into user-defined windows defined by a pair of functions.
  • The pair of functions may include (1) assignWindows, which assigns a given element to a set of windows, and (2) mergeWindows, which optionally merges a specified subset of windows at grouping time. Windowing slices up a dataset 10 into finite chunks for processing as a group. A sketch of the pair follows.
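  • A minimal sketch of a user-defined windowing strategy built from this pair of functions (the class shape and signatures are assumptions for illustration):

      // Hypothetical base class for user-defined windows.
      abstract class WindowFn<T, W> {
        // (1) assignWindows: map one element (with its timestamp) to a set of windows.
        abstract Collection<W> assignWindows(T element, Instant timestamp);
        // (2) mergeWindows: at grouping time, optionally merge a subset of the
        // currently buffered windows for a key (e.g., overlapping sessions).
        abstract Collection<W> mergeWindows(Collection<W> windows);
      }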
  • Windowing is required for some operations (to delineate finite boundaries in most forms of grouping: aggregation, outer joins, time-bounded operations, etc.), and unnecessary for others (filtering, mapping, inner joins, etc.).
  • Windowing is essentially optional, though still a semantically useful concept in many situations (e.g., back-filling large-scale updates to portions of a previously computed unbounded data source).
  • Windowing is effectively always time based; while many systems support tuple-based windowing, this is essentially time-based windowing over a logical time domain where elements in order have successively increasing logical timestamps. Windows may be either aligned, i.e., applied across all of the data for the window of time in question, or unaligned, i.e., applied across only specific subsets of the data (e.g., per key) for the given window of time.
  • FIG. 3 highlights three of the major types of windows encountered when dealing with unbounded data.
  • Fixed windows 330 are defined by a static window size, e.g. hourly windows or daily windows. They are generally aligned, i.e. every window applies across all of the data 10 for the corresponding period of time. For the sake of spreading window completion load evenly across time, they are sometimes unaligned by phase shifting the windows for each key by some random value.
  • Sliding windows 340 are defined by a window size and slide period, e.g., hourly windows starting every minute. The period may be less than the size, which means the windows may overlap. Sliding windows are also typically aligned; even though the diagram is drawn to give a sense of sliding motion, all five windows would be applied to all three keys in the diagram, not just Window 3. Fixed windows are really a special case of sliding windows where size equals period.
  • Sessions 350 are windows that capture some period of activity over a subset of the data, in this case per key. Typically, they are defined by a timeout gap. Any events that occur within a span of time less than the timeout are grouped together as a session. Sessions are unaligned windows. For example, Window 2 applies to Key 1 only, Window 3 to Key 2 only, and Windows 1 and 4 to Key 3 only. Assignment rules for the three strategies are sketched below.
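  • The three assignment rules can be sketched as follows (IntervalWindow and the helper shapes are assumptions for illustration):

      // Fixed windows: one aligned window per static period (e.g., hourly).
      IntervalWindow assignFixed(Instant t, Duration size) {
        long start = t.getMillis() - (t.getMillis() % size.getMillis());
        return new IntervalWindow(new Instant(start), size);
      }

      // Sliding windows: every window of the given size whose range covers t;
      // starts are separated by the slide period, so windows overlap when
      // period < size (fixed windows are the special case size == period).
      List<IntervalWindow> assignSliding(Instant t, Duration size, Duration period) {
        List<IntervalWindow> windows = new ArrayList<>();
        long lastStart = t.getMillis() - (t.getMillis() % period.getMillis());
        for (long s = lastStart; s > t.getMillis() - size.getMillis(); s -= period.getMillis()) {
          windows.add(new IntervalWindow(new Instant(s), size));
        }
        return windows;
      }

      // Sessions: an unaligned per-key proto-window extending a timeout gap
      // beyond t; overlapping proto-windows are later merged into one session.
      IntervalWindow assignSession(Instant t, Duration gap) {
        return new IntervalWindow(t, gap);
      }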
  • Event Time is the time at which the event itself actually occurred, i.e. a record of system clock time (for whatever system generated the event) at the time of occurrence.
  • Processing Time is the time at which an event is observed at any given point during processing within the pipeline, i.e. the current time according to the system clock. Note that we make no assumptions about clock synchronization within a distributed system.
  • Event time for a given event essentially never changes, but processing time changes constantly for each event as it flows through the pipeline and time marches ever forward. This is an important distinction when it comes to robustly analyzing events in the context of when they occurred.
  • Watermarks do, however, provide a useful notion of when the system thinks it likely that all data up to a given point in event time have been observed, and thus find application in not only visualizing skew, but in monitoring overall system health and progress, as well as making decisions around progress that do not require complete accuracy, such as basic garbage collection policies.
  • FIG. 4 shows an example time domain skew where the X-axis denotes "event time” and the Y-axis denotes "processing time”.
  • An actual watermark starts to skew away from the ideal watermark as the pipeline lags, diving back close to the ideal watermark at an event time around 12:02, then lagging behind again noticeably by the time 12:03 rolls around.
  • This dynamic variance in skew is very common in distributed data processing systems, and will play a big role in defining what functionality is necessary for providing correct, repeatable results.
  • ParDo is for generic parallel processing.
  • Each input element to be processed (which itself may be a finite collection) is provided to a user-defined function (called a DoFn in Dataflow), which can yield zero or more output elements per input.
  • GroupByKey is for key-grouping (key, value) pairs.
  • The ParDo operation operates element-wise on each input element, and thus translates naturally to unbounded data.
  • The GroupByKey operation collects all data for a given key before sending them downstream for reduction. If the input source is unbounded, we have no way of knowing when it will end. The common solution to this problem is to window the data.
  • Elements are provided to the system with event-time timestamps (which may also be modified at any point in the pipeline), and are initially assigned to a default global window, covering all of event time, providing semantics that match the defaults in the standard batch model.
  • Window assignment creates a new copy of the element in each of the windows to which it has been assigned. For example, consider windowing a dataset by sliding windows of two-minute width and one-minute period, as shown below (for brevity, timestamps are given in HH:MM format).
  • Each of the two (key, value) pairs is duplicated to exist in both of the windows that overlapped the element's timestamp. Since windows are associated directly with the elements to which they belong, window assignment may happen anywhere in the pipeline before grouping is applied. This is important, as the grouping operation may be buried somewhere downstream inside a composite transformation (e.g., Sum.integersPerKey()).
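  • A worked instance of this duplication under the two-minute/one-minute sliding windows above (timestamps in HH:MM; the tuple layout is illustrative):

      (k, v1, 12:00)  ->  (k, v1, 12:00, [11:59, 12:01)), (k, v1, 12:00, [12:00, 12:02))
      (k, v2, 12:01)  ->  (k, v2, 12:01, [12:00, 12:02)), (k, v2, 12:01, [12:01, 12:03))
      // Each pair is copied once per overlapping two-minute window, which is why
      // assignment may happen anywhere upstream of the (possibly buried) grouping.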
  • Window merging occurs as part of the GroupByKeyAndWindow operation, and is best explained in the context of the example window merging operation of FIG. 5.
  • FIG. 5 uses window sessions 500 (also referred to as "session windowing") for four example data points, three for k1 and one for k2, as they are windowed by session, with a 30-minute session timeout. All are initially placed in a default global window by the system.
  • The sessions implementation of AssignWindows puts each element into a single window that extends 30 minutes beyond its own timestamp; this window denotes the range of time into which later events can fall if they are to be considered part of the same session.
  • The GroupByKeyAndWindow operation may then commence, which is really a five-part composite operation:
  • DropTimestamps - Drops element timestamps, as only the window is relevant from here on.
  • GroupByKey - Groups (value, window) tuples by key.
  • MergeWindows - Merges the set of currently buffered windows for a key.
  • The actual merge logic is defined by the windowing strategy.
  • Here, the windows for v1 and v4 overlap, so the sessions windowing strategy merges them into a single new, larger session, as indicated in bold.
  • GroupAlsoByWindow - For each key, groups values by window. After merging in the prior step, v1 and v4 are now in identical windows, and thus are grouped together at this step.
  • ExpandToElements - Expands per-key, per-window groups of values into (key, value, event time, window) tuples, with new per-window timestamps. In this example, the timestamp is set to the end of the window, but any timestamp greater than or equal to the timestamp of the earliest event in the window is valid with respect to watermark correctness. A sketch of the merge step follows.
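  • A minimal sketch of the merge step for one key, consistent with the 30-minute sessions of FIG. 5 (the IntervalWindow helpers are assumptions for illustration):

      // Merge any overlapping proto-session windows for a key into larger sessions.
      List<IntervalWindow> mergeSessions(List<IntervalWindow> windows) {
        windows.sort(Comparator.comparing(IntervalWindow::start));
        List<IntervalWindow> merged = new ArrayList<>();
        for (IntervalWindow w : windows) {
          int last = merged.size() - 1;
          if (last >= 0 && w.start().isBefore(merged.get(last).end())) {
            // Overlap: extend the session, as with the windows of v1 and v4.
            Instant end = merged.get(last).end().isAfter(w.end()) ? merged.get(last).end() : w.end();
            merged.set(last, new IntervalWindow(merged.get(last).start(), end));
          } else {
            merged.add(w);
          }
        }
        return merged;
      }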
  • A second example can be accomplished using windowed sessions with a 30-minute timeout, as in FIG. 5, using a single Window.into call before initiating the summation, as per the example below.
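  • The referenced example might look like the following (the Sessions spelling is an assumption; Window.into and Sum.integersPerKey appear elsewhere in this disclosure):

      input
          .apply(Window.into(Sessions.withGapDuration(Duration.standardMinutes(30))))
          .apply(Sum.integersPerKey());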
  • The windowing API 300 supports Cloud Dataflow for both streaming and batch modes.
  • Windowing API semantics may include a high-level model of windowing such as, but not limited to, Window.into, which assigns elements into a set of windows, and GroupByKey, which treats the windows on the input elements as secondary keys and so groups by (key, window) pairs.
  • In the following mapping, G is the global window, GBF is the global WindowingFn, and [t1, t2) is an IntervalBucket representing that time interval. The mapping is:
  • The windowing API 300 includes a windowing interface 320.
  • The windowing interface 320 includes a timestamp setter function 322 and a window accessor function 324.
  • The timestamp setter function 322 updates the timestamp in the step context before outputting the element.
  • An example of the timestamp setter function 322 may include:
  • Option 1 (322a): ask the user to provide how much the timestamps will be shifted backward.
  • Option 2 (322b): force users to set OutputTimestampMode if DoFn.getOutputTimestampMode returns the UNBOUNDED_PAST mode, which is not allowed in DoFnContext.outputWithTimestamp for streaming mode. A sketch of the DoFn shape follows.
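  • The truncated DoFn fragment above might be sketched as follows under Option 1 (the method names are assumptions for illustration):

      abstract class DoFn<I, O> {
        // Option 1 (322a): the user declares how far backward output timestamps
        // may be shifted relative to the input timestamp, bounding how long the
        // watermark must be held back.
        Duration getAllowedTimestampSkew() { return Duration.ZERO; }

        abstract void processElement(ProcessContext c);
      }
      // Inside processElement, the timestamp setter 322 would then reject
      //   c.outputWithTimestamp(element, timestamp);
      // in streaming mode whenever
      //   timestamp < c.timestamp() - getAllowedTimestampSkew().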
  • The window accessor function 324 (e.g., DoFn.ProcessContext.windows()) is one way of accessing windows, but it really only makes sense to access them after a GroupByKey, and in that case each element will only be in a single window.
  • The windowing API 300 uses triggers to handle late data. Without triggers, the windowing API 300 has two possible methods to handle the late data.
  • The windowing API 300 can drop late data that would not be grouped into the correct window, or the windowing API 300 can allow late data to create duplicate windows in the output of GroupByKeyAndWindows.
  • The windowing API 300 can either pick one of the options, or allow the options to be configurable at either the pipeline level or on Window transforms (essentially resulting in a very poor approximation/subset of triggers).
  • A WindowingFn will be deterministic if, whenever a window is ready to be emitted, any windows that it might merge with are already known, and it merges with all of them.
  • The system 100 provides batch support through shuffle.
  • The system 100 processes all KVs for a given key on the same worker, following the logical time order of the elements. The worker can then leverage the current streaming code and process the data as if they were coming from streaming.
  • The system 100 performs the following to support batch through shuffle: 1) ShuffleSink encodes the timestamp and windows into ShuffleEntry.value and uses the timestamp as the sorting key; 2) a SortedShuffleSource reads all KVs for the same key and returns the result with the following interface:
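  • A minimal sketch of such a source interface (names and signatures are assumptions for illustration):

      interface SortedShuffleSource<K, V> {
        // Returns all values for one key, ordered by the sorting key (the
        // encoded timestamp), so the worker can replay them as if streaming.
        Iterable<KV<K, V>> readKey(K key);
      }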
  • The ability to build unaligned, event-time windows is an improvement, but two more shortcomings need to be addressed.
  • First, the system 100 needs to provide support for tuple- and processing-time-based windows; otherwise, windowing semantics will regress relative to other systems in existence.
  • Second, the system 100 must know when to emit the results 20 for a window. Since the data 10, including multiple data points, are unordered with respect to event time, the system 100 requires some other signal to notify it when the window ends.
  • The problem of tuple- and processing-time-based windows is addressed below, after the system 100 builds up a solution to the window completeness problem. As to window completeness, an initial inclination for solving it might be to use some sort of global event-time progress metric, such as watermarks. However, watermarks themselves have two major shortcomings with respect to correctness.
  • The watermark can be held back for the entire pipeline by a single slow datum. And even for healthy pipelines with little variability in event-time skew, the baseline level of skew may still be multiple minutes or more, depending upon the input source. As a result, using watermarks as the sole signal for emitting window results 20 is likely to yield higher latency of overall results than, for example, a comparable Lambda Architecture pipeline.
  • Thus, the system 100 postulates that watermarks alone are insufficient.
  • A useful insight in addressing the completeness problem is that the Lambda Architecture effectively sidesteps the issue: it does not solve the completeness problem by somehow providing correct answers faster; it simply provides the best low-latency estimate of a result that the streaming pipeline can provide, with the promise of eventual consistency and correctness once the batch pipeline runs.
  • Output from the batch job is only correct if input data 10 is complete by the time the batch job runs; if data 10 evolves over time, this must be detected and the batch jobs re-executed. From within a single pipeline (regardless of execution engine), the system 100 will need a feature to provide multiple answers (or panes) for any given window. This feature includes triggers or trigger times that allow the specification of when to trigger the output results 20 for a given window.
  • Triggers are a mechanism for stimulating the production of GroupByKeyAndWindow results 20 in response to internal or external signals. They are complementary to the windowing model, in that they each affect system behavior along a different axis of time. Windowing determines where in event time data 10 are grouped together for processing. Triggering determines when in processing time the results 20 of groupings are emitted as panes. Specific triggers, such as watermark triggers, make use of event time in the functionality they provide, but their effects within the pipeline are still realized in the processing-time axis.
  • The system 100 provides predefined trigger implementations for triggering at completion estimates (e.g., watermarks, including percentile watermarks, which provide useful semantics for dealing with stragglers in both batch and streaming execution engines when processing a minimum percentage of the input data 10 quickly is more desirable than processing every last piece of it), at points in processing time, and in response to data 10 arriving (counts, bytes, data punctuations, pattern matching, etc.).
  • The system 100 supports composing triggers into logical combinations (and, or, etc.), loops, sequences, and other such constructions.
  • In addition, users may define their own triggers utilizing both the underlying primitives of the execution runtime (e.g., watermark timers, processing-time timers, data arrival, composition support) and any other relevant external signals (data injection requests, external progress metrics, RPC completion callbacks, etc.).
  • The triggers API 400 provides a way to control how multiple panes for the same window relate to each other, via three different refinement modes:
  • The first refinement mode is discarding: upon triggering, window contents are discarded, and later results 20 bear no relation to previous results 20. This mode is useful in cases where the downstream consumer of the data (either internal or external to the pipeline) expects the values from various trigger fires to be independent (e.g., when injecting into a system that generates a sum of the values injected). It is also the most efficient in terms of the amount of data 20 buffered, though for associative and commutative operations that can be modeled as a Dataflow Combiner, the efficiency delta will often be minimal. For the video sessions use case, this is not sufficient, since it is impractical to require downstream consumers of the data 10 to stitch together partial sessions.
  • The second refinement mode is accumulating: upon triggering, window contents are left intact in persistent state, and later results 20 become a refinement of previous results 20. This is useful when the downstream consumer expects to overwrite old values with new ones when receiving multiple results 20 for the same window, and is effectively the mode used in Lambda Architecture systems, where the streaming pipeline produces low-latency results that are then overwritten in the future by the results 20 from the batch pipeline. For video sessions, this might be sufficient if the system 100 is simply calculating sessions and then immediately writing them to some output source that supports updates (e.g., a database or key/value store).
  • The third refinement mode is accumulating & retracting: upon triggering, in addition to the accumulating semantics, a copy of the emitted value is also stored in persistent state. When the window triggers again in the future, a retraction for the previous value will be emitted first, followed by the new value as a normal datum.
  • A simple implementation of retraction processing requires deterministic operations, but non-determinism may be supported with additional complexity and cost; we have seen use cases that require this, such as probabilistic modeling. Retractions are necessary in pipelines with multiple serial GroupByKeyAndWindow operations, since the multiple results generated by a single window over subsequent trigger fires may end up on separate keys when grouped downstream.
  • Dataflow Combiner operations that are also reversible can support retractions efficiently via an uncombine method; a sketch follows. For video sessions, this mode is the ideal. If the system 100 is performing aggregations downstream from session creation that depend on properties of the sessions themselves, for example, by detecting unpopular ads (such as those which are viewed for less than five seconds in a majority of sessions), initial results 20 may be invalidated as inputs evolve over time, e.g., as a significant number of offline mobile viewers come back online and upload session data. Retractions provide a way to adapt to these types of changes in complex pipelines with multiple serial grouping stages. Some specific implementations of the Trigger system are discussed below.
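  • A minimal sketch of a reversible Combiner supporting retractions via an uncombine method (the interface shape is an assumption; only the uncombine notion comes from the text above):

      // A sum is reversible: a retraction is applied by subtracting the
      // previously combined value instead of replaying the whole window.
      class SumCombiner {
        long combine(long accumulator, long input) { return accumulator + input; }
        long uncombine(long accumulator, long retractedInput) { return accumulator - retractedInput; }
      }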
  • The triggers API 400 provides a structured, composable way of expressing when (in processing time) the results 20 of an aggregation should be emitted within Dataflow/Streaming Flume.
  • The triggers API 400 works in conjunction with the aggregation API 200 and the windowing API 300, which respectively allow the expression of what the results 20 of an aggregation are, and where (in event time) the aggregations are performed.
  • The triggers API 400 aims to address a number of shortcomings in the existing Streaming Flume/Dataflow APIs relative to standard MillWheel. Some of these shortcomings include:
  • Late Data - Streaming Flume users are not able to manage late data (i.e. data that arrives behind the watermark).
  • Current systems just drop the late data, which is impractical, even in the short-term.
  • Walltime timers offer a way to provide regular updates containing whatever data has been received thus far, regardless of how fast or slow the rest of the pipeline may be operating currently.
  • Data-Driven Aggregations - Another class of aggregations that does not require watermarks is those driven by the data themselves, e.g., hash joins or byte-limited aggregations. Many of these patterns are supported using the existing Streaming Flume APIs (via custom WindowFns and/or the State API), but it may be desirable to incorporate them with a generalized aggregation trigger API, since this would open the possibility of composing data-driven triggers with other triggers (e.g., a hash join that times out after a walltime delay; currently only a streamtime delay can be used).
  • MillWheel provides watermarks (or cursors) as a way of reasoning about completeness of data in a streaming pipeline. By default, watermarks estimate the point in time up to which all data for a given stream has been received or processed. This allows time-boundary aggregations to be performed only once the system 100 believes it has seen all the relevant data.
  • Percentile Watermarks - MillWheel also supports the notion of percentile watermarks, which give a watermark estimating the time up to which the system 100 has processed some specific subset of the data (e.g., 95%).
  • The system 100 may use percentile watermarks instead of the normal watermark to provide speculative results. This can be used to provide results faster, with some decreased amount of confidence.
  • However, a given computation can currently make use of only one type of cursor (100% or a single, cell-specific percentile). So providing a complex, tiered set of speculative results is laborious from a configuration perspective, and currently impossible beyond two tiers.
  • Walltime Aggregation - While watermarks are the most common way of triggering aggregations in MillWheel, there are cases where other types of triggers are more practical. In cases where timeliness of data is more important than any specific notion of completeness, walltime timers may be used to provide periodic updates of the data aggregated thus far. This ensures that a programmer gets timely updates, even in the face of watermark lags due to a small portion of the data being noticeably behind the rest.
  • Data-Driven Aggregation - Moreover, there exists a whole class of non-time-based aggregations. Examples are hash joins, aggregations bounded by a number of records or bytes, or aggregations triggered on some feature of the data themselves (e.g., a specific field of a datum having a certain value).
  • Composite Aggregation - In some examples, it is fairly common to want to compose multiple types of aggregation. Oftentimes a hash join will have a timeout; in such an example, the current system 100 (Streaming Flume) may be used with streamtime timeouts, but not walltime timeouts. In some examples, the programmer wants to receive a single initial aggregation when the watermark reaches 100%, then periodic (walltime-based) updates as late data arrive. Speculative data is essentially another type of composite aggregation (one each for the desired percentile watermark values).
  • The problem of composing aggregations, be it for late data 10, speculative data, or some other custom composition, begs the question: how do you provide refinements to the results of an aggregation as your notion of a dataset changes over time?
  • Option 1 - Provide multiple versions of aggregations and ways to manage them. When providing multiple versions, there are two modes the system 100 may support. In a first mode, the subsequent aggregations incorporate all the data 10 seen thus far. In this case, new aggregates 20 would simply replace old aggregates 20. In a second mode, subsequent aggregations 20 incorporate only new data 10 since the last aggregate 20. In this case, new aggregates 20 would have to be manually combined with previous aggregates 20, if desired and/or feasible.
  • The first and second modes each have pros and cons.
  • The pros may include, but are not limited to: the API staying clean (different versions of the aggregate still have the same type); the user specifying their aggregation logic once, with the system taking care of applying it multiple times as needed; and, since the system already provides for multiple versions of aggregations (differentiated by timestamp) with windowing in Streaming Flume, extending versions to a new dimension being relatively natural. In addition: (1A) updated aggregates 20 are immediately usable with no extra work from the user; and (1B) there is no need to keep aggregation state around for some late-data horizon. Cons include: (1A) the aggregation state must be kept around until late data is no longer allowed; for log sources, this would be two days until goldenization to be 100% correct.
  • Non-combiner aggregations require storing the entire input data set up until the time horizon. This yields an overall data storage size of:
  • Option 2 - Provide an initial aggregation and access to raw subsequent data 10 (i.e., "deltas").
  • This option includes pros such as, but not limited to: the aggregation state does not have to be kept around. As for cons: the API is more complicated; the aggregate and delta may have different types (is your output from the operation now a Pair<Aggregate, Delta>? or do you require the user to fork their code paths? this kills atomicity); the user must specify their aggregation logic once for the initial aggregate, then a second time for incorporating delta updates; and many types of aggregations do not support updates via deltas, and thus would not work with this scheme.
  • Options #1A and #1B are solutions that the system 100 may execute for triggering:
  • The system 100 modifies the Window.into call to allow users to also specify the triggers that dictate when aggregates 20 are emitted, as well as the way subsequent aggregates 20 relate to previous aggregates 20:
  • The TriggerStrategy object is essentially a tuple of named values:
  • Accumulation mode - Dictates whether later aggregates 20 include data 10 from the previous aggregates 20 or not (i.e., whether the contents of a window are cleared when the window is triggered).
  • APIs, high level - The system 100 provides a high-level way to describe when aggregates 20 should be produced during windowing within a GroupByKey operation, as well as how multiple versions of an aggregate 20 relate to each other and whether incremental updates will be performed, via modified windowBy/Window.into operations:
  • TriggerStrategy is roughly a Tuple<Trigger, AccumulationMode, IncrementalMode>.
  • A trigger is essentially a DoFn-like class with methods that are called by the system 100 at specific points during windowing. Those methods take various arguments about the window(s) and values in question as input, may manipulate per-window-and-trigger persistent state and timers, and may emit trigger signals to indicate when a window's contents should be emitted downstream. More details on the API for implementing Triggers are included below in the implementation section; a rough skeleton follows.
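  • The skeleton of such a class might look as follows (the context types are assumptions; the onDatum/onMerge/onTimer/reset callbacks are described below):

      abstract class Trigger<B, T> {
        // Called for each datum added to a window; may read/write
        // per-window-and-trigger state, set timers, and signal emission.
        abstract void onDatum(OnDatumContext ctx);
        // Called immediately after window merging has occurred.
        abstract void onMerge(OnMergeContext ctx);
        // Called when a timer set by this trigger fires.
        abstract void onTimer(OnTimerContext ctx);
        // Resets any per-window state and timers held by this trigger.
        abstract void reset(ResetContext ctx);
      }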
  • The triggers library contains both simple and composite triggers (though the distinction between them is largely semantic).
  • Example simple triggers include:
  • A watermark trigger fires when the given watermark percentile is reached for the end of the window, with the percentile in (0.0, 100.0]. Under the covers, these would be implemented via watermark timers. Note that late windows would by definition not fire this type of trigger.
  • A delay trigger fires after a given delay, whose TimeDomain can be STREAM_TIME or WALL_TIME. Under the covers, these would be implemented via watermark or walltime timers.
  • Example composite triggers include:
  • HashJoinTrigger() // Implements hash join logic as a Trigger. new AfterDelay(…, TimeUnit.HOURS, TimeDomain.WALL_TIME)));
  • The AccumulationMode enum may have four possible values:
  • IncrementalMode supports the values ENABLED or DISABLED. If enabled, the system would support reversing the effects of previous aggregate values in downstream aggregations via anti-data (e.g., data that are flagged as being used to reverse effects from previously emitted aggregates). This feature is complex enough that it warrants its own design doc, and it will not be included in any of the initial Dataflow or Flume implementations.
  • The results of the GroupByKey may include multiple versions of any given aggregate. These versions would be distinguishable by their production-time values, as well as by the associated trigger that generated them (as described further in the low-level API section below).
  • windowBy - The one-parameter version of windowBy would be deprecated in an attempt to force the user to explicitly think about when it is appropriate for their aggregations to be emitted. While it remained, it would be implemented in such a way as to provide the original semantics of emitting only at the 100% watermark, with all subsequent late data dropped, e.g.:
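  • A sketch of that shim (the TriggerStrategy constructor shape and the enum constants are assumptions; AtWatermark appears in the listing below):

      // Original semantics: emit once when the 100% watermark passes the
      // window, then drop all subsequent late data.
      static <T> PCollection<T> windowBy(PCollection<T> in, WindowFn<?, ?> fn) {
        return windowBy(in, fn, new TriggerStrategy(
            new AtWatermark(),             // fire when the watermark passes
            AccumulationMode.DISCARDING,   // a single pane, nothing retained
            IncrementalMode.DISABLED));    // no anti-data / retractions
      }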
  • Processing Context API - The standard ExecutionContext/ProcessingContext classes may gain some new methods to provide low-level, per-value metrics to reason about multiple versions of aggregates.
  • Integer ExecutionContext.getWatermarkPercentile() - Provides the watermark percentile for any value in the system. This will be an integer in [0, 100], or null if the value was produced behind the 100% output watermark (i.e., the value is late).
  • Trigger ExecutionContext.getTrigger() - Provides the Trigger (if any) that generated the value.
  • The full aggregate value of the window may be accessed via Window.peekValue(), which may be expensive if not using an AggrFn.
  • onMerge - Called immediately after window merging has occurred.
  • onTimer - Called when a timer set by the trigger fires. Provided with the window and the timer tag, instant, and domain. May read/write per-tag state for the window. May inspect the current time in all time domains. May set/delete per-tag timers for the window. May trigger and clear the window value. May mark the trigger done. void onTimer(OnTimerContext ctx);
  • The CompositeTrigger class provides a superset of the functionality of Trigger (and indeed is an actual superclass of it).
  • Each context in CompositeTrigger would support one or two additional operations:
  • invokeChild - Invokes the current callback on the given child trigger. Available in all operations (onDatum, onMerge, onTimer, reset). Under the covers, it keeps track of the lineage up to the current child, using that lineage to provide unique namespaces for all state and timers manipulated by any given child. Signature: void invokeChild(Trigger trigger);
  • triggerHistory - Returns the sequence of child triggers within which the ctx.trigger() method has been invoked during the lifetime of this callback, as a list of TriggerEvent objects (which capture the invoking trigger and whether a clear was requested). Available in all operations whose context class includes a trigger method (onDatum, onMerge, onTimer). Note that the triggers returned by triggerHistory are strictly among the direct descendants of this specific trigger (e.g., grandchild triggers will never directly show up in the results of this function call, though they may result in a child trigger showing up).
  • CompositeTrigger provides a fourth callback that allows a parent to hook into a child's timer callback, since timers are scoped to a specific trigger but may have implications for a parent. This callback is provided with the child trigger, and the timer tag, instant, and domain. It may read/write its own per-tag state for the window. It may inspect the current time in all time domains. It may inspect/set/delete its own per-tag timers for the window. It may trigger and clear the window value. It may mark the trigger done. It may invoke the child timer. It may inspect any trigger calls made by the child.
  • On-disk state - Triggers store the following on-disk state.
  • The system 100 may follow the directions of the accumulation mode setting for the current TriggerStrategy when deciding whether to automatically clear the window value on trigger calls and whether to obey clear calls from the Trigger implementations.
  • Incremental mode - Anti-data consisting of the previous value for a window would be generated any time a window is triggered.
  • Trigger metadata will be added when a trigger fires.
  • Anti-data are tagged as such when emitted.
  • TriggerSets may contain both watermark and walltime timers.
  • WindMill is built with support for multiple timer managers, and should be able to support the watermark+walltime feature out of the box. Support for multiple watermark percentiles shouldn't be too much more difficult. MillWheel may need a refactoring of the timer manager code to support either feature.
  • Example implementations include a count-based trigger (a class extending Trigger<B, T> with a threshold), a delay trigger that sets a source timer via getSourceTimer(window, "delay"), AtWatermark (class AtWatermark<B, T> extends Trigger<B, T>), ResultIsOdd (class ResultIsOdd<B, T extends Long> extends Trigger<B, T> with a public ResultIsOdd() constructor), FirstOf (class FirstOf extends CompositeTrigger), and SequenceOf (class SequenceOf<B, D> extends CompositeTrigger<B, D>, holding a private final TriggerFn[] triggers passed to SequenceOf(TriggerFn... triggers) and consulting TriggerHistory history = ctx.triggerHistory() and Integer index = ctx.lookupState("index", INT_CODER)). Rough sketches follow.
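  • Rough reconstructions of a few of these (bodies and context methods are assumptions; the class names, type bounds, and calls are from the listing above):

      class AtWatermark<B, T> extends Trigger<B, T> {
        @Override void onDatum(OnDatumContext ctx) {
          // Fire once the watermark passes the end of the window.
          ctx.setTimer("watermark", ctx.window().end(), TimeDomain.STREAM_TIME);
        }
        @Override void onTimer(OnTimerContext ctx) { ctx.trigger(); }
      }

      class ResultIsOdd<B, T extends Long> extends Trigger<B, T> {
        public ResultIsOdd() {}
        @Override void onDatum(OnDatumContext ctx) {
          // Data-driven trigger: fire whenever the current aggregate is odd.
          if (ctx.window().peekValue() % 2 != 0) { ctx.trigger(); }
        }
      }

      class SequenceOf<B, D> extends CompositeTrigger<B, D> {
        private final TriggerFn[] triggers;
        public SequenceOf(TriggerFn... triggers) { this.triggers = triggers; }
        @Override void onDatum(OnDatumContext ctx) {
          // Route the callback to the active child, advancing the index each
          // time that child fires (as reported by ctx.triggerHistory()).
          Integer index = ctx.lookupState("index", INT_CODER);
          ctx.invokeChild(triggers[index == null ? 0 : index]);
        }
      }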
  • FIGS. 6A-6I show example plots 600, 600a-i that highlight a plurality of useful output patterns supported by the system 100.
  • The example plot 600 is illustrated in the context of the integer summation pipeline.
  • Each data point is associated with a small integer value and analyzed by the system 100 in the context of both bounded and unbounded data sources.
  • FIG. 6A is an example plot 600 showing a window time domain skew for the data point inputs of the received data 10.
  • The X axis plots the data 10 in event time (i.e., when the events actually occurred), while the Y axis plots the data 10 in processing time (i.e., when the pipeline observes them). All the plots 600, 600a-i assume execution on the streaming engine unless otherwise specified.
  • The plots 600 will also depend on watermarks when included in the plots 600.
  • The plots 600 show an ideal watermark and an example actual watermark.
  • Skew is a common occurrence; this is exemplified by the meandering path the actual watermark takes away from the ideal watermark, as shown in the plot 600a of FIG. 6A.
  • The heuristic nature of this watermark is exemplified by the single "late" datum (e.g., data point) with value 9 that appears behind the watermark.
  • The system 100 would wait for all the data 10 to arrive, group the data 10 together into one bundle (since these data points are all for the same key), and sum their values to arrive at a total result of 51.
  • The plot 600b of FIG. 6B shows this result represented by the darkened rectangle, where the area covers the ranges of event time and processing time included in the sum (with the top of the rectangle denoting when in processing time the result was materialized). Since classic batch processing is event-time agnostic, the result 20 is contained within a single global window covering all of event time. And since outputs are only calculated once all inputs (e.g., data 10) are received, the result 20 covers all of processing time for the execution.
  • the system converts the pipeline to run over an unbounded data source.
  • the default triggering semantics are to emit windows when the watermark passes them. But when using the global window with an unbounded input source, the triggering semantics will not emit windows when the watermark passes since the global window covers all of event time. As such, the system 100 needs to either trigger by something other than the default trigger, or window by something other than the global window. Otherwise, the system 100 will not produce an output result 20.
  • changing the trigger allows the system 100 to generate conceptually identical outputs (a global per-key sum over all time), but with periodic updates.
  • the system 100 applies a Window.trigger operation that repeatedly fires on one-minute periodic processing-time boundaries.
  • the system 100 may specify the Accumulating mode so that the global sum will be refined over time (this assumes the system 100 includes an output sink into which the system 100 may overwrite previous results for the key with new results, e.g. a database or key/value store).
• as shown in the plot 600c of FIG. 6C, the system 100 generates updated global sums once per minute of processing time. Note how the semi-transparent output rectangles (e.g., windows) overlap, since Accumulating panes build upon prior results by incorporating overlapping regions of processing time.
  • plot 600d of FIG. 6D shows the system 100 generating the delta in sums once per minute by switching to the Discarding mode. Note that by switching to the Discarding mode, the system 100 effectively gives the processing-time windowing semantics provided by many streaming systems. The output panes no longer overlap, since their results incorporate data from independent regions of processing time.
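• a sketch of the trigger expressions behind these two plots (Repeat is the repetition composite noted below with respect to late data; the AtPeriod spelling and the fluent accumulating()/discarding() mode selectors are assumed):

    // FIG. 6C: refined global sums once per minute (Accumulating mode):
    Window.trigger(Repeat(AtPeriod(1, MINUTE)))
          .accumulating()

    // FIG. 6D: per-minute deltas instead (Discarding mode):
    Window.trigger(Repeat(AtPeriod(1, MINUTE)))
          .discarding()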
• the system 100 may consider one more change to the triggers for this pipeline.
  • the system 100 may model tuple-based windows by simply changing the trigger to fire after a certain number of data arrive, say two.
• the plot 600e of FIG. 6E shows five output results from independent regions of processing time. For instance, each output result contains the sum of two adjacent (by processing time) data point inputs.
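• a sketch of this tuple-based trigger (AtCount is the data-driven trigger named below with respect to FIG. 6E; the discarding() spelling is assumed):

    // Fire a delta sum after every two data arrive for a key:
    Window.trigger(Repeat(AtCount(2)))
          .discarding()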
• More sophisticated tuple-based windowing schemes (e.g., sliding tuple-based windows) require custom windowing strategies, but are otherwise supported.
• the system 100 may window (e.g., via the Windowing API 300) the data 10 into fixed, two-minute Accumulating windows.
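• one possible spelling of this windowing expression (the FixedWindows.of and AtWatermark names are assumed for the fixed-window and watermark-trigger primitives described here):

    Window.into(FixedWindows.of(2, MINUTES))
          .trigger(Repeat(AtWatermark()))
          .accumulating()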
  • the watermark trigger fires when the watermark passes the end of the window in question.
  • Both batch and streaming engines implement watermarks, as detailed below.
  • the Repeat call in the trigger is used to handle late data; should any data arrive after the watermark, they will instantiate the repeated watermark trigger, which will fire immediately since the watermark has already passed.
  • the plots 600f-600h each characterize this pipeline on a different type of runtime engine.
  • the system 100 first observes what execution of this pipeline would look like on a batch engine.
• the data source would have to be a bounded one, so as with the classic batch example above, the system 100 would wait for all data 10 in the batch to arrive. Thereafter, the system 100 would process the data 10 in event-time order by emitting windows as the simulated watermark advances, as in the example plot 600f of FIG. 6F.
• when executing a micro-batch engine over this data source with one-minute micro-batches, the system 100 would gather input data 10 for one minute, process the data 10, and repeat. Each time, the watermark for the current batch would start at the beginning of time and advance to the end of time (technically jumping from the end time of the batch to the end of time instantaneously, since no data would exist for that period). The system 100 ends up with a new watermark for every micro-batch round, and corresponding outputs for all windows whose contents had changed since the last round. This provides a very nice mix of latency and eventual correctness, as in the example plot 600g of FIG. 6G.
• the plot 600h of FIG. 6H shows a late data point updating an output result of a fixed window. While most windows emit their associated data points when the watermark passes, the system 100 receives a datum (e.g., data point) with value 9 late relative to the watermark. For whatever reason (mobile input source being offline, network partition, etc.), the system 100 did not realize that the datum with value 9 had not yet been injected, and thus, having observed the datum with value 5 associated with the same window (for event-time range [12:00, 12:02]), allowed the watermark to proceed past the point in event time that would eventually be occupied by the datum with value 9. Hence, once the datum with value 9 finally arrives, it causes the first window (for event-time range [12:00, 12:02]) to retrigger with an updated sum.
• the plot 600i of FIG. 6I shows output results based on processing-time-based triggers that yield somewhat better latency than the micro-batch pipeline of plot 600g, since the data points of the received data 10 accumulate in windows as they arrive instead of being processed in small batches.
• the choice between them really becomes just a matter of latency versus cost, which is exactly one of the goals the system 100 set out to achieve with this model.
• the plot 600j of FIG. 6J shows the data points of the received data 10 grouped within session windows and combined output results emitted from combined window sessions.
• the system 100 may satisfy the video-session requirements (modulo the use of summation as the aggregation operation, maintained for diagrammatic consistency; switching to another aggregation would be trivial) by updating to session windowing with a one-minute timeout and enabling retractions.
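• a sketch of this session pipeline (the Sessions.withGapDuration, RepeatUntil, and accumulatingAndRetracting() spellings are assumed for the session windows, until-style composition, and retraction-enabled accumulation described in this disclosure):

    Window.into(Sessions.withGapDuration(1, MINUTE))
          .trigger(SequenceOf(
              RepeatUntil(AtPeriod(1, MINUTE), AtWatermark()),
              Repeat(AtWatermark())))
          .accumulatingAndRetracting()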
  • the system 100 outputs initial singleton sessions for values 5 and 7 at the first one-minute processing-time boundary.
  • the system 100 outputs a third session with value 10, built up from the values 3, 4, and 3.
• when the datum with value 8 arrives, it joins the two sessions with values 7 and 10; the system 100 thus emits retractions for the value 7 and value 10 sessions, as well as a normal datum for the new combined session with value 25.
• when the datum with value 9 arrives (late), it joins the session with value 5 to the session with value 25.
  • the repeated watermark trigger then immediately emits retractions for both value 5 and value 25, followed by a combined session of value 39.
  • a similar execution occurs for the data points with values 3, 8, and 1, ultimately ending with a retraction for an initial value 3 session, followed by a combined session value 12.
• FlumeJava may implement the system 100, with MillWheel used as the underlying execution engine for streaming mode; additionally, an external reimplementation, such as for Google Cloud Dataflow, may also implement the system 100.
• some MillWheel pipelines calculate aggregate statistics (e.g., latency averages). For these pipelines, 100% accuracy is not required, but having a largely complete view of the data in a reasonable amount of time is. Given the high level of accuracy achieved with watermarks for structured input sources like log files, such customers find watermarks very effective in triggering a single, highly-accurate aggregate per window. Watermark triggers are highlighted in the plot 600h of FIG. 6H.
  • a number of abuse detection pipelines run on MillWheel. Abuse detection is another example of a use case where processing a majority of the data quickly is much more useful than processing 100% of the data more slowly. As such, they are heavy users of MillWheel's percentile watermarks, and were a strong motivating case for being able to support percentile watermark triggers in the model.
• FlumeJava has a custom feature that allows for early termination of a job based on overall progress.
  • One of the benefits of the unified model for batch mode is that this sort of early termination criteria is now naturally expressible using the standard triggers mechanism, rather than requiring a custom feature.
• another pipeline considered builds trees of user activity (essentially session trees) across multiple systems. These trees are then used to build recommendations tailored to users' interests.
  • the pipeline was noteworthy in that it used processing-time timers to drive its output. This was due to the fact that, for their system, having regularly updated, partial views on the data was much more valuable than waiting until mostly complete views were ready once the watermark passed the end of the session. It also meant that lags in watermark progress due to a small amount of slow data would not affect timeliness of output for the rest of the data. This pipeline thus motivated inclusion of processing-time triggers shown in the plots 600c and 600d of FIGS. 6C and 6D, respectively.
• the AtCount trigger used in the plot 600e of FIG. 6E exemplified data-driven triggers, while the plots 600f-600j of FIGS. 6F-6J utilized composite triggers.
• based on many years of experience with real-world, massive-scale, unbounded data processing, the system 100 set forth above is a good step in that direction.
• the system 100 supports the unaligned, event-time-ordered windows modern data consumers require, while providing flexible triggering and integrated accumulation and retraction, and refocusing the approach from one of finding completeness in data to one of adapting to the ever-present changes manifest in real-world datasets.
  • the system 100 abstracts away the distinction of batch vs. micro-batch vs. streaming, allowing pipeline builders a more fluid choice between them, while shielding them from the system-specific constructs that inevitably creep into models targeted at a single underlying system.
  • the overall flexibility of the system 100 allows pipeline builders to appropriately balance the dimensions of correctness, latency, and cost to fit their use case, which is critical given the diversity of needs in existence. And lastly, the system 100 clarifies pipeline implementations by separating the notions of what results are being computed, where in event time they are being computed, when in processing time they are materialized, and how earlier results relate to later refinements.
  • a software application may refer to computer software that causes a computing device to perform a task.
  • a software application may be referred to as an "application,” an "app,” or a "program.”
• Example applications include, but are not limited to, system diagnostic applications, system management applications, system maintenance applications, word processing applications, spreadsheet applications, messaging applications, media streaming applications, social networking applications, and gaming applications.
  • the non-transitory memory may be physical devices used to store programs (e.g., sequences of instructions) or data (e.g., program state information) on a temporary or permanent basis for use by a computing device.
• the non-transitory memory may be volatile and/or non-volatile addressable semiconductor memory. Examples of non-volatile memory include, but are not limited to, flash memory and read-only memory (ROM) / programmable read-only memory (PROM) / erasable programmable read-only memory (EPROM) / electronically erasable programmable read-only memory (EEPROM) (e.g., typically used for firmware, such as boot programs).
  • volatile memory include, but are not limited to, random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), phase change memory (PCM) as well as disks or tapes.
  • FIG. 7 is a schematic view of an example computing device 700 that may be used to implement the systems and methods described in this document.
  • the computing device 700 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers.
  • the components shown here, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed in this document.
  • the computing device 700 includes a processor 710 (e.g., data processing hardware), memory 720, a storage device 730, a high-speed interface/controller 740 connecting to the memory 720 and high-speed expansion ports 750, and a low speed interface/controller 760 connecting to low speed bus 770 and storage device 730.
• Each of the components 710, 720, 730, 740, 750, and 760 is interconnected using various busses, and may be mounted on a common motherboard or in other manners as appropriate.
  • the processor 710 can process instructions for execution within the computing device 700, including instructions stored in the memory 720 or on the storage device 730 to display graphical information for a graphical user interface (GUI) on an external input/output device, such as display 780 coupled to high speed interface 740.
  • multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory.
• multiple computing devices 700 may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).
  • the memory 720 (e.g., memory hardware) stores information non-transitorily within the computing device 700.
  • the memory 720 may be a computer-readable medium, a volatile memory unit(s), or non-volatile memory unit(s).
  • the non-transitory memory 720 may be physical devices used to store programs (e.g., sequences of instructions) or data (e.g., program state information) on a temporary or permanent basis for use by the computing device 700.
  • non-volatile memory examples include, but are not limited to, flash memory and read-only memory (ROM) / programmable read-only memory (PROM) / erasable programmable read-only memory (EPROM) / electronically erasable programmable read-only memory (EEPROM) (e.g., typically used for firmware, such as boot programs).
  • volatile memory examples include, but are not limited to, random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), phase change memory (PCM) as well as disks or tapes.
  • the storage device 730 is capable of providing mass storage for the computing device 700.
• the storage device 730 is a computer-readable medium.
  • the storage device 730 may be a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations.
  • a computer program product is tangibly embodied in an information carrier.
  • the computer program product contains instructions that, when executed, perform one or more methods, such as those described above.
  • the information carrier is a computer- or machine-readable medium, such as the memory 720, the storage device 730, or memory on processor 710.
• the high-speed controller 740 manages bandwidth-intensive operations for the computing device 700, while the low-speed controller 760 manages lower bandwidth-intensive operations. Such allocation of duties is exemplary only. In some implementations, the high-speed controller 740 is coupled to the memory 720, the display 780 (e.g., through a graphics processor or accelerator), and to the high-speed expansion ports 750, which may accept various expansion cards (not shown).
  • the low-speed controller 760 is coupled to the storage device 730 and low-speed expansion port 770.
  • the low-speed expansion port 770 which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.
  • the computing device 700 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 700a or multiple times in a group of such servers 700a, as a laptop computer 700b, or as part of a rack server system 700c.
• various implementations of the systems and techniques described herein can be realized in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
  • Implementations of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
  • subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus.
  • the computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them.
• the term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
  • the apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
  • a propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus.
  • a computer program (also known as an application, program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • a computer program does not necessarily correspond to a file in a file system.
  • a program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code).
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • the processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
  • processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
  • a processor will receive instructions and data from a read only memory or a random access memory or both.
  • the essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks.
  • a computer need not have such devices.
  • a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio player, a Global Positioning System (GPS) receiver, to name just a few.
  • Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks.
  • the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • one or more aspects of the disclosure can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube), LCD (liquid crystal display) monitor, or touch screen for displaying information to the user and optionally a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
• Other kinds of devices can be used to provide interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's client device in response to requests received from the web browser.
• one or more aspects of the disclosure can be implemented in a computing system that includes a backend component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a frontend component, e.g., a client computer having a graphical user interface or a web browser through which a user can interact with an implementation, or any combination of one or more such backend, middleware, or frontend components.
  • the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network.
  • communication networks include a local area network ("LAN”) and a wide area network (“WAN”), an inter-network (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).
  • the computing system can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • a server transmits data (e.g., an HTML page) to a client device (e.g., for purposes of displaying data to and receiving user input from a user interacting with the client device).
• data generated at the client device (e.g., a result of the user interaction) can be received from the client device at the server.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
EP16741415.0A 2015-08-05 2016-06-17 Data flow windowing and triggering Withdrawn EP3215963A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201562201441P 2015-08-05 2015-08-05
US14/931,006 US10037187B2 (en) 2014-11-03 2015-11-03 Data flow windowing and triggering
PCT/US2016/038131 WO2017023432A1 (en) 2015-08-05 2016-06-17 Data flow windowing and triggering

Publications (1)

Publication Number Publication Date
EP3215963A1 true EP3215963A1 (en) 2017-09-13

Family

ID=57944030

Family Applications (1)

Application Number Title Priority Date Filing Date
EP16741415.0A Withdrawn EP3215963A1 (en) 2015-08-05 2016-06-17 Data flow windowing and triggering

Country Status (4)

Country Link
EP (1) EP3215963A1 (zh)
CN (1) CN107209673B (zh)
DE (1) DE202016007901U1 (zh)
WO (1) WO2017023432A1 (zh)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10514952B2 (en) * 2016-09-15 2019-12-24 Oracle International Corporation Processing timestamps and heartbeat events for automatic time progression
CN108228356B (zh) * 2017-12-29 2021-01-15 华中科技大学 一种流数据的分布式动态处理方法
US10909182B2 (en) * 2018-03-26 2021-02-02 Splunk Inc. Journey instance generation based on one or more pivot identifiers and one or more step identifiers
CN109617648A (zh) * 2018-10-29 2019-04-12 青岛民航凯亚系统集成有限公司 一种可变时间滑动窗口计算方法
CN109871248A (zh) * 2018-12-29 2019-06-11 天津南大通用数据技术股份有限公司 一种可变间隔的去除重复流数据的会话窗口设计方法
CN110209685B (zh) * 2019-06-12 2020-04-21 北京九章云极科技有限公司 一种数据实时处理方法及系统
CN110850825B (zh) * 2019-11-13 2021-06-08 武汉恒力华振科技有限公司 基于事件时间的工业过程数据处理方法
CN113127512B (zh) * 2020-01-15 2023-09-29 百度在线网络技术(北京)有限公司 多数据流的数据拼接触发方法、装置、电子设备和介质
CN111478949B (zh) * 2020-03-25 2022-05-24 中国建设银行股份有限公司 数据处理方法和装置
CN111831383A (zh) * 2020-07-20 2020-10-27 北京百度网讯科技有限公司 窗口拼接方法、装置、设备以及存储介质
CN111858368B (zh) * 2020-07-27 2022-11-25 成都新潮传媒集团有限公司 数据处理方法、装置及存储介质
CN113742004B (zh) * 2020-08-26 2024-04-12 北京沃东天骏信息技术有限公司 一种基于flink框架的数据处理方法和装置
JP2022151355A (ja) * 2021-03-26 2022-10-07 富士通株式会社 データ処理プログラム、データ処理方法及びデータ処理システム
WO2024031461A1 (zh) * 2022-08-10 2024-02-15 华为技术有限公司 流数据处理方法及相关设备
CN115080156B (zh) * 2022-08-23 2022-11-11 卓望数码技术(深圳)有限公司 基于流批一体的大数据批量计算的优化计算方法及装置
CN116974876B (zh) * 2023-09-20 2024-02-23 云筑信息科技(成都)有限公司 一种基于实时流框架实现毫秒级监控告警的方法

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070100757A1 (en) * 1999-05-19 2007-05-03 Rhoads Geoffrey B Content Protection Arrangements
US7080386B2 (en) * 2000-01-25 2006-07-18 Texas Instruments Incorporated Architecture with digital signal processor plug-ins for general purpose processor media frameworks
US6934756B2 (en) * 2000-11-01 2005-08-23 International Business Machines Corporation Conversational networking via transport, coding and control conversational protocols
CN102662642B (zh) * 2012-04-20 2016-05-25 浪潮电子信息产业股份有限公司 一种基于嵌套滑动窗口和遗传算法的并行处理方法

Also Published As

Publication number Publication date
CN107209673B (zh) 2020-11-06
DE202016007901U1 (de) 2017-04-03
WO2017023432A1 (en) 2017-02-09
CN107209673A (zh) 2017-09-26

Similar Documents

Publication Publication Date Title
US10732928B1 (en) Data flow windowing and triggering
WO2017023432A1 (en) Data flow windowing and triggering
Isah et al. A survey of distributed data stream processing frameworks
Jayalath et al. From the cloud to the atmosphere: Running MapReduce across data centers
US9437024B2 (en) Transformation function insertion for dynamically displayed tracer data
CN107690616B (zh) 受限的存储器环境中的流式传输联接
Akidau et al. The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing
De Oliveira et al. Scicumulus: A lightweight cloud middleware to explore many task computing paradigm in scientific workflows
Mattoso et al. Dynamic steering of HPC scientific workflows: A survey
US9262478B2 (en) Compile-time grouping of tuples in a streaming application
US20150347628A1 (en) Force Directed Graph With Time Series Data
US20140019879A1 (en) Dynamic Visualization of Message Passing Computation
US20130232433A1 (en) Controlling Application Tracing using Dynamic Visualization
US20130232174A1 (en) Highlighting of Time Series Data on Force Directed Graph
KR20170048373A (ko) 이벤트 스트림 변환 기법
Gurusamy et al. The real time big data processing framework: Advantages and limitations
Papadimitriou et al. End-to-end online performance data capture and analysis for scientific workflows
Cheng et al. Optimal alignments between large event logs and process models over distributed systems: An approach based on Petri nets
Herodotou Automatic tuning of data-intensive analytical workloads
Bersani et al. Verifying big data topologies by-design: a semi-automated approach
US11846970B2 (en) Performing data correlation to optimize continuous integration environments
VRSAJKOV Design and implementation of an efficient distributed stream processing system with MPI
Sandstede Online Analysis of Distributed Dataflows with Timely Dataflow
Fahimimoghaddam A Customizable On-Demand Big Data Health Analytics Platform Using Cloud and Container Technologies
Kalim Satisfying service level objectives in stream processing systems

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20170609

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: GOOGLE LLC

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
17Q First examination report despatched

Effective date: 20181218

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20190312

P01 Opt-out of the competence of the unified patent court (upc) registered

Effective date: 20230519