CN109871248A - A variable-interval session window design method for removing duplicate stream data - Google Patents
- Publication number
- CN109871248A (application number CN201811643214.5A)
- Authority
- CN
- China
- Prior art keywords
- window
- removal
- data
- session
- variable interval
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The present invention provides a variable-interval session window design method for removing duplicate stream data, comprising: constructing an assigner, which creates windows and assigns elements to them; constructing a trigger for each window, which operates on that window; constructing an evictor, which outputs the elements of a window according to preset rules; and creating a merging mechanism for windows. Through this window-merging mechanism, the invention removes duplicate data within windows.
Description
Technical field
The invention belongs to the technical field of stream computing and relates to the design of session windows in a stream computing system, in particular to a variable-interval session window design method for removing duplicate stream data.
Background
Stream data can be regarded as a collection of discrete events generated continuously by thousands of data sources; the resulting data stream is transmitted in the form of logs (not system logs in the traditional sense). Stream data has four characteristics: 1) data arrives in real time; 2) the order of arrival is independent and not controlled by the application system; 3) the data volume is large and its maximum cannot be predicted; 4) once data has been processed, it cannot be retrieved and processed again unless it was deliberately saved, or retrieving it again is very costly.
Stream computing arose from the stringent timeliness requirements of stream data: the business value of data decreases rapidly as time passes, so data must be computed and processed as soon as possible after it is generated. Typically, stream computing has three characteristics: 1) real-time, unbounded data streams; 2) continuous and efficient computation; 3) streaming, real-time data integration.
In stream processing applications, data arrives continuously, so we cannot wait until all data has arrived before starting to process it. Sometimes we need to perform aggregation, and an aggregation can only act on a specific, bounded data set. We therefore need some way to select bounded data from the unbounded data set according to specific semantics. A window is a very common way of setting such computation boundaries. Windows can be time-driven or data-driven. A classic classification of windows is: tumbling windows, sliding windows, and session windows.
A session window serves the need to analyze a user's interactive behavior over a period of time: the user's event stream is grouped by "session". A session is a period of continuous activity, separated from other sessions by gaps of inactivity. If the interval between messages is less than the timeout threshold (sessionGap), the messages are assigned to the same window; if the interval is greater than the threshold, they are assigned to different windows.
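The gap rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: timestamps are plain numbers, the function name `split_into_sessions` is hypothetical, and an interval exactly equal to the threshold, which the text does not specify, is treated here as starting a new session.

```python
from typing import List


def split_into_sessions(timestamps: List[float], session_gap: float) -> List[List[float]]:
    """Group timestamps into sessions separated by gaps of inactivity."""
    sessions: List[List[float]] = []
    for ts in sorted(timestamps):
        # Same session only if the interval to the previous event is below the gap.
        if sessions and ts - sessions[-1][-1] < session_gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions
```

For example, with a gap of 10, the timestamps 1, 3, 4, 20, 22, 50 fall into three sessions: 1 to 4, 20 to 22, and 50 alone.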
In general, a window defines a finite set of elements on an unbounded stream. This set can be based on time, on element count, on a combination of time and count, on session gaps, or on custom rules. The selection and design of windows is the foundation of stream data processing and has an important influence on subsequent processing and analysis.
In the prior art, with an ordinary session window, once elements have been assigned to windows, those windows are fixed and do not change, and windows do not affect one another. In certain scenarios this leads to duplicated, redundant data, for example when the data in a window must be summed, or when duplicate data should be removed early to reduce the computation and storage resources required by the processing function. Therefore, for cases where the same access operation on the same element (key) is repeated many times within the same interval (gap), a simple and efficient deduplicating session window design method is needed.
Summary of the invention
In view of this, the present invention proposes a variable-interval session window design method for removing duplicate stream data, to solve the problem of duplicated, redundant data in specific application scenarios of stream data processing.
To achieve the above objective, the technical solution of the present invention is implemented as follows:
A variable-interval session window design method for removing duplicate stream data, comprising the following:
Step 1: construct an assigner, which creates windows and assigns elements to them;
Step 2: construct a trigger for each window, which operates on that window;
Step 3: construct an evictor, which outputs the elements of a window according to preset business rules;
Step 4: create a merging mechanism for session windows.
Further, all window components reside in a single operator. The data stream enters the operator continuously, and each arriving element is handed to the assigner. According to preset rules, the assigner places each element into one or more windows and creates new windows as needed.
Further, to save space, a window itself is only an ID identifier. It may store some additional metadata, such as the window's start time and end time, but it does not store the window's elements.
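One way to picture this separation is a lightweight, hashable window identifier that carries only metadata, with the elements kept in a separate state store keyed by that identifier. The following Python sketch assumes this layout; the `Window` class, the `window_state` dictionary, and `add_element` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Window:
    """Window identifier: metadata only, no elements."""
    start: int
    end: int


# Elements live outside the window object, keyed by the window identifier.
window_state: Dict[Window, List[str]] = {}


def add_element(window: Window, element: str) -> None:
    """Append an element to the state stored for the given window."""
    window_state.setdefault(window, []).append(element)
```

Because the identifier is immutable and compared by value, two `Window(0, 10)` objects refer to the same stored elements.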
Further, each window has its own trigger, and each trigger contains a timer used to decide when a window can be computed or purged. The trigger is invoked whenever an element is added to the window or a previously registered timer expires. The trigger can return continue (do nothing), fire (process the window's data), purge (remove the window and its data), or fire+purge (process the window's data, then destroy the window). If a trigger invocation returns only fire, the elements in the window are processed and the window is retained as is; that is, the data in the window remains unchanged and waits for the next trigger invocation. A window can be computed repeatedly until it is purged. Until it is purged, the window continues to occupy memory.
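The four trigger results and their effect on window state can be sketched as follows. This is a minimal Python illustration, not the disclosed implementation: the `TriggerResult` enum, the `apply_trigger_result` function, the dictionary used as window state, and the `emit` callback are all assumptions introduced for the example.

```python
from enum import Enum
from typing import Callable, Dict, List


class TriggerResult(Enum):
    CONTINUE = "continue"          # do nothing
    FIRE = "fire"                  # process the window's data, keep the window
    PURGE = "purge"                # remove the window and its data
    FIRE_AND_PURGE = "fire+purge"  # process the data, then destroy the window


def apply_trigger_result(
    result: TriggerResult,
    window_id: str,
    state: Dict[str, List[int]],
    emit: Callable[[str, List[int]], None],
) -> None:
    """Apply a trigger result to the window identified by window_id."""
    if result in (TriggerResult.FIRE, TriggerResult.FIRE_AND_PURGE):
        # Firing hands a copy of the elements downstream; state is untouched.
        emit(window_id, list(state[window_id]))
    if result in (TriggerResult.PURGE, TriggerResult.FIRE_AND_PURGE):
        # Purging releases the memory held by the window.
        del state[window_id]
```

Calling the function twice with `FIRE` emits the same elements twice, which mirrors the statement that a window can be computed repeatedly until it is purged.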
Further, when the data in a window is to be computed, the set of elements in the window is handed to the evictor. The evictor traverses the window's element list and, according to business rules, removes or filters out invalid elements, that is, it decides how many of the elements that entered the window first need to be removed. The remaining elements are handed to the user-specified function to compute the window. If there is no evictor, all elements in the window are handed to the function together. The function receives the window's elements (after filtering by the evictor), computes the window's result values, and sends them downstream. A window may produce one result value or several.
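The hand-off from evictor to window function can be sketched as below. The sketch assumes that elements are held in arrival order and that the business rule is simply "keep at most the last N elements"; both the rule and the names `evict_oldest` and `evaluate_window` are hypothetical.

```python
from typing import Callable, List, Optional


def evict_oldest(elements: List[int], keep_last: int) -> List[int]:
    """Drop the earliest elements so that at most keep_last remain."""
    if keep_last <= 0:
        return []
    return list(elements[-keep_last:])


def evaluate_window(
    elements: List[int],
    window_function: Callable[[List[int]], int],
    evictor: Optional[Callable[[List[int]], List[int]]] = None,
) -> int:
    """Run the optional evictor, then apply the user-specified function."""
    kept = evictor(elements) if evictor is not None else list(elements)
    return window_function(kept)
```

Without an evictor, `evaluate_window([5, 1, 4, 2], sum)` sums all four elements; with an evictor that keeps the last two, only 4 and 2 reach the function.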
Further, the assigner of a session window assigns one window to each incoming element; the window starts at the element's timestamp and ends at the timestamp plus the session timeout. For example, the first two elements to arrive are assigned to two separate windows that do not yet intersect. When a third element arrives, the window assigned to it overlaps both existing windows. Because window merging is supported, the session window assigner can merge these windows: it traverses the existing windows and tells the system which windows need to be merged into a new window.
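The three-element example can be reproduced with a small interval-merging sketch. It assumes windows are half-open (start, end) pairs of integers and that only strictly overlapping windows are merged; the function names are hypothetical and the merging strategy shown is a generic one, not necessarily the one used by any particular stream engine.

```python
from typing import List, Tuple

WindowSpan = Tuple[int, int]


def merge_windows(windows: List[WindowSpan]) -> List[WindowSpan]:
    """Merge all overlapping [start, end) windows into covering windows."""
    merged: List[WindowSpan] = []
    for start, end in sorted(windows):
        if merged and start < merged[-1][1]:
            # Overlaps the previous window: extend it to cover both.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def assign_session_window(windows: List[WindowSpan], ts: int, gap: int) -> List[WindowSpan]:
    """Assign [ts, ts + gap) to a new element, then merge overlapping windows."""
    return merge_windows(windows + [(ts, ts + gap)])
```

With a timeout of 10, elements at 0 and 15 yield the disjoint windows (0, 10) and (15, 25); a third element at 8 gets (8, 18), which overlaps both, so all three collapse into (0, 25).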
Merging consists of two main parts: (1) merging the underlying state of the windows to be merged (that is, the data cached in the windows or, for an aggregating window, a single aggregate value); and (2) merging the triggers of the windows to be merged (for example, deleting the timers registered for the old windows and registering a timer for the new window). For each newly arriving element, a window belonging to that element is assigned, and the existing windows are checked and merged. Before each window computation is triggered, the window is checked to see whether it can be merged with other windows. After the trigger issues a purge, the window is removed from the window list.
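The two parts of a merge, state and timers, can be illustrated together. The sketch below assumes that window state is a dictionary from (start, end) pairs to lists of (timestamp, value) tuples and that each window has exactly one timer set to its end time; these representations and the name `merge_window_state` are assumptions for the example.

```python
from typing import Dict, List, Tuple

WindowSpan = Tuple[int, int]
Element = Tuple[int, str]


def merge_window_state(
    target: WindowSpan,
    sources: List[WindowSpan],
    state: Dict[WindowSpan, List[Element]],
    timers: Dict[WindowSpan, int],
) -> None:
    """Move state and timers from the source windows to the target window."""
    merged: List[Element] = []
    for window in sources:
        merged.extend(state.pop(window, []))  # part 1: merge cached data
        timers.pop(window, None)              # part 2: delete the old timer
    merged.sort(key=lambda element: element[0])
    state[target] = merged
    timers[target] = target[1]                # part 2: register the new timer
```

After the call, only the target window remains in both dictionaries, so the old windows no longer occupy state or fire timers.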
Further, the assigner assigns a window to each newly arriving element; the window starts at the element's timestamp and ends at the timestamp plus the session timeout. When windows already exist in the queue, each time the assigner has created a window for a newly arriving element, it traverses the existing windows and compares the timestamp of this element with the timestamp of the first element of the previous window. If the difference between the two exceeds the preset interval, the two are treated as separate windows; otherwise the elements of the two windows are merged in chronological order, keeping only one occurrence of each element.
Compared with the prior art, the present invention has the following advantages:
The present invention solves the problem of duplicated, redundant data in specific application scenarios of stream data processing and, through the design of the session window structure, reduces the memory occupied by session windows. Because duplicate data is already removed within the window, the subsequent computation function does not need to filter the data again, which saves considerable computing resources and also reduces the memory needed for data storage.
Description of the drawings
The accompanying drawings, which form a part of the invention, are provided for further understanding of the invention; the illustrative embodiments of the invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flow chart of the window mechanism and its construction in the stream computing system of an embodiment of the present invention;
Fig. 2 is a schematic diagram of a stream computing session window in an embodiment of the present invention;
Fig. 3 is a schematic diagram of session window merging in an embodiment of the present invention.
Detailed description of the embodiments
It should be noted that, where no conflict arises, the embodiments of the present invention and the features in the embodiments can be combined with one another.
The present invention is described in detail below with reference to the drawings and embodiments.
This embodiment takes traffic flow monitoring at a traffic checkpoint as an example. Under normal circumstances a vehicle does not pass the same checkpoint many times within a short period; if duplicate records for a vehicle appear within a short time, the vehicle may have parked nearby, or there may be congestion. In this case, duplicate data must be removed so that the actual traffic flow (for example, a sudden sharp drop in flow) is reflected. To deduplicate in real time (before the flow is counted, rather than after the data has been stored) and to save storage resources, a variable-interval session window design method for removing duplicate stream data is provided, as shown in Figs. 1 to 3, comprising the following:
Construct the assigner, which creates windows and assigns elements to them. The assigner receives the vehicle information recognized in traffic video recordings, with each vehicle's information treated as an individual element. It creates a new window for each newly arriving element, saves metadata such as the element's storage location and entry time, and stores the specific vehicle information.
Assign a trigger to each newly created window to operate on it; specifically, the trigger is responsible for processing and destroying the vehicle information stored in the window.
Create the merging mechanism for session windows. When windows already exist in the queue, each time a new element arrives and a window has been created for it, the timestamp of this element is compared with the timestamp of the first element of the previous window. If the difference between the two exceeds the preset interval (gap), the two are treated as separate windows. Otherwise the elements of the two windows are merged in chronological order, keeping only one occurrence of each element, so that repeated elements are deduplicated. Specifically, when windows are merged, the trigger operates on the two windows: both the underlying state of the windows (that is, the data cached in them) and their triggers must be merged (the timers registered for the old windows are deleted and a timer is registered for the new window).
Construct the evictor, which outputs the elements of a window according to preset business rules. The evictor traverses the window's element list and filters out invalid elements according to business rules, that is, it decides how many of the elements that entered the window first need to be removed. The remaining elements are handed to the user-specified function to compute the window. Because duplicates within the gap have already been removed in the processing step, the subsequent computation function only needs to count the elements in the current window to obtain the current traffic flow, or to compare the vehicle information in consecutive windows to judge whether there are abnormal vehicles.
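Once each window holds every vehicle at most once, the downstream functions become very small. The Python sketch below assumes each window is a list of licence plate strings. The source does not define what makes a vehicle abnormal, so the criterion used here (a plate that appears in two consecutive windows) is purely an assumption for illustration, as are the function names.

```python
from typing import List


def traffic_flow(window: List[str]) -> int:
    """Traffic flow of a deduplicated window is simply its element count."""
    return len(window)


def recurring_plates(previous: List[str], current: List[str]) -> List[str]:
    """Plates present in both of two consecutive windows (assumed criterion)."""
    return sorted(set(previous) & set(current))
```

No filtering step is needed inside `traffic_flow`, which reflects the claimed saving in the computation function.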
The session window of the present invention makes it convenient and efficient to sample stream data in specific scenarios, saving considerable time and computing resources in subsequent processing.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (6)
1. A variable-interval session window design method for removing duplicate stream data, characterized by comprising:
1) constructing an assigner, which creates windows and assigns elements to them;
2) constructing a trigger for each window, the trigger operating on the window;
3) constructing an evictor, which outputs the elements of a window according to preset rules;
4) creating a merging mechanism for windows.
2. The variable-interval session window design method for removing duplicate stream data according to claim 1, characterized in that: the window itself is an ID identifier.
3. The variable-interval session window design method for removing duplicate stream data according to claim 1, characterized in that: each trigger contains a timer used to decide when a window can be computed or purged; the results returned by the trigger include doing nothing, processing the window's data, removing the window and its data, and processing the window's data and then destroying the window.
4. The variable-interval session window design method for removing duplicate stream data according to claim 1, characterized in that: the evictor traverses the window's element list and decides how many of the elements that entered the window first need to be removed; the remaining elements are handed to the subsequent processing function to compute the window.
5. The variable-interval session window design method for removing duplicate stream data according to claim 1, characterized in that: the merging mechanism of the window achieves the variable interval and the removal of duplicate data by setting a merging condition and merging the underlying state and the triggers of the windows that satisfy the condition.
6. The variable-interval session window design method for removing duplicate stream data according to claim 5, characterized in that: the assigner assigns a window to each newly arriving element, the window starting at the element's timestamp and ending at the timestamp plus the session timeout; when windows already exist in the queue, each time the assigner has created a window for a newly arriving element, it traverses the existing windows and compares the timestamp of this element with the timestamp of the first element of the previous window; if the difference between the two exceeds the preset interval, the two are treated as separate windows; otherwise the elements of the two windows are merged in chronological order, keeping only one occurrence of each element.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811643214.5A CN109871248A (en) | 2018-12-29 | 2018-12-29 | A variable-interval session window design method for removing duplicate stream data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109871248A | 2019-06-11 |
Family
ID=66917342
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811643214.5A Pending CN109871248A (en) | 2018-12-29 | 2018-12-29 | A variable-interval session window design method for removing duplicate stream data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109871248A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649119A (en) * | 2016-12-28 | 2017-05-10 | 深圳市华傲数据技术有限公司 | Stream computing engine testing method and device |
CN107209673A (en) * | 2015-08-05 | 2017-09-26 | 谷歌公司 | Data flow adding window and triggering |
Non-Patent Citations (1)
Title |
---|
APACHE FLINK: "Apache Flink", 《HTTPS://CI.APACHE.ORG/PROJECTS/FLINK/FLINK-DOCS-RELEASE-1.3/DEV/WINDOWS.HTML》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190611 |