CN109871248A - A variable-gap session window design method for removing duplicate stream data

Info

Publication number
CN109871248A
Authority
CN
China
Prior art keywords
window
removal
data
session
variable interval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811643214.5A
Other languages
Chinese (zh)
Inventor
何江
于伟
武新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Original Assignee
TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Priority to CN201811643214.5A
Publication of CN109871248A
Legal status: Pending

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention provides a variable-gap session window design method for removing duplicate stream data, comprising: constructing an assigner, which creates windows and assigns elements to them; constructing a trigger for each window, which operates on the window; constructing an evictor, which outputs the elements of a window according to preset rules; and creating a merge mechanism for windows. Through a specific window merge mechanism, the invention removes duplicate data within windows.

Description

A variable-gap session window design method for removing duplicate stream data
Technical field
The invention belongs to the technical field of stream computing and relates to the design of session windows in a stream computing system, in particular to a variable-gap session window design method that removes duplicate stream data.
Background
Stream data can be regarded as a set of discrete events generated continuously and without end by thousands of data sources; the resulting data streams are transmitted in the form of logs (not system logs in the traditional sense). Stream data has four characteristics: 1) data arrives in real time; 2) the arrival order of data is independent and is not controlled by the application system; 3) the data volume is huge and its maximum cannot be predicted; 4) once data has been processed, it cannot be retrieved and processed again unless it was deliberately saved, or retrieving it again is very costly.
Stream computing arose from the strict timeliness requirements on stream data: the business value of data falls rapidly as time passes, so data must be computed and processed as soon as possible after it is generated. Typically, a stream computing system has three characteristics: 1) real-time, unbounded data streams; 2) continuous and efficient computation; 3) streaming, real-time data integration.
In stream processing applications, data arrives continuously, so we cannot wait until all data has arrived before starting to process it. Sometimes we need to perform aggregations, and an aggregation can only be applied to a specific, bounded data set. We therefore need some way to select bounded data from the unbounded data set according to specific semantics. A window is a very common way to set such computation boundaries. Windows can be time-driven or data-driven. A classic classification divides windows into tumbling windows, sliding windows, and session windows.
A session window serves the need to analyze a period of a user's interactive behavior: the user's event stream is grouped by "session". A session is a period of sustained activity, delimited by gaps of inactivity. If the interval between messages is less than the timeout threshold (sessionGap), they are assigned to the same window; if the interval is greater than the threshold, they are assigned to different windows.
In general, a window defines a finite set of elements on an unbounded stream. This set can be based on time, on element count, on a combination of time and count, on session gaps, or on custom rules. The choice and design of windows is the foundation of stream data processing and has an important influence on subsequent processing and analysis.
In the prior art, for ordinary session windows, once elements have been assigned to windows, those windows are fixed and do not change, and windows do not affect one another. In certain scenarios this leads to duplicated, redundant data, for example when the data in a window must be summed, or when duplicates must be removed early to reduce the computing and storage resources required by the processing function. Therefore, for the case where the same access operation on the same element (key) is repeated multiple times within the same gap, a simple and efficient deduplicating session window design method is needed.
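The gap rule can be illustrated with a minimal sketch, assuming integer timestamps and a hypothetical helper named `split_into_sessions`. The text does not say what happens when the interval exactly equals the threshold; the sketch assumes that case starts a new session.

```python
from typing import List


def split_into_sessions(timestamps: List[int], session_gap: int) -> List[List[int]]:
    """Group event timestamps into sessions separated by gaps of inactivity."""
    sessions: List[List[int]] = []
    for ts in sorted(timestamps):
        # Stay in the current session while the interval to the previous
        # event is below the threshold (assumption: equality opens a new one).
        if sessions and ts - sessions[-1][-1] < session_gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions
```

With a gap of 5, the timestamps 1, 2, 3, 10, 11, 30 fall into three sessions, because the intervals 3 to 10 and 11 to 30 both exceed the threshold.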
Summary of the invention
In view of this, the present invention proposes a variable-gap session window design method for removing duplicate stream data, to solve the problem of duplicated, redundant data in specific application scenarios of stream data processing.
To achieve the above objective, the technical solution of the present invention is realized as follows:
A variable-gap session window design method for removing duplicate stream data, comprising the following:
Step 1: construct an assigner, which creates windows and assigns elements to them;
Step 2: construct a trigger for each window, which operates on the window;
Step 3: construct an evictor, which outputs the elements of a window according to preset business rules;
Step 4: create a merge mechanism for session windows.
Further, all window components reside in one operator. The data stream enters the operator continuously, and each arriving element is handed to the assigner. According to preset rules, the assigner places each arriving element into one or more windows and creates new windows as needed.
Further, to save space, a window itself is just an ID identifier. It may store some metadata internally, such as the window's start time and end time, but it does not store the elements in the window.
Further, each window has its own trigger, and each trigger contains a timer that determines when a window can be computed or cleared. The trigger is called whenever an element is added to the window or a previously registered timer expires. A trigger can return continue (do nothing), fire (process the window's data), purge (remove the window and its data), or fire+purge (process the window's data, then destroy the window). If the result is only fire, the data in the window is processed and the window is kept as is; that is, the data in the window remains unchanged and waits for the trigger to be invoked again. A window can be computed repeatedly until it is purged, and it occupies memory until then.
Further, when the data in a window is to be computed, the window's set of elements is handed to the evictor. The evictor traverses the list of elements in the window and, according to business rules, removes or filters out invalid elements, that is, it decides how many of the elements that entered the window first need to be removed. The remaining elements are handed to the user-specified function for the window computation. If there is no evictor, all elements in the window are handed to the function together. The computation function receives the window's elements (possibly filtered by the evictor), computes the window's result value, and sends it downstream. A window may produce one or more result values.
Further, the session window assigner assigns a window to each incoming element; the window starts at the element's timestamp and ends at the timestamp plus the session timeout. For example, the first two elements to arrive are assigned to two separate windows that do not yet intersect. When a third element arrives, the window assigned to it overlaps both existing windows. Because window merging is supported, the session window assigner can merge these windows: it traverses the existing windows and tells the system which windows need to be merged into a new window.
Merging consists mainly of two parts: (1) merging the underlying state of the windows to be merged (that is, the data cached in the windows, or a single aggregate value in the case of aggregating windows); (2) merging the triggers of the windows to be merged (for example, deleting the timers registered for the old windows and registering a timer for the new window). Each newly arriving element is assigned its own window, and each time the existing windows are checked and merged. Before a window computation is triggered, the window is checked each time to see whether it can be merged with other windows. After the trigger issues a purge, the window is removed from the window list.
Further, the assigner assigns a window to each newly arriving element; the window starts at the element's timestamp and ends at the timestamp plus the session timeout. When windows already exist in the queue, each time the assigner creates a window for a new element it traverses the existing windows and compares the timestamp of this element with the timestamp of the first element of the previous window. If the difference between the two exceeds the preset interval, they are treated as two windows; otherwise the elements of the two windows are merged in chronological order, keeping only one occurrence of each element.
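One way to picture a window that is only an identifier is sketched below. The names `Window`, `window_state`, and `add_element` are hypothetical, and keeping the elements in a separate dictionary keyed by the window is an assumption made for illustration, not a structure stated in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Window:
    """Lightweight window identifier: only start/end metadata, no elements."""
    start: int
    end: int


# Elements live outside the window object, keyed by the window identifier.
window_state: Dict[Window, List[str]] = {}


def add_element(window: Window, element: str) -> None:
    """Append an element to the state that belongs to the given window."""
    window_state.setdefault(window, []).append(element)
```

Because the dataclass is frozen, two `Window` objects with the same start and end compare equal and hash identically, so they address the same state entry.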
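The four possible results and the two call sites (element added, timer expired) might be sketched as follows. `TriggerResult` and `TimeoutTrigger` are hypothetical names, and the policy shown here (do nothing on each element, fire and purge once the timer is reached) is only one assumed configuration.

```python
from enum import Enum


class TriggerResult(Enum):
    CONTINUE = "continue"            # do nothing
    FIRE = "fire"                    # process the window's data, keep the window
    PURGE = "purge"                  # remove the window and its data
    FIRE_AND_PURGE = "fire+purge"    # process the data, then destroy the window


class TimeoutTrigger:
    """Per-window trigger holding one timer set to the window's end time."""

    def __init__(self, window_end: int) -> None:
        self.timer = window_end

    def on_element(self, timestamp: int) -> TriggerResult:
        # Called whenever an element is added to the window.
        return TriggerResult.CONTINUE

    def on_timer(self, current_time: int) -> TriggerResult:
        # Called when time advances; fires once the registered timer expires.
        if current_time >= self.timer:
            return TriggerResult.FIRE_AND_PURGE
        return TriggerResult.CONTINUE
```

Returning `FIRE` instead of `FIRE_AND_PURGE` from `on_timer` would leave the window in memory so that it could be computed again later.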
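The hand-off from evictor to computation function can be sketched as below. The function names and the use of a predicate `is_valid` as the "business rule" are assumptions for illustration; the default window function simply counts elements.

```python
from typing import Callable, List, Optional


def evict(elements: List[str], is_valid: Callable[[str], bool]) -> List[str]:
    """Traverse the window's elements and drop those the rule rejects."""
    return [e for e in elements if is_valid(e)]


def process_window(
    elements: List[str],
    is_valid: Optional[Callable[[str], bool]] = None,
    window_function: Callable[[List[str]], int] = len,
) -> int:
    """Apply the optional evictor, then the user-specified window function."""
    # Without an evictor, every element reaches the window function.
    kept = evict(elements, is_valid) if is_valid is not None else list(elements)
    return window_function(kept)
```

For example, `process_window(["A1", "", "B2"], is_valid=bool)` counts only the two non-empty records, while omitting `is_valid` counts all three.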
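The two parts of a merge, state and timers, can be sketched together. In this sketch windows are plain `(start, end)` tuples, `merge_windows` is a hypothetical helper, and treating windows that merely touch as overlapping is an assumption.

```python
from typing import Dict, List, Tuple

Window = Tuple[int, int]  # (start, end)


def merge_windows(
    state: Dict[Window, List[str]],
    timers: Dict[Window, int],
    new_window: Window,
    element: str,
) -> Window:
    """Merge the new element's window with every existing window it overlaps."""
    start, end = new_window
    merged: List[str] = []
    for w in sorted(state):
        if w[0] <= end and start <= w[1]:      # windows overlap (or touch)
            merged.extend(state.pop(w))        # (1) merge the underlying state
            timers.pop(w, None)                # (2) delete the old window's timer
            start, end = min(start, w[0]), max(end, w[1])
    result = (start, end)
    state[result] = merged + [element]
    timers[result] = end                       # register the merged window's timer
    return result
```

Replaying the example from the text, two elements first produce the separate windows (0, 5) and (8, 13); a third element with window (4, 9) overlaps both, leaving a single window (0, 13) with one timer.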
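The comparison against the first element of the previous window, combined with deduplication, might look like the following sketch. It assumes each event is a `(timestamp, key)` pair, that "each element" is identified by its key, and that the earliest occurrence is the one kept; `assign_with_dedup` is a hypothetical name.

```python
from typing import List, Tuple

Event = Tuple[int, str]  # (timestamp, key)


def assign_with_dedup(windows: List[List[Event]], event: Event, gap: int) -> None:
    """Place an event in the previous window or a new one, dropping duplicates."""
    ts, key = event
    if windows:
        previous = windows[-1]
        first_ts = previous[0][0]
        if ts - first_ts <= gap:
            # Within the preset interval: merge, keeping each key only once.
            if all(k != key for _, k in previous):
                previous.append(event)
                previous.sort(key=lambda e: e[0])  # keep chronological order
            return
    # No previous window, or the interval is exceeded: start a separate window.
    windows.append([event])
```

With a gap of 10, the events (0, "A"), (2, "B"), (3, "A"), (20, "A") yield two windows: the repeat of "A" at time 3 is dropped, and the one at time 20 opens a new window.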
Compared with the prior art, the present invention has the following advantages:
The present invention solves the problem of duplicated, redundant data in specific application scenarios of stream data processing. Through the design of the session window structure, it reduces the memory occupied by session windows. Duplicates are removed within the window itself, so the subsequent computation function does not need to filter the data again, which saves considerable computing resources and also reduces the memory needed for data storage.
Brief description of the drawings
The accompanying drawings, which form a part of the present invention, provide a further understanding of it. The illustrative embodiments of the invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flow chart of the window mechanism and its construction in the stream computing system of an embodiment of the present invention;
Fig. 2 is a schematic diagram of a stream computing session window of an embodiment of the present invention;
Fig. 3 is a schematic diagram of session window merging of an embodiment of the present invention.
Specific embodiments
It should be noted that, where no conflict arises, the embodiments of the present invention and the features in the embodiments may be combined with one another.
The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
This embodiment takes traffic checkpoint flow monitoring as an example. Under normal circumstances a vehicle does not pass the same checkpoint many times within a short period. If duplicate records of a vehicle appear within a short time, the vehicle may have stopped nearby or there may be congestion. In this case, duplicates must be removed for the data to truly reflect the traffic flow (for example, a sudden sharp drop in flow). To deduplicate in real time (before the flow is counted, rather than after the data has been stored) and to save storage resources, a variable-gap session window design method for removing duplicate stream data is provided, as shown in Figs. 1 to 3, comprising the following:
Construct the assigner, which creates windows and assigns elements to them. The assigner receives the vehicle information recognized in traffic video recordings, treating each vehicle's information as a separate element. It creates a new window for each newly arriving element, saving metadata such as the element's storage location and arrival time, and stores the specific vehicle information.
Assign a trigger to each newly created window to operate on it; specifically, the trigger is responsible for processing and destroying the vehicle information stored in the window.
Create a merge mechanism for session windows. When windows already exist in the queue, each time a new element arrives and a window is created for it, the timestamp of this element is compared with the timestamp of the first element of the previous window. If the difference between the two exceeds the preset interval (gap), they are treated as two windows. Otherwise the elements of the two windows are merged in chronological order, keeping only one occurrence of each element, so that duplicate elements are removed.
Specifically, when windows are merged, the trigger operates on the two windows: the underlying state of the windows (that is, the data cached in them) and their triggers must be merged (deleting the timers registered for the old windows and registering a timer for the new window).
Construct the evictor, which outputs the elements of a window according to preset business rules. The evictor traverses the list of elements in the window and filters out invalid elements according to business rules, that is, it decides how many of the elements that entered the window first need to be removed. The remaining elements are handed to the user-specified function for the window computation. Because duplicates within the gap have already been removed in the preceding steps, the subsequent computation function only needs to count the elements in the current window to obtain the current traffic flow, or it can compare the vehicle information in consecutive windows to determine whether there are abnormal vehicles.
The session window of the present invention can conveniently and efficiently sample stream data in specific scenarios, saving considerable time and computing resources for subsequent processing.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
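The two downstream computations mentioned here can be sketched under simple assumptions: each window is a list of already-deduplicated licence plates, the plate values are made up, and "abnormal" is approximated as a plate that appears in two consecutive windows. Both function names are hypothetical.

```python
from typing import List


def flow_per_window(windows: List[List[str]]) -> List[int]:
    """Traffic flow is the element count of each deduplicated window."""
    return [len(plates) for plates in windows]


def seen_in_consecutive_windows(windows: List[List[str]]) -> List[List[str]]:
    """For each pair of adjacent windows, list plates present in both."""
    repeated: List[List[str]] = []
    for prev, curr in zip(windows, windows[1:]):
        repeated.append(sorted(set(prev) & set(curr)))
    return repeated
```

For the windows `[["A1", "B2", "C3"], ["B2", "D4"], ["E5"]]`, the flow is 3, 2, 1, and plate "B2" is flagged because it appears in the first two windows.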

Claims (6)

1. A variable-gap session window design method for removing duplicate stream data, characterized by comprising the following:
1) constructing an assigner, which creates windows and assigns elements to them;
2) constructing a trigger for each window, which operates on the window;
3) constructing an evictor, which outputs the elements of a window according to preset rules;
4) creating a merge mechanism for windows.
2. The variable-gap session window design method for removing duplicate stream data according to claim 1, characterized in that the window itself is an ID identifier.
3. The variable-gap session window design method for removing duplicate stream data according to claim 1, characterized in that each trigger contains a timer that determines when a window can be computed or cleared; the results a trigger can return include doing nothing, processing the window's data, removing the window and its data, and destroying the window after processing its data.
4. The variable-gap session window design method for removing duplicate stream data according to claim 1, characterized in that the evictor traverses the list of elements in the window and decides how many of the elements that entered the window first need to be removed; the remaining elements are handed to the subsequent processing function for the window computation.
5. The variable-gap session window design method for removing duplicate stream data according to claim 1, characterized in that the merge mechanism of the window achieves the variable gap and the removal of duplicate data by setting a merge condition and merging the underlying state and the triggers of the windows that satisfy it.
6. The variable-gap session window design method for removing duplicate stream data according to claim 5, characterized in that the assigner assigns a window to each newly arriving element, the window starting at the element's timestamp and ending at the timestamp plus the session timeout; when windows already exist in the queue, each time the assigner creates a window for a new element it traverses the existing windows and compares the timestamp of this element with the timestamp of the first element of the previous window; if the difference between the two exceeds the preset interval, they are treated as two windows; otherwise the elements of the two windows are merged in chronological order, keeping only one occurrence of each element.
CN201811643214.5A 2018-12-29 2018-12-29 A variable-gap session window design method for removing duplicate stream data Pending CN109871248A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811643214.5A CN109871248A (en) 2018-12-29 2018-12-29 A variable-gap session window design method for removing duplicate stream data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811643214.5A CN109871248A (en) 2018-12-29 2018-12-29 A variable-gap session window design method for removing duplicate stream data

Publications (1)

Publication Number Publication Date
CN109871248A (en) 2019-06-11

Family

ID=66917342

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811643214.5A Pending CN109871248A (en) 2018-12-29 2018-12-29 A variable-gap session window design method for removing duplicate stream data

Country Status (1)

Country Link
CN (1) CN109871248A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649119A (en) * 2016-12-28 2017-05-10 深圳市华傲数据技术有限公司 Stream computing engine testing method and device
CN107209673A (en) * 2015-08-05 2017-09-26 谷歌公司 Data flow adding window and triggering

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
APACHE FLINK: "Apache Flink", 《HTTPS://CI.APACHE.ORG/PROJECTS/FLINK/FLINK-DOCS-RELEASE-1.3/DEV/WINDOWS.HTML》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190611