CN109299159A - A window-based method for multiple continuous queries over a data stream - Google Patents
- Publication number
- CN109299159A
- Application number
- CN201811215219.8A
- Authority
- CN
- China
- Prior art keywords
- window
- length
- query time
- time window
- feed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The present invention relates to the field of database technology and discloses a window-based method for processing multiple continuous queries over a data stream, comprising the following steps: S1, compute the length of every pending query time window modulo its step and sort the results in ascending order; S2, declare the basic shared unit externally; S3, generate two classes of internal windows; S4, search the declared windows for suitable windows with which to fill the query time window being processed, and declare the filled window externally as soon as filling is complete; S5, when the distance from the end position of a matching window inside the query time window being processed to the front of that query time window is an integer multiple of the step, generate a third class of internal window and declare it externally; S6, repeat steps S4 and S5, searching all currently declared windows for suitable windows, until every window has been filled. As the number of queries grows, this method effectively reduces memory usage.
Description
Technical field
The present invention relates to the field of database technology, and in particular to a window-based method for processing multiple continuous queries over a data stream.
Background art
With the rapid development of the Internet of Things, sensor networks, the Internet and smart devices of all kinds, many industries and application scenarios (for example, stock trading, healthcare, cyberspace and monitoring applications of all kinds) continuously generate massive data streams. Real-time analysis and mining of data streams is a current research focus. Such analysis matters because, over time, a data stream implicitly carries a large number of rules and features describing how the measured objects change, and analysis and mining algorithms can bring this valuable information to light.
The greatest challenge facing current data stream processing systems is that monitoring applications generate ever more data at ever higher frequencies, which means more data must be processed per unit of time. Techniques for querying, analyzing and mining data streams have developed rapidly, but many key problems remain to be solved in distributed stream processing environments. One of the difficult problems is how to share results among multiple continuous queries defined over different time windows. A typical example, given next, illustrates this sharing problem.
In many typical application scenarios, users want continuously updated statistics derived from an unbroken data stream, such as the maximum, minimum or mean over some period. The most common way to answer such statistical queries over a data stream is sliding-window analysis. A sliding window can be defined simply as a query window that covers the l (length) most recent tuples and is updated every p (step) tuples. For example, treat the readings produced continuously by a 0.5 Hz pressure sensor (one reading every 2 seconds) as a data stream. Suppose the average pressure over the most recent 5 minutes must be monitored and updated every 10 seconds. This query can be regarded as a sliding-window average (an aggregation) with window length l = 1/2 × 60 × 5 = 150 and step p = 1/2 × 10 = 5. Another statistical window over the pressure mean might have length l = 500 and step 40. Many similar statistical query windows may exist for the same pressure sensor; generalizing the scenario gives multiple sliding-window aggregation queries over one data stream.
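The sliding-window definition above can be illustrated with a minimal Python sketch. The function name `sliding_averages` and the convention that the first result is emitted once `length` tuples have arrived are assumptions made for illustration; they are not taken from the patent.

```python
from collections import deque


def sliding_averages(stream, length, step):
    """Yield the mean of the `length` most recent tuples, updated every `step` tuples."""
    window = deque(maxlen=length)  # older tuples fall out automatically
    for i, value in enumerate(stream, start=1):
        window.append(value)
        # Emit once the window is full, then again after every `step` new tuples.
        if i >= length and (i - length) % step == 0:
            yield sum(window) / length


readings = range(1, 11)  # stand-in for sensor readings
print(list(sliding_averages(readings, length=4, step=2)))  # [2.5, 4.5, 6.5, 8.5]
```

With the figures from the text, the pressure query would correspond to `length=150, step=5`.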
Because a data stream is unbounded, a computer with limited memory cannot hold all the data that arrives, and real-time analysis cannot scan an excessive amount of data, so the range of data that is stored and analyzed must be limited. A common approach is the sliding window: only the most recent data are kept in memory, and earlier data are discarded as new data arrive.
Under these conditions, stream processing needs a more efficient strategy that reduces the number of times the same piece of data is recomputed. The present invention targets environments in which multiple queries apply the same aggregation function to the same data stream, and reduces their actual total computation cost by sharing intermediate aggregation results among these queries.
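As a hedged illustration of what sharing an intermediate aggregation result means, the sketch below computes two averages from shared partial results. The representation of a partial result as a `(sum, count)` pair and the helper name `combine` are assumptions; the patent does not prescribe a data layout.

```python
def combine(partials):
    """Merge (sum, count) partial aggregates into one average."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


data = [3, 1, 4, 1, 5, 9, 2, 6]

# Each slice of the stream is aggregated once...
part_a = (sum(data[0:4]), 4)
part_b = (sum(data[4:6]), 2)

# ...and reused by every query that covers it.
avg_first_4 = combine([part_a])
avg_first_6 = combine([part_a, part_b])
print(avg_first_4, avg_first_6)
```

The first four tuples are summed only once even though both queries cover them, which is the saving the paragraph describes.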
To solve the window-based continuous query sharing problem, the present invention proposes MCQA (Multiple Continuous Queries Algorithm), a window reuse algorithm for multi-query environments that is based on the step and on results. For aggregation operations over the same data stream, it reuses the computed results of different query windows and thereby improves computational efficiency. The algorithm was implemented on the Storm stream processing framework, and experiments show that, as the number of queries increases, MCQA outperforms TriWeave, currently the most typical method, and effectively reduces memory usage.
Summary of the invention
The present invention provides a window-based method for multiple continuous queries over a data stream. Its multi-query optimization strategy, based on step reuse and result reuse, can solve the above problems of the prior art.
The method comprises the following steps:
S1: When all query time windows to be processed have the same step, compute w mod p for every query time window, where w is the length of the query time window and p is its step, to obtain a result s. Deduplicate all results s and sort them in ascending order to form a result set, then append the step p as the last element to obtain the set R: (s1, s2, …, sn, p). When the steps of the query time windows differ, first group the query time windows by step, so that windows with the same step fall into one group; then perform the modulo calculation on the query time windows of each group, deduplicate and sort each group's results in ascending order to form that group's result set, and append the group's step p as the last element to obtain the set R;
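Step S1 can be sketched as follows. The function name `build_remainder_sets` and the `(length, step)` tuple representation are assumptions for illustration. A remainder of 0 is kept as computed, since the text does not say how that case is handled.

```python
from collections import defaultdict


def build_remainder_sets(windows):
    """windows: iterable of (length, step). Returns {step: R} with one R per step group."""
    groups = defaultdict(set)
    for length, step in windows:
        groups[step].add(length % step)  # w mod p; the set removes duplicates
    # Sort each group's remainders in ascending order and append the step itself.
    return {step: sorted(rems) + [step] for step, rems in groups.items()}


queries = [(50, 30), (65, 30), (95, 30), (190, 30), (500, 40)]
print(build_remainder_sets(queries))  # {30: [5, 10, 20, 30], 40: [20, 40]}
```

Grouping by step first means that queries with different steps never share a set R, as the second half of S1 requires.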
S2: Declare the basic shared window externally so that the query time windows that need it can subscribe to it;
A basic unit with length 1 and step 1 is declared externally as the basic shared window. The basic shared window is the smallest window, and it is declared externally so that the query time windows that need it can subscribe to it;
S3: Generate two classes of internal windows from the query time windows to be processed and declare them externally. These internal windows are called feeds; one class is denoted "F" and the other "C";
First class of internal window, "F": add 0 to the result set R and sort again in ascending order to form R'. Each pair of adjacent numbers in R' forms one first-class internal window, expressed as (relative offset, length, step): F1 (0, s1, p), F2 (s1, s2 - s1, p), …, Fn+1 (sn, p - sn, p);
Second class of internal window, "C": each number in R' other than 0 forms, together with 0, one second-class internal window, expressed as (absolute offset, length, step): C1 (0, s1, p), C2 (0, s2, p), …, Cn (0, sn, p), Cn+1 (0, p, p);
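The construction of the two feed classes in S3 can be sketched as below. The function name `inner_windows` and the `(offset, length, step)` tuple layout are assumptions; the labels F1, C1 and so on are not modelled.

```python
def inner_windows(R):
    """Build F and C feeds as (offset, length, step) tuples from the set R of step S1."""
    step = R[-1]                        # the last element of R is the step p
    r_prime = sorted(set([0] + list(R)))
    # F: one feed per pair of adjacent numbers in R'.
    f_feeds = [(a, b - a, step) for a, b in zip(r_prime, r_prime[1:])]
    # C: one feed from 0 up to each non-zero number in R'.
    c_feeds = [(0, s, step) for s in r_prime[1:]]
    return f_feeds, c_feeds


f_feeds, c_feeds = inner_windows([5, 10, 20, 30])
print(f_feeds)  # [(0, 5, 30), (5, 5, 30), (10, 10, 30), (20, 10, 30)]
print(c_feeds)  # [(0, 5, 30), (0, 10, 30), (0, 20, 30), (0, 30, 30)]
```

The first F feed and the first C feed coincide, so a declaration registry would keep only one copy of `(0, 5, 30)`.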
S4: Search the basic shared window and the two classes of internal windows for suitable windows with which to fill the query time window being processed;
Among the basic shared window and the two classes of internal windows, first find the windows that satisfy the condition p0 mod p1 = 0 and s2 mod p1 = s1 mod p1, where s2 is the relative offset of the feed within the query time window being processed, s1 is the declared feed's own offset, p1 is the step of the declared feed, and p0 is the step of the query time window being processed. First fill the query time window with the longest window that satisfies the condition. If that does not fill it completely, find the longest of the remaining windows that satisfy the condition and use it to fill the unfilled part, and so on until the query time window is full. As soon as it is full, declare the filled window externally, which generates a new feed to which later query time windows can subscribe;
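The matching condition and the longest-first filling of S4 can be sketched as follows. The names `satisfies` and `fill` are hypothetical, and the requirement that a feed fit inside the unfilled part is an assumption that the text implies but does not state.

```python
def satisfies(feed, offset, query_step):
    """Condition from S4: p0 mod p1 = 0 and s2 mod p1 = s1 mod p1."""
    s1, _length, p1 = feed
    return query_step % p1 == 0 and offset % p1 == s1 % p1


def fill(query_length, query_step, declared):
    """Greedily cover [0, query_length) with declared feeds, longest first."""
    offset, plan = 0, []
    while offset < query_length:
        remaining = query_length - offset
        # Assumption: a feed may only be used if it fits in the unfilled part.
        candidates = [f for f in declared
                      if satisfies(f, offset, query_step) and f[1] <= remaining]
        if not candidates:
            raise ValueError(f"no declared feed fits at offset {offset}")
        best = max(candidates, key=lambda f: f[1])
        plan.append(best)
        offset += best[1]
    return plan


declared = [(0, 1, 1),                                           # basic shared window
            (0, 5, 30), (5, 5, 30), (10, 10, 30), (20, 10, 30),  # F feeds
            (0, 10, 30), (0, 20, 30), (0, 30, 30)]               # C feeds
print(fill(50, 30, declared))  # [(0, 30, 30), (0, 20, 30)]
```

Because the basic shared window has step 1, it satisfies the condition at every offset, so the loop can always make progress.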
S5: Process the query time windows one by one according to step S4. When the distance from the end position of a matching feed inside the query time window being processed to the front of that query time window is exactly an integer multiple of the step, generate a third class of feed, denoted "R" and expressed as (relative offset, length, step), and declare it externally at once so that later query time windows can subscribe to it;
S6: Repeat steps S4 and S5, searching all currently declared windows for suitable feeds, until every query time window to be processed has been handled.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention processes the query time windows to generate multiple shared windows and removes duplicate windows. Each query time window is handled by filling it with shared windows that satisfy the condition, longest first; once handled, it is declared at once so that later windows can subscribe to it. When the distance from the end position of a shared window inside the query time window being processed to the front of that query time window is exactly an integer multiple of the step, reverse accumulation is performed, and the resulting new shared windows are declared externally at once. Experiments show that memory usage is effectively reduced as the number of queries increases.
The present invention proposes MCQA, a window reuse algorithm for multi-query environments based on the step and on results, which reuses the computed results of different query windows for aggregation operations over the same data stream and improves computational efficiency. Partitioning windows and establishing "agent windows" reduces the number of times data are aggregated repeatedly; a mechanism for sharing window aggregation results directly reduces the number of elements that must be aggregated within a window. Experiments show that, as the number of queries increases, MCQA outperforms TriWeave, currently the most typical method, and effectively reduces memory usage.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 is a schematic diagram of window construction provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram of the declared window information provided by an embodiment of the present invention.
Fig. 4 is an example of reverse accumulation provided by an embodiment of the present invention.
Fig. 5 shows the result set obtained by taking window length modulo step, provided by an embodiment of the present invention.
Fig. 6 is a schematic diagram of the F-class internal windows provided by an embodiment of the present invention.
Fig. 7 is a schematic diagram of the C-class internal windows provided by an embodiment of the present invention.
Fig. 8 is the flow of the window splicing algorithm of the present invention.
Fig. 9 is the flow of the window declaration algorithm of the present invention.
Detailed description of the embodiments
A specific embodiment of the present invention is described in detail below with reference to Figs. 1 to 9, but it should be understood that the scope of protection of the present invention is not limited by the specific embodiment.
Assume the query time windows have lengths 50, 65, 95 and 190, all with step 30. If the windows to be processed have different steps (pace), they are first grouped, and each group is then processed in the same way.
S1: Perform the modulo operation on all query time windows and sort the results in ascending order;
For initialization, the modulo operation is applied to every query time window: 50 mod 30 = 20, 65 mod 30 = 5, 95 mod 30 = 5 and 190 mod 30 = 10, as shown in Fig. 5. Sorting the results in ascending order and removing duplicates gives R = (5, 10, 20, 30), where the last element of R is the step (pace). If the windows to be processed have different steps, they are first grouped, and each group is processed in the same way;
S2: Generate the basic shared unit and declare it externally so that the windows that need it can subscribe to it;
To guarantee a smallest filling unit, a basic unit with length 1 and step 1 is generated; it is the smallest unit shown in Fig. 2. It is then declared at once so that the windows that need it can subscribe to it, as with w0 in Fig. 3;
S3: Generate two classes of internal windows, called feeds, from R; one class is denoted "F" and the other "C";
The first class, "F", is generated as shown in Fig. 6. Each feed is expressed as (relative offset, length, step): F1 (0, 5, 30), F2 (5, 5, 30), F4 (10, 10, 30) and F6 (20, 10, 30);
The second class, "C", is generated as shown in Fig. 7: F1 (0, 5, 30), C3 (0, 10, 30), C5 (0, 20, 30) and C7 (0, 30, 30). After the duplicate (0, 5, 30) is removed, the internal windows are generated as in Fig. 3. Initialization is now complete, and the following steps process the query time windows;
S4: Search the generated internal windows and the basic shared unit for suitable feeds with which to fill the query time window being processed;
Process the first window, w1 (50, 30). Search the generated internal windows for the longest feed that satisfies the condition p0 mod p1 = 0 and s2 mod p1 = s1 mod p1, where s1 is the published feed's own offset, s2 is the relative offset of the candidate feed within the query time window w1 being processed, p1 is the step of the declared feed, and p0 is the step of the query time window being processed. At the front of w1, s2 mod p1 = 0 mod 30 = 0, so the windows satisfying the condition are F1, C3, C5 and C7. C7 is the longest, with length 30, so C7 is selected. Because w1 has length 50, it is not yet full, so the search continues from the end of C7. There the relative offset is 30 and 30 mod 30 = 0, and C5, with length 20, is the longest feed that fits, so C5 is selected and w1 is filled, as shown in Fig. 2. This yields the windows (0, 30, 30) and w1 (0, 50, 30), which are declared externally at once. For w1, once the duplicate window (0, 30, 30) is removed, only (0, 50, 30) is published, which is w1 in Fig. 3;
S5: As soon as a window has been filled, publish its feed externally, generating a new feed that other windows can share;
The second window, w2 (65, 30), is processed in the same way as w1. First look for the longest feed: C8, the longest, satisfies the condition but does not fill w2, so the search continues from the end of C8. Window w2 has length 65 and the relative offset s2 is now 50, with 50 mod 30 = 20, so a feed with s1 mod 30 = 20 is needed; F6 satisfies the condition. Moreover, the distance from the end position of F6 inside w2 to the front of the query time window is 50 + 10 = 60, exactly an integer multiple of the step 30, so one reverse accumulation is performed. The result is a third-class feed, denoted R (0, 60, 30), which is declared externally. Window w2 is still not full, so the search continues in the same way: F1 satisfies the condition and w2 is filled, as shown in Fig. 2, and the accumulated feed C10 (0, 65, 30) is declared. Fig. 4 illustrates reverse accumulation: whenever the distance from the end position of the current feed inside the query time window being processed to the front of that query time window is exactly an integer multiple of the step, accumulation proceeds from that position toward the front, and the result is published. The step in the example of Fig. 4 is 10, so accumulation toward the front starts from the integer multiples 10 and 30, and the results are then declared. For w2, a single reverse accumulation (0, 60, 30) is obtained by accumulating toward the front from the position of F6. At this point third-class feeds have been generated; the subsequent windows w3 and w4 are computed in the same way.
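Reverse accumulation can be sketched as below, under stated assumptions: the function name `reverse_accumulate` is hypothetical, the input is the list of feed lengths in the order they were placed, and the walk back toward the front visits the start of every earlier feed. The last point is inferred from the (70, 20, 30), (65, 25, 30), (0, 90, 30) sequence that the embodiment reports for w3; the patent does not spell it out as a rule.

```python
def reverse_accumulate(placed_lengths, step):
    """Return third-class feeds (offset, length, step) produced while filling one window."""
    results, starts, pos = [], [], 0
    for length in placed_lengths:
        starts.append(pos)
        pos += length  # end position of the feed just placed
        if pos % step == 0:
            # Accumulate from this end position back toward the front of the window,
            # skipping the feed itself because it is already declared.
            for start in reversed(starts[:-1]):
                results.append((start, pos - start, step))
    return results


# w2 is filled by feeds of length 50, 10 and 5; the second one ends at 60 = 2 * 30.
print(reverse_accumulate([50, 10, 5], 30))          # [(0, 60, 30)]
# w3 is filled by feeds of length 65, 5, 10, 10 and 5; the fourth one ends at 90.
print(reverse_accumulate([65, 5, 10, 10, 5], 30))   # [(70, 20, 30), (65, 25, 30), (0, 90, 30)]
```

For w1, filled by feeds of length 30 and 20, the first feed ends at 30 but has no predecessor, so nothing new is produced.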
S6: Repeat steps S4 and S5, searching all current feeds for suitable feeds, until the window is filled;
Compute window w3. The longest feed, C10, is inserted into the window first. For the second feed, the relative offset s2 within the query time window being processed is 65; since 65 mod 30 = 5, all declared windows are searched and only F2 matches, so F2 is filled in. The window is not yet full, so a third feed is sought: its relative offset s2 is 70, 70 mod 30 = 10, and the feed that satisfies the condition is F4, which is inserted. Similarly, 80 mod 30 = 20 matches only F6, so F6 is filled in, followed by F1. As soon as the window is filled, its feeds are published externally, the first being the longest accumulation (0, 95, 30). Next, find the position inside the query time window at which the distance from a feed's end position to the front of the query time window is exactly an integer multiple of the step 30, namely the end position of F6, and reverse-accumulate toward the front from there: first (70, 20, 30), then (65, 25, 30) and (0, 90, 30).
Compute window w4. As shown in Fig. 2, C14 is found first. Then 95 mod 30 = 5, and R12 clearly satisfies the condition. Next, 120 mod 30 = 0; many feeds match here, but the longest match is required, and C10 satisfies the condition. The last feed is F2. With this, all windows have been computed, and the feeds are published externally at once.
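The whole walkthrough can be replayed with a compact end-to-end sketch. It is an illustration under assumptions, not the patented implementation: the function name `run_mcqa` is hypothetical, all windows share one step, windows are processed in ascending order of length, a feed must fit in the unfilled part, and reverse accumulation walks back over the start of every earlier feed.

```python
def run_mcqa(lengths, step):
    """Return {window length: list of (offset, length, step) feeds used to fill it}."""
    # S1: remainders, deduplicated and sorted, with the step appended.
    R = sorted({w % step for w in lengths}) + [step]
    r_prime = [0] + R
    # S2 + S3: basic unit, F feeds, C feeds. Dict keys keep order and drop duplicates.
    declared = {(0, 1, 1): None}
    for a, b in zip(r_prime, r_prime[1:]):
        declared[(a, b - a, step)] = None
    for s in R:
        declared[(0, s, step)] = None

    plans = {}
    for w in sorted(lengths):
        pos, starts, plan = 0, [], []
        while pos < w:
            # S4: longest declared feed with a congruent offset that still fits.
            fits = [f for f in declared
                    if step % f[2] == 0 and pos % f[2] == f[0] % f[2] and f[1] <= w - pos]
            feed = max(fits, key=lambda f: f[1])
            plan.append(feed)
            starts.append(pos)
            pos += feed[1]
            # S5: reverse accumulation when the feed ends on a multiple of the step.
            if pos % step == 0:
                for start in reversed(starts[:-1]):
                    declared[(start, pos - start, step)] = None
        declared[(0, w, step)] = None  # publish the filled window as a new feed
        plans[w] = plan
    return plans


for length, plan in run_mcqa([50, 65, 95, 190], 30).items():
    print(length, plan)
```

Under these assumptions the plan for the window of length 190 is (0, 95, 30), (65, 25, 30), (0, 65, 30), (5, 5, 30), which corresponds to the C14, R12, C10, F2 sequence described for w4.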
The above discloses only a few specific embodiments of the present invention. The embodiments of the present invention are not limited to these, however, and any variation that a person skilled in the art can conceive of shall fall within the scope of protection of the present invention.
Claims (1)
1. A window-based method for multiple continuous queries over a data stream, characterized by comprising the following steps:
S1: When all query time windows to be processed have the same step, compute w mod p for every query time window, where w is the length of the query time window and p is its step, to obtain a result s. Deduplicate all results s and sort them in ascending order to form a result set, then append the step p as the last element to obtain the set R: (s1, s2, …, sn, p). When the steps of the query time windows differ, first group the query time windows by step, so that windows with the same step fall into one group; then perform the modulo calculation on the query time windows of each group, deduplicate and sort each group's results in ascending order to form that group's result set, and append the group's step p as the last element to obtain the set R;
S2: Declare the basic shared window externally so that the query time windows that need it can subscribe to it;
A basic unit with length 1 and step 1 is declared externally as the basic shared window. The basic shared window is the smallest window, and it is declared externally so that the query time windows that need it can subscribe to it;
S3: Generate two classes of internal windows from the query time windows to be processed and declare them externally. These internal windows are called feeds; one class is denoted "F" and the other "C";
First class of internal window, "F": add 0 to the result set R and sort again in ascending order to form R'. Each pair of adjacent numbers in R' forms one first-class internal window, expressed as (relative offset, length, step): F1 (0, s1, p), F2 (s1, s2 - s1, p), …, Fn+1 (sn, p - sn, p);
Second class of internal window, "C": each number in R' other than 0 forms, together with 0, one second-class internal window, expressed as (absolute offset, length, step): C1 (0, s1, p), C2 (0, s2, p), …, Cn (0, sn, p), Cn+1 (0, p, p);
S4: Search the basic shared window and the two classes of internal windows for suitable windows with which to fill the query time window being processed;
Among the basic shared window and the two classes of internal windows, first find the windows that satisfy the condition p0 mod p1 = 0 and s2 mod p1 = s1 mod p1, where s2 is the relative offset of the feed within the query time window being processed, s1 is the declared feed's own offset, p1 is the step of the declared feed, and p0 is the step of the query time window being processed. First fill the query time window with the longest window that satisfies the condition. If that does not fill it completely, find the longest of the remaining windows that satisfy the condition and use it to fill the unfilled part, and so on until the query time window is full. As soon as it is full, declare the filled window externally, which generates a new feed to which later query time windows can subscribe;
S5: Process the query time windows one by one according to step S4. When the distance from the end position of a matching feed inside the query time window being processed to the front of that query time window is exactly an integer multiple of the step, generate a third class of feed, denoted "R" and expressed as (relative offset, length, step), and declare it externally at once so that later query time windows can subscribe to it;
S6: Repeat steps S4 and S5, searching all currently declared windows for suitable feeds, until every query time window to be processed has been handled.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811215219.8A CN109299159A (en) | 2018-10-18 | 2018-10-18 | A kind of more continuous-query methods of data flow based on window |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109299159A (en) | 2019-02-01 |
Family
ID=65157260
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811215219.8A Pending CN109299159A (en) | 2018-10-18 | 2018-10-18 | A kind of more continuous-query methods of data flow based on window |
Legal Events
Code | Title | Description
---|---|---
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190201