CN109299159A - A window-based method for multiple continuous queries over data streams - Google Patents

A window-based method for multiple continuous queries over data streams

Info

Publication number
CN109299159A
Authority
CN
China
Prior art keywords
window
length
query time
time window
feed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811215219.8A
Other languages
Chinese (zh)
Inventor
刘文
刘俊霞
张土前
王思秀
张宁宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinjiang Institute of Engineering
Original Assignee
Xinjiang Institute of Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinjiang Institute of Engineering
Priority to CN201811215219.8A
Publication of CN109299159A
Legal status: Pending

Abstract

The present invention relates to the field of database technology and discloses a window-based method for multiple continuous queries over data streams, comprising the following steps: S1, compute the length of every query time window to be processed modulo its step, and sort the results in ascending order; S2, declare the basic shared unit externally; S3, generate two classes of inner windows; S4, search the declared windows for suitable windows with which to fill the query time window being processed, and declare that window externally as soon as it has been filled; S5, when the distance from the end position, within the query time window being processed, of a window that satisfies the condition to the front of that query time window is an integer multiple of the step, generate a third class of inner window and declare it externally; S6, repeat steps S4 and S5, searching all currently declared windows for suitable windows, until every window has been filled. As the number of queries grows, this method can effectively reduce the memory footprint.

Description

A window-based method for multiple continuous queries over data streams
Technical field
The present invention relates to the field of database technology, and in particular to a window-based method for multiple continuous queries over data streams.
Background
With the rapid development of the Internet of Things, sensor networks, the Internet, and smart devices of all kinds, many industries (for example stock trading, healthcare, cyberspace, and monitoring applications of all kinds) continuously generate massive data streams. Real-time analysis and mining of data streams is a current research hotspot. It matters because, as time passes, a data stream carries the regularities and features of how a large number of measured objects change, and analysis and mining algorithms can bring this valuable information to the surface.
The biggest challenge facing current data stream processing systems is that monitoring applications generate ever more data at ever higher frequencies, which means that more data must be processed per unit time window. Techniques for querying, analyzing, and mining data streams have advanced quickly, but many key problems remain open in distributed stream processing environments. One of the difficulties is how to share results among multiple continuous queries defined over different time windows. A representative example illustrates this sharing problem below.
In many typical application scenarios, users want continuously updated information derived from a continuous data stream, such as the maximum, minimum, or mean over some period. The most common approach to statistical queries of this kind is analysis based on sliding windows. A sliding window can be defined simply as a query window that covers the l (length) most recent tuples and is updated every p (step) tuples. For example, treat the data continuously produced by a 2 Hz pressure sensor (one reading every 1/2 second) as a data stream. Suppose the average pressure over the most recent 5 minutes must be monitored and updated every 10 seconds. This query can be regarded as a sliding-window average (aggregate) query with window length l = 2 × 60 × 5 = 600 and step p = 2 × 10 = 20. Another statistical window for the mean pressure has length l = 500 and step 40. For this one pressure sensor there may be many similar statistical query windows. Generalizing the scenario, multiple sliding-window aggregate queries coexist on the same data stream.
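As a rough illustration of the sliding-window query described above, the following Python sketch emits the mean of the l most recent tuples once every p arrivals. The function name `sliding_average` and the choice to emit nothing until the first full window has arrived are assumptions made for illustration and are not part of the disclosure.

```python
from collections import deque


def sliding_average(stream, length, step):
    """Yield the mean of the latest `length` tuples, once every `step` arrivals.

    Hypothetical helper: output starts only after the first full window.
    """
    window = deque(maxlen=length)  # older tuples are dropped automatically
    for count, value in enumerate(stream, start=1):
        window.append(value)
        if count >= length and (count - length) % step == 0:
            yield sum(window) / length


# Example: window length 4, step 2, over the readings 1..10.
averages = list(sliding_average(range(1, 11), length=4, step=2))
```

Every query window recomputes its aggregate from raw tuples here, which is the repeated work that the sharing scheme of the invention aims to avoid.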
Because a data stream is unbounded, a computer with limited memory cannot hold all arriving data, and real-time analysis cannot scan too much data either, so the range of data that is stored and analyzed must be limited. A common means is the sliding window: only the most recent data are kept in memory, and older data are discarded as new data arrive.
In this setting, stream processing needs more efficient strategies that reduce the number of times the same segment of data is recomputed. For an environment in which multiple queries apply the same aggregate function to the same data stream, the present invention shares intermediate aggregation results among these queries and thereby reduces their actual total computation cost.
To solve the window-based continuous-query sharing problem, the invention proposes MCQA (Multiple Continuous Queries Algorithm), a step-based and result-based window reuse algorithm for multi-query environments. For aggregate operations on the same data stream, it reuses the computed results of different query windows and improves computational efficiency. The algorithm was implemented on the Storm stream processing framework. Experiments show that, as the number of queries grows, MCQA outperforms TriWeave, currently the most typical method, and effectively reduces the memory footprint.
Summary of the invention
The present invention provides a window-based method for multiple continuous queries over data streams. Its multi-query optimization strategy, built on step-based reuse and result-based reuse, can solve the above problems in the prior art.
The present invention provides a window-based method for multiple continuous queries over data streams, comprising the following steps:
S1: When the query time windows to be processed all have the same step, compute w mod p for every query time window, where w is the length of the window and p is its step, to obtain a result s. Deduplicate all results s, sort them in ascending order to form a result set, and append the step p as the last element to obtain the set R: (s1, s2, …, sn, p). When the query time windows have different steps, first group them by step so that windows with the same step fall into one group. Then, for each group, perform the modulo calculation, deduplicate the results, sort them in ascending order to form a result set, and append that group's step p as the last element to obtain the set R;
S2: Declare the basic shared window externally so that the query time windows that need it can subscribe to it;
The basic unit of length 1 and step 1 is declared externally as the basic shared window. The basic shared window is the smallest window, and it is declared externally so that the query time windows that need it can subscribe to it;
S3: Generate two classes of inner windows from the query time windows to be processed and declare them externally. These windows are called feeds; one class is abbreviated "F" and the other "C";
First class of inner window, "F": add 0 to the result set R and re-sort it in ascending order to form R′. Each pair of adjacent numbers in R′ forms one first-class inner window, expressed as (relative offset, length, step): F1 (0, s1, p), F2 (s1, s2−s1, p), …, Fn+1 (sn, p−sn, p);
Second class of inner window, "C": each nonzero number in R′ forms, relative to 0, one second-class inner window, expressed as (absolute offset, length, step): C1 (0, s1, p), C2 (0, s2, p), …, Cn (0, sn, p), Cn+1 (0, p, p);
S4: Search the basic shared window and the two classes of inner windows for suitable windows with which to fill the query time window being processed;
Among the basic shared window and the two classes of inner windows, first find the windows that satisfy the condition p0 mod p1 = 0 and s2 mod p1 = s1 mod p1, where s2 is the relative offset of the feed within the query time window being processed, s1 is the own offset of the declared feed, p1 is the step of the declared feed, and p0 is the step of the query time window being processed. Fill the query time window first with the longest window that satisfies the condition. If the window is not yet full, find the longest of the remaining windows that satisfy the condition and use it to fill the unfilled part. Repeat until the query time window is full. As soon as it is full, declare the filled window externally, generating a new feed to which query time windows processed later can subscribe;
S5: Process the query time windows one by one according to step S4. When the distance from the end position of a feed that satisfies the condition, within the query time window being processed, to the front of that query time window is exactly an integer multiple of the step, generate a third class of feed, abbreviated "R" and expressed as (relative offset, length, step), and declare it externally at once so that query time windows processed later can subscribe to it;
S6: Repeat steps S4 and S5, searching all currently declared windows for suitable feeds, until every query time window to be processed has been processed.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention processes the query time windows to generate multiple shared windows and removes duplicate windows. The query time windows are then processed one by one, in order of length, using the shared windows that satisfy the condition. Each window is declared as soon as it has been processed so that later windows can subscribe to it. When the distance from the end position of a shared window, within the query time window being processed, to the front of that query time window is exactly an integer multiple of the step, reverse accumulation is performed, and the new shared window it generates is declared externally at once. Experiments show that the memory footprint is effectively reduced as the number of queries grows.
The invention proposes MCQA, a step-based and result-based window reuse algorithm for multi-query environments, which reuses the computed results of different query windows for aggregate operations on the same data stream and improves computational efficiency. Partitioning windows and establishing "agent windows" reduces the number of times data are aggregated repeatedly. A mechanism for sharing window aggregation results directly reduces the number of elements that must be aggregated within a window. Experiments show that, as the number of queries grows, MCQA outperforms TriWeave, currently the most typical method, and effectively reduces the memory footprint.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 is a schematic diagram of window construction provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram of the declared window information provided by an embodiment of the present invention.
Fig. 4 is an example of reverse accumulation provided by an embodiment of the present invention.
Fig. 5 shows the result set obtained by taking each window length modulo the step, provided by an embodiment of the present invention.
Fig. 6 is a schematic diagram of the F-class inner windows provided by an embodiment of the present invention.
Fig. 7 is a schematic diagram of the C-class inner windows provided by an embodiment of the present invention.
Fig. 8 shows the algorithm flow for splicing windows in the present invention.
Fig. 9 shows the algorithm flow for declaring windows in the present invention.
Detailed description of the embodiments
Specific embodiments of the present invention are described in detail below with reference to Figs. 1-9, but it should be understood that the protection scope of the invention is not limited by the specific embodiments.
Assume the query time windows have lengths 50, 65, 95, and 190, all with step 30. If the windows to be processed have different steps (pace), they are first grouped, and each group is then processed in the same way.
S1: Perform the modulo operation on all query time windows and sort the results in ascending order;
Initialization: perform the modulo operation on all query time windows, giving 50 mod 30 = 20, 65 mod 30 = 5, 95 mod 30 = 5, and 190 mod 30 = 10, as shown in Fig. 5. Deduplicating the results, sorting them in ascending order, and appending the step gives R = (5, 10, 20, 30), whose last element is the step (pace). If the windows to be processed have different steps, they are first grouped, and each group is processed in the same way.
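The S1 calculation can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the function name `remainder_sets` and the representation of a query window as a `(length, step)` pair are hypothetical, and the extra window (500, 40) is taken from the background example only to show the grouping by step.

```python
from collections import defaultdict


def remainder_sets(windows):
    """Group (length, step) windows by step and build the set R for each group.

    R holds the unique values of length mod step in ascending order,
    followed by the step itself as the last element.
    """
    groups = defaultdict(set)
    for length, step in windows:
        groups[step].add(length % step)
    return {step: sorted(rems) + [step] for step, rems in groups.items()}


# The four windows of the embodiment, plus one window with a different step.
result = remainder_sets([(50, 30), (65, 30), (95, 30), (190, 30), (500, 40)])
```

For the step-30 group this reproduces R = (5, 10, 20, 30). How a remainder of 0 should be treated is not stated in the text, so the sketch simply keeps it.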
S2: Generate the basic shared unit and declare it externally so that the windows that need it can subscribe to it;
To guarantee a smallest fill unit, generate a basic unit of length 1 and step 1, shown as the minimum unit in Fig. 2. It is declared immediately so that the windows that need it can subscribe to it, as w0 in Fig. 3 shows.
S3: Generate two classes of inner windows, called feeds, from R; one class is abbreviated "F" and the other "C";
The first class, "F", is generated as shown in Fig. 6, each window expressed as (relative offset, length, step): F1: (0, 5, 30), F2: (5, 5, 30), F4: (10, 10, 30), and F6: (20, 10, 30);
The second class, "C", is generated as shown in Fig. 7: (0, 5, 30), which coincides with F1, then C3 (0, 10, 30), C5 (0, 20, 30), and C7 (0, 30, 30). After the duplicate (0, 5, 30) is removed, the inner windows are generated as shown in Fig. 3. This completes initialization, and the subsequent steps process the query time windows;
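The generation of the two classes of inner windows can be illustrated as follows. This is a sketch only: the function name `inner_windows` and the tuple layout `(offset, length, step)` are assumptions chosen to mirror the notation of the text, and duplicates are removed with an order-preserving dictionary.

```python
def inner_windows(r):
    """Build the F and C inner windows from R, e.g. [5, 10, 20, 30].

    The last element of R is the step p. Each window is (offset, length, step).
    """
    p = r[-1]
    r_prime = sorted({0, *r})                    # R': add 0 and re-sort
    f = [(a, b - a, p) for a, b in zip(r_prime, r_prime[1:])]  # adjacent pairs
    c = [(0, s, p) for s in r_prime[1:]]         # every number relative to 0
    return f, c


f_windows, c_windows = inner_windows([5, 10, 20, 30])
# Remove repeats such as (0, 5, 30), which appears in both classes.
declared = list(dict.fromkeys(f_windows + c_windows))
```

With R = (5, 10, 20, 30) this yields the four F windows and four C windows listed above, and seven distinct feeds once the shared (0, 5, 30) is counted once.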
S4: Search the generated inner windows and the basic shared unit for suitable feeds with which to fill the query time window being processed;
Process the first window, w1 (50, 30). Search the generated inner windows for the longest feed that satisfies the condition p0 mod p1 = 0 and s2 mod p1 = s1 mod p1, where s1 is the own offset of an already declared feed, s2 is the relative offset of the candidate feed within the query time window w1 being processed, p1 is the step of the declared feed, and p0 is the step of the query time window being processed. At the start of w1, s2 mod p1 = 0 mod 30 = 0, so the windows that satisfy the condition are F1, C3, C5, and C7. C7 is the longest, with length 30, so C7 is selected. Because w1 has length 50, it is not yet full, and the search continues from the end of C7. There C5, with length 20, is the longest feed that fits, so C5 is selected and w1 is filled, as shown in Fig. 2. This yields the windows (0, 30, 30) and w1 (0, 50, 30), which are declared immediately. For w1, once the duplicate (0, 30, 30) is removed, only (0, 50, 30) is published, which is W1 in Fig. 3.
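The matching condition and the greedy fill of w1 can be sketched as below. The names `matches` and `fill` are hypothetical, and the requirement that a chosen feed must not extend past the end of the query window is an assumption inferred from the worked example rather than stated explicitly in the text.

```python
def matches(feed, pos, p0):
    """Condition of step S4: p0 mod p1 = 0 and s2 mod p1 = s1 mod p1.

    feed is (s1, length, p1); pos plays the role of s2, the current fill
    position inside the query window whose step is p0.
    """
    s1, _, p1 = feed
    return p0 % p1 == 0 and pos % p1 == s1 % p1


def fill(length, p0, declared):
    """Greedily fill a query window with the longest matching feed that fits."""
    pos, pieces = 0, []
    while pos < length:
        fits = [f for f in declared
                if matches(f, pos, p0) and f[1] <= length - pos]
        best = max(fits, key=lambda f: f[1])   # longest candidate wins
        pieces.append(best)
        pos += best[1]
    return pieces


# Basic shared unit plus the F and C windows of the embodiment.
declared = [(0, 1, 1), (0, 5, 30), (5, 5, 30), (10, 10, 30), (20, 10, 30),
            (0, 10, 30), (0, 20, 30), (0, 30, 30)]
w1_pieces = fill(50, 30, declared)
```

Because the basic unit (0, 1, 1) matches at every position, the candidate list is never empty as long as that unit has been declared.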
S5: Once a window has been filled, publish the feed externally at once, generating a new feed that other windows can share;
The second window, w2 (65, 30), is handled in the same way as w1. First look for the longest feed: the longest, C8, satisfies the condition but does not fill w2, so the search continues from the end of C8. The length of w2 is 65 and the relative offset s2 within w2 is now 50, with 50 mod 30 = 20, so a feed with s1 mod 30 = 20 is needed, and F6 satisfies the condition. The distance from the end position of F6 within w2 to the front of the query time window is 50 + 10 = 60, exactly an integer multiple of the step 30, so one reverse accumulation is performed. The result, a third-class feed, is denoted R (0, 60, 30) and is declared externally. w2 is still not full, so the search continues in the same way: F1 satisfies the condition and w2 is filled, as shown in Fig. 2, and the accumulated feed C10 (0, 65, 30) is declared. Fig. 4 illustrates reverse accumulation: whenever the end position of the current feed within the query time window being processed lies an integer multiple of the step from the front of that window, feeds are accumulated from that position toward the front and published. The example in Fig. 4 has step 10, so accumulation runs toward the front from the integer multiples 10 and 30, and the results are then declared. For w2 this gives one reverse accumulation, (0, 60, 30), accumulated toward the front from the position of F6. At this point the third class of feed has been generated. The subsequent windows w3 and w4 are computed in the same way.
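Reverse accumulation can be sketched as a pass over the pieces already placed in a window. This is one possible reading of the description, under assumptions: pieces are given as `(start, length)` pairs measured from the front of the query window, the function name `reverse_accumulate` is hypothetical, and the single piece that ends on the boundary is skipped because it is already a declared feed.

```python
def reverse_accumulate(pieces, step):
    """Return third-class feeds (offset, length, step) for a filled window.

    Whenever a piece ends at an integer multiple of the step, accumulate
    backward from that end position over each earlier piece in turn.
    """
    feeds = []
    for i, (start, length) in enumerate(pieces):
        end = start + length
        if end % step == 0:
            for j in range(i - 1, -1, -1):       # walk toward the front
                s = pieces[j][0]
                feeds.append((s, end - s, step))
    return feeds


# w2 = C8 (0..50), F6 (50..60), F1 (60..65): F6 ends at 60 = 2 * 30.
w2_feeds = reverse_accumulate([(0, 50), (50, 10), (60, 5)], 30)
```

For w2 the only new feed is (0, 60, 30), matching the feed R (0, 60, 30) of the embodiment.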
S6: Repeat steps S4 and S5, searching all current feeds for suitable feeds, until the windows have been filled;
Compute window w3. First place the longest feed, C10, into the window, then look for the second feed. Its relative offset s2 within the query time window being processed is 65, and 65 mod 30 = 5. Searching all declared windows, only F2 matches, so F2 is filled in. The window is not yet full, so look for the third feed, whose relative offset s2 is 70, with 70 mod 30 = 10. F4 satisfies the condition and is placed into the window. Likewise, 80 mod 30 = 20 matches only F6, which is filled in, followed by F1. Once the window is full, feeds are published immediately, first the longest accumulation, (0, 95, 30). Then find the position whose distance to the front of the query time window is exactly an integer multiple of the step 30, which is the end position of F6, and accumulate backward from there: first (70, 20, 30), then (65, 25, 30) and (0, 90, 30). Compute window w4. As Fig. 2 shows, C14 is found first. Then 95 mod 30 = 5, and R12 clearly satisfies the condition. Next, 120 mod 30 = 0. Many feeds match here, but the longest match is wanted, and C10 satisfies the condition. The last feed is F2. This completes the computation of all windows, and the feeds are published immediately.
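Steps S1 to S6 can be combined into one end-to-end sketch for windows that share a single step. Everything below is an illustrative reading of the embodiment, not the patented implementation: the function name `plan_windows`, the ascending processing order, the rule that a feed must fit in the remaining space, and the tuple layout `(offset, length, step)` are all assumptions.

```python
def plan_windows(lengths, step):
    """Return {window length: feeds used to fill it}, feeds as (offset, length, step)."""
    # S1: unique remainders, together with 0 and the step (this is R').
    marks = sorted({0, step} | {w % step for w in lengths})
    # S2 and S3: basic shared unit, F windows (adjacent pairs), C windows (from 0).
    declared = [(0, 1, 1)]
    declared += [(a, b - a, step) for a, b in zip(marks, marks[1:])]
    declared += [(0, s, step) for s in marks[1:]]
    declared = list(dict.fromkeys(declared))     # drop repeats such as (0, s1, p)

    plans = {}
    for w in sorted(lengths):
        pos, starts, pieces = 0, [], []
        while pos < w:
            # S4: longest declared feed that matches the position and still fits.
            fits = [f for f in declared
                    if step % f[2] == 0
                    and pos % f[2] == f[0] % f[2]
                    and f[1] <= w - pos]
            best = max(fits, key=lambda f: f[1])
            starts.append(pos)
            pieces.append(best)
            pos += best[1]
            # S5: on a multiple of the step, accumulate backward and declare.
            if pos % step == 0:
                for s in reversed(starts[:-1]):
                    declared.append((s, pos - s, step))
        declared.append((0, w, step))            # the filled window is a new feed
        plans[w] = pieces
    return plans


plans = plan_windows([50, 65, 95, 190], 30)
```

Under these assumptions the plan for the window of length 190 is (0, 95, 30), (65, 25, 30), (0, 65, 30), (5, 5, 30), which corresponds to the sequence C14, R12, C10, F2 described for w4.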
The above discloses only several specific embodiments of the present invention. The embodiments of the invention are not limited thereto, and any variation that a person skilled in the art can conceive shall fall within the protection scope of the invention.

Claims (1)

1. A window-based method for multiple continuous queries over data streams, characterized in that it comprises the following steps:
S1: When the query time windows to be processed all have the same step, compute w mod p for every query time window, where w is the length of the window and p is its step, to obtain a result s; deduplicate all results s, sort them in ascending order to form a result set, and append the step p as the last element to obtain the set R: (s1, s2, …, sn, p); when the query time windows have different steps, first group them by step so that windows with the same step fall into one group, then, for each group, perform the modulo calculation, deduplicate the results, sort them in ascending order to form a result set, and append that group's step p as the last element to obtain the set R;
S2: Declare the basic shared window externally so that the query time windows that need it can subscribe to it;
The basic unit of length 1 and step 1 is declared externally as the basic shared window; the basic shared window is the smallest window, and it is declared externally so that the query time windows that need it can subscribe to it;
S3: Generate two classes of inner windows from the query time windows to be processed and declare them externally; these windows are called feeds, one class being abbreviated "F" and the other "C";
First class of inner window, "F": add 0 to the result set R and re-sort it in ascending order to form R′; each pair of adjacent numbers in R′ forms one first-class inner window, expressed as (relative offset, length, step): F1 (0, s1, p), F2 (s1, s2−s1, p), …, Fn+1 (sn, p−sn, p);
Second class of inner window, "C": each nonzero number in R′ forms, relative to 0, one second-class inner window, expressed as (absolute offset, length, step): C1 (0, s1, p), C2 (0, s2, p), …, Cn (0, sn, p), Cn+1 (0, p, p);
S4: Search the basic shared window and the two classes of inner windows for suitable windows with which to fill the query time window being processed;
Among the basic shared window and the two classes of inner windows, first find the windows that satisfy the condition p0 mod p1 = 0 and s2 mod p1 = s1 mod p1, where s2 is the relative offset of the feed within the query time window being processed, s1 is the own offset of the declared feed, p1 is the step of the declared feed, and p0 is the step of the query time window being processed; fill the query time window first with the longest window that satisfies the condition; if the window is not yet full, find the longest of the remaining windows that satisfy the condition and use it to fill the unfilled part; repeat until the query time window is full; as soon as it is full, declare the filled window externally, generating a new feed to which query time windows processed later can subscribe;
S5: Process the query time windows one by one according to step S4; when the distance from the end position of a feed that satisfies the condition, within the query time window being processed, to the front of that query time window is exactly an integer multiple of the step, generate a third class of feed, abbreviated "R" and expressed as (relative offset, length, step), and declare it externally at once so that query time windows processed later can subscribe to it;
S6: Repeat steps S4 and S5, searching all currently declared windows for suitable feeds, until every query time window to be processed has been processed.
CN201811215219.8A 2018-10-18 2018-10-18 A window-based method for multiple continuous queries over data streams Pending CN109299159A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811215219.8A CN109299159A (en) 2018-10-18 2018-10-18 A kind of more continuous-query methods of data flow based on window

Publications (1)

Publication Number Publication Date
CN109299159A 2019-02-01

Family

ID=65157260

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811215219.8A Pending CN109299159A (en) 2018-10-18 2018-10-18 A window-based method for multiple continuous queries over data streams

Country Status (1)

Country Link
CN (1) CN109299159A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103250147A (en) * 2010-10-14 2013-08-14 Hewlett-Packard Development Company, L.P. Continuous querying of a data stream
CN104885077A (en) * 2012-09-28 2015-09-02 Oracle International Corporation Managing continuous queries with archived relations

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
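WEN LIU et al.: "An Efficient Approach of Processing Multiple Continuous Queries", Journal of Computer Science and Technology *
刘文: "Research on Key Technologies for Massive Time-Series Data Processing", China Doctoral Dissertations Full-text Database, Basic Sciences *
吴亚娟 et al.: "A Continuous Query Method for Data Streams Based on Variable Windows", Journal of Jiamusi University (Natural Science Edition) *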

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190201
