CN111026784A - Uncertain data stream probability summation threshold query method - Google Patents
- Publication number
- CN111026784A
- Authority
- CN
- China
- Prior art keywords
- query result
- sliding window
- variance
- probability
- sum
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses a method for querying a probability summation threshold of an uncertain data stream, and belongs to the technical field of query processing of uncertain data streams. The method comprises the following steps: 1) initializing, including inquiring parameter setting and modeling uncertain data flow by using a Gaussian mixture model; 2) obtaining an upper bound or a lower bound of a result by using a filtering strategy based on the properties and probability theory of a Gaussian mixture model, thereby quickly making a judgment; 3) when the filtering strategy is invalid, a sliding window model is used for calculating an accurate value, and the calculation cost is reduced through incremental calculation. The method for querying the probability summation threshold of the uncertain data stream has wide application prospect in the fields of cluster monitoring, health monitoring, intelligent security and the like.
Description
Technical Field
The invention relates to the technical field of uncertain data stream query processing, in particular to a method for querying a probability summation threshold of an uncertain data stream.
Background
With the development of sensing and network technologies, data streams can be acquired widely. Because of inherent device errors, interference from ambient noise, recovery of lost information through inference, and similar factors, the data in a data stream is typically represented probabilistically. Simply computing statistics of such uncertain data (e.g., mean and variance) loses useful information and can even lead to incorrect conclusions. Uncertain data stream management addresses these problems by adopting an uncertain data model that supports probabilistic queries. Among these, the probabilistic summation query is an important query type: it takes a large amount of uncertain data (such as probability distribution functions) as input and returns a probability distribution as the result. In many monitoring applications, it is only necessary to know whether the result distribution exceeds a user-defined threshold. An example is given below.
Example 1: Temperature monitoring. Six sensors measure the temperature of an object simultaneously. The readings are subject to error because of the sensors' inherent inaccuracy and interference from noise signals. The temperature readings of the six sensors are converted into a probability distribution using a data fusion technique, such as density estimation. The probability distributions at different time instants are then aggregated to detect anomalies. To this end, the monitoring application issues the following query:
Query: Over the last 10 minutes, is the probability that the average temperature exceeds 60 degrees greater than 80%?
When the query result is "true", an alarm will be triggered.
The above query explicitly considers the fluctuation of the monitored value over the last 10 minutes as a whole and introduces two thresholds into the probability summation query: a probability threshold and a score threshold. Such a query is an uncertain data stream probability summation threshold query, which extends the uncertain data stream probability summation query.
Although there has been a great deal of research on probabilistic summation queries over uncertain data streams, most of these methods focus on obtaining approximate results under an unbounded data stream model by proposing space-efficient and time-efficient algorithms. Other approaches update results incrementally by processing both incoming and outgoing tuples through a sliding window model. In addition, although existing probability threshold query methods offer various filtering strategies (such as distance-based filtering and probability-based filtering), each strategy is designed for a specific query type, and the threshold semantics of different query types differ in nature. For example, a probability range threshold query has a range threshold and a probability threshold, whereas a probability summation threshold query has a score threshold and a probability threshold. At present, no method is available for probability summation threshold queries over uncertain data streams. A naive solution is to perform the probabilistic summation query first and apply the threshold constraint afterwards to obtain the final result. Because query processing and threshold computation are separated, this approach is very inefficient: for many sliding windows the full result distribution does not actually need to be computed.
Disclosure of Invention
The invention aims to solve the technical problem of how to efficiently process probability summation threshold queries over uncertain data streams. To this end, the invention provides a method for querying a probability summation threshold of an uncertain data stream.
The technical scheme of the invention is as follows:
A method for querying a probability summation threshold of an uncertain data stream, the method comprising the following steps:
(1) dividing continuous uncertain data into sliding windows and modeling the random variables in each window with a Gaussian mixture model, i.e., representing each random variable by Gaussian distributions;
(2) performing two filtering judgments based on the first moment and first-order variance, and on the second moment and second-order variance, of the sum of the random variables in the sliding window: a first filtering judgment is made according to the first moment and the first-order variance, and if it yields a query result, the query result is output and the method returns to step (1); otherwise, a second filtering judgment is made according to the second moment and the second-order variance, and if it yields a query result, the query result is output and the method returns to step (1); if it does not, the method proceeds to step (3);
(3) converting the random variables in the sliding window into characteristic functions, performing probability summation based on the characteristic functions, judging whether the query result is "yes" or "no" according to the relationship between the summed probability value, the score threshold and the probability threshold, and outputting the query result.
When the method is used for processing the query of the probability summation threshold of the uncertain data stream, the properties and the probability theory of the Gaussian mixture model are fully utilized, and the characteristic function, the pruning strategy and the incremental processing based on the sliding window are combined, so that the calculation efficiency is improved. Compared with the prior art, the method has the advantages that:
1) The uncertain data are modeled as a Gaussian mixture model, which makes the method more flexible and efficient.
2) A pruning strategy based on the properties of the Gaussian mixture model and on probability theory is designed, which reduces unnecessary calculation.
3) In the accurate calculation stage, characteristic functions are introduced, which reduces the complexity of the algorithm, and incremental processing further improves the calculation efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flow chart of a method for querying a summation threshold of probabilities of an uncertain data stream according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples, while indicating the scope of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a flowchart of a method for querying a probability summation threshold of an uncertain data stream according to an embodiment of the present invention. As shown in Fig. 1, the method provided by the embodiment uses continuous random variables instead of discrete random variables to represent uncertain data, and adopts a Gaussian mixture model as the basic model to improve the calculation efficiency and provide high flexibility. It integrates a filtering strategy and accurate calculation into query processing: a judgment is made quickly with a filtering strategy based on the properties of the Gaussian mixture model and on probability theory, and when the filtering strategy cannot decide, an accurate value is calculated incrementally with a sliding window model. The method specifically comprises an initialization stage, a fast judgment stage based on the filtering strategy, and an accurate calculation stage based on the sliding window model. Each stage is explained in detail below.
Initialization phase
The initialization stage divides the stream into sliding windows and models the random variables in each window with a Gaussian mixture model, i.e., represents each random variable by Gaussian distributions. It specifically comprises the following steps:
S101: Acquire the new j-th uncertain data item $t_j$ in the uncertain data stream. It forms a sliding window $\{t_{j-w+1}, \ldots, t_j\}$ with the latest $w$ data items, where $w \in R^+$ is the length of the sliding window, and the random variable $X_m$ represents the m-th tuple $t_{j-w+m}$ ($1 \le m \le w$) in the sliding window.
S102: Set a score threshold $\tau$ ($\tau \in R^+$) and a probability threshold $\delta$ ($\delta \in (0,1)$). The uncertain data stream probability summation threshold query can then be expressed as: is the probability $\Pr(Y > \tau)$ that the random variable $Y$ exceeds $\tau$ greater than $\delta$, i.e., does the inequality $\Pr(Y > \tau) > \delta$ hold? If the inequality holds, the query result is "yes"; otherwise, the query result is "no".
S103: Model each random variable $X_m$ with a univariate Gaussian mixture model, i.e., represent the uncertain data by a continuous random variable. The model comprises $k$ Gaussian components $N(\mu_1, \sigma_1^2), N(\mu_2, \sigma_2^2), \ldots, N(\mu_k, \sigma_k^2)$ and corresponding non-negative probabilities $(p_1, p_2, \ldots, p_k)$.
The probability density function of the random variable $X$ is:
$$f_X(x) = \sum_{n=1}^{k} p_n \, \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left(-\frac{(x-\mu_n)^2}{2\sigma_n^2}\right)$$
Thus, through S101 to S103, all data in each sliding window are represented by Gaussian mixture models; using the Gaussian mixture model as the basic model improves the calculation efficiency and provides high flexibility.
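The initialization stage can be illustrated with a minimal Python sketch. The names `GaussianMixture` and `make_window` are hypothetical and not taken from the patent; the window length is assumed here to be a positive integer, and each tuple is assumed to arrive already fitted as a mixture.

```python
import math
from collections import deque
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GaussianMixture:
    """Univariate Gaussian mixture with k components (hypothetical container)."""
    weights: Tuple[float, ...]    # non-negative probabilities p_1..p_k, summing to 1
    means: Tuple[float, ...]      # component expectations mu_1..mu_k
    variances: Tuple[float, ...]  # component variances sigma_1^2..sigma_k^2, all > 0

    def pdf(self, x: float) -> float:
        """Weighted sum of the k Gaussian densities evaluated at x."""
        total = 0.0
        for p, mu, var in zip(self.weights, self.means, self.variances):
            density = math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
            total += p * density
        return total


def make_window(w: int) -> deque:
    """Sliding window keeping the latest w tuples; the oldest one drops out automatically."""
    return deque(maxlen=w)


# Illustrative stream: tuples t_1..t_5 arrive one by one into a window of length 3.
window = make_window(3)
for j in range(1, 6):
    window.append(GaussianMixture((0.5, 0.5), (20.0 + j, 22.0 + j), (1.0, 2.0)))
```

After the loop the window holds the mixtures for t_3, t_4 and t_5, which play the role of X_1, X_2 and X_3 in the description.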
Fast judging stage based on filtering strategy
The fast judgment stage based on the filtering strategy performs two filtering judgments based on the first moment and first-order variance, and on the second moment and second-order variance, of the sum of the random variables in the sliding window. A first filtering judgment is made according to the first moment and the first-order variance; if it yields a query result, the query result is output and the method returns to the initialization stage to acquire new uncertain data. Otherwise, a second filtering judgment is made according to the second moment and the second-order variance; if it yields a query result, the query result is output and the method returns to the initialization stage, and if it does not, the method enters the accurate calculation stage based on the sliding window model. The stage specifically comprises the following steps:
S201: Calculate the first moment, the second moment, the first-order variance and the second-order variance of the sum of all random variables in the sliding window from the expectations and variances of the random variables.
S201 specifically comprises the following steps:
S2011: Calculate the expectation $E(X_m)$ and the variance $Var(X_m)$ of each random variable $X_m$.
Specifically, from the expectations $\mu_n$ and variances $\sigma_n^2$ of the Gaussian components, the expectation $E(X)$ and the variance $Var(X)$ are calculated as follows:
$$E(X) = \sum_{n=1}^{k} p_n \mu_n$$
$$Var(X) = \sum_{n=1}^{k} p_n \left(\sigma_n^2 + \mu_n^2\right) - (E(X))^2$$
S2012: Calculate the first moment $E(Y)$ and the second moment $E(Y^2)$ of the sum $Y = \sum_{m=1}^{w} X_m$ of all random variables in the sliding window.
Specifically, from the expectations $E(X_m)$ and the variances $Var(X_m)$, and assuming the $X_m$ are mutually independent, the first moment $E(Y)$ and the second moment $E(Y^2)$ of the sum $Y$ are calculated as follows:
$$E(Y) = \sum_{m=1}^{w} E(X_m)$$
$$E(Y^2) = \sum_{m=1}^{w} Var(X_m) + (E(Y))^2$$
S2013: Calculate the variance $Var(Y)$ of the sum $Y$ of all random variables in the sliding window.
Specifically, from the first moment $E(Y)$ and the second moment $E(Y^2)$, the variance $Var(Y)$ is calculated as follows:
$$Var(Y) = E(Y^2) - (E(Y))^2 \tag{7}$$
S2014: Calculate the fourth moment $E(Y^4)$ and the second-order variance $Var(Y^2)$ of the sum $Y$ of all random variables in the sliding window.
Specifically, from the first moment $E(Y)$, the second moment $E(Y^2)$ and the first-order variance $Var(Y)$, the fourth moment $E(Y^4)$ and the second-order variance $Var(Y^2)$ are calculated as follows, where (8) is the fourth moment of a Gaussian distribution with expectation $E(Y)$ and variance $Var(Y)$:
$$E(Y^4) = (E(Y))^4 + 6 (E(Y))^2 Var(Y) + 3 (Var(Y))^2 \tag{8}$$
$$Var(Y^2) = E(Y^4) - (E(Y^2))^2 \tag{9}$$
To reduce the amount of calculation, these moments and variances can be computed incrementally from the result of the previous sliding window. For the new sliding window, the moments and variances of the sum $Y' = X_{j-w+2} + X_{j-w+3} + \cdots + X_{j+1}$ can be calculated by the following formulas:
$$E(Y') = E(Y) - E(X_{j-w+1}) + E(X_{j+1}) \tag{10}$$
$$E(Y'^2) = E(Y^2) - (E(Y))^2 - Var(X_{j-w+1}) + Var(X_{j+1}) + (E(Y'))^2 \tag{11}$$
$$Var(Y') = E(Y'^2) - (E(Y'))^2 \tag{12}$$
$$E(Y'^4) = (E(Y'))^4 + 6 (E(Y'))^2 Var(Y') + 3 (Var(Y'))^2 \tag{13}$$
$$Var(Y'^2) = E(Y'^4) - (E(Y'^2))^2 \tag{14}$$
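The moment calculations of S201 and the incremental update can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the random variables are assumed to be independent so that variances add, and the fourth moment uses the Gaussian form with the term 3·Var(Y)² (that is, 3σ⁴).

```python
from typing import Dict, Sequence, Tuple


def mixture_mean_var(weights: Sequence[float], means: Sequence[float],
                     variances: Sequence[float]) -> Tuple[float, float]:
    """E(X) and Var(X) of one Gaussian mixture."""
    mean = sum(p * mu for p, mu in zip(weights, means))
    second = sum(p * (var + mu * mu) for p, mu, var in zip(weights, means, variances))
    return mean, second - mean * mean


def derived_moments(e_y: float, var_y: float) -> Dict[str, float]:
    """All quantities used by the filters, derived from E(Y) and Var(Y)."""
    e_y2 = var_y + e_y ** 2                                       # second moment
    e_y4 = e_y ** 4 + 6.0 * e_y ** 2 * var_y + 3.0 * var_y ** 2   # Gaussian fourth moment
    return {"E_Y": e_y, "Var_Y": var_y, "E_Y2": e_y2,
            "E_Y4": e_y4, "Var_Y2": e_y4 - e_y2 ** 2}


def window_moments(mean_vars: Sequence[Tuple[float, float]]) -> Dict[str, float]:
    """Full computation from the (E(X_m), Var(X_m)) pairs of one window."""
    return derived_moments(sum(m for m, _ in mean_vars), sum(v for _, v in mean_vars))


def slide(previous: Dict[str, float], expired: Tuple[float, float],
          arrived: Tuple[float, float]) -> Dict[str, float]:
    """Incremental update: subtract the expired tuple's (mean, variance), add the new one."""
    e_new = previous["E_Y"] - expired[0] + arrived[0]
    var_new = previous["Var_Y"] - expired[1] + arrived[1]
    return derived_moments(e_new, var_new)
```

Sliding the window this way costs constant time per new tuple, whereas `window_moments` is linear in the window length.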
S202: Perform the first filtering according to the first moment $E(Y)$ and the first-order variance $Var(Y)$ of the sum of all random variables in the sliding window and their relationship to the score threshold and the probability threshold, in order to judge the query result.
S202 specifically comprises the following steps:
S2021: If $\tau > E(Y)$ and $\delta > 0.5$, the query result can be output; the output query result is "no", and the method jumps to S101 of the initialization stage.
Because $\tau > E(Y)$, the probability $\Pr(Y > \tau)$ that the random variable $Y$ exceeds $\tau$ is less than $\Pr(Y \ge E(Y))$, and $\Pr(Y \ge E(Y)) = 0.5$, so $\Pr(Y > \tau)$ must be less than 0.5. The inequality $\Pr(Y > \tau) > \delta$ therefore cannot hold, and the output query result is "no".
S2022: If $\tau > E(Y)$ and $\delta \le 0.5$, then when the condition
$$\frac{Var(Y)}{Var(Y) + (\tau - E(Y))^2} \le \delta$$
is satisfied, the query result can be output; the output query result is "no", and the method jumps to S101 of the initialization stage.
According to the one-sided Chebyshev inequality:
$$\Pr(Y > \tau) \le \frac{Var(Y)}{Var(Y) + (\tau - E(Y))^2}$$
Hence, when the condition above is satisfied, $\Pr(Y > \tau) > \delta$ does not hold, and the output query result is "no".
S2023: If $\tau \le E(Y)$ and $\delta < 0.5$, the query result can be output; the output query result is "yes", and the method jumps to S101 of the initialization stage.
Since $\Pr(Y > \tau) \ge \Pr(Y \ge E(Y)) = 0.5$, the value of $\Pr(Y > \tau)$ must be 0.5 or more, which exceeds $\delta$.
S2024: If $\tau \le E(Y)$ and $\delta \ge 0.5$, then when the condition
$$\frac{(E(Y) - \tau)^2}{Var(Y) + (E(Y) - \tau)^2} > \delta$$
is satisfied, the query result can be output; the output query result is "yes", and the method jumps to S101 of the initialization stage.
According to the one-sided Chebyshev inequality:
$$\Pr(Y > \tau) \ge \frac{(E(Y) - \tau)^2}{Var(Y) + (E(Y) - \tau)^2}$$
Hence, when the condition above is satisfied, $\Pr(Y > \tau) > \delta$ holds.
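The four cases S2021 to S2024 can be sketched as a single decision function. The name `first_filter` and the convention of returning `None` when the filter cannot decide are assumptions made for illustration. Like the description, the sketch takes Pr(Y ≥ E(Y)) = 0.5 as given, and the bound conditions follow the one-sided Chebyshev (Cantelli) inequality.

```python
from typing import Optional


def first_filter(e_y: float, var_y: float, tau: float, delta: float) -> Optional[bool]:
    """Return False for "no", True for "yes", or None if the filter cannot decide."""
    if tau > e_y:
        if delta > 0.5:
            return False                      # S2021: Pr(Y > tau) < 0.5 < delta
        gap = tau - e_y                       # strictly positive in this branch
        upper = var_y / (var_y + gap * gap)   # upper bound on Pr(Y > tau)
        return False if upper <= delta else None   # S2022
    if delta < 0.5:
        return True                           # S2023: Pr(Y > tau) >= 0.5 > delta
    gap = e_y - tau
    denom = var_y + gap * gap
    if denom > 0.0 and gap * gap / denom > delta:  # lower bound on Pr(Y > tau)
        return True                           # S2024
    return None
```

A `None` result corresponds to falling through to the second filtering in S203.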
S203: When the first filtering cannot output a query result, perform the second filtering according to the second moment $E(Y^2)$ and the second-order variance $Var(Y^2)$ of the sum of all random variables in the sliding window and their relationship to the score threshold and the probability threshold, in order to judge the query result.
S203 specifically comprises the following steps:
S2031: If $\tau^2 > E(Y^2)$ and $\delta > 0.5$, the query result can be output; the output query result is "no", and the method jumps to S101 of the initialization stage.
S2032: If $\tau^2 > E(Y^2)$ and $\delta \le 0.5$, then when the condition
$$\frac{Var(Y^2)}{Var(Y^2) + (\tau^2 - E(Y^2))^2} \le \delta$$
is satisfied, the query result can be output; the output query result is "no", and the method jumps to S101 of the initialization stage.
$\Pr(Y > \tau)$ is equivalently converted into $\Pr(Y^2 > \tau^2)$. According to the one-sided Chebyshev inequality:
$$\Pr(Y^2 > \tau^2) \le \frac{Var(Y^2)}{Var(Y^2) + (\tau^2 - E(Y^2))^2}$$
Hence, when the condition above is satisfied, $\Pr(Y > \tau) > \delta$ does not hold.
S2033: If $\tau^2 \le E(Y^2)$ and $\delta < 0.5$, the query result can be output; the output query result is "yes", and the method jumps to S101 of the initialization stage.
S2034: If $\tau^2 \le E(Y^2)$ and $\delta \ge 0.5$, then when the condition
$$\frac{(E(Y^2) - \tau^2)^2}{Var(Y^2) + (E(Y^2) - \tau^2)^2} > \delta$$
is satisfied, the query result can be output; the output query result is "yes", and the method jumps to S101 of the initialization stage.
According to the one-sided Chebyshev inequality:
$$\Pr(Y^2 > \tau^2) \ge \frac{(E(Y^2) - \tau^2)^2}{Var(Y^2) + (E(Y^2) - \tau^2)^2}$$
Hence, when the condition above is satisfied, $\Pr(Y > \tau) > \delta$ holds.
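The cascade of the two filters might be sketched as follows. The helper `bound_decision` and the driver `two_stage_filter` are hypothetical names. The second stage simply applies the same bound test to Y² and τ², which assumes, as the description does, that Pr(Y > τ) and Pr(Y² > τ²) are equivalent (for example when Y is non-negative).

```python
from typing import Optional, Tuple


def bound_decision(mean: float, var: float, threshold: float, delta: float) -> Optional[bool]:
    """Shared test: decide from a mean and variance, or return None if undecided."""
    if threshold > mean:
        if delta > 0.5:
            return False
        gap = threshold - mean
        return False if var / (var + gap * gap) <= delta else None
    if delta < 0.5:
        return True
    gap = mean - threshold
    denom = var + gap * gap
    return True if denom > 0.0 and gap * gap / denom > delta else None


def two_stage_filter(e_y: float, var_y: float, e_y2: float, var_y2: float,
                     tau: float, delta: float) -> Tuple[Optional[bool], str]:
    """Try the first filter on Y, then the second on Y^2; report which stage decided."""
    first = bound_decision(e_y, var_y, tau, delta)
    if first is not None:
        return first, "first"
    second = bound_decision(e_y2, var_y2, tau * tau, delta)
    if second is not None:
        return second, "second"
    return None, "exact"   # fall through to the accurate calculation stage
```

With E(Y) = 6, Var(Y) = 3, E(Y²) = 39, Var(Y²) = 450, τ = 10 and δ = 0.12, the first bound (3/19 ≈ 0.158) is too loose to decide, while the second bound (450/4171 ≈ 0.108) falls below δ and yields "no".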
Accurate calculation phase based on sliding window model
The accurate calculation stage based on the sliding window model converts the random variables in the sliding window into characteristic functions, performs probability summation based on the characteristic functions, judges whether the query result is "yes" or "no" according to the relationship between the summed probability value, the score threshold $\tau$ ($\tau \in R^+$) and the probability threshold $\delta$, and outputs the query result. The specific process is as follows:
S301: Express each random variable $X_m$ by a characteristic function.
The random variable $X_m$ is modeled as a Gaussian mixture model consisting of $k$ Gaussian components with expectations $(\mu_1, \mu_2, \ldots, \mu_k)$, variances $(\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2)$ and corresponding probabilities $(p_1, p_2, \ldots, p_k)$. The characteristic function of the random variable $X_m$ is then expressed as follows, where $i$ is the imaginary unit:
$$\varphi_{X_m}(t) = \sum_{n=1}^{k} p_n \exp\left(i \mu_n t - \frac{1}{2} \sigma_n^2 t^2\right) \tag{15}$$
S302: Express the sum $Y$ of the random variables of all uncertain data in the sliding window by a characteristic function.
The sum $Y$ is the sum of the $w$ random variables $(X_1, X_2, \ldots, X_w)$, i.e., $Y = \sum_{m=1}^{w} X_m$. The characteristic function $\varphi_Y(t)$ of the sum $Y$ is then expressed as follows:
$$\varphi_Y(t) = \prod_{m=1}^{w} \varphi_{X_m}(t) \tag{16}$$
As can be seen from equation (16), for a linear combination of multiple random variables, calculation with characteristic functions is very efficient, whereas calculation with probability density functions requires multiple integrations, which consumes a large amount of computing resources.
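As an illustration of S301 and S302, the characteristic functions can be evaluated numerically with Python's standard `cmath` module. The function names and the representation of a mixture as a list of `(weight, mean, variance)` tuples are assumptions made for this sketch.

```python
import cmath
from typing import Sequence, Tuple

Component = Tuple[float, float, float]  # (weight p, mean mu, variance sigma^2)


def gmm_cf(t: float, mixture: Sequence[Component]) -> complex:
    """Characteristic function of one Gaussian mixture evaluated at t."""
    return sum(p * cmath.exp(1j * mu * t - 0.5 * var * t * t) for p, mu, var in mixture)


def sum_cf(t: float, mixtures: Sequence[Sequence[Component]]) -> complex:
    """Characteristic function of the sum of independent mixtures: a plain product."""
    result = complex(1.0)
    for mixture in mixtures:
        result *= gmm_cf(t, mixture)
    return result
```

For two single-component mixtures N(1, 1) and N(2, 3), the product equals the characteristic function of N(3, 4), which matches the fact that the means and variances of independent Gaussians add.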
S303: For the random variables in the current sliding window, incrementally update the characteristic function of the sum of the random variables in the current sliding window, based on the old sliding window and the old summation result.
For the data in the current sliding window, starting from the characteristic function $\varphi_{Y_{old}}(t)$ of the old sliding window, processing the new tuple $t_j$ gives a new result $\varphi_{Y'}(t)$ that can be calculated incrementally as follows:
$$\varphi_{Y'}(t) = \varphi_{Y_{old}}(t) \, \varphi_{X_j}(t)$$
At the same time, the old tuple $t_{j-w}$ is removed, and the new result $\varphi_Y(t)$ can be calculated incrementally as follows:
$$\varphi_Y(t) = \frac{\varphi_{Y'}(t)}{\varphi_{X_{j-w}}(t)}$$
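One way to picture the incremental update of S303 is to keep the characteristic function of the window sum at a fixed grid of evaluation points, multiplying in the factor of an arriving tuple and dividing out the factor of an expiring one. This is a sketch under assumptions: the class `WindowCF`, the grid representation and the multiply/divide form of the update are illustrative choices, and division is only numerically safe where the removed factor is not close to zero.

```python
import cmath
from typing import List, Sequence, Tuple

Component = Tuple[float, float, float]  # (weight p, mean mu, variance sigma^2)


def gmm_cf(t: float, mixture: Sequence[Component]) -> complex:
    """Characteristic function of one Gaussian mixture evaluated at t."""
    return sum(p * cmath.exp(1j * mu * t - 0.5 * var * t * t) for p, mu, var in mixture)


class WindowCF:
    """Characteristic function of the window sum, stored at fixed grid points."""

    def __init__(self, grid: Sequence[float]) -> None:
        self.grid: List[float] = list(grid)
        self.values: List[complex] = [complex(1.0)] * len(self.grid)  # empty sum

    def add(self, mixture: Sequence[Component]) -> None:
        """A new tuple enters the window: multiply by its characteristic function."""
        self.values = [v * gmm_cf(t, mixture) for v, t in zip(self.values, self.grid)]

    def remove(self, mixture: Sequence[Component]) -> None:
        """An old tuple leaves the window: divide its characteristic function out."""
        self.values = [v / gmm_cf(t, mixture) for v, t in zip(self.values, self.grid)]
```

Each update touches only the arriving or expiring tuple, so its cost does not depend on the window length.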
S304: From the characteristic function $\varphi_Y(t)$ of the probability summation, calculate the probability $\Pr(Y > \tau)$ of exceeding the score threshold $\tau$. If $\Pr(Y > \tau) > \delta$, output the query result "yes"; otherwise, output "no". The query process for the current sliding window then ends, and the method jumps to S101 of the initialization stage.
The characteristic function $\varphi_Y(t)$ of the current sliding window can be expressed as a set of Gaussian components $\phi_c$ with weights $p_c$, so that:
$$\Pr(Y > \tau) = \sum_{c} p_c \left(1 - F_c(\tau)\right)$$
where $F_c(\tau)$ is the cumulative distribution function of the Gaussian component $c$ evaluated at $\tau$.
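The final probability can be illustrated by expanding the sum of independent mixtures into its Gaussian components explicitly: each combination of one component per tuple contributes a Gaussian whose weight is the product of the weights and whose mean and variance are the sums. The function names below are hypothetical, and this brute-force expansion grows as k^w, so it is meant only as a small worked example rather than as the patent's procedure.

```python
import math
from functools import reduce
from itertools import product
from typing import List, Sequence, Tuple

Component = Tuple[float, float, float]  # (weight p, mean mu, variance sigma^2)


def convolve(a: Sequence[Component], b: Sequence[Component]) -> List[Component]:
    """Components of the sum of two independent Gaussian mixtures."""
    return [(pa * pb, ma + mb, va + vb) for (pa, ma, va), (pb, mb, vb) in product(a, b)]


def tail_probability(components: Sequence[Component], tau: float) -> float:
    """Pr(Y > tau) as the weighted sum of Gaussian upper-tail probabilities."""
    total = 0.0
    for p, mu, var in components:
        cdf = 0.5 * (1.0 + math.erf((tau - mu) / math.sqrt(2.0 * var)))  # F_c(tau)
        total += p * (1.0 - cdf)
    return total


def exact_query(window: Sequence[Sequence[Component]], tau: float, delta: float) -> bool:
    """True ("yes") if Pr(Y > tau) > delta for the sum Y over a non-empty window."""
    components = reduce(convolve, window)
    return tail_probability(components, tau) > delta
```

For example, for a window holding N(1, 1) and N(2, 3), the sum is N(3, 4), so Pr(Y > 3) = 0.5 and the query answers "yes" for δ = 0.4 and "no" for δ = 0.6.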
In the method for querying the probability summation threshold of the uncertain data stream, the uncertain data is modeled into a Gaussian mixture model, so that the method is more flexible and efficient; meanwhile, a pruning strategy based on the properties of a Gaussian mixture model and a probability theory is designed, unnecessary calculation is reduced, in addition, a characteristic function is introduced in an accurate calculation stage, the complexity of an algorithm is reduced, and meanwhile, the calculation efficiency is further improved by utilizing incremental processing.
The above-mentioned embodiments are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only the most preferred embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions, equivalents, etc. made within the scope of the principles of the present invention should be included in the scope of the present invention.
Claims (7)
1. A method for querying a probability summation threshold of an uncertain data stream, the method comprising the steps of:
(1) dividing continuous uncertain data into sliding windows and modeling the random variables in each window with a Gaussian mixture model, i.e., representing each random variable by Gaussian distributions;
(2) performing two filtering judgments based on the first moment and first-order variance, and on the second moment and second-order variance, of the sum of the random variables in the sliding window: a first filtering judgment is made according to the first moment and the first-order variance, and if it yields a query result, the query result is output and the method returns to step (1); otherwise, a second filtering judgment is made according to the second moment and the second-order variance, and if it yields a query result, the query result is output and the method returns to step (1); if it does not, the method proceeds to step (3);
(3) converting the random variables in the sliding window into characteristic functions, performing probability summation based on the characteristic functions, judging whether the query result is "yes" or "no" according to the relationship between the summed probability value, the score threshold and the probability threshold, and outputting the query result.
2. The uncertain data stream probability summation threshold query method according to claim 1, wherein in step (1), the new j-th uncertain data item $t_j$ in the uncertain data stream is acquired and forms a sliding window $\{t_{j-w+1}, \ldots, t_j\}$ with the latest $w$ data items, where $w \in R^+$ is the length of the sliding window, and the random variable $X_m$ represents the m-th tuple $t_{j-w+m}$ ($1 \le m \le w$) in the sliding window;
a univariate Gaussian mixture model is used to model each random variable $X_m$, i.e., the uncertain data are represented by continuous random variables, the model comprising $k$ Gaussian components $N(\mu_1, \sigma_1^2), N(\mu_2, \sigma_2^2), \ldots, N(\mu_k, \sigma_k^2)$ and corresponding non-negative probabilities $(p_1, p_2, \ldots, p_k)$;
the probability density function of the random variable $X$ is:
$$f_X(x) = \sum_{n=1}^{k} p_n \, \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left(-\frac{(x-\mu_n)^2}{2\sigma_n^2}\right)$$
3. The uncertain data stream probability summation threshold query method according to claim 1, wherein the specific process of step (2) is as follows:
(2-1) calculating the first moment, the second moment, the first-order variance and the second-order variance of the sum of all random variables in the sliding window from the expectations and variances of the random variables;
(2-2) performing the first filtering according to the first moment and the first-order variance of the sum of all random variables in the sliding window and their relationship to the score threshold and the probability threshold, in order to judge the query result;
(2-3) when the query result cannot be output, performing the second filtering according to the second moment and the second-order variance of the sum of all random variables in the sliding window and their relationship to the score threshold and the probability threshold, in order to judge the query result.
4. The uncertain data stream probability summation threshold query method according to claim 3, wherein the step (2-1) specifically comprises the following steps:
(2-1-1) calculating the expectation $E(X_m)$ and the variance $Var(X_m)$ of each random variable $X_m$;
specifically, from the expectations $\mu_n$ and variances $\sigma_n^2$ of the Gaussian components, the expectation $E(X)$ and the variance $Var(X)$ are calculated as follows:
$$E(X) = \sum_{n=1}^{k} p_n \mu_n$$
$$Var(X) = \sum_{n=1}^{k} p_n \left(\sigma_n^2 + \mu_n^2\right) - (E(X))^2$$
(2-1-2) calculating the first moment $E(Y)$ and the second moment $E(Y^2)$ of the sum $Y = \sum_{m=1}^{w} X_m$ of all random variables in the sliding window;
specifically, from the expectations $E(X_m)$ and the variances $Var(X_m)$, the first moment $E(Y)$ and the second moment $E(Y^2)$ of the sum $Y$ are calculated as follows:
$$E(Y) = \sum_{m=1}^{w} E(X_m)$$
$$E(Y^2) = \sum_{m=1}^{w} Var(X_m) + (E(Y))^2$$
(2-1-3) calculating the variance $Var(Y)$ of the sum $Y$ of all random variables in the sliding window;
specifically, from the first moment $E(Y)$ and the second moment $E(Y^2)$, the variance $Var(Y)$ is calculated as follows:
$$Var(Y) = E(Y^2) - (E(Y))^2$$
(2-1-4) calculating the fourth moment $E(Y^4)$ and the second-order variance $Var(Y^2)$ of the sum $Y$ of all random variables in the sliding window;
specifically, from the first moment $E(Y)$, the second moment $E(Y^2)$ and the first-order variance $Var(Y)$, the fourth moment $E(Y^4)$ and the second-order variance $Var(Y^2)$ are calculated as follows:
$$E(Y^4) = (E(Y))^4 + 6 (E(Y))^2 Var(Y) + 3 (Var(Y))^2$$
$$Var(Y^2) = E(Y^4) - (E(Y^2))^2$$
for the new sliding window, the moments and variances of the sum $Y' = X_{j-w+2} + X_{j-w+3} + \cdots + X_{j+1}$ can be calculated by the following formulas:
$$E(Y') = E(Y) - E(X_{j-w+1}) + E(X_{j+1})$$
$$E(Y'^2) = E(Y^2) - (E(Y))^2 - Var(X_{j-w+1}) + Var(X_{j+1}) + (E(Y'))^2$$
$$Var(Y') = E(Y'^2) - (E(Y'))^2$$
$$E(Y'^4) = (E(Y'))^4 + 6 (E(Y'))^2 Var(Y') + 3 (Var(Y'))^2$$
$$Var(Y'^2) = E(Y'^4) - (E(Y'^2))^2.$$
5. The uncertain data stream probability summation threshold query method according to claim 3, wherein the step (2-2) specifically comprises the following steps:
(2-2-1) if $\tau > E(Y)$ and $\delta > 0.5$, the query result can be output; the output query result is "no", and the method jumps to step (1);
(2-2-2) if $\tau > E(Y)$ and $\delta \le 0.5$, then when the condition $\frac{Var(Y)}{Var(Y) + (\tau - E(Y))^2} \le \delta$ is satisfied, the query result can be output; the output query result is "no", and the method jumps to step (1);
(2-2-3) if $\tau \le E(Y)$ and $\delta < 0.5$, the query result can be output; the output query result is "yes", and the method jumps to step (1);
6. The uncertain data stream probability summation threshold query method according to claim 3, wherein the step (2-3) specifically comprises the following steps:
(2-3-1) if $\tau^2 > E(Y^2)$ and $\delta > 0.5$, the query result can be output; the output query result is "no", and the method jumps to step (1);
(2-3-2) if $\tau^2 > E(Y^2)$ and $\delta \le 0.5$, then when the condition $\frac{Var(Y^2)}{Var(Y^2) + (\tau^2 - E(Y^2))^2} \le \delta$ is satisfied, the query result can be output; the output query result is "no", and the method jumps to step (1);
(2-3-3) if $\tau^2 \le E(Y^2)$ and $\delta < 0.5$, the query result can be output; the output query result is "yes", and the method jumps to step (1);
7. The uncertain data stream probability summation threshold query method according to claim 1, wherein the specific process of step (3) is as follows:
(3-1) Each random variable XmExpressed by a characteristic function;
random variable XmModeling as a Gaussian mixture model consisting of k expectation (μ)1,μ2,…,μk) Variance ofAnd a corresponding probability of (p)1,p2,…,pk) The Gaussian component of (1), then the random variable XmIs expressed as follows:
(3-2) representing the sum Y of all random variables of all uncertain data in the sliding window by using a characteristic function;
the sum of the random variables Y is w random variables (X)1,X2,…,Xw) Is a sum ofThen the characteristic function of the sum of random variables YIs represented as follows:
(3-3) for the random variables within the current sliding window, incrementally updating the characteristic function value of the sum of the random variables in the current sliding window, based on the old sliding window and the old summation result;
for the data within the current sliding window, based on the characteristic function of the old sliding window, when a new tuple $t_j$ is processed, the new result can be calculated incrementally as follows: [formula not reproduced in the source text]
at the same time, when the old tuple $t_{j-w}$ is removed, the new result can be calculated incrementally as follows: [formula not reproduced in the source text]
(3-4) according to the characteristic function of the probability sum, calculating the probability $\Pr(Y > \tau)$ that the sum exceeds the score threshold $\tau$; if $\Pr(Y > \tau) > \delta$, outputting the query result "yes", otherwise "no"; the query process of the current sliding window is then finished, and the method jumps to step (1).
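Steps (3-1) to (3-4) can be sketched end to end as follows. Because the formulas of this claim are not reproduced in the source text, everything below is an assumption rather than the patented computation. The sketch uses the standard characteristic function of a Gaussian mixture, treats the tuples as independent so that the characteristic function of the sum is a product, updates that product by multiplying in the new tuple and dividing out the expired one, and recovers Pr(Y > τ) with the Gil-Pelaez inversion formula evaluated by a trapezoidal rule on a fixed grid. The names `gmm_cf` and `SlidingSumCF` and the grid are hypothetical.

```python
from collections import deque

import numpy as np


def gmm_cf(t, mus, variances, probs):
    """Characteristic function of a Gaussian mixture at t:
    sum_i p_i * exp(1j * mu_i * t - 0.5 * var_i * t^2)."""
    t = np.asarray(t, dtype=float)[..., None]
    mus = np.asarray(mus, dtype=float)
    variances = np.asarray(variances, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return np.sum(probs * np.exp(1j * mus * t - 0.5 * variances * t**2), axis=-1)


class SlidingSumCF:
    """Characteristic function of the window sum, sampled on a grid of t > 0."""

    def __init__(self, grid, window):
        self.grid = np.asarray(grid, dtype=float)
        self.window = window
        self.items = deque()  # per-tuple characteristic functions on the grid
        self.cf = np.ones(self.grid.shape, dtype=complex)

    def push(self, mus, variances, probs):
        """Add a new tuple; evict the oldest once the window is full."""
        new = gmm_cf(self.grid, mus, variances, probs)
        self.items.append(new)
        self.cf = self.cf * new  # assumes independent tuples
        if len(self.items) > self.window:
            old = self.items.popleft()
            if np.all(np.abs(old) > 1e-300):
                self.cf = self.cf / old  # incremental removal
            else:
                # Dividing by (near) zero is unsafe: rebuild the product.
                self.cf = np.prod(np.stack(list(self.items)), axis=0)

    def prob_greater(self, tau):
        """Gil-Pelaez: Pr(Y > tau) = 1/2 + (1/pi) * int_0^inf Im[e^{-i t tau} cf(t)] / t dt."""
        t = self.grid
        f = np.imag(np.exp(-1j * t * tau) * self.cf) / t
        integral = np.sum((f[1:] + f[:-1]) * np.diff(t)) / 2.0  # trapezoidal rule
        return 0.5 + integral / np.pi

    def query(self, tau, delta):
        return "yes" if self.prob_greater(tau) > delta else "no"


# Example: window of 3 tuples, each a single-component mixture N(1, 1).
stream = SlidingSumCF(np.linspace(1e-6, 20.0, 20001), window=3)
for _ in range(3):
    stream.push([1.0], [1.0], [1.0])
print(stream.query(tau=3.0, delta=0.4))
```

Storing the sampled per-tuple characteristic functions makes eviction a single element-wise division, at the cost of memory proportional to the window size times the grid size. The grid range and resolution determine the accuracy of the inversion and would need tuning to the variance of the sum.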
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911106844.3A CN111026784B (en) | 2019-11-13 | 2019-11-13 | Uncertain data stream probability summation threshold query method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111026784A (en) | 2020-04-17 |
CN111026784B (en) | 2022-05-03 |
Family
ID=70205457
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911106844.3A Active CN111026784B (en) | 2019-11-13 | 2019-11-13 | Uncertain data stream probability summation threshold query method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111026784B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103593435A (en) * | 2013-11-12 | 2014-02-19 | 河海大学 | Approximate treatment system and method for uncertain data PT-TopK query |
CN104809185A (en) * | 2015-04-20 | 2015-07-29 | 西北工业大学 | Closed item set mining method facing uncertain data |
CN110362600A (en) * | 2019-07-22 | 2019-10-22 | 广西大学 | A kind of random ordering data flow distribution aggregate query method, system and medium |
Non-Patent Citations (2)
Title |
---|
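Lu Yinju et al.: "An efficient Top-K query algorithm for uncertain data streams", Science Technology and Engineering * |
Chen Donghui et al.: "Approximate algorithms for aggregate queries on uncertain data", Journal of Tsinghua University (Science and Technology) * |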
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114253265A (en) * | 2021-12-17 | 2022-03-29 | 成都朴为科技有限公司 | On-time arrival probability maximum path planning algorithm and system based on fourth-order moment |
CN114253265B (en) * | 2021-12-17 | 2023-10-20 | 成都朴为科技有限公司 | On-time arrival probability maximum path planning algorithm and system based on fourth-order moment |
Also Published As
Publication number | Publication date |
---|---|
CN111026784B (en) | 2022-05-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||