CN111026784A - Uncertain data stream probability summation threshold query method - Google Patents


Info

Publication number
CN111026784A
Authority
CN
China
Prior art keywords
query result
sliding window
variance
probability
sum
Prior art date
Legal status
Granted
Application number
CN201911106844.3A
Other languages
Chinese (zh)
Other versions
CN111026784B (en)
Inventor
Chen Ling (陈岭)
Chen Donghui (陈东辉)
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201911106844.3A
Publication of CN111026784A
Application granted
Publication of CN111026784B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a probability summation threshold query method for uncertain data streams, belonging to the technical field of query processing over uncertain data streams. The method comprises the following steps: 1) initialization, including setting the query parameters and modeling the uncertain data stream with Gaussian mixture models; 2) using a filtering strategy based on the properties of Gaussian mixture models and on probability theory to obtain an upper or lower bound on the result, so that a decision can be made quickly; 3) when the filtering strategy cannot decide, computing the exact value with a sliding window model, with incremental computation reducing the computational cost. The method has broad application prospects in fields such as cluster monitoring, health monitoring and intelligent security.

Description

Uncertain data stream probability summation threshold query method
Technical Field
The invention relates to the technical field of uncertain data stream query processing, and in particular to a probability summation threshold query method for uncertain data streams.
Background
With the development of sensing and network technologies, data streams can be acquired widely. Because of inherent device errors, interference from ambient noise, recovery of lost information through inference and similar causes, the data in a data stream are typically represented probabilistically. Simply computing statistics (e.g., mean and variance) of such uncertain data loses useful information and may even lead to incorrect conclusions. Uncertain data stream management addresses these problems by adopting an uncertain data model that supports probabilistic queries. Among these, probabilistic summation queries (probabilistic query) are an important query type: they take a large amount of uncertain data (such as probability distribution functions) as input and return a probability distribution as the result. In many monitoring applications, it is only necessary to know whether the result distribution exceeds a user-defined threshold, as in the following example.
Example 1: Temperature monitoring. Six sensors measure the temperature of an object simultaneously. Because of the sensors' inherent errors and interference from noise signals, the temperature readings contain errors. The readings of the six sensors are converted into a probability distribution using a data fusion technique such as density estimation, and the probability distributions at different time instants are then aggregated to detect anomalies. To this end, the monitoring application issues the following query:
Query: In the last 10 minutes, is the probability that the average temperature exceeds 60 degrees greater than 80%?
When the query result is "true", an alarm is triggered.
The above query explicitly considers the temperature fluctuation over the last 10 minutes as a whole and introduces two thresholds into the probabilistic summation query: a probability threshold (80%) and a score threshold (60 degrees). Such a query is an uncertain data stream probability summation threshold query, an extension of the uncertain data stream probabilistic summation query.
Although there has been much research on probabilistic summation queries over uncertain data streams, most methods focus on obtaining approximate results under an unbounded data stream model by proposing space- and time-efficient algorithms. Other approaches use a sliding window model and update results incrementally by processing both incoming and expiring tuples. In addition, existing probabilistic threshold query methods design various filtering strategies (such as distance-based and probability-based filtering), but each strategy is tailored to a specific query type, and the threshold semantics differ fundamentally between query types: a probabilistic range threshold query, for example, has a range threshold and a probability threshold, whereas a probability summation threshold query has a score threshold and a probability threshold. At present, no query method dedicated to probability summation thresholds over uncertain data streams is available. A naive solution is to run the probabilistic summation query first and then apply the threshold constraint to the final result. Because query processing and threshold evaluation are separated, this approach is computationally inefficient: it computes the full result distribution for every sliding window, even when that distribution is not needed to decide the threshold.
Disclosure of Invention
The technical problem addressed by the invention is how to process uncertain data stream probability summation threshold queries efficiently. To this end, the invention provides a probability summation threshold query method for uncertain data streams.
The technical scheme of the invention is as follows:
A probability summation threshold query method for uncertain data streams comprises the following steps:
(1) dividing the continuous uncertain data into sliding windows and modeling the random variables in each window with Gaussian mixture models, i.e., representing each random variable by a mixture of Gaussian distributions;
(2) performing up to two filtering decisions on the sum of the random variables in the sliding window: a first filtering decision based on its first moment and first-order variance (the mean and variance of the sum) and, if needed, a second filtering decision based on its second moment and second-order variance (the mean and variance of the squared sum). Whenever a filtering decision yields the query result, the result is output and the method returns to step (1); if neither decision yields the query result, the method proceeds to step (3);
(3) converting the random variables in the sliding window into characteristic functions, performing probability summation based on the characteristic functions, deciding whether the query result is 'yes' or 'no' by comparing the summed probability against the score threshold and the probability threshold, and outputting the query result.
When processing uncertain data stream probability summation threshold queries, the method fully exploits the properties of Gaussian mixture models and probability theory, and combines characteristic functions, a pruning strategy and sliding-window-based incremental processing to improve computational efficiency. Compared with the prior art, the method has the following advantages:
1) Uncertain data are modeled as Gaussian mixture models, which makes the method more flexible and efficient.
2) A pruning strategy based on the properties of Gaussian mixture models and probability theory is designed, which avoids unnecessary computation.
3) In the exact computation stage, characteristic functions are introduced to reduce the complexity of the algorithm, and incremental processing further improves computational efficiency.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the probability summation threshold query method for uncertain data streams according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit its scope.
Fig. 1 is a flowchart of the probability summation threshold query method for uncertain data streams according to an embodiment of the present invention. As shown in Fig. 1, the method of this embodiment represents uncertain data with continuous rather than discrete random variables; adopts the Gaussian mixture model as the basic model, which improves computational efficiency and provides high flexibility; and integrates a filtering strategy with exact computation in query processing. The filtering strategy, based on the properties of Gaussian mixture models and probability theory, makes quick decisions; when it cannot decide, the exact value is computed incrementally with a sliding window model. The method consists of an initialization stage, a fast judging stage based on the filtering strategy, and an accurate calculation stage based on the sliding window model, each of which is explained in detail below.
Initialization phase
The initialization stage divides the data into sliding windows and models the random variables in each window with Gaussian mixture models, i.e., represents each random variable by Gaussian distributions. It comprises the following steps:
S101: When the j-th uncertain tuple $t_j$ of the uncertain data stream arrives, it forms a sliding window $\{t_{j-w+1}, \ldots, t_j\}$ of the latest $w$ tuples, where the positive integer $w$ is the window length. The random variable $X_m$ represents the m-th tuple $t_{j-w+m}$ ($1 \le m \le w$) of this sliding window;
S102: Set a score threshold $\tau$ ($\tau \in \mathbb{R}^+$) and a probability threshold $\delta$ ($\delta \in (0,1)$). With $Y = X_1 + X_2 + \cdots + X_w$ denoting the sum of the random variables in the sliding window, the uncertain data stream probability summation threshold query asks whether the probability $\Pr(Y > \tau)$ that $Y$ exceeds $\tau$ is greater than $\delta$, i.e., whether the inequality $\Pr(Y > \tau) > \delta$ holds. If it holds, the query result is 'yes'; otherwise it is 'no'.
S103: Model each random variable $X_m$ with a univariate Gaussian mixture model, so that uncertain data are represented by continuous random variables. The model consists of $k$ Gaussian components with corresponding non-negative probabilities $(p_1, p_2, \ldots, p_k)$, where $\sum_{i=1}^{k} p_i = 1$.
The probability density function of a random variable $X$ modeled in this way is
$$f_X(x) = \sum_{i=1}^{k} p_i f_i(x),$$
where
$$f_i(x) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{(x - \mu_i)^2}{2\sigma_i^2}\right)$$
is the density of the i-th Gaussian component, whose expectation and variance are $\mu_i$ and $\sigma_i^2$, respectively.
Thus, through S101 to S103, all data in each sliding window are represented by Gaussian mixture models; using the Gaussian mixture model as the basic model improves computational efficiency and provides high flexibility.
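As a minimal sketch of the S103 model in Python (the class name `GaussianMixture` and its validation rules are illustrative assumptions, not part of the patent), a univariate Gaussian mixture can be stored as parallel lists of weights, means and variances, and its density evaluated as the weighted sum of the component densities:

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class GaussianMixture:
    """Univariate Gaussian mixture with weights p_i, means mu_i and variances sigma_i^2."""
    weights: List[float]
    means: List[float]
    variances: List[float]

    def __post_init__(self) -> None:
        # Basic validity checks for a mixture model.
        if not (len(self.weights) == len(self.means) == len(self.variances)):
            raise ValueError("component lists must have equal length")
        if any(p < 0 for p in self.weights) or abs(sum(self.weights) - 1.0) > 1e-9:
            raise ValueError("weights must be non-negative and sum to 1")
        if any(v <= 0 for v in self.variances):
            raise ValueError("variances must be positive")

    def pdf(self, x: float) -> float:
        """f(x) = sum_i p_i * f_i(x), where f_i is the N(mu_i, sigma_i^2) density."""
        total = 0.0
        for p, mu, var in zip(self.weights, self.means, self.variances):
            total += p * math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        return total


# Example: a hypothetical two-component model of one uncertain temperature reading.
reading = GaussianMixture(weights=[0.3, 0.7], means=[58.0, 61.0], variances=[1.0, 2.0])
print(round(reading.pdf(60.0), 4))
```

Keeping the components as plain lists makes the later moment and characteristic-function steps simple loops over `(p_i, mu_i, sigma_i^2)` triples.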
Fast judging stage based on filtering strategy
The fast judging stage based on the filtering strategy performs up to two filtering decisions on the sum of the random variables in the sliding window. The first filter uses the first moment and the first-order variance of the sum; if it yields the query result, the result is output and the method returns to the initialization stage to acquire new uncertain data. Otherwise, the second filter uses the second moment and the second-order variance; if it yields the query result, the result is output and the method returns to the initialization stage; if not, the accurate calculation stage based on the sliding window model is entered. The stage comprises the following steps:
S201: From the expectations and variances of the random variables, compute the first moment $E(Y)$, second moment $E(Y^2)$, first-order variance $\mathrm{Var}(Y)$ and second-order variance $\mathrm{Var}(Y^2)$ of the sum $Y$ of all random variables in the sliding window;
S201 specifically includes the following steps:
S2011: Compute the expectation $E(X)$ and variance $\mathrm{Var}(X)$ of each random variable $X_m$.
Specifically, from the expectation $\mu_i$ and variance $\sigma_i^2$ of each Gaussian component, compute
$$E(X) = \sum_{i=1}^{k} p_i \mu_i,$$
$$\mathrm{Var}(X) = \sum_{i=1}^{k} p_i \left(\sigma_i^2 + \mu_i^2\right) - (E(X))^2.$$
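The mixture moments of S2011 can be sketched as follows. This assumes the formula images express the standard mixture identities $E(X)=\sum_i p_i\mu_i$ and $\mathrm{Var}(X)=\sum_i p_i(\sigma_i^2+\mu_i^2)-(E(X))^2$; the function name `mixture_mean_var` is hypothetical.

```python
from typing import Sequence, Tuple


def mixture_mean_var(weights: Sequence[float], means: Sequence[float],
                     variances: Sequence[float]) -> Tuple[float, float]:
    """Return (E(X), Var(X)) of a univariate Gaussian mixture."""
    # E(X) = sum_i p_i * mu_i
    mean = sum(p * mu for p, mu in zip(weights, means))
    # E(X^2) = sum_i p_i * (sigma_i^2 + mu_i^2), since component i has mean mu_i and variance sigma_i^2
    second = sum(p * (v + mu * mu) for p, mu, v in zip(weights, means, variances))
    # Var(X) = E(X^2) - E(X)^2
    return mean, second - mean * mean


print(mixture_mean_var([0.5, 0.5], [0.0, 2.0], [1.0, 1.0]))  # (1.0, 2.0)
```

The variance exceeds each component's variance (1.0) because the spread between the component means also contributes.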
S2012: Compute the first moment $E(Y)$ and second moment $E(Y^2)$ of the sum of all random variables in the sliding window, $Y = \sum_{m=1}^{w} X_m$.
Specifically, from $E(X_m)$ and $\mathrm{Var}(X_m)$, treating the $X_m$ as mutually independent, compute
$$E(Y) = \sum_{m=1}^{w} E(X_m),$$
$$E(Y^2) = \sum_{m=1}^{w} \mathrm{Var}(X_m) + (E(Y))^2.$$
S2013: Compute the variance $\mathrm{Var}(Y)$ of the sum $Y$ of all random variables in the sliding window.
Specifically, from the first moment $E(Y)$ and the second moment $E(Y^2)$:
$$\mathrm{Var}(Y) = E(Y^2) - (E(Y))^2 \quad (7)$$
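A sketch of S2012 and S2013, assuming the random variables in a window are mutually independent (so their variances add); `window_moments` is a hypothetical helper that takes one $(E(X_m), \mathrm{Var}(X_m))$ pair per tuple:

```python
from typing import Iterable, Tuple


def window_moments(stats: Iterable[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Return (E(Y), E(Y^2), Var(Y)) for Y = X_1 + ... + X_w.

    `stats` holds one (E(X_m), Var(X_m)) pair per tuple in the window; the X_m
    are assumed mutually independent so that their variances add up.
    """
    mean_y = 0.0
    var_y = 0.0
    for e_x, var_x in stats:
        mean_y += e_x      # expectations are always additive
        var_y += var_x     # variances are additive under independence
    second_y = var_y + mean_y ** 2   # E(Y^2) = Var(Y) + (E(Y))^2, rearranged from (7)
    return mean_y, second_y, var_y


print(window_moments([(1.0, 2.0), (3.0, 1.0)]))  # (4.0, 19.0, 3.0)
```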
S2014: Compute the fourth moment $E(Y^4)$ and the second-order variance $\mathrm{Var}(Y^2)$ of the sum $Y$ of all random variables in the sliding window.
Specifically, from $E(Y)$, $E(Y^2)$ and $\mathrm{Var}(Y)$, compute the following, where the expression for $E(Y^4)$ is the fourth moment of a Gaussian with mean $E(Y)$ and variance $\mathrm{Var}(Y)$:
$$E(Y^4) = (E(Y))^4 + 6(E(Y))^2\,\mathrm{Var}(Y) + 3(\mathrm{Var}(Y))^2 \quad (8)$$
$$\mathrm{Var}(Y^2) = E(Y^4) - (E(Y^2))^2 \quad (9)$$
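The fourth-moment expression corresponds to a Gaussian with mean $E(Y)$ and variance $\mathrm{Var}(Y)$, whose last term is $3\mathrm{Var}(Y)^2$; the sketch below assumes that Gaussian form (`fourth_moment_and_var_sq` is a hypothetical name). As a sanity check, for a Gaussian the result should match the closed form $\mathrm{Var}(Y^2)=4E(Y)^2\mathrm{Var}(Y)+2\mathrm{Var}(Y)^2$.

```python
from typing import Tuple


def fourth_moment_and_var_sq(mean_y: float, var_y: float) -> Tuple[float, float]:
    """Return (E(Y^4), Var(Y^2)) using the Gaussian moment formulas (8)-(9)."""
    second_y = var_y + mean_y ** 2                                      # E(Y^2)
    fourth_y = mean_y ** 4 + 6 * mean_y ** 2 * var_y + 3 * var_y ** 2   # E(Y^4), Gaussian case
    return fourth_y, fourth_y - second_y ** 2                           # Var(Y^2) = E(Y^4) - E(Y^2)^2


print(fourth_moment_and_var_sq(1.0, 2.0))  # (25.0, 16.0)
```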
To reduce computation, the moments up to fourth order and the two variances can be computed incrementally from the results of the previous sliding window. For the new sliding window $\{t_{j-w+2}, \ldots, t_{j+1}\}$, the moments of the variable $Y' = X_{j-w+2} + X_{j-w+3} + \cdots + X_{j+1}$ are computed as follows:
$$E(Y') = E(Y) - E(X_{j-w+1}) + E(X_{j+1}) \quad (10)$$
$$E(Y'^2) = E(Y^2) - (E(Y))^2 - \mathrm{Var}(X_{j-w+1}) + \mathrm{Var}(X_{j+1}) + (E(Y'))^2 \quad (11)$$
$$\mathrm{Var}(Y') = E(Y'^2) - (E(Y'))^2 \quad (12)$$
$$E(Y'^4) = (E(Y'))^4 + 6(E(Y'))^2\,\mathrm{Var}(Y') + 3(\mathrm{Var}(Y'))^2 \quad (13)$$
$$\mathrm{Var}(Y'^2) = E(Y'^4) - (E(Y'^2))^2 \quad (14)$$
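The incremental updates (10) to (14) can be sketched with a double-ended queue holding each tuple's $(E(X), \mathrm{Var}(X))$ pair: only the running $E(Y)$ and $\mathrm{Var}(Y)$ are updated per arrival and expiry, and the higher moments are derived from them. The class name `IncrementalWindowMoments` is hypothetical, and independence plus the Gaussian fourth-moment formula are assumed.

```python
from collections import deque
from typing import Deque, Tuple


class IncrementalWindowMoments:
    """Running moments of Y = sum of the last w random variables."""

    def __init__(self, w: int) -> None:
        if w <= 0:
            raise ValueError("window length must be positive")
        self.w = w
        self.window: Deque[Tuple[float, float]] = deque()  # (E(X), Var(X)) per tuple
        self.mean_y = 0.0   # E(Y)
        self.var_y = 0.0    # Var(Y), additive under independence

    def push(self, e_x: float, var_x: float) -> None:
        """Add the newest tuple and evict the oldest one once the window is full."""
        self.window.append((e_x, var_x))
        self.mean_y += e_x
        self.var_y += var_x
        if len(self.window) > self.w:
            e_old, var_old = self.window.popleft()
            self.mean_y -= e_old      # expectation update, as in (10)
            self.var_y -= var_old     # variance update used inside (11)

    def moments(self) -> Tuple[float, float, float, float, float]:
        """Return (E(Y), Var(Y), E(Y^2), E(Y^4), Var(Y^2))."""
        mean, var = self.mean_y, self.var_y
        second = var + mean ** 2                                   # E(Y^2)
        fourth = mean ** 4 + 6 * mean ** 2 * var + 3 * var ** 2    # E(Y^4), Gaussian formula
        return mean, var, second, fourth, fourth - second ** 2     # Var(Y^2)
```

Because the running sums are updated by subtraction, floating-point error can slowly accumulate over a long stream; periodically recomputing the sums from the queue is one simple remedy.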
S202: Perform the first filtering, deciding the query result from the first moment $E(Y)$ and first-order variance $\mathrm{Var}(Y)$ of the sum of all random variables in the sliding window according to their magnitude relations with the score threshold and the probability threshold;
S202 specifically includes the following steps:
S2021: If $\tau > E(Y)$ and $\delta > 0.5$, the query result is 'no'; output it and return to S101 of the initialization stage.
Because $\tau > E(Y)$, $\Pr(Y > \tau) \le \Pr(Y \ge E(Y))$. Taking $\Pr(Y \ge E(Y)) = 0.5$, which holds when the distribution of $Y$ is symmetric about its mean (e.g., approximately Gaussian), gives $\Pr(Y > \tau) \le 0.5 < \delta$. The inequality $\Pr(Y > \tau) > \delta$ therefore cannot hold, and the query result is 'no'.
S2022: If $\tau > E(Y)$ and $\delta \le 0.5$, check the condition
$$\frac{\mathrm{Var}(Y)}{\mathrm{Var}(Y) + (\tau - E(Y))^2} \le \delta.$$
If it holds, the query result is 'no'; output it and return to S101 of the initialization stage.
By the one-sided Chebyshev (Cantelli) inequality,
$$\Pr(Y > \tau) \le \Pr\big(Y - E(Y) \ge \tau - E(Y)\big) \le \frac{\mathrm{Var}(Y)}{\mathrm{Var}(Y) + (\tau - E(Y))^2}.$$
Hence, when the condition holds, $\Pr(Y > \tau) \le \delta$, so $\Pr(Y > \tau) > \delta$ does not hold and the query result is 'no'.
S2023: If $\tau \le E(Y)$ and $\delta < 0.5$, the query result is 'yes'; output it and return to S101 of the initialization stage.
Under the same symmetry assumption as in S2021, $\Pr(Y > \tau) \ge \Pr(Y > E(Y)) = 0.5 > \delta$, so the inequality $\Pr(Y > \tau) > \delta$ holds.
S2024: If $\tau \le E(Y)$ and $\delta \ge 0.5$, check the condition
$$\frac{(E(Y) - \tau)^2}{\mathrm{Var}(Y) + (E(Y) - \tau)^2} > \delta.$$
If it holds, the query result is 'yes'; output it and return to S101 of the initialization stage.
By the one-sided Chebyshev (Cantelli) inequality,
$$\Pr(Y > \tau) = 1 - \Pr\big(Y - E(Y) \le -(E(Y) - \tau)\big) \ge \frac{(E(Y) - \tau)^2}{\mathrm{Var}(Y) + (E(Y) - \tau)^2}.$$
Hence, when the condition holds, $\Pr(Y > \tau) > \delta$ holds.
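The first filter (S2021 to S2024) can be sketched as a function returning `True` ('yes'), `False` ('no') or `None` (undecided, continue to S203). It assumes that the conditions shown only as formula images in the original S2022 and S2024 are Cantelli's one-sided Chebyshev bounds and, like S2021 and S2023, that $\Pr(Y \ge E(Y)) = 0.5$; `moment_filter` is a hypothetical name.

```python
from typing import Optional


def moment_filter(mean: float, var: float, thr: float, delta: float) -> Optional[bool]:
    """Try to decide whether Pr(Z > thr) > delta from E(Z) and Var(Z) only.

    Returns True ('yes'), False ('no') or None (undecided).
    """
    if thr > mean:
        if delta > 0.5:
            return False                            # S2021: Pr(Z > thr) <= 0.5 < delta
        gap = thr - mean
        upper = var / (var + gap * gap)             # Cantelli upper bound on Pr(Z > thr)
        return False if upper <= delta else None    # S2022
    if delta < 0.5:
        return True                                 # S2023: Pr(Z > thr) >= 0.5 > delta
    gap = mean - thr
    if gap == 0:
        return None                                 # the Cantelli lower bound is 0 here
    lower = gap * gap / (var + gap * gap)           # Cantelli lower bound on Pr(Z > thr)
    return True if lower > delta else None          # S2024


print(moment_filter(0.0, 1.0, 3.0, 0.2))  # False: Cantelli bound 0.1 <= 0.2
print(moment_filter(0.0, 1.0, 1.0, 0.2))  # None: bound 0.5 > 0.2, undecided
```

Because the function only needs a mean, a variance and a threshold, the same logic applies to the second filter when given $E(Y^2)$, $\mathrm{Var}(Y^2)$ and $\tau^2$.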
S203: If the query result still cannot be output, perform the second filtering, deciding the query result from the second moment $E(Y^2)$ and second-order variance $\mathrm{Var}(Y^2)$ of the sum of all random variables in the sliding window according to their magnitude relations with the score threshold and the probability threshold;
S203 specifically includes the following steps:
S2031: If $\tau^2 > E(Y^2)$ and $\delta > 0.5$, the query result is 'no'; output it and return to S101 of the initialization stage;
S2032: If $\tau^2 > E(Y^2)$ and $\delta \le 0.5$, check the condition
$$\frac{\mathrm{Var}(Y^2)}{\mathrm{Var}(Y^2) + (\tau^2 - E(Y^2))^2} \le \delta.$$
If it holds, the query result is 'no'; output it and return to S101 of the initialization stage.
Since $\tau > 0$, $\Pr(Y > \tau)$ is equivalently converted into $\Pr(Y^2 > \tau^2)$; the equivalence requires $Y$ to take only non-negative values. By the one-sided Chebyshev (Cantelli) inequality,
$$\Pr(Y^2 > \tau^2) \le \frac{\mathrm{Var}(Y^2)}{\mathrm{Var}(Y^2) + (\tau^2 - E(Y^2))^2}.$$
Hence, when the condition holds, $\Pr(Y > \tau) > \delta$ does not hold.
S2033: If $\tau^2 \le E(Y^2)$ and $\delta < 0.5$, the query result is 'yes'; output it and return to S101 of the initialization stage;
S2034: If $\tau^2 \le E(Y^2)$ and $\delta \ge 0.5$, check the condition
$$\frac{(E(Y^2) - \tau^2)^2}{\mathrm{Var}(Y^2) + (E(Y^2) - \tau^2)^2} > \delta.$$
If it holds, the query result is 'yes'; output it and return to S101 of the initialization stage.
By the one-sided Chebyshev (Cantelli) inequality,
$$\Pr(Y^2 > \tau^2) \ge \frac{(E(Y^2) - \tau^2)^2}{\mathrm{Var}(Y^2) + (E(Y^2) - \tau^2)^2}.$$
Hence, when the condition holds, $\Pr(Y > \tau) > \delta$ holds.
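Putting both filters together, the sketch below runs the first filter on $Y$ and, if it is undecided, the second filter on $Y^2$ with threshold $\tau^2$. It assumes $\tau > 0$ and non-negative $Y$ (so that $\Pr(Y>\tau)=\Pr(Y^2>\tau^2)$), Cantelli bounds for the image-only conditions, and the Gaussian formula for $E(Y^4)$; `decide` and `two_stage_filter` are hypothetical names.

```python
from typing import Optional


def decide(mean: float, var: float, thr: float, delta: float) -> Optional[bool]:
    """Median rule plus Cantelli bound: True ('yes'), False ('no') or None (undecided)."""
    gap = thr - mean
    if gap > 0:
        # median rule (S2021/S2031) or Cantelli upper bound <= delta (S2022/S2032)
        return False if delta > 0.5 or var / (var + gap * gap) <= delta else None
    if delta < 0.5:
        return True                                    # S2023/S2033
    if gap < 0 and gap * gap / (var + gap * gap) > delta:
        return True                                    # Cantelli lower bound > delta (S2024/S2034)
    return None


def two_stage_filter(mean_y: float, var_y: float, tau: float, delta: float) -> Optional[bool]:
    """First filter on Y, then second filter on Y^2 against tau^2 (assumes tau > 0, Y >= 0)."""
    first = decide(mean_y, var_y, tau, delta)
    if first is not None:
        return first
    e_y2 = var_y + mean_y ** 2                                     # E(Y^2)
    e_y4 = mean_y ** 4 + 6 * mean_y ** 2 * var_y + 3 * var_y ** 2  # E(Y^4), Gaussian formula
    var_y2 = e_y4 - e_y2 ** 2                                      # Var(Y^2)
    return decide(e_y2, var_y2, tau * tau, delta)


print(two_stage_filter(1.0, 1.0, 1.2, 0.3))  # True (decided by the second filter)
print(two_stage_filter(1.0, 1.0, 2.0, 0.3))  # None (go to the accurate calculation stage)
```

A `None` result means neither filter could decide, so the window proceeds to the accurate calculation stage.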
Accurate calculation phase based on sliding window model
The accurate calculation stage based on the sliding window model converts the random variables in the sliding window into characteristic functions, performs probability summation based on them, compares the resulting probability with the score threshold $\tau$ ($\tau \in \mathbb{R}^+$) and the probability threshold $\delta$ to decide whether the query result is 'yes' or 'no', and outputs the query result. The specific process is as follows:
S301: Express each random variable $X_m$ by its characteristic function;
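The random variable $X_m$ is modeled as a Gaussian mixture model with $k$ Gaussian components having expectations $(\mu_1, \mu_2, \ldots, \mu_k)$, variances $(\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2)$ and corresponding probabilities $(p_1, p_2, \ldots, p_k)$. Its characteristic function is
$$\varphi_{X_m}(t) = \sum_{i=1}^{k} p_i \varphi_i(t) \quad (15)$$
where
$$\varphi_i(t) = \exp\left(\mathrm{i}\,\mu_i t - \tfrac{1}{2}\sigma_i^2 t^2\right)$$
is the characteristic function of the i-th Gaussian component and $\mathrm{i}$ is the imaginary unit.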
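A direct transcription of the Gaussian-mixture characteristic function $\varphi_X(t)=\sum_i p_i\exp(\mathrm{i}\mu_i t-\sigma_i^2t^2/2)$ using Python's `cmath` module (`gmm_char_fn` is a hypothetical name):

```python
import cmath
from typing import Sequence


def gmm_char_fn(t: float, weights: Sequence[float], means: Sequence[float],
                variances: Sequence[float]) -> complex:
    """phi_X(t) = sum_i p_i * exp(i * mu_i * t - sigma_i^2 * t^2 / 2)."""
    return sum(p * cmath.exp(1j * mu * t - 0.5 * var * t * t)
               for p, mu, var in zip(weights, means, variances))
```

Two quick properties follow from the formula: $\varphi_X(0)=\sum_i p_i=1$, and $\varphi_X(-t)$ is the complex conjugate of $\varphi_X(t)$.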
S302: Express the sum $Y$ of the random variables of all uncertain data in the sliding window by its characteristic function;
The sum $Y$ is the sum of the $w$ random variables $(X_1, X_2, \ldots, X_w)$, i.e., $Y = \sum_{m=1}^{w} X_m$. For mutually independent $X_m$, the characteristic function of $Y$ is
$$\varphi_Y(t) = \prod_{m=1}^{w} \varphi_{X_m}(t) \quad (16)$$
As equation (16) shows, for a sum of random variables the characteristic function reduces the computation to a product, whereas working with probability density functions requires repeated convolution integrals, which consume substantial computing resources.
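Expanding the product in (16) for Gaussian-mixture inputs shows that $Y$ is again a Gaussian mixture: each term picks one component from every $X_m$, its weight is the product of the chosen weights, and its mean and variance are the sums of the chosen means and variances. The sketch below (hypothetical `sum_of_gmms`, independence assumed) performs that expansion explicitly.

```python
from itertools import product
from typing import List, Tuple

Component = Tuple[float, float, float]  # (weight, mean, variance)


def sum_of_gmms(mixtures: List[List[Component]]) -> List[Component]:
    """Expand prod_m phi_{X_m}(t) into the Gaussian components of Y = sum_m X_m."""
    result: List[Component] = []
    for combo in product(*mixtures):       # one component chosen per variable
        w, mu, var = 1.0, 0.0, 0.0
        for p, m, v in combo:
            w *= p                         # weights multiply
            mu += m                        # means add
            var += v                       # variances add
        result.append((w, mu, var))
    return result


print(sum_of_gmms([[(0.5, 0.0, 1.0), (0.5, 2.0, 1.0)], [(1.0, 1.0, 0.5)]]))
# [(0.5, 1.0, 1.5), (0.5, 3.0, 1.5)]
```

The number of components grows as $k^w$, so this expansion is only practical for small windows; merging components or pruning negligible weights are possible mitigations, but they are not part of the method as described.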
S303: For the random variables in the current sliding window, incrementally update the characteristic function of their sum based on the old sliding window and its previous summation result.
For the data in the current sliding window $\{t_{j-w+1}, \ldots, t_j\}$, start from the characteristic function $\varphi_{Y_{\text{old}}}(t)$ of the old sliding window $\{t_{j-w}, \ldots, t_{j-1}\}$. Processing the new tuple $t_j$ multiplies in its characteristic function, so the new result can be computed incrementally as
$$\varphi'(t) = \varphi_{Y_{\text{old}}}(t)\,\varphi_{X_j}(t).$$
At the same time, removing the expired tuple $t_{j-w}$ divides out its characteristic function:
$$\varphi_Y(t) = \frac{\varphi'(t)}{\varphi_{X_{j-w}}(t)}.$$
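One way to realize the incremental update is to keep $\varphi_Y(t)$ evaluated on a fixed grid of $t$ values: multiply by the new tuple's characteristic function on arrival and divide by the expired tuple's on expiry. This is a sketch under assumptions (the exact update formulas appear only as images in the source; the grid, the class name `SlidingCharFn` and the helper `char_fn` are hypothetical). Division requires the expired tuple's characteristic function to be non-zero on the grid, which holds for a single Gaussian but not necessarily for every mixture.

```python
import cmath
from collections import deque
from typing import Deque, List, Sequence, Tuple

Component = Tuple[float, float, float]  # (weight, mean, variance)


def char_fn(t: float, mix: Sequence[Component]) -> complex:
    """Characteristic function of a Gaussian mixture at t."""
    return sum(p * cmath.exp(1j * m * t - 0.5 * v * t * t) for p, m, v in mix)


class SlidingCharFn:
    """Keep phi_Y(t) on a fixed t-grid for Y = sum of the last w random variables."""

    def __init__(self, w: int, grid: Sequence[float]) -> None:
        self.w = w
        self.grid = list(grid)
        self.values: List[complex] = [1 + 0j] * len(self.grid)  # phi of an empty sum is 1
        self.window: Deque[Sequence[Component]] = deque()

    def push(self, mix: Sequence[Component]) -> None:
        # Multiply in the new tuple's characteristic function.
        self.values = [val * char_fn(t, mix) for val, t in zip(self.values, self.grid)]
        self.window.append(mix)
        if len(self.window) > self.w:
            old = self.window.popleft()
            # Divide out the expired tuple (assumes its CF is non-zero on the grid).
            self.values = [val / char_fn(t, old) for val, t in zip(self.values, self.grid)]
```

Each arrival costs one characteristic-function evaluation per grid point for the new tuple and one for the expired tuple, independent of the window length $w$.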
S304: From the characteristic function $\varphi_Y(t)$ of the probability sum, compute the probability $\Pr(Y > \tau)$ that $Y$ exceeds the score threshold $\tau$. If $\Pr(Y > \tau) > \delta$, output the query result 'yes'; otherwise output 'no'. The query for the current sliding window then ends, and the method returns to S101 of the initialization stage.
The characteristic function $\varphi_Y(t)$ of the current sliding window can be expressed as a mixture of Gaussian components $\varphi_c(t)$ with probabilities $p_c$, so that
$$\Pr(Y > \tau) = \sum_{c} p_c \left(1 - F_c(\tau)\right),$$
where $F_c(\tau)$ is the cumulative distribution function of Gaussian component $c$ evaluated at $\tau$. This probability is then compared with $\delta$ as in S304.
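Once $Y$ is available as a list of Gaussian components $(p_c, \mu_c, \sigma_c^2)$, the final decision only needs the Gaussian CDF, which can be written with `math.erfc` as $1-F_c(\tau)=\tfrac12\operatorname{erfc}\big((\tau-\mu_c)/\sqrt{2\sigma_c^2}\big)$. The function names `exceed_prob` and `answer` are hypothetical:

```python
import math
from typing import Sequence, Tuple

Component = Tuple[float, float, float]  # (weight, mean, variance)


def exceed_prob(components: Sequence[Component], tau: float) -> float:
    """Pr(Y > tau) = sum_c p_c * (1 - F_c(tau)) for a Gaussian mixture Y."""
    total = 0.0
    for p, mu, var in components:
        z = (tau - mu) / math.sqrt(2.0 * var)
        total += p * 0.5 * math.erfc(z)   # 1 - Phi((tau - mu) / sigma) = erfc(z) / 2
    return total


def answer(components: Sequence[Component], tau: float, delta: float) -> str:
    """Query result of S304: 'yes' if Pr(Y > tau) > delta, otherwise 'no'."""
    return "yes" if exceed_prob(components, tau) > delta else "no"


print(answer([(1.0, 0.0, 1.0)], 0.0, 0.4))  # yes
```

Using `erfc` rather than `1 - erf` keeps precision in the upper tail, where $1-F_c(\tau)$ is small.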
In this probability summation threshold query method for uncertain data streams, modeling uncertain data as Gaussian mixture models makes the method more flexible and efficient; the pruning strategy based on the properties of Gaussian mixture models and probability theory reduces unnecessary computation; and introducing characteristic functions in the accurate calculation stage reduces the complexity of the algorithm, while incremental processing further improves computational efficiency.
The embodiments described above illustrate the technical solutions and advantages of the present invention. It should be understood that they are only the most preferred embodiments and are not intended to limit the invention; any modifications, additions, equivalent replacements and the like made within the scope of the principles of the present invention shall fall within its protection scope.

Claims (7)

1. A probability summation threshold query method for uncertain data streams, comprising the following steps:
(1) dividing the continuous uncertain data into sliding windows and modeling the random variables in each window with Gaussian mixture models, i.e., representing each random variable by a mixture of Gaussian distributions;
(2) performing up to two filtering decisions on the sum of the random variables in the sliding window: a first filtering decision based on its first moment and first-order variance and, if needed, a second filtering decision based on its second moment and second-order variance; whenever a filtering decision yields the query result, outputting the result and returning to step (1); if neither decision yields the query result, proceeding to step (3);
(3) converting the random variables in the sliding window into characteristic functions, performing probability summation based on the characteristic functions, deciding whether the query result is 'yes' or 'no' by comparing the summed probability against the score threshold and the probability threshold, and outputting the query result.
2. The uncertain data stream probability summation threshold query method according to claim 1, wherein in step (1), when the j-th uncertain tuple $t_j$ of the uncertain data stream arrives, it forms a sliding window $\{t_{j-w+1}, \ldots, t_j\}$ of the latest $w$ tuples, where the positive integer $w$ is the window length, and the random variable $X_m$ represents the m-th tuple $t_{j-w+m}$ ($1 \le m \le w$) of this sliding window;
each random variable $X_m$ is modeled with a univariate Gaussian mixture model, so that uncertain data are represented by continuous random variables; the model consists of $k$ Gaussian components with corresponding non-negative probabilities $(p_1, p_2, \ldots, p_k)$, where $\sum_{i=1}^{k} p_i = 1$;
the probability density function of a random variable $X$ modeled in this way is
$$f_X(x) = \sum_{i=1}^{k} p_i f_i(x),$$
where
$$f_i(x) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{(x - \mu_i)^2}{2\sigma_i^2}\right)$$
is the density of the i-th Gaussian component, whose expectation and variance are $\mu_i$ and $\sigma_i^2$, respectively.
3. The uncertain data stream probability summation threshold query method according to claim 1, wherein step (2) specifically comprises:
(2-1) computing the first moment, second moment, first-order variance and second-order variance of the sum of all random variables within the sliding window from the expectations and variances of the random variables;
(2-2) performing the first filtering, deciding the query result from the first moment and first-order variance of the sum of all random variables in the sliding window according to their magnitude relations with the score threshold and the probability threshold;
(2-3) when the query result cannot yet be output, performing the second filtering, deciding the query result from the second moment and second-order variance of that sum according to their magnitude relations with the score threshold and the probability threshold.
4. The uncertain data stream probability summation threshold query method according to claim 3, wherein the step (2-1) specifically comprises the following steps:
(2-1-1) computing the expectation $E(X)$ and variance $\mathrm{Var}(X)$ of each random variable $X_m$;
specifically, from the expectation $\mu_i$ and variance $\sigma_i^2$ of each Gaussian component:
$$E(X) = \sum_{i=1}^{k} p_i \mu_i,$$
$$\mathrm{Var}(X) = \sum_{i=1}^{k} p_i \left(\sigma_i^2 + \mu_i^2\right) - (E(X))^2;$$
(2-1-2) computing the first moment $E(Y)$ and second moment $E(Y^2)$ of the sum of all random variables within the sliding window, $Y = \sum_{m=1}^{w} X_m$;
specifically, from $E(X_m)$ and $\mathrm{Var}(X_m)$, treating the $X_m$ as mutually independent:
$$E(Y) = \sum_{m=1}^{w} E(X_m),$$
$$E(Y^2) = \sum_{m=1}^{w} \mathrm{Var}(X_m) + (E(Y))^2;$$
(2-1-3) computing the variance $\mathrm{Var}(Y)$ of the sum $Y$ of all random variables in the sliding window;
specifically, from the first moment $E(Y)$ and the second moment $E(Y^2)$:
$$\mathrm{Var}(Y) = E(Y^2) - (E(Y))^2;$$
(2-1-4) computing the fourth moment $E(Y^4)$ and the second-order variance $\mathrm{Var}(Y^2)$ of the sum $Y$ of all random variables in the sliding window;
specifically, from $E(Y)$, $E(Y^2)$ and $\mathrm{Var}(Y)$:
$$E(Y^4) = (E(Y))^4 + 6(E(Y))^2\,\mathrm{Var}(Y) + 3(\mathrm{Var}(Y))^2,$$
$$\mathrm{Var}(Y^2) = E(Y^4) - (E(Y^2))^2;$$
for the new sliding window $\{t_{j-w+2}, \ldots, t_{j+1}\}$, the moments up to fourth order and the two variances of the variable $Y' = X_{j-w+2} + X_{j-w+3} + \cdots + X_{j+1}$ are computed as follows:
$$E(Y') = E(Y) - E(X_{j-w+1}) + E(X_{j+1}),$$
$$E(Y'^2) = E(Y^2) - (E(Y))^2 - \mathrm{Var}(X_{j-w+1}) + \mathrm{Var}(X_{j+1}) + (E(Y'))^2,$$
$$\mathrm{Var}(Y') = E(Y'^2) - (E(Y'))^2,$$
$$E(Y'^4) = (E(Y'))^4 + 6(E(Y'))^2\,\mathrm{Var}(Y') + 3(\mathrm{Var}(Y'))^2,$$
$$\mathrm{Var}(Y'^2) = E(Y'^4) - (E(Y'^2))^2.$$
5. The uncertain data stream probability summation threshold query method according to claim 3, wherein step (2-2) specifically comprises the following steps:
(2-2-1) if $\tau > E(Y)$ and $\delta > 0.5$, the query result can be output directly: the result is "no", and the method jumps to step (1);
(2-2-2) if $\tau > E(Y)$ and $\delta \le 0.5$, check the following condition:
Figure FDA0002271569460000041
if it is satisfied, the query result can be output: the result is "no", and the method jumps to step (1);
(2-2-3) if $\tau \le E(Y)$ and $\delta < 0.5$, the query result can be output directly: the result is "yes", and the method jumps to step (1);
(2-2-4) if $\tau \le E(Y)$ and $\delta \ge 0.5$, check the following condition:
Figure FDA0002271569460000042
if it is satisfied, the query result can be output: the result is "yes", and the method jumps to step (1).
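The decision rules of step (2-2) can be sketched as follows. The exact conditions in (2-2-2) and (2-2-4) survive only as formula images, so this sketch ASSUMES Cantelli's one-sided inequality, $\Pr(Y - \mu \ge a) \le \sigma^2/(\sigma^2 + a^2)$ for $a > 0$, as the bound; the patent's actual condition may differ. The function name and the `None` ("undecided, continue with later steps") convention are hypothetical.

```python
from typing import Optional


def prune_first_order(e_y: float, var_y: float, tau: float, delta: float) -> Optional[bool]:
    """Return True ('yes'), False ('no') or None (no decision; continue with later steps)."""
    if tau > e_y:
        if delta > 0.5:
            return False                              # (2-2-1): decided directly
        # (2-2-2), ASSUMED condition: Cantelli bound Pr(Y > tau) <= upper.
        upper = var_y / (var_y + (tau - e_y) ** 2)
        return False if upper <= delta else None
    if delta < 0.5:
        return True                                   # (2-2-3): decided directly
    # (2-2-4), ASSUMED condition: Cantelli bound Pr(Y > tau) >= lower.
    gap = e_y - tau
    if gap == 0:
        return None                                   # the bound is vacuous at tau == E(Y)
    lower = gap ** 2 / (var_y + gap ** 2)
    return True if lower > delta else None
```

The direct rules (2-2-1) and (2-2-3) are taken from the claim as stated; they hold when the distribution of $Y$ has its median at $E(Y)$, as a normal approximation does.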
6. The uncertain data stream probability summation threshold query method according to claim 3, wherein step (2-3) specifically comprises the following steps:
(2-3-1) if $\tau^2 > E(Y^2)$ and $\delta > 0.5$, the query result can be output directly: the result is "no", and the method jumps to step (1);
(2-3-2) if $\tau^2 > E(Y^2)$ and $\delta \le 0.5$, check the following condition:
Figure FDA0002271569460000043
Figure FDA0002271569460000044
if it is satisfied, the query result can be output: the result is "no", and the method jumps to step (1);
(2-3-3) if $\tau^2 \le E(Y^2)$ and $\delta < 0.5$, the query result can be output directly: the result is "yes", and the method jumps to step (1);
(2-3-4) if $\tau^2 \le E(Y^2)$ and $\delta \ge 0.5$, check the following condition:
Figure FDA0002271569460000051
if it is satisfied, the query result can be output: the result is "yes", and the method jumps to step (1).
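Step (2-3) applies the same pattern to $Z = Y^2$ with threshold $\tau^2$, using $E(Y^2)$ and $\mathrm{Var}(Y^2)$. The sketch below ASSUMES $Y \ge 0$ and $\tau \ge 0$, so that $Y > \tau$ exactly when $Y^2 > \tau^2$, and again substitutes Cantelli's inequality for the conditions that appear only as formula images. Both function names are hypothetical.

```python
from typing import Optional


def cantelli_decide(mean: float, var: float, thr: float, delta: float) -> Optional[bool]:
    """Decide Pr(Z > thr) > delta from Z's mean and variance.

    Returns True ('yes'), False ('no') or None (undecided).
    """
    if thr > mean:
        if delta > 0.5:
            return False                          # direct rule, as in (2-3-1)
        upper = var / (var + (thr - mean) ** 2)   # ASSUMED: Cantelli, Pr(Z > thr) <= upper
        return False if upper <= delta else None
    if delta < 0.5:
        return True                               # direct rule, as in (2-3-3)
    gap = mean - thr
    if gap == 0:
        return None
    lower = gap ** 2 / (var + gap ** 2)           # ASSUMED: Cantelli, Pr(Z > thr) >= lower
    return True if lower > delta else None


def prune_second_order(e_y2: float, var_y2: float, tau: float, delta: float) -> Optional[bool]:
    """Step (2-3) sketch: apply the rules to Z = Y^2 with threshold tau^2."""
    if tau < 0:
        raise ValueError("this sketch assumes tau >= 0 (and Y >= 0)")
    return cantelli_decide(e_y2, var_y2, tau ** 2, delta)
```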
7. The uncertain data stream probability summation threshold query method according to claim 1, wherein the specific process of step (3) is as follows:
(3-1) expressing each random variable $X_m$ by its characteristic function;
The random variable $X_m$ is modeled as a Gaussian mixture model consisting of $k$ Gaussian components with expectations $(\mu_1, \mu_2, \dots, \mu_k)$, variances $(\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2)$ and corresponding probabilities $(p_1, p_2, \dots, p_k)$; the characteristic function of $X_m$ is then expressed as follows:
$\varphi_{X_m}(t) = \sum_{l=1}^{k} p_l \exp\left(\mathrm{i}\mu_l t - \tfrac{1}{2}\sigma_l^2 t^2\right)$
wherein,
Figure FDA0002271569460000054
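The characteristic function of a Gaussian mixture is the probability-weighted sum of the component characteristic functions $\exp(\mathrm{i}\mu_l t - \sigma_l^2 t^2/2)$. A minimal Python sketch, with a hypothetical function name and the component parameters passed as parallel sequences:

```python
import cmath
from typing import Sequence


def gmm_char_fn(t: float, mus: Sequence[float], sigma2s: Sequence[float],
                ps: Sequence[float]) -> complex:
    """phi_{X_m}(t) = sum_l p_l * exp(i*mu_l*t - sigma_l^2 * t^2 / 2)."""
    return sum((p * cmath.exp(1j * mu * t - 0.5 * s2 * t * t)
                for mu, s2, p in zip(mus, sigma2s, ps)), 0j)
```

Because the probabilities $p_l$ sum to 1, $\varphi_{X_m}(0) = 1$, which is a quick sanity check on fitted mixture parameters.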
(3-2) expressing the sum $Y$ of the random variables of all uncertain data in the sliding window by its characteristic function;
The sum $Y$ is the sum of the $w$ random variables $(X_1, X_2, \dots, X_w)$, i.e. $Y = \sum_{m=1}^{w} X_m$; the characteristic function $\varphi_Y(t)$ of $Y$ is then expressed as follows:
$\varphi_Y(t) = \prod_{m=1}^{w} \varphi_{X_m}(t)$
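For independent random variables, the characteristic function of a sum is the product of the individual characteristic functions. The sketch below evaluates $\varphi_Y(t)$ for a window of Gaussian mixtures; it assumes independence, and the `Gmm` alias and function names are hypothetical.

```python
import cmath
from typing import List, Sequence, Tuple

# One GMM per random variable: a sequence of (mu_l, sigma_l^2, p_l) components.
Gmm = Sequence[Tuple[float, float, float]]


def gmm_char_fn(t: float, gmm: Gmm) -> complex:
    """Characteristic function of one Gaussian mixture at t."""
    return sum((p * cmath.exp(1j * mu * t - 0.5 * s2 * t * t) for mu, s2, p in gmm), 0j)


def sum_char_fn(t: float, window: List[Gmm]) -> complex:
    """phi_Y(t) = prod_m phi_{X_m}(t), valid when the X_m are independent."""
    result = 1 + 0j
    for gmm in window:
        result *= gmm_char_fn(t, gmm)
    return result
```

For example, the sum of independent $N(1, 2)$ and $N(3, 1)$ variables is $N(4, 3)$, so `sum_char_fn` should match $\exp(4\mathrm{i}t - 1.5t^2)$.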
(3-3) incrementally updating the characteristic function value of the sum of the random variables within the current sliding window, based on the old sliding window and the old summation result:
Figure FDA0002271569460000058
For the data within the current sliding window
Figure FDA0002271569460000059
based on the old sliding window
Figure FDA00022715694600000510
and its characteristic function
Figure FDA00022715694600000511
when a new tuple $t_j$ is processed, the new result
Figure FDA00022715694600000512
can be calculated incrementally as follows:
Figure FDA00022715694600000513
At the same time, when the old tuple $t_{j-w}$ is removed, the new result
Figure FDA0002271569460000061
can be calculated incrementally as follows:
Figure FDA0002271569460000062
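Since $\varphi_Y$ is a product over the window, one natural incremental scheme multiplies in the characteristic function of the arriving tuple and divides out that of the evicted tuple. The incremental formulas above are only given as images, so this multiply/divide scheme is an ASSUMPTION of the sketch. It keeps $\varphi_Y(t)$ on a fixed grid of $t$ values; the class and helper names are hypothetical.

```python
import cmath
from collections import deque
from typing import Deque, List, Sequence, Tuple

Gmm = Sequence[Tuple[float, float, float]]  # (mu_l, sigma_l^2, p_l) components


def gmm_cf(t: float, gmm: Gmm) -> complex:
    return sum((p * cmath.exp(1j * mu * t - 0.5 * s2 * t * t) for mu, s2, p in gmm), 0j)


class SlidingCharFn:
    """Hypothetical helper: phi_Y(t) on a fixed t-grid for the last w tuples."""

    def __init__(self, w: int, ts: Sequence[float]) -> None:
        self.w = w
        self.ts: List[float] = list(ts)
        self.window: Deque[Gmm] = deque()
        self.phi: List[complex] = [1 + 0j for _ in self.ts]  # empty sum: phi = 1

    def push(self, gmm: Gmm) -> None:
        self.window.append(gmm)
        # New tuple t_j: multiply in its characteristic function.
        self.phi = [f * gmm_cf(t, gmm) for f, t in zip(self.phi, self.ts)]
        if len(self.window) > self.w:
            old = self.window.popleft()
            # Evicted tuple t_{j-w}: divide out its characteristic function.
            self.phi = [f / gmm_cf(t, old) for f, t in zip(self.phi, self.ts)]
```

Division is only safe where the evicted tuple's characteristic function stays well away from zero; mixture characteristic functions can vanish and Gaussian factors underflow at large $t$, so a practical version may need to recompute the product from scratch periodically.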
S304: according to the characteristic function of the probabilistic sum
Figure FDA0002271569460000063
calculating the probability $\Pr(Y > \tau)$ that the sum exceeds the threshold $\tau$; if $\Pr(Y > \tau) > \delta$, the query result is output as "yes", otherwise as "no". The query process for the current sliding window is then finished, and the method jumps to step (1).
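The claim does not say here how $\Pr(Y > \tau)$ is recovered from the characteristic function. As one possibility, this sketch swaps in the Gil-Pelaez inversion formula, $\Pr(Y > \tau) = \tfrac{1}{2} + \tfrac{1}{\pi}\int_0^\infty \mathrm{Im}\left[e^{-\mathrm{i}t\tau}\varphi_Y(t)\right]/t \, dt$, evaluated with the midpoint rule on a truncated range. The truncation point `t_max`, the step count `n` and both function names are assumptions of the sketch.

```python
import cmath
import math
from typing import Callable


def tail_prob_from_cf(phi: Callable[[float], complex], tau: float,
                      t_max: float = 40.0, n: int = 20000) -> float:
    """Pr(Y > tau) via Gil-Pelaez inversion with midpoint-rule integration."""
    h = t_max / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h  # midpoints never hit the removable singularity at t = 0
        total += (cmath.exp(-1j * t * tau) * phi(t)).imag / t
    return 0.5 + total * h / math.pi


def answer_query(phi: Callable[[float], complex], tau: float, delta: float) -> bool:
    """'yes' (True) iff Pr(Y > tau) > delta."""
    return tail_prob_from_cf(phi, tau) > delta
```

The default `t_max` suits sums whose variance is around 1 or larger; when $\mathrm{Var}(Y)$ is small, $\varphi_Y(t)$ decays slowly, so `t_max` (and `n`) must grow accordingly.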

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911106844.3A CN111026784B (en) 2019-11-13 2019-11-13 Uncertain data stream probability summation threshold query method

Publications (2)

Publication Number Publication Date
CN111026784A (en) 2020-04-17
CN111026784B (en) 2022-05-03

Family

ID=70205457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911106844.3A Active CN111026784B (en) 2019-11-13 2019-11-13 Uncertain data stream probability summation threshold query method

Country Status (1)

Country Link
CN (1) CN111026784B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593435A (en) * 2013-11-12 2014-02-19 河海大学 Approximate treatment system and method for uncertain data PT-TopK query
CN104809185A (en) * 2015-04-20 2015-07-29 西北工业大学 Closed item set mining method facing uncertain data
CN110362600A (en) * 2019-07-22 2019-10-22 广西大学 A kind of random ordering data flow distribution aggregate query method, system and medium

Non-Patent Citations (2)

Title
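卢印举 et al.: "An efficient Top-K query algorithm for uncertain data streams", 《科学技术与工程》 (Science Technology and Engineering) *
陈东辉 et al.: "Approximate algorithms for aggregate queries on uncertain data", 《清华大学学报 (自然科学版)》 (Journal of Tsinghua University (Science and Technology)) *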

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN114253265A (en) * 2021-12-17 2022-03-29 成都朴为科技有限公司 On-time arrival probability maximum path planning algorithm and system based on fourth-order moment
CN114253265B (en) * 2021-12-17 2023-10-20 成都朴为科技有限公司 On-time arrival probability maximum path planning algorithm and system based on fourth-order moment

Also Published As

Publication number Publication date
CN111026784B (en) 2022-05-03

Similar Documents

Publication Publication Date Title
Li et al. Scalable gradients and variational inference for stochastic differential equations
Frühwirth-Schnatter et al. Auxiliary mixture sampling for parameter-driven models of time series of counts with applications to state space modelling
Lv et al. A unified approach to model selection and sparse recovery using regularized least squares
Dempster et al. Maximum likelihood from incomplete data via the EM algorithm
Krishnamurthy Bayesian sequential detection with phase-distributed change time and nonlinear penalty—A POMDP lattice programming approach
Liu et al. Belief propagation for structured decision making
CN109447261B (en) Network representation learning method based on multi-order proximity similarity
Zhang et al. Asymptotically efficient recursive identification of FIR systems with binary-valued observations
CN111025914B (en) Neural network system remote state estimation method and device based on communication limitation
CN111026784B (en) Uncertain data stream probability summation threshold query method
Yang et al. A feasible sequential linear equation method for inequality constrained optimization
Yip A martingale estimating equation for a capture-recapture experiment in discrete time
Stanković et al. Nonlinear robustified stochastic consensus seeking
Kaul et al. Detection and estimation of parameters in high dimensional multiple change point regression models via $\ell_1/\ell_0 $ regularization and discrete optimization
Qi et al. A smoothing Newton method for minimizing a sum of Euclidean norms
CN110794676A (en) CSTR process nonlinear control method based on Hammerstein-Wiener model
Ma et al. State estimation of nonlinear time-varying complex networks with time-varying sensor delay for unknown noise distributions
Yin et al. Tracking and identification of regime-switching systems using binary sensors
Shao et al. Recovering chaotic properties from small data
Decrouez et al. A class of multifractal processes constructed using an embedded branching process
Lee et al. Learning causal networks via additive faithfulness
Braunstein et al. Loop corrections in spin models through density consistency
Cui On asymptotics of t-type regression estimation in multiple linear model
Zhong et al. An information geometry algorithm for distribution control
Nedich et al. Lyapunov approach to consensus problems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant