CN102609439A - Window-based probability query method for fuzzy data in high-dimensional environment - Google Patents


Info

Publication number
CN102609439A
Authority
CN
China
Prior art keywords
probability
query
window
candidate
compressed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011104371365A
Other languages
Chinese (zh)
Inventor
胡天磊
寿黎但
陈刚
陈珂
马春洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2011-12-23
Filing date
2011-12-23
Publication date
2012-07-25
Application filed by Zhejiang University ZJU
Priority to CN2011104371365A
Publication of CN102609439A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a probabilistic window query method for fuzzy data in a high-dimensional environment. The method compresses the fuzzy region and the probability distribution function of each object using grid partitioning, histograms, and a wavelet transform, and stores all of the compressed information of each object in an index file. At query time, it first uses the compressed information to compute an upper bound on the probability that each object belongs to the query result, then prunes the objects that cannot qualify according to that upper bound to obtain a candidate answer set, and finally uses the uncompressed information of each candidate object to decide whether it is a true query result. The method builds on existing research and implementation results in databases and information retrieval and on the extension and combination of existing compression methods. It provides probabilistic window queries over fuzzy data conveniently and efficiently, does not depend on the dimensionality of the fuzzy data, and aims to give users the best performance.

Description

A probabilistic window query method for fuzzy data in a high-dimensional environment
Technical Field
The present invention relates to database systems, information retrieval, and the compression and querying of high-dimensional fuzzy data, and in particular to a probabilistic window query method for fuzzy data in a high-dimensional environment.
Background
In a growing number of applications, data is inherently fuzzy, and much of this fuzzy data is high-dimensional. Examples include multidimensional data in sensor databases, urban census data, and image-processing data. In such applications, each object is represented by a fuzzy region and a probability distribution function. The probability distribution function may be continuous or discrete.
In practice, the window query is the most basic and most important query type. It is also often used as a filtering mechanism when processing more complex multidimensional queries. A probabilistic window query specifies a query window and a probability threshold, and retrieves from the database all objects whose probability of lying within the query window is greater than the given threshold.
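The query just described can be written compactly. The notation below is introduced here for illustration and is not taken from the patent: $D$ is the database, $W$ the query window, $\tau$ the probability threshold, $R_X$ the fuzzy region of object $X$, and $f_X$ its probability distribution function, assumed continuous.

```latex
\mathrm{PWQ}(W, \tau) = \{\, X \in D \;:\; \Pr[X \in W] > \tau \,\},
\qquad
\Pr[X \in W] = \int_{R_X \cap W} f_X(x)\, dx
```

For a discrete probability distribution function, the integral becomes a sum of the probabilities of the points of $X$ that fall inside $W$.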
Most existing methods that can process probabilistic window queries are designed for low-dimensional data and cannot maintain good query performance as the dimensionality grows. Conversely, traditional methods that keep good query performance in high dimensions cannot be applied to fuzzy data sets.
It is therefore important to design an index structure and a probabilistic window query processing method that can effectively manage large volumes of high-dimensional data.
Summary of the Invention
The object of the present invention is to provide a probabilistic window query method for fuzzy data in a high-dimensional environment.
The technical solution adopted by the invention comprises the following steps:
1) compress the fuzzy region information of each object using grid partitioning;
2) compress the probability distribution function information of each object using a histogram;
3) compress the histogram obtained in step 2) using a wavelet transform;
4) store all of the compressed information of each object from steps 1) and 3) in an index file;
5) at query time, use the compressed information of each object to compute an upper bound on the probability that the object belongs to the query result;
6) use the probability upper bound of each object to prune unqualified objects, obtaining a candidate answer set;
7) use the uncompressed fuzzy region information and probability distribution function information of each candidate object in the candidate answer set to determine whether the candidate is a true query result.
In step 1), grid partitioning is used to compress the fuzzy region information of each object, so that the fuzzy region of the object is represented by bit values.
In step 2), the histogram method compresses the probability distribution function information of each object into a sequence of probabilities.
In step 3), a wavelet transform is applied to the probability sequence obtained in step 2), and the wavelet coefficients whose absolute value is zero are then deleted from the resulting coefficients.
In step 4), all of the compressed information of each object is stored in the index file, such that objects are stored in the index file in the same order as in the database.
In step 5), the compressed information of each object is used to compute the tightest upper bound on the probability that the object lies within the query window of the probabilistic window query.
In step 6), if the tightest probability upper bound of an object is less than the probability threshold specified by the probabilistic window query, the object is unqualified and is pruned in this step.
In step 7), the uncompressed fuzzy region information and probability distribution function information of each candidate object in the candidate answer set are used to compute the exact probability that the candidate lies within the query window of the probabilistic window query; if the exact probability of a candidate object is greater than the probability threshold specified by the query, the candidate becomes part of the final query result.
The beneficial effects of the invention are as follows:
The invention makes full use of existing research and implementation results in databases and information retrieval. By extending and combining existing compression methods, it provides probabilistic window queries over fuzzy data conveniently and efficiently, does not depend on the dimensionality of the fuzzy data, and aims to give users the best performance. The invention can be used to manage and query various kinds of massive data, such as multidimensional sensor data, urban census data, and image data.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the probabilistic window query method.
Fig. 2 is a schematic diagram of compressing an object's fuzzy region information by grid partitioning.
Fig. 3 is a schematic diagram of compressing an object's probability distribution function information with a histogram.
Detailed Description
The invention is further described below with reference to the accompanying drawings and a specific embodiment.
The implementation process and working principle of the invention are shown in Fig. 1:
1) compress the fuzzy region information of each object using grid partitioning;
2) compress the probability distribution function information of each object using a histogram;
3) compress the histogram obtained in step 2) using a wavelet transform;
4) store all of the compressed information of each object from steps 1) and 3) in an index file;
5) at query time, use the compressed information of each object to compute an upper bound on the probability that the object belongs to the query result;
6) use the probability upper bound of each object to prune unqualified objects, obtaining a candidate answer set;
7) use the uncompressed fuzzy region information and probability distribution function information of each candidate object in the candidate answer set to determine whether the candidate is a true query result.
In step 1), as shown in Fig. 2, each dimension of the high-dimensional space is divided into several intervals, and a bit string of length $b_i$ is used to label each interval in dimension $i$; dimension $i$ is thus divided into $2^{b_i}$ intervals. Let $B$ be the sum of $b_i$ over all dimensions; the whole value domain is then divided into $2^B$ cells, each of which can be labeled with a $B$-bit value. In Fig. 2, for example, each cell is labeled with a 6-bit value. Once the grid has been built, the four vertices of a fuzzy region can be approximated by the bit values of the cells in which they lie.
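The grid encoding can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: each dimension is assumed to be split uniformly, the fuzzy region is assumed to be an axis-aligned box (so its two opposite corners determine all of its vertices), and the names `interval_index`, `cell_code`, and `approximate_region` are hypothetical.

```python
from typing import List, Sequence, Tuple


def interval_index(value: float, lo: float, hi: float, bits: int) -> int:
    """Index of the interval containing `value` when [lo, hi] is split
    uniformly into 2**bits intervals (the uniform split is an assumption)."""
    n = 1 << bits
    idx = int((value - lo) / (hi - lo) * n)
    # Clamp so that value == hi falls into the last interval.
    return min(max(idx, 0), n - 1)


def cell_code(point: Sequence[float],
              domains: Sequence[Tuple[float, float]],
              bits: Sequence[int]) -> str:
    """Concatenate the per-dimension interval labels into one B-bit string,
    where B is the sum of the per-dimension bit lengths."""
    return "".join(
        format(interval_index(v, lo, hi, b), f"0{b}b")
        for v, (lo, hi), b in zip(point, domains, bits)
    )


def approximate_region(lower: Sequence[float],
                       upper: Sequence[float],
                       domains: Sequence[Tuple[float, float]],
                       bits: Sequence[int]) -> Tuple[str, str]:
    """Approximate an axis-aligned fuzzy region by the cell codes of its
    lower and upper corners (a simplification of 'one code per vertex')."""
    return cell_code(lower, domains, bits), cell_code(upper, domains, bits)
```

With two dimensions over [0, 8] and 3 bits each, every cell gets a 6-bit label, matching the 6-bit example of Fig. 2; the region with corners (1.2, 3.7) and (2.5, 5.0) is then approximated by the codes "001011" and "010101".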
In step 2), as shown in Fig. 3, the histogram method is used to compress the probability distribution function information of each object. Given the probability distribution function of an object $X$, the method evenly divides the fuzzy region of $X$ into $H$ buckets in each dimension. A probability sequence $S_X = \{p_0, p_1, \ldots, p_{H-1}\}$ then gives the probability of $X$ in each bucket of the histogram. In Fig. 3, for example, compressing the object's probability distribution function with the histogram method yields the probability sequence {0.07, 0.05, 0, 0.2, 0.1, 0.1, 0.3, 0.18}.
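For a discrete probability distribution function, the bucket probabilities in one dimension can be accumulated as in the sketch below. It assumes the distribution is given as weighted sample values, which the patent does not prescribe; a continuous distribution would instead be integrated over each bucket. The function name `histogram` and the sample data are illustrative only.

```python
from typing import List, Sequence, Tuple


def histogram(samples: Sequence[Tuple[float, float]],
              lo: float, hi: float, num_buckets: int) -> List[float]:
    """Build the probability sequence S_X for one dimension.

    samples     -- (value, probability) pairs of a discrete distribution
    lo, hi      -- extent of the fuzzy region in this dimension
    num_buckets -- H, the number of equal-width buckets
    """
    buckets = [0.0] * num_buckets
    width = (hi - lo) / num_buckets
    for value, prob in samples:
        # A value equal to `hi` is assigned to the last bucket.
        idx = min(int((value - lo) / width), num_buckets - 1)
        buckets[idx] += prob
    return buckets


# Hypothetical samples chosen so that the result matches the Fig. 3 sequence.
fig3_samples = [(0.5, 0.07), (1.5, 0.05), (3.2, 0.1), (3.8, 0.1),
                (4.5, 0.1), (5.5, 0.1), (6.5, 0.3), (7.5, 0.18)]
fig3_sequence = histogram(fig3_samples, 0.0, 8.0, 8)
```

Because every sample lands in exactly one bucket, the bucket probabilities sum to the total probability of the samples, which is 1 for the Fig. 3 sequence.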
In step 3), the probability sequence $S_X$ obtained in step 2) is transformed with the Haar wavelet, yielding a sequence of wavelet coefficients. This sequence contains as many coefficients as $S_X$ contains probabilities. To achieve compression, the method deletes from the wavelet coefficient sequence all coefficients whose absolute value is zero.
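A small sketch of this step follows. It assumes the averaging and differencing form of the Haar transform (pairwise averages and half-differences) and a sequence whose length is a power of two; the patent does not state which normalization it uses. The sparse `(position, value)` representation and the function names are likewise illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def haar_transform(seq: Sequence[float]) -> List[float]:
    """Haar decomposition by repeated pairwise averaging and differencing.
    Returns [overall average, coarsest detail, ..., finest details],
    which has the same length as the input."""
    n = len(seq)
    if n == 0 or n & (n - 1):
        raise ValueError("sequence length must be a power of two")
    details: List[float] = []
    current = [float(v) for v in seq]
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        level = [(current[i] - current[i + 1]) / 2 for i in pairs]
        details = level + details      # coarser levels go in front
        current = averages
    return current + details


def drop_zero_coefficients(coeffs: Sequence[float]) -> List[Tuple[int, float]]:
    """Keep only non-zero coefficients, remembering their positions."""
    return [(i, c) for i, c in enumerate(coeffs) if c != 0.0]


def inverse_haar(sparse: Sequence[Tuple[int, float]], n: int) -> List[float]:
    """Rebuild the length-n sequence from the sparse coefficients."""
    coeffs = [0.0] * n
    for i, c in sparse:
        coeffs[i] = c
    current = coeffs[:1]
    pos = 1
    while pos < n:
        level = coeffs[pos:pos + len(current)]
        pos += len(current)
        current = [v for a, d in zip(current, level) for v in (a + d, a - d)]
    return current
```

For the sequence [2, 2, 4, 4] the transform gives [3, -1, 0, 0], so only two of the four coefficients need to be stored, and `inverse_haar` recovers the original sequence from them. Dropping only zero-valued coefficients loses no information, so the compression gain depends on how many coefficients happen to be zero.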
In step 4), the compressed information of each object forms an independent index entry. The index entries of all objects are then stored in the index file, in the same order in which the objects are stored in the database.
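The patent does not specify an on-disk layout for the index file. The sketch below assumes, purely for illustration, length-prefixed JSON records written in database order; the record fields (`id`, `region`, `coeffs`) and the function names are hypothetical.

```python
import io
import json
import struct
from typing import Any, BinaryIO, Dict, Iterable, Iterator, List


def write_index(entries: Iterable[Dict[str, Any]], out: BinaryIO) -> None:
    """Append one length-prefixed record per object, in the order given
    (assumed to be the storage order of the objects in the database)."""
    for entry in entries:
        payload = json.dumps(entry).encode("utf-8")
        out.write(struct.pack("<I", len(payload)))  # 4-byte little-endian length
        out.write(payload)


def scan_index(inp: BinaryIO) -> Iterator[Dict[str, Any]]:
    """Yield the index entries one by one, in storage order."""
    while True:
        header = inp.read(4)
        if len(header) < 4:
            return
        (size,) = struct.unpack("<I", header)
        yield json.loads(inp.read(size).decode("utf-8"))


def roundtrip(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Write entries to an in-memory file and read them back (demo helper)."""
    buf = io.BytesIO()
    write_index(entries, buf)
    buf.seek(0)
    return list(scan_index(buf))
```

Keeping the entries in database order means the position of an entry in the index file identifies the corresponding object, so a sequential scan of the index can be paired with the stored objects without an extra lookup structure.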
Step 5) scans the index entries in the index file one by one. Using the query window specified by the probabilistic window query and the compressed information in each index entry, the method computes an upper bound on the probability that each object lies within the query window. Specifically, if the approximate fuzzy region of an object does not intersect the query window, the upper bound is zero. Otherwise, the method computes the tightest upper bound on the probability that the object lies within the query window from the compressed information of its probability distribution function.
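The patent does not give the formula for its tightest bound. The sketch below shows one valid, but not necessarily tightest, upper bound under stated assumptions: each dimension has its own histogram over a known extent, the per-dimension bound is the total mass of the buckets that overlap the window, and the overall bound is the minimum over dimensions, since an object inside the window must be inside the window's projection in every dimension. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def dimension_bound(hist: Sequence[float], extent: Interval,
                    window: Interval) -> float:
    """Total mass of the buckets of `hist` that overlap `window`.
    Overlap is tested on closed intervals so the result stays an upper bound."""
    lo, hi = extent
    q_lo, q_hi = window
    width = (hi - lo) / len(hist)
    total = 0.0
    for k, p in enumerate(hist):
        b_lo = lo + k * width
        b_hi = b_lo + width
        if b_lo <= q_hi and q_lo <= b_hi:
            total += p
    return min(total, 1.0)


def upper_bound(hists: Sequence[Sequence[float]],
                extents: Sequence[Interval],
                window: Sequence[Interval]) -> float:
    """Upper bound on the probability that the object lies in `window`.

    hists   -- one histogram per dimension (decompressed from the index entry)
    extents -- per-dimension extent of the approximate fuzzy region
    window  -- per-dimension (low, high) of the query window
    """
    # No intersection in some dimension: the probability is zero.
    for (lo, hi), (q_lo, q_hi) in zip(extents, window):
        if hi < q_lo or q_hi < lo:
            return 0.0
    return min(dimension_bound(h, e, w)
               for h, e, w in zip(hists, extents, window))
```

With the Fig. 3 sequence over the extent [0, 8] and a window of [3.5, 4.5] in that dimension, only the buckets with probabilities 0.2 and 0.1 overlap the window, so the per-dimension bound is about 0.3.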
In step 6), if the probability upper bound of an object is less than the probability threshold specified by the probabilistic window query, the object is unqualified; it is pruned and is not accessed again during the rest of query processing. If the probability upper bound of an object is greater than or equal to the threshold, the object may belong to the query result and is placed in the candidate answer set.
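The pruning rule amounts to a single filter over the upper bounds. The sketch below assumes the bounds have already been computed and are held in a mapping from a hypothetical object identifier to its bound.

```python
from typing import Dict, Hashable, List


def prune(upper_bounds: Dict[Hashable, float], threshold: float) -> List[Hashable]:
    """Return the candidate answer set: every object whose probability
    upper bound is greater than or equal to the threshold. Objects below
    the threshold are dropped and never looked at again."""
    return [obj_id for obj_id, bound in upper_bounds.items()
            if bound >= threshold]
```

Because the bound never underestimates the true probability, pruning can discard only objects that cannot be in the result, so the candidate set may contain false positives but no qualifying object is lost.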
Step 7) uses the uncompressed fuzzy region information and probability distribution function information of each candidate object in the candidate answer set to compute the exact probability that the candidate lies within the query window of the probabilistic window query. If the exact probability of an object is greater than or equal to the probability threshold specified by the query, the object is part of the final query result; if it is less than the threshold, the object is not part of the final query result.
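For a discrete probability distribution function, the refinement step can be sketched as below. It assumes each candidate's uncompressed data is available as a list of weighted points, which is an illustrative representation rather than the patent's storage format; the function names are hypothetical. The comparison uses "greater than or equal to", following the description of step 7) above.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, ...]
Interval = Tuple[float, float]


def exact_probability(samples: Sequence[Tuple[Point, float]],
                      window: Sequence[Interval]) -> float:
    """Sum the probabilities of the points that fall inside the window."""
    return sum(
        prob for point, prob in samples
        if all(q_lo <= x <= q_hi for x, (q_lo, q_hi) in zip(point, window))
    )


def refine(candidates: Dict[Hashable, Sequence[Tuple[Point, float]]],
           window: Sequence[Interval], threshold: float) -> List[Hashable]:
    """Keep the candidates whose exact probability reaches the threshold."""
    return [obj_id for obj_id, samples in candidates.items()
            if exact_probability(samples, window) >= threshold]
```

Only the objects that survive pruning reach this step, so the expensive access to uncompressed data is limited to the candidate answer set.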

Claims (8)

1. A probabilistic window query method for fuzzy data in a high-dimensional environment, characterized in that the method comprises the following steps:
1) compress the fuzzy region information of each object using grid partitioning;
2) compress the probability distribution function information of each object using a histogram;
3) compress the histogram obtained in step 2) using a wavelet transform;
4) store all of the compressed information of each object from steps 1) and 3) in an index file;
5) at query time, use the compressed information of each object to compute an upper bound on the probability that the object belongs to the query result;
6) use the probability upper bound of each object to prune unqualified objects, obtaining a candidate answer set;
7) use the uncompressed fuzzy region information and probability distribution function information of each candidate object in the candidate answer set to determine whether the candidate is a true query result.
2. The probabilistic window query method for fuzzy data in a high-dimensional environment according to claim 1, characterized in that: in step 1), grid partitioning is used to compress the fuzzy region information of each object, so that the fuzzy region of the object is represented by bit values.
3. The probabilistic window query method for fuzzy data in a high-dimensional environment according to claim 1, characterized in that: in step 2), the histogram method is used to compress the probability distribution function information of each object into a sequence of probabilities.
4. The probabilistic window query method for fuzzy data in a high-dimensional environment according to claim 1, characterized in that: in step 3), a wavelet transform is applied to the probability sequence obtained in step 2), and the wavelet coefficients whose absolute value is zero are then deleted from the resulting coefficients.
5. The probabilistic window query method for fuzzy data in a high-dimensional environment according to claim 1, characterized in that: in step 4), all of the compressed information of each object is stored in the index file, such that objects are stored in the index file in the same order as in the database.
6. The probabilistic window query method for fuzzy data in a high-dimensional environment according to claim 1, characterized in that: in step 5), the compressed information of each object is used to compute the tightest upper bound on the probability that the object lies within the query window of the probabilistic window query.
7. The probabilistic window query method for fuzzy data in a high-dimensional environment according to claim 1, characterized in that: in step 6), if the tightest probability upper bound of an object is less than the probability threshold specified by the probabilistic window query, the object is unqualified and is pruned in this step.
8. The probabilistic window query method for fuzzy data in a high-dimensional environment according to claim 1, characterized in that: in step 7), the uncompressed fuzzy region information and probability distribution function information of each candidate object in the candidate answer set are used to compute the exact probability that the candidate lies within the query window of the probabilistic window query; if the exact probability of a candidate object is greater than the probability threshold specified by the query, the candidate becomes part of the final query result.
Application number: CN2011104371365A
Priority date: 2011-12-23
Filing date: 2011-12-23
Title: Window-based probability query method for fuzzy data in high-dimensional environment
Status: Pending
Publication: CN102609439A (en), published 2012-07-25
Family ID: 46526814
Country status: CN (1), CN102609439A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120725
