CN103345491A - Method for quickly obtaining a neighborhood using hash bucket partitioning
- Publication number
- CN103345491A
- Authority
- CN
- China
- Prior art keywords
- neighborhood
- hash
- bucket
- sample
- space
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a fast neighborhood computation method that uses hash bucket partitioning to reduce the search space for neighborhood information particles. The method partitions the set of sample records into buckets by hashing on the distances of the sample records, so that the search space for the neighborhood information particle of any sample record in the set is reduced to three adjacent buckets. On this basis, it is further observed that when the neighborhoods of the sample records are searched iteratively, the symmetry of the neighborhood system allows the search space to be reduced to two buckets. The method is flexible and simple to implement, substantially reduces the number of sample comparisons and the size of the search space, and is well suited to processing large data sets.
Description
Technical field
The invention belongs to the field of information processing, and in particular relates to a method for quickly obtaining neighborhoods that uses the 2-norm distance as the metric and applies hash bucket partitioning to reduce the search space for neighborhood information particles.
Background technology
With the rapid development of information technology and the widespread use of database management systems (DBMS), people record more and more data. A great deal of important information is hidden behind this rapidly growing data, and people want to analyze it at a higher level in order to make better use of it.
T. Y. Lin proposed the concept of the neighborhood model in 1988. He granulates the universe of discourse by means of spatial neighborhoods, interprets the spatial neighborhoods as basic information particles, and then uses these basic information particles to describe other concepts in the universe. Professor Yao Yiyu (1998) and Professor Wu Weizhi (2002) each studied the basic properties of neighborhood operators and neighborhood systems in depth. Yao discussed the relationship between granular computing and data mining tools such as rough sets and quotient spaces, and constructed a logical framework of the granular world by using a logical decision language to describe granularity. Skowron also described a granular language in the literature; he regards the meaning sets of the logical formulas defined on an information table as information, and discusses the syntax and semantics of this information. On the basis of this research, Hu Qinghua introduced the neighborhood model into rough sets, gave a detailed definition of the neighborhood rough set model, and designed reduction algorithms that can simultaneously reduce nominal, numerical, and mixed-type data.
With the explosive growth of data, time efficiency becomes the primary consideration when the neighborhood model is used to process big data. How to reduce the time spent searching for and computing neighborhood information particles is an important problem.
A neighborhood can be defined in two ways. One is determined by the number of objects contained in the neighborhood, as in the classical k-nearest-neighbor method; the other is defined by the maximum distance, under a given metric, from the center of the neighborhood to its boundary. The neighborhood used in the present invention is of the second kind.
Given a nonempty finite set $U=\{x_1, x_2, x_3, \ldots, x_n\}$ in real space, for any object $x_i$ in $U$, its θ-neighborhood is $\theta(x_i)=\{x \in U \mid \Delta(x, x_i) \le \theta\}$, where $\theta \ge 0$. $\theta(x_i)$ is called the θ-neighborhood information particle generated by $x_i$, or the neighborhood particle of $x_i$ for short. In two-dimensional real space, the neighborhoods based on the 1-norm, the 2-norm, and the infinity norm are shown in Fig. 3; they are a rhombus, a circle, and a square, respectively. The metric gives the following properties: (1) $\theta(x_i) \neq \emptyset$, because $x_i \in \theta(x_i)$; (2) $x_j \in \theta(x_i) \rightarrow x_i \in \theta(x_j)$; (3) $\bigcup_{i=1}^{n} \theta(x_i) = U$. The family of neighborhood information particles $\{\theta(x_i) \mid i=1,2,\ldots,n\}$ constitutes a covering of $U$.
The family of neighborhood information particles induces a neighborhood relation $N$ on the universe $U$. This relation can be represented by a relational matrix $M(N)=(r_{ij})_{n \times n}$, where $r_{ij}=1$ if $x_j \in \theta(x_i)$ and $r_{ij}=0$ otherwise.
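The definitions above can be illustrated with a brute-force sketch in Python. It is only a baseline for comparison, not the method of the invention: the function names `theta_neighborhood` and `relation_matrix` and the sample data are hypothetical, and the metric Δ is assumed to be the 2-norm (Euclidean) distance, computed with `math.dist`.

```python
import math

def theta_neighborhood(samples, i, theta):
    """Indices j whose Euclidean distance to samples[i] is at most theta."""
    return [j for j, x in enumerate(samples) if math.dist(x, samples[i]) <= theta]

def relation_matrix(samples, theta):
    """Relational matrix M(N) = (r_ij): r_ij = 1 if x_j is in the theta-neighborhood of x_i."""
    n = len(samples)
    return [[1 if math.dist(samples[i], samples[j]) <= theta else 0
             for j in range(n)]
            for i in range(n)]

# Hypothetical two-dimensional sample records.
samples = [(0.0, 0.0), (0.3, 0.4), (2.0, 2.0)]
M = relation_matrix(samples, theta=0.6)
```

Every sample is compared with every other sample here, which is the quadratic cost that the bucketing scheme is meant to reduce.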
Summary of the invention
The object of the invention is to provide a method for quickly obtaining neighborhoods by hash bucket partitioning, so as to reduce the time spent searching for and computing neighborhood information particles and thereby make the neighborhood model fast enough for processing big data. To this end, the invention adopts the following technical scheme:
The specific steps of the method are as follows:
A method for quickly obtaining neighborhoods by hash bucket partitioning, characterized in that it comprises the following steps:
Given a neighborhood system $NRS=\langle U, N, \theta \rangle$, where $U$ is the set of all sample records, $N$ denotes the neighborhood relation, and $\theta$ is the neighborhood radius;
For every $x \in U$, compute $k=\lceil \Delta(x, x_0)/\theta \rceil$, where $k$ is a nonnegative integer. Build a hash table with $k$ as the hash key. Establish the spherical model of the hash table in space: taking each hash key as the radius of a sphere (measured in units of $\theta$), the hash table is a series of mutually nested spheres in space. The samples under a given hash key lie in the space between the sphere whose radius is that hash key and the sphere whose radius is hash key − 1; the space between adjacent spheres is called a bucket. $B_1, B_2, \ldots, B_b$ are the $b$ buckets obtained by taking the $b$ hash key values as radii.
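The bucket construction can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function name `build_buckets` and the sample data are hypothetical, the distance is the 2-norm computed with `math.dist`, and the key formula k = ceil(Δ(x, x_0)/θ) is an assumption inferred from the statement that the bucket for a hash key lies between the spheres for that key and for key − 1.

```python
import math
from collections import defaultdict

def build_buckets(samples, x0, theta):
    """Hash each sample index to the key k = ceil(distance(sample, x0) / theta).

    Assumption: bucket k is the shell between the spheres of radius (k - 1) * theta
    and k * theta around x0; a sample located exactly at x0 gets key 0.
    """
    buckets = defaultdict(list)
    for idx, x in enumerate(samples):
        k = math.ceil(math.dist(x, x0) / theta)
        buckets[k].append(idx)
    return dict(buckets)

# Hypothetical sample records on a line, with x0 at the origin.
samples = [(0.0, 0.0), (0.5, 0.0), (1.5, 0.0), (3.7, 0.0)]
buckets = build_buckets(samples, x0=(0.0, 0.0), theta=1.0)
# Keys 0, 1, 2 and 4 are present; key 3 is absent, i.e. the buckets are discontinuous.
```

A dictionary keyed by k only stores buckets that actually contain samples, so an empty shell costs nothing at search time.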
For a sample $x$ in bucket $B_k$, search the records in buckets $B_{k-1}$, $B_k$, $B_{k+1}$ to obtain the neighborhood of $x$.
On the basis of the above technical scheme, the invention may also adopt the following further technical schemes:
$x_0$ is taken as the origin, or as the feature vector formed by the minimum values in $N$.
When the sample records in the buckets are searched to obtain neighborhoods, an iterative approach is used, and only the records in buckets $B_k$ and $B_{k+1}$ need to be searched to obtain the neighborhood of sample $x$.
According to the distances of the sample records, the invention partitions the set of sample records into buckets by hashing, so that the search space for the neighborhood information particle of any sample record $x_i$ in the set is reduced to three adjacent buckets $B_{k-1}$, $B_k$, $B_{k+1}$. On this basis, closer observation shows that when the neighborhoods of the sample records are searched iteratively, the symmetry of the neighborhood system allows the search space for neighborhood information particles to be further reduced to the two buckets $B_k$ and $B_{k+1}$.
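A hedged Python sketch of the two-bucket search follows. It assumes the key k = ceil(Δ(x, x_0)/θ) with the 2-norm distance; under that assumption the triangle inequality gives |Δ(x, x_0) − Δ(y, x_0)| ≤ Δ(x, y) ≤ θ for any neighbor y of x, so the keys of x and y differ by at most 1. Processing the buckets in increasing key order and recording each found pair in both directions means that B_{k-1} never has to be revisited. The function name `neighborhoods_two_bucket` and the data are hypothetical.

```python
import math
from collections import defaultdict

def neighborhoods_two_bucket(samples, x0, theta):
    """Compute all theta-neighborhoods while searching only B_k and B_{k+1}."""
    buckets = defaultdict(list)
    for idx, x in enumerate(samples):
        buckets[math.ceil(math.dist(x, x0) / theta)].append(idx)

    neigh = {i: {i} for i in range(len(samples))}  # every sample is its own neighbor
    for k in sorted(buckets):
        current = buckets[k]
        following = buckets.get(k + 1, [])  # empty when the next bucket does not exist
        for pos, i in enumerate(current):
            # Candidates: the rest of B_k plus all of B_{k+1}.
            # Pairs with B_{k-1} were already recorded when B_{k-1} was processed.
            for j in current[pos + 1:] + following:
                if math.dist(samples[i], samples[j]) <= theta:
                    neigh[i].add(j)
                    neigh[j].add(i)  # symmetry of the neighborhood relation
    return neigh

# Hypothetical sample records on a line, with x0 at the origin.
samples = [(0.0, 0.0), (0.5, 0.0), (1.2, 0.0), (3.0, 0.0)]
result = neighborhoods_two_bucket(samples, x0=(0.0, 0.0), theta=1.0)
```

Each unordered pair of samples is tested at most once, which is where the saving over a three-bucket search per sample comes from.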
The method yields different bucketing results for different neighborhood sizes θ. As θ increases, the number of buckets decreases but the number of sample records in each bucket increases. When the buckets are continuous, the reduction in the search space for computing neighborhood information particles becomes smaller and the benefit of bucketing weakens. When the buckets are discrete, adjacent buckets may not exist, so the search space shrinks considerably and the benefit of bucketing in reducing the search space for neighborhood information particles is strengthened. As θ decreases, the number of buckets gradually increases and the number of samples in each bucket decreases, which may reduce the amount of information contained in each neighborhood information particle.
In summary, depending on the neighborhood system $NRS=\langle U, N, \theta \rangle$ at hand, choosing a suitable θ makes full use of the search-space reduction that bucketing brings and reduces the time spent searching for and computing neighborhood information particles.
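The trade-off described above can be observed with a small experiment. The sketch below is illustrative only: it assumes the key k = ceil(Δ(x, x_0)/θ) with the 2-norm distance, uses synthetic uniformly distributed points rather than data from the patent, and the function name `bucket_sizes` is hypothetical.

```python
import math
import random
from collections import Counter

def bucket_sizes(samples, x0, theta):
    """Map each bucket key k = ceil(distance / theta) to its number of samples."""
    return Counter(math.ceil(math.dist(x, x0) / theta) for x in samples)

random.seed(0)
# Synthetic data: 1000 points drawn uniformly from the unit square.
samples = [(random.random(), random.random()) for _ in range(1000)]

for theta in (0.05, 0.2, 0.8):
    sizes = bucket_sizes(samples, (0.0, 0.0), theta)
    # Report theta, the number of non-empty buckets, and the largest bucket size.
    print(theta, len(sizes), max(sizes.values()))
```

Larger θ values give fewer and fuller buckets, so each search touches a larger share of the data; smaller values give many sparse buckets.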
Description of drawings
Fig. 1 is a schematic diagram of the method when the buckets are continuous.
Fig. 2 is a schematic diagram of the method when the buckets are discrete.
Fig. 3 shows the neighborhoods based on the 1-norm, the 2-norm, and the infinity norm in two-dimensional real space.
Embodiment
For a better understanding of the technical scheme of the invention, it is further described below with reference to the drawings and an embodiment.
Given a neighborhood system $NRS=\langle U, N, \theta \rangle$, first determine the coordinate origin $x_0$ of the bucket coordinate system. Taking $N=C \cup D$, the neighborhood system $NRS=\langle U, N, \theta \rangle$ becomes the neighborhood decision system $NRS=\langle U, C \cup D, \theta \rangle$, and $x_0$ is the feature vector formed by the minimum value of each attribute. Let the sample set be $U=\{x_1, x_2, x_3, \ldots, x_n\}$ and the set of sample attributes be $C=\{a_1, a_2, \ldots, a_m\}$, where $a_i$ is the $i$-th attribute of a sample, and let $D$ be the set of decision attributes of the samples. Then the origin is
$$x_0=\left(\min\{x_1(a_1), x_2(a_1), \ldots, x_n(a_1)\},\ \min\{x_1(a_2), x_2(a_2), \ldots, x_n(a_2)\},\ \ldots,\ \min\{x_1(a_m), x_2(a_m), \ldots, x_n(a_m)\}\right)$$
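The choice of origin can be sketched in a few lines of Python. The function name `bucket_origin` and the sample values are hypothetical, and the attributes are assumed to be numeric condition attributes stored as tuples.

```python
def bucket_origin(samples):
    """Attribute-wise minimum over all samples: x0 = (min over a_1, ..., min over a_m)."""
    return tuple(min(column) for column in zip(*samples))

# Hypothetical sample records with m = 3 numeric attributes.
samples = [(2.0, 5.0, 1.0), (3.0, 4.0, 7.0), (2.5, 6.0, 0.5)]
x0 = bucket_origin(samples)  # (2.0, 4.0, 0.5)
```

With this origin every sample has nonnegative coordinates relative to $x_0$, so all samples lie in a single orthant of the nested spheres.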
For every $x \in U$, compute $k=\lceil \Delta(x, x_0)/\theta \rceil$, where $k$ is a nonnegative integer, and build a hash table with $k$ as the hash key. Taking each hash key as the radius of a sphere (measured in units of $\theta$), the hash table is a series of mutually nested spheres in space. The samples under a given hash key lie in the space between the sphere whose radius is that hash key and the sphere whose radius is hash key − 1; the space between adjacent spheres is called a bucket. $B_1, B_2, \ldots, B_b$ are the $b$ buckets obtained by taking the $b$ hash key values as radii. The spatial distribution of part of the records in $U$ is then as shown in Fig. 1 and Fig. 2. Fig. 1 shows a special case in which $k$ is continuous; in practice, for specific data, $k$ may be discontinuous, as shown in Fig. 2.
Search the records in buckets $B_{k-1}$, $B_k$, $B_{k+1}$ to obtain the neighborhood. When the sample records in the buckets are searched to compute neighborhoods, an iterative approach is used; because of the symmetry of the neighborhood system, only the records in buckets $B_k$ and $B_{k+1}$ need to be searched. As shown in Fig. 1, for a sample in the bucket with k=2, only the samples in the k=2 and k=3 buckets need to be searched when computing its neighborhood. As shown in Fig. 2, when the buckets are discontinuous, for a sample in the k=2 bucket the adjacent k=3 bucket is empty, so the search space for the neighborhoods of the samples in that bucket is reduced further.
The time complexity of building the buckets by hashing is $O(n)$, where $n=|U|$. Let the number of sample attributes in the attribute set $C=\{a_1, a_2, \ldots, a_m\}$ be $m$ and the number of buckets built be $b$. When the samples are evenly distributed among the buckets, the complexity of the neighborhood computation is $O(m|U|^2/b)$. As $b$ approaches $|U|$, the complexity of the neighborhood computation approaches $O(m|U|)$.
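One way to arrive at a bound consistent with the limiting case $O(m|U|)$ is the following back-of-the-envelope derivation. It is a sketch under stated assumptions, not a reproduction of the patent's own expression: the samples are spread evenly over the $b$ buckets, each sample is compared against $c$ buckets ($c=3$, or $c=2$ with the symmetric iterative search), and each distance evaluation costs $O(m)$.

```latex
% Assumptions: |U| samples spread evenly over b buckets,
% c buckets searched per sample, O(m) work per distance evaluation.
\begin{align*}
\text{samples per bucket} &= \frac{|U|}{b} \\
\text{comparisons per sample} &\le c \cdot \frac{|U|}{b} \\
T_{\text{neighborhood}} &= |U| \cdot c \cdot \frac{|U|}{b} \cdot O(m)
  = O\!\left(\frac{m\,|U|^{2}}{b}\right) \\
b \to |U| \;&\Rightarrow\; T_{\text{neighborhood}} \to O(m\,|U|)
\end{align*}
```

The constant $c$ disappears in the asymptotic bound, so the two-bucket refinement improves the running time by a constant factor rather than changing its order.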
Claims (3)
1. A method for quickly obtaining neighborhoods by hash bucket partitioning, characterized in that it comprises the following steps:
Step 1: determine the coordinate origin $x_0$ of the bucket coordinate system,
given a neighborhood system $NRS=\langle U, N, \theta \rangle$, where $U$ is the set of all sample records, $N$ denotes the neighborhood relation, and $\theta$ is the neighborhood radius;
Step 2: compute the distances of the samples,
Step 3: according to the sample distances from Step 2, build the search buckets by hashing:
For every $x \in U$, compute $k=\lceil \Delta(x, x_0)/\theta \rceil$, where $k$ is a nonnegative integer. Build a hash table with $k$ as the hash key. Establish the spherical model of the hash table in space: taking each hash key as the radius of a sphere (measured in units of $\theta$), the hash table is a series of mutually nested spheres in space. The samples under a given hash key lie in the space between the sphere whose radius is that hash key and the sphere whose radius is hash key − 1; the space between adjacent spheres is called a bucket. $B_1, B_2, \ldots, B_b$ are the $b$ buckets obtained by taking the $b$ hash key values as radii.
Step 4: obtain the neighborhood:
2. The method for quickly obtaining neighborhoods by hash bucket partitioning as claimed in claim 1, characterized in that $x_0$ is taken as the origin, or as the feature vector formed by the minimum values in $N$.
3. The method for quickly obtaining neighborhoods by hash bucket partitioning as claimed in claim 1, characterized in that when the sample records in the buckets are searched to obtain neighborhoods, an iterative approach is used, and only the records in buckets $B_k$ and $B_{k+1}$ need to be searched to obtain the neighborhood of sample $x$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310261081.6A CN103345491B (en) | 2013-06-26 | 2013-06-26 | A kind of method applying Hash Hash division bucket quickly to obtain neighborhood |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310261081.6A CN103345491B (en) | 2013-06-26 | 2013-06-26 | A kind of method applying Hash Hash division bucket quickly to obtain neighborhood |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103345491A true CN103345491A (en) | 2013-10-09 |
CN103345491B CN103345491B (en) | 2016-11-23 |
Family
ID=49280286
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310261081.6A Active CN103345491B (en) | 2013-06-26 | 2013-06-26 | A kind of method applying Hash Hash division bucket quickly to obtain neighborhood |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103345491B (en) |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103020321A (en) * | 2013-01-11 | 2013-04-03 | 广东图图搜网络科技有限公司 | Neighbor searching method and neighbor searching system |
Non-Patent Citations (1)
Title |
---|
Liu Yong et al.: "Research on multidimensional linear hashing on heterogeneous platforms", 《计算机科学》 (Computer Science) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111125892A (en) * | 2019-12-12 | 2020-05-08 | 北京科技大学 | Data storage and indexing method and system for molecular dynamics simulation program |
CN111125892B (en) * | 2019-12-12 | 2021-10-12 | 北京科技大学 | Data storage and indexing method and system for molecular dynamics simulation program |
CN114490011A (en) * | 2020-11-12 | 2022-05-13 | 上海交通大学 | Parallel acceleration implementation method of N-body simulation in heterogeneous architecture |
Also Published As
Publication number | Publication date |
---|---|
CN103345491B (en) | 2016-11-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |