CN103345491B - A method for quickly obtaining neighborhoods by dividing buckets with hashing - Google Patents


Info

Publication number
CN103345491B
Authority
CN
China
Legal status
Active
Application number
CN201310261081.6A
Other languages
Chinese (zh)
Other versions
CN103345491A (en)
Inventor
蒋云良
曾志勇
刘勇
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
2013-06-26
Filing date
2013-06-26
Publication date
2016-11-23
2013-06-26 Application filed by Zhejiang University ZJU
2013-06-26 Priority to CN201310261081.6A
2013-10-09 Publication of CN103345491A
Application granted
2016-11-23 Publication of CN103345491B
Legal status: Active (current)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a fast neighborhood computation method that uses hash-based bucket division to reduce the search space for neighborhood information granules. Based on the distances between sample records, the method uses hashing to divide the set of sample records into buckets, so that the search space for the neighborhood information granule of any sample record in the set is narrowed to three adjacent buckets. Building on this, the inventors further observed that when the neighborhood space of the sample records is searched iteratively, the symmetry of the neighborhood system allows the search space for neighborhood information granules to be narrowed further, to two buckets. The method is simple to implement, greatly reduces the number of sample comparisons and the search space, and is advantageous for processing big data.

Description

A method for quickly obtaining neighborhoods by dividing buckets with hashing
Technical field
The invention belongs to the field of information processing, and in particular relates to a method that uses the 2-norm distance as the metric and divides samples into buckets by hashing (Hash), reducing the search space for neighborhood information granules so that neighborhoods can be obtained quickly.
Background art
With the rapid development and wide application of information technology and database management systems, people record ever more data. Behind this rapidly growing data lies much important information, and people want to analyze it at a higher level in order to make better use of it.
T.Y. Lin proposed the concept of the neighborhood model in 1988. He granulated the universe of discourse using spatial neighborhoods, treated each spatial neighborhood as a basic information granule, and then used these basic information granules to describe other concepts in the universe. Professor Yao Yiyu in 1998 and Professor Wu Weizhi in 2002 each carried out in-depth research on the basic properties of neighborhood operators and neighborhood systems. Yao discussed the relationship between granular computing and data-mining tools such as rough sets and quotient spaces, and built a logical framework for the granular world by describing granularity with a decision logic language. Skowron also presented a language in the literature in which the meaning sets of logical formulas defined on an information table are regarded as information granules, and discussed the syntax and semantics of these granules. Building on these studies, Hu Qinghua introduced the neighborhood model into rough set theory, gave a detailed definition of the neighborhood rough set model, and designed reduction algorithms that can handle nominal, numerical and mixed data.
With the explosive growth of data, time efficiency becomes the overriding concern when the neighborhood model is used to process big data. How to reduce the time spent searching for and computing neighborhood information granules is therefore an important problem.
A neighborhood can be defined in two ways: by the number of objects it contains, as in the classical k-nearest-neighbor method; or by the maximum distance, under some metric, from the center of the neighborhood to its boundary. The neighborhoods in the present invention use the second definition.
Let $U = \{x_1, x_2, x_3, \ldots, x_n\}$ be a nonempty finite set in real space. For any object $x_i \in U$, its $\theta$-neighborhood is $\theta(x_i) = \{x \in U \mid \Delta(x, x_i) \le \theta\}$, where $\theta \ge 0$ and $\Delta$ is the distance metric. $\theta(x_i)$ is called the $\theta$-neighborhood information granule generated by $x_i$, or simply the neighborhood granule of $x_i$. In two-dimensional real space, the neighborhoods based on the 1-norm, the 2-norm and the infinity norm are the diamond, circular and square regions shown in Fig. 3, respectively. These neighborhoods have the following properties: (1) $\theta(x_i) \ne \emptyset$, because $x_i \in \theta(x_i)$; (2) $x_j \in \theta(x_i) \Rightarrow x_i \in \theta(x_j)$; (3) $\bigcup_{i=1}^{n} \theta(x_i) = U$. The family of neighborhood information granules $\{\theta(x_i) \mid i = 1, 2, \ldots, n\}$ therefore constitutes a covering of $U$.
The family of neighborhood information granules induces a neighborhood relation $N$ on the universe $U$. This relation can be represented by a relation matrix $M(N) = (r_{ij})_{n \times n}$, where $r_{ij} = 1$ if $x_j \in \theta(x_i)$ and $r_{ij} = 0$ otherwise.
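As a baseline, the relation matrix $M(N)$ can be computed by brute force, comparing every pair of records. The sketch below assumes the 2-norm as $\Delta$ and records given as tuples of numeric attribute values; `relation_matrix` is a hypothetical helper name, not part of the patent.

```python
import math
from typing import List, Sequence


def relation_matrix(U: List[Sequence[float]], theta: float) -> List[List[int]]:
    """Brute-force M(N) = (r_ij): r_ij = 1 iff x_j lies in theta(x_i).

    Uses the 2-norm as the metric Delta and compares every ordered pair,
    so it costs O(m * n^2) for n records with m attributes.
    """
    n = len(U)
    return [[1 if math.dist(U[i], U[j]) <= theta else 0 for j in range(n)]
            for i in range(n)]


# Illustrative data: two nearby points and one distant point.
U = [(0.0, 0.0), (0.3, 0.3), (2.0, 2.0)]
M = relation_matrix(U, theta=0.5)
print(M)
```

The diagonal is all ones (property 1) and the matrix is symmetric (property 2). The quadratic number of distance evaluations is what the bucketing method is designed to avoid.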
Summary of the invention
The object of the invention is to provide a method for quickly obtaining neighborhoods by dividing buckets with hashing, which reduces the time spent searching for and computing neighborhood information granules and allows the neighborhood model to process big data quickly. To this end, the invention adopts the following technical solution:
The specific steps of the method are as follows.
A method for quickly obtaining neighborhoods by dividing buckets with hashing, characterized in that it comprises the following steps:
Step 1: find the origin $x_0$ of the bucketing coordinate system.
Given a neighborhood system $NRS = \langle U, N, \theta \rangle$, where $U$ is the set of all sample records, $N$ denotes the neighborhood relation, and $\theta$ is the neighborhood radius;
Step 2: compute the sample distances.
For $\forall x_i \in U$, $i = 1, 2, \ldots, n$, compute the distance $\|x_i - x_0\|$;
Step 3: based on the sample distances from Step 2, build the search buckets by hashing.
For $\forall x_i \in U$, $i = 1, 2, \ldots, n$, compute $k = \lceil \|x_i - x_0\| / \theta \rceil$, where $k$ is a nonnegative integer, and build a hash table with $k$ as the hash key. Spatially, the hash table can be modelled as spheres: taking the hash key as the radius of a sphere (in units of $\theta$), the hash table is a series of nested spheres in space. A sample with a given hash key lies in the space between the sphere whose radius is that hash key and the sphere whose radius is the hash key minus 1. The space between adjacent spheres is called a bucket; $B_1, B_2, \ldots, B_b$ are the $b$ buckets obtained by taking $b$ hash key values as radii.
Step 4: obtain the neighborhood.
Search the records in buckets $B_{k-1}$, $B_k$, $B_{k+1}$ to obtain the neighborhood of sample $x$.
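Steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it takes $x_0$ as the vector of per-attribute minima, uses the 2-norm, and assumes the hash key is $k = \lceil \|x_i - x_0\| / \theta \rceil$, which places a record with key $k$ between the spheres of radius $k-1$ and $k$ (in units of $\theta$). The function name `bucket_neighborhoods` is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def bucket_neighborhoods(U: List[Sequence[float]], theta: float) -> List[Set[int]]:
    """Return theta(x_i), as a set of record indices, for every record in U."""
    if theta <= 0:
        raise ValueError("theta must be positive to form buckets")
    m = len(U[0])
    # Step 1: origin x0 = vector of per-attribute minima.
    x0 = [min(x[a] for x in U) for a in range(m)]
    # Steps 2-3: hash each record into bucket k (assumed ceiling key).
    keys = [math.ceil(math.dist(x, x0) / theta) for x in U]
    buckets: Dict[int, List[int]] = defaultdict(list)
    for i, k in enumerate(keys):
        buckets[k].append(i)
    # Step 4: only buckets k-1, k and k+1 can hold neighbors of a key-k record.
    hoods = []
    for i, x in enumerate(U):
        hood = {j
                for kk in (keys[i] - 1, keys[i], keys[i] + 1)
                for j in buckets.get(kk, [])
                if math.dist(x, U[j]) <= theta}
        hoods.append(hood)
    return hoods


U = [(0.0, 0.0), (0.3, 0.3), (2.0, 2.0), (2.2, 2.1), (1.0, 0.2), (5.0, 5.0)]
hoods = bucket_neighborhoods(U, theta=0.5)
print(hoods)
```

The restriction to three buckets rests on the triangle inequality: if two records are within $\theta$ of each other, their distances to $x_0$ differ by at most $\theta$, so their keys differ by at most one.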
On the basis of the above technical solution, the invention may further adopt the following technical features:
$x_0$ is taken as the origin, or as a feature vector composed of the minimum values of the attributes in $N$.
When the buckets are searched to obtain the neighborhoods of sample records, an iterative method is used, and only the records in buckets $B_k$ and $B_{k+1}$ need to be searched to obtain the neighborhood of sample $x$.
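The iterative two-bucket variant can be sketched as follows, under the same assumptions as before (per-attribute minimum origin, 2-norm, assumed ceiling key $k = \lceil \|x_i - x_0\| / \theta \rceil$). Each record in $B_k$ is compared only with the later records of $B_k$ and with the records of $B_{k+1}$; a match is written into both neighborhoods, which is where the symmetry property $x_j \in \theta(x_i) \Rightarrow x_i \in \theta(x_j)$ is used. The function name and the comparison counter are illustrative additions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def iterative_neighborhoods(U: List[Sequence[float]], theta: float) -> Tuple[List[Set[int]], int]:
    """Return (neighborhoods, number of distance evaluations), searching two buckets per record."""
    if theta <= 0:
        raise ValueError("theta must be positive to form buckets")
    m = len(U[0])
    x0 = [min(x[a] for x in U) for a in range(m)]
    buckets: Dict[int, List[int]] = defaultdict(list)
    for i, x in enumerate(U):
        buckets[math.ceil(math.dist(x, x0) / theta)].append(i)

    hoods = [{i} for i in range(len(U))]  # every record lies in its own neighborhood
    comparisons = 0
    for k, members in buckets.items():
        forward = buckets.get(k + 1, [])
        for pos, i in enumerate(members):
            # Pairs with B_{k-1} were already checked when B_{k-1} looked forward to B_k.
            for j in members[pos + 1:] + forward:
                comparisons += 1
                if math.dist(U[i], U[j]) <= theta:
                    hoods[i].add(j)
                    hoods[j].add(i)  # symmetry of the neighborhood relation
    return hoods, comparisons


U = [(0.0, 0.0), (0.3, 0.3), (2.0, 2.0), (2.2, 2.1), (1.0, 0.2), (5.0, 5.0)]
hoods, comparisons = iterative_neighborhoods(U, theta=0.5)
print(hoods, comparisons)
```

For these six records, brute force would need 15 unordered pairwise comparisons; the sketch performs 2.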
According to the distances between sample records, the invention uses hashing to divide the set of sample records into buckets, so that the search space for the neighborhood information granule of any sample record $x_i$ in the set is narrowed to the three adjacent buckets $B_{k-1}$, $B_k$, $B_{k+1}$. Building on this, the inventors further observed that when the neighborhood space of the sample records is searched iteratively, the symmetry of the neighborhood system allows the search space for neighborhood information granules to be narrowed further, to the two buckets $B_k$ and $B_{k+1}$.
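Why three buckets suffice can be made explicit with the reverse triangle inequality. The derivation below assumes the 2-norm, $\theta > 0$, and the ceiling key $k(x) = \lceil \|x - x_0\| / \theta \rceil$:

$$
\begin{aligned}
x \in \theta(x_i) &\;\Longrightarrow\; \|x - x_i\| \le \theta \\
&\;\Longrightarrow\; \big|\,\|x - x_0\| - \|x_i - x_0\|\,\big| \le \|x - x_i\| \le \theta \\
&\;\Longrightarrow\; \left|\frac{\|x - x_0\|}{\theta} - \frac{\|x_i - x_0\|}{\theta}\right| \le 1 \\
&\;\Longrightarrow\; |k(x) - k(x_i)| \le 1, \quad\text{i.e. } x \in B_{k-1} \cup B_k \cup B_{k+1} \text{ where } k = k(x_i).
\end{aligned}
$$

The last step holds because $a \le b \le a + 1$ implies $\lceil a \rceil \le \lceil b \rceil \le \lceil a \rceil + 1$. The two-bucket iteration then follows from symmetry: any neighbor pair spanning $B_{k-1}$ and $B_k$ has already been found while processing $B_{k-1}$, whose forward bucket is $B_k$.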
The method yields different bucketing effects for different neighborhood sizes $\theta$. As $\theta$ grows, the number of buckets decreases but each bucket contains more sample records. When the occupied buckets are contiguous, bucketing shrinks the search space for neighborhood information granules less, so its benefit weakens; when the buckets are discrete, adjacent buckets may be absent, so the search space shrinks considerably and bucketing is more effective at reducing the search space for neighborhood information granules. As $\theta$ decreases, the number of buckets gradually increases and each bucket contains fewer samples, which may cause each neighborhood information granule to contain less information.
Accordingly, by choosing a suitable $\theta$ for the particular neighborhood system $NRS = \langle U, N, \theta \rangle$, the benefit of bucketing in reducing the search space for neighborhood information granules can be fully exploited, reducing the time spent searching for and computing them.
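The effect of $\theta$ on the buckets can be checked directly on a data set. The helper below is a hypothetical illustration (same assumed per-attribute minimum origin and ceiling key as above); it reports how many non-empty buckets there are and how many records the largest one holds.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def bucket_profile(U: List[Sequence[float]], theta: float) -> Tuple[int, int]:
    """Return (number of non-empty buckets, size of the largest bucket) for radius theta."""
    if theta <= 0:
        raise ValueError("theta must be positive to form buckets")
    m = len(U[0])
    x0 = [min(x[a] for x in U) for a in range(m)]
    sizes = Counter(math.ceil(math.dist(x, x0) / theta) for x in U)
    return len(sizes), max(sizes.values())


U = [(0.0, 0.0), (0.3, 0.3), (2.0, 2.0), (2.2, 2.1), (1.0, 0.2), (5.0, 5.0)]
for theta in (0.5, 1.0, 4.0, 8.0):
    print(theta, bucket_profile(U, theta))
```

On this toy data, raising $\theta$ from 0.5 to 8.0 reduces the number of buckets from six to two while the largest bucket grows from one record to five, which is the trade-off described above.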
Brief description of the drawings
Fig. 1 is a schematic diagram of the case in which the buckets in the method of the invention are contiguous.
Fig. 2 is a schematic diagram of the case in which the buckets in the method of the invention are discrete.
Fig. 3 shows neighborhoods based on the 1-norm, the 2-norm and the infinity norm in two-dimensional real space.
Detailed description of the invention
For a better understanding of the technical solution of the invention, it is further described below with reference to the drawings and embodiments.
Step 1: find the origin $x_0$ of the bucketing coordinate system.
Given the neighborhood system $NRS = \langle U, N, \theta \rangle$, find the origin $x_0$ of the bucketing coordinate system. Take $N = C \cup D$; the neighborhood system $NRS = \langle U, N, \theta \rangle$ then becomes the neighborhood decision system $NRS = \langle U, C \cup D, \theta \rangle$, and $x_0$ is taken as the feature vector formed by the minimum value of each attribute. Here the sample set is $U = \{x_1, x_2, x_3, \ldots, x_n\}$, the set of sample attributes is $C = \{a_1, a_2, \ldots, a_m\}$, $a_i$ is the $i$-th attribute of the samples, and $D$ is the set of decision attributes.
The origin is then
$$x_0 = \big(\min\{x_1(a_1), x_2(a_1), \ldots, x_n(a_1)\},\ \min\{x_1(a_2), x_2(a_2), \ldots, x_n(a_2)\},\ \ldots,\ \min\{x_1(a_m), x_2(a_m), \ldots, x_n(a_m)\}\big)$$
Step 2: compute the sample distances.
For $\forall x_i \in U$, $i = 1, 2, \ldots, n$, compute the 2-norm distance $\|x_i - x_0\|$:
$$\|x_i - x_0\| = \sqrt{[x_i(a_1) - x_0(a_1)]^2 + [x_i(a_2) - x_0(a_2)]^2 + \cdots + [x_i(a_m) - x_0(a_m)]^2}$$
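Steps 1 and 2 of this embodiment translate directly into code. In the sketch below, each row of a decision table is assumed (hypothetically) to hold the $m$ attributes $a_1, \ldots, a_m$ of $C$ first, followed by decision attributes from $D$, which are ignored when forming $x_0$ and the distances. The name `origin_and_distances` is illustrative.

```python
import math
from typing import List, Sequence, Tuple


def origin_and_distances(rows: List[Sequence], m: int) -> Tuple[Tuple[float, ...], List[float]]:
    """Return x0 (per-attribute minima over a_1..a_m) and ||x_i - x0|| for every row."""
    cond = [tuple(row[:m]) for row in rows]           # keep a_1..a_m, drop decision attributes D
    x0 = tuple(min(column) for column in zip(*cond))  # Step 1: minimum of each attribute
    dists = [math.dist(x, x0) for x in cond]          # Step 2: 2-norm distance to x0
    return x0, dists


rows = [(1.0, 4.0, "yes"), (3.0, 2.0, "no"), (2.0, 5.0, "yes")]
x0, dists = origin_and_distances(rows, m=2)
print(x0, dists)
```

Note that $x_0$ need not coincide with any sample: here it is $(1.0, 2.0)$, which combines the minimum of the first attribute from one record with the minimum of the second from another.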
Step 3: based on the distances from Step 2, build the buckets by hashing.
For $\forall x_i \in U$, $i = 1, 2, \ldots, n$, compute $k = \lceil \|x_i - x_0\| / \theta \rceil$, where $k$ is a nonnegative integer, and build a hash table with $k$ as the hash key. Taking the hash key as the radius of a sphere (in units of $\theta$), the hash table is a series of nested spheres in space. A sample with a given hash key lies in the space between the sphere whose radius is that hash key and the sphere whose radius is the hash key minus 1; the space between adjacent spheres is called a bucket. $B_1, B_2, \ldots, B_b$ are the $b$ buckets obtained by taking $b$ hash key values as radii. The spatial distribution of some of the records in $U$ is shown in Fig. 1 and Fig. 2. Fig. 1 shows a special case in which the values of $k$ are contiguous; in practice, for specific data, $k$ may be discontinuous, as shown in Fig. 2.
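The hash table of Step 3 maps naturally onto a dictionary from key to record indices. The sketch below assumes the ceiling key $k = \lceil \|x_i - x_0\| / \theta \rceil$; `build_hash_table` and `empty_keys` are hypothetical helpers. `empty_keys` lists the keys between the smallest and largest occupied key that hold no records, which distinguishes the contiguous case of Fig. 1 from the discontinuous case of Fig. 2.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def build_hash_table(U: List[Sequence[float]], x0: Sequence[float], theta: float) -> Dict[int, List[int]]:
    """Map key k to the indices of records with (k-1)*theta < ||x - x0|| <= k*theta."""
    if theta <= 0:
        raise ValueError("theta must be positive to form buckets")
    table: Dict[int, List[int]] = defaultdict(list)
    for i, x in enumerate(U):
        table[math.ceil(math.dist(x, x0) / theta)].append(i)
    return dict(table)


def empty_keys(table: Dict[int, List[int]]) -> List[int]:
    """Keys inside the occupied range that have no records (discontinuous buckets)."""
    lo, hi = min(table), max(table)
    return [k for k in range(lo, hi + 1) if k not in table]


U = [(0.0, 0.0), (0.3, 0.3), (2.0, 2.0), (2.2, 2.1), (1.0, 0.2), (5.0, 5.0)]
table = build_hash_table(U, x0=(0.0, 0.0), theta=0.5)
print(table, empty_keys(table))
```

Only non-empty buckets are stored, so a lookup of an absent neighbor key simply finds nothing to search.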
Step 4: obtain the neighborhood.
Search the records in buckets $B_{k-1}$, $B_k$, $B_{k+1}$ to find the neighborhood. When the neighborhoods of the sample records are computed iteratively over the buckets, the symmetry of the neighborhood system means that only the records in buckets $B_k$ and $B_{k+1}$ need to be searched. As shown in Fig. 1, when solving the neighborhoods of the samples in the bucket with $k = 2$, only the samples in the buckets with $k = 2$ and $k = 3$ need to be searched. As shown in Fig. 2, when the buckets are not contiguous, the adjacent bucket with $k = 3$ is empty, so the solution space for the neighborhoods of the samples in the $k = 2$ bucket is reduced further.
Building the buckets with hashing has time complexity $O(n)$, where $n = |U|$. Let $m$ be the number of sample attributes in $C = \{a_1, a_2, \ldots, a_m\}$ and $b$ the number of buckets created. When the samples are evenly distributed across the buckets, each bucket holds about $|U|/b$ samples, so the complexity of the neighborhood computation is $O(m|U|^2/b)$. As $b$ approaches $|U|$, the complexity of the neighborhood computation approaches $O(m|U|)$.
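The complexity estimate can be checked by counting distance evaluations instead of timing them. The hypothetical helper below (same assumed per-attribute minimum origin and ceiling key) compares brute force, which evaluates all $n^2$ ordered pairs, with the three-bucket search, which evaluates $\sum_k |B_k|\,(|B_{k-1}| + |B_k| + |B_{k+1}|)$ pairs; neither count includes the factor $m$ per distance.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple


def comparison_counts(U: List[Sequence[float]], theta: float) -> Tuple[int, int, int]:
    """Return (brute-force evaluations, three-bucket evaluations, number of buckets b)."""
    if theta <= 0:
        raise ValueError("theta must be positive to form buckets")
    m = len(U[0])
    x0 = [min(x[a] for x in U) for a in range(m)]
    size = Counter(math.ceil(math.dist(x, x0) / theta) for x in U)
    n = len(U)
    bucketed = sum(s * (size.get(k - 1, 0) + s + size.get(k + 1, 0))
                   for k, s in size.items())
    return n * n, bucketed, len(size)


random.seed(0)
U = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(2000)]
print(comparison_counts(U, theta=0.5))
```

Only bucket sizes are needed for the count, so it is cheap to compute for several candidate values of $\theta$ before running the actual neighborhood search.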

Claims (3)

1. A method for quickly obtaining neighborhoods by dividing buckets with hashing, for big-data processing, characterized in that it comprises the following steps:
Step 1: find the origin $x_0$ of the bucketing coordinate system.
In the neighborhood system $NRS = \langle U, N, \theta \rangle$, $U$ is the set of all sample records, $N$ denotes the neighborhood relation, and $\theta$ is the neighborhood radius;
Step 2: compute the sample distances.
For $\forall x_i \in U$, $i = 1, 2, \ldots, n$, where $n$ is the number of samples, compute the distance $\|x_i - x_0\|$;
Step 3: based on the sample distances from Step 2, build the search buckets by hashing.
For $\forall x_i \in U$, $i = 1, 2, \ldots, n$, compute $k = \lceil \|x_i - x_0\| / \theta \rceil$, where $k$ is a nonnegative integer, and build a hash table with $k$ as the hash key. Spatially, the hash table can be modelled as spheres: taking the hash key as the radius of a sphere (in units of $\theta$), the hash table is a series of nested spheres in space. A sample with a given hash key lies in the space between the sphere whose radius is that hash key and the sphere whose radius is the hash key minus 1. The space between adjacent spheres is called a bucket; $B_1, B_2, \ldots, B_b$ are the $b$ buckets obtained by taking $b$ hash key values as radii;
Step 4: obtain the neighborhood.
Search the records in buckets $B_{k-1}$, $B_k$, $B_{k+1}$ to obtain the neighborhood of sample $x$.
2. The method for quickly obtaining neighborhoods by dividing buckets with hashing, for big-data processing, characterized in that $x_0$ is taken as the origin, or as a feature vector composed of the minimum values of the attributes in $N$.
3. The method for quickly obtaining neighborhoods by dividing buckets with hashing, for big-data processing, characterized in that when the buckets are searched to obtain the neighborhoods of sample records, an iterative method is used, and only the records in buckets $B_k$ and $B_{k+1}$ need to be searched to obtain the neighborhood of sample $x$.
CN201310261081.6A 2013-06-26 2013-06-26 A kind of method applying Hash Hash division bucket quickly to obtain neighborhood Active CN103345491B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310261081.6A CN103345491B (en) 2013-06-26 2013-06-26 A kind of method applying Hash Hash division bucket quickly to obtain neighborhood

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310261081.6A CN103345491B (en) 2013-06-26 2013-06-26 A kind of method applying Hash Hash division bucket quickly to obtain neighborhood

Publications (2)

Publication Number Publication Date
CN103345491A CN103345491A (en) 2013-10-09
CN103345491B true CN103345491B (en) 2016-11-23

Family

ID=49280286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310261081.6A Active CN103345491B (en) 2013-06-26 2013-06-26 A kind of method applying Hash Hash division bucket quickly to obtain neighborhood

Country Status (1)

Country Link
CN (1) CN103345491B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125892B (en) * 2019-12-12 2021-10-12 北京科技大学 Data storage and indexing method and system for molecular dynamics simulation program
CN114490011B (en) * 2020-11-12 2024-06-18 上海交通大学 Parallel acceleration realization method of N-body simulation in heterogeneous architecture

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020321A (en) * 2013-01-11 2013-04-03 广东图图搜网络科技有限公司 Neighbor searching method and neighbor searching system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
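Research on multi-dimensional linear hashing on heterogeneous platforms (异构平台上多维线性哈希的研究); Liu Yong (刘勇) et al.; Computer Science (《计算机科学》); 2012-10-31; Vol. 39, No. 10; pp. 157-159, 163 *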

Also Published As

Publication number Publication date
CN103345491A (en) 2013-10-09

Similar Documents

Publication Publication Date Title
CN105912643A (en) Image retrieval method based on content improved Average Hash
CN102819733B (en) Rapid detection fuzzy method of face in street view image
Fan et al. Iterative particle filter for visual tracking
CN108734187A (en) A kind of multiple view spectral clustering based on tensor singular value decomposition
CN107623639A (en) Data flow distribution similarity join method based on EMD distances
Wang et al. Forest fire detection based on lightweight Yolo
Zhao et al. Content based image retrieval scheme using color, texture and shape features
Liu et al. Object proposal on RGB-D images via elastic edge boxes
CN103399863B (en) Image search method based on the poor characteristic bag of edge direction
CN103345491B (en) A kind of method applying Hash Hash division bucket quickly to obtain neighborhood
Wang et al. Toward structural learning and enhanced YOLOv4 network for object detection in optical remote sensing images
Wang et al. Material-aware Cross-channel Interaction Attention (MCIA) for occluded prohibited item detection
CN104933080B (en) A kind of method and device of determining abnormal data
Panigrahi et al. MS-ML-SNYOLOv3: A robust lightweight modification of SqueezeNet based YOLOv3 for pedestrian detection
Guo et al. Lightweight SSD: Real-time Lightweight Single Shot Detector for Mobile Devices.
Wang et al. Calyolov4: lightweight yolov4 target detection based on coordinated attention
Zhang et al. Real-time detector design for small targets based on bi-channel feature fusion mechanism
Chan et al. Rotating object detection in remote-sensing environment
Hou et al. A detection method for the ridge beast based on improved YOLOv3 algorithm
CN114792397A (en) SAR image urban road extraction method, system and storage medium
Mao et al. Mapping Whole DNA Sequence on Variant Maps
Baek et al. Bayesian learning of a search region for pedestrian detection
Liu et al. Performance analysis of different DCNN models in remote sensing image object detection
Wu et al. Dualray: Dual-View X-ray Security Inspection Benchmark and Fusion Detection Framework
Anh Segmentation by incremental clustering

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant