CN103345491B - A kind of method applying Hash Hash division bucket quickly to obtain neighborhood - Google Patents
- Publication number: CN103345491B (application CN201310261081.6A)
- Authority: CN (China)
- Prior art keywords: neighborhood, hash, bucket, sample, space
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a fast neighborhood computation method that uses hash bucketing to reduce the search space for neighborhood information granules. Based on the distances between sample records, the method uses hashing to divide the set of sample records into buckets, so that the search space for the neighborhood information granule of any sample record in the set is narrowed to three adjacent buckets. Building on this, it is further observed that when the neighborhoods of the sample records are searched iteratively, the symmetry of the neighborhood system allows the search space to be narrowed further to two buckets. The method is simple to implement, greatly reduces the number of sample comparisons and the search space, and is well suited to processing big data.
Description
Technical field
The invention belongs to the field of information processing, and in particular relates to a method for quickly obtaining neighborhoods that uses the 2-norm distance as the metric and uses hash bucketing to reduce the search space for neighborhood information granules.
Background art
With the rapid development of information technology and the widespread use of database management systems, people record more and more data. This rapidly growing data contains a great deal of important information and calls for higher-level analysis so that it can be put to better use.
T. Y. Lin proposed the concept of the neighborhood model in 1988. He granulated the universe of discourse using spatial neighborhoods, interpreted a spatial neighborhood as a basic information granule, and then used these basic information granules to describe other concepts in the universe. Professor Yiyu Yao in 1998 and Professor Weizhi Wu in 2002 each studied in depth the basic properties of neighborhood operators and neighborhood systems. Yao discussed the relationship between granular computing and data mining tools such as rough sets and quotient spaces, and built a logical framework for the granular world by using a logical decision language to describe granularity. Skowron also presented a language in the literature: he regarded the meaning sets of logical formulas defined on an information table as information granules and discussed the syntax and semantics of these granules. Building on these studies, Qinghua Hu introduced the neighborhood model into rough set theory, gave a detailed definition of the neighborhood rough set model, and designed a reduction algorithm that can reduce nominal, numerical and mixed data alike.
With the explosive growth of data, time efficiency becomes the primary concern when the neighborhood model is used to process big data. Reducing the time spent searching for and computing neighborhood information granules is therefore an important problem.
A neighborhood can be defined in two ways: by the number of objects it contains, as in the classical k-nearest-neighbor method, or by the maximum distance, under some metric, from the center of the neighborhood to its boundary. The neighborhood in the present invention follows the second definition.
Let $U=\{x_1, x_2, x_3, \ldots, x_n\}$ be a nonempty finite set in real space. For any object $x_i \in U$, its θ-neighborhood is $\theta(x_i)=\{x \in U \mid \Delta(x, x_i) \le \theta\}$, where $\theta \ge 0$ and Δ is the distance metric. $\theta(x_i)$ is called the θ-neighborhood information granule generated by $x_i$, or simply the neighborhood granule of $x_i$. In two-dimensional real space, the neighborhoods based on the 1-norm, the 2-norm and the infinity norm are the diamond, circular and square regions shown in Fig. 3, respectively. The neighborhood granules have the following properties:
(1) $\theta(x_i) \neq \emptyset$, because $x_i \in \theta(x_i)$;
(2) $x_j \in \theta(x_i) \Rightarrow x_i \in \theta(x_j)$;
(3) $\bigcup_{i=1}^{n} \theta(x_i) = U$, so the family of neighborhood granules $\{\theta(x_i) \mid i = 1, 2, \ldots, n\}$ constitutes a covering of U.
The family of neighborhood granules induces a neighborhood relation N on the universe U. This relation can be represented by the relation matrix $M(N)=(r_{ij})_{n\times n}$, where $r_{ij}=1$ if $x_j \in \theta(x_i)$ and $r_{ij}=0$ otherwise.
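For reference, the θ-neighborhood and the relation matrix M(N) defined above can be computed directly by comparing every pair of samples. The sketch below is a minimal brute-force baseline, assuming samples are tuples of numbers; the function names (`minkowski`, `theta_neighborhood`, `relation_matrix`) are illustrative and not part of the source. Its O(n²) pairwise cost is what the bucketing method aims to reduce.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def minkowski(x: Point, y: Point, p: float = 2) -> float:
    """Distance under the p-norm: p=1 (diamond), p=2 (circle), p=math.inf (square)."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == math.inf:
        return max(diffs, default=0.0)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def theta_neighborhood(U: Sequence[Point], i: int, theta: float, p: float = 2) -> List[int]:
    """Indices j with distance(x_j, x_i) <= theta; always includes i itself."""
    return [j for j, x in enumerate(U) if minkowski(x, U[i], p) <= theta]


def relation_matrix(U: Sequence[Point], theta: float, p: float = 2) -> List[List[int]]:
    """M(N) = (r_ij): r_ij = 1 if x_j is in theta(x_i), else 0 (n^2 distance evaluations)."""
    n = len(U)
    return [[1 if minkowski(U[j], U[i], p) <= theta else 0 for j in range(n)]
            for i in range(n)]
```

For example, with `U = [(0.0, 0.0), (0.5, 0.5), (3.0, 3.0)]` and θ = 0.6, sample 1 lies in the infinity-norm neighborhood of sample 0 but not in its 1-norm or 2-norm neighborhood, which matches the nesting of the square, circle and diamond in Fig. 3.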
Summary of the invention
The object of the invention is to provide a method for quickly obtaining neighborhoods by hash bucketing, so as to reduce the time spent searching for and computing neighborhood information granules and make neighborhood-model processing of big data fast. To this end, the invention adopts the following technical solution.
The method of the invention, which quickly obtains neighborhoods by hash bucketing, is characterized in that it comprises the following steps:
Step 1: Determine the origin $x_0$ of the bucketing coordinate system.
The neighborhood system is given as $NRS=\langle U, N, \theta\rangle$, where U is the set of all sample records, N denotes the neighborhood relation, and θ is the neighborhood radius.
Step 2: Compute the sample distances.
For every $x_i \in U$, compute the distance $\|x_i - x_0\|$.
Step 3: Using the sample distances from Step 2, build the search buckets by hashing.
For every $x_i \in U$, compute $k=\lceil \|x_i - x_0\| / \theta \rceil$, where k is a nonnegative integer, and build a hash table with k as the hash key. The hash table can be pictured as a sphere model in space: taking the hash key (in units of θ) as the radius of a sphere, the hash table is a series of nested spheres. A sample with hash key k lies in the region between the sphere of radius $k\theta$ and the sphere of radius $(k-1)\theta$. The region between two adjacent spheres is called a bucket; $B_1, B_2, \ldots, B_b$ are the b buckets obtained by using the b hash-key values as radii.
Step 4: Obtain the neighborhood.
Search the records in buckets $B_{k-1}$, $B_k$ and $B_{k+1}$ to obtain the neighborhood of a sample x whose hash key is k.
On the basis of the above technical solution, the invention may further adopt the following technical features:
$x_0$ is taken as the origin, or as a feature vector composed of the minimum values in N.
When the buckets are searched for sample records to obtain neighborhoods, an iterative method is used, and only the records in buckets $B_k$ and $B_{k+1}$ need to be searched to obtain the neighborhood of sample x.
Based on the distances between sample records, the invention uses hashing to divide the set of sample records into buckets, so that the search space for the neighborhood information granule of any sample record $x_i$ in the set is narrowed to three adjacent buckets $B_{k-1}$, $B_k$ and $B_{k+1}$. Building on this, it is further observed that when the neighborhoods of the sample records are searched iteratively, the symmetry of the neighborhood system allows the search space for the neighborhood information granules to be narrowed further to the two buckets $B_k$ and $B_{k+1}$.
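Why three adjacent buckets suffice follows from the triangle inequality. The source states the result without proof; the short argument below is an added justification, assuming the hash key $k(x)=\lceil\|x-x_0\|/\theta\rceil$ and a distance induced by a norm.

```latex
\begin{aligned}
&d_x = \|x - x_0\|,\quad d_y = \|y - x_0\|,\quad k(x) = \lceil d_x/\theta \rceil \\
&y \in \theta(x) \;\Rightarrow\; |d_x - d_y| \le \|x - y\| \le \theta
  && \text{(triangle inequality)} \\
&\Rightarrow\; d_x - \theta \le d_y \le d_x + \theta
  \;\Rightarrow\; k(x) - 1 \le k(y) \le k(x) + 1 \\
&\Rightarrow\; y \in B_{k(x)-1} \cup B_{k(x)} \cup B_{k(x)+1}
\end{aligned}
```

Because property (2) makes the neighbor relation symmetric, a pair with $k(y)=k(x)-1$ is already found when bucket $B_{k(x)-1}$ is compared with its successor $B_{k(x)}$. Recording the pair for both samples at that moment is what lets the iterative search visit only $B_k$ and $B_{k+1}$.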
The method yields different bucketing effects for different neighborhood sizes θ. As θ grows, the number of buckets decreases while the number of sample records in each bucket increases. When the buckets are contiguous, the reduction of the search space when computing neighborhood information granules becomes smaller, and the benefit of bucketing weakens. When the buckets are discrete, adjacent buckets may not exist, so the search space shrinks substantially and bucketing is more effective at reducing the search space for neighborhood information granules. As θ decreases, the number of buckets gradually increases and each bucket holds fewer samples, which may reduce the amount of information contained in each neighborhood information granule.
Accordingly, for a given neighborhood system $NRS=\langle U, N, \theta\rangle$, choosing a suitable θ makes full use of the search-space reduction that bucketing provides and reduces the time spent searching for and computing neighborhood information granules.
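To see this trade-off concretely, one can count how many candidate pairs the two-bucket search would examine for a given θ before running it. The helper below is a hypothetical sketch, not part of the source; it assumes the distances $\|x_i - x_0\|$ are already computed and uses the key $k=\lceil\|x_i - x_0\|/\theta\rceil$.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def bucket_statistics(distances: Sequence[float], theta: float) -> Dict[str, int]:
    """Summarize the bucketing produced by one theta (hypothetical helper).

    distances: precomputed ||x_i - x0|| values.
    Returns the number of non-empty buckets, the largest bucket size, and the
    number of candidate pairs the two-bucket search compares
    (pairs inside B_k plus pairs between B_k and B_{k+1}).
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    sizes = Counter(math.ceil(d / theta) for d in distances)
    pairs = sum(c * (c - 1) // 2 + c * sizes.get(k + 1, 0) for k, c in sizes.items())
    return {
        "buckets": len(sizes),
        "largest_bucket": max(sizes.values(), default=0),
        "candidate_pairs": pairs,
    }
```

With distances `[0.5, 1.5, 2.5, 9.5]`, θ = 1 gives 4 buckets and 2 candidate pairs (the gap between keys 3 and 10 is the discrete case of Fig. 2), while θ = 5 gives 2 buckets, a largest bucket of 3, and 6 candidate pairs, i.e. all pairs, so bucketing saves nothing.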
Description of the drawings
Fig. 1 is a schematic diagram of the method when the buckets are contiguous.
Fig. 2 is a schematic diagram of the method when the buckets are discrete.
Fig. 3 shows the neighborhoods based on the 1-norm, the 2-norm and the infinity norm in two-dimensional real space.
Detailed description of the invention
To aid understanding of the technical solution, it is described further below with reference to the drawings and embodiments.
Step 1: Determine the origin $x_0$ of the bucketing coordinate system.
Given the neighborhood system $NRS=\langle U, N, \theta\rangle$, determine the origin $x_0$ of the bucketing coordinate system. Taking $N = C \cup D$, the neighborhood system becomes the neighborhood decision system $NRS=\langle U, C \cup D, \theta\rangle$, and $x_0$ is the feature vector formed by the minimum value of each attribute. Here the sample set is $U=\{x_1, x_2, x_3, \ldots, x_n\}$, the set of sample attributes is $C=\{a_1, a_2, \ldots, a_m\}$, $a_i$ is the i-th attribute of the samples, and D is the set of decision attributes. The origin is then
$$x_0=\left(\min_{1\le j\le n} x_j(a_1),\ \min_{1\le j\le n} x_j(a_2),\ \ldots,\ \min_{1\le j\le n} x_j(a_m)\right)$$
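Step 1 reduces to a column-wise minimum over the condition attributes. A minimal sketch, assuming each sample is given as a sequence of its m condition-attribute values (decision attributes D excluded); `bucket_origin` is a hypothetical name.

```python
from typing import List, Sequence


def bucket_origin(U: Sequence[Sequence[float]]) -> List[float]:
    """x0 = (min_j x_j(a_1), ..., min_j x_j(a_m)) over the condition attributes."""
    if not U:
        raise ValueError("U must be non-empty")
    # zip(*U) yields one tuple per attribute a_1 ... a_m
    return [min(column) for column in zip(*U)]
```

For example, `bucket_origin([(3, 1), (2, 5), (4, 0)])` returns `[2, 0]`. With this choice every coordinate of $x_i - x_0$ is nonnegative.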
Step 2: Compute the sample distances.
For every $x_i \in U$, compute the distance $\|x_i - x_0\|$.
Step 3: Using the distances from Step 2, build the buckets by hashing.
For every $x_i \in U$, compute $k=\lceil \|x_i - x_0\| / \theta \rceil$, where k is a nonnegative integer, and build a hash table with k as the hash key. Taking the hash key (in units of θ) as the radius of a sphere, the hash table is a series of nested spheres in space. A sample with hash key k lies in the region between the sphere of radius $k\theta$ and the sphere of radius $(k-1)\theta$; the region between adjacent spheres is called a bucket. $B_1, B_2, \ldots, B_b$ are the b buckets obtained by using the b hash-key values as radii. The spatial distribution of some of the records in U is shown in Fig. 1 and Fig. 2. Fig. 1 shows a special case in which the values of k are contiguous; in practice, for specific data, k may be discontinuous, as shown in Fig. 2.
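Steps 2 and 3 can be sketched as a single pass over U that computes each distance and appends the sample index to the bucket for its key. This assumes the 2-norm distance (via Python's `math.dist`, available since Python 3.8) and the key $k=\lceil\|x_i-x_0\|/\theta\rceil$; `build_buckets` is a hypothetical name. A Python dict plays the role of the hash table, so an empty bucket simply has no entry, which corresponds to the discrete case of Fig. 2.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Point = Sequence[float]


def build_buckets(U: Sequence[Point], x0: Point, theta: float) -> Dict[int, List[int]]:
    """Hash every sample index into bucket B_k with k = ceil(||x_i - x0|| / theta)."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    buckets: Dict[int, List[int]] = defaultdict(list)
    for i, x in enumerate(U):
        k = math.ceil(math.dist(x, x0) / theta)  # Step 2 distance, Step 3 hash key
        buckets[k].append(i)
    return dict(buckets)
```

For `U = [(0, 0), (0.5, 0), (1.5, 0), (4, 0)]`, `x0 = (0, 0)` and θ = 1, the result is `{0: [0], 1: [1], 2: [2], 4: [3]}`; there is no $B_3$, so the samples in $B_2$ have no successor bucket to search.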
Step 4: Obtain the neighborhoods.
Search the records in buckets $B_{k-1}$, $B_k$ and $B_{k+1}$ to compute the neighborhood of each sample in $B_k$. When the sample records in the buckets are processed iteratively, the symmetry of the neighborhood system means that only the records in buckets $B_k$ and $B_{k+1}$ need to be searched. As shown in Fig. 1, when solving the neighborhoods of the samples in the bucket with k = 2, only the samples in the buckets with k = 2 and k = 3 need to be searched. As shown in Fig. 2, where the buckets are not contiguous, the bucket with k = 3 adjacent to the bucket with k = 2 is empty, so the search space for the neighborhoods of the samples in the k = 2 bucket shrinks further.
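The sketch below puts Steps 1 to 4 together using the iterative two-bucket variant: each bucket $B_k$ is compared with itself and with $B_{k+1}$ only, and every neighbor pair found is recorded for both samples by symmetry. It assumes the 2-norm, the key $\lceil\|x_i-x_0\|/\theta\rceil$, and attribute-wise minima for $x_0$; all function names are hypothetical. A brute-force version is included so the result can be checked.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set

Point = Sequence[float]


def build_buckets(U: Sequence[Point], x0: Point, theta: float) -> Dict[int, List[int]]:
    """B_k holds the indices of samples with k = ceil(||x_i - x0|| / theta)."""
    buckets: Dict[int, List[int]] = defaultdict(list)
    for i, x in enumerate(U):
        buckets[math.ceil(math.dist(x, x0) / theta)].append(i)
    return dict(buckets)


def neighborhoods_two_bucket(U: Sequence[Point], theta: float) -> List[Set[int]]:
    """theta-neighborhood (as index sets) of every sample via the two-bucket search."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    if not U:
        return []
    x0 = [min(column) for column in zip(*U)]        # Step 1: attribute-wise minima
    buckets = build_buckets(U, x0, theta)           # Steps 2 and 3
    result: List[Set[int]] = [{i} for i in range(len(U))]  # x_i is in theta(x_i)
    for k, members in buckets.items():              # Step 4
        following = buckets.get(k + 1, [])          # empty if B_{k+1} does not exist
        for pos, i in enumerate(members):
            # later samples of B_k (each pair once) plus every sample of B_{k+1}
            for j in members[pos + 1:] + following:
                if math.dist(U[i], U[j]) <= theta:
                    result[i].add(j)                # symmetry: record the pair
                    result[j].add(i)                # for both samples
    return result


def neighborhoods_brute(U: Sequence[Point], theta: float) -> List[Set[int]]:
    """Reference O(n^2) computation for checking."""
    return [{j for j in range(len(U)) if math.dist(U[i], U[j]) <= theta}
            for i in range(len(U))]
```

Comparing the two functions on a test set (for instance a grid of points with half-unit coordinates) is a quick way to confirm that restricting the search to $B_k$ and $B_{k+1}$ loses no neighbors.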
Building the buckets by hashing takes O(n) time, where $n=|U|$. Let m be the number of attributes in the sample attribute set $C=\{a_1, a_2, \ldots, a_m\}$ and b the number of buckets created. When the samples are evenly distributed among the buckets, the complexity of the neighborhood computation is $O(m|U|^2/b)$; as b approaches |U|, it approaches $O(m|U|)$.
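The stated bound can be checked by counting comparisons. Assuming b buckets of |U|/b samples each and the two-bucket search (each bucket compared with itself and with its successor), with each distance evaluation costing O(m), the number of comparisons is at most:

```latex
b\left[\binom{|U|/b}{2} + \left(\frac{|U|}{b}\right)^{2}\right]
  = \frac{3}{2}\,\frac{|U|^{2}}{b} - \frac{|U|}{2}
  \;\Rightarrow\; O\!\left(\frac{m\,|U|^{2}}{b}\right),
\qquad b = |U| \;\Rightarrow\; O(m\,|U|).
```

The three-bucket search without symmetry compares each sample with about $3|U|/b$ others, i.e. roughly $3|U|^2/b$ comparisons, which is the same order.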
Claims (3)
1. A method for quickly obtaining neighborhoods by hash bucketing for big-data processing, characterized in that it comprises the following steps:
Step 1: determine the origin $x_0$ of the bucketing coordinate system, where, in the neighborhood system $NRS=\langle U, N, \theta\rangle$, U is the set of all sample records, N denotes the neighborhood relation, and θ is the neighborhood radius;
Step 2: compute the sample distances: for every $x_i \in U$, $i = 1, 2, \ldots, n$, where n is the number of samples, compute the distance $\|x_i - x_0\|$;
Step 3: using the sample distances from Step 2, build the search buckets by hashing: for every $x_i \in U$, compute $k=\lceil \|x_i - x_0\| / \theta \rceil$, where k is a nonnegative integer, and build a hash table with k as the hash key; the hash table is modeled as spheres in space: taking the hash key (in units of θ) as the radius of a sphere, the hash table is a series of nested spheres; a sample with hash key k lies in the region between the sphere of radius $k\theta$ and the sphere of radius $(k-1)\theta$; the region between adjacent spheres is called a bucket, and $B_1, B_2, \ldots, B_b$ are the b buckets obtained by using the b hash-key values as radii;
Step 4: obtain the neighborhood: search the records in buckets $B_{k-1}$, $B_k$ and $B_{k+1}$ to obtain the neighborhood of sample x.
2. The method for quickly obtaining neighborhoods by hash bucketing for big-data processing, characterized in that $x_0$ is taken as the origin, or as a feature vector composed of the minimum values in N.
3. The method for quickly obtaining neighborhoods by hash bucketing for big-data processing, characterized in that, when the buckets are searched for sample records to obtain neighborhoods, an iterative method is used, and only the records in buckets $B_k$ and $B_{k+1}$ need to be searched to obtain the neighborhood of sample x.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310261081.6A CN103345491B (en) | 2013-06-26 | 2013-06-26 | A kind of method applying Hash Hash division bucket quickly to obtain neighborhood |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103345491A | 2013-10-09 |
CN103345491B | 2016-11-23 |
Family ID: 49280286
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310261081.6A Active CN103345491B (en) | 2013-06-26 | 2013-06-26 | A kind of method applying Hash Hash division bucket quickly to obtain neighborhood |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103345491B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111125892B (en) * | 2019-12-12 | 2021-10-12 | University of Science and Technology Beijing | Data storage and indexing method and system for molecular dynamics simulation program |
CN114490011B (en) * | 2020-11-12 | 2024-06-18 | Shanghai Jiao Tong University | Parallel acceleration realization method of N-body simulation in heterogeneous architecture |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103020321A (en) * | 2013-01-11 | 2013-04-03 | Guangdong Tutusou Network Technology Co., Ltd. | Neighbor searching method and neighbor searching system |
- 2013-06-26: application CN201310261081.6A filed in China (CN), published as CN103345491B; status: Active
Non-Patent Citations (1)
Title |
---|
Research on multi-dimensional linear hashing on heterogeneous platforms; Liu Yong et al.; Computer Science; 2012-10-31; Vol. 39, No. 10; pp. 157-159, 163 * |
Also Published As
Publication number | Publication date |
---|---|
CN103345491A (en) | 2013-10-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105912643A (en) | Image retrieval method based on content improved Average Hash | |
CN102819733B (en) | Rapid detection fuzzy method of face in street view image | |
Fan et al. | Iterative particle filter for visual tracking | |
CN108734187A (en) | A kind of multiple view spectral clustering based on tensor singular value decomposition | |
CN107623639A (en) | Data flow distribution similarity join method based on EMD distances | |
Wang et al. | Forest fire detection based on lightweight Yolo | |
Zhao et al. | Content based image retrieval scheme using color, texture and shape features | |
Liu et al. | Object proposal on RGB-D images via elastic edge boxes | |
CN103399863B (en) | Image search method based on the poor characteristic bag of edge direction | |
CN103345491B (en) | A kind of method applying Hash Hash division bucket quickly to obtain neighborhood | |
Wang et al. | Toward structural learning and enhanced YOLOv4 network for object detection in optical remote sensing images | |
Wang et al. | Material-aware Cross-channel Interaction Attention (MCIA) for occluded prohibited item detection | |
CN104933080B (en) | A kind of method and device of determining abnormal data | |
Panigrahi et al. | MS-ML-SNYOLOv3: A robust lightweight modification of SqueezeNet based YOLOv3 for pedestrian detection | |
Guo et al. | Lightweight SSD: Real-time Lightweight Single Shot Detector for Mobile Devices. | |
Wang et al. | Calyolov4: lightweight yolov4 target detection based on coordinated attention | |
Zhang et al. | Real-time detector design for small targets based on bi-channel feature fusion mechanism | |
Chan et al. | Rotating object detection in remote-sensing environment | |
Hou et al. | A detection method for the ridge beast based on improved YOLOv3 algorithm | |
CN114792397A (en) | SAR image urban road extraction method, system and storage medium | |
Mao et al. | Mapping Whole DNA Sequence on Variant Maps | |
Baek et al. | Bayesian learning of a search region for pedestrian detection | |
Liu et al. | Performance analysis of different DCNN models in remote sensing image object detection | |
Wu et al. | Dualray: Dual-View X-ray Security Inspection Benchmark and Fusion Detection Framework | |
Anh | Segmentation by incremental clustering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |