CN105574214B - A kind of similarity retrieval method of the fine granularity position code filtering based on IDistance - Google Patents

Info

Publication number
CN105574214B
Authority
CN
China
Prior art keywords
anchor point
fgbc
dimension
code
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610124087.2A
Other languages
Chinese (zh)
Other versions
CN105574214A (en)
Inventor
袁鑫攀
汪灿飞
何岸
向平
向一平
朱艳辉
满君丰
李长云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HUNAN YUN ZHI IOT NETWORKTECHNOLOGY Co.,Ltd.
Original Assignee
Hunan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University of Technology
Priority to CN201610124087.2A
Publication of CN105574214A
Application granted
Publication of CN105574214B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees

Abstract

The present invention proposes a similarity retrieval method based on IDistance with fine-grained bit code (FGBC) filtering. When the index is built, the method divides the space into finer-grained regions, each corresponding to one FGBC code, and uses the FGBC codes to filter the candidate set of the annulus search more precisely. Compared with BC code filtering, FGBC-IDistance can reduce the number of distance calculations to as little as $1/2^{2d}$. In terms of the number of distance calculations: FGBC-IDistance ≤ BC-IDistance ≤ IDistance.

Description

A similarity retrieval method based on IDistance with fine-grained bit code filtering
Technical field
The present invention relates to the field of data indexing, and more particularly to a similarity retrieval method based on IDistance with fine-grained bit code (FGBC) filtering.
Background
IDistance is a metric-space-based indexing method for high-dimensional vectors. Its index is built as follows: several anchor points are chosen in the data space, and each anchor point corresponds to one cluster subset. Every vector in the data space is assigned to the cluster subset of its nearest anchor point. Each high-dimensional vector is then converted, through its distance to the anchor point, into a comparable one-dimensional key value iDist, and a B+-tree organizes the key values of all high-dimensional vectors. The key value is computed as:
$\mathrm{iDist}(x) = \mathrm{dist}(P_i, x) + i \cdot c$
where $x$ is any vector, $P_i$ is an anchor point, dist() is the Euclidean distance function, and iDist() is the one-dimensional key value function.
As shown in Fig. 1, $P_0$, $P_1$ and $P_2$ are anchor points; $C_i$ is the distance from $P_i$ to the farthest vector in its vector subset, i.e. the radius of the vector subset of $P_i$; $c$ is a constant greater than every $C_i$.
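The key computation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's implementation: the function name `idist_key`, the anchor coordinates and the value of `c` are hypothetical, and the nearest anchor is found by a plain linear scan.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def idist_key(x, anchors, c):
    """Return (i, key): i indexes the nearest anchor, key = dist(P_i, x) + i*c."""
    i = min(range(len(anchors)), key=lambda k: dist(anchors[k], x))
    return i, dist(anchors[i], x) + i * c

# Hypothetical anchors P_0, P_1, P_2 and a constant c assumed larger than every C_i.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
c = 100.0
i, key = idist_key((9.0, 0.0), anchors, c)
print(i, key)
```

Because `c` exceeds every cluster radius, the keys of cluster `i` all fall in the interval from `i*c` to `i*c + C_i`, so keys from different clusters never interleave in the one-dimensional ordering.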
Let the full data set be $D$. A similarity range query retrieves the set of vectors whose distance to the query point $q$ is less than the radius $r$: $\mathrm{Range}(q, r) = \{x \in D : \mathrm{dist}(q, x) < r\}$, where $\mathrm{dist}(q, x)$ is the distance from the query point $q$ to a vector $x$.
The retrieval process of IDistance is:
(1) Compute the distance from $q$ to each anchor point $P_i$ to determine whether the search circle of $q$ intersects the vector subset of $P_i$.
Condition for intersection: $\mathrm{dist}(q, P_i) < C_i + r$
Condition for no intersection: $\mathrm{dist}(q, P_i) > C_i + r$
If they do not intersect, the vector subset of that anchor point contains no target vectors.
If they intersect, determine the distance (dist) annulus to be searched around $P_i$:
$\{x \in P_i : \max(\mathrm{dist}(P_i, q) - r, 0) < \mathrm{dist}(P_i, x) < \min(\mathrm{dist}(P_i, q) + r, C_i)\}$
This determines the search range of iDist:
$\{x \in P_i : i \cdot c + \max(\mathrm{dist}(P_i, q) - r, 0) < \mathrm{iDist}(x) < i \cdot c + \min(\mathrm{dist}(P_i, q) + r, C_i)\}$
The vectors retrieved form the candidate set.
(2) Compute the distance between $q$ and each vector in the candidate set; a vector whose distance is less than $r$ enters the final result set.
By choosing anchor points, IDistance reduces the indexing of high-dimensional vectors to one dimension and organizes the one-dimensional keys in a B+-tree. This makes search fast and saves a large number of distance calculations.
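The two retrieval steps can be sketched as follows. This is an illustrative sketch under stated assumptions: a sorted Python list searched with `bisect` stands in for the B+-tree, the names `build_index` and `range_query` are hypothetical, and the key range is taken with closed bounds so that vectors lying exactly on the annulus boundary are not lost.

```python
import bisect
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_index(data, anchors, c):
    """Sorted (key, vector) list standing in for the B+-tree, plus the radii C_i."""
    radii = [0.0] * len(anchors)
    entries = []
    for x in data:
        i = min(range(len(anchors)), key=lambda k: dist(anchors[k], x))
        d = dist(anchors[i], x)
        radii[i] = max(radii[i], d)
        entries.append((i * c + d, x))
    entries.sort()
    return entries, radii

def range_query(q, r, entries, anchors, radii, c):
    keys = [k for k, _ in entries]
    result = []
    for i, p in enumerate(anchors):
        dq = dist(p, q)
        if dq >= radii[i] + r:          # search circle misses this cluster
            continue
        lo = i * c + max(dq - r, 0.0)   # inner edge of the annulus, as a key
        hi = i * c + min(dq + r, radii[i])
        start = bisect.bisect_left(keys, lo)
        stop = bisect.bisect_right(keys, hi)
        for _, x in entries[start:stop]:  # candidate set for this anchor
            if dist(q, x) < r:            # step (2): exact distance check
                result.append(x)
    return result

data = [(1.0, 0.0), (2.0, 2.0), (9.0, 1.0), (10.0, 3.0)]
anchors = [(0.0, 0.0), (10.0, 0.0)]   # hypothetical anchors
c = 100.0                             # assumed larger than every C_i
entries, radii = build_index(data, anchors, c)
hits = range_query((8.0, 0.0), 2.0, entries, anchors, radii, c)
print(hits)
```

Every vector that falls inside the key range still costs one exact distance calculation, which is the cost the bit code filters described next aim to reduce.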
BC (bit code)-IDistance extends IDistance with an orientation code along the coordinate axes. The code consists of binary digits and is called a bit code (BC). Adding the BC code to the one-dimensional key value structure allows the candidate set to be filtered quickly, which avoids further distance calculations. As shown in Fig. 2, the BC codes of the regions are 00, 01, 10 and 11, i.e. $BC_0 = 00$, $BC_1 = 01$, $BC_2 = 10$, $BC_3 = 11$; a vector located in a BC code region carries the corresponding BC code.
BC-IDistance adds a filtering step (2.1) to step (2) of the IDistance retrieval:
(2.1) Apply BC code filtering to each vector in the candidate set.
The filtering principle is: check whether the search circle of $q$ intersects a BC code region of anchor point $P_i$; vectors in an intersecting region are kept, and vectors in a non-intersecting region are filtered out.
Definition of the BC code distance lower bound:
Let $q(q_1, q_2, \ldots, q_d)$ be the query point and $P_i(P_{i1}, P_{i2}, \ldots, P_{id})$ an anchor point. The distance lower bound from $q$ to region $k$ of anchor point $P_i$ is defined as $\mathrm{minBC}(P_i, k, q)$.
Here $\delta_j$ is the distance between the query point $q$ and $P_i$ in dimension $j$; $q_j$ is the coordinate of $q$ in dimension $j$; $P_{ij}$ is the coordinate of anchor point $P_i$ in dimension $j$; $BC_j$ is the value of the BC code of region $k$ in dimension $j$; $BCq_j$ is the value of the BC code of the region containing $q$ in dimension $j$.
Taking the example in Fig. 2, assume the coordinates of $P_0$ are (1, 1) and the coordinates of $q$ are (5, 0), so the bit code of $q$ is 10.
$\mathrm{minBC}(P_0, 0, q) = 4$;
$\mathrm{minBC}(P_0, 1, q) = (1^2 + 4^2)^{1/2} = 17^{1/2}$;
$\mathrm{minBC}(P_0, 2, q) = 0$;
$\mathrm{minBC}(P_0, 3, q) = 1$;
The condition for the search circle of $q$ to intersect region $k$ of anchor point $P_i$ is:
$\mathrm{minBC}(P_i, k, q) < r$
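The defining formula of the BC lower bound is not reproduced in this text, so the sketch below rests on an assumed reading: in each dimension the gap $\delta_j$ is 0 when the region and the query point share the same bit, and $|q_j - P_{ij}|$ otherwise, and the bound is the Euclidean norm of the gaps. The names `bc_code` and `min_bc` are hypothetical, and which side the boundary value $P_{ij}$ belongs to is also an assumption. Under this reading the code reproduces the example values $17^{1/2}$, 0 and 1 listed for regions 1, 2 and 3.

```python
import math

def bc_code(x, anchor):
    """One bit per dimension: 1 if x_j >= P_ij, else 0 (boundary side assumed)."""
    return tuple(1 if xj >= pj else 0 for xj, pj in zip(x, anchor))

def min_bc(anchor, region, q):
    """Lower bound on the distance from q to any vector in the given BC region."""
    q_code = bc_code(q, anchor)
    total = 0.0
    for qj, pj, bq, bk in zip(q, anchor, q_code, region):
        delta = 0.0 if bq == bk else abs(qj - pj)  # gap to the anchor's axis
        total += delta ** 2
    return math.sqrt(total)

P0, q = (1.0, 1.0), (5.0, 0.0)
regions = [(0, 0), (0, 1), (1, 0), (1, 1)]   # BC_0 .. BC_3
bounds = {code: min_bc(P0, code, q) for code in regions}
r = 2.0                                      # hypothetical query radius
kept = [code for code in regions if bounds[code] < r]
print(bounds, kept)
```

Only the regions in `kept` can contain results, so candidates whose stored code lies outside that list are dropped without a distance calculation.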
(2.2) Compute the distance between $q$ and each vector in the filtered candidate set; a vector whose distance is less than $r$ enters the final result set.
BC-IDistance records positional relationships with BC codes, so vectors in the candidate set can be filtered out by a fast BC code comparison. However, the granularity of the BC-IDistance code is coarse: in each dimension it can filter out at most half of the data, and when the radius is slightly larger, the search circle often intersects the anchor point's axis in some dimension, so that this dimension loses its filtering effect.
Summary of the invention
To overcome the above shortcomings of the prior-art BC-IDistance, the present invention proposes a similarity retrieval method based on IDistance with fine-grained bit code filtering.
To solve the above technical problem, the technical solution of the present invention is as follows:
A similarity retrieval method based on IDistance with fine-grained bit code filtering, comprising the following steps:
S1. Build the index structure of FGBC-IDistance.
S11. On both sides of anchor point $P_i(P_{i1}, P_{i2}, \ldots, P_{ij}, \ldots, P_{id})$ in each dimension, select two further points as secondary anchor points, denoted $((L_1, R_1), (L_2, R_2), \ldots, (L_j, R_j), \ldots, (L_d, R_d))$, with $R_j > L_j$ and $1 \le j \le d$. $P_{ij}$ is the value of anchor point $P_i$ in dimension $j$, and $L_j$ and $R_j$ are the two secondary anchor points of $P_i$ in dimension $j$.
S12. Fine-grained bit code FGBC: if the anchor point of the cluster subspace to which vector $S(S_1, S_2, \ldots, S_d)$ belongs is $P_i(P_{i1}, P_{i2}, \ldots, P_{ij}, \ldots, P_{id})$, the FGBC code of $S$ is written $B_S(b_{S11}b_{S12}, b_{S21}b_{S22}, \ldots, b_{Sj1}b_{Sj2}, \ldots, b_{Sd1}b_{Sd2})$, where $b_{Sj1}b_{Sj2}$ satisfies formula (1):
where $b_{Sj1}b_{Sj2}$ is the bit code of vector $S$ in dimension $j$ of anchor point $P_i$, and $S_j$ is the value of $S$ in dimension $j$.
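Formula (1) itself does not survive in this text. One reading consistent with the correctness proof given later in the description (code 00 for $S_j < L_j$, code 11 for $S_j \ge R_j$, and codes 01 and 10 separated by $P_{ij}$) is sketched below. It assumes $L_j < P_{ij} < R_j$, and the side on which the boundary values $L_j$ and $P_{ij}$ fall is an assumption rather than something the source states.

```latex
b_{Sj1}b_{Sj2} =
\begin{cases}
00, & S_j < L_j \\
01, & L_j \le S_j < P_{ij} \\
10, & P_{ij} \le S_j < R_j \\
11, & S_j \ge R_j
\end{cases}
```

Under this reading the first bit says on which side of $P_{ij}$ the value lies, and the second bit refines that half using $L_j$ or $R_j$.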
S13. Build the index structure diagram of FGBC-IDistance.
S2. Retrieve based on the index structure of FGBC-IDistance. The retrieval process is:
S21. Compute the distance from the query point $q$ to each anchor point $P_i$ to determine whether the search circle of $q$ intersects the vector subset of $P_i$.
Condition for intersection: $\mathrm{dist}(q, P_i) < C_i + r$
Condition for no intersection: $\mathrm{dist}(q, P_i) > C_i + r$
where $\mathrm{dist}(q, P_i)$ is the distance from the query point $q$ to anchor point $P_i$, $C_i$ is the distance from $P_i$ to the farthest vector in its vector subset, and $r$ is the radius of the search circle of $q$.
If they do not intersect, the vector subset of that anchor point contains no target vectors.
If they intersect, determine the distance (dist) annulus to be searched around $P_i$:
$\{x \in P_i : \max(\mathrm{dist}(P_i, q) - r, 0) < \mathrm{dist}(P_i, x) < \min(\mathrm{dist}(P_i, q) + r, C_i)\}$
where $x$ is any vector.
This determines the search range of iDist:
$\{x \in P_i : i \cdot c + \max(\mathrm{dist}(P_i, q) - r, 0) < \mathrm{iDist}(x) < i \cdot c + \min(\mathrm{dist}(P_i, q) + r, C_i)\}$
The vectors retrieved form the candidate set.
S22. Apply FGBC code filtering to each vector in the candidate set.
The filtering principle is: check whether the search circle of the query point $q$ intersects an FGBC code region of anchor point $P_i$; vectors in an intersecting region are kept, and vectors in a non-intersecting region are filtered out.
S23. Compute the distance between $q$ and each vector in the filtered candidate set; a vector whose distance is less than $r$ enters the final result set.
Preferably, in step S12 each dimension of the cluster subspace of anchor point $P_i$ is divided into four regions, so each dimension produces a bit code of length 2; data of $d$ dimensions therefore produce a bit code of length $2d$, and the bit code divides the whole cluster subspace into $2^{2d}$ small regions.
Preferably, in step S13 the index structure of FGBC-IDistance is divided into a B+-tree layer and an FGBC code layer.
Preferably, when step S22 applies FGBC code filtering to each vector in the candidate set, the distance lower bound of the FGBC code must also be determined:
Let $q(q_1, q_2, \ldots, q_d)$ be the query point and $P_i(P_{i1}, P_{i2}, \ldots, P_{id})$ an anchor point. The distance lower bound from $q$ to region $k$ of anchor point $P_i$ is defined as $\mathrm{minBC}(P_i, k, q)$;
where $\delta_j$ is the distance between the query point $q$ and $P_i$ in dimension $j$;
$q_j$ is the coordinate of the query point $q$ in dimension $j$;
$P_{ij}$ is the coordinate of anchor point $P_i$ in dimension $j$;
$L_j$ is the coordinate of one secondary anchor point of $P_i$ in dimension $j$;
$R_j$ is the coordinate of the other secondary anchor point of $P_i$ in dimension $j$;
$b_{sj1}b_{sj2}$ is the value of the FGBC code of region $k$ in dimension $j$;
$b_{qj1}b_{qj2}$ is the value of the FGBC code of the region containing the query point $q$ in dimension $j$.
Compared with the prior art, the beneficial effect of the technical solution of the present invention is as follows: the similarity retrieval method based on IDistance with fine-grained bit code filtering (FGBC-IDistance) divides a two-dimensional space into 16 regions, each corresponding to one FGBC code. Because FGBC-IDistance refines the granularity of the partition used by BC-IDistance, its filtering effect is better than that of BC-IDistance.
Brief description of the drawings
Fig. 1 is a schematic diagram of the index structure of IDistance.
Fig. 2 is a schematic diagram of the bit codes of a two-dimensional space.
Fig. 3 is a schematic diagram of the fine-grained bit codes of a two-dimensional space.
Fig. 4 is the index structure diagram of FGBC-IDistance.
Fig. 5 shows the filtering effect of IDistance in a two-dimensional space.
Fig. 6 shows the filtering effect of BC-IDistance in a two-dimensional space.
Fig. 7 shows the filtering effect of FGBC-IDistance in a two-dimensional space.
Fig. 8 is a histogram of the measured number of distance calculations for the three methods IDistance, BC-IDistance and FGBC-IDistance.
Fig. 9 is the implementation flow chart of the present invention.
Detailed description of the embodiments
The drawings are for illustrative purposes only and shall not be construed as limiting this patent. To better illustrate the embodiment, some components in the drawings may be omitted, enlarged or reduced, and they do not represent the size of the actual product.
Those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings. The technical solution of the present invention is further described below with reference to the drawings and the embodiment.
A similarity retrieval method based on IDistance with fine-grained bit code filtering, as shown in Fig. 9, comprises the following steps:
Step 1: build the FGBC index
Definition 1: secondary anchor point
Secondary anchor points are two further points selected on both sides of anchor point $P_i(P_{i1}, P_{i2}, \ldots, P_{id})$ in each dimension, denoted $((L_1, R_1), (L_2, R_2), \ldots, (L_d, R_d))$, with $R_j > L_j$ $(1 \le j \le d)$, where $j$ denotes dimension $j$. As shown in Fig. 3(a), these three points divide each dimension into four parts. $P_0$ is the anchor point, and $L_1$ and $R_1$ are the two secondary anchor points of $P_0$ in the first dimension.
Definition 2: fine-grained bit code FGBC
Assume the anchor point of the cluster subspace to which vector $S(S_1, S_2, \ldots, S_d)$ belongs is $P_i(P_{i1}, P_{i2}, \ldots, P_{id})$, and $L_j$, $R_j$ $(1 \le j \le d)$ are the two secondary anchor points of $P_i$ in dimension $j$. The FGBC code of $S$ is written $B_S(b_{S11}b_{S12}, b_{S21}b_{S22}, \ldots, b_{Sj1}b_{Sj2}, \ldots, b_{Sd1}b_{Sd2})$, where $b_{Sj1}b_{Sj2}$ satisfies formula (1):
where $b_{Sj1}b_{Sj2}$ is the bit code of vector $S$ in dimension $j$ of anchor point $P_i$, and $S_j$ is the value of $S$ in dimension $j$.
The FGBC code divides each dimension of the cluster subspace of anchor point $P_i$ into four regions, so each dimension produces a bit code of length 2, and data of $d$ dimensions produce a bit code of length $2d$. The bit code divides the whole cluster subspace into $2^{2d}$ small regions. As shown in Fig. 3(a), $P_0$ is the center of the whole cluster subspace; the dimension is divided into 4 subspaces, whose bit codes are 00, 01, 10 and 11 respectively.
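The encoding can be sketched in Python as follows. Formula (1) is not reproduced in this text, so the thresholds below are an assumed reading taken from the correctness proof later in the description (00 below $L_j$, 11 at or above $R_j$, 01 and 10 separated by $P_{ij}$). The function name `fgbc_code`, the sample coordinates and the handling of values that fall exactly on a boundary are hypothetical.

```python
def fgbc_code(s, anchor, left, right):
    """Two bits per dimension, assuming L_j < P_ij < R_j in every dimension."""
    bits = []
    for sj, pj, lj, rj in zip(s, anchor, left, right):
        if sj < lj:
            bits.append("00")      # left of the left secondary anchor
        elif sj < pj:
            bits.append("01")      # between L_j and the anchor
        elif sj < rj:
            bits.append("10")      # between the anchor and R_j
        else:
            bits.append("11")      # at or right of the right secondary anchor
    return tuple(bits)

anchor = (0.0, 0.0)                       # hypothetical anchor P_i
left, right = (-2.0, -2.0), (2.0, 2.0)    # hypothetical secondary anchors
code = fgbc_code((3.0, -1.0), anchor, left, right)
print(code)

# Enumerate a coarse grid to count the distinct regions in two dimensions.
grid = [-3.0, -1.0, 1.0, 3.0]
regions = {fgbc_code((a, b), anchor, left, right) for a in grid for b in grid}
print(len(regions))
```

With $d = 2$ the code is $2d = 4$ bits long and the grid enumeration reaches $2^{2d} = 16$ distinct regions, matching the count stated above.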
Fig. 3(b) shows the 4 bit codes in a two-dimensional space. The bit codes drawn in black encode the first dimension, and the bit codes drawn in red encode the second dimension. Together these codes divide the cluster subspace into 16 regions, and the code of each region is its FGBC code.
Index structure diagram
The index structure of FGBC-IDistance is shown in Fig. 4. As the diagram shows, the index is divided into a B+-tree layer and an FGBC code layer. I/O and Euclidean distance calculation are the two most time-consuming steps. The bit code layer occupies less space than the data layer, so filtering at the bit code layer serves two purposes: first, it reduces the amount of I/O; second, it reduces the number of Euclidean distance calculations.
Step 2: retrieval based on FGBC-IDistance
(1) Compute the distance from $q$ to each anchor point $P_i$ to determine whether the search circle of $q$ intersects the vector subset of $P_i$.
Condition for intersection: $\mathrm{dist}(q, P_i) < C_i + r$
Condition for no intersection: $\mathrm{dist}(q, P_i) > C_i + r$
(2) If they do not intersect, the vector subset of that anchor point contains no target vectors. If they intersect, determine the annulus to be searched:
$\{x \in P_i : \max(\mathrm{dist}(P_i, q) - r, 0) < \mathrm{dist}(P_i, x) < \min(\mathrm{dist}(P_i, q) + r, C_i)\}$
(3) Determine the search range of iDist so that the B+-tree can be searched quickly; the vectors found enter the candidate set. The search range of iDist is:
$\{x \in P_i : i \cdot c + \max(\mathrm{dist}(P_i, q) - r, 0) < \mathrm{iDist}(x) < i \cdot c + \min(\mathrm{dist}(P_i, q) + r, C_i)\}$
(4) Apply FGBC code filtering to each vector in the candidate set.
The filtering principle is: check whether the search circle of $q$ intersects an FGBC code region of anchor point $P_i$; vectors in an intersecting region are kept, and vectors in a non-intersecting region are filtered out.
Definition of the FGBC code distance lower bound:
Let $q(q_1, q_2, \ldots, q_d)$ be the query point and $P_i(P_{i1}, P_{i2}, \ldots, P_{id})$ an anchor point. The distance lower bound from the query point $q$ to region $k$ of anchor point $P_i$ is defined as $\mathrm{minBC}(P_i, k, q)$.
where $\delta_j$ is the distance between the query point $q$ and $P_i$ in dimension $j$;
$q_j$ is the coordinate of the query point $q$ in dimension $j$;
$P_{ij}$ is the coordinate of anchor point $P_i$ in dimension $j$;
$L_j$ is the coordinate of one secondary anchor point of $P_i$ in dimension $j$;
$R_j$ is the coordinate of the other secondary anchor point of $P_i$ in dimension $j$;
$b_{sj1}b_{sj2}$ is the value of the FGBC code of region $k$ in dimension $j$;
$b_{qj1}b_{qj2}$ is the value of the FGBC code of the region containing the query point $q$ in dimension $j$.
The correctness of formula (2) and formula (3) is proved below:
1) When $b_{qj1}b_{qj2} = b_{sj1}b_{sj2}$, $q$ and $S$ have the same FGBC code in dimension $j$ and lie in the same region in that dimension, so the distance lower bound is 0.
2) When $b_{qj1}b_{qj2} \ne b_{sj1}b_{sj2}$ and $b_{qj1} = b_{sj1} = 1$, the pair $(b_{qj1}b_{qj2}, b_{sj1}b_{sj2})$ is (10, 11) or (11, 10). As shown in Fig. 3(a), $q_j$ and $s_j$ must lie on opposite sides of $R_j$, so $(q_j - s_j)^2 \ge (q_j - R_j)^2$. When $b_{sj1}b_{sj2} = 11$ and $b_{qj1}b_{qj2} = 00$, then $q_j < L_j$, $s_j \ge R_j$ and $L_j < R_j$, so $s_j - q_j \ge R_j - q_j > 0$ and hence $(q_j - s_j)^2 \ge (q_j - R_j)^2$.
3) When $b_{qj1}b_{qj2} \ne b_{sj1}b_{sj2}$ and $b_{qj1} = b_{sj1} = 0$, the pair $(b_{qj1}b_{qj2}, b_{sj1}b_{sj2})$ must be (00, 01) or (01, 00). As shown in Fig. 3(a), $q_j$ and $s_j$ must lie on opposite sides of $L_j$, so $(q_j - s_j)^2 \ge (q_j - L_j)^2$. When $b_{sj1}b_{sj2} = 00$ and $b_{qj1}b_{qj2} = 11$, then $q_j \ge R_j$, $s_j < L_j$ and $L_j < R_j$, so $q_j - s_j > q_j - L_j > 0$ and hence $(q_j - s_j)^2 > (q_j - L_j)^2$.
4) The remaining cases are ($b_{qj1}b_{qj2} = 01$ and $b_{sj1}b_{sj2} = 10$) or ($b_{qj1}b_{qj2} = 10$ and $b_{sj1}b_{sj2} = 01$). As shown in Fig. 3(a), $q_j$ and $s_j$ must lie on opposite sides of $P_{ij}$, so $(q_j - s_j)^2 \ge (q_j - P_{ij})^2$.
Since the FGBC codes of $q$ and $S$ differ, at least one of cases 2), 3) and 4) applies. This completes the proof.
The condition for the search circle of $q$ to intersect region $k$ of anchor point $P_i$ is:
$\mathrm{minBC}(P_i, k, q) < r$
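The lower bound and the intersection test can be sketched as below. Formulas (2) and (3) are not reproduced in this text, so this is an assumed reading rather than the patent's exact definition: each 2-bit code is mapped to the interval it covers in that dimension, $\delta_j$ is the gap between $q_j$ and that interval, and the bound is the Euclidean norm of the gaps. This agrees with the cases enumerated in the proof (gap to $R_j$, to $L_j$ or to $P_{ij}$) and also covers code pairs the proof does not list. The names `interval`, `min_fgbc` and `region_intersects` and the sample coordinates are hypothetical.

```python
import math

def interval(bits, pj, lj, rj):
    """Interval [lo, hi) covered by a 2-bit code in one dimension (assumed reading)."""
    return {"00": (-math.inf, lj), "01": (lj, pj),
            "10": (pj, rj), "11": (rj, math.inf)}[bits]

def min_fgbc(anchor, left, right, region, q):
    """Lower bound on the distance from q to any vector in an FGBC region."""
    total = 0.0
    for qj, pj, lj, rj, bits in zip(q, anchor, left, right, region):
        lo, hi = interval(bits, pj, lj, rj)
        delta = max(lo - qj, qj - hi, 0.0)   # 0 when q_j lies inside the interval
        total += delta ** 2
    return math.sqrt(total)

def region_intersects(anchor, left, right, region, q, r):
    """True when the search circle of q may contain vectors of the region."""
    return min_fgbc(anchor, left, right, region, q) < r

anchor = (0.0, 0.0)                       # hypothetical anchor P_i
left, right = (-2.0, -2.0), (2.0, 2.0)    # hypothetical secondary anchors
q, r = (3.0, -1.0), 1.5
for region in [("11", "01"), ("10", "01"), ("00", "11")]:
    print(region, min_fgbc(anchor, left, right, region, q),
          region_intersects(anchor, left, right, region, q, r))
```

A candidate vector only needs its stored FGBC code to be tested this way, so vectors in regions that fail the test are discarded before any exact distance is computed.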
The filtering effect is shown in Fig. 7.
A further optimization can be applied on top of the above filtering: to decide whether a vector $S$ in an intersecting region can be filtered out, it is enough to compute the distance lower bound from the query point $q$ to the FGBC code of $S$, without performing the time-consuming distance calculation. If the query radius is smaller than this lower bound, $S$ can be filtered out.
(5) Compute the distance between $q$ and each vector in the filtered candidate set; a vector whose distance is less than $r$ enters the final result set.
Performance advantages of FGBC-IDistance
IDistance filters on the basis of the triangle inequality. As shown in Fig. 5, the candidate set obtained by a range query consists of the vectors in the blue annulus, and every vector in the annulus requires a distance calculation with the query point. The area of the annulus is the largest, so the most distance calculations are needed.
BC-IDistance addresses this problem by recording the position of each data point with a BC code. As shown in Fig. 6, the two-dimensional space is divided into 4 regions, and regions that do not intersect the query circle can be filtered out during a range query. Compared with IDistance, this saves part of the distance calculations.
In Fig. 7, FGBC-IDistance divides the two-dimensional space into 16 regions, each corresponding to one FGBC code. Because FGBC-IDistance refines the granularity of the partition used by BC-IDistance, its filtering effect is better than that of BC-IDistance.
A finer partition of the space is not always better: too fine a partition increases both the additional computational complexity and the space complexity, which degrades performance.
In all three methods (IDistance, BC-IDistance and FGBC-IDistance), the first step is identical: compute the intersecting cluster subspaces and locate the nodes in range through the B+-tree. The number of nodes found is also the same. Euclidean distance calculation is a relatively time-consuming process. If one Euclidean distance calculation is treated as a time-consuming atomic operation, the number of nodes for which BC-IDistance calculates a distance can be reduced to as little as $1/2^d$; in the worst case every node needs a distance calculation, which equals the number of distance calculations of IDistance. Owing to its special encoding, FGBC-IDistance can reduce the number of distance calculations to as little as $1/2^{2d}$, and to at least $1/2^d$, i.e. the number of distance calculations of BC-IDistance. In terms of the number of distance calculations: FGBC-IDistance ≤ BC-IDistance ≤ IDistance. The measured numbers of distance calculations are shown in Fig. 8.
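The best-case fractions quoted above follow directly from the region counts: $2^d$ regions for a BC code with one bit per dimension and $2^{2d}$ regions for an FGBC code with two bits per dimension. The short sketch below tabulates them for a few dimensionalities. The helper name `best_case_fraction` is hypothetical, and the figures are the idealized best case in which the query touches a single region, not measured results.

```python
def best_case_fraction(d, bits_per_dim):
    """Share of a cluster's regions touched when the query hits exactly one region."""
    return 1 / 2 ** (bits_per_dim * d)

for d in (2, 4, 8):
    bc = best_case_fraction(d, 1)     # BC code: 2**d regions
    fgbc = best_case_fraction(d, 2)   # FGBC code: 2**(2*d) regions
    print(d, bc, fgbc)
```

For $d = 2$ this gives 1/4 for BC-IDistance and 1/16 for FGBC-IDistance, in line with the 4 and 16 regions shown in Fig. 6 and Fig. 7.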
Identical or similar reference numerals correspond to identical or similar components. The positional relationships described in the drawings are for illustrative purposes only and shall not be construed as limiting this patent.
Obviously, the above embodiment of the present invention is merely an example given to illustrate the invention clearly, and does not limit its implementations. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (4)

1. A similarity retrieval method based on IDistance with fine-grained bit code filtering, characterized by comprising the following steps:
S1. Build the index structure diagram of FGBC-IDistance.
S11. On both sides of anchor point $P_i(P_{i1}, P_{i2}, \ldots, P_{ij}, \ldots, P_{id})$ in each dimension, select two further points as secondary anchor points, denoted $((L_1, R_1), (L_2, R_2), \ldots, (L_j, R_j), \ldots, (L_d, R_d))$, with $R_j > L_j$ and $1 \le j \le d$. $P_{ij}$ is the value of anchor point $P_i$ in dimension $j$, and $L_j$ and $R_j$ are the two secondary anchor points of $P_i$ in dimension $j$.
S12. Fine-grained bit code FGBC: if the anchor point of the cluster subspace to which vector $S(S_1, S_2, \ldots, S_d)$ belongs is $P_i(P_{i1}, P_{i2}, \ldots, P_{ij}, \ldots, P_{id})$, the FGBC code of $S$ is written $B_S(b_{S11}b_{S12}, b_{S21}b_{S22}, \ldots, b_{Sj1}b_{Sj2}, \ldots, b_{Sd1}b_{Sd2})$, where $b_{Sj1}b_{Sj2}$ satisfies formula (1):
where $b_{Sj1}b_{Sj2}$ is the bit code of vector $S$ in dimension $j$ of anchor point $P_i$, and $S_j$ is the value of $S$ in dimension $j$.
S13. Build the index structure diagram.
S2. Retrieve based on the index structure diagram of FGBC-IDistance. The retrieval process is:
S21. Obtain the candidate set by IDistance retrieval.
Compute the distance from the query point $q$ to each anchor point $P_i$ to determine whether the search circle of $q$ intersects the vector subset of $P_i$.
Condition for intersection: $\mathrm{dist}(q, P_i) < C_i + r$
Condition for no intersection: $\mathrm{dist}(q, P_i) > C_i + r$
where $\mathrm{dist}(q, P_i)$ is the distance from the query point $q$ to anchor point $P_i$, $C_i$ is the distance from $P_i$ to the farthest vector in its vector subset, and $r$ is the radius of the search circle of $q$.
If they do not intersect, the vector subset of that anchor point contains no target vectors.
If they intersect, determine the distance (dist) annulus to be searched around $P_i$:
$\{x \in P_i : \max(\mathrm{dist}(P_i, q) - r, 0) < \mathrm{dist}(P_i, x) < \min(\mathrm{dist}(P_i, q) + r, C_i)\}$
where $x$ is any vector.
This determines the search range of iDist:
$\{x \in P_i : i \cdot c + \max(\mathrm{dist}(P_i, q) - r, 0) < \mathrm{iDist}(x) < i \cdot c + \min(\mathrm{dist}(P_i, q) + r, C_i)\}$. The vectors retrieved form the candidate set.
S22. Apply FGBC code filtering to each vector in the candidate set.
The filtering principle is: check whether the search circle of the query point $q$ intersects an FGBC code region of anchor point $P_i$; vectors in an intersecting region are kept, and vectors in a non-intersecting region are filtered out.
The FGBC code regions are obtained as follows: the FGBC code divides each dimension of the cluster subspace of anchor point $P_i$ into 4 regions, so each dimension produces a bit code of length 2; data of $d$ dimensions therefore produce a bit code of length $2d$, and the bit code divides the whole cluster subspace into $2^{2d}$ small regions.
S23. Compute the distance between the query point $q$ and each vector in the filtered candidate set; a vector whose distance is less than $r$ enters the final result set.
2. The method according to claim 1, characterized in that in step S12 each dimension of the cluster subspace of anchor point $P_i$ is divided into 4 regions, so each dimension produces a bit code of length 2; data of $d$ dimensions therefore produce a bit code of length $2d$, and the bit code divides the whole cluster subspace into $2^{2d}$ small regions.
3. The method according to claim 1, characterized in that in step S13 the index structure diagram of FGBC-IDistance is divided into a B+-tree layer and an FGBC code layer.
4. The method according to claim 1, characterized in that when step S22 applies FGBC code filtering to each vector in the candidate set, the distance lower bound of the FGBC code must also be determined:
Let $q(q_1, q_2, \ldots, q_d)$ be the query point and $P_i(P_{i1}, P_{i2}, \ldots, P_{id})$ an anchor point. The distance lower bound from $q$ to region $k$ of anchor point $P_i$ is defined as $\mathrm{minBC}(P_i, k, q)$;
where $\delta_j$ is the distance between the query point $q$ and $P_i$ in dimension $j$;
$q_j$ is the coordinate of the query point $q$ in dimension $j$;
$P_{ij}$ is the coordinate of anchor point $P_i$ in dimension $j$;
$L_j$ is the coordinate of one secondary anchor point of $P_i$ in dimension $j$;
$R_j$ is the coordinate of the other secondary anchor point of $P_i$ in dimension $j$;
$b_{sj1}b_{sj2}$ is the value of the FGBC code of region $k$ in dimension $j$;
$b_{qj1}b_{qj2}$ is the value of the FGBC code of the region containing the query point $q$ in dimension $j$.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610124087.2A CN105574214B (en) 2016-03-04 2016-03-04 A kind of similarity retrieval method of the fine granularity position code filtering based on IDistance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610124087.2A CN105574214B (en) 2016-03-04 2016-03-04 A kind of similarity retrieval method of the fine granularity position code filtering based on IDistance

Publications (2)

Publication Number Publication Date
CN105574214A CN105574214A (en) 2016-05-11
CN105574214B true CN105574214B (en) 2019-04-09

Family

ID=55884345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610124087.2A Active CN105574214B (en) 2016-03-04 2016-03-04 A kind of similarity retrieval method of the fine granularity position code filtering based on IDistance

Country Status (1)

Country Link
CN (1) CN105574214B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106096065B (en) * 2016-07-29 2019-10-29 贵州大学 A kind of similar to search method and device of multimedia object

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1687932A (en) * 2005-05-30 2005-10-26 北大方正集团有限公司 Index structuring method for fast searching mass picture based on content
CN102306202A (en) * 2011-09-30 2012-01-04 中国传媒大学 High-dimension vector rapid searching algorithm based on block distance
CN103345509A (en) * 2013-07-04 2013-10-09 上海交通大学 Method and system for obtaining grading partition tree of dual-reverse furthest neighbors on road network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6834278B2 (en) * 2001-04-05 2004-12-21 Thothe Technologies Private Limited Transformation-based method for indexing high-dimensional data for nearest neighbour queries

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
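BC-iDistance: an optimized high-dimensional index based on bit codes (BC_iDistance_基于位码的优化高维索引); Liang Junjie (梁俊杰) et al.; Journal of Chinese Computer Systems (《小型微型计算机系统》); 2007-09-30; Vol. 28, No. 9; pp. 1647-1651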

Also Published As

Publication number Publication date
CN105574214A (en) 2016-05-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20200528

Address after: Room g0044, headquarters building, Changsha Zhongdian Software Park Co., Ltd., No. 39, Jianshan Road, Changsha hi tech Development Zone, Changsha City, Hunan Province

Patentee after: HUNAN YUN ZHI IOT NETWORKTECHNOLOGY Co.,Ltd.

Address before: 412000 Taishan Road, Tianyuan District, Hunan, No. 88, No.

Patentee before: HUNAN UNIVERSITY OF TECHNOLOGY