CN105868414A - Clustering separation distributive indexing method - Google Patents


Info

Publication number
CN105868414A
Authority
CN
China
Prior art keywords
vector
key
chord
node
cluster
Legal status
Granted
Application number
CN201610287204.7A
Other languages
Chinese (zh)
Other versions
CN105868414B (en)
Inventor
袁鑫攀
汪灿飞
何频捷
梁圣
满君丰
向平
向一平
Current Assignee
Hunan University of Technology
Original Assignee
Hunan University of Technology
Priority date
2016-05-03
Filing date
2016-05-03
Publication date
2016-08-17
2016-05-03: Application filed by Hunan University of Technology
2016-05-03: Priority to CN201610287204.7A
2016-08-17: Publication of CN105868414A
Application granted
2019-03-26: Publication of CN105868414B
Status: Expired - Fee Related (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2272 Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/134 Distributed indices

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer And Data Communications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a clustering-separation distributed indexing method, CS-Chord (Clustering Separation-Chord) for short. In M-Chord distributed indexing, the vectors at the edges of clusters are generally sparse, and these sparse vectors make the radius of each cluster quite large. During a range query, clusters with larger radii are more likely to intersect the query region, which enlarges the area that must be searched; moreover, the edge vectors of clusters are usually frequently accessed, which degrades performance further. CS-Chord separates the sparse vectors at cluster edges and stores them centrally on an independent server, while the dense vectors are stored on the Chord ring. During a search, high-frequency queries are concentrated on the vectors held by the independent server, and the search range on the Chord ring is reduced, so retrieval efficiency improves.

Description

Clustering-separation distributed indexing method
Technical field
The present invention relates to the field of distributed indexing, and more particularly to a clustering-separation distributed indexing method.
Background art
A P2P (peer-to-peer) network does not depend on a dedicated centralized server: all nodes in the network are equal and interconnect freely, sharing computing resources and services through direct exchange. A P2P distributed index structure makes full use of the capacity of every node in the network and offers good scalability and high resource utilization. In recent years, distributed indexing has become a research focus. Novak D. et al. proposed M-Chord, a distributed indexing algorithm for similarity search over high-dimensional vectors in P2P networks. It combines the iDistance algorithm with the Chord protocol: iDistance reduces the dimensionality of the high-dimensional vectors, and Chord is responsible for the distributed storage and retrieval of the vectors.
Chord is a structured distributed lookup protocol that uses DHT technology to locate resources quickly in a P2P network. To achieve fast lookups, each node on the Chord ring maintains a routing table of length $O(\log_2 n)$, where $n$ is the total number of nodes in the ring. In Chord, nodes and data are both mapped to $m$-bit identifiers in the same identifier space, and by introducing virtual nodes each node stores roughly the same amount of data; that is, Chord balances load. Routing information is distributed: each node only needs to know the routing information of a few other nodes in the system, and the query path is obtained by forwarding the query hop by hop. A single lookup generates only $O(\log_2 n)$ messages in the ring.
A distributed hash table (DHT) assigns a unique identifier to every node in the network according to a fixed rule, and in Chord each data resource is assigned a unique identifier by the same rule. Chord uses consistent hashing to map nodes and resources: the hash value is taken modulo $2^m$ to obtain an $m$-bit identifier in the range $[0, 2^m - 1]$. Because a node's IP address is unique, the node identifier is obtained by hashing that IP address; a data item's identifier is obtained by hashing its key. Fig. 1(a) shows a Chord ring with $m = 2$ and $N = 4$, where $N_i$ denotes a node and $K_i$ a resource.
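The identifier assignment described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the choice of SHA-1, the identifier length `M_BITS`, the sample IP addresses and key, and the helper names `chord_id` and `successor` are all assumptions. As in standard Chord, a key is stored on its successor, the first node whose identifier equals or follows the key clockwise on the ring.

```python
import hashlib

M_BITS = 8   # hypothetical identifier length m (the ring has 2**8 = 256 positions)


def chord_id(name, m=M_BITS):
    """Consistent hashing: hash an IP address or data key, then reduce modulo 2^m."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(ident, node_ids):
    """First node identifier at or after ident going clockwise; wraps to the smallest."""
    ring = sorted(node_ids)
    for nid in ring:
        if nid >= ident:
            return nid
    return ring[0]


nodes = [chord_id(ip) for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4")]
key_id = chord_id("image-0042")      # hypothetical data key
owner = successor(key_id, nodes)     # node that stores this key
```

Because nodes and keys share one identifier space, adding or removing a node only moves the keys between it and its neighbour on the ring.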
Fast resource location relies mainly on the routing information held by each node. Each node maintains a routing table (finger table) that stores the identifiers and addresses of a subset of the nodes, as shown in Fig. 1(b).
A Chord lookup proceeds as follows:
(1) When a node N receives a key to look up, it first checks whether its local resources contain that key. If so, the lookup ends and the resource is returned; otherwise, go to step (2).
(2) Consult the finger table of the current node, find the node whose identifier most closely precedes the key, forward the lookup request to that node, and repeat step (1).
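The two lookup steps can be simulated on a small ring. The sketch below assumes a 6-bit identifier space and ten hypothetical node identifiers, builds each finger table as finger[i] = successor(n + 2^i) (the standard Chord definition, which the text does not spell out), and forwards a query to the closest preceding finger until the owner of the key is reached. Everything runs in one process, so `FINGERS`, `lookup` and the node set are illustrative only.

```python
M = 6                          # hypothetical identifier length: 2**6 = 64 ring positions
RING = 2 ** M
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])  # hypothetical node identifiers


def in_interval(x, a, b, inclusive_right=False):
    """True if x lies in the clockwise ring interval (a, b), or (a, b] when inclusive_right."""
    if a < b:
        return a < x < b or (inclusive_right and x == b)
    # The interval wraps past 0 (a == b covers the whole ring except a).
    return x > a or x < b or (inclusive_right and x == b)


def succ(ident):
    """Node responsible for ident: the first node clockwise, inclusive."""
    for n in NODES:
        if n >= ident:
            return n
    return NODES[0]


def finger_table(n):
    """finger[i] = successor(n + 2^i) for i = 0 .. M-1; finger[0] is the immediate successor."""
    return [succ((n + 2 ** i) % RING) for i in range(M)]


FINGERS = {n: finger_table(n) for n in NODES}


def lookup(start, key):
    """Route a query for key from node start; return (owner, number of hops)."""
    node, hops = start, 0
    while True:
        if key == node:                                   # step (1): key held locally
            return node, hops
        nxt_succ = FINGERS[node][0]
        if in_interval(key, node, nxt_succ, inclusive_right=True):
            return nxt_succ, hops + 1                     # the successor owns the key
        nxt = nxt_succ                                    # fallback: plain successor
        for f in reversed(FINGERS[node]):                 # step (2): closest preceding finger
            if in_interval(f, node, key):
                nxt = f
                break
        node, hops = nxt, hops + 1


owner, hops = lookup(8, 54)    # routed N8 -> N42 -> N51 -> N56
```

Each hop at least halves the remaining clockwise distance in the typical case, which is where the $O(\log_2 n)$ message bound comes from.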
iDistance is a high-dimensional vector indexing method for metric spaces. Its basic idea is as follows. Several anchor (reference) points are chosen in the data space, each corresponding to a cluster subset, and every data point is assigned to the cluster subset of its nearest anchor point. Each high-dimensional vector is then converted, through its distance to its anchor point, into a one-dimensional key iDist, and the keys of all vectors are organized and managed in a B+-tree. The key is computed as

$$iDist(x) = \mathrm{dist}(P_i, x) + i \cdot c$$

As shown in Fig. 2, $P_0$, $P_1$ and $P_2$ are anchor points; $C_i$ is the distance from $P_i$ to the farthest data point in its subset, i.e. the radius of the data subset of $P_i$; and $c$ is a constant greater than every $C_i$.
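A minimal sketch of the key computation, assuming Euclidean distance (the text does not fix a metric) and hypothetical anchor points. Each point is assigned to its nearest anchor $P_i$ and receives iDist(x) = dist(P_i, x) + i·c; `build_keys` also records each cluster radius $C_i$ so that the requirement c > C_i can be checked. A plain sorted list stands in for the B+-tree.

```python
import math


def nearest_anchor(x, anchors):
    """Index i of the anchor point P_i closest to x (its cluster subset)."""
    return min(range(len(anchors)), key=lambda i: math.dist(x, anchors[i]))


def idist_key(x, anchors, c):
    """iDist(x) = dist(P_i, x) + i * c, where P_i is the nearest anchor."""
    i = nearest_anchor(x, anchors)
    return math.dist(anchors[i], x) + i * c


def build_keys(points, anchors, c):
    """Return the sorted 1-D keys (B+-tree stand-in) and the radii C_i."""
    radii = [0.0] * len(anchors)
    for x in points:
        i = nearest_anchor(x, anchors)
        radii[i] = max(radii[i], math.dist(anchors[i], x))
    if max(radii) >= c:
        raise ValueError("c must exceed every cluster radius C_i")
    return sorted(idist_key(x, anchors, c) for x in points), radii


anchors = [(0.0, 0.0), (10.0, 0.0)]                       # hypothetical P_0, P_1
keys, radii = build_keys([(3.0, 4.0), (10.0, 2.0), (1.0, 0.0)], anchors, c=100.0)
```

Because c exceeds every radius, all keys of cluster i fall in [i·c, (i+1)·c), so the clusters occupy disjoint segments of the one-dimensional axis.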
Let $D$ be the full data set. A similarity range query $Range(q, r)$ retrieves the data points whose distance to $q$ is less than the radius $r$:

$$Range(q, r) = \{x \in D \mid \mathrm{dist}(q, x) < r\}$$

where $\mathrm{dist}(q, x)$ is the distance from vector $q$ to data point $x$.
The iDistance retrieval procedure is:
(1) For each anchor point $P_i$, compute $\mathrm{dist}(q, P_i)$ to decide whether the query sphere centred at $q$ intersects the data subset of $P_i$.
The intersection condition is: $\mathrm{dist}(q, P_i) < C_i + r$
The non-intersection condition is: $\mathrm{dist}(q, P_i) > C_i + r$
(2) If they do not intersect, the data subset of this anchor point contains no target points. If they intersect, determine the annulus to search:

$$\{x \in P_i \mid \max(\mathrm{dist}(P_i, q) - r, 0) < \mathrm{dist}(P_i, x) < \min(\mathrm{dist}(P_i, q) + r, C_i)\}$$

(3) Determine the corresponding search range of the one-dimensional key iDist, search the B+-tree quickly, and add the data points found to the candidate set. The search range of the key is:

$$\{x \in P_i \mid i \cdot c + \max(\mathrm{dist}(P_i, q) - r, 0) < iDist(x) < i \cdot c + \min(\mathrm{dist}(P_i, q) + r, C_i)\}$$

(4) Compute the distance from $q$ to each data point in the candidate set; if it is less than $r$, add the point to the final result set.
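The four retrieval steps can be sketched as below, again assuming Euclidean distance and a sorted list searched with `bisect` in place of the B+-tree. One deliberate deviation: the candidate key range is taken as a closed interval so that boundary points are not lost to floating-point rounding, while step (4) still applies the strict test dist(q, x) < r. The function names and the grid data are hypothetical.

```python
import bisect
import math


def build_index(points, anchors, c):
    """Assign each point to its nearest anchor and sort by iDist key.

    Returns (keys, entries, radii): sorted keys standing in for the B+-tree,
    the matching (key, point) pairs, and the cluster radii C_i.
    """
    radii = [0.0] * len(anchors)
    entries = []
    for x in points:
        i = min(range(len(anchors)), key=lambda j: math.dist(x, anchors[j]))
        d = math.dist(anchors[i], x)
        radii[i] = max(radii[i], d)               # C_i: farthest member of subset i
        entries.append((i * c + d, tuple(x)))     # iDist(x) = dist(P_i, x) + i*c
    entries.sort()
    return [k for k, _ in entries], entries, radii


def range_query(q, r, keys, entries, anchors, radii, c):
    """Range(q, r) = {x : dist(q, x) < r}, following retrieval steps (1)-(4)."""
    result = []
    for i, p in enumerate(anchors):
        dq = math.dist(q, p)
        if dq > radii[i] + r:                     # (1)/(2): query ball misses subset i
            continue
        lo = i * c + max(dq - r, 0.0)             # (3): key range of the annulus
        hi = i * c + min(dq + r, radii[i])
        for _, x in entries[bisect.bisect_left(keys, lo):bisect.bisect_right(keys, hi)]:
            if math.dist(q, x) < r:               # (4): exact check on candidates
                result.append(x)
    return result


anchors = [(2, 2), (8, 2), (5, 8)]                # hypothetical anchor points
grid = [(x, y) for x in range(11) for y in range(11)]
keys, entries, radii = build_index(grid, anchors, c=100.0)
hits = range_query((5, 5), 2.5, keys, entries, anchors, radii, c=100.0)
```

Because step (4) re-checks every candidate, the result equals a brute-force scan; steps (1) to (3) only reduce how many distances are computed, using the triangle inequality to bound dist(P_i, x).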
By cleverly choosing anchor points, iDistance reduces the indexing problem for high-dimensional vectors to one dimension. The one-dimensional index is organized as a B+-tree, so searches are fast and a large number of distance computations are avoided.
M-Chord (M stands for Metric), proposed by Novak D. et al., is a distributed indexing algorithm for metric spaces. It not only locates resources in a distributed P2P network (exact-match lookup) but also supports similarity search (range search). It combines iDistance with the Chord protocol: iDistance reduces the dimensionality of high-dimensional vectors and Chord handles the distributed storage of the data, which enables similarity search over high-dimensional vectors in a distributed environment. Specifically, iDistance converts each high-dimensional vector into a one-dimensional key, a hash function maps that key into Chord's identifier space, and data are inserted into and retrieved from the Chord ring, as shown in Fig. 3.
When a node in M-Chord receives a range query $Range(Q, r)$, where $Q$ is the query vector and $r$ is the query radius, it proceeds as follows:
(1) Compute the regions where $Range(Q, r)$ intersects the clusters and, via iDistance, map them to several key intervals $[x_i, y_i]$.
(2) Apply the order-preserving hash function $h$ to $x_i$ and $y_i$ to generate the key range $[h(x_i), h(y_i)]$ on the Chord ring, and use the routing table to locate the node holding key $h(x_i)$. If $h(y_i)$ is greater than $Key_{max}$, the largest key of the data stored at that node, send the range $[Key_{max}, h(y_i)]$ to the node's successor. If $h(y_i)$ is also greater than the successor's $Key_{max}$, keep forwarding the query to the next successor.
(3) Each node that receives the query request (including the server and the nodes on the Chord ring) searches its B+-tree for vectors within the key range. For each vector found, compute its distance to the query vector $Q$; if the distance is less than $r$, return the vector to the node that originally sent the request.
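Step (2) relies on each node owning a contiguous slice of the key space. The sketch below assumes, as in standard Chord, that a key is stored on its successor node, and uses the node identifier as an upper bound on $Key_{max}$ (a simplification: the text uses the largest key actually stored). It returns the nodes that a range $[h(x_i), h(y_i)]$ visits, in forwarding order. `nodes_for_range` and the node identifiers are hypothetical.

```python
import bisect


def nodes_for_range(lo, hi, node_ids):
    """Nodes that must search the Chord key range [lo, hi], in forwarding order.

    Assumes keys live on their successor node (first node id >= key, wrapping to
    the smallest id) and that lo <= hi, as an order-preserving hash guarantees.
    """
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, lo)         # index of the node that owns lo
    visited = []
    while i < len(ring):
        visited.append(ring[i])
        if hi <= ring[i]:                    # this node's slice already covers hi
            return visited
        i += 1                               # forward [Key_max, hi] to the successor
    if ring[0] not in visited:               # keys above the largest id wrap to ring[0]
        visited.append(ring[0])
    return visited


route = nodes_for_range(12, 33, [10, 20, 30, 40])   # visits nodes 20, 30 and 40
```

With an ordinary (non-order-preserving) hash, the keys of one interval would be scattered, and this successor walk would have to be replaced by one lookup per key.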
In M-Chord, the vectors at the edges of a cluster are usually sparser, and these sparse vectors make the radius of each cluster much larger. In a range query, the larger a cluster's radius, the more easily it intersects the query region, which enlarges the region to be searched. Whenever the query region intersects a cluster, the data must be located on the Chord ring, regardless of how many data points actually lie in the intersection. These very small amounts of data greatly increase the number of resource-location operations on the Chord ring and therefore reduce the performance of M-Chord.
Fig. 4 shows the data distribution of one cluster obtained by k-means clustering of the color-histogram features of 68,040 images. The radius of this cluster is 0.62, but most of the data lie between 0.09 and 0.35; a very small number of edge points roughly doubles the radius of the cluster.
Fig. 5 shows the data access frequency in this cluster space under 1,000 random range searches. Because a range query cannot know in advance whether any data lie within the query range, intervals that contain no data are still accessed. Comparing Fig. 4 with Fig. 5 shows that although the edge vectors are sparse, they are accessed very frequently, with access frequencies clearly above 80%.
Summary of the invention
To overcome at least one of the defects (shortcomings) of the prior art described above, the present invention provides a clustering-separation distributed indexing method, CS-Chord (Clustering Separation-Chord), which reduces the search range on the Chord ring and improves retrieval efficiency.
To solve the above technical problem, the technical solution of the present invention is as follows:
A clustering-separation distributed indexing method, comprising the following steps:
Step 1: Separate the edge sparse vectors and store them centrally on an independent server;
Step 2: Build the distributed index: compute the one-dimensional key Key(S) of a vector S to be added to the Chord ring, and insert the vector into the distributed index. The insertion proceeds as follows:
(21) If $Key(S) \ge n \cdot C$, where $n$ is the number of cluster subspaces and $C$ is a constant greater than every value to which vectors within an annulus of the iDistance index structure are mapped on the one-dimensional axis, send the key Key(S) and vector S to the independent server and insert S into that server's B+-tree index; insertion of the new vector is then complete. If $Key(S) < n \cdot C$, go to step (22);
(22) Hash Key(S) with the order-preserving hash function to generate the key $Key_{Chord}$ assigned on the Chord ring; use the Chord lookup algorithm to find the IP address of the node that should store $Key_{Chord}$; send $Key_{Chord}$ and vector S to that node and insert S into the node's B+-tree index. Index construction is then complete;
Step 3: Perform range queries on the constructed index. Let $Range(Q, r)$ be a range query of the clustering-separation distributed indexing method CS-Chord, where $Q$ is the query vector and $r$ is the query radius. The steps are as follows:
(31) Compute the regions where $Range(Q, r)$ intersects the clusters and map them via iDistance to several key intervals $[x_i, y_i]$;
(32) If $x_i \ge n \cdot C$, send the intersection region of $Range(Q, r)$ with the cluster, computed in step (31), to the independent server and go to step (34); if $x_i < n \cdot C$, go to step (33);
(33) Generate the key range $[h(x_i), h(y_i)]$ on the Chord ring and use the routing table to locate the node holding key $h(x_i)$. If $h(y_i)$ is greater than $Key_{max}$, the largest key of the data stored at that node, send the range $[Key_{max}, h(y_i)]$ to the node's successor; if $h(y_i)$ is still greater than the successor's $Key_{max}$, keep forwarding the query to the next successor;
(34) Each node that receives the query request searches its B+-tree (which stores the vectors S that satisfy the conditions) for vectors within the key range. If a vector Z exists, compute its distance to the query vector Q: if the distance is less than the query radius $r$, return Z to the node that originally sent the request; if the distance is greater than or equal to $r$, return a null value. If no vector exists, also return a null value.
Preferably, the detailed process in Step 1 of separating the edge sparse vectors and storing the edge sparse vector data centrally on an independent server is:
Let $R_b$ denote the boundary separating the dense vectors of a cluster from its edge sparse vectors, and let $R$ be the radius of the cluster. The region at distance $[0, R_b]$ from the cluster centre is the dense vector region, and the region $[R_b, R]$ is the sparse vector region;
For $n$ cluster spaces, the key Key(S) of a vector in the sparse region is increased by $n \cdot C$ over its original value. Let S be a vector to be added to the Chord ring and P the centre of the cluster containing S; the key Key(S) of vector S is then computed as:

$$
Key(S) =
\begin{cases}
i \cdot C + \mathrm{dist}(S, P), & \mathrm{dist}(S, P) \le R_b \\
(i + n) \cdot C + \mathrm{dist}(S, P), & \mathrm{dist}(S, P) > R_b
\end{cases}
\qquad (1)
$$

where $0 \le i < n$;
Formula (1) separates out the sparse vectors, which are stored centrally on an independent server; thus, when the key of a query vector S satisfies $Key(S) \ge n \cdot C$, the centralized-storage server is accessed directly.
Preferably, in Step 2 above, the order-preserving hash function $h$ is defined as follows:
To map a data interval $[X_{min}, X_{max}]$ onto an interval $[Y_{min}, Y_{max}]$ while preserving the order of the data, let $X_i \in [X_{min}, X_{max}]$ and let $Y_i \in [Y_{min}, Y_{max}]$ be its value after mapping by $h$. The hash function is then defined as:

$$h(X_i) = Y_{min} + \frac{(Y_{max} - Y_{min})(X_i - X_{min})}{X_{max} - X_{min}} \qquad (2)$$

Because the key interval of the CS-Chord index is $[0, K_{max}]$ and the identifier space of the Chord ring is $[0, 2^m - 1]$, substituting $X_{min} = 0$, $X_{max} = K_{max}$, $Y_{min} = 0$ and $Y_{max} = 2^m - 1$ into formula (2) gives:

$$h(K_i) = \frac{(2^m - 1)\, K_i}{K_{max}} \qquad (3)$$

The order-preserving hash function is needed because the usual mapping method first hashes the key of a data point and then takes the result modulo $2^m$ as its key on the Chord ring. That approach maps adjacent data points to different nodes, which is unacceptable for an index algorithm such as iDistance that requires contiguous lookups. A hash function that preserves position is therefore needed to keep the data in order.
Compared with the prior art, the technical solution of the present invention has the following beneficial effects:
The present invention proposes a clustering-separation distributed indexing method, CS-Chord (Clustering Separation-Chord) for short. In M-Chord distributed indexing, the edge vectors of clusters are usually sparse, and these sparse vectors make the radius of each cluster large. In a range query, the larger a cluster's radius, the more easily it intersects the query region, which enlarges the candidate search region; moreover, the edge vectors of clusters are typically frequently accessed, which degrades performance further. CS-Chord separates the sparse vectors at cluster edges and stores them centrally on an independent server, while the dense vectors are stored on the Chord ring. During a lookup, high-frequency queries are concentrated on the vectors on the independent server, and the search range on the Chord ring is reduced, thereby improving retrieval efficiency.
Brief description of the drawings
Fig. 1 is a schematic diagram of Chord.
Fig. 2 is a schematic diagram of iDistance.
Fig. 3 is a schematic diagram of M-Chord.
Fig. 4 is the data distribution of a cluster space.
Fig. 5 is the access frequency of a random cluster.
Fig. 6 is a schematic diagram of cluster edge separation in a two-dimensional space.
Fig. 7 is a schematic diagram of the CS-Chord index.
Fig. 8 is a schematic diagram of a CS-Chord range search.
Detailed description of the invention
The drawings are for illustration only and shall not be construed as limiting this patent. To better illustrate the embodiments, some parts of the drawings are omitted, enlarged or reduced and do not represent the dimensions of an actual product.
Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings. The technical solution of the present invention is further described below with reference to the drawings and embodiments.
The clustering-separation distributed indexing method (CS-Chord) of the present invention separates the sparse vectors at cluster edges and stores them centrally on an independent server, while the dense vectors are stored on the Chord ring. During a lookup, high-frequency queries are concentrated on the sparse vectors held by the independent server, and the search range on the Chord ring is reduced, which improves retrieval efficiency. The specific steps are as follows:
Step 1: Separating the edge sparse vectors
Edge data are sparse and frequently accessed, so they should be stored centrally, eliminating the time spent locating resources on the Chord ring. To achieve this, the dense and sparse vectors of each cluster must first be separated.
Let $R_b$ denote the boundary separating the dense vectors of a cluster from its sparse vectors, and let $R$ be the radius of the cluster. The region at distance $[0, R_b]$ from the cluster centre is the dense data region, and the region $[R_b, R]$ is the sparse data region. As shown in Fig. 6, the two-dimensional data are divided into three cluster spaces; the dark grey part of each cluster is the dense region and the light grey part is the sparse region. The keys of data in the dense regions fall within $[0, 3C]$, and the keys of data in the sparse regions fall within $[3C, 6C]$.
Assume there are $n$ cluster spaces in total. The key of data in the sparse region is computed by adding $n \cdot C$ to its original value. Let S be a vector to be added to the Chord ring and P the centre point (anchor point) of the cluster containing S. The key Key(S) of vector S is then computed as:

$$
Key(S) =
\begin{cases}
i \cdot C + \mathrm{dist}(S, P), & \mathrm{dist}(S, P) \le R_b \\
(i + n) \cdot C + \mathrm{dist}(S, P), & \mathrm{dist}(S, P) > R_b
\end{cases}
\qquad (1)
$$

where $0 \le i < n$.
Formula (1) separates the sparse data and places them in one contiguous region of the key axis. Therefore, even if all these data were placed on the Chord ring, they would not cause a large number of resource-location operations. However, because sparse data are accessed frequently, this patent stores them centrally on an independent server. Thus, when a query range satisfies $Key(S) \ge n \cdot C$, the centralized-storage server is accessed directly.
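Formula (1) translates directly into code. The sketch assumes Euclidean distance, a single separation radius $R_b$ shared by all clusters (as the formula is written) and hypothetical anchor coordinates for the three clusters of the Fig. 6 example. Because $C$ exceeds every distance, dense keys stay below $n \cdot C$ and sparse keys land at or above it, which is exactly the test used to send a vector to the independent server.

```python
import math


def cs_chord_key(s, anchors, r_b, c):
    """Key(S) per formula (1); anchors are the cluster centres P_0 .. P_{n-1}."""
    n = len(anchors)
    i = min(range(n), key=lambda j: math.dist(s, anchors[j]))   # cluster of S
    d = math.dist(s, anchors[i])                                # dist(S, P)
    if d <= r_b:
        return i * c + d            # dense region: key in [i*C, (i+1)*C)
    return (i + n) * c + d          # sparse edge region: shifted by n*C


def is_sparse(key, n, c):
    """Keys at or above n*C belong to the independent server."""
    return key >= n * c


# Three clusters, as in the Fig. 6 example (coordinates are hypothetical).
anchors = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)]
C, R_B = 100.0, 1.5
dense_key = cs_chord_key((1.0, 0.0), anchors, R_B, C)    # 0*C + 1.0
sparse_key = cs_chord_key((2.0, 0.0), anchors, R_B, C)   # (0+3)*C + 2.0
```

With $n = 3$, the dense keys occupy $[0, 3C)$ and the sparse keys $[3C, 6C)$, matching the ranges described for Fig. 6.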
Step 2: Building the distributed index
A node computes the one-dimensional key Key of vector S by formula (1) and inserts the vector into the distributed index as follows:
(21) If $Key \ge n \cdot C$, send the key Key and the information of vector S to the independent server and insert it into the server's B+-tree index; insertion of the new vector is then complete. If $Key < n \cdot C$, go to step (22).
(22) Hash Key with the order-preserving hash function to generate the key $Key_{Chord}$ assigned on the Chord ring. Use the Chord lookup algorithm to find the IP address of the node that should store $Key_{Chord}$, send the data point's information to that node, and insert it into the node's B+-tree index. Index construction is then complete.
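Steps (21) and (22) amount to a two-way routing decision. The sketch below uses in-memory stand-ins: a sorted list for the server's B+-tree and a dict of sorted lists keyed by node identifier for the Chord nodes, with the owner chosen by the successor rule rather than by a real multi-hop Chord lookup. Formula (3) yields a real number, so truncating it to an integer identifier is an assumption, as is taking $K_{max} = n \cdot C$ (the upper end of the dense keys) in the example. `insert_vector` and the node identifiers are hypothetical.

```python
import bisect


def h(key, k_max, m):
    """Order-preserving hash, formula (3), truncated to an integer Chord identifier."""
    return int((2 ** m - 1) * key / k_max)


def insert_vector(key, vector, n, c, server, ring, k_max, m=16):
    """Route vector S with key Key(S); return "server" or the id of the storing node."""
    if key >= n * c:                                  # (21) sparse edge vector
        bisect.insort(server, (key, vector))          # server's B+-tree stand-in
        return "server"
    key_chord = h(key, k_max, m)                      # (22) Key_Chord on the ring
    node_ids = sorted(ring)
    pos = bisect.bisect_left(node_ids, key_chord)     # successor of Key_Chord
    owner = node_ids[pos % len(node_ids)]             # wrap past the largest id
    bisect.insort(ring[owner], (key_chord, vector))   # node's B+-tree stand-in
    return owner


server, ring = [], {1000: [], 30000: [], 60000: []}   # hypothetical node ids, m = 16
first = insert_vector(302.0, (2.0, 0.0), 3, 100.0, server, ring, k_max=300.0)
second = insert_vector(1.0, (1.0, 0.0), 3, 100.0, server, ring, k_max=300.0)
```

The first call goes to the server because 302.0 ≥ 3·100; the second hashes to identifier 218 and is stored on node 1000, its successor.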
Here, the order-preserving hash function $h$ is defined as follows:
To map a data interval $[X_{min}, X_{max}]$ onto an interval $[Y_{min}, Y_{max}]$ while preserving the order of the data, let $X_i \in [X_{min}, X_{max}]$ and let $Y_i \in [Y_{min}, Y_{max}]$ be its value after mapping by $h$. The hash function can then be defined as:

$$h(X_i) = Y_{min} + \frac{(Y_{max} - Y_{min})(X_i - X_{min})}{X_{max} - X_{min}} \qquad (2)$$

Because the key interval of the CS-Chord index is $[0, K_{max}]$ and the identifier space of the Chord ring is $[0, 2^m - 1]$, substituting $X_{min} = 0$, $X_{max} = K_{max}$, $Y_{min} = 0$ and $Y_{max} = 2^m - 1$ into formula (2) gives:

$$h(K_i) = \frac{(2^m - 1)\, K_i}{K_{max}} \qquad (3)$$

The order-preserving hash function is needed because the usual mapping method first hashes the key of a data point and then takes the result modulo $2^m$ as its key on the Chord ring. That approach maps adjacent data points to different nodes, which is unacceptable for an index algorithm such as iDistance that requires contiguous lookups. A hash function that preserves position is therefore needed to keep the data in order.
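The difference between the two mappings is easy to see numerically. The sketch below compares formula (2) with a conventional hash-then-modulo mapping (SHA-1 is an assumed choice of hash) on a few adjacent keys; the parameters `m` and `k_max` are hypothetical. The order-preserving version keeps the keys sorted and close together, so they land on the same or neighbouring nodes, while the modular hash spreads them arbitrarily over the identifier space.

```python
import hashlib


def order_preserving_h(x, x_min, x_max, y_min, y_max):
    """Formula (2): linear map of [x_min, x_max] onto [y_min, y_max]."""
    return y_min + (y_max - y_min) * (x - x_min) / (x_max - x_min)


def modular_hash(x, m):
    """Conventional DHT mapping: hash the key, then reduce it modulo 2^m."""
    digest = hashlib.sha1(repr(x).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


m, k_max = 16, 300.0                      # hypothetical parameters
keys = [10.0, 10.5, 11.0, 11.5]           # adjacent iDistance keys
kept = [order_preserving_h(k, 0.0, k_max, 0, 2 ** m - 1) for k in keys]   # still increasing
scattered = [modular_hash(k, m) for k in keys]                            # no order guarantee
```

Setting $X_{min} = Y_{min} = 0$ in `order_preserving_h` reproduces formula (3).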
Fig. 7 is a schematic diagram of the process of building the CS-Chord distributed index in a two-dimensional space.
Step 3: Range query
Fig. 8 is a schematic diagram of a CS-Chord range query $Range(Q, r)$, where $Q$ is the query vector and $r$ is the query radius. The steps are as follows:
(31) Compute the regions where $Range(Q, r)$ intersects the clusters and map them via iDistance to several key intervals $[x_i, y_i]$.
(32) If $x_i \ge n \cdot C$, send the information computed in step (31) to the independent server and go to step (34). If $x_i < n \cdot C$, go to step (33).
(33) Generate the key range $[h(x_i), h(y_i)]$ on the Chord ring and use the routing table to locate the node holding key $h(x_i)$. If $h(y_i)$ is greater than $Key_{max}$, the largest key of the data stored at that node, send the range $[Key_{max}, h(y_i)]$ to the node's successor. If $h(y_i)$ is also greater than the successor's $Key_{max}$, keep forwarding the query to the next successor.
(34) Each node that receives the query request (including the server and the nodes on the Chord ring) searches its B+-tree for vectors within the key range. For each vector found, compute its distance to the query vector $Q$; if the distance is less than $r$, return the vector to the node that originally sent the request.
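Formula (1) implies that a cluster intersected by the query can contribute up to two key intervals, one for its dense ring and one for its sparse ring, which is the situation Fig. 8 shows for P0. The sketch below derives those intervals for step (31) and applies the step (32) test $x_i \ge n \cdot C$. It assumes Euclidean distance, a shared $R_b$ and known per-cluster radii; forwarding along the Chord ring (step (33)) is not simulated, and the function names are hypothetical.

```python
import math


def cs_chord_intervals(q, r, anchors, radii, r_b, c):
    """Step (31): key intervals of Range(q, r) under formula (1) (a sketch)."""
    n = len(anchors)
    out = []
    for i, p in enumerate(anchors):
        dq = math.dist(q, p)
        if dq > radii[i] + r:                       # query ball misses cluster i
            continue
        lo, hi = max(dq - r, 0.0), min(dq + r, radii[i])
        if lo <= min(hi, r_b):                      # overlaps the dense ring [0, R_b]
            out.append((i * c + lo, i * c + min(hi, r_b)))
        if max(lo, r_b) <= hi:                      # overlaps the sparse ring [R_b, R_i]
            out.append(((i + n) * c + max(lo, r_b), (i + n) * c + hi))
    return out


def dispatch(intervals, n, c):
    """Step (32): sparse intervals go to the server, dense ones to the Chord ring."""
    to_server = [iv for iv in intervals if iv[0] >= n * c]
    to_chord = [iv for iv in intervals if iv[0] < n * c]
    return to_server, to_chord


anchors, radii = [(0.0, 0.0), (10.0, 0.0)], [4.0, 4.0]   # hypothetical clusters
ivs = cs_chord_intervals((2.5, 0.0), 1.0, anchors, radii, r_b=2.0, c=100.0)
to_server, to_chord = dispatch(ivs, n=2, c=100.0)
```

Here the query straddles the dense/sparse boundary of the first cluster, so one interval goes to the Chord ring and one to the server, mirroring the split of [x1, y1] and [x2, y2] in Fig. 8.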
As shown in Fig. 8, the query Q intersects both the dense and the sparse regions of cluster P0, and intersects the sparse region of cluster P1. The mapped one-dimensional key ranges are [x1, y1], [x2, y2] and [x3, y3]. The interval [x1, y1] is sent to the Chord ring for retrieval, while the intervals [x2, y2] and [x3, y3] are sent to the server for retrieval.
Obviously, the above embodiments of the present invention are merely examples given to illustrate the invention clearly and do not limit its implementations. Those of ordinary skill in the art may make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (3)

1. A clustering-separation distributed indexing method, characterized by comprising the following steps:
Step 1: separating the edge sparse vectors and storing them centrally on an independent server;
Step 2: building the distributed index: computing the one-dimensional key Key(S) of a vector S to be added to the Chord ring and inserting the vector into the distributed index, the insertion proceeding as follows:
(21) if $Key(S) \ge n \cdot C$, where $n$ is the number of cluster subspaces and $C$ is a constant greater than every value to which vectors within an annulus of the iDistance index structure are mapped on the one-dimensional axis, sending the key Key(S) and vector S to the independent server and inserting S into the B+-tree index of that server, whereupon insertion of the new vector is complete; if $Key(S) < n \cdot C$, going to step (22);
(22) hashing Key(S) with the order-preserving hash function to generate the key $Key_{Chord}$ assigned on the Chord ring, using the Chord lookup algorithm to find the IP address of the node that should store $Key_{Chord}$, sending $Key_{Chord}$ and vector S to that node, and inserting S into the B+-tree index of the node, whereupon index construction is complete;
Step 3: performing range queries on the constructed index, where $Range(Q, r)$ is a range query of the clustering-separation distributed indexing method CS-Chord, $Q$ is the query vector and $r$ is the query radius, the steps being as follows:
(31) computing the regions where $Range(Q, r)$ intersects the clusters and mapping them via iDistance to several key intervals $[x_i, y_i]$;
(32) if $x_i \ge n \cdot C$, sending the intersection region of $Range(Q, r)$ with the cluster, computed in step (31), to the independent server and going to step (34); if $x_i < n \cdot C$, going to step (33);
(33) generating the key range $[h(x_i), h(y_i)]$ on the Chord ring and using the routing table to locate the node holding key $h(x_i)$; if $h(y_i)$ is greater than $Key_{max}$, the largest key of the data stored at that node, sending the range $[Key_{max}, h(y_i)]$ to the successor of the node, and if $h(y_i)$ is still greater than the successor's $Key_{max}$, continuing to forward the query to the next successor;
(34) each node that receives the query request searching its B+-tree for vectors within the key range; if a vector Z exists, computing its distance to the query vector Q, returning Z to the node that originally sent the request when the distance is less than the query radius $r$, and returning a null value when the distance is greater than or equal to $r$; if no vector exists, also returning a null value.
2. The clustering-separation distributed indexing method according to claim 1, characterized in that the detailed process in Step 1 of separating the edge sparse vectors and storing the edge sparse vector data centrally on an independent server is:
letting $R_b$ denote the boundary separating the dense vectors of a cluster from its edge sparse vectors and $R$ the radius of the cluster, the region at distance $[0, R_b]$ from the cluster centre being the dense vector region and the region $[R_b, R]$ being the sparse vector region;
for $n$ cluster spaces, increasing the key Key(S) of a vector in the sparse region by $n \cdot C$ over its original value, where S is a vector to be added to the Chord ring and P is the centre of the cluster containing S, the key Key(S) of vector S being computed as:

$$
Key(S) =
\begin{cases}
i \cdot C + \mathrm{dist}(S, P), & \mathrm{dist}(S, P) \le R_b \\
(i + n) \cdot C + \mathrm{dist}(S, P), & \mathrm{dist}(S, P) > R_b
\end{cases}
\qquad (1)
$$

where $0 \le i < n$;
separating out the sparse vectors by formula (1) and storing the sparse data centrally on an independent server, so that when the key of a query vector S satisfies $Key(S) \ge n \cdot C$, the centralized-storage server is accessed directly.
The distributed index method that cluster the most according to claim 1 separates, it is characterised in that above-mentioned In step 2, position keeps hash function h to be defined as follows:
For data interval [Xmin, Xmax], interval [Y to be mapped that tomin, Ymax] and keep the consistent of data Property, it is assumed that Xi∈[Xmin, Xmax], after being mapped by function h, value is Yi, Yi∈[Ymin, Ymax], then this Kazakhstan Uncommon function is defined as:
$$
h(X_i) = Y_{\min} + \frac{(Y_{\max} - Y_{\min})\,(X_i - X_{\min})}{X_{\max} - X_{\min}}
\tag{2}
$$
Since the key values of the CS-Chord index lie in $[0, K_{\max}]$ and the identifier space of the Chord ring is $[0, 2^m - 1]$, substituting $X_{\min} = 0$, $X_{\max} = K_{\max}$, $Y_{\min} = 0$ and $Y_{\max} = 2^m - 1$ into formula (2) gives:
$$
h(K_i) = \frac{(2^m - 1)\,K_i}{K_{\max}}
\tag{3}
$$
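Formulas (2) and (3) amount to a linear, order-preserving rescaling, sketched below. The function names `lp_hash` and `chord_id` are hypothetical, and rounding the result down to an integer Chord identifier is an assumption, since the claim states only the real-valued mapping.

```python
import math


def lp_hash(x: float, x_min: float, x_max: float,
            y_min: float, y_max: float) -> float:
    """Formula (2): linear map from [x_min, x_max] onto [y_min, y_max]."""
    if x_max == x_min:
        raise ValueError("source interval must have non-zero width")
    return y_min + (y_max - y_min) * (x - x_min) / (x_max - x_min)


def chord_id(key: float, k_max: float, m: int) -> int:
    """Formula (3): X_min = Y_min = 0, X_max = K_max, Y_max = 2**m - 1.

    Flooring to an integer identifier is an assumption; the claim gives
    only the real-valued mapping.
    """
    if not 0 <= key <= k_max:
        raise ValueError("key must lie in [0, K_max]")
    return math.floor(lp_hash(key, 0.0, k_max, 0.0, 2 ** m - 1))


if __name__ == "__main__":
    # An 8-bit ring (identifiers 0..255) and keys in [0, 1000].
    print([chord_id(k, 1000.0, 8) for k in (0, 250, 500, 1000)])
```

Because the map is monotonic, a contiguous range of CS-Chord keys maps to a contiguous arc of Chord identifiers.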
CN201610287204.7A 2016-05-03 2016-05-03 A cluster-separated distributed index method Expired - Fee Related CN105868414B

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610287204.7A CN105868414B 2016-05-03 2016-05-03 A cluster-separated distributed index method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610287204.7A CN105868414B 2016-05-03 2016-05-03 A cluster-separated distributed index method

Publications (2)

Publication Number Publication Date
CN105868414A 2016-08-17
CN105868414B 2019-03-26

Family

ID=56630062

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610287204.7A Expired - Fee Related CN105868414B 2016-05-03 2016-05-03 A cluster-separated distributed index method

Country Status (1)

Country Link
CN (1) CN105868414B

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110120918A (en) * 2019-05-10 2019-08-13 北京邮电大学 A kind of identification analytic method and device
CN111582224A (en) * 2020-05-19 2020-08-25 湖南视觉伟业智能科技有限公司 Face recognition system and method
CN113297331A (en) * 2020-09-27 2021-08-24 阿里云计算有限公司 Data storage method and device and data query method and device
CN116541420A (en) * 2023-07-07 2023-08-04 上海爱可生信息技术股份有限公司 Vector data query method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103729434A (en) * 2013-12-26 2014-04-16 乐视网信息技术(北京)股份有限公司 Distributed index method and distributed index system for video data
CN103744934A (en) * 2013-12-30 2014-04-23 南京大学 Distributed index method based on LSH (Locality Sensitive Hashing)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
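林朝晖 (Lin Zhaohui) et al.: "High-dimensional distributed locality-sensitive hashing index method" (高维分布式局部敏感哈希索引方法), Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》) *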

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110120918A (en) * 2019-05-10 2019-08-13 北京邮电大学 A kind of identification analytic method and device
CN111582224A (en) * 2020-05-19 2020-08-25 湖南视觉伟业智能科技有限公司 Face recognition system and method
CN113297331A (en) * 2020-09-27 2021-08-24 阿里云计算有限公司 Data storage method and device and data query method and device
CN116541420A (en) * 2023-07-07 2023-08-04 上海爱可生信息技术股份有限公司 Vector data query method
CN116541420B (en) * 2023-07-07 2023-09-15 上海爱可生信息技术股份有限公司 Vector data query method

Also Published As

Publication number Publication date
CN105868414B 2019-03-26

Similar Documents

Publication Publication Date Title
US8688708B2 (en) Storing and retrieving objects on a computer network in a distributed database
CN105868414A (en) Clustering separation distributive indexing method
CN104537091A (en) Networked relational data query method based on hierarchical identification routing
CN105357247A (en) Multi-dimensional cloud resource interval finding method based on hierarchical cloud peer-to-peer network
Xu et al. Energy‐efficient big data storage and retrieval for wireless sensor networks with nonuniform node distribution
CN101719155B (en) Method of multidimensional attribute range inquiry for supporting distributed multi-cluster computing environment
CN106020724A (en) Neighbor storage method based on data mapping algorithm
CN101902388A (en) Expandable fast discovery technology for multi-stage sequencing resources
Cui et al. Efficient skyline computation in structured peer-to-peer systems
CN108829695A (en) Flexible polymer K-NN search G-max method on road network
Mowshowitz et al. Query optimization in a distributed hypercube database
CN108829694A (en) The optimization method of flexible polymer K-NN search G tree on road network
Zhang et al. Storing and querying semi-structured spatio-temporal data in hbase
CN105989078B (en) A kind of method, the search method, apparatus and system of structured p2p network building index
Ganti et al. MP-trie: Fast spatial queries on moving objects
Rosch et al. Best effort query processing in dht-based p2p systems
CN102457568A (en) Internet of things information service system and method for processing information on system
CN106202303A (en) A kind of Chord routing table compression method and optimization file search method
Zhang et al. Indexing historical spatio-temporal data in the cloud
Zhou et al. HDKV: supporting efficient high‐dimensional similarity search in key‐value stores
Lee et al. Supporting similarity range queries efficiently by using reference points in structured p2p overlays
Zhu et al. DS-index: a distributed search solution for federated cloud
Jin et al. SCQR-A P2P query routing algorithm based on semantic cluster
Lu et al. Semantic information processing system based on CAN
Vu et al. Simpson: Efficient similarity search in metric spaces over P2P structured overlay networks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190326

Termination date: 20190503