CN105868414B - A kind of distributed index method that cluster is isolated - Google Patents

Publication number
CN105868414B
Authority
CN
China
Prior art keywords
vector
key
chord
cluster
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610287204.7A
Other languages
Chinese (zh)
Other versions
CN105868414A (en)
Inventor
袁鑫攀
汪灿飞
何频捷
梁圣
满君丰
向平
向一平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University of Technology
Original Assignee
Hunan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University of Technology
Priority to CN201610287204.7A
Publication of CN105868414A
Application granted
Publication of CN105868414B
Expired - Fee Related (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2272 Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/134 Distributed indices

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention proposes a distributed index method based on cluster separation, abbreviated CS-Chord (Clustering separation-Chord). In the M-Chord distributed index, the vectors at the edge of a cluster are generally sparse, and these few vectors make the radius of each cluster very large. During a range query, a cluster with a larger radius is more likely to intersect the query region, which enlarges the candidate region to be searched. Moreover, the edge vectors of a cluster are usually the most frequently accessed vectors, which degrades performance further. CS-Chord separates the sparse vectors at the cluster edges and stores them centrally on an independent server, while the dense vectors are stored in the Chord ring. During a lookup, the high-frequency queries are concentrated on the vectors held by the independent server, and the search range on the Chord ring is reduced, which improves retrieval efficiency.

Description

A distributed index method based on cluster separation
Technical field
The present invention relates to the field of distributed indexing, and more particularly to a distributed index method based on cluster separation.
Background art
A P2P (peer-to-peer) network does not depend on a dedicated centralized server: all nodes in the network are equal and freely interconnected, and they share computing resources by exchanging resources and services. A P2P distributed index structure makes full use of the capacity of every node in the network and offers advantages such as good scalability and high resource utilization. In recent years, distributed indexing has become an active research topic. M-Chord, proposed by Novak D. et al., is a distributed index algorithm for similarity retrieval of high-dimensional vectors over a P2P network. The algorithm combines the iDistance algorithm with the Chord protocol: iDistance is responsible for reducing the dimensionality of the high-dimensional vectors, and Chord is responsible for distributed storage and retrieval of the vectors.
Chord is a structured distributed lookup protocol that quickly locates resources in a P2P network by means of DHT technology. To achieve fast resource lookup, every node on the Chord ring maintains a routing table of length O(log_2 n), where n is the total number of nodes in the ring. In the Chord protocol, nodes and data are both mapped to m-bit identifiers in the same identifier space, and virtual nodes are introduced so that each node stores roughly the same amount of data; this is how Chord achieves load balancing. Routing information is distributed: each node only needs to know the routing information of a small number of nodes in the system, and the query path is obtained by hopping from node to node. A single query generates only O(log_2 n) messages in the ring.
A distributed hash table (DHT) assigns a unique identifier to every node in the network according to a fixed rule. In the Chord protocol, data resources are assigned unique identifiers by the same rule. Chord uses a consistent hashing algorithm (Consistent Hash) to compute the identifiers of nodes and resources: the hash result is taken modulo 2^m to obtain an m-bit identifier in the range [0, 2^m - 1]. For a node, the IP address is unique, so consistent hashing obtains the node identifier by hashing the node's IP address. For data, the identifier is obtained by hashing the key value. The Chord ring for M=2, N4 is shown in Fig. 1(a), where Ni denotes a node and Ki denotes a resource.
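The identifier assignment described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: SHA-1 is assumed as the underlying hash, and the function name `chord_id`, the node address and the data key are hypothetical.

```python
import hashlib

def chord_id(value: str, m: int) -> int:
    """Map a string (node IP address or data key) to an m-bit identifier.

    The digest is interpreted as an integer and reduced modulo 2**m,
    so the result always lies in [0, 2**m - 1].
    """
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

# Hypothetical node address and data key, both mapped into the same space.
node_id = chord_id("192.168.0.7", m=16)
data_id = chord_id("some-key", m=16)
```

Because nodes and data share one identifier space, a data item can be assigned to the first node whose identifier follows the item's identifier on the ring.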
Fast resource location relies mainly on the routing information saved by each node. The data structure of each node contains a routing table that holds the identifiers and address information of a subset of the nodes, as shown in Fig. 1(b).
A Chord lookup can be divided into the following steps:
(1) A node N receives a key value key to be looked up. It first searches its local resources for the key value. If node N holds the key value, the lookup ends and the node's resource is returned; otherwise go to step (2).
(2) Check the finger table of the requested node and find the node whose identifier is smaller than the identifier to which the key value maps and closest to it. Forward the lookup request to that node and repeat step (1).
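The two lookup steps can be simulated in a single process as below. This is a sketch under stated assumptions: the ring is built statically (no joins or failures), `M = 6` and the node identifiers are illustrative values, and the function returns the node responsible for a key rather than the stored resource. All class and function names are hypothetical.

```python
from bisect import bisect_left

M = 6                 # identifier bits (illustrative)
RING = 2 ** M

def in_interval(x, a, b, inclusive_right=False):
    """True if x lies in the ring interval (a, b), or (a, b] if requested."""
    if a < b:
        return a < x < b or (inclusive_right and x == b)
    # The interval wraps past zero (a == b means the whole ring).
    return x > a or x < b or (inclusive_right and x == b)

class Node:
    def __init__(self, ident):
        self.id = ident
        self.fingers = []      # fingers[k] = node succeeding id + 2**k
        self.successor = None

def build_ring(ids):
    ids = sorted(ids)
    nodes = {i: Node(i) for i in ids}

    def succ(k):
        pos = bisect_left(ids, k % RING)
        return nodes[ids[pos % len(ids)]]

    for node in nodes.values():
        node.fingers = [succ(node.id + 2 ** k) for k in range(M)]
        node.successor = node.fingers[0]
    return nodes

def closest_preceding(node, key):
    """Finger with the largest identifier that still precedes key."""
    for finger in reversed(node.fingers):
        if in_interval(finger.id, node.id, key):
            return finger
    return node

def find_successor(node, key, hops=0):
    """Return (responsible node, number of forwarding hops)."""
    if in_interval(key, node.id, node.successor.id, inclusive_right=True):
        return node.successor, hops
    nxt = closest_preceding(node, key)
    if nxt is node:
        return node.successor, hops
    return find_successor(nxt, key, hops + 1)

nodes = build_ring([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
owner, hops = find_successor(nodes[8], 54)
# owner.id is 56, reached after 2 forwarding hops (8 -> 42 -> 51).
```

Each hop jumps to the finger closest to the key without passing it, which is what keeps the number of messages logarithmic in the number of nodes.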
iDistance is a high-dimensional vector indexing method based on metric space. The basic idea of index construction is as follows: several anchor points are chosen in the data space, and each anchor point corresponds to one cluster subset. Every data point in the data space is assigned to the cluster subset of its nearest anchor point. Each high-dimensional vector is then converted, through its distance to the anchor point, into a measurable one-dimensional key value iDist, and a B+-Tree is used to organize the key values iDist of all high-dimensional vectors. The key value is computed as iDist(x) = dist(P_i, x) + i*c. As shown in Fig. 2, P_0, P_1 and P_2 are anchor points; C_i is the distance from P_i to the farthest data point in the data subset of P_i, that is, the radius of that subset; c is a constant greater than every C_i.
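As a small worked illustration of the key formula iDist(x) = dist(P_i, x) + i*c, the sketch below assigns a point to its nearest anchor and computes its one-dimensional key. Euclidean distance, the anchor coordinates and c = 8.0 are assumptions chosen for the example; `idist_key` is a hypothetical name.

```python
import math

def idist_key(x, anchors, c):
    """Return (i, key): index of the nearest anchor and dist(P_i, x) + i*c."""
    i = min(range(len(anchors)), key=lambda k: math.dist(anchors[k], x))
    return i, math.dist(anchors[i], x) + i * c

# Three illustrative anchors in a two-dimensional space.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
c = 8.0   # assumed to exceed every cluster radius C_i

cluster, key = idist_key((10.0, 3.0), anchors, c)
# The point is 3.0 away from anchor 1, so its key is 3.0 + 1 * 8.0 = 11.0.
```

Because c exceeds every radius, the keys of cluster i fall in [i*c, (i+1)*c) and the clusters occupy disjoint segments of the one-dimensional axis.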
Let D be the full data set. Given a similarity range query Range(q, r), the task is to retrieve the set of data points whose distance to the query point q is less than the radius r: Range(q, r) = {x ∈ D | dist(q, x) < r}, where dist(q, x) denotes the distance from vector q to data point x.
The iDistance retrieval process is:
(1) Using the distance to each anchor point P_i, determine whether the search sphere of q intersects the data subset of P_i.
The condition for intersection is: dist(q, P_i) < C_i + r
The condition for no intersection is: dist(q, P_i) > C_i + r
(2) If they do not intersect, the data subset of that anchor point contains no target points. If they intersect, determine the annular range to search:
{x ∈ P_i | max(dist(P_i, q) - r, 0) < dist(P_i, x) < min(dist(P_i, q) + r, C_i)}
(3) Determine the search range of the one-dimensional key value iDist so that it can be searched quickly on the B+-Tree; the data points found enter the candidate set. The search range of the one-dimensional key value iDist is:
{x ∈ P_i | i*c + max(dist(P_i, q) - r, 0) < iDist(x) < i*c + min(dist(P_i, q) + r, C_i)}
(4) Compute the distance between q and each data point in the candidate set; if the distance is less than r, the point enters the final result set.
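Steps (1) to (4) can be sketched as follows. Euclidean distance and the example anchors, radii and constant are assumptions; the boundary case dist(q, P_i) = C_i + r, which the conditions above leave open, is treated here as no intersection. The function names are hypothetical.

```python
import math

def idist_intervals(q, r, anchors, radii, c):
    """Steps (1)-(3): key intervals (i, low, high) for intersected clusters."""
    intervals = []
    for i, (p, ci) in enumerate(zip(anchors, radii)):
        d = math.dist(p, q)
        if d >= ci + r:              # search sphere misses cluster i
            continue
        low = i * c + max(d - r, 0.0)
        high = i * c + min(d + r, ci)
        intervals.append((i, low, high))
    return intervals

def refine(candidates, q, r):
    """Step (4): keep only candidates whose true distance to q is below r."""
    return [x for x in candidates if math.dist(x, q) < r]

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
radii = [4.0, 4.0, 4.0]
c = 8.0

one_cluster = idist_intervals((3.0, 0.0), 2.0, anchors, radii, c)
two_clusters = idist_intervals((5.0, 0.0), 2.0, anchors, radii, c)
```

In this example the first query touches only cluster 0, while the second lies midway between anchors 0 and 1 and yields one key interval per cluster.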
By choosing anchor points, iDistance neatly reduces the problem of indexing high-dimensional vectors to one dimension, and organizes the one-dimensional index with a B+-Tree. Searches are fast and a large number of distance computations are avoided.
M-Chord (M stands for Metric), proposed by Novak D. et al., is a distributed index algorithm for metric spaces. In a distributed P2P network it can not only locate resources (exact-match lookup) but also supports similarity search (range search). M-Chord combines iDistance with Chord: iDistance converts each high-dimensional vector into a one-dimensional key value, a hash function maps the one-dimensional key value into the Chord identifier space, and data are inserted and retrieved through the Chord ring, as shown in Fig. 3. In this way similarity search over high-dimensional vectors is realized in a distributed environment.
When a node in M-Chord receives a range query Range(Q, r), where Q is the query vector and r is the query radius, the process is as follows:
(1) Use iDistance to compute the regions where the range query Range(Q, r) intersects the clusters, and map them to several key value intervals [xi, yi].
(2) Hash xi and yi with the position-preserving hash function h to produce the key value range [h(xi), h(yi)] in the Chord ring. Locate the node responsible for key value h(xi) by querying the routing table. If h(yi) is greater than Key_max, the largest key value of the data stored in that node, send the range [Key_max, h(yi)] to the node's successor. If h(yi) is also greater than the successor's Key_max, continue forwarding the query to the next successor.
(3) Each node that receives the query request checks whether its B+-Tree contains vectors within the key value range. If so, it computes the distance between each such vector and the query vector Q, and returns the vectors whose distance is less than r to the node that originally issued the request.
In M-Chord, the vectors at the edge of a cluster are generally sparse, and these few vectors make the radius of each cluster very large. During a range query, the larger the radius, the more easily the cluster intersects the query region, so the region to be searched grows. This means that whenever the query region intersects a cluster, a resource location must be performed in the Chord ring, no matter how little data lies in the intersection. The number of such location operations, each serving a very small amount of data, increases greatly and therefore reduces the performance of M-Chord.
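The effect of the radius on the number of intersected clusters can be illustrated numerically. The anchors, radii and query below are invented for the example and are not taken from the patent's data; Euclidean distance is assumed.

```python
import math

def intersected(q, r, anchors, radii):
    """Indices of clusters whose sphere intersects the query sphere."""
    return [i for i, (p, ci) in enumerate(zip(anchors, radii))
            if math.dist(p, q) < ci + r]

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
q, r = (5.0, 5.0), 1.0

# Radii inflated by a few sparse edge points versus radii of the dense cores.
with_edges = intersected(q, r, anchors, [6.2, 6.2, 6.2])
dense_only = intersected(q, r, anchors, [3.5, 3.5, 3.5])
```

With the inflated radii the query intersects all three clusters and triggers three location operations; with the dense-core radii it intersects none.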
Fig. 4 shows the data distribution of one cluster obtained by K-means clustering of the color histogram features of 68040 images. The figure shows that the radius of this cluster is 0.62, but most of the data are distributed between 0.09 and 0.35. A very small number of edge data points nearly doubles the radius of the cluster.
Fig. 5 shows the data access frequency in this cluster space under 1000 random range queries. During a range query it is not known in advance whether the query range contains data, so intervals that hold no data are also accessed during retrieval. Comparing Fig. 4 with Fig. 5 shows that edge vectors are rare, yet they are accessed very frequently, mostly with a frequency above 80%.
Summary of the invention
To overcome at least one of the deficiencies of the prior art described above, the present invention provides a distributed index method based on cluster separation, CS-Chord (Clustering separation-Chord). The method reduces the search range on the Chord ring and improves retrieval efficiency.
To solve the above technical problem, the technical solution of the present invention is as follows:
A distributed index method based on cluster separation, comprising the following steps:
Step 1: separate the sparse edge vectors and store them centrally on an independent server;
Step 2: build the distributed index: compute the one-dimensional key value Key(S) of a vector S to be added to the Chord ring and insert the vector into the distributed index. The insertion process is:
(21) If Key(S) ≥ n*C, where n is the number of cluster subspaces and C is a constant greater than every value to which a vector inside an annulus of the iDistance index structure is mapped on the one-dimensional axis, send the key value Key(S) and the vector S to the independent server and insert S into that server's B+-Tree index; insertion of the new vector is then complete. If Key(S) < n*C, go to step (22);
(22) Hash Key(S) with the position-preserving hash function to generate the key value Key_Chord assigned on the Chord ring. Use the Chord location algorithm to find the IP address of the node that should store Key_Chord, send Key_Chord and the vector S to that node, and insert S into the node's B+-Tree index; index construction is then complete;
Step 3: perform range queries on the constructed index. Let Range(Q, r) be a range query of the cluster-separation distributed index method CS-Chord, where Q is the query vector and r is the query radius. The steps are as follows:
(31) Use iDistance to compute the regions where the range query Range(Q, r) intersects the clusters, and map them to several key value intervals [xi, yi];
(32) If xi ≥ n*C, send the intersection regions computed in step (31) to the independent server and go to step (34); if xi < n*C, go to step (33);
(33) Generate the key value range [h(xi), h(yi)] in the Chord ring and locate the node responsible for key value h(xi) by querying the routing table. If h(yi) is greater than Key_max, the largest key value of the data stored in that node, send the range [Key_max, h(yi)] to the node's successor; if h(yi) is still greater than the successor's Key_max, continue forwarding the query to the next successor;
(34) Each node that receives the query request checks whether its B+-Tree (which stores the vectors S that satisfy the conditions above) contains a vector within the key value range. If a vector Z exists, compute its distance to the query vector Q: when the distance is less than the query radius r, return Z to the node that originally issued the request; when the distance is greater than or equal to r, return a null value. If no such vector exists, also return a null value.
Preferably, the specific process of step 1 above, separating the sparse edge vectors and storing them centrally on an independent server, is:
Let R_b denote the boundary between the dense vectors and the sparse edge vectors of a cluster, and let R be the radius of the cluster. The region at distance [0, R_b] from the cluster center is the dense vector region, and the region [R_b, R] is the sparse vector region;
With n cluster spaces, the key value Key(S) of a vector in the sparse vector region is increased by a distance of n*C relative to its original value. Let S be a vector to be added to the Chord ring and let P_i be the center point of the cluster containing S. The key value Key(S) of vector S is then computed as:
\[ \mathrm{Key}(S) = \begin{cases} \mathrm{dist}(P_i, S) + i \cdot C, & \mathrm{dist}(P_i, S) \le R_b \\ \mathrm{dist}(P_i, S) + i \cdot C + n \cdot C, & R_b < \mathrm{dist}(P_i, S) \le R \end{cases} \qquad (1) \]
where 0 ≤ i < n;
Formula (1) separates out the sparse vectors, and the sparse data are stored centrally on an independent server; when the key value of the query vector S satisfies Key(S) ≥ n*C, the query goes directly to the central storage server.
Preferably, in step 2 above, the position-preserving hash function h is defined as follows:
For a data interval [X_min, X_max] that is to be mapped onto the interval [Y_min, Y_max] while preserving the order of the data, suppose X_i ∈ [X_min, X_max] and let Y_i ∈ [Y_min, Y_max] be its value after mapping by h. The hash function is then defined as:
\[ Y_i = h(X_i) = \frac{X_i - X_{min}}{X_{max} - X_{min}} \, (Y_{max} - Y_{min}) + Y_{min} \qquad (2) \]
Because the index key values of CS-Chord lie in the interval [0, K_max] and the identifier space of the Chord ring is [0, 2^m - 1], substituting X_min = 0, X_max = K_max, Y_min = 0, Y_max = 2^m - 1 into formula (2) gives:
\[ h(X_i) = \frac{X_i}{K_{max}} \, (2^m - 1) \qquad (3) \]
Note that the position-preserving hash function exists for the following reason: the usual mapping method first hashes the key value of a data point and then takes the result modulo 2^m as the key value on the Chord ring. That approach, however, maps data points that were originally adjacent onto different nodes, which is undesirable for an index algorithm such as iDistance that relies on contiguous searches. A position-preserving hash function is therefore needed to keep the order of the data consistent.
Compared with the prior art, the beneficial effects of the technical solution of the present invention are:
The invention proposes a distributed index method based on cluster separation, abbreviated CS-Chord (Clustering separation-Chord). In the M-Chord distributed index, the vectors at the edge of a cluster are generally sparse, and these few vectors make the radius of each cluster very large. During a range query, a cluster with a larger radius is more likely to intersect the query region, which enlarges the candidate region to be searched. Moreover, the edge vectors of a cluster are usually the most frequently accessed vectors, which degrades performance further. CS-Chord separates the sparse vectors at the cluster edges and stores them centrally on an independent server, while the dense vectors are stored in the Chord ring. During a lookup, the high-frequency queries are concentrated on the vectors held by the independent server, and the search range on the Chord ring is reduced, which improves retrieval efficiency.
Brief description of the drawings
Fig. 1 is a schematic diagram of Chord.
Fig. 2 is a schematic diagram of iDistance.
Fig. 3 is a schematic diagram of M-Chord.
Fig. 4 is the data distribution of a cluster space.
Fig. 5 is the access frequency diagram of a cluster under random range queries.
Fig. 6 is a schematic diagram of cluster edge separation in two-dimensional space.
Fig. 7 is a schematic diagram of the CS-Chord index.
Fig. 8 is a schematic diagram of a CS-Chord range search.
Specific embodiment
The drawings are for illustration only and shall not be construed as limiting this patent. To better illustrate this embodiment, some components in the drawings may be omitted, enlarged or reduced, and do not represent the size of the actual product.
Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings. The technical solution of the present invention is further described below with reference to the drawings and embodiments.
The cluster-separation distributed index method (CS-Chord) of the present invention separates the sparse vectors at the cluster edges and stores them centrally on an independent server, while the dense vectors are stored in the Chord ring. During a lookup, the high-frequency queries are concentrated on the sparse vectors held by the independent server, and the search range on the Chord ring is reduced, which improves retrieval efficiency. The specific steps are as follows:
Step 1: separation of sparse edge vectors
Edge data are sparse and frequently accessed, so they should be stored centrally, which eliminates the time spent locating resources in the Chord ring. To achieve this, the edge data must first be separated out.
Let R_b denote the boundary between the dense vectors and the sparse vectors of a cluster, and let R be the radius of the cluster. The region at distance [0, R_b] from the cluster center is the dense data region, and the region [R_b, R] is the sparse data region. As shown in Fig. 6, the data of a two-dimensional space are divided into three cluster spaces; the dark gray part of each cluster is the dense data region and the light gray part is the sparse data region. The data of the dense regions are distributed in [0, 3C], and the data of the sparse regions are distributed in [3C, 6C].
Assume there are n cluster spaces in total. The key value (Key) of a vector in the sparse data region is increased by a distance of n*C relative to its original value. Assume that S is a vector to be added to the Chord ring and that P_i is the center point (anchor point) of the cluster containing S. The key value Key(S) of vector S is then computed as:
\[ \mathrm{Key}(S) = \begin{cases} \mathrm{dist}(P_i, S) + i \cdot C, & \mathrm{dist}(P_i, S) \le R_b \\ \mathrm{dist}(P_i, S) + i \cdot C + n \cdot C, & R_b < \mathrm{dist}(P_i, S) \le R \end{cases} \qquad (1) \]
where 0 ≤ i < n.
With formula (1), the sparse data are separated out and placed in one contiguous region of the key axis. Even if all of these data were placed in the Chord ring, they would therefore not cause a large number of resource location operations. However, because sparse data are accessed very frequently, this patent stores them centrally on an independent server. When a query range satisfies Key(S) ≥ n*C, the query goes directly to the central storage server.
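The key assignment described in this step (the original iDistance key, shifted by n*C for vectors in the sparse region) can be sketched as follows. Several points are assumptions made for the example: Euclidean distance, a separate boundary value per cluster in the list `rb`, and treating a vector exactly at distance R_b as dense. The name `cs_chord_key` is hypothetical.

```python
import math

def cs_chord_key(s, anchors, rb, c):
    """One-dimensional key of vector s with sparse edge vectors shifted by n*c.

    anchors: cluster center points P_0 .. P_{n-1}
    rb:      dense/sparse boundary R_b of each cluster (assumed per cluster)
    c:       constant greater than every cluster radius
    """
    n = len(anchors)
    i = min(range(n), key=lambda k: math.dist(anchors[k], s))
    d = math.dist(anchors[i], s)
    key = d + i * c              # original iDistance key
    if d > rb[i]:                # vector lies in the sparse edge region
        key += n * c
    return key

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
rb = [2.0, 2.0, 2.0]
c = 8.0

dense_key = cs_chord_key((1.0, 0.0), anchors, rb, c)    # 1.0, below n*c = 24.0
sparse_key = cs_chord_key((10.0, 3.0), anchors, rb, c)  # 3.0 + 8.0 + 24.0 = 35.0
```

With three clusters, dense keys stay below 3C and sparse keys land at or above 3C, which matches the [0, 3C] and [3C, 6C] ranges described for Fig. 6.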
Step 2: building the distributed index
A node computes the one-dimensional key value Key of vector S by formula (1). The process of inserting the vector into the distributed index is:
(21) If Key ≥ n*C, send the key value Key and the vector S to the independent server and insert the vector into the server's B+-Tree index; insertion of the new vector is then complete. If Key < n*C, go to step (22).
(22) Hash Key with the position-preserving hash function to generate the key value Key_Chord assigned on the Chord ring. Use the Chord location algorithm to find the IP address of the node that should store Key_Chord, send the data point to that node, and insert it into the node's B+-Tree index; index construction is then complete.
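Steps (21) and (22) amount to a routing decision on the key. The sketch below simulates it in one process under several assumptions: a sorted list stands in for each B+-Tree, the Chord location algorithm is replaced by a direct successor search over a known list of node identifiers, the hash is taken to be a linear rescaling truncated to an integer, and `k_max` (the upper end of the key axis) is supplied by the caller. All names are hypothetical.

```python
from bisect import bisect_left, insort

class Store:
    """Stand-in for a B+-Tree: (key, vector) pairs kept sorted by key."""
    def __init__(self):
        self.items = []

    def insert(self, key, vector):
        insort(self.items, (key, vector))

def insert_vector(key, vector, n, c, k_max, m, server, ring_ids, ring_stores):
    """Route one vector to the independent server or to a Chord node."""
    if key >= n * c:                               # step (21): sparse vector
        server.insert(key, vector)
        return "server"
    key_chord = int(key / k_max * (2 ** m - 1))    # step (22): keep key order
    pos = bisect_left(ring_ids, key_chord) % len(ring_ids)
    node_id = ring_ids[pos]                        # first node at or after key
    ring_stores[node_id].insert(key_chord, vector)
    return node_id

n, c, m = 3, 8.0, 6
k_max = 2 * n * c                                  # assumed key axis [0, 2nC]
ring_ids = [10, 20, 40, 60]
server = Store()
ring_stores = {i: Store() for i in ring_ids}

first = insert_vector(35.0, (10.0, 3.0), n, c, k_max, m,
                      server, ring_ids, ring_stores)
second = insert_vector(11.0, (10.0, 3.0), n, c, k_max, m,
                       server, ring_ids, ring_stores)
```

In this run the key 35.0 is at or above n*C = 24.0 and goes to the server, while the key 11.0 is hashed to identifier 14 and stored on node 20.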
The position-preserving hash function h is defined as follows:
For a data interval [X_min, X_max] that is to be mapped onto the interval [Y_min, Y_max] while preserving the order of the data, suppose X_i ∈ [X_min, X_max] and let Y_i ∈ [Y_min, Y_max] be its value after mapping by h. The hash function can then be defined as:
\[ Y_i = h(X_i) = \frac{X_i - X_{min}}{X_{max} - X_{min}} \, (Y_{max} - Y_{min}) + Y_{min} \qquad (2) \]
Because the index key values of CS-Chord lie in the interval [0, K_max] and the identifier space of the Chord ring is [0, 2^m - 1], substituting X_min = 0, X_max = K_max, Y_min = 0, Y_max = 2^m - 1 into formula (2) gives:
\[ h(X_i) = \frac{X_i}{K_{max}} \, (2^m - 1) \qquad (3) \]
Note that the position-preserving hash function exists for the following reason: the usual mapping method first hashes the key value of a data point and then takes the result modulo 2^m as the key value on the Chord ring. That approach, however, maps data points that were originally adjacent onto different nodes, which is undesirable for an index algorithm such as iDistance that relies on contiguous searches. A position-preserving hash function is therefore needed to keep the order of the data consistent.
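A minimal sketch of such a hash, assuming h is the linear rescaling of [X_min, X_max] onto [Y_min, Y_max] that the definitions above imply. Whether the result is rounded to an integer identifier is not stated, so the value is left as a float here; `k_max = 48.0` and `m = 6` are illustrative, and the function names are hypothetical.

```python
def position_hash(x, x_min, x_max, y_min, y_max):
    """Order-preserving linear map of [x_min, x_max] onto [y_min, y_max]."""
    return (x - x_min) / (x_max - x_min) * (y_max - y_min) + y_min

def chord_key(key, k_max, m):
    """Special case for CS-Chord: [0, k_max] onto [0, 2**m - 1]."""
    return position_hash(key, 0.0, k_max, 0.0, 2 ** m - 1)

keys = [1.0, 11.0, 12.0, 24.0, 35.0]
mapped = [chord_key(k, k_max=48.0, m=6) for k in keys]
# The mapped values are in the same order as the original keys.
```

Because neighbouring keys stay neighbours after mapping, a key interval [xi, yi] becomes one contiguous arc [h(xi), h(yi)] of the ring, so a range query visits a run of successive nodes instead of scattered ones.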
Fig. 7 is a schematic diagram of the CS-Chord distributed index construction process in two-dimensional space.
Step 3: range query
Fig. 8 is a schematic diagram of a CS-Chord range query Range(Q, r), where Q is the query vector and r is the query radius. The steps are as follows:
(31) Use iDistance to compute the regions where the range query Range(Q, r) intersects the clusters, and map them to several key value intervals [xi, yi].
(32) If xi ≥ n*C, send the information computed in step (31) to the independent server and go to step (34). If xi < n*C, go to step (33).
(33) Generate the key value range [h(xi), h(yi)] in the Chord ring. Locate the node responsible for key value h(xi) by querying the routing table. If h(yi) is greater than Key_max, the largest key value of the data stored in that node, send the range [Key_max, h(yi)] to the node's successor. If h(yi) is also greater than the successor's Key_max, continue forwarding the query to the next successor.
(34) Each node that receives the query request (both the server and the nodes in the Chord ring) checks whether its B+-Tree contains vectors within the key value range. If so, it computes the distance between each such vector and the query vector Q, and returns the vectors whose distance is less than r to the node that originally issued the request.
As shown in Fig. 8, the query Q intersects both the dense region and the sparse region of cluster P0, and also intersects the sparse region of cluster P1. The mapped one-dimensional key value ranges are [x1, y1], [x2, y2] and [x3, y3]. The interval [x1, y1] is sent to the Chord ring for retrieval, and the intervals [x2, y2] and [x3, y3] are sent to the server for retrieval.
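The routing in steps (32) to (34) can be sketched as three small functions. The sketch makes several assumptions: each Chord node's Key_max is approximated by its own identifier, the ring is a known sorted list of identifiers, a sorted list of (key, vector) pairs stands in for a B+-Tree, distance is Euclidean, and the interval values are invented to mirror the Fig. 8 situation. All names are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right

def route_intervals(intervals, n, c):
    """Step (32): split key intervals between the Chord ring and the server."""
    to_ring, to_server = [], []
    for x, y in intervals:
        (to_server if x >= n * c else to_ring).append((x, y))
    return to_ring, to_server

def nodes_for_range(ring_ids, lo, hi):
    """Step (33): node owning lo, then successors until one covers hi."""
    visited = []
    for node_id in ring_ids[bisect_left(ring_ids, lo):]:
        visited.append(node_id)
        if node_id >= hi:            # this node covers the end of the range
            return visited
    visited.append(ring_ids[0])      # keys past the last node wrap around
    return visited

def local_search(items, lo, hi, q, r):
    """Step (34): scan keys in [lo, hi] and keep vectors closer than r to q."""
    keys = [k for k, _ in items]
    a, b = bisect_left(keys, lo), bisect_right(keys, hi)
    return [v for _, v in items[a:b] if math.dist(v, q) < r]

n, c = 3, 8.0
intervals = [(1.0, 4.0), (26.5, 28.0), (33.0, 36.0)]
to_ring, to_server = route_intervals(intervals, n, c)

ring_ids = [10, 20, 40, 60]
visited = nodes_for_range(ring_ids, 14, 35)

items = [(1.0, (1.0, 0.0)), (3.5, (3.5, 0.0)), (5.0, (0.0, 5.0))]
hits = local_search(items, 1.0, 4.0, (3.0, 0.0), 2.0)
```

Here one interval is sent to the ring and two to the server, as in the Fig. 8 example; on the ring the query for identifiers 14 to 35 visits nodes 20 and 40.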
Obviously, the above embodiments are merely examples given to illustrate the present invention clearly and do not limit its implementation. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (2)

1. A distributed index method based on cluster separation, characterized by comprising the following steps:
Step 1: separate the sparse edge vectors and store them centrally on an independent server;
Step 2: build the distributed index: compute the one-dimensional key value Key(S) of a vector S to be added to the Chord ring and insert the vector into the distributed index, the insertion process being:
(21) If Key(S) ≥ n*C, where n is the number of cluster subspaces and C is a constant greater than every value to which a vector inside an annulus of the iDistance index structure is mapped on the one-dimensional axis, send the key value Key(S) and the vector S to the independent server and insert S into that server's B+-Tree index; insertion of the new vector is then complete. If Key(S) < n*C, go to step (22);
(22) Hash Key(S) with the position-preserving hash function to generate the key value Key_Chord assigned on the Chord ring. Use the Chord location algorithm to find the IP address of the node that should store Key_Chord, send Key_Chord and the vector S to that node, and insert S into the node's B+-Tree index; index construction is then complete;
The position-preserving hash function h is defined as follows:
For a data interval [X_min, X_max] that is to be mapped onto the interval [Y_min, Y_max] while preserving the order of the data, suppose X_i ∈ [X_min, X_max] and let Y_i ∈ [Y_min, Y_max] be its value after mapping by h. The hash function is then defined as:
\[ Y_i = h(X_i) = \frac{X_i - X_{min}}{X_{max} - X_{min}} \, (Y_{max} - Y_{min}) + Y_{min} \qquad (2) \]
Because the index key values of CS-Chord lie in the interval [0, K_max] and the identifier space of the Chord ring is [0, 2^m - 1], substituting X_min = 0, X_max = K_max, Y_min = 0, Y_max = 2^m - 1 into formula (2) gives:
\[ h(X_i) = \frac{X_i}{K_{max}} \, (2^m - 1) \qquad (3) \]
Step 3: perform range queries on the constructed index. Let Range(Q, r) be a range query of the cluster-separation distributed index method CS-Chord, where Q is the query vector and r is the query radius. The steps are as follows:
(31) Use iDistance to compute the regions where the range query Range(Q, r) intersects the clusters, and map them to several key value intervals [xi, yi];
(32) If xi ≥ n*C, send the intersection regions computed in step (31) to the independent server and go to step (34); if xi < n*C, go to step (33);
(33) Generate the key value range [h(xi), h(yi)] in the Chord ring and locate the node responsible for key value h(xi) by querying the routing table. If h(yi) is greater than Key_max, the largest key value of the data stored in that node, send the range [Key_max, h(yi)] to the node's successor; if h(yi) is still greater than the successor's Key_max, continue forwarding the query to the next successor;
(34) Each node that receives the query request checks whether its B+-Tree contains a vector within the key value range. If a vector Z exists, compute its distance to the query vector Q: when the distance is less than the query radius r, return Z to the node that originally issued the request; when the distance is greater than or equal to r, return a null value. If no such vector exists, also return a null value.
2. The distributed index method based on cluster separation according to claim 1, characterized in that the specific process of step 1, separating the sparse edge vectors and storing them centrally on an independent server, is:
Let R_b denote the boundary between the dense vectors and the sparse edge vectors of a cluster, and let R be the radius of the cluster. The region at distance [0, R_b] from the cluster center is the dense vector region, and the region [R_b, R] is the sparse vector region;
With n cluster spaces, the key value Key(S) of a vector in the sparse vector region is increased by a distance of n*C relative to its original value. Let S be a vector to be added to the Chord ring and let P_i be the center point of the cluster containing S. The key value Key(S) of vector S is then computed as:
\[ \mathrm{Key}(S) = \begin{cases} \mathrm{dist}(P_i, S) + i \cdot C, & \mathrm{dist}(P_i, S) \le R_b \\ \mathrm{dist}(P_i, S) + i \cdot C + n \cdot C, & R_b < \mathrm{dist}(P_i, S) \le R \end{cases} \qquad (1) \]
where 0 ≤ i < n;
Formula (1) separates out the sparse vectors, and the sparse data are stored centrally on an independent server; when the key value of the query vector S satisfies Key(S) ≥ n*C, the query goes directly to the central storage server.
CN201610287204.7A 2016-05-03 2016-05-03 A kind of distributed index method that cluster is isolated Expired - Fee Related CN105868414B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610287204.7A CN105868414B (en) 2016-05-03 2016-05-03 A kind of distributed index method that cluster is isolated

Publications (2)

Publication Number Publication Date
CN105868414A CN105868414A (en) 2016-08-17
CN105868414B true CN105868414B (en) 2019-03-26

Family

ID=56630062

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610287204.7A Expired - Fee Related CN105868414B (en) 2016-05-03 2016-05-03 A kind of distributed index method that cluster is isolated

Country Status (1)

Country Link
CN (1) CN105868414B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110120918B (en) * 2019-05-10 2020-05-08 北京邮电大学 Identification analysis method and device
CN111582224A (en) * 2020-05-19 2020-08-25 湖南视觉伟业智能科技有限公司 Face recognition system and method
CN113297331B (en) * 2020-09-27 2022-09-09 阿里云计算有限公司 Data storage method and device and data query method and device
CN116541420B (en) * 2023-07-07 2023-09-15 上海爱可生信息技术股份有限公司 Vector data query method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103729434A (en) * 2013-12-26 2014-04-16 乐视网信息技术(北京)股份有限公司 Distributed index method and distributed index system for video data
CN103744934A (en) * 2013-12-30 2014-04-23 南京大学 Distributed index method based on LSH (Locality Sensitive Hashing)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
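High-dimensional distributed locality-sensitive hashing index method (高维分布式局部敏感哈希索引方法); Lin Zhaohui (林朝晖) et al.; Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》); 2013-05-28; Vol. 7, No. 9; pp. 811-819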

Also Published As

Publication number Publication date
CN105868414A (en) 2016-08-17

Similar Documents

Publication Publication Date Title
CN105868414B (en) A kind of distributed index method that cluster is isolated
US7788400B2 (en) Utilizing proximity information in an overlay network
US7483391B2 (en) Providing a notification including location information for nodes in an overlay network
Li et al. Semantic small world: An overlay network for peer-to-peer search
Shen et al. A distributed spatial-temporal similarity data storage scheme in wireless sensor networks
Doulkeridis et al. Peer-to-peer similarity search in metric spaces
CN110022234B (en) Method for realizing unstructured data sharing mechanism facing edge calculation
CN107656989B (en) Nearest Neighbor based on data distribution perception in cloud storage system
EP1859602B1 (en) Distributed storing of network position information for nodes
CN101719155B (en) Method of multidimensional attribute range inquiry for supporting distributed multi-cluster computing environment
US9292559B2 (en) Data distribution/retrieval using multi-dimensional index
Li et al. A small world overlay network for semantic based search in P2P systems
Shen A P2P-based intelligent resource discovery mechanism in Internet-based distributed systems
Ishi et al. Range-key extension of the skip graph
CN108446356B (en) Data caching method, server and data caching system
CN113010373A (en) Data monitoring method and device, electronic equipment and storage medium
CN112256638A (en) Method for searching limited decentralized distributed hash table resources in CNFS protocol
Shen et al. Combining efficiency, fidelity, and flexibility in resource information services
Lee et al. Supporting similarity range queries efficiently by using reference points in structured p2p overlays
Zhou et al. HDKV: supporting efficient high‐dimensional similarity search in key‐value stores
Jin et al. SCQR-A P2P query routing algorithm based on semantic cluster
Zhang et al. Indexing historical spatio-temporal data in the cloud
Amagata et al. Efficient Multidimensional Top‐k Query Processing in Wireless Multihop Networks
Chen et al. Reverse nearest neighbor search in peer-to-peer systems
Brindha et al. Reliable And Ascendable Content Based Image Retrieval Aproach In Peer To Peer Networks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190326

Termination date: 20190503