CN105868414B - A kind of distributed index method that cluster is isolated - Google Patents
- Publication number
- CN105868414B
- Authority
- CN
- China
- Prior art keywords
- vector
- key
- chord
- cluster
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2272—Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
- G06F16/134—Distributed indices
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention proposes a distributed index method with cluster separation, abbreviated CS-Chord (Clustering Separation-Chord). In the M-Chord distributed index, the vectors at the edge of a cluster are generally sparse, and these few vectors make the radius of each cluster very large. During a range query, a cluster with a larger radius is more likely to intersect the query region, which enlarges the candidate region to be searched. Moreover, the edge vectors of a cluster are usually the most frequently accessed vectors, which degrades performance further. CS-Chord separates the sparse vectors at the cluster edges and stores them centrally on an independent server, while the dense vectors are stored in the Chord ring. During lookup, the high-frequency queries are therefore concentrated on the vectors held by the independent server, and the search range on the Chord ring is reduced, which improves retrieval efficiency.
Description
Technical field
The present invention relates to the field of distributed indexing, and more particularly to a distributed index method with cluster separation.
Background art
A P2P (peer-to-peer) network does not depend on a dedicated centralized server; all nodes in the network are equal and interconnect freely, sharing computing resources by exchanging resources and services. A P2P distributed index structure makes full use of the capacity of every node in the network and offers advantages such as good scalability and high resource utilization. In recent years, distributed indexing has gradually become a research hotspot. M-Chord, proposed by Novak D. et al., is a distributed index algorithm for similarity retrieval of high-dimensional vectors over a P2P network. It combines the IDistance algorithm with the Chord protocol: IDistance is responsible for reducing the dimensionality of the high-dimensional vectors, and Chord is responsible for distributed vector storage and retrieval.
Chord is a structured distributed lookup protocol that locates resources in a P2P network quickly by means of DHT technology. To achieve fast resource lookup, each node on the Chord ring maintains a routing table of length $O(\log_2 n)$, where n is the total number of nodes in the Chord ring. In the Chord protocol, nodes and data are both mapped to m-bit identifiers in the same identifier space, and virtual nodes are introduced so that each node stores a roughly equal amount of data; this is how Chord balances load. The routing tables are decentralized: each node needs to know the routing information of only a small number of nodes in the whole system, and the query path is obtained by hopping from node to node. A single query generates only $O(\log_2 n)$ messages in the ring.
A distributed hash table (DHT) assigns a unique identifier to each node in the network according to a fixed rule. In the Chord protocol, data resources are assigned unique identifiers by the same rule. Chord uses a consistent hashing algorithm (Consistent Hash) to compute the identifiers of nodes and resources: the hash result is taken modulo $2^m$ to obtain an m-bit identifier in the range $[0, 2^m - 1]$. For a node, the IP address is unique, so consistent hashing obtains the node identifier by hashing the node's IP address. For data, the identifier is obtained by hashing its key value. A Chord ring with m = 2 and N = 4 is shown in Fig. 1(a), where Ni denotes a node and Ki denotes a resource.
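The identifier mapping described above can be sketched in a few lines of Python. This is only an illustration: the text names "consistent hashing" without fixing a hash function, so the use of SHA-1 is an assumption, and `chord_id` is a hypothetical helper name.

```python
import hashlib


def chord_id(value: str, m: int) -> int:
    """Map a node address or a data key to an m-bit identifier in [0, 2**m - 1]."""
    # SHA-1 is an assumed choice; any well-mixed hash would serve the sketch.
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


node_identifier = chord_id("192.168.0.1", m=8)    # node: hash of its IP address
data_identifier = chord_id("some-key-value", m=8)  # data: hash of its key value
```

Because nodes and data share the same identifier space, the same function serves both.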
Fast resource location relies mainly on the routing information saved by each node. The data structure of each node contains a routing table that stores the data and address information of some of the other nodes, as shown in Fig. 1(b).
A Chord lookup can be divided into the following steps:
(1) A node N receives the key value key to be looked up and first searches its local resources for that key value. If node N holds the key value, the lookup ends and the node's resource is returned; otherwise go to step (2).
(2) Check the pointer table of the requested node, find the node whose identifier is less than the identifier mapped from the key value and closest to it, forward the lookup request to that node, and repeat step (1).
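The two lookup steps can be illustrated with a toy, single-process ring. This is a sketch under stated assumptions rather than the protocol itself: it follows the standard Chord convention that a key is stored on the first node at or after its identifier, it builds all pointer tables from a global node list, and the names `ChordRing`, `owns` and `next_hop` are hypothetical.

```python
from bisect import bisect_left


def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b]; a == b means the whole ring."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


class ChordRing:
    """Toy Chord ring without joins, failures or network calls."""

    def __init__(self, m, node_ids):
        self.size = 2 ** m
        self.nodes = sorted(node_ids)
        # Pointer k of node n is the first node at or after n + 2**k on the ring.
        self.fingers = {
            n: [self.successor((n + 2 ** k) % self.size) for k in range(m)]
            for n in self.nodes
        }

    def successor(self, ident):
        i = bisect_left(self.nodes, ident)
        return self.nodes[i % len(self.nodes)]

    def predecessor(self, node):
        return self.nodes[self.nodes.index(node) - 1]

    def owns(self, node, key):
        # Step (1): a node holds the keys in (predecessor, node].
        return in_half_open(key, self.predecessor(node), node)

    def next_hop(self, node, key):
        # Step (2): forward to the pointer that most closely precedes the key.
        succ = self.fingers[node][0]
        if in_half_open(key, node, succ):
            return succ
        for f in reversed(self.fingers[node]):
            if f != key and in_half_open(f, node, key):
                return f
        return succ

    def lookup(self, start, key):
        """Return the node responsible for key and the path of visited nodes."""
        key %= self.size
        node, path = start, [start]
        while not self.owns(node, key):
            node = self.next_hop(node, key)
            path.append(node)
        return node, path


ring = ChordRing(m=3, node_ids=[0, 1, 3])
owner, path = ring.lookup(start=0, key=2)
```

With nodes 0, 1 and 3 on an 8-identifier ring, a lookup for key 2 that starts at node 0 is forwarded to node 1 and ends at node 3.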
IDistance is a high-dimensional vector indexing method based on metric space. The basic idea of building the index is as follows: several anchor points are chosen in the whole data space, and each anchor point corresponds to one cluster subset. Every data point in the data space is assigned to the cluster subset of its nearest anchor point. Each high-dimensional vector is then converted, through its distance to the anchor point, into a measurable one-dimensional key value iDist, and a B+-Tree organizes the iDist key values of all high-dimensional vectors. The key value is computed as $iDist(x) = dist(P_i, x) + i \cdot c$. As shown in Fig. 2, $P_0$, $P_1$ and $P_2$ are anchor points; $C_i$ is the distance from $P_i$ to the farthest data point in its data subset, i.e. the radius of the data subset of $P_i$; c is a constant greater than every $C_i$.
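As a small worked example of the key formula, the following sketch computes iDist for one point. The anchor coordinates and the constant c = 5 are made-up values chosen only for illustration, and `idist_key` is a hypothetical helper name.

```python
import math


def idist_key(x, anchors, c):
    """Return (i, iDist) for point x, where i is the index of the nearest anchor."""
    i = min(range(len(anchors)), key=lambda j: math.dist(anchors[j], x))
    return i, math.dist(anchors[i], x) + i * c


anchors = [(0.0, 0.0), (10.0, 0.0)]            # hypothetical anchor points P0, P1
print(idist_key((9.0, 0.0), anchors, c=5.0))   # (1, 6.0)
```

The point (9, 0) is nearest to P1 at distance 1, so its key is 1 + 1 * 5 = 6, which places it in the key band reserved for cluster 1.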
Let the full data set be D. A similarity range query Range(q, r) retrieves the set of data points whose distance to the data point q is less than the radius r: $Range(q, r) = \{x \in D \mid dist(q, x) < r\}$, where the function dist(q, x) denotes the distance from vector q to data point x.
The retrieval process of IDistance is:
(1) Using the distance to each anchor point $P_i$, determine whether the search circle of q intersects the data subset of $P_i$.
The condition for intersection is: $dist(q, P_i) < C_i + r$
The condition for non-intersection is: $dist(q, P_i) > C_i + r$
(2) If they do not intersect, the data subset of that anchor point contains no target point. If they intersect, determine the ring to be searched. The ring to be searched is:
$\{x \in P_i \mid \max(dist(P_i, q) - r, 0) < dist(P_i, x) < \min(dist(P_i, q) + r, C_i)\}$
(3) Determine the search range of the one-dimensional key value iDist so that the B+ tree can be searched quickly; the data points found enter the candidate set. The search range of the one-dimensional key value iDist is:
$\{x \in P_i \mid i \cdot c + \max(dist(P_i, q) - r, 0) < iDist(x) < i \cdot c + \min(dist(P_i, q) + r, C_i)\}$
(4) Compute the distance between q and each data point in the candidate set; if the distance is less than r, the point enters the final result set.
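The four retrieval steps can be sketched as follows. This is a minimal illustration, not the original implementation: a sorted Python list searched with `bisect` stands in for the B+-Tree, the key interval is treated as closed so that points lying exactly on a ring boundary are not lost, and the function names are hypothetical.

```python
import bisect
import math


def build_index(points, anchors, c):
    """Assign each point to its nearest anchor and sort the points by iDist key."""
    radii = [0.0] * len(anchors)
    entries = []
    for x in points:
        i = min(range(len(anchors)), key=lambda j: math.dist(anchors[j], x))
        d = math.dist(anchors[i], x)
        radii[i] = max(radii[i], d)            # C_i: distance to the farthest point
        entries.append((d + i * c, x))
    entries.sort()
    keys = [k for k, _ in entries]
    return keys, [x for _, x in entries], radii


def range_search(index, anchors, c, q, r):
    """Return all indexed points whose distance to q is less than r."""
    keys, points, radii = index
    result = []
    for i, (p, c_i) in enumerate(zip(anchors, radii)):
        d = math.dist(p, q)
        if d >= c_i + r:                       # step (1): no intersection
            continue
        lo = i * c + max(d - r, 0.0)           # steps (2)-(3): ring -> key interval
        hi = i * c + min(d + r, c_i)
        start = bisect.bisect_left(keys, lo)
        stop = bisect.bisect_right(keys, hi)
        for x in points[start:stop]:           # step (4): filter the candidate set
            if math.dist(q, x) < r:
                result.append(x)
    return result
```

Only the key interval of an intersecting cluster is scanned, so the distance computation in step (4) runs on the candidate set rather than on the whole data set.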
By choosing anchor points, IDistance neatly reduces the index problem for high-dimensional vectors to one dimension and organizes the one-dimensional index with a B+ tree. Search is fast, and a large number of distance computations are saved.
M-Chord (M stands for Metric), proposed by Novak D. et al., is a distributed index algorithm for metric spaces. In a distributed P2P network it can not only locate resources (exact-match lookup) but also supports similarity search (range search). The algorithm combines IDistance with the Chord protocol: IDistance converts each high-dimensional vector into a one-dimensional key value, a hash function maps the one-dimensional key value into the identifier space of Chord, and data is inserted and retrieved through the Chord ring, as shown in Fig. 3. In this way M-Chord achieves similarity search over high-dimensional vectors in a distributed environment.
When a node receives a range query Range(Q, r), where Q is the query vector and r is the query radius, the M-Chord algorithm proceeds as follows:
(1) Compute the regions where Range(Q, r) intersects the clusters using IDistance, and map them to several key value intervals $[x_i, y_i]$.
(2) Hash $x_i$ and $y_i$ with the position-preserving hash function h to generate the key value range $[h(x_i), h(y_i)]$ in the Chord ring.
(3) Locate the node holding key value $h(x_i)$ by looking up the routing table. If $h(y_i)$ is greater than $Key_{max}$, the maximum key value of the data stored on that node, send the range $[Key_{max}, h(y_i)]$ to the node's successor. If $h(y_i)$ is also greater than the successor's $Key_{max}$, keep forwarding the query to the next successor.
(4) Each node that receives the query request (including the nodes on the server and in the Chord ring) searches the B+-Tree of that node for vectors within the key value range. If such vectors exist, their distance to the query vector Q is computed, and the vectors whose distance is less than r are returned to the node that originally sent the request.
The vectors at the edge of an M-Chord cluster are generally sparse, and these few vectors make the radius of each cluster very large. During a range query, the larger the radius, the more likely the cluster is to intersect the query region, so the region to be searched grows. This means that whenever the query region intersects a cluster, a data location operation must be performed in the Chord ring, no matter how little data the intersection contains. The number of resource location operations performed in the Chord ring for this tiny amount of data increases greatly, which degrades the performance of M-Chord.
Fig. 4 shows the data distribution of one cluster obtained by K-means clustering of the color histogram features of 68,040 images. The figure shows that the radius of this cluster is 0.62, while most of the data lies between 0.09 and 0.35. A very small number of edge data points therefore nearly doubles the radius of the cluster.
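The effect of the inflated radius can be checked numerically with the intersection test $dist(q, P_i) < C_i + r$. The sketch below uses the two radii quoted for Fig. 4 (0.62 for the full cluster and 0.35 for the dense part); the query distance 0.8 and the query radius 0.3 are hypothetical values chosen only for illustration.

```python
def intersects(dist_q_to_anchor, cluster_radius, r):
    """IDistance test: the query circle intersects the cluster."""
    return dist_q_to_anchor < cluster_radius + r


d, r = 0.8, 0.3                   # hypothetical query position and radius
print(intersects(d, 0.62, r))     # True: full radius, the cluster must be searched
print(intersects(d, 0.35, r))     # False: dense radius only, the cluster is skipped
```

With the full radius the query triggers a lookup in the ring even though it can only reach the thinly populated edge of the cluster.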
Fig. 5 shows the data access frequency in this cluster space under 1000 random range searches. Because a range query cannot know in advance whether the query range contains data, intervals that contain no data are also accessed during retrieval. Comparing Fig. 4 with Fig. 5 shows that edge vectors are rare, yet the frequency with which they are accessed is quite high, generally above 80%.
Summary of the invention
To overcome at least one of the defects (deficiencies) of the prior art described above, the present invention provides a distributed index method with cluster separation, CS-Chord (Clustering Separation-Chord), which reduces the search range on the Chord ring and improves retrieval efficiency.
To solve the above technical problem, the technical solution of the present invention is as follows:
A distributed index method with cluster separation, comprising the following steps:
Step 1: separate the edge sparse vectors and store them centrally on an independent server;
Step 2: build the distributed index: compute the one-dimensional key value Key(S) of the edge sparse vector S to be added to the Chord ring, and insert the vector into the distributed index. The insertion proceeds as follows:
(21) If $Key(S) \ge n \cdot C$, where n is the number of cluster subspaces and C is a constant greater than every value obtained by mapping the vectors within a ring of the IDistance index structure onto the one-dimensional axis, send the key value Key(S) and the vector S to the independent server and insert S into the B+-Tree index of that server; the insertion of the new vector is then complete. If $Key(S) < n \cdot C$, go to step (22);
(22) Hash Key(S) with the position-preserving hash function to generate $Key_{Chord}$, the key value assigned on the Chord ring. Use the Chord location algorithm to find the IP address of the node that should store $Key_{Chord}$, send $Key_{Chord}$ and the vector S to that node, and insert S into the B+-Tree index of the node; the index is then built;
Step 3: perform range queries on the constructed index. Let Range(Q, r) be a range query of the cluster-separation distributed index method CS-Chord, where Q is the query vector and r is the query radius. The steps are as follows:
(31) Compute the regions where Range(Q, r) intersects the clusters using IDistance, and map them to several key value intervals $[x_i, y_i]$;
(32) If $x_i \ge n \cdot C$, send the intersection of Range(Q, r) and the clusters computed in step (31) to the independent server and go to step (34); if $x_i < n \cdot C$, go to step (33);
(33) Generate the key value range $[h(x_i), h(y_i)]$ in the Chord ring and locate the node holding key value $h(x_i)$ by looking up the routing table. If $h(y_i)$ is greater than $Key_{max}$, the maximum key value of the data stored on that node, send the range $[Key_{max}, h(y_i)]$ to the node's successor; if $h(y_i)$ is still greater than the successor's $Key_{max}$, keep forwarding the query to the next successor;
(34) Each node that receives the query request searches the B+-Tree of that node (which stores the S vectors that satisfy the condition) for vectors within the key value range. If a vector Z exists, its distance to the query vector Q is computed; when the distance is less than the query radius r, Z is returned to the node that originally sent the request, and when the distance is greater than or equal to r, a null value is returned. If no vector exists, a null value is also returned.
Preferably, the detailed process of Step 1 above, separating the edge sparse vectors and storing the edge sparse vector data centrally on an independent server, is as follows:
Let the separation point between the dense vectors and the edge sparse vectors of a cluster be $R_b$, and let the radius of the cluster be R. The region whose distance from the cluster center lies in $[0, R_b]$ is the dense vector area, and the region in $[R_b, R]$ is the sparse vector area;
With n cluster spaces, the key value Key(S) of the sparse vector area is computed by adding a distance of $n \cdot C$ to the original value. Let vector S be a vector to be added to the Chord ring and let $P_i$ be the center point of the cluster containing S. The key value Key(S) of vector S is then computed as:
$$Key(S) = \begin{cases} dist(P_i, S) + i \cdot C, & dist(P_i, S) \le R_b \\ dist(P_i, S) + i \cdot C + n \cdot C, & R_b < dist(P_i, S) \le R \end{cases} \qquad (1)$$
where $0 \le i < n$;
Formula (1) separates the sparse vectors, and the sparse data is stored centrally on an independent server. When the key value of the query vector S satisfies $Key(S) \ge n \cdot C$, the central storage server is queried directly.
Preferably, in Step 2 above, the position-preserving hash function h is defined as follows:
A data interval $[X_{min}, X_{max}]$ is to be mapped onto the interval $[Y_{min}, Y_{max}]$ while preserving the order of the data. Assume $X_i \in [X_{min}, X_{max}]$ and that its value after mapping by the function h is $Y_i \in [Y_{min}, Y_{max}]$. The hash function is then defined as:
$$Y_i = h(X_i) = \frac{X_i - X_{min}}{X_{max} - X_{min}} \cdot (Y_{max} - Y_{min}) + Y_{min} \qquad (2)$$
Because the index key values of CS-Chord lie in $[0, K_{max}]$ and the identifier space of the Chord ring is $[0, 2^m - 1]$, substituting $X_{min} = 0$, $X_{max} = K_{max}$, $Y_{min} = 0$ and $Y_{max} = 2^m - 1$ into formula (2) gives:
$$h(X_i) = \frac{X_i}{K_{max}} \cdot (2^m - 1) \qquad (3)$$
The reason a position-preserving hash function is needed is as follows. The common mapping method first hashes the key value of a data point and then takes the result modulo $2^m$ as the key value on the Chord ring. That approach maps originally adjacent data points onto different nodes, which is undesirable for an index algorithm such as IDistance that needs to search contiguous key ranges. A position-preserving hash function is therefore required so that the order of the data is preserved.
Compared with the prior art, the beneficial effects of the technical solution of the present invention are:
The invention proposes a distributed index method with cluster separation, abbreviated CS-Chord (Clustering Separation-Chord). In the M-Chord distributed index, the vectors at the edge of a cluster are generally sparse, and these few vectors make the radius of each cluster very large. During a range query, a cluster with a larger radius is more likely to intersect the query region, which enlarges the candidate region to be searched. Moreover, the edge vectors of a cluster are usually the most frequently accessed vectors, which degrades performance further. CS-Chord separates the sparse vectors at the cluster edges and stores them centrally on an independent server, while the dense vectors are stored in the Chord ring. During lookup, the high-frequency queries are therefore concentrated on the vectors held by the independent server, and the search range on the Chord ring is reduced, which improves retrieval efficiency.
Brief description of the drawings
Fig. 1 is a schematic diagram of Chord.
Fig. 2 is a schematic diagram of IDistance.
Fig. 3 is a schematic diagram of M-Chord.
Fig. 4 is the data distribution of a cluster space.
Fig. 5 is the access frequency diagram of a randomly chosen cluster.
Fig. 6 is a schematic diagram of cluster edge separation in a two-dimensional space.
Fig. 7 is a schematic diagram of the CS-Chord index.
Fig. 8 is a schematic diagram of a CS-Chord range search.
Detailed description of the embodiments
The drawings are for illustrative purposes only and shall not be construed as limiting this patent. To better illustrate this embodiment, certain components in the drawings may be omitted, enlarged or reduced, and they do not represent the size of the actual product.
Those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings. The technical solution of the present invention is further described below with reference to the drawings and embodiments.
The distributed index method with cluster separation of the present invention (CS-Chord) separates the sparse vectors at the cluster edges and stores them centrally on an independent server, while the dense vectors are stored in the Chord ring. During lookup, the high-frequency queries are therefore concentrated on the vectors held by the independent server, and the search range on the Chord ring is reduced, which improves retrieval efficiency. The specific steps are as follows:
Step 1: separation of edge sparse vectors
Edge data is sparse but frequently accessed, so it should be stored centrally to eliminate the time spent locating resources in the Chord ring. To achieve this, the edge sparse vectors must first be separated from the dense vectors.
Let the separation point between the dense vectors and the sparse vectors of a cluster be $R_b$, and let the radius of the cluster be R. The region whose distance from the cluster center lies in $[0, R_b]$ is the dense data area, and the region in $[R_b, R]$ is the sparse data area. As shown in Fig. 6, the data of a two-dimensional space is divided into three cluster spaces; the dark gray part of each cluster is the dense data area and the light gray part is the sparse data area. The data of the dense areas is distributed in [0, 3C], and the data of the sparse areas is distributed in [3C, 6C].
Assume there are n cluster spaces. The key value (Key) of the sparse data area is computed by adding a distance of $n \cdot C$ to the original value. Let vector S be a vector to be added to the Chord ring and let $P_i$ be the center point (anchor point) of the cluster containing S. The key value Key(S) of vector S is then computed as:
$$Key(S) = \begin{cases} dist(P_i, S) + i \cdot C, & dist(P_i, S) \le R_b \\ dist(P_i, S) + i \cdot C + n \cdot C, & R_b < dist(P_i, S) \le R \end{cases} \qquad (1)$$
where $0 \le i < n$.
Formula (1) separates the sparse data and places it in one contiguous region. Even if all of this data were put into the Chord ring, it would therefore not cause a large number of resource location operations. However, because the sparse data is accessed frequently, this patent stores it centrally on an independent server. When the query range satisfies $Key(S) \ge n \cdot C$, the central storage server is queried directly.
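The key computation of Step 1 can be sketched as follows, assuming that a vector at distance exactly $R_b$ counts as dense and that each cluster may have its own separation point. The function name `cs_chord_key` and all numeric values are hypothetical.

```python
import math


def cs_chord_key(s, anchors, boundaries, c):
    """Key(S): the IDistance key, shifted by n*c when S lies in the sparse area.

    boundaries[i] is the separation point R_b of cluster i (assumed per cluster).
    """
    n = len(anchors)
    i = min(range(n), key=lambda j: math.dist(anchors[j], s))
    d = math.dist(anchors[i], s)
    key = d + i * c
    if d > boundaries[i]:          # sparse area: move the key past all dense keys
        key += n * c
    return key


anchors = [(0.0, 0.0), (10.0, 0.0)]       # hypothetical anchor points
boundaries = [1.0, 1.0]                   # hypothetical R_b values
dense_key = cs_chord_key((9.5, 0.0), anchors, boundaries, c=5.0)
sparse_key = cs_chord_key((8.0, 0.0), anchors, boundaries, c=5.0)
goes_to_server = sparse_key >= len(anchors) * 5.0
```

Because every dense key is below n*c and every sparse key is at or above it, a single comparison decides whether a vector belongs on the independent server.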
Step 2: building the distributed index
A node computes the one-dimensional key value Key of vector S with formula (1). The vector is inserted into the distributed index as follows:
(21) If $Key \ge n \cdot C$, send the key value Key and the vector S to the independent server and insert them into the B+-Tree index of the server; the insertion of the new vector is then complete. If $Key < n \cdot C$, go to step (22).
(22) Hash Key with the position-preserving hash function to generate $Key_{Chord}$, the key value assigned on the Chord ring. Use the Chord location algorithm to find the IP address of the node that should store $Key_{Chord}$. Send the information of the data point to that node and insert it into the B+-Tree index of the node; the index is then built.
The position-preserving hash function h is defined as follows:
A data interval $[X_{min}, X_{max}]$ is to be mapped onto the interval $[Y_{min}, Y_{max}]$ while preserving the order of the data. Assume $X_i \in [X_{min}, X_{max}]$ and that its value after mapping by the function h is $Y_i \in [Y_{min}, Y_{max}]$. The hash function can then be defined as:
$$Y_i = h(X_i) = \frac{X_i - X_{min}}{X_{max} - X_{min}} \cdot (Y_{max} - Y_{min}) + Y_{min} \qquad (2)$$
Because the index key values of CS-Chord lie in $[0, K_{max}]$ and the identifier space of the Chord ring is $[0, 2^m - 1]$, substituting $X_{min} = 0$, $X_{max} = K_{max}$, $Y_{min} = 0$ and $Y_{max} = 2^m - 1$ into formula (2) gives:
$$h(X_i) = \frac{X_i}{K_{max}} \cdot (2^m - 1) \qquad (3)$$
The reason a position-preserving hash function is needed is as follows. The common mapping method first hashes the key value of a data point and then takes the result modulo $2^m$ as the key value on the Chord ring. That approach maps originally adjacent data points onto different nodes, which is undesirable for an index algorithm such as IDistance that needs to search contiguous key ranges. A position-preserving hash function is therefore required so that the order of the data is preserved.
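A minimal sketch of such a function is given below. It assumes that the hash is the linear min-max mapping implied by the substitution of $X_{min}$, $X_{max}$, $Y_{min}$ and $Y_{max}$ described above, and that the result is truncated to an integer identifier; both points, as well as the function names, are assumptions made for illustration.

```python
def position_preserving_hash(x, x_min, x_max, y_min, y_max):
    """Linear, order-preserving map from [x_min, x_max] onto [y_min, y_max]."""
    return (x - x_min) / (x_max - x_min) * (y_max - y_min) + y_min


def chord_key(key, k_max, m):
    """Map an index key in [0, k_max] to an integer identifier in [0, 2**m - 1]."""
    return int(position_preserving_hash(key, 0.0, k_max, 0, 2 ** m - 1))


keys = [0.0, 2.5, 2.6, 5.0, 10.0]
identifiers = [chord_key(k, k_max=10.0, m=8) for k in keys]
# Neighbouring keys such as 2.5 and 2.6 receive neighbouring identifiers,
# so a contiguous key range maps to a contiguous arc of the ring.
```

Unlike a cryptographic hash followed by a modulo, this mapping never reorders keys, which is what allows a key interval to be served by one node or by a short run of successive nodes.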
Fig. 7 is a schematic diagram of the process of building the CS-Chord distributed index in a two-dimensional space.
Step 3: range query
Fig. 8 is a schematic diagram of the CS-Chord range query Range(Q, r), where Q is the query vector and r is the query radius. The steps are as follows:
(31) Compute the regions where Range(Q, r) intersects the clusters using IDistance, and map them to several key value intervals $[x_i, y_i]$.
(32) If $x_i \ge n \cdot C$, send the information computed in step (31) to the independent server and go to step (34). If $x_i < n \cdot C$, go to step (33).
(33) Generate the key value range $[h(x_i), h(y_i)]$ in the Chord ring. Locate the node holding key value $h(x_i)$ by looking up the routing table. If $h(y_i)$ is greater than $Key_{max}$, the maximum key value of the data stored on that node, send the range $[Key_{max}, h(y_i)]$ to the node's successor. If $h(y_i)$ is also greater than the successor's $Key_{max}$, keep forwarding the query to the next successor.
(34) Each node that receives the query request (including the nodes on the server and in the Chord ring) searches the B+-Tree of that node for vectors within the key value range. If such vectors exist, their distance to the query vector Q is computed, and the vectors whose distance is less than r are returned to the node that originally sent the request.
As shown in Fig. 8, the query Q intersects both the dense area and the sparse area of cluster P0, and it intersects the sparse area of cluster P1. The mapped one-dimensional key value ranges are [x1, y1], [x2, y2] and [x3, y3]. The interval [x1, y1] is sent to the Chord ring for retrieval, and the intervals [x2, y2] and [x3, y3] are sent to the server for retrieval.
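Steps (21) to (22) and (31) to (34) can be tied together in a toy, single-process model. This is a sketch under several assumptions that go beyond the text: plain Python lists stand in for the B+-Trees, each ring node is taken to store the identifiers up to its own id, $K_{max}$ is taken to be n*C because only dense keys reach the ring, the hash is a linear order-preserving map truncated to an integer, and each cluster's intersection is split into a dense and a sparse key interval so that no interval straddles n*C. The class name `CSChord` and all numeric values are hypothetical.

```python
import bisect
import math


class CSChord:
    """Toy model of CS-Chord: ring nodes plus one independent server."""

    def __init__(self, anchors, boundaries, radii, c, m, node_ids):
        self.anchors, self.boundaries, self.radii = anchors, boundaries, radii
        self.c, self.m, self.n = c, m, len(anchors)
        self.node_ids = sorted(node_ids)
        self.ring = {nid: [] for nid in self.node_ids}   # node id -> [(key, vector)]
        self.server = []                                  # [(key, vector)]

    def h(self, key):
        # Position-preserving hash with the assumed K_max = n * c.
        return int(key / (self.n * self.c) * (2 ** self.m - 1))

    def key(self, s):
        i = min(range(self.n), key=lambda j: math.dist(self.anchors[j], s))
        d = math.dist(self.anchors[i], s)
        return d + i * self.c + (self.n * self.c if d > self.boundaries[i] else 0)

    def nodes_for(self, hx, hy):
        """Ring nodes covering identifiers hx..hy, following successors."""
        i = bisect.bisect_left(self.node_ids, hx)
        out = []
        while True:
            node = self.node_ids[i % len(self.node_ids)]
            out.append(node)
            if i >= len(self.node_ids) or node >= hy:
                return out
            i += 1

    def insert(self, s):
        k = self.key(s)
        if k >= self.n * self.c:                 # step (21): sparse -> server
            self.server.append((k, s))
            return "server"
        node = self.nodes_for(self.h(k), self.h(k))[0]   # step (22): dense -> ring
        self.ring[node].append((k, s))
        return node

    def range_query(self, q, r):
        hits, contacted = [], set()
        for i, p in enumerate(self.anchors):
            d = math.dist(p, q)
            r_b, big_r = self.boundaries[i], self.radii[i]
            # Step (31): one key interval for the dense part, one for the sparse part.
            parts = [(max(d - r, 0.0), min(d + r, r_b), i * self.c),
                     (max(d - r, r_b), min(d + r, big_r), (self.n + i) * self.c)]
            for lo, hi, offset in parts:
                if lo > hi:
                    continue
                x, y = offset + lo, offset + hi
                if x >= self.n * self.c:         # step (32): query the server
                    stores = [("server", self.server)]
                else:                            # step (33): walk the ring
                    stores = [(nid, self.ring[nid])
                              for nid in self.nodes_for(self.h(x), self.h(y))]
                for name, store in stores:       # step (34): scan and filter
                    contacted.add(name)
                    hits += [v for k, v in store
                             if x <= k <= y and math.dist(v, q) < r]
        return hits, contacted
```

In this model, a query that touches only the sparse edge of a cluster is answered by the server alone, without any routing on the ring, which is the saving the method aims for.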
Obviously, the above embodiments of the present invention are merely examples given to illustrate the invention clearly and do not limit its embodiments. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list all embodiments exhaustively here. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the protection scope of the claims of the present invention.
Claims (2)
1. A distributed index method with cluster separation, characterized by comprising the following steps:
Step 1: separate the edge sparse vectors and store them centrally on an independent server;
Step 2: build the distributed index: compute the one-dimensional key value Key(S) of the edge sparse vector S to be added to the Chord ring, and insert the vector into the distributed index. The insertion proceeds as follows:
(21) If $Key(S) \ge n \cdot C$, where n is the number of cluster subspaces and C is a constant greater than every value obtained by mapping the vectors within a ring of the IDistance index structure onto the one-dimensional axis, send the key value Key(S) and the vector S to the independent server and insert S into the B+-Tree index of that server; the insertion of the new vector is then complete. If $Key(S) < n \cdot C$, go to step (22);
(22) Hash Key(S) with the position-preserving hash function to generate $Key_{Chord}$, the key value assigned on the Chord ring. Use the Chord location algorithm to find the IP address of the node that should store $Key_{Chord}$, send $Key_{Chord}$ and the vector S to that node, and insert S into the B+-Tree index of the node; the index is then built;
The position-preserving hash function h is defined as follows:
A data interval $[X_{min}, X_{max}]$ is to be mapped onto the interval $[Y_{min}, Y_{max}]$ while preserving the order of the data. Assume $X_i \in [X_{min}, X_{max}]$ and that its value after mapping by the function h is $Y_i \in [Y_{min}, Y_{max}]$. The hash function is then defined as:
$$Y_i = h(X_i) = \frac{X_i - X_{min}}{X_{max} - X_{min}} \cdot (Y_{max} - Y_{min}) + Y_{min} \qquad (2)$$
Because the index key values of CS-Chord lie in $[0, K_{max}]$ and the identifier space of the Chord ring is $[0, 2^m - 1]$, substituting $X_{min} = 0$, $X_{max} = K_{max}$, $Y_{min} = 0$ and $Y_{max} = 2^m - 1$ into formula (2) gives:
$$h(X_i) = \frac{X_i}{K_{max}} \cdot (2^m - 1) \qquad (3)$$
Step 3: perform range queries on the constructed index. Let Range(Q, r) be a range query of the cluster-separation distributed index method CS-Chord, where Q is the query vector and r is the query radius. The steps are as follows:
(31) Compute the regions where Range(Q, r) intersects the clusters using IDistance, and map them to several key value intervals $[x_i, y_i]$;
(32) If $x_i \ge n \cdot C$, send the intersection of Range(Q, r) and the clusters computed in step (31) to the independent server and go to step (34); if $x_i < n \cdot C$, go to step (33);
(33) Generate the key value range $[h(x_i), h(y_i)]$ in the Chord ring and locate the node holding key value $h(x_i)$ by looking up the routing table. If $h(y_i)$ is greater than $Key_{max}$, the maximum key value of the data stored on that node, send the range $[Key_{max}, h(y_i)]$ to the node's successor; if $h(y_i)$ is still greater than the successor's $Key_{max}$, keep forwarding the query to the next successor;
(34) Each node that receives the query request searches the B+-Tree of that node for vectors within the key value range. If a vector Z exists, its distance to the query vector Q is computed; when the distance is less than the query radius r, Z is returned to the node that originally sent the request, and when the distance is greater than or equal to r, a null value is returned. If no vector exists, a null value is also returned.
2. The distributed index method with cluster separation according to claim 1, characterized in that the detailed process of Step 1 above, separating the edge sparse vectors and storing the edge sparse vector data centrally on an independent server, is as follows:
Let the separation point between the dense vectors and the edge sparse vectors of a cluster be $R_b$, and let the radius of the cluster be R. The region whose distance from the cluster center lies in $[0, R_b]$ is the dense vector area, and the region in $[R_b, R]$ is the sparse vector area;
With n cluster spaces, the key value Key(S) of the sparse vector area is computed by adding a distance of $n \cdot C$ to the original value. Let vector S be a vector to be added to the Chord ring and let $P_i$ be the center point of the cluster containing S. The key value Key(S) of vector S is then computed as:
$$Key(S) = \begin{cases} dist(P_i, S) + i \cdot C, & dist(P_i, S) \le R_b \\ dist(P_i, S) + i \cdot C + n \cdot C, & R_b < dist(P_i, S) \le R \end{cases} \qquad (1)$$
where $0 \le i < n$;
Formula (1) separates the sparse vectors, and the sparse data is stored centrally on an independent server. When the key value of the query vector S satisfies $Key(S) \ge n \cdot C$, the central storage server is queried directly.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610287204.7A CN105868414B (en) | 2016-05-03 | 2016-05-03 | A kind of distributed index method that cluster is isolated |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610287204.7A CN105868414B (en) | 2016-05-03 | 2016-05-03 | A kind of distributed index method that cluster is isolated |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105868414A CN105868414A (en) | 2016-08-17 |
CN105868414B true CN105868414B (en) | 2019-03-26 |
Family
ID=56630062
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610287204.7A Expired - Fee Related CN105868414B (en) | 2016-05-03 | 2016-05-03 | A kind of distributed index method that cluster is isolated |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105868414B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110120918B (en) * | 2019-05-10 | 2020-05-08 | 北京邮电大学 | Identification analysis method and device |
CN111582224A (en) * | 2020-05-19 | 2020-08-25 | 湖南视觉伟业智能科技有限公司 | Face recognition system and method |
CN113297331B (en) * | 2020-09-27 | 2022-09-09 | 阿里云计算有限公司 | Data storage method and device and data query method and device |
CN116541420B (en) * | 2023-07-07 | 2023-09-15 | 上海爱可生信息技术股份有限公司 | Vector data query method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103729434A (en) * | 2013-12-26 | 2014-04-16 | 乐视网信息技术(北京)股份有限公司 | Distributed index method and distributed index system for video data |
CN103744934A (en) * | 2013-12-30 | 2014-04-23 | 南京大学 | Distributed index method based on LSH (Locality Sensitive Hashing) |
Non-Patent Citations (1)
Title |
---|
高维分布式局部敏感哈希索引方法 (High-dimensional distributed locality-sensitive hashing index method); 林朝晖 et al.; 《计算机科学与探索》; 2013-05-28; Vol. 7, No. 9; pp. 811-819 |
Also Published As
Publication number | Publication date |
---|---|
CN105868414A (en) | 2016-08-17 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN105868414B (en) | A kind of distributed index method that cluster is isolated | |
US7788400B2 (en) | Utilizing proximity information in an overlay network | |
US7483391B2 (en) | Providing a notification including location information for nodes in an overlay network | |
Li et al. | Semantic small world: An overlay network for peer-to-peer search | |
Shen et al. | A distributed spatial-temporal similarity data storage scheme in wireless sensor networks | |
Doulkeridis et al. | Peer-to-peer similarity search in metric spaces | |
CN110022234B (en) | Method for realizing unstructured data sharing mechanism facing edge calculation | |
CN107656989B (en) | Nearest Neighbor based on data distribution perception in cloud storage system | |
EP1859602B1 (en) | Distributed storing of network position information for nodes | |
CN101719155B (en) | Method of multidimensional attribute range inquiry for supporting distributed multi-cluster computing environment | |
US9292559B2 (en) | Data distribution/retrieval using multi-dimensional index | |
Li et al. | A small world overlay network for semantic based search in P2P systems | |
Shen | A P2P-based intelligent resource discovery mechanism in Internet-based distributed systems | |
Ishi et al. | Range-key extension of the skip graph | |
CN108446356B (en) | Data caching method, server and data caching system | |
CN113010373A (en) | Data monitoring method and device, electronic equipment and storage medium | |
CN112256638A (en) | Method for searching limited decentralized distributed hash table resources in CNFS protocol | |
Shen et al. | Combining efficiency, fidelity, and flexibility in resource information services | |
Lee et al. | Supporting similarity range queries efficiently by using reference points in structured p2p overlays | |
Zhou et al. | HDKV: supporting efficient high‐dimensional similarity search in key‐value stores | |
Jin et al. | SCQR-A P2P query routing algorithm based on semantic cluster | |
Zhang et al. | Indexing historical spatio-temporal data in the cloud | |
Amagata et al. | Efficient Multidimensional Top‐k Query Processing in Wireless Multihop Networks | |
Chen et al. | Reverse nearest neighbor search in peer-to-peer systems | |
Brindha et al. | Reliable And Ascendable Content Based Image Retrieval Aproach In Peer To Peer Networks | |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20190326; Termination date: 20190503 |