CN105868414A - Clustering separation distributive indexing method - Google Patents
- Publication number
- CN105868414A (application CN201610287204.7A)
- Authority
- CN
- China
- Prior art keywords
- vector
- key
- chord
- node
- cluster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2272—Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
- G06F16/134—Distributed indices
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computer And Data Communications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a distributed index method with cluster separation, called CS-Chord (Clustering separation-Chord) for short. In the M-Chord distributed index, the vectors at the edge of a cluster are generally sparse, and these sparse vectors make the radius of each cluster very large. During a range query, a cluster with a larger radius intersects the range search region more easily, which enlarges the region to be searched. The edge vectors of a cluster are also usually the most frequently accessed vectors, which reduces performance further. CS-Chord separates the sparse vectors at the cluster edges and stores them centrally on an independent server, while the dense vectors are stored in the Chord ring. During a search, the high-frequency queries are concentrated on the vectors held by the independent server, and the search range on the Chord ring is reduced, so retrieval efficiency improves.
Description
Technical field
The present invention relates to the field of distributed indexing, and more particularly to a distributed index method with cluster separation.
Background art
A P2P (peer-to-peer) network does not depend on a dedicated centralized server; all nodes in the network are equal and interconnect freely. The nodes share computer resources and services by exchanging them directly. A P2P distributed index structure makes full use of the capacity of every node in the network and offers good scalability and high resource utilization.
In recent years, distributed indexing has become a research focus. M-Chord, proposed by Novak D. et al., is a distributed index algorithm for similarity retrieval of high-dimensional vectors over a P2P network. The algorithm combines the IDistance algorithm with the Chord protocol: IDistance reduces the dimensionality of the high-dimensional vectors, and the Chord protocol handles the distributed storage and retrieval of the vectors.
Chord is a structured distributed lookup protocol that uses DHT technology to locate resources quickly in a P2P network. To make lookups fast, each node on the Chord ring maintains a routing table of length $O(\log_2 n)$, where n is the total number of nodes in the ring. In the Chord protocol, nodes and data are both mapped to m-bit identifiers in the same space, and virtual nodes are introduced so that each node stores roughly the same amount of data; that is, the Chord protocol is load-balanced. The routing tables are decentralized: each node only needs the routing information of a small number of nodes in the system, and a query obtains its path by being forwarded repeatedly from node to node. A single query generates only $O(\log_2 n)$ messages in the ring.
A distributed hash table (DHT) assigns a unique identifier to every node in the network according to a fixed rule. In the Chord protocol, data resources are assigned unique identifiers by the same rule. Chord applies a consistent hash algorithm (Consistent Hash) to nodes and resources and reduces the result modulo $2^m$ to obtain an m-bit identifier in the range $[0, 2^m - 1]$. For a node, the IP address is unique, so consistent hashing derives the node identifier by hashing the IP address of the node. For data, the identifier is obtained by hashing the key value. A Chord ring with m = 2 and N = 4 is shown in Fig. 1(a), where Ni denotes a node and Ki denotes a resource.
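The identifier assignment described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: SHA-1 is assumed as the underlying hash (the text only says "consistent hash"), and the names `chord_id` and `successor` are hypothetical.

```python
import hashlib


def chord_id(value: str, m: int) -> int:
    """Map a node IP address or a resource key to an m-bit identifier.

    Assumption: SHA-1 is used as the base hash; the digest is reduced
    modulo 2**m so the result lies in [0, 2**m - 1].
    """
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(node_ids, ident):
    """Return the first node identifier at or clockwise after ident."""
    ring = sorted(node_ids)
    for nid in ring:
        if nid >= ident:
            return nid
    return ring[0]  # wrap around past the largest identifier


# Nodes are identified by hashing their IP address, data by hashing the key.
node_identifier = chord_id("192.168.0.7", m=8)
data_identifier = chord_id("some-resource-key", m=8)
```

A resource is then stored on `successor(node_ids, data_identifier)`, which is how nodes and data share one identifier space.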
Fast resource location relies mainly on the routing information kept by each node. The data structure of each node contains a routing table that stores the data and address information of a subset of the nodes, as shown in Fig. 1(b).
A Chord lookup proceeds in the following steps:
(1) A node N receives the key value key to be looked up and first checks whether its local resources contain this key value. If node N holds the key value, the lookup ends and the node's resource is returned; otherwise go to step (2).
(2) Check the pointer table of the requested node, find the node whose identifier is less than the identifier mapped from the key value and closest to it, send the lookup request to that node, and repeat step (1).
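The two lookup steps can be simulated in a single process as below. This is a hedged sketch: the pointer table is built with the usual Chord rule (entry k points at the successor of node + 2^k), which the text does not spell out, and `ToyChord` and `between` are hypothetical names.

```python
def between(x, a, b, size, inclusive_right=False):
    """True if x lies in the ring interval (a, b), or (a, b] if inclusive_right."""
    x, a, b = x % size, a % size, b % size
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b  # interval wraps past zero; a == b means "all but a"


class ToyChord:
    def __init__(self, node_ids, m):
        self.m = m
        self.size = 2 ** m
        self.nodes = sorted(node_ids)

    def successor_of(self, ident):
        """First node at or clockwise after ident."""
        ident %= self.size
        return next((n for n in self.nodes if n >= ident), self.nodes[0])

    def fingers(self, node):
        """Pointer table: entry k is the successor of node + 2**k."""
        return [self.successor_of(node + 2 ** k) for k in range(self.m)]

    def lookup(self, start, key):
        """Return (responsible node, hops) for key, starting at node `start`."""
        node, hops = start, 0
        while True:
            pred = self.nodes[self.nodes.index(node) - 1]
            # Step (1): the node checks whether it holds the key itself.
            if between(key, pred, node, self.size, inclusive_right=True):
                return node, hops
            # Step (2): forward to the closest table entry preceding the key,
            # or to the immediate successor if no entry precedes it.
            table = self.fingers(node)
            node = next((f for f in reversed(table)
                         if between(f, node, key, self.size)), table[0])
            hops += 1


ring = ToyChord([0, 1, 3], m=3)
owner, hops = ring.lookup(start=1, key=6)
```

In this toy ring, key 6 has no node between it and the wrap-around point, so the request started at node 1 ends at node 0.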
IDistance is a high-dimensional vector indexing method based on metric space. The basic idea of index construction is as follows: several anchor points are chosen in the whole data space, and each anchor point corresponds to one cluster subset. Every data point in the data space is assigned to the cluster subset of its nearest anchor point. Each high-dimensional vector is then converted, through its distance to the anchor point, into a one-dimensional key value iDist that can be measured, and a B+-Tree organizes and manages the iDist key values of all high-dimensional vectors. The key value is computed as $\mathrm{iDist}(x) = \mathrm{dist}(P_i, x) + i \cdot c$. As shown in Fig. 2, $P_0$, $P_1$ and $P_2$ are anchor points; $C_i$ is the distance from $P_i$ to the farthest data point in the data subset of $P_i$, that is, the radius of the data subset of $P_i$; and c is a constant greater than every $C_i$.
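The key computation above can be sketched in a few lines. Euclidean distance is assumed for dist (the text only requires a metric), and the function name `idist_key` and the sample anchors are illustrative.

```python
import math


def idist_key(x, anchors, c):
    """One-dimensional key: distance to the nearest anchor plus i * c.

    Assumptions: dist is Euclidean; c exceeds every cluster radius so the
    key ranges of different clusters do not overlap.
    """
    i = min(range(len(anchors)), key=lambda j: math.dist(anchors[j], x))
    return math.dist(anchors[i], x) + i * c


anchors = [(0.0, 0.0), (10.0, 0.0)]
key_a = idist_key((3.0, 4.0), anchors, c=100.0)   # nearest anchor is P0
key_b = idist_key((10.0, 3.0), anchors, c=100.0)  # nearest anchor is P1
```

Points of cluster i therefore land in the key interval starting at i * c, which is what lets a single B+-Tree hold all clusters.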
Let the full data set be D. A similarity range query Range(q, r) retrieves the set of data points whose distance to the data point q is less than the radius r: $\mathrm{Range}(q, r) = \{x \in D : \mathrm{dist}(q, x) < r\}$, where the function $\mathrm{dist}(q, x)$ is the distance from the vector q to the data point x.
The retrieval process of IDistance is:
(1) Using the distance to each anchor point $P_i$, determine whether the search circle of q intersects the data subset of that anchor point $P_i$.
The condition for intersection is: $\mathrm{dist}(q, P_i) < C_i + r$
The condition for no intersection is: $\mathrm{dist}(q, P_i) > C_i + r$
(2) If they do not intersect, the data subset of this anchor point contains no target point. If they intersect, determine the ring-shaped search range, which is:
$\{x \in P_i : \max(\mathrm{dist}(P_i, q) - r, 0) < \mathrm{dist}(P_i, x) < \min(\mathrm{dist}(P_i, q) + r, C_i)\}$
(3) Determine the search range of the one-dimensional key value iDist so that the B+-Tree can be searched quickly; the data points found enter the candidate set. The search range of the one-dimensional key value iDist is:
$\{x \in P_i : i \cdot c + \max(\mathrm{dist}(P_i, q) - r, 0) < \mathrm{iDist}(x) < i \cdot c + \min(\mathrm{dist}(P_i, q) + r, C_i)\}$
(4) Compute the distance between q and each data point in the candidate set; if the distance is less than r, the point enters the final result set.
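The four retrieval steps can be sketched as follows. This is an illustration under stated assumptions: a sorted list searched with `bisect` stands in for the B+-Tree, Euclidean distance is used, and the key range is scanned with closed bounds so that boundary points are kept as candidates and filtered in step (4). The class name `IDistanceIndex` is hypothetical.

```python
import bisect
import math


class IDistanceIndex:
    def __init__(self, points, anchors, c):
        self.anchors, self.c = anchors, c
        self.radius = [0.0] * len(anchors)       # C_i of each cluster
        entries = []
        for p in points:
            i = min(range(len(anchors)), key=lambda j: math.dist(anchors[j], p))
            d = math.dist(anchors[i], p)
            self.radius[i] = max(self.radius[i], d)
            entries.append((d + i * c, p))       # iDist(p) = dist(P_i, p) + i*c
        entries.sort(key=lambda e: e[0])
        self.keys = [k for k, _ in entries]      # sorted keys (B+-Tree stand-in)
        self.points = [p for _, p in entries]

    def range_query(self, q, r):
        result = []
        for i, anchor in enumerate(self.anchors):
            dq = math.dist(anchor, q)
            if dq > self.radius[i] + r:          # step (1): no intersection
                continue
            # Steps (2) and (3): ring range translated into a key range.
            lo = i * self.c + max(dq - r, 0.0)
            hi = i * self.c + min(dq + r, self.radius[i])
            start = bisect.bisect_left(self.keys, lo)
            stop = bisect.bisect_right(self.keys, hi)
            # Step (4): verify each candidate with a real distance check.
            result.extend(p for p in self.points[start:stop]
                          if math.dist(q, p) < r)
        return result


pts = [(1.0, 0.0), (0.0, 2.0), (9.0, 0.0), (10.0, 1.0), (5.0, 0.0)]
index = IDistanceIndex(pts, anchors=[(0.0, 0.0), (10.0, 0.0)], c=100.0)
hits = index.range_query((5.0, 0.0), 4.5)
```

Only the candidates inside the key range are compared with q, which is where the saving in distance computations comes from.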
By choosing anchor points, IDistance reduces the problem of indexing high-dimensional vectors to one dimension. The one-dimensional index is organized by a B+-Tree, which makes searches fast and saves a large number of distance computations.
M-Chord (M stands for Metric), proposed by Novak D. et al., is a distributed index algorithm for metric spaces. In a distributed P2P network it can not only locate resources (exact-match lookup) but also supports similarity search (range search). The algorithm combines IDistance with the Chord protocol: IDistance is responsible for reducing the dimensionality of the high-dimensional vectors, and the Chord protocol is responsible for the distributed storage of the data, which achieves similarity search over high-dimensional vectors in a distributed environment. Concretely, IDistance converts each high-dimensional vector into a one-dimensional key value, a hash function maps the one-dimensional key value into the identifier space of Chord, and data are inserted and retrieved through the Chord ring, as shown in Fig. 3.
When a node of the M-Chord algorithm receives a range query Range(Q, r), where Q is the query vector and r is the query radius, the process is as follows.
(1) Use IDistance to compute the regions where the range query Range(Q, r) intersects the clusters, and map them to multiple key value intervals [xi, yi].
(2) Hash xi and yi with the position-preserving hash function h to generate the key value range [h(xi), h(yi)] in the Chord ring.
(3) Locate the node holding the key value h(xi) by querying the routing table. If h(yi) is greater than the maximum key value $Key_{max}$ of the data stored on that node, send the range [$Key_{max}$, h(yi)] to the successor of that node. If h(yi) is still greater than the $Key_{max}$ of the successor, continue sending the query to its successor.
(4) Each node that receives the query request searches its B+-Tree for vectors within the key value range. If such a vector exists, its distance to the query vector Q is computed, and if the distance is less than r the vector is returned to the node that originally sent the request.
The vectors at the edge of an M-Chord cluster are generally sparse, and these sparse vectors make the radius of each cluster very large. During a range query, the larger the radius, the more easily the cluster intersects the range search region, so the region to be searched grows. This means that whenever the range search region intersects a cluster, a lookup must be performed in the Chord ring regardless of how much data lies in the intersection. This very small amount of data greatly increases the number of resource location operations in the Chord ring and therefore reduces the performance of M-Chord.
Fig. 4 shows the data distribution of one cluster obtained by applying K-means to the color histogram feature data of 68040 images. The figure shows that the radius of this cluster is 0.62, yet most of the data lie between 0.09 and 0.35. A very small amount of edge data roughly doubles the radius of the cluster.
Fig. 5 shows the data access frequency in this cluster space under 1000 random range searches. During a range query it is not known in advance whether the query range contains data, so intervals that hold no data are accessed as well. Comparing Fig. 4 with Fig. 5 shows that the edge vectors are sparse but are accessed very frequently, with an access frequency generally above 80%.
Summary of the invention
To overcome at least one of the defects (deficiencies) of the prior art described above, the present invention provides a distributed index method with cluster separation, CS-Chord (Clustering separation-Chord). This index method reduces the search range on the Chord ring and improves retrieval efficiency.
To solve the above technical problem, the technical solution of the present invention is as follows:
A distributed index method with cluster separation comprises the following steps:
Step 1: separate the edge sparse vectors, and store the edge sparse vectors centrally on an independent server;
Step 2: build the distributed index: compute the one-dimensional key value Key(S) of the edge sparse vector S to be added to the Chord ring, and insert this vector into the distributed index. The insertion proceeds as follows:
(21) If $\mathrm{Key}(S) \ge n \cdot C$, where n is the number of cluster subspaces and C is a constant greater than every value obtained by mapping the vectors in the ring bodies of the IDistance index structure onto the one-dimensional axis, send the key value Key(S) and the vector S to the independent server and insert S into the B+-Tree index of that server; the insertion of the new vector is then complete. If $\mathrm{Key}(S) < n \cdot C$, go to step (22);
(22) Hash Key(S) with the position-preserving hash function to generate the key value $Key_{Chord}$ assigned on the Chord ring. Use the Chord location algorithm to find the IP address of the node that should store $Key_{Chord}$, send $Key_{Chord}$ and the vector S to that node, and insert S into the B+-Tree index of the node; index construction is then complete;
Step 3: perform range queries on the constructed index. Let the range query of the distributed index method with cluster separation, CS-Chord, be Range(Q, r), where Q is the query vector and r is the query radius. The steps are as follows:
(31) Use IDistance to compute the regions where the range query Range(Q, r) intersects the clusters, and map them to multiple key value intervals [xi, yi];
(32) If $x_i \ge n \cdot C$, send the intersection regions of Range(Q, r) and the clusters computed in step (31) to the separate server and go to step (34); if $x_i < n \cdot C$, go to step (33);
(33) Generate the key value range [h(xi), h(yi)] in the Chord ring and locate the node holding the key value h(xi) by querying the routing table. If h(yi) is greater than the maximum key value $Key_{max}$ of the data stored on that node, send the range [$Key_{max}$, h(yi)] to the successor of that node; if h(yi) is still greater than the $Key_{max}$ of the successor, continue sending the query to its successor;
(34) Each node that receives the query request searches its B+-Tree (which stores the vectors S that satisfy the conditions) for vectors within the key value range. If a vector Z exists, its distance to the query vector Q is computed: when the distance is less than the query radius r, Z is returned to the node that originally sent the request; when the distance is greater than or equal to r, a null value is returned. If no vector exists, a null value is also returned.
Preferably, the detailed process of step 1, separating the edge sparse vectors and storing the edge sparse vector data centrally on an independent server, is:
Let the boundary between the dense vectors and the edge sparse vectors of a cluster be $R_b$ and the radius of the cluster be R. Then the region at distance $[0, R_b]$ from the cluster center is the dense vector region, and the region $[R_b, R]$ is the sparse vector region;
With n cluster spaces, the key value Key(S) of the sparse vector region is computed by adding a distance of $n \cdot C$ to the original value. Let S be a vector to be added to the Chord ring and $P_i$ the center of the cluster containing S. The key value Key(S) of the vector S is then computed as:
$$\mathrm{Key}(S) = \begin{cases} \mathrm{dist}(P_i, S) + i \cdot C, & \mathrm{dist}(P_i, S) \le R_b \\ \mathrm{dist}(P_i, S) + i \cdot C + n \cdot C, & \mathrm{dist}(P_i, S) > R_b \end{cases} \qquad (1)$$
where $0 \le i < n$;
Formula (1) separates the sparse vectors, which are stored centrally on an independent server;
When the key value of a query vector S satisfies $\mathrm{Key}(S) \ge n \cdot C$, the centrally stored server is queried directly.
Preferably, the position-preserving hash function h in step 2 is defined as follows:
A data interval $[X_{min}, X_{max}]$ is to be mapped to the interval $[Y_{min}, Y_{max}]$ while keeping the order of the data consistent. Assume $X_i \in [X_{min}, X_{max}]$ and let $Y_i \in [Y_{min}, Y_{max}]$ be its value after mapping by the function h. The hash function is then defined as:
$$h(X_i) = Y_i = \frac{X_i - X_{min}}{X_{max} - X_{min}} \, (Y_{max} - Y_{min}) + Y_{min} \qquad (2)$$
Because the key value interval of the CS-Chord index is $[0, K_{max}]$ and the identifier space of the Chord ring is $[0, 2^m - 1]$, substituting $X_{min} = 0$, $X_{max} = K_{max}$, $Y_{min} = 0$, $Y_{max} = 2^m - 1$ into formula (2) gives:
$$h(X_i) = \frac{X_i}{K_{max}} \, (2^m - 1) \qquad (3)$$
The reason a position-preserving hash function is needed is as follows. An ordinary mapping method hashes the key value of a data point directly and then takes the result modulo $2^m$ as the key value on the Chord ring. This approach maps data points that were originally adjacent to different nodes, which is unsuitable for an index algorithm such as IDistance that needs contiguous lookups. A position-preserving hash function is therefore required so that the order of the data is kept consistent.
Compared with the prior art, the technical solution of the present invention has the following beneficial effects:
The present invention proposes a distributed index method with cluster separation, CS-Chord (Clustering separation-Chord) for short. In the M-Chord distributed index, the vectors at the edge of a cluster are generally sparse, and these sparse vectors make the radius of each cluster very large. During a range query, a cluster with a larger radius intersects the range search region more easily, so the candidate search region grows. The edge vectors of a cluster are also usually the most frequently accessed vectors, which reduces performance further. CS-Chord separates the sparse vectors at the cluster edges and stores them centrally on an independent server, while the dense vectors are stored in the Chord ring. During a lookup, the high-frequency queries are concentrated on the vectors held by the separate server, and the search range on the Chord ring is reduced, so retrieval efficiency improves.
Brief description of the drawings
Fig. 1 is a schematic diagram of Chord.
Fig. 2 is a schematic diagram of IDistance.
Fig. 3 is a schematic diagram of M-Chord.
Fig. 4 is the data distribution of a cluster space.
Fig. 5 is the access frequency of a cluster under random range searches.
Fig. 6 is a schematic diagram of cluster edge separation in two-dimensional space.
Fig. 7 is a schematic diagram of the CS-Chord index.
Fig. 8 is a schematic diagram of CS-Chord range search.
Detailed description
The accompanying drawings are for illustration only and shall not be construed as limiting this patent. To better illustrate the embodiment, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the dimensions of the actual product.
Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings. The technical solution of the present invention is further described below with reference to the drawings and an embodiment.
The distributed index method with cluster separation (CS-Chord) of the present invention separates the sparse vectors at the cluster edges and stores them centrally on an independent server, while the dense vectors are stored in the Chord ring. During a lookup, the high-frequency queries are concentrated on the vectors held by the separate server, and the search range on the Chord ring is reduced, so retrieval efficiency improves. The specific steps are as follows:
Step 1: edge sparse vector separation
Edge data are sparse and frequently accessed, so they should be stored centrally, which eliminates the time spent locating resources in the Chord ring. To this end, the edge sparse data must first be separated out.
Let the boundary between the dense vectors and the sparse vectors of a cluster be $R_b$, and let the radius of the cluster be R. Then the region at distance $[0, R_b]$ from the cluster center is the dense data region, and the region $[R_b, R]$ is the sparse data region. As shown in Fig. 6, the data of a two-dimensional space are divided into three cluster spaces; the dark gray part of each cluster is the dense data region and the light gray part is the sparse data region. The data of the dense regions are distributed in [0, 3C], and the data of the sparse regions are distributed in [3C, 6C].
Assume there are n cluster spaces in total. The key value (Key) of the sparse data region is computed by adding a distance of $n \cdot C$ to the original value. Assume the vector S is a vector to be added to the Chord ring and $P_i$ is the center (anchor point) of the cluster containing S. The key value Key(S) of the vector S is then computed as:
$$\mathrm{Key}(S) = \begin{cases} \mathrm{dist}(P_i, S) + i \cdot C, & \mathrm{dist}(P_i, S) \le R_b \\ \mathrm{dist}(P_i, S) + i \cdot C + n \cdot C, & \mathrm{dist}(P_i, S) > R_b \end{cases} \qquad (1)$$
where $0 \le i < n$.
Formula (1) separates the sparse data and places them in one contiguous range of key values. Consequently, even if these data were all placed in the Chord ring, they would not cause a large number of resource location operations. However, because the sparse data are accessed frequently, this patent stores them centrally on an independent server. When the key value of a query range satisfies $\mathrm{Key}(S) \ge n \cdot C$, the centrally stored server is queried directly.
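The separation rule of step 1 can be sketched as a small key function. Several points here are assumptions rather than statements from the text: Euclidean distance is used, each cluster has its own boundary stored in the list `r_b`, a vector lying exactly on the boundary is treated as dense, and the names `cs_chord_key` and `stored_on_server` are hypothetical.

```python
import math


def cs_chord_key(s, anchors, r_b, c):
    """Key(S): the iDistance-style key, shifted by n*c for edge sparse vectors.

    Assumptions: r_b[i] is the dense/sparse boundary of cluster i, and a
    vector at exactly that distance counts as dense.
    """
    n = len(anchors)
    i = min(range(n), key=lambda j: math.dist(anchors[j], s))
    d = math.dist(anchors[i], s)
    key = d + i * c
    if d > r_b[i]:          # sparse region: move the key past n*c
        key += n * c
    return key


def stored_on_server(key, n, c):
    """Keys at or above n*c belong to the separate server."""
    return key >= n * c


anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
r_b = [2.0, 2.0, 2.0]
dense_key = cs_chord_key((1.0, 0.0), anchors, r_b, c=10.0)    # in [0, 3c)
sparse_key = cs_chord_key((3.0, 0.0), anchors, r_b, c=10.0)   # in [3c, 6c)
```

With three clusters this reproduces the layout described for Fig. 6: dense keys fall below 3C and sparse keys fall between 3C and 6C, so one comparison against n*C decides where a vector lives.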
Step 2: build the distributed index
A node computes the one-dimensional key value Key of the vector S using formula (1). The process of inserting this vector into the distributed index is:
(21) If $Key \ge n \cdot C$, send the key value Key and the vector S to the independent server and insert the vector into the B+-Tree index of the server; the insertion of the new vector is then complete. If $Key < n \cdot C$, go to step (22).
(22) Hash Key with the position-preserving hash function to generate the key value $Key_{Chord}$ assigned on the Chord ring. Use the Chord location algorithm to find the IP address of the node that should store $Key_{Chord}$. Then send the information of the data point to that node and insert it into the B+-Tree index of that node; index construction is then complete.
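Steps (21) and (22) can be modelled in one process as below. This is a toy sketch under assumptions: sorted lists kept with `bisect.insort` stand in for the B+-Tree indexes, the Chord location algorithm is replaced by a direct successor search over a sorted identifier list, and `ring_hash` is a caller-supplied stand-in for the position-preserving hash. The class name and all sample values are hypothetical.

```python
import bisect


class CSChordIndex:
    """Toy model: one separate server plus Chord nodes keyed by identifier."""

    def __init__(self, node_ids, n_clusters, c, ring_hash):
        self.threshold = n_clusters * c     # n*C from the text
        self.ring_hash = ring_hash          # callable: key -> ring identifier
        self.node_ids = sorted(node_ids)
        self.server = []                    # sorted (key, vector) pairs
        self.nodes = {nid: [] for nid in self.node_ids}

    def _responsible_node(self, ring_key):
        # Successor of ring_key on the ring, wrapping past the largest id.
        pos = bisect.bisect_left(self.node_ids, ring_key)
        return self.node_ids[pos % len(self.node_ids)]

    def insert(self, key, vector):
        if key >= self.threshold:           # step (21): edge sparse vector
            bisect.insort(self.server, (key, vector))
            return "server"
        ring_key = self.ring_hash(key)      # step (22): dense vector
        node = self._responsible_node(ring_key)
        bisect.insort(self.nodes[node], (ring_key, vector))
        return node


K_MAX = 60.0  # assumed upper bound of the key range
index = CSChordIndex([64, 128, 192, 255], n_clusters=3, c=10.0,
                     ring_hash=lambda k: int(k * 255 / K_MAX))
where_sparse = index.insert(33.0, (3.0, 0.0))
where_dense = index.insert(11.0, (10.0, 1.0))
```

The single threshold test keeps insertion cheap: a sparse vector never triggers a Chord lookup at all.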
The position-preserving hash function h is defined as follows:
A data interval $[X_{min}, X_{max}]$ is to be mapped to the interval $[Y_{min}, Y_{max}]$ while keeping the order of the data consistent. Assume $X_i \in [X_{min}, X_{max}]$ and let $Y_i \in [Y_{min}, Y_{max}]$ be its value after mapping by the function h. The hash function can then be defined as:
$$h(X_i) = Y_i = \frac{X_i - X_{min}}{X_{max} - X_{min}} \, (Y_{max} - Y_{min}) + Y_{min} \qquad (2)$$
Because the key value interval of the CS-Chord index is $[0, K_{max}]$ and the identifier space of the Chord ring is $[0, 2^m - 1]$, substituting $X_{min} = 0$, $X_{max} = K_{max}$, $Y_{min} = 0$, $Y_{max} = 2^m - 1$ into formula (2) gives:
$$h(X_i) = \frac{X_i}{K_{max}} \, (2^m - 1) \qquad (3)$$
The reason a position-preserving hash function is needed is as follows. An ordinary mapping method hashes the key value of a data point directly and then takes the result modulo $2^m$ as the key value on the Chord ring. This approach maps data points that were originally adjacent to different nodes, which is unsuitable for an index algorithm such as IDistance that needs contiguous lookups. A position-preserving hash function is therefore required so that the order of the data is kept consistent.
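A position-preserving hash of this kind can be sketched as a linear rescaling of one interval onto another. The linear form is an assumption inferred from the substitution of the interval endpoints described above; truncating to an integer identifier and the names `position_hash` and `chord_key` are also illustrative choices.

```python
def position_hash(x, x_min, x_max, y_min, y_max):
    """Order-preserving map of [x_min, x_max] onto [y_min, y_max].

    Assumption: a linear rescaling; larger inputs never map to smaller outputs.
    """
    return (x - x_min) * (y_max - y_min) / (x_max - x_min) + y_min


def chord_key(key, k_max, m):
    """Map a key in [0, k_max] to an integer identifier in [0, 2**m - 1]."""
    return int(position_hash(key, 0.0, k_max, 0, 2 ** m - 1))


keys = [0.5, 3.2, 3.3, 17.0, 59.9]
ring_keys = [chord_key(k, k_max=60.0, m=8) for k in keys]
# Neighbouring keys stay neighbours on the ring, so a key range maps to one
# contiguous arc instead of being scattered as a cryptographic hash would do.
```

Because the mapping is monotonic, a range [x, y] of index keys corresponds to the single ring range [h(x), h(y)], which is what the range query relies on.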
Fig. 7 is a schematic diagram of how the CS-Chord distributed index is built in a two-dimensional space.
Step 3: range query
Fig. 8 shows a schematic diagram of the CS-Chord range query Range(Q, r), where Q is the query vector and r is the query radius. The steps are as follows:
(31) Use IDistance to compute the regions where the range query Range(Q, r) intersects the clusters, and map them to multiple key value intervals [xi, yi].
(32) If $x_i \ge n \cdot C$, send the information computed in step (31) to the separate server and go to step (34). If $x_i < n \cdot C$, go to step (33).
(33) Generate the key value range [h(xi), h(yi)] in the Chord ring. Locate the node holding the key value h(xi) by querying the routing table. If h(yi) is greater than the maximum key value $Key_{max}$ of the data stored on that node, send the range [$Key_{max}$, h(yi)] to the successor of that node. If h(yi) is still greater than the $Key_{max}$ of the successor, continue sending the query to its successor.
(34) Each node that receives the query request (including the server and the nodes in the Chord ring) searches its B+-Tree for vectors within the key value range. If such a vector exists, its distance to the query vector Q is computed, and if the distance is less than r the vector is returned to the node that originally sent the request.
As shown in Fig. 8, the query Q intersects both the dense region and the sparse region of cluster P0, and also intersects the sparse region of cluster P1. The mapped one-dimensional key value ranges are [x1, y1], [x2, y2] and [x3, y3]. The interval [x1, y1] is sent to the Chord ring for retrieval, and the intervals [x2, y2] and [x3, y3] are sent to the server for retrieval.
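The dispatch of key value intervals between the server and the Chord ring can be sketched as follows. This is a simplified model with stated assumptions: each node is taken to hold ring keys up to its own identifier, so its $Key_{max}$ is approximated by the node identifier; `ring_hash` stands in for the position-preserving hash; and the function name `dispatch` and the sample intervals are hypothetical, not the values of Fig. 8.

```python
import bisect


def dispatch(intervals, n, c, ring_hash, node_ids):
    """Split key intervals between the separate server and the Chord ring.

    Returns {"server": [(x, y), ...], "ring": [((h(x), h(y)), [nodes]), ...]}.
    """
    ring = sorted(node_ids)
    plan = {"server": [], "ring": []}
    for x, y in intervals:
        if x >= n * c:                       # step (32): sparse-region interval
            plan["server"].append((x, y))
            continue
        lo, hi = ring_hash(x), ring_hash(y)  # step (33): range on the ring
        start = bisect.bisect_left(ring, lo)
        end = bisect.bisect_left(ring, hi)
        visited = []
        for pos in range(start, end + 1):    # walk successors until hi is covered
            nid = ring[pos % len(ring)]
            if nid not in visited:
                visited.append(nid)
        plan["ring"].append(((lo, hi), visited))
    return plan


plan = dispatch([(4.0, 8.0), (10.0, 20.0), (32.0, 36.0), (41.0, 45.0)],
                n=3, c=10.0,
                ring_hash=lambda k: int(k * 255 / 60.0),
                node_ids=[64, 128, 192, 255])
```

In this example the two intervals starting at or above n*C = 30 go to the server in one message each, while the dense intervals touch only the ring nodes whose key ranges they overlap.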
Obviously, the above embodiment of the present invention is merely an example given to illustrate the invention clearly and does not limit the ways in which the invention may be implemented. Those of ordinary skill in the art can make other changes in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.
Claims (3)
1. A distributed index method with cluster separation, characterized by comprising the following steps:
Step 1: separate the edge sparse vectors, and store the edge sparse vectors centrally on an independent server;
Step 2: build the distributed index: compute the one-dimensional key value Key(S) of the edge sparse vector S to be added to the Chord ring, and insert this vector into the distributed index, the insertion proceeding as follows:
(21) if $\mathrm{Key}(S) \ge n \cdot C$, where n is the number of cluster subspaces and C is a constant greater than every value obtained by mapping the vectors in the ring bodies of the IDistance index structure onto the one-dimensional axis, send the key value Key(S) and the vector S to the independent server and insert S into the B+-Tree index of that server, whereupon the insertion of the new vector is complete; if $\mathrm{Key}(S) < n \cdot C$, go to step (22);
(22) hash Key(S) with the position-preserving hash function to generate the key value $Key_{Chord}$ assigned on the Chord ring, use the Chord location algorithm to find the IP address of the node that should store $Key_{Chord}$, send $Key_{Chord}$ and the vector S to that node, and insert S into the B+-Tree index of the node, whereupon index construction is complete;
Step 3: perform range queries on the constructed index; let the range query of the distributed index method with cluster separation, CS-Chord, be Range(Q, r), where Q is the query vector and r is the query radius, the steps being as follows:
(31) use IDistance to compute the regions where the range query Range(Q, r) intersects the clusters, and map them to multiple key value intervals [xi, yi];
(32) if $x_i \ge n \cdot C$, send the intersection regions of Range(Q, r) and the clusters computed in step (31) to the separate server and go to step (34); if $x_i < n \cdot C$, go to step (33);
(33) generate the key value range [h(xi), h(yi)] in the Chord ring and locate the node holding the key value h(xi) by querying the routing table; if h(yi) is greater than the maximum key value $Key_{max}$ of the data stored on that node, send the range [$Key_{max}$, h(yi)] to the successor of that node; if h(yi) is still greater than the $Key_{max}$ of the successor, continue sending the query to its successor;
(34) each node that receives the query request searches its B+-Tree for vectors within the key value range; if a vector Z exists, its distance to the query vector Q is computed; when the distance is less than the query radius r, Z is returned to the node that originally sent the request; when the distance is greater than or equal to r, a null value is returned; if no vector exists, a null value is also returned.
2. The distributed index method with cluster separation according to claim 1, characterized in that the detailed process of step 1, separating the edge sparse vectors and storing the edge sparse vector data centrally on an independent server, is:
let the boundary between the dense vectors and the edge sparse vectors of a cluster be $R_b$ and the radius of the cluster be R; then the region at distance $[0, R_b]$ from the cluster center is the dense vector region, and the region $[R_b, R]$ is the sparse vector region;
with n cluster spaces, the key value Key(S) of the sparse vector region is computed by adding a distance of $n \cdot C$ to the original value; let S be a vector to be added to the Chord ring and $P_i$ the center of the cluster containing S; the key value Key(S) of the vector S is then computed as:
$$\mathrm{Key}(S) = \begin{cases} \mathrm{dist}(P_i, S) + i \cdot C, & \mathrm{dist}(P_i, S) \le R_b \\ \mathrm{dist}(P_i, S) + i \cdot C + n \cdot C, & \mathrm{dist}(P_i, S) > R_b \end{cases} \qquad (1)$$
where $0 \le i < n$;
formula (1) separates the sparse vectors, which are stored centrally on an independent server;
when the key value of a query vector S satisfies $\mathrm{Key}(S) \ge n \cdot C$, the centrally stored server is queried directly.
3. The distributed index method with cluster separation according to claim 1, characterized in that the position-preserving hash function h in step 2 is defined as follows:
a data interval $[X_{min}, X_{max}]$ is to be mapped to the interval $[Y_{min}, Y_{max}]$ while keeping the order of the data consistent; assume $X_i \in [X_{min}, X_{max}]$ and let $Y_i \in [Y_{min}, Y_{max}]$ be its value after mapping by the function h; the hash function is then defined as:
$$h(X_i) = Y_i = \frac{X_i - X_{min}}{X_{max} - X_{min}} \, (Y_{max} - Y_{min}) + Y_{min} \qquad (2)$$
because the key value interval of the CS-Chord index is $[0, K_{max}]$ and the identifier space of the Chord ring is $[0, 2^m - 1]$, substituting $X_{min} = 0$, $X_{max} = K_{max}$, $Y_{min} = 0$, $Y_{max} = 2^m - 1$ into formula (2) gives:
$$h(X_i) = \frac{X_i}{K_{max}} \, (2^m - 1) \qquad (3)$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610287204.7A CN105868414B (en) | 2016-05-03 | 2016-05-03 | A distributed index method with cluster separation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610287204.7A CN105868414B (en) | 2016-05-03 | 2016-05-03 | A distributed index method with cluster separation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105868414A true CN105868414A (en) | 2016-08-17 |
CN105868414B CN105868414B (en) | 2019-03-26 |
Family
ID=56630062
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610287204.7A (Expired - Fee Related) CN105868414B (en) | 2016-05-03 | 2016-05-03 | A distributed index method with cluster separation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105868414B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110120918A (en) * | 2019-05-10 | 2019-08-13 | 北京邮电大学 | A kind of identification analytic method and device |
CN111582224A (en) * | 2020-05-19 | 2020-08-25 | 湖南视觉伟业智能科技有限公司 | Face recognition system and method |
CN113297331A (en) * | 2020-09-27 | 2021-08-24 | 阿里云计算有限公司 | Data storage method and device and data query method and device |
CN116541420A (en) * | 2023-07-07 | 2023-08-04 | 上海爱可生信息技术股份有限公司 | Vector data query method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103729434A (en) * | 2013-12-26 | 2014-04-16 | 乐视网信息技术(北京)股份有限公司 | Distributed index method and distributed index system for video data |
CN103744934A (en) * | 2013-12-30 | 2014-04-23 | 南京大学 | Distributed index method based on LSH (Locality Sensitive Hashing) |
Non-Patent Citations (1)
Title |
---|
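林朝晖 et al.: "High-dimensional distributed locality-sensitive hashing index method" (高维分布式局部敏感哈希索引方法), Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》) * |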
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110120918A (en) * | 2019-05-10 | 2019-08-13 | 北京邮电大学 | A kind of identification analytic method and device |
CN111582224A (en) * | 2020-05-19 | 2020-08-25 | 湖南视觉伟业智能科技有限公司 | Face recognition system and method |
CN113297331A (en) * | 2020-09-27 | 2021-08-24 | 阿里云计算有限公司 | Data storage method and device and data query method and device |
CN116541420A (en) * | 2023-07-07 | 2023-08-04 | 上海爱可生信息技术股份有限公司 | Vector data query method |
CN116541420B (en) * | 2023-07-07 | 2023-09-15 | 上海爱可生信息技术股份有限公司 | Vector data query method |
Also Published As
Publication number | Publication date |
---|---|
CN105868414B (en) | 2019-03-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8688708B2 (en) | Storing and retrieving objects on a computer network in a distributed database | |
CN105868414A (en) | Clustering separation distributive indexing method | |
CN104537091A (en) | Networked relational data query method based on hierarchical identification routing | |
CN105357247A (en) | Multi-dimensional cloud resource interval finding method based on hierarchical cloud peer-to-peer network | |
Xu et al. | Energy‐efficient big data storage and retrieval for wireless sensor networks with nonuniform node distribution | |
CN101719155B (en) | Method of multidimensional attribute range inquiry for supporting distributed multi-cluster computing environment | |
CN106020724A (en) | Neighbor storage method based on data mapping algorithm | |
CN101902388A (en) | Expandable fast discovery technology for multi-stage sequencing resources | |
Cui et al. | Efficient skyline computation in structured peer-to-peer systems | |
CN108829695A (en) | Flexible polymer K-NN search G-max method on road network | |
Mowshowitz et al. | Query optimization in a distributed hypercube database | |
CN108829694A (en) | The optimization method of flexible polymer K-NN search G tree on road network | |
Zhang et al. | Storing and querying semi-structured spatio-temporal data in hbase | |
CN105989078B (en) | A kind of method, the search method, apparatus and system of structured p2p network building index | |
Ganti et al. | MP-trie: Fast spatial queries on moving objects | |
Rosch et al. | Best effort query processing in dht-based p2p systems | |
CN102457568A (en) | Internet of things information service system and method for processing information on system | |
CN106202303A (en) | A kind of Chord routing table compression method and optimization file search method | |
Zhang et al. | Indexing historical spatio-temporal data in the cloud | |
Zhou et al. | HDKV: supporting efficient high‐dimensional similarity search in key‐value stores | |
Lee et al. | Supporting similarity range queries efficiently by using reference points in structured p2p overlays | |
Zhu et al. | DS-index: a distributed search solution for federated cloud | |
Jin et al. | SCQR-A P2P query routing algorithm based on semantic cluster | |
Lu et al. | Semantic information processing system based on CAN | |
Vu et al. | Simpson: Efficient similarity search in metric spaces over P2P structured overlay networks |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2019-03-26; termination date: 2019-05-03 |