CN108829694A

CN108829694A - Optimization method for G-tree-based flexible aggregate nearest neighbor queries on road networks

Info

Publication number: CN108829694A
Application number: CN201810342316.7A
Authority: CN
Inventors: 姚斌; 过敏意; 陈中普; 沈耀; 陈�全
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2018-04-17
Filing date: 2018-04-17
Publication date: 2018-11-16

Abstract

The invention discloses a G-tree-based optimization method for flexible aggregate nearest neighbor (FANN) queries on road networks, comprising the following steps: (1) build a G-tree index; (2) define the query and initialize; (3) if the queue is empty, terminate; otherwise dequeue a node x and go to step 4; (4) if x is a leaf node, compute g_φ(v, Q) for every vertex v inside x using the optimized method (initialize; check whether the size of D is less than φM and whether the queue is non-empty; dequeue <dis, e> and check whether e is a road-network vertex), update the final result, and return to step 3 after the traversal; otherwise go to step 5; (5) traverse each child node c of x, compute the minimum possible distance from every point in Q to c, and take the maximum (max) or the sum (sum) of the φM smallest of these distances, denoted τ; (6) if τ is less than r^*, enqueue the child node c and return to step 3; if τ is greater than or equal to r^*, terminate. The invention improves the efficiency of g_φ, which reduces cost and speeds up queries.

Description

Optimization method for G-tree-based flexible aggregate nearest neighbor queries on road networks

Technical field

The invention belongs to the field of computing and relates to query methods for spatial databases, in particular to a G-tree-based optimization method for flexible aggregate nearest neighbor queries on road networks.

Background

The aggregate nearest neighbor (ANN) query is a classical query in spatial databases with a wide range of applications, such as location-based services. Given a set of query points Q, an ANN query finds a point in the data point set V whose aggregate distance to all points in Q is minimal. The aggregate function is usually max or sum. The ANN problem has been studied both in Euclidean space [see D. Papadias, Q. Shen, Y. Tao, and K. Mouratidis, "Group nearest neighbor queries," in Data Engineering, 2004. Proceedings. 20th International Conference on. IEEE, 2004, pp. 301-312.] and on road networks [see D. Papadias, Q. Shen, Y. Tao, and K. Mouratidis, "Group nearest neighbor queries," in Data Engineering, 2004. Proceedings. 20th International Conference on. IEEE, 2004, pp. 301-312.].

In many cases it is more meaningful to consider only part of the query points in Q. Consider the example in Fig. 1, where the data point set is V = {v₁, v₂, …, v₈, v₉} (circles) and the query point set is Q = {q₁, q₂, q₃, q₄} (triangles). Note that v₃ and q₃, and v₅ and q₄, share the same location respectively; q₁ lies on edge (v₂, v₃) and q₂ lies on edge (v₃, v₆). Suppose V is the set of candidate locations for building a harbour, Q is a set of small cargo distribution centres, and each distribution centre can store 1 ton of cargo per day. We look for a candidate point in V that collects all the cargo of Q with minimal aggregate distance. The max-ANN result is then v₂ with distance 16, and the sum-ANN result is also v₂ with distance 52. Since v₂ is roughly the "center" of Q, this result is intuitive. However, if the harbour needs only 2 tons of cargo per day, only 50% of the distribution centres need to be considered rather than all query points in Q. More precisely, a more general query lets the user specify a parameter φ; the goal is to find a point in V whose aggregate distance to some φM points in Q is minimal. We call this query the flexible aggregate nearest neighbor (FANN) query. If we set φ = 50%, the max-FANN result is v₃ with distance 2, and the sum-FANN result is also v₃ with distance 4.

The invention studies the FANN problem on road networks. The FANN query was first proposed in Euclidean space [see Y. Li, F. Li, K. Yi, B. Yao, and M. Wang, "Flexible aggregate similarity search," in Proceedings of the 2011 ACM SIGMOD international conference on management of data. ACM, 2011, pp. 1009–1020.]. Compared with Euclidean space, many operations on road networks are more complicated. For example, the shortest distance between two points can be determined in constant time in Euclidean space, whereas on a road network it depends on a shortest-path algorithm. To design a more efficient FANN algorithm on road networks, the topology of the road network must be exploited to prune impossible candidate points.

To our knowledge, there is currently no other research on FANN on road networks. Our study of FANN is not a simple extension of ANN on road networks [see D. Papadias, Q. Shen, Y. Tao, and K. Mouratidis, "Group nearest neighbor queries," in Data Engineering, 2004. Proceedings. 20th International Conference on. IEEE, 2004, pp. 301-312.]. The IER algorithm in [D. Papadias, Q. Shen, Y. Tao, and K. Mouratidis, "Group nearest neighbor queries," in Data Engineering, 2004. Proceedings. 20th International Conference on. IEEE, 2004, pp. 301-312.] relies on the R-tree, but the R-tree performs poorly on road networks. [D. Yan, Z. Zhao, and W. Ng, "Efficient algorithms for finding optimal meeting point on road networks," Proceedings of the VLDB Endowment, vol. 4, no. 11, 2011.] uses convex hulls to prune impossible points, but it scales poorly. [M. Safar, "Group k-nearest neighbors queries in spatial network databases," Journal of geographical systems, vol. 10, no. 4, pp. 407–416, 2008.] and [L. Zhu, Y. Jing, W. Sun, D. Mao, and P. Liu, "Voronoi-based aggregate nearest neighbor query processing in road networks," in Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 2010, pp. 518-521.] partition the road network using Voronoi diagrams, but the resulting partitions are often unbalanced, which leads to inefficiency. In addition, the newly introduced parameter φ makes the FANN result harder to find: any φM points in Q may form the target subset, so the number of candidate subsets can reach C(M, φM).

Therefore, a method that can solve the FANN problem on road networks is needed.

Summary of the invention

The technical problem to be solved by the invention is to provide a G-tree-based optimization method for FANN queries on road networks that significantly reduces the number of calls to g_φ, lowering cost and speeding up queries.

To solve the above technical problem, the invention adopts the following technical solution:

The invention provides a G-tree-based optimization method for FANN queries on road networks that specifically optimizes the computation of g_φ and comprises the following steps:

First step: build a G-tree index for the entire road network;

Second step: definition and initialization:

Define the road network G = (V, E, W), where V denotes the vertices, E the edges, and W the edge weights, and δ(v_i, v_j) denotes the road-network distance from v_i to v_j; Q is the query set (query objects), of size M. The FANN query is defined as follows: a FANN query is a five-tuple that returns a triple (p^*, Q_φ^*, r^*) such that:

p^* = argmin { r^p | p ∈ V }, (Q_φ^*, r^*) = g_φ(p^*, Q)

where p^* is the point in V with the smallest flexible aggregate distance, Q_φ^* is the optimal flexible subset of Q, and r^* is the corresponding flexible aggregate distance;

Define g_φ as the flexible aggregate function; it takes a point p in V and a subset Q of V as input and returns a pair (Q_φ^p, r^p) satisfying:

Q_φ^p = argmin { g(p, Q_φ) | Q_φ ⊆ Q, |Q_φ| = φ|Q| }, r^p = g(p, Q_φ^p)

where Q_φ^p is a subset of Q and |Q_φ^p| = φ|Q|;

Initialization: r^* is initialized to infinity; a priority queue is constructed and the root node of the G-tree is enqueued;

Third step: check whether the queue is empty; if it is, terminate; otherwise dequeue an element x;

Fourth step: check whether x is a leaf node; if x is a leaf node, compute g_φ(v, Q) for every vertex v inside x, update the final result if necessary, and return to the third step after the traversal; otherwise go to the fifth step. g_φ(v, Q) is computed using an optimized method comprising the following steps:

1) Initialize: the distance list D is empty; a min-priority queue is maintained that stores the distance from q to a G-tree node or a road-network vertex, ordered by distance, and <0, root node of the G-tree> is enqueued; the query point lists for Q are computed;

2) If the size of D is less than φM and the queue is not empty, go to step 3); otherwise compute the maximum (max) or the sum (sum) of D as r^p;

3) Dequeue <dis, e>. If e is a road-network vertex, put dis into D and return to step 2); otherwise e is a G-tree node: traverse each element v in the query point list of e, compute the distance from p to v, enqueue v, and return to step 2);

Fifth step: traverse each child node c of x, compute the minimum possible distance from every point in Q to c, and take the maximum (max) or the sum (sum) of the φM smallest of these distances, denoted τ;

Sixth step: check whether τ is greater than or equal to r^*; if τ is less than r^*, enqueue the child node c and return to the third step; if τ is greater than or equal to r^*, terminate.

As a preferred technical solution of the invention, in the first step, building the G-tree index for the entire road network is specifically: the original graph is first partitioned into mutually disjoint subgraphs, and each subgraph is then partitioned in the same way, recursively, until the number of data points in each subgraph is below a set threshold; the distance matrix over the boundary vertices of each G-tree node is then computed.

As a preferred technical solution of the invention, the distance matrix is constructed using the following implementation of δ on the G-tree:

Given road-network vertices u and v, let the leaf nodes containing them be C_u and C_v respectively;

When C_u = C_v, a local Dijkstra's algorithm is first run within the leaf node; if no boundary vertex is encountered during execution, the local Dijkstra's algorithm is considered efficient enough; otherwise, Dijkstra's algorithm is stopped and δ(u, v) is computed with the following formula:

δ(u, v) = min { δ(u, b₁) + δ(b₁, b₂) + δ(b₂, v) | b₁, b₂ ∈ B_c }

where B_c is the set of boundary vertices of C_u or C_v;

When C_u ≠ C_v, any path from u to v must pass through boundary vertices of the leaf nodes containing them. Let C_A be the lowest common ancestor of C_u and C_v; the shortest path from u to v must then go bottom-up from C_u to C_A and then top-down from C_A to C_v, which is expressed as:

δ(u, v) = min (δ(u, b₁) + δ(b₁, b₂) + … + δ(b_{m-1}, b_m) + … + δ(b_n, v))

where b₁, b₂, …, b_n are boundary vertices of C_u, …, C_v respectively.

As a preferred technical solution of the invention, the implementation of δ on the G-tree is solved by dynamic programming: the overall objective δ(u, v) is decomposed into a series of sub-objectives, and by storing intermediate results the value of δ(u, v) is obtained in linear time.

As a preferred technical solution of the invention, in step 1) of the fourth step, computing the query point lists for Q during initialization means determining, for each point q in Q, which G-tree nodes contain it, and for each G-tree node, which child nodes contain points of Q.

As a preferred technical solution of the invention, in the fifth step, the minimum possible distance from a point in Q to c is the minimum possible distance between a road-network vertex and a G-tree node.

As a preferred technical solution of the invention, in the fifth step, τ is a dynamic threshold that lower-bounds r^*: the aggregate distance of any p is at least τ, so if τ is greater than or equal to r^*, the method terminates.

As a preferred technical solution of the invention, the fifth step includes an implementation of θ(u, v), as follows: θ(u, v) is used for pruning; as a lower bound on the distance, θ(u, v) can simply be taken to be the Euclidean distance; alternatively, using the triangle inequality, let w be a third point; then δ(u, v) ≥ δ(w, u) - δ(w, v) and δ(u, v) ≥ δ(w, v) - δ(w, u) both hold, and therefore θ(u, v) = max{|δ(w, u) - δ(w, v)|, d^ε(u, v)}.

As a preferred technical solution of the invention, in the implementation of θ(u, v) in the fifth step, to tighten the bound θ(u, v) further, some landmarks are selected in advance; each landmark is used in turn as the third point, and the largest value given by the triangle inequality is taken.

As a preferred technical solution of the invention, in the second step, each element of the priority queue is a pair <c, d>, where c is a G-tree node and d is computed as follows: compute the minimum possible distance from every point in Q to c and take the maximum (max) or the sum (sum) of the φM smallest of these distances as d, i.e., the τ of the fifth step; elements are ordered by d.

Compared with the prior art, the invention has the following advantages:

1. Traversing the G-tree from top to bottom gives fast access to the entire road network.

2. The distance matrices stored in the G-tree allow the distance from a G-tree node to a road-network vertex to be computed in linear time. This significantly reduces the number of calls to g_φ, which lowers cost and speeds up queries.

3. With the G-tree, g_φ is turned into a kNN problem; because shortest paths are convenient to compute on the G-tree, the G-tree optimization method is very efficient.

Brief description of the drawings

The invention is further described below with reference to the drawings and embodiments.

Fig. 1 is a schematic example of FANN.

Fig. 2 is an example of a G-tree structure.

Fig. 3 is a schematic diagram of the stored distance matrices.

Fig. 4 is a flow chart of the G-tree-based optimization method for FANN queries on road networks according to the invention.

Fig. 5 is a flow chart of the optimized g_φ algorithm of the invention.

Fig. 6 is a schematic diagram of the path expansion in the implementation of δ on the G-tree.

Fig. 7 compares the efficiency of the G-max algorithm of the invention with the Baseline algorithm as A varies.

Fig. 8 compares the efficiency of the G-sum algorithm of the invention with the Baseline algorithm as A varies.

Fig. 9 compares the efficiency of the G-max algorithm of the invention with the Baseline algorithm as M varies.

Fig. 10 compares the efficiency of the G-sum algorithm of the invention with the Baseline algorithm as M varies.

Fig. 11 compares the efficiency of the G-max algorithm of the invention with the Baseline algorithm as φ varies.

Fig. 12 compares the efficiency of the G-sum algorithm of the invention with the Baseline algorithm as φ varies.

Detailed description of the embodiments

The invention is now described in further detail with reference to the drawings. The drawings are simplified schematic diagrams that illustrate the basic structure of the invention only schematically, and therefore show only the components relevant to the invention.

1. Problem definition

A road network can be represented as an undirected weighted graph G = (V, E, W), where V is the vertex set, E is the edge set, and W is a mapping from E to the positive reals that gives the edge weights. Let δ be the distance function defined on G; δ(v_i, v_j) denotes the road-network distance from v_i to v_j. Note that the weight of an edge need not equal the Euclidean distance between its endpoints; for example, it can be the time needed to traverse the edge. Clearly, if the edge weights are proportional to the Euclidean distances, the conversion is simple. Arbitrary weights are handled with a method (normalization) similar to that of [M. L. Yiu, N. Mamoulis, and D. Papadias, "Aggregate nearest neighbor queries in road networks," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 820-833, 2005.]. First, a scale factor is computed:

scale factor = max { d^ε(v_i, v_j) / W(v_i, v_j) | (v_i, v_j) ∈ E }, where d^ε(v_i, v_j) denotes the Euclidean distance from v_i to v_j. All weights are then multiplied by this scale factor. In this way, the Euclidean distance remains a lower bound of the road-network distance.
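The normalization can be sketched as follows. This is a minimal illustration, assuming that the scale factor is the largest ratio of Euclidean length to weight over all edges (the smallest factor that keeps every scaled weight at or above the Euclidean length of its edge). The function name `scale_weights` and the sample coordinates and travel times are hypothetical.

```python
import math

def scale_weights(coords, edges):
    """coords: {vertex: (x, y)}; edges: {(u, v): weight}.

    Returns (factor, scaled_edges), where every scaled weight is at least
    the Euclidean length of its edge (assumed definition of the factor).
    """
    factor = max(math.dist(coords[u], coords[v]) / w
                 for (u, v), w in edges.items())
    scaled = {e: w * factor for e, w in edges.items()}
    return factor, scaled

# Hypothetical data: weights are travel times, not lengths.
coords = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (6.0, 8.0)}
edges = {("a", "b"): 2.5, ("b", "c"): 10.0}
factor, scaled = scale_weights(coords, edges)
print(factor, scaled)
```

Because each scaled edge is at least as long as the straight line between its endpoints, any path in the scaled network is at least as long as the straight line between its two ends, which is what allows the Euclidean distance to be used for pruning.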

We use Q to denote the set of query points (query objects), of size M, and φ to denote the flexibility parameter, where |V| = N, |Q| = M, and φ lies between 0 and 1. For ease of description, we assume that all query points lie on vertices of the graph, i.e., Q ⊆ V. Let g denote an aggregate function defined on a point p and a point set P; in the invention it can be the maximum (max) or the sum (sum):

g(p, P) = max { δ(p, v_i) | v_i ∈ P } (max), or g(p, P) = δ(p, v_1) + … + δ(p, v_k) (sum)

where |P| = k and v_i ∈ P.

We can then define the flexible aggregate function g_φ. It takes a point p in V and a subset Q of V as input and returns a pair (Q_φ^p, r^p) satisfying:

Q_φ^p = argmin { g(p, Q_φ) | Q_φ ⊆ Q, |Q_φ| = φ|Q| }, r^p = g(p, Q_φ^p)

where Q_φ^p is a subset of Q and |Q_φ^p| = φ|Q|.

Our goal is to find a point p^* in V such that r^p is minimal. Formally, a FANN query is a five-tuple that returns a triple (p^*, Q_φ^*, r^*) such that:

p^* = argmin { r^p | p ∈ V }, (Q_φ^*, r^*) = g_φ(p^*, Q)

where p^* is the point in V with the smallest flexible aggregate distance, Q_φ^* is the optimal flexible subset of Q, and r^* is the corresponding flexible aggregate distance.

In other words, given G, Q, and the parameter φ, the goal is to find a point in V whose aggregate distance (usually sum or max) to some φM points in Q is minimal.

2. Brute-force method

First, we discuss the implementation of g_φ. Given p and Q, there are at most C(M, φM) choices for Q_φ^p. However, it is unnecessary to consider every possibility. Recall how Dijkstra's algorithm runs (proposed by the Dutch computer scientist Edsger Dijkstra in 1959, it is a single-source shortest-path algorithm for weighted graphs that expands outward from the source layer by layer until the destination is reached): in each expansion step it picks the nearest unvisited vertex and updates the distances from the source to that vertex's neighbours. This process can be applied to g_φ. Take p as the source and run Dijkstra's algorithm until φM points in Q have been marked as visited. These marked points form Q_φ^p, and their aggregate distance is r^p. It is easy to see that g_φ can also be regarded as a kNN query about p and Q, where k = φM.
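The Dijkstra-based g_φ described above can be sketched as follows. This is an illustrative sketch rather than the patented implementation: the adjacency-list format, the function name `g_phi`, the rounding of φM with a ceiling, and the sample graph are all assumptions.

```python
import heapq
import math

def g_phi(graph, p, Q, phi, agg=max):
    """graph: {u: [(v, weight), ...]} (undirected, both directions listed).

    Runs Dijkstra from p and stops once k = ceil(phi * |Q|) query points
    are settled. Returns (Q_phi_p, r_p); r_p is inf if fewer than k query
    points are reachable.
    """
    k = math.ceil(phi * len(Q))          # assumption: phi*M is rounded up
    targets = set(Q)
    dist = {p: 0.0}
    settled = set()
    found = []                           # (query point, distance), nearest first
    heap = [(0.0, p)]
    while heap and len(found) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u in targets:
            found.append((u, d))
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    if len(found) < k:
        return None, math.inf
    return [u for u, _ in found], agg(d for _, d in found)

# Hypothetical road network: a-b (1), b-c (2), c-d (3), b-e (1).
graph = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 2.0), ("e", 1.0)],
    "c": [("b", 2.0), ("d", 3.0)],
    "d": [("c", 3.0)],
    "e": [("b", 1.0)],
}
Q = ["a", "c", "d", "e"]
print(g_phi(graph, "b", Q, 0.5))           # max aggregate over the 2 nearest
print(g_phi(graph, "b", Q, 0.5, agg=sum))  # sum aggregate over the 2 nearest
```

Since Dijkstra settles vertices in non-decreasing order of distance, the first k settled query points are exactly the k nearest ones, so no subset enumeration is needed.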

Based on the definition of FANN above, a brute-force solution follows directly: run g_φ for every p in V while maintaining the smallest r^p seen so far. Note that this r^p, or the Euclidean distance used as a lower bound, can be used to terminate an iteration early. A similar strategy applies to any algorithm that uses g_φ.

Now consider the time complexity of the brute-force method. g_φ has the same complexity as Dijkstra's algorithm, i.e., O(|E| + N lg N) (assuming the min-heap is a Fibonacci heap), where |E| is the number of edges in the road network. The total time is therefore O(N|E| + N² lg N).
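A minimal sketch of the brute-force method, including the early termination based on the best r^p found so far, might look like this. The names `g_phi_bounded` and `fann_brute_force`, the ceiling rounding of φM, and the sample graph are assumptions; a binary heap is used here instead of the Fibonacci heap assumed in the complexity analysis.

```python
import heapq
import math

def g_phi_bounded(graph, p, targets, k, agg, bound):
    """k nearest targets from p via Dijkstra; returns inf as soon as the
    next settled distance reaches `bound`, because r_p can then no longer
    be smaller than the best value found so far."""
    dist, settled, found = {p: 0.0}, set(), []
    heap = [(0.0, p)]
    while heap and len(found) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if d >= bound:                   # remaining targets are at least d away
            return math.inf
        if u in targets:
            found.append(d)
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return agg(found) if len(found) == k else math.inf

def fann_brute_force(graph, Q, phi, agg=max):
    k = math.ceil(phi * len(Q))          # assumption: phi*M is rounded up
    targets = set(Q)
    best_p, best_r = None, math.inf
    for p in graph:                      # one g_phi call per vertex in V
        r = g_phi_bounded(graph, p, targets, k, agg, best_r)
        if r < best_r:
            best_p, best_r = p, r
    return best_p, best_r

# Hypothetical road network: a-b (1), b-c (2), c-d (3), b-e (1).
graph = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 2.0), ("e", 1.0)],
    "c": [("b", 2.0), ("d", 3.0)],
    "d": [("c", 3.0)],
    "e": [("b", 1.0)],
}
print(fann_brute_force(graph, ["a", "c", "d", "e"], 0.5))
```

The early exit is valid for both max and sum with non-negative weights: if another query point is still needed and the next settled distance is already at least the best r^p, the aggregate for this p cannot be strictly smaller.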

Intuitively, there are two ways to optimize the brute-force algorithm: first, prune the points in V as much as possible (i.e., reduce the number of calls to g_φ); second, improve the efficiency of g_φ. The rest of this description focuses on these two approaches.

3. G-tree-based algorithm

We implement the FANN algorithm using the G-tree index structure, which serves both purposes at once: (1) pruning the points in V as much as possible; (2) improving the efficiency of g_φ.

Construction of the G-tree: Given a subgraph of the graph G = (V, E, W), the vertices of the subgraph are first divided into internal vertices and boundary vertices according to the positions of their adjacent vertices. For an internal vertex, all of its adjacent vertices belong to the same subgraph as the vertex itself. For a boundary vertex, at least one adjacent vertex does not belong to its subgraph. The G-tree is a balanced tree: each non-leaf node has B (≥ 2) children, and each leaf node contains at most T vertices. The G-tree can be built recursively. The graph is partitioned as follows: first, a coarser graph is obtained by removing some edges or vertices; then the coarse graph is partitioned into small parts; finally, the partition is mapped back to the original graph. Fig. 2 shows an example of a G-tree.
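The distinction between internal and boundary vertices can be illustrated with a few lines of code. The function name `split_vertices` and the five-vertex path graph are hypothetical and are not the graph of Fig. 2.

```python
def split_vertices(adj, part):
    """adj: {u: list of neighbours}; part: the vertices of one subgraph.

    A vertex is a boundary vertex if at least one neighbour lies outside
    the subgraph; otherwise it is an internal vertex.
    """
    part = set(part)
    boundary = {u for u in part if any(v not in part for v in adj[u])}
    return part - boundary, boundary

# Hypothetical path graph v1 - v2 - v3 - v4 - v5.
adj = {
    "v1": ["v2"],
    "v2": ["v1", "v3"],
    "v3": ["v2", "v4"],
    "v4": ["v3", "v5"],
    "v5": ["v4"],
}
internal, boundary = split_vertices(adj, {"v1", "v2", "v3"})
print(sorted(internal), sorted(boundary))
```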

Storage of the G-tree: The storage model is the key to the G-tree. To reduce space overhead, vertex data are stored only in leaf nodes. Each node is identified by an ID and stores the IDs of all its boundary vertices, the ID of its parent, and the IDs of its children. Note that vertex IDs and tree-node IDs are not in the same domain. Some distances are precomputed and stored within the same level. Specifically, a non-leaf node maintains the pairwise distances between the boundary vertices of its children; a leaf node maintains the distances between its boundary vertices and the vertices it contains.

Taking Fig. 2 as an example, the boundary vertices of G1 are {v3, v4} and those of G2 are {v6, v7}, so G0 maintains the distances among {v3, v4, v6, v7}. The boundary vertex of G3 is {v3}, so G3 maintains the distances among {v1, v2, v3}. The precomputed distances are stored as matrices (the upper half is omitted because δ(v_i, v_j) = δ(v_j, v_i)), as shown in Fig. 3.

FANN search on the G-tree: Given two vertices u and v in V, θ(u, v) denotes the minimum possible distance between u and v. Similarly, given a node C of the G-tree, let B_C be its set of boundary vertices and define θ(u, C) as the minimum possible distance from u to C:

The FANN algorithm on the G-tree is described in Algorithm 1 below. The G-tree is traversed from top to bottom, starting with a queue that contains the root node. When a leaf node is reached, all road-network vertices it contains are processed and the result is updated (lines 7-9). When a non-leaf node is reached, if its lower bound is greater than the current best value, all road-network vertices inside it and the remaining G-tree nodes in the queue can be pruned (the algorithm terminates); otherwise its child nodes are put into the queue (lines 10-16).

Input：

Output：

Algorithm 1: max-FANN algorithm on the G-tree

As shown in Fig. 4, the G-tree-based optimization method for FANN queries on road networks according to the invention comprises the following steps:

First step: build a G-tree index for the entire road network (graph). Specifically, the original graph is first partitioned into mutually disjoint subgraphs, and each subgraph is then partitioned in the same way, recursively, until the number of data points in each subgraph is below a set threshold. The distance matrix over the boundary vertices of each G-tree node is then computed (constructing the distance matrices uses the implementation of δ on the G-tree described below).

Second step: initialization. r^* is initialized to infinity; a priority queue is constructed and the root node of the G-tree is enqueued. Each element of the priority queue is a pair <c, d>, where c is a G-tree node and d is computed as follows: compute the minimum possible distance from every point in Q to c and take the maximum (max) of the φM smallest of these distances as d, i.e., the τ of the fifth step. Elements are ordered by d.

Third step: check whether the queue is empty. If it is, terminate the algorithm; otherwise dequeue an element x.

Fourth step: check whether x is a leaf node. If x is a leaf node, compute g_φ(v, Q) for every vertex v inside x (see the implementation of g_φ on the G-tree below, which uses the optimized g_φ of Algorithm 2), update the final result if necessary, and return to the third step after the traversal; otherwise go to the fifth step.

Fifth step: traverse each child node c of x, compute the minimum possible distance from every point in Q to c (see the implementation of θ(u, v) below), and take the maximum (max) of the φM smallest of these distances, denoted τ. The minimum possible distance from a point in Q to c is the minimum possible distance between a road-network vertex and a G-tree node. τ serves as a dynamic threshold that lower-bounds r^*: the aggregate distance of any p is at least τ, so if τ is greater than or equal to r^*, the algorithm terminates.

Sixth step: check whether τ is greater than or equal to r^*. If τ is less than r^*, enqueue the child node c and return to the third step. If τ is greater than or equal to r^*, terminate the algorithm.
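The traversal in the steps above can be sketched as a best-first search. The sketch below is a simplified variant under stated assumptions, not the patented algorithm verbatim: it skips a child whose bound τ is not below r^* and stops when the smallest queued bound reaches r^*, rather than terminating on the first such child. The tree, the `lower_bound` function, and the stand-in `g_phi` use a hypothetical toy network (vertices 0..7 on a line with unit spacing) so that distances are trivial to check.

```python
import heapq
import math

def fann_best_first(root, children, leaf_vertices, lower_bound, g_phi,
                    Q, phi, agg=max):
    k = math.ceil(phi * len(Q))          # assumption: phi*M is rounded up
    best_p, best_r = None, math.inf      # r* starts at infinity
    heap = [(0, root)]                   # entries are (tau, node), smallest first
    while heap:
        tau, x = heapq.heappop(heap)
        if tau >= best_r:                # no queued node can improve r*
            break
        if x in leaf_vertices:           # leaf: evaluate every vertex it holds
            for v in leaf_vertices[x]:
                r = g_phi(v, Q, k, agg)
                if r < best_r:
                    best_p, best_r = v, r
        else:                            # non-leaf: bound each child
            for c in children[x]:
                bounds = sorted(lower_bound(q, c) for q in Q)
                t = agg(bounds[:k])      # tau: aggregate of the k smallest bounds
                if t < best_r:
                    heapq.heappush(heap, (t, c))
    return best_p, best_r

# Hypothetical toy index: each tree node covers an interval of the line.
span = {"G0": (0, 7), "G1": (0, 3), "G2": (4, 7),
        "G3": (0, 1), "G4": (2, 3), "G5": (4, 5), "G6": (6, 7)}
children = {"G0": ["G1", "G2"], "G1": ["G3", "G4"], "G2": ["G5", "G6"]}
leaf_vertices = {"G3": [0, 1], "G4": [2, 3], "G5": [4, 5], "G6": [6, 7]}

def lower_bound(q, c):
    lo, hi = span[c]
    return max(lo - q, q - hi, 0)        # 0 when q lies inside the node

calls = []
def g_phi(v, Q, k, agg):                 # exact on the toy line network
    calls.append(v)
    return agg(sorted(abs(v - q) for q in Q)[:k])

result = fann_best_first("G0", children, leaf_vertices, lower_bound,
                         g_phi, [0, 1, 6, 7], 0.5)
print(result, calls)
```

In this toy run only four of the eight vertices are passed to `g_phi`; the leaves G4 and G5 are never opened because their bound τ = 2 is not below the r^* = 1 already found.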

Implementation of g_φ on the G-tree: The computation of g_φ can be regarded as a kNN query about p and Q, i.e., query points closer to p are considered first. Because the mapping from road-network vertex IDs to tree-node IDs is stored, it is easy to determine which nodes store a particular vertex. For example, in Fig. 2 the nodes containing v1 are {G3, G1, G0}. Therefore, given Q, we can determine (1) for each leaf node, the query points of Q it contains; and (2) for each non-leaf node, the child nodes that contain query points of Q. We call this the query point list. For example, in Fig. 2, assuming Q = {v1, v4, v5, v8, v9}, the query point list of G3 is {v1}, that of G4 is {v4, v5}, and that of G1 is {G3, G4}.
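The query point lists can be built by walking from each query point's leaf up to the root. The sketch below assumes a tree shaped like the example (G0 with children G1 and G2; G1 with children G3 and G4; G2 with children G5 and G6). The placement of v8 and v9 in G6 is an assumption made only for illustration, as is the function name `query_point_lists`.

```python
from collections import defaultdict

def query_point_lists(parent, leaf_of, Q):
    """parent: {node: parent node or None}; leaf_of: {vertex: leaf node}.

    Returns {node: list}, where a leaf lists the query points it contains
    and a non-leaf lists the children that contain query points.
    """
    occ = defaultdict(list)
    for q in Q:
        node = leaf_of[q]
        if q not in occ[node]:
            occ[node].append(q)
        while parent[node] is not None:  # propagate upwards to the root
            if node not in occ[parent[node]]:
                occ[parent[node]].append(node)
            node = parent[node]
    return dict(occ)

parent = {"G0": None, "G1": "G0", "G2": "G0",
          "G3": "G1", "G4": "G1", "G5": "G2", "G6": "G2"}
# v1 in G3 and v4, v5 in G4 follow the text; v8, v9 in G6 is assumed.
leaf_of = {"v1": "G3", "v4": "G4", "v5": "G4", "v8": "G6", "v9": "G6"}
occ = query_point_lists(parent, leaf_of, ["v1", "v4", "v5", "v8", "v9"])
print(occ)
```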

Given a road-network vertex q and a tree node C, σ(q, C) is defined as the distance from q to C:

Clearly, many paths are shared when computing σ(q, C), which can be exploited to make the computation more efficient.

The optimized implementation of g_φ is described in Algorithm 2. A priority queue stores pairs <dis, obj> ordered by dis, where obj is either a point in Q or a node of the G-tree and dis is the distance from q to obj. First, <0, root> is enqueued (line 3); objects are then taken from the queue iteratively (line 5). If the dequeued object is a G-tree node, all elements in that node's query point list are put into the queue. If the dequeued object is a road-network vertex, it must belong to Q and is added to the final result (line 7). Moreover, a dequeued element always has a smaller value (distance) than the elements remaining in the queue, which makes the correctness of the algorithm easy to verify.

Input：

Output：r^p

Algorithm 2: optimized g_φ

As shown in Fig. 5, the optimized implementation of g_φ according to the invention (Algorithm 2) comprises the following steps:

1) Initialization. The distance list D is empty; a min-priority queue is maintained that stores the distance from q to a G-tree node or a road-network vertex, and <0, root node of the G-tree> is enqueued; the query point lists for Q are computed (i.e., for each point q in Q, which G-tree nodes contain it, and for each G-tree node, which child nodes contain points of Q).

2) If the size of D is less than φM and the queue is not empty, go to step 3); otherwise compute the max or sum of D as r^p.

3) Dequeue <dis, e>. If e is a road-network vertex (which is necessarily a point in Q), put dis into D and return to step 2); otherwise e is a G-tree node: traverse each element v in the query point list of e, compute the distance from q to v, enqueue v, and return to step 2).
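Steps 1) to 3) can be sketched as a best-first kNN search over the query point lists. The sketch assumes a caller-supplied `dist(p, obj)` that is exact for query points and a lower bound for tree nodes; in the invention these distances come from the G-tree distance matrices, which are not modelled here. The toy line network, the `span` table, and all names are hypothetical.

```python
import heapq
import math

def g_phi_tree(p, root, occ, dist, M, phi, agg=max):
    """occ: query point lists {tree node: [children or query points]}.
    dist(p, obj): exact for query points, a lower bound for tree nodes."""
    k = math.ceil(phi * M)               # assumption: phi*M is rounded up
    D = []                               # distances of the query points found
    heap = [(0, 0, root)]                # (dis, tie-breaker, object)
    tick = 1                             # keeps heap entries comparable
    while len(D) < k and heap:
        dis, _, e = heapq.heappop(heap)
        if e in occ:                     # tree node: expand its query point list
            for v in occ[e]:
                heapq.heappush(heap, (dist(p, v), tick, v))
                tick += 1
        else:                            # road-network vertex, i.e. a point of Q
            D.append(dis)
    return agg(D) if len(D) == k else math.inf

# Hypothetical toy data: vertices are integers on a line, nodes are intervals.
span = {"G0": (0, 7), "G1": (0, 3), "G2": (4, 7), "G3": (0, 1), "G6": (6, 7)}
occ = {"G0": ["G1", "G2"], "G1": ["G3"], "G3": [0, 1],
       "G2": ["G6"], "G6": [6, 7]}      # lists for Q = {0, 1, 6, 7}

def dist(p, obj):
    if obj in span:                      # tree node: distance to its interval
        lo, hi = span[obj]
        return max(lo - p, p - hi, 0)
    return abs(p - obj)                  # query point: exact distance

print(g_phi_tree(2, "G0", occ, dist, M=4, phi=0.5))
print(g_phi_tree(2, "G0", occ, dist, M=4, phi=0.5, agg=sum))
```

The search is correct as long as the value stored for a tree node never exceeds the distance to any query point below it, so a query point is only dequeued once nothing closer can remain in the queue.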

Advantages: Traversing the G-tree from top to bottom gives fast access to the entire road network. The distance matrices stored in the G-tree allow the distance from a G-tree node to a road-network vertex to be computed in linear time, which significantly reduces the number of calls to g_φ. With the G-tree, g_φ is turned into a kNN problem. Because shortest paths are convenient to compute on the G-tree, the algorithm is very efficient.

Implementation of δ on the G-tree: This method can be applied in the G-tree index construction step, i.e., the first step above. Given road-network vertices u and v, let the leaf nodes containing them be C_u and C_v respectively. When C_u = C_v, a local Dijkstra's algorithm is first run within the leaf node. If no boundary vertex is encountered during execution, the local Dijkstra's algorithm is considered efficient enough. Otherwise, Dijkstra's algorithm is stopped and δ(u, v) is computed with the following formula:

δ(u, v) = min { δ(u, b₁) + δ(b₁, b₂) + δ(b₂, v) | b₁, b₂ ∈ B_c }

where B_c is the set of boundary vertices of C_u or C_v.

When C_u ≠ C_v, any path from u to v must pass through boundary vertices of the leaf nodes containing them. Let C_A be the lowest common ancestor of C_u and C_v. The shortest path from u to v must then go bottom-up from C_u to C_A and then top-down from C_A to C_v. For example, in Fig. 2 the shortest path from v1 to v6 must pass through boundary vertices of G3, G1, G2, and G5; the process is shown in Fig. 6. This can be expressed as:

δ(u, v) = min (δ(u, b₁) + δ(b₁, b₂) + … + δ(b_{m-1}, b_m) + … + δ(b_n, v))

where b₁, b₂, …, b_n are boundary vertices of C_u, …, C_v respectively. This can be solved by dynamic programming: the overall objective δ(u, v) is decomposed into a series of sub-objectives, and by storing intermediate results the value of δ(u, v) is obtained in linear time.
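The dynamic programming step can be illustrated as a layer-by-layer minimisation over boundary vertex sets. The sketch assumes the needed distances are already available as dictionaries (in the invention they would come from the stored distance matrices); the function name `assemble_distance` and all numbers are hypothetical.

```python
import math

def assemble_distance(first, layers, last):
    """first: {b: δ(u, b)} for the boundary vertices of C_u.
    layers: list of {(a, b): δ(a, b)} between consecutive boundary sets
            along the tree path from C_u to C_v.
    last:  {b: δ(b, v)} for the boundary vertices of C_v.
    Each layer keeps only the best distance from u to every boundary
    vertex, so every stored entry is visited exactly once."""
    cur = dict(first)
    for mat in layers:
        nxt = {}
        for (a, b), w in mat.items():
            if a in cur:
                cand = cur[a] + w
                if cand < nxt.get(b, math.inf):
                    nxt[b] = cand
        cur = nxt
    return min(cur[b] + last[b] for b in cur if b in last)

# Hypothetical distances: u reaches boundary vertices b1, b2 of its leaf,
# which connect to boundary vertices c1, c2 of the leaf that contains v.
first = {"b1": 2, "b2": 5}
layers = [{("b1", "c1"): 4, ("b2", "c1"): 0, ("b1", "c2"): 1, ("b2", "c2"): 3}]
last = {"c1": 1, "c2": 6}
print(assemble_distance(first, layers, last))
```

Storing the intermediate minima per boundary vertex avoids enumerating every chain b₁, b₂, …, b_n explicitly, which is the point of the decomposition into sub-objectives.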

Implementation of θ(u, v): This method can be applied in the fifth step above. The algorithm above uses θ(u, v) for pruning. As a lower bound on the distance, θ(u, v) can simply be taken to be the Euclidean distance, but this bound is often not tight enough. The triangle inequality can be used instead: let w be a third point; then δ(u, v) ≥ δ(w, u) - δ(w, v) and δ(u, v) ≥ δ(w, v) - δ(w, u) both hold. Therefore θ(u, v) = max{|δ(w, u) - δ(w, v)|, d^ε(u, v)}. To tighten the bound further, some landmarks are selected at random in advance; each landmark is used in turn as the third point, and the largest value given by the triangle inequality is taken. The number of landmarks is generally small enough that the preprocessing cost is also small.
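The landmark bound can be sketched as follows, assuming landmark distances are precomputed with Dijkstra's algorithm and that edge weights are at least the Euclidean edge lengths. The U-shaped sample network, the choice of landmark, and the function names are hypothetical.

```python
import heapq
import math

def dijkstra(graph, src):
    """Shortest distances from src; graph: {u: [(v, weight), ...]}."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                     # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def theta(u, v, coords, landmark_dist):
    """Lower bound on δ(u, v): the larger of the Euclidean distance and
    |δ(w, u) - δ(w, v)| over all landmarks w."""
    bound = math.dist(coords[u], coords[v])
    for dist_w in landmark_dist.values():
        bound = max(bound, abs(dist_w[u] - dist_w[v]))
    return bound

# Hypothetical U-shaped road: a and d are close in the plane but far apart
# on the network (a-b-c-d).
coords = {"a": (0, 0), "b": (0, 2), "c": (1, 2), "d": (1, 0)}
graph = {
    "a": [("b", 2.0)],
    "b": [("a", 2.0), ("c", 1.0)],
    "c": [("b", 1.0), ("d", 2.0)],
    "d": [("c", 2.0)],
}
landmark_dist = {w: dijkstra(graph, w) for w in ["a"]}   # preprocessing
print(theta("a", "d", coords, landmark_dist))
print(theta("b", "d", coords, landmark_dist))
```

In this example the Euclidean bound for the pair (a, d) is only 1, while the landmark bound equals the true network distance of 5, which shows why the landmark bound can prune more aggressively.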

4. Experiments

4.1 Setup

The algorithms above were implemented in standard C++, and the experiments were run on a Linux machine with a 64-bit Intel Xeon 3.30GHz CPU and 16GB RAM. An LRU cache of size 1M was used. All road network data come from the real world, as shown in the following table:

Many factors affect the cost of the FANN problem on road networks. Our experiments focus on the three most important ones:

● A, the coverage ratio of Q

● M, the size of Q

● φ, the flexibility parameter

We change these three variables one by one.When changing one of them, other two are remained unchanged.A, M andDefault Value is 0.6,60,0.6 respectively.It is limited for length, in addition to illustrating SF, LKS, CTR and USA number when measuring scalability According to collection, we default the result for only showing SF data set.Furthermore we have studied G trees for 5.4 parts on different data sets (opposite) optimized parameter, default choice B=6, T=200 (SF), T=300 (LKS), T=400 (CTR) and T=500 (USA). 4.2 efficiency

First, we compare the efficiency of the algorithm of the invention with that of the Baseline algorithm. The Baseline algorithm traverses the road network points in V and, for each of them, evaluates the flexible aggregate function using Dijkstra's algorithm.
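
For orientation, the Baseline procedure can be sketched in Python as below. This is a hedged illustration rather than the code used in the experiments: the function names, the adjacency-list format, the symbol `phi` for the flexibility parameter, and the rounding of the subset size to ceil(phi * M) are all assumptions. For each candidate point p, Dijkstra's algorithm is expanded from p until the required number of query points has been settled, and their distances are aggregated with sum or max.

```python
import heapq
import math


def flexible_aggregate(adj, p, query, phi, agg="sum"):
    """Aggregate distance from p to its k nearest query points,
    with k = ceil(phi * len(query)) (the rounding is an assumption)."""
    k = math.ceil(phi * len(query))
    targets = set(query)
    dist = {p: 0.0}
    heap = [(0.0, p)]
    settled = set()
    nearest = []  # (distance, query point) in nondecreasing distance order
    while heap and len(nearest) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u in targets:
            nearest.append((d, u))
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    if len(nearest) < k:
        return math.inf, []  # fewer than k query points are reachable
    dists = [d for d, _ in nearest]
    value = sum(dists) if agg == "sum" else max(dists)
    return value, [q for _, q in nearest]


def baseline_fann(adj, candidates, query, phi, agg="sum"):
    """Scan every candidate point and keep the one with the smallest
    flexible aggregate distance; returns (p*, flexible subset, r*)."""
    best_r, best_p, best_subset = math.inf, None, []
    for p in candidates:
        r, subset = flexible_aggregate(adj, p, query, phi, agg)
        if r < best_r:
            best_r, best_p, best_subset = r, p, subset
    return best_p, best_subset, best_r
```

Because one Dijkstra expansion is started per candidate point, the cost grows with the number of candidates, which is the overhead that the G-tree-based pruning is meant to avoid.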

Varying A: we set A to 0.1, 0.2, 0.4, 0.6, 0.8, and 1.0. The results are shown in Fig. 7 and Fig. 8.

Varying M: we set M to 10, 20, 30, 40, 60, 80, and 100. The results are shown in Fig. 9 and Fig. 10.

Varying the flexibility parameter: we set it to 0.1, 0.2, 0.4, 0.6, 0.8, and 1.0. The results are shown in Fig. 11 and Fig. 12.

Taking the above preferred embodiments of the present invention as guidance, those skilled in the art can, on the basis of the above description, make various changes and modifications without departing from the technical idea of the invention. The technical scope of the invention is not limited to the content of the specification and must be determined according to the scope of the claims.

Claims

1. An optimization method for G-tree-based flexible aggregate nearest neighbor (FANN) search on a road network, characterized by comprising the following steps:

First step: build a G-tree index over the entire road network;

Second step: definition and initialization:

Define the road network G = (V, E, W), where V denotes the vertices, E the edges, and W the edge weights, and δ(v_i, v_j) denotes the road network distance from v_i to v_j; Q is the query set (the query objects), of size M; the FANN query is defined as follows: a FANN query is a five-tuple that returns a triple such that:

where p^* is the point in V that minimizes the flexible aggregate distance, the second element of the triple is the optimal flexible subset of Q, and r^* is the resulting flexible aggregate distance;

Define the flexible aggregate function, which takes as input a point p belonging to V and a subset Q of V, and returns a pair as its result, satisfying:

WhereinBe a subset of Q and

Third step: if the queue is empty, terminate; otherwise dequeue to obtain x and go to the fourth step;

Fourth step: determine whether x is a leaf node; if x is a leaf node, compute the flexible aggregate function for every point v inside x, update the final result where necessary, and return to the third step after the traversal; otherwise, go to the fifth step; the computation uses an optimization method comprising the following steps:

1) initialize the distance list D to empty; maintain a min-priority queue that stores the distances from q to G-tree nodes or road network points, ordered by distance, and enqueue <0, root node of the G-tree>; compute the query point list of Q;

2) if the size of D is less than the required size of the flexible subset of Q and the queue is not empty, go to step 3); otherwise compute the maximum (max) or the sum (sum) of D as r^*;

3) dequeue to obtain <dis, e>; if e is a point on the road network, put dis into D and return to step 2); otherwise e is a node of the G-tree: traverse each point v in the query point list of e, compute the distance from p to v, enqueue v, and return to step 2);

Fifth step: traverse the child nodes c of x; compute the minimum possible distance from every point in Q to c, and take the maximum (max) or the sum (sum) of the smallest of these minimum distances, as many as the flexible subset of Q contains, denoted τ;

Sixth step: determine whether τ is greater than or equal to r^*; if τ is less than r^*, the child nodes of c are enqueued and the method returns to the third step; if τ is greater than or equal to r^*, terminate.

2. The method of claim 1, characterized in that, in the first step, building the G-tree index over the entire road network specifically comprises: first partitioning the original graph into mutually disjoint subgraphs, then partitioning each subgraph in the same way, and recursing in this manner until the number of data points contained in each subgraph is less than a set threshold; and computing the distance matrix of the boundary points of each G-tree subgraph.

3. The method of claim 2, characterized in that the distance matrix is constructed using the following method of implementing δ on the G-tree:

When C_u = C_v, first run a local Dijkstra's algorithm within that leaf node; if no boundary point is encountered while the algorithm runs, the local Dijkstra's algorithm is considered efficient enough; otherwise, stop Dijkstra's algorithm and compute δ(u, v) using the following formula:

δ(u, v) = min{ δ(u, b₁) + δ(b₁, b₂) + δ(v, b₂) | b₁, b₂ ∈ B_c }

where B_c is the set of boundary points of C_u (or C_v);

When C_u ≠ C_v, any path from u to v must pass through boundary points of the leaf nodes containing u and v respectively; let C_A be the lowest common ancestor of C_u and C_v; then the shortest path from u to v necessarily goes bottom-up from C_u to C_A and then top-down from C_A to C_v, which is formulated as:

δ(u, v) = min( δ(u, b₁) + δ(b₁, b₂) + … + δ(b_{m-1}, b_m) + … + δ(b_n, v) )

where b₁, b₂, …, b_n are boundary points of C_u, …, C_v respectively.

4. The method of claim 3, characterized in that the method of implementing δ on the G-tree is solved by dynamic programming: the overall objective δ(u, v) is decomposed into a series of subproblems, and by storing the intermediate results the value of δ(u, v) is obtained in linear time.

5. The method of claim 1, characterized in that, in step 1) of the fourth step, the query point list of Q is computed during the initialization, namely: for each point q in Q, which G-tree nodes contain it; and for each G-tree node, which of its child nodes contain points of Q.

6. The method of claim 1, characterized in that, in the fifth step, the minimum possible distance from every point in Q to c is the minimum possible distance from a G-tree node to a road network point.

7. The method of claim 1, characterized in that, in the fifth step, τ is a dynamic threshold serving as a lower bound on r^*, i.e., the aggregate distance of any p must be greater than τ, so that if τ is greater than or equal to r^*, the method terminates.

8. The method of claim 1, characterized in that the fifth step includes a method of implementing θ(u, v), specifically as follows: pruning is performed using θ(u, v); θ(u, v), as a lower bound on the distance, is taken directly to be the Euclidean distance; the triangle inequality is used: suppose w is a third point, then δ(u, v) ≥ δ(w, u) - δ(w, v) and δ(u, v) ≥ δ(w, v) - δ(w, u) hold simultaneously, and therefore θ(u, v) = max{ |δ(w, u) - δ(w, v)|, d^ε(u, v) }.

9. The method of claim 8, characterized in that, in the fifth step, in the method of implementing θ(u, v), in order to further tighten the bound θ(u, v), a number of landmarks are set in advance, each landmark point is used in turn as the third point, and, according to the triangle inequality, the largest resulting value is taken.

10. The method of claim 1, characterized in that, in the second step, the elements stored in the priority queue are pairs <c, d>, where c is a G-tree node and d is computed as follows: compute the minimum possible distance from every point in Q to c, and take the maximum (max) or the sum (sum) of the smallest of these minimum distances, as many as the flexible subset of Q contains, as d, which is the τ of the fifth step; the priority is ordered by the value of d.