CN106649846A

CN106649846A - Geographic space interest point retrieval method based on diversity

Info

Publication number: CN106649846A
Application number: CN201611254804.XA
Authority: CN
Inventors: 才智; 李彤; 兰许; 曹阳; 丁治明
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2016-12-30
Filing date: 2016-12-30
Publication date: 2017-05-10
Anticipated expiration: 2036-12-30
Also published as: CN106649846B

Abstract

The invention discloses a diversity-based geospatial point-of-interest (POI) retrieval method for obtaining the top k spatial locations. The method comprises the following steps: (1) an initial ranking is computed for a given location point, or for a given combination of a location point and keywords; (2) the geographic space of the other nodes is weakened according to the geographic location of the selected highest-scoring node; (3) while the termination condition is not met, a new node is selected: the new scores of the remaining nodes in R are calculated after the weakening of text and space, and the highest-scoring node is selected from them. In this way, for the location point or the combination of location point and keywords entered by the user, the algorithm obtains the top k spatial locations and returns the k most comprehensive pieces of information to the user according to the weights of text and spatial location.

Description

Diversity-based geospatial point-of-interest retrieval method

Technical field

The invention belongs to the field of data mining and relates to a diversity-based geospatial point-of-interest retrieval method.

Background art

In recent years, with the spread of the Global Positioning System (GPS) on mobile devices such as smartphones, location-based services (LBS) have attracted wide attention from academia and industry. Many location-based services have been widely adopted and give users a retrieval experience tied to their location.

Existing LBS systems use keyword retrieval to help users find location-related results in a spatial database. Specifically, suppose the spatial database holds a set of points of interest (POIs), each containing location information and some text information. Given the user's location and a set of query keywords, the LBS system returns the POIs that are relevant to the query in both space and text. However, most current LBS systems simply extract the k highest-scoring entries from the database. To make up for this lack of comprehensive consideration of spatial location, the invention proposes an algorithm that weakens both text and space, so that the final results cover every direction as far as possible.

The technique introduces object summaries (OS). An OS is a set of information tuples, based on spatial location and text, generated from a spatial database that contains location information and some text information. An OS can be a tree structure whose root is the data tuple containing the given text information and spatial location, and whose descendant nodes are the neighboring nodes in terms of spatial location and text. Generating an OS requires, first, a relation that holds information about the queried data subject (DS), abbreviated R^DS, which is the root of the tree structure, and second, the relations linked to R^DS, which generate the descendants of R^DS. For each R^DS, a DS schema graph G^DS can be formed. The technique repeatedly prunes and optimizes the generated OS to finally obtain the important information.

A complete OS may contain thousands of tuples. Including all of them not only consumes more time but also makes it extremely difficult for users to pick out the information useful to them, so the k most useful tuples are selected. For an input natural number k, the algorithm (see Step 3.3) obtains k relatively comprehensive pieces of information from the whole OS. To avoid repeating several similar entries and to present the user with information that is as diverse as possible, so that the user can understand the information more fully, the invention introduces spatial diversity and balances the importance of information using the weights of text and space. The method not only greatly reduces time consumption and improves the efficiency of returning information, but also satisfies the user's demand for diversified search results, so that the returned locations are not biased toward a single direction.

Summary of the invention

The object of the invention is to provide a diversity-based geospatial point-of-interest retrieval method that, for a location point or a combination of location point and keywords entered by the user, uses an algorithm to obtain the top k spatial locations and then returns the k most comprehensive pieces of information to the user according to the weights of text and spatial location.

To achieve the above object, the technical solution adopted by the invention is a diversity-based geospatial point-of-interest retrieval method for obtaining the top k spatial locations. The method is implemented in the following steps:

Step 1: Compute an initial ranking for the given location point, or for the given combination of location point and keywords;

Step 1.1: Collect and organize the data set and build the data relations. A directed graph G(V, E) is defined, where V = {v₁, ..., v_n} is the node (vertex) set, each node representing a piece of information; E is the set of edges (arcs), E = {<v_i, v_j> | v_i, v_j ∈ V}, where <v_i, v_j> denotes an edge (arc) from v_i to v_j; v₁, ..., v_n denote arbitrary nodes in the directed graph, and n is a natural number;
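
As an illustration of the data relation built in Step 1.1, a directed graph can be held as a mapping from each node to its successors. The sketch below is only one possible representation; the class name `DiGraph`, its methods, and the sample node labels are assumptions made for this example and are not part of the described method.

```python
from typing import Dict, Hashable, Set, Tuple


class DiGraph:
    """Minimal directed graph G(V, E); a hypothetical helper for illustration."""

    def __init__(self) -> None:
        # Maps each node v_i to the set of nodes v_j reachable by an edge <v_i, v_j>.
        self._succ: Dict[Hashable, Set[Hashable]] = {}

    def add_node(self, v: Hashable) -> None:
        self._succ.setdefault(v, set())

    def add_edge(self, vi: Hashable, vj: Hashable) -> None:
        # Both endpoints are registered in V before the edge is stored.
        self.add_node(vi)
        self.add_node(vj)
        self._succ[vi].add(vj)

    @property
    def nodes(self) -> Set[Hashable]:
        return set(self._succ)

    @property
    def edges(self) -> Set[Tuple[Hashable, Hashable]]:
        return {(vi, vj) for vi, succ in self._succ.items() for vj in succ}


g = DiGraph()
g.add_edge("poi_a", "poi_b")   # edge <poi_a, poi_b>
g.add_node("poi_c")            # isolated node
print(sorted(g.nodes), sorted(g.edges))
```

Storing successors per node keeps edge direction explicit, which matches the ordered pair <v_i, v_j> used in the definition.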

Step 1.2: Calculate the score of each node v_i in R with the following formula:

$$ DF(v_i) = [fs(v_i) \cdot ds(v_i)]^{as} \cdot [ft(v_i) \cdot dt(v_i)]^{at} \cdot [fg(v_i) \cdot dg(v_i)]^{ag} \tag{1} $$

where fs(·), ft(·), and fg(·) are the scores of the social, textual, and geographical parameters respectively; ds(·), dt(·), and dg(·) are the corresponding diversity scores; and as, at, and ag sum to 1 and control the influence of each parameter.
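
Formula (1) is a weighted product of three terms, which can be sketched in a few lines. The function name `df_score` and the argument names are assumptions for illustration (`as` is a reserved word in Python, so the exponents are written `a_s`, `a_t`, `a_g`), and all inputs are assumed to be non-negative.

```python
def df_score(fs: float, ds: float,
             ft: float, dt: float,
             fg: float, dg: float,
             a_s: float, a_t: float, a_g: float) -> float:
    """Score DF(v_i) of one node according to formula (1).

    fs, ft, fg: social, textual, and geographical scores of the node.
    ds, dt, dg: the corresponding diversity scores.
    a_s, a_t, a_g: exponents that must sum to 1.
    """
    if abs(a_s + a_t + a_g - 1.0) > 1e-9:
        raise ValueError("a_s + a_t + a_g must equal 1")
    return (fs * ds) ** a_s * (ft * dt) ** a_t * (fg * dg) ** a_g


# With equal base terms the exponents cancel out: 4**0.5 * 4**0.25 * 4**0.25 == 4
print(df_score(4, 1, 4, 1, 4, 1, 0.5, 0.25, 0.25))
```

Because the exponents sum to 1, the result is a weighted geometric mean of the three products, so a node that scores near zero on any one dimension is pulled down strongly.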

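The diversity score is calculated with the following formula:

$$ ds(v_i) = \sum_{v_j \in R,\; v_i \neq v_j} \frac{ss(v_i, v_j)}{k - 1} \tag{2} $$

where ss(v_i, v_j) is the difference between the social parameters of v_i and v_j, calculated using the Jaccard distance. The values of dt(·) and dg(·) are calculated in the same way.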
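
A minimal sketch of the diversity score, assuming (as formula (2) in claim 1 states) that ds(v_i) is the sum of the pairwise differences ss(v_i, v_j) over the other nodes in R, divided by k − 1. Representing each node's social parameters as a set of tags is an assumption of this example, as are the names `jaccard_distance` and `diversity_score`.

```python
from typing import Dict, Hashable, Set


def jaccard_distance(a: Set, b: Set) -> float:
    """Jaccard distance 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def diversity_score(i: Hashable, features: Dict[Hashable, Set], k: int) -> float:
    """ds(v_i): summed Jaccard distance to every other node in R, divided by k - 1."""
    if k < 2:
        raise ValueError("k must be at least 2")
    total = sum(jaccard_distance(features[i], features[j])
                for j in features if j != i)
    return total / (k - 1)


# Hypothetical tag sets for three nodes
features = {"a": {1, 2}, "b": {2, 3}, "c": {1, 2}}
print(diversity_score("a", features, k=3))
```

The same two functions can be reused for dt(·) and dg(·) by passing textual or geographical feature sets instead of social ones.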

In summary, the score of each node in the data set is calculated iteratively, and the node v₀ with the highest score is selected.

Step 2: Weaken the geographic space of the other nodes according to the geographic location of the selected highest-scoring node;

Step 2.1: Based on the highest-scoring node selected in Step 1, the association relations of the other vertices are weakened, and the geographic space is weakened at the same time. Suppose the distance from the location of the highest-scoring node v₀ to the initial position p is d(p, v₀), the distance from the initial position to another node is d(p, v_i), and the distance from v₀ to that node is d(v₀, v_i). The geographic-space value is then calculated with the following formula:

$$ d_i = \frac{d(v_0, v_i)}{d(p, v_0) + d(p, v_i)} \tag{3} $$

As formula (3) shows, the larger the distance d(v₀, v_i) from v₀ to another node, the larger the resulting geographic-space value, which indicates that node v_i is farther from the selected node and that the two nodes lie in different spatial directions.

In summary, the geographic-space value d_i from the selected node to each remaining node is calculated in turn.
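
The geographic-space value can be sketched as a single function, assuming the form given for formula (3) in claim 1, d_i = d(v₀, v_i) / (d(p, v₀) + d(p, v_i)). The function name `spatial_value` and the zero-denominator guard are assumptions of this example; the sample distances are the ones listed for "Beijing commerce Professional School" in the embodiment.

```python
def spatial_value(d_v0_vi: float, d_p_v0: float, d_p_vi: float) -> float:
    """Geographic-space value d_i of node v_i relative to the selected node v0.

    d_v0_vi: distance from the selected node v0 to v_i.
    d_p_v0:  distance from the initial position p to v0.
    d_p_vi:  distance from the initial position p to v_i.
    """
    denom = d_p_v0 + d_p_vi
    if denom == 0:
        # Assumption: if both nodes coincide with p, no spatial diversity is gained.
        return 0.0
    return d_v0_vi / denom


# Distances in km: v0 -> v_i = 2.24, p -> v0 = 3.69, p -> v_i = 3.08
print(round(spatial_value(2.24, 3.69, 3.08), 3))  # 0.331
```

For straight-line distances the triangle inequality keeps the value between 0 and 1, and it reaches 1 only when v₀ and v_i lie on opposite sides of p, which is why a larger value indicates a different direction.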

Step 3: While the termination condition is not met, select a new node;

Step 3.1: Suppose the result of weakening the association relation is a and the weight of text is α; the text value of a remaining node after weakening is then a × α;

Step 3.2: Suppose the weight of space is β, where α + β = 1; the spatial value of a remaining node after weakening is then d × β;

Step 3.3: The score of a remaining node after the weakening of text and space is calculated with the following formula:

$$ DF'(v_i) = DF(v_i) \times (a \times \alpha + d \times \beta) \tag{4} $$

In summary, the new scores of the remaining nodes in R after the weakening of text and space are calculated, and the highest-scoring node is then selected from them. The process of selecting k results is therefore:

1.) Initialize the queue H_k to empty; input a location point, or a combination of location point and keywords;

2.) Build the data relations according to the input information;

3.) Calculate the score of each node;

4.) Take the highest-scoring node and add it to H_k; set l = 1;

5.) If l < k, go to 6.); otherwise go to 9.);

6.) Weaken the association relations according to the selected node and calculate the value d_i;

7.) Calculate the new scores according to the weakening of text and space and their weights;

8.) Take the highest-scoring node and add it to H_k; l++; go to 5.);

9.) Return the queue H_k;

The returned H_k contains the k pieces of information to be retrieved.
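
The selection loop in items 1.) to 9.) can be sketched as a greedy procedure. This is an illustrative sketch under several assumptions, not the patented implementation: distances are straight-line Euclidean distances, the association weakening a is supplied by a caller-provided function `association` (the text does not say how it is computed), and the weakened score of one round is used as the base score of the next round. The name `select_top_k` and the sample data are hypothetical.

```python
import math
from typing import Callable, Dict, Hashable, List, Tuple

Point = Tuple[float, float]


def select_top_k(p: Point,
                 positions: Dict[Hashable, Point],
                 scores: Dict[Hashable, float],
                 association: Callable[[Hashable, Hashable], float],
                 k: int,
                 alpha: float = 0.5) -> List[Hashable]:
    """Greedy top-k selection with text and space weakening (formulas (3) and (4))."""
    beta = 1.0 - alpha                 # alpha + beta = 1
    remaining = dict(scores)           # item 3.): initial scores DF(v_i)
    h_k: List[Hashable] = []           # item 1.): empty result queue H_k
    if not remaining or k <= 0:
        return h_k

    best = max(remaining, key=remaining.get)      # item 4.)
    h_k.append(best)
    del remaining[best]

    while len(h_k) < k and remaining:             # item 5.)
        v0 = h_k[-1]
        for v in remaining:
            a = association(v0, v)                # item 6.): association weakening
            denom = math.dist(p, positions[v0]) + math.dist(p, positions[v])
            d = math.dist(positions[v0], positions[v]) / denom if denom else 0.0
            remaining[v] *= a * alpha + d * beta  # item 7.): formula (4)
        best = max(remaining, key=remaining.get)  # item 8.)
        h_k.append(best)
        del remaining[best]
    return h_k                                    # item 9.)


# Hypothetical data: B is close to A, C lies on the opposite side of p.
pois = {"A": (1.0, 0.0), "B": (1.1, 0.0), "C": (-1.0, 0.0)}
initial = {"A": 10.0, "B": 9.0, "C": 8.0}
print(select_top_k((0.0, 0.0), pois, initial, lambda u, v: 0.5, k=2))  # ['A', 'C']
```

In the sample run B has the higher initial score, but it sits next to the already selected A, so its spatial value is small and C, on the opposite side of p, is selected instead.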

Experimental results show that the method achieves a significant effect.

Description of the drawings

Fig. 1 is a flowchart of the implementation of the method of the invention.

Fig. 2 is a schematic diagram of the spatial locations of the retrieved results.

Detailed description of the embodiments

The method of the invention is explained below with reference to Figs. 1 and 2:

The initial value of each node in the data set is calculated according to formula (1).

Suppose the given location point is "Tian'anmen Square", the keyword is "university", and k = 5. The initial scores calculated according to the formula are shown in Table 1:

Table 1: Initial scores of the 13 nodes

Node	Score
Central Drama Institute	9.5
Central Conservatory of Music	9
Beijing commerce Professional School	8.7
Beijing Normal University north school district	8.1
The Chinese College of Buddhism	7.5
China Concord Medical Science University's nursing college	7.3
China Islamism Scripture Institute	6
Xuan Wu branch of Beijing Institute of Education	5.8
Beijing Jiaotong University	5.3
Beijing University of Technology	5
The Central University Of Finance and Economics	4.6
Chinese department of traditional Chinese medicine institute	3
China University of Political Science ＆ Law	2

Step 2.1: Based on the highest-scoring node selected in Step 1, weaken the association relations of the other vertices;

The highest-scoring node, "Central Drama Institute", is chosen, and the other nodes are weakened according to their association relations with "Central Drama Institute". The results are shown in Table 2.

Step 2.2: Calculate the spatial value of each node;

The spatial value of each node can be calculated with formula (3) from the distance from "Tian'anmen Square" to each node (Table 3) and the distance from "Central Drama Institute" to each remaining node (Table 4).

Table 2: Association-relation weakening results for "Central Drama Institute" and the other nodes

Node	Association weakening
Central Conservatory of Music	0.255
Beijing commerce Professional School	0.538
Beijing Normal University north school district	0.435
The Chinese College of Buddhism	0.856
China Concord Medical Science University's nursing college	0.801
China Islamism Scripture Institute	0.756
Xuan Wu branch of Beijing Institute of Education	0.522
Beijing Jiaotong University	0.373
Beijing University of Technology	0.689
The Central University Of Finance and Economics	0.617
Chinese department of traditional Chinese medicine institute	0.493
China University of Political Science ＆ Law	0.345

Table 3: Distance from "Tian'anmen Square" to each node

Node	Distance (km)
Central Drama Institute	3.69
Central Conservatory of Music	3.27
Beijing commerce Professional School	3.08
Beijing Normal University north school district	3.78
The Chinese College of Buddhism	3.22
China Concord Medical Science University's nursing college	2.08
China Islamism Scripture Institute	3.30
Xuan Wu branch of Beijing Institute of Education	3.23
Beijing Jiaotong University	7.05
Beijing University of Technology	7.87
The Central University Of Finance and Economics	7.84
Chinese department of traditional Chinese medicine institute	4.65
China University of Political Science ＆ Law	7.78

Table 4: Distance from "Central Drama Institute" to the remaining nodes

Node	Distance (km)
Central Conservatory of Music	5.40
Beijing commerce Professional School	2.24
Beijing Normal University north school district	1.18
The Chinese College of Buddhism	5.72
China Concord Medical Science University's nursing college	3.09
China Islamism Scripture Institute	6.58
Xuan Wu branch of Beijing Institute of Education	6.90
Beijing Jiaotong University	5.53
Beijing University of Technology	9.66
The Central University Of Finance and Economics	1.97
Chinese department of traditional Chinese medicine institute	5.80
China University of Political Science ＆ Law	5.39

Step 3: While the termination condition is not met, select a new node

Suppose the weights of text and space are α = β = 0.5. The new scores are then obtained with formula (4) from the quantities given by formulas (1), (2), and (3). For example, DF′(Central Conservatory of Music) = 9 × (0.5 × 0.255 + 0.5 × 0.729) = 4.428 and DF′(Beijing commerce Professional School) = 8.7 × (0.5 × 0.538 + 0.5 × 0.331) = 3.780. The results are shown in Table 5:

Table 5: New scores after selecting the "Central Drama Institute" node

Node	Score
Central Conservatory of Music	4.428
Beijing commerce Professional School	3.780
Beijing Normal University north school district	2.402
The Chinese College of Buddhism	6.315
China Concord Medical Science University's nursing college	5.034
China Islamism Scripture Institute	5.091
Xuan Wu branch of Beijing Institute of Education	4.405
Beijing Jiaotong University	2.353
Beijing University of Technology	3.813
The Central University Of Finance and Economics	1.812
Chinese department of traditional Chinese medicine institute	1.782
China University of Political Science ＆ Law	0.185
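
As a cross-check of how the quantities in Tables 1 to 4 combine, the sketch below recomputes a few rows of Table 5. It assumes the spatial value is d(v₀, v_i) / (d(p, v₀) + d(p, v_i)) as in claim 1 and that the new score is DF × (a × α + d × β) with α = β = 0.5. Only four rows are included for brevity, and the names `rows`, `table5`, and `new_score` are hypothetical.

```python
ALPHA = BETA = 0.5
D_P_V0 = 3.69  # "Tian'anmen Square" -> "Central Drama Institute", km (Table 3)

# node: (initial score, association weakening, d(p, v_i), d(v0, v_i))
# taken from Tables 1, 2, 3, and 4 respectively
rows = {
    "Beijing commerce Professional School": (8.7, 0.538, 3.08, 2.24),
    "Beijing Normal University north school district": (8.1, 0.435, 3.78, 1.18),
    "Xuan Wu branch of Beijing Institute of Education": (5.8, 0.522, 3.23, 6.90),
    "Beijing Jiaotong University": (5.3, 0.373, 7.05, 5.53),
}

# Scores listed for the same nodes in Table 5
table5 = {
    "Beijing commerce Professional School": 3.780,
    "Beijing Normal University north school district": 2.402,
    "Xuan Wu branch of Beijing Institute of Education": 4.405,
    "Beijing Jiaotong University": 2.353,
}


def new_score(df: float, a: float, d_p_vi: float, d_v0_vi: float) -> float:
    """Weakened score: spatial value first, then the weighted combination."""
    d = d_v0_vi / (D_P_V0 + d_p_vi)
    return df * (a * ALPHA + d * BETA)


scores = {name: new_score(*values) for name, values in rows.items()}
for name, value in scores.items():
    print(f"{name}: computed {value:.3f}, listed {table5[name]:.3f}")
```

For these rows the recomputed values agree with the listed scores to within rounding of the intermediate spatial value.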

According to Table 5, the highest-scoring node is "The Chinese College of Buddhism". Two nodes, "Central Drama Institute" and "The Chinese College of Buddhism", have now been obtained. Because 2 < k = 5, the algorithm continues to select further nodes.

After "The Chinese College of Buddhism" is selected, the new scores of the remaining nodes are as shown in Table 6:

Table 6: New scores after selecting the "The Chinese College of Buddhism" node

Node	Score
Central Conservatory of Music	1.242
Beijing commerce Professional School	2.767
Beijing Normal University north school district	1.546
China Concord Medical Science University's nursing college	4.367
China Islamism Scripture Institute	1.392
Xuan Wu branch of Beijing Institute of Education	1.821
Beijing Jiaotong University	1.320
Beijing University of Technology	2.926
The Central University Of Finance and Economics	1.242
Chinese department of traditional Chinese medicine institute	1.295
China University of Political Science ＆ Law	0.477

According to Table 6, the highest-scoring node is "China Concord Medical Science University's nursing college". The new scores of the remaining nodes are as shown in Table 7:

Table 7: New scores after selecting the "China Concord Medical Science University's nursing college" node

Node	Score
Central Conservatory of Music	0.738
Beijing commerce Professional School	0.876
Beijing Normal University north school district	0.843
China Islamism Scripture Institute	1.027
Xuan Wu branch of Beijing Institute of Education	1.216
Beijing Jiaotong University	0.725
Beijing University of Technology	1.719
The Central University Of Finance and Economics	0.806
Chinese department of traditional Chinese medicine institute	0.520
China University of Political Science ＆ Law	0.256

According to Table 7, the highest-scoring node is "Beijing University of Technology". The new scores of the remaining nodes are as shown in Table 8:

Table 8: New scores after selecting the "Beijing University of Technology" node

Node	Score
Central Conservatory of Music	0.435
Beijing commerce Professional School	0.493
Beijing Normal University north school district	0.523
China Islamism Scripture Institute	0.613
Xuan Wu branch of Beijing Institute of Education	0.580
Beijing Jiaotong University	0.394
The Central University Of Finance and Economics	0.645
Chinese department of traditional Chinese medicine institute	0.261
China University of Political Science ＆ Law	0.136

According to Table 8, the highest-scoring node is "The Central University Of Finance and Economics". Now l = 5 = k, so 5 pieces of information have been obtained: "Central Drama Institute", "The Chinese College of Buddhism", "China Concord Medical Science University's nursing college", "Beijing University of Technology", and "The Central University Of Finance and Economics". Their spatial locations are shown in Fig. 2. As Fig. 2 shows, the 5 retrieved results cover all directions around "Tian'anmen Square" rather than being confined to a single direction.

Claims

1. A diversity-based geospatial point-of-interest retrieval method, characterized in that:

the method obtains the top k spatial locations and is implemented in the following steps:

Step 1.1: Collect and organize the data set and build the data relations; a directed graph G(V, E) is defined, where V = {v₁, ..., v_n} is the node set, each node representing a piece of information; E is the set of edges, E = {<v_i, v_j> | v_i, v_j ∈ V}, where <v_i, v_j> denotes an edge from v_i to v_j; v₁, ..., v_n denote arbitrary nodes in the directed graph, and n is a natural number;

Step 1.2: Calculate the score of each node v_i in R with the following formula:

$$ DF(v_i) = [fs(v_i) \cdot ds(v_i)]^{as} \cdot [ft(v_i) \cdot dt(v_i)]^{at} \cdot [fg(v_i) \cdot dg(v_i)]^{ag} \tag{1} $$

where fs(·), ft(·), and fg(·) are the scores of the social, textual, and geographical parameters respectively; ds(·), dt(·), and dg(·) are the corresponding diversity scores; and as, at, and ag sum to 1 and control the influence of each parameter;

The diversity score is calculated with the following formula:

$$ ds(v_i) = \sum_{v_j \in R,\; v_i \neq v_j} \frac{ss(v_i, v_j)}{k - 1} \tag{2} $$

where ss(v_i, v_j) is the difference between the social parameters of v_i and v_j, calculated using the Jaccard distance; the values of dt(·) and dg(·) are calculated in the same way;

In summary, the score of each node in the data set is calculated iteratively, and the node v₀ with the highest score is selected;

Step 2: Weaken the geographic space of the other nodes according to the geographic location of the selected highest-scoring node;

Step 2.1: Based on the highest-scoring node selected in Step 1, the association relations of the other vertices are weakened, and the geographic space is weakened at the same time; suppose the distance from the location of the highest-scoring node v₀ to the initial position p is d(p, v₀), the distance from the initial position to another node is d(p, v_i), and the distance from v₀ to that node is d(v₀, v_i); the geographic-space value is then calculated with the following formula:

$$ d_i = \frac{d(v_0, v_i)}{d(p, v_0) + d(p, v_i)} \tag{3} $$

As formula (3) shows, the larger the distance d(v₀, v_i) from v₀ to another node, the larger the resulting geographic-space value, which indicates that node v_i is farther from the selected node and that the two nodes lie in different spatial directions;

In summary, the geographic-space value d_i from the selected node to each remaining node is calculated in turn;

Step 3.1: Suppose the result of weakening the association relation is a and the weight of text is α; the text value of a remaining node after weakening is then a × α;

Step 3.2: Suppose the weight of space is β, where α + β = 1; the spatial value of a remaining node after weakening is then d × β;

Step 3.3: The score of a remaining node after the weakening of text and space is calculated with the following formula:

$$ DF'(v_i) = DF(v_i) \times (a \times \alpha + d \times \beta) \tag{4} $$

In summary, the new scores of the remaining nodes in R after the weakening of text and space are calculated, and the highest-scoring node is then selected from them.

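2. The diversity-based geospatial point-of-interest retrieval method according to claim 1, characterized in that the process of selecting k results is:

1.) Initialize the queue H_k to empty; input a location point, or a combination of location point and keywords;

2.) Build the data relations according to the input information;

3.) Calculate the score of each node;

4.) Take the highest-scoring node and add it to H_k; set l = 1;

5.) If l < k, go to 6.); otherwise go to 9.);

6.) Weaken the association relations according to the selected node and calculate the value d_i;

7.) Calculate the new scores according to the weakening of text and space and their weights;

8.) Take the highest-scoring node and add it to H_k; l++; go to 5.);

9.) Return the queue H_k.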