CN103077201B - Method for estimating an unknown location based on active iterative Internet probing - Google Patents

Method for estimating an unknown location based on active iterative Internet probing

Info

Publication number
CN103077201B
Authority
CN
China
Prior art keywords
location
search
internet
location expression
expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201210579579.2A
Other languages
Chinese (zh)
Other versions
CN103077201A (en)
Inventor
呙维
黄亮
朱欣焰
陈旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN201210579579.2A priority Critical patent/CN103077201B/en
Publication of CN103077201A publication Critical patent/CN103077201A/en
Application granted granted Critical
Publication of CN103077201B publication Critical patent/CN103077201B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a method for estimating an unknown location based on active iterative Internet probing. The method comprises the following steps: 1) check the location entered by the user and, if the database query fails, use a web search engine to obtain the set of web pages related to the location; 2) extract the location descriptions from the web pages and classify them; 3) calculate the credibility rate C_s of the search results and, if C_s meets the threshold C_min, skip to step 5; 4) repeat steps 1 to 3 for the fuzzy locations in the search results until the credibility rate meets the threshold or the search-count limit is reached; 5) calculate the geographic range of each location description and merge the ranges to obtain the approximate geographic range of the target location. The invention makes full use of the abundant and dynamically changing geographic knowledge resources on the Internet to estimate the approximate range of an unknown location. To handle the many forms that textual location descriptions take on the Internet, it adopts a semantics-based multi-scale location extraction method and uses the Point-Radius algorithm to estimate the approximate geographic range of a location.

Description

Method for estimating an unknown location based on active iterative Internet probing
Technical field
The present invention relates to a method for estimating an unknown location, and in particular to a method for estimating an unknown location based on active iterative Internet probing.
Background technology
With the development and refinement of positioning technologies such as GPS, applications of location-based services (LBS) keep expanding, for example electronic map platforms (Baidu Maps, Google Maps, Bing Maps, etc.), tourism information systems, point-of-interest query systems for daily life, traffic query systems, and social networks. These location service platforms and systems offer two main ways to query location information. One obtains fairly accurate coordinates through GPS positioning, map operations, and the like. The other queries by natural language location description; such qualitative or semi-quantitative descriptions carry several kinds of uncertainty, but they better match human communication habits and cognition. To support natural language location queries, a location database must store the mapping between location names and geographic ranges. Because of high construction cost, long construction time, limited scale, and the difficulty of updating, existing location databases cannot store every location name and concentrate instead on collecting important locations such as major place names, addresses, and salient POIs. As a result, the large number of everyday locations that have low salience and relatively low importance cannot be queried, which conflicts with the demand for comprehensive, multi-level, multi-granularity location services. (References: Gu Jing, Research and development of location-based information service application systems [D]. Xidian University, 2004; Xia Baoguo, Design and implementation of a GIS-based tourism information system for Wuhan [D]. Huazhong University of Science and Technology, 2006; Gao Weisi, Design of location-based services and an urban traffic navigation system [D]. Yunnan University, 2011; Yang Yuyao et al., A mobile Internet social model based on geographic location information [J]. Journal of Computer Research and Development, 2011.)
As a large-scale knowledge base, the Internet provides abundant geographic knowledge and can serve as an extended data source for location services. Obtaining reference location information through web search requires natural language understanding to extract location descriptions from large volumes of text. Natural language understanding covers the theories and methods that enable effective communication between people and computers in natural language; for location descriptions it mainly means recognizing location names and location relations. For location name recognition, existing research focuses on extracting geographic named entities or place names, using two main approaches. The first is rule-based: a corpus and formation rules for geographic named entities or place names are built, and recognition is done by rule matching. This approach places strict requirements on the formation rules; it can raise precision but lowers recall considerably, and it struggles with fuzzy locations and newly appearing locations. The second is statistics-based: because it ignores syntactic and semantic information, it inevitably introduces noise and has problems with low-frequency terms and adjacent high-frequency words. For spatial relation recognition, existing research focuses on extracting basic spatial relations (topological, metric, and directional relations), again using two main approaches. The first is based on sentence analysis; it requires a thorough understanding of syntactic structure and sentence semantics, and it is brittle and prone to ambiguity. The second is pattern-based; it avoids exhaustive sentence analysis, but because natural language is rich and the same information can be expressed in many ways, the number of patterns grows rapidly. (References: Yue Xiaoqiu et al., Natural language spatial concept extraction based on spatial semantic roles [J]. Journal of Wuhan University (Information Science Edition), 2005; Jiang Lin et al., Acquisition and verification of geographic entity concepts and their location relations [J]. Computer Science, 2007; Li Lishuan et al., Place name recognition in Chinese text based on support vector machines [J]. Journal of Dalian University of Technology, 2007; Li Hanjing, Research on spatial concept modeling based on natural language processing [D]. Harbin Institute of Technology, 2007; Li Yusen, Research on GIS-oriented geographic named entity recognition [J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2008; Ma Long, Research on Chinese place name recognition based on conditional random fields [D]. Dalian University of Technology, 2009; Tang Xu et al., Discourse-based Chinese place name recognition [J]. Journal of Chinese Information Processing, 2010; Jiang Wenming, Research on directional spatial relation extraction from Chinese text [D]. Nanjing Normal University, 2010; Shen Qijun, Research on spatial relation annotation for Chinese text [D]. Nanjing Normal University, 2010; Zhang Xueying et al., A rule-based method for parsing Chinese address elements [J]. Journal of Geo-information Science, 2010; Li Haiguang, Research on Chinese named entity relation extraction based on position and semantic features [D]. Hefei University of Technology, 2011; Du Ping et al., Chinese place name recognition and disambiguation: the case of China's administrative divisions at county level and above [J]. Remote Sensing Technology and Application, 2011.)
Location databases are limited in scale and hard to update, so database-backed geographic queries (especially queries for fuzzy locations) often fail to recognize a location name or lack its coverage, and cannot meet user needs. The Internet contains abundant geographic knowledge and can supply a large amount of descriptive information about locations of interest for estimating the coverage of an "unknown" location. How to search the Internet for location-related information and derive from it the approximate geographic range of the "unknown" location is the main work of the present invention.
Summary of the invention
The present invention mainly solves the technical problems of the prior art by providing a method that makes full use of the abundant and dynamically changing geographic knowledge resources on the Internet to estimate the approximate range of a target location.
The above technical problem is mainly solved by the following technical solution:
A method for estimating an unknown location based on active iterative Internet probing, characterized in that it comprises the following steps:
Step 1: check the location query term entered by the user. If no geographic coverage for the location can be obtained from the spatial database, actively start iterative Internet probing, that is, use a web search engine to crawl information about the target location from the Internet, with the target location as the topic;
Step 2: perform the initial probe with the location query term as the topic, using the web search engine to obtain from the Internet the set of web pages that contain descriptions of the target location;
Step 3: perform geographic location parsing on the web documents obtained in step 2, that is, extract natural language location descriptions from the web documents; each natural language location description comprises a reference location and a spatial relation;
Step 4: classify the natural language location descriptions obtained in step 3. If the reference location of a description can obtain geographic coverage from the location database, store the description in the precise description set P; otherwise store it in the fuzzy description set A;
Step 5: evaluate the credibility rate C_s of the current search. If C_s is less than the search credibility threshold C_min, carry out a new round of Internet text search with the reference locations in the fuzzy description set A as topics; if C_s is greater than or equal to C_min, skip to step 7;
Step 6: repeat steps 1 to 5 until the credibility rate of each round of search results meets the threshold or the search-count limit is reached;
Step 7: calculate the approximate geographic range of every location description and its confidence;
Step 8: integrate and refine the geographic coverage of the multiple location descriptions to obtain the geographic range of the target location.
In step 3 of the above method, recognizing natural language location descriptions mainly comprises recognizing location names and recognizing spatial relations. A semantics-based multi-scale extraction method is used to extract the natural language location descriptions, with the following sub-steps:
Step 3.1: build a corpus of location descriptions that stores the feature vocabulary expressing location names and spatial relations, together with the syntactic patterns of location descriptions. The corpus is built by a combination of manual summarization and machine learning;
Step 3.2: with the support of the corpus, apply pattern matching to the web text to obtain location descriptions;
Step 3.3: eliminate place name ambiguity based on geographic and non-geographic semantics.
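The pattern-matching idea of steps 3.1 and 3.2 can be illustrated with a minimal sketch. It is not the patented extractor: the two regular expressions stand in for a corpus of syntactic patterns, they target English text purely for readability (the patent works on Chinese web text), and the names `PATTERNS` and `extract_descriptions` are hypothetical. The disambiguation of step 3.3 is not covered.

```python
import re

# Hypothetical stand-in for the corpus of syntactic patterns (step 3.1).
# Each pattern captures a reference location "ro" plus spatial relation parts.
PATTERNS = [
    # e.g. "300 m east of Wuhan University"
    re.compile(
        r"(?P<distance>\d+(?:\.\d+)?)\s*(?P<unit>km|m)\s+"
        r"(?P<direction>north|south|east|west)\s+of\s+"
        r"(?P<ro>[A-Z][\w ]*?)(?=[.,;]|$)"
    ),
    # e.g. "near East Lake"
    re.compile(
        r"(?P<relation>near|opposite|next to|inside)\s+"
        r"(?P<ro>[A-Z][\w ]*?)(?=[.,;]|$)"
    ),
]


def extract_descriptions(text):
    """Apply every pattern to the text and collect the captured fields (step 3.2)."""
    found = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append({k: v for k, v in match.groupdict().items() if v})
    return found


sample = "The cafe is 300 m east of Wuhan University, near East Lake."
descriptions = extract_descriptions(sample)
for d in descriptions:
    print(d)
```

A real corpus would hold far more patterns, which is the growth problem the background section attributes to pattern-based methods.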
In step 4 of the above method, a target location can be estimated from a reference location and a spatial relation only if the reference location has an accurate geographic range in the location database (the precondition). A single location description is expressed according to Formula 1, where RO is the name of the reference location, SR is the spatial relation, T is the time at which the location description was made, C is the confidence of the location description, and S is the search reference of the reference object RO. The first K location descriptions Loc_i are taken from the extraction results and classified according to the precondition: when Loc_i.RO satisfies the precondition, Loc_i is stored in the precise description set P; otherwise it is stored in the fuzzy description set A.
$$\mathrm{Loc} = \{RO, SR, T, C, S\} \qquad \text{(Formula 1)}$$
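As a sketch of how the five-field record and the classification into P and A might look in code, assuming the location database can be stood in for by a simple dictionary: the class name `LocationDescription`, the function `classify`, the `gazetteer` dictionary, and the sample coordinates are all illustrative and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class LocationDescription:
    """One location description with the five fields RO, SR, T, C, S."""
    RO: str                   # name of the reference location
    SR: str                   # spatial relation to the reference location
    T: Optional[str] = None   # time at which the description was made
    C: float = 1.0            # confidence of the description
    S: Any = None             # search reference for RO, filled in during iteration


def classify(descriptions, gazetteer, k):
    """Split the first k descriptions into the precise set P and the fuzzy set A."""
    precise, fuzzy = [], []
    for loc in descriptions[:k]:
        # Precondition: the reference location must be resolvable in the database.
        (precise if loc.RO in gazetteer else fuzzy).append(loc)
    return precise, fuzzy


# Hypothetical location database: name -> approximate (longitude, latitude).
gazetteer = {"Wuhan University": (114.36, 30.54)}
descs = [
    LocationDescription(RO="Wuhan University", SR="500 m east of"),
    LocationDescription(RO="the old bookshop", SR="next to"),
]
P, A = classify(descs, gazetteer, k=10)
print([loc.RO for loc in P], [loc.RO for loc in A])
```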
In step 5 of the above method, the credibility rate C_s of the current search is evaluated as follows. The search credibility rate C_s is defined as the evaluation index: it is the ratio of the sum of the confidences of all location descriptions in P to the total number of location descriptions, as shown in Formula 2, where m is the number of location descriptions in P, K is the total number of location descriptions, and Loc_i.C is the confidence of the i-th location description.
$$C_s = \frac{\sum_{i=0}^{m-1} \mathrm{Loc}_i.C}{K} \qquad \text{(Formula 2)}$$
The confidence of a location description is calculated according to Formula 3, where ε is the decay parameter and n is the search count. The confidence is set to 1 for the first search and decays as the number of searches increases.
$$\mathrm{Loc}_i.C = 1 \cdot \varepsilon^{n} \qquad \text{(Formula 3)}$$
When C_s meets the minimum credibility threshold C_min, the precise description set P is output directly for target location estimation. When C_s does not meet the condition, repeated iterative Internet search is used to secure the search credibility rate: a new round of Internet search is carried out for the fuzzy reference locations in A, so that the geographic range of each reference location is first estimated from Internet resources and the reference location is then used to estimate the target location.
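The two formulas of step 5 reduce to a few lines of arithmetic. The sketch below assumes that n = 0 denotes the first search (so that the confidence is 1 there, as the text requires) and uses illustrative values ε = 0.8 and C_min = 0.5; the patent does not fix either value, and the function names are hypothetical.

```python
def description_confidence(epsilon, n):
    """Confidence 1 * epsilon**n; n = 0 is assumed to denote the first search."""
    return 1.0 * epsilon ** n


def search_credibility(precise_confidences, total_descriptions):
    """Sum of the confidences in P divided by the total number of descriptions K."""
    if total_descriptions == 0:
        return 0.0  # assumption: an empty result counts as not credible
    return sum(precise_confidences) / total_descriptions


# Illustrative values, not taken from the patent.
epsilon, c_min = 0.8, 0.5

# Second search round (n = 1): 3 of K = 5 descriptions fall into P.
conf = description_confidence(epsilon, n=1)
c_s = search_credibility([conf] * 3, total_descriptions=5)
needs_new_round = c_s < c_min
print(round(c_s, 2), needs_new_round)
```

Because every description found in round n shares the confidence ε^n, later rounds need a larger share of precise descriptions to clear the same threshold.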
Step 6 of the above method is the iterative search for fuzzy reference locations. Following the processing of steps 4 and 5, each search result is expressed by Formula 4, where n is the search round, m is the index of the location searched in that round, P is the precise description set, A is the fuzzy description set, and C_s is the search credibility rate.
$$WS[n][m] = \{P, A, C_s\} \qquad \text{(Formula 4)}$$
The iterative search procedure comprises the following sub-steps:
Step 6.1: store the fuzzy location descriptions WS[0][0].A of the target location's search result in the search set Q, and set n=0, m=0;
Step 6.2: take a fuzzy description set WS[n][m].A from Q and check whether n+1 reaches the search-count limit; if it does, exit the search;
Step 6.3: take each location description Loc_i in WS[n][m].A in turn and carry out the (n+1)-th search to obtain the search result WS[n+1][i]; associate this result with the location description as the search reference of its reference object RO, i.e. Loc_i.S=WS[n+1][i];
Step 6.4: remove the searched fuzzy description set WS[n][m].A from Q and check whether WS[n+1][i].C_s meets the threshold C_min; if it does not, put WS[n+1][i].A into the search set Q;
Step 6.5: check whether any fuzzy description set remains in Q; if so, repeat steps 6.2 to 6.4 to continue the iterative search.
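Steps 6.1 to 6.5 amount to a breadth-first traversal driven by the search set Q. The following is a sketch under several assumptions rather than the patented implementation: `search_fn` is a hypothetical callable that returns the precise and fuzzy reference names for a query, results are keyed by `(n, m)` with m assigned as a running index per round (the source indexes WS[n+1][i] by position in A, which could collide when several fuzzy sets are expanded in the same round), and ε = 0.8 is an illustrative default.

```python
from collections import deque


def iterative_search(target, search_fn, c_min, max_rounds, epsilon=0.8):
    """Expand fuzzy reference locations round by round (steps 6.1 to 6.5)."""
    results = {}    # (n, m) -> {"query", "P", "A", "Cs", "S"}
    per_round = {}  # n -> number of results stored so far in round n

    def run_search(query, n):
        precise, fuzzy = search_fn(query)
        total = len(precise) + len(fuzzy)
        # All descriptions of round n share the confidence epsilon**n.
        c_s = (epsilon ** n) * len(precise) / total if total else 0.0
        m = per_round.get(n, 0)
        per_round[n] = m + 1
        results[(n, m)] = {"query": query, "P": precise, "A": fuzzy,
                           "Cs": c_s, "S": {}}
        return (n, m)

    first = run_search(target, 0)
    queue = deque()
    if results[first]["Cs"] < c_min:
        queue.append(first)                    # step 6.1
    while queue:                               # step 6.5
        n, m = queue.popleft()                 # steps 6.2 and 6.4
        if n + 1 >= max_rounds:
            break                              # search-count limit reached
        for name in results[(n, m)]["A"]:      # step 6.3
            key = run_search(name, n + 1)
            results[(n, m)]["S"][name] = key   # search reference of this RO
            if results[key]["Cs"] < c_min and results[key]["A"]:
                queue.append(key)
    return results


# Hypothetical "web": query -> (precise reference names, fuzzy reference names).
fake_web = {
    "cafe X": (["Wuhan University"], ["old gate", "book lane"]),
    "old gate": (["East Lake"], []),
    "book lane": ([], ["alley 7"]),
    "alley 7": (["Luojia Hill"], []),
}
found = iterative_search("cafe X", lambda q: fake_web.get(q, ([], [])),
                         c_min=0.5, max_rounds=3)
print(sorted(found))
```

Here `max_rounds` counts the total number of search rounds, including the initial one, which is one possible reading of the search-count limit.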
In step 7 of the above method, because the fuzzy location descriptions of the k-th search result depend on the (k+1)-th search result, the calculation proceeds in reverse order, that is, the geographic range calculation starts from the last search. It comprises the following sub-steps:
Step 7.1: let n be the number of searches in the search results WS and m the number of locations searched in the n-th search, m=WS[n-1].size; define the geographic range set FC to store the geographic range of each search result;
Step 7.2: take the search result WS[n-1][m-1] for the m-th location of the n-th search;
Step 7.3: take each location Loc_y in WS[n-1][m-1].P in turn, query the coordinates of its reference location in the location database, and use the Point-Radius algorithm to calculate its geographic coverage FP(y) and confidence CP(y);
Step 7.4: take each location Loc_x in WS[n-1][m-1].A in turn and use Loc_x.S to look up the coordinates of its reference location in the geographic range set FC; if the coordinates are obtained, use the Point-Radius algorithm to calculate its geographic coverage FA(x) and confidence CA(x);
Step 7.5: merge the geographic ranges of all locations in P and A to obtain the geographic range FC(WS[n-1][m-1]) of the current search result;
Step 7.6: check whether m-1 is greater than 0. If so, move on to the next search result: set m=m-1 and return to step 7.2; otherwise go to the next step;
Step 7.7: check whether n-1 is greater than 0. If so, move on to the previous search: set n=n-1, m=WS[n-1].size, and return to step 7.2; otherwise go to the next step;
Step 7.8: output FC(WS[0][0]).
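The reverse-order traversal of step 7 can be sketched as follows. Several simplifications are assumptions made only for illustration: coverages are circles `(x, y, radius)` in arbitrary planar units, the Point-Radius step is skipped by using each reference location's circle directly, confidences are ignored, and the merge rule (an enclosing circle around the mean centre) is one simple choice, since the source does not specify how ranges are merged. The names `merge_circles` and `backward_ranges` are hypothetical.

```python
import math


def merge_circles(circles):
    """Enclosing circle around the mean centre (an assumed merge rule)."""
    cx = sum(c[0] for c in circles) / len(circles)
    cy = sum(c[1] for c in circles) / len(circles)
    radius = max(math.hypot(c[0] - cx, c[1] - cy) + c[2] for c in circles)
    return (cx, cy, radius)


def backward_ranges(results, gazetteer):
    """Compute FC for every search result, starting from the last round."""
    fc = {}
    for key in sorted(results, reverse=True):      # last round, last index first
        ws = results[key]
        # Precise descriptions: coverage comes from the location database.
        covers = [gazetteer[name] for name in ws["P"] if name in gazetteer]
        # Fuzzy descriptions: coverage comes from a later search result, if any.
        for name in ws["A"]:
            ref = ws["S"].get(name)
            if ref in fc:
                covers.append(fc[ref])
        if covers:
            fc[key] = merge_circles(covers)
    return fc.get((0, 0))                          # FC(WS[0][0])


# Hypothetical search results keyed by (round, index) and a toy database.
results = {
    (0, 0): {"P": ["Wuhan University"], "A": ["old gate", "book lane"],
             "S": {"old gate": (1, 0), "book lane": (1, 1)}},
    (1, 0): {"P": ["East Lake"], "A": [], "S": {}},
    (1, 1): {"P": [], "A": ["alley 7"], "S": {"alley 7": (2, 0)}},
    (2, 0): {"P": ["Luojia Hill"], "A": [], "S": {}},
}
gazetteer = {"Wuhan University": (0.0, 0.0, 1.0),
             "East Lake": (4.0, 0.0, 1.0),
             "Luojia Hill": (0.0, 2.0, 1.0)}
final = backward_ranges(results, gazetteer)
print(final)
```

Processing keys in descending order guarantees that the range of every referenced later result is already in FC when an earlier round needs it.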
The present invention therefore has the following advantage: it makes full use of the abundant and dynamically changing geographic knowledge resources on the Internet to estimate the approximate range of a target location. Because location information on the Internet is intricately entangled with non-location information and is expressed in many forms, the invention targets natural language text on the Internet, uses a semantics-based multi-scale extraction method to extract location descriptions from web page text, and uses the Point-Radius algorithm to calculate the approximate geographic range of the target location.
Brief description of the drawings
Fig. 1 is a flowchart of the active Internet search method.
Fig. 2 is a flowchart of the location calculation based on the Internet search results.
Embodiment
The technical solution of the present invention is described in further detail below through an embodiment and with reference to the accompanying drawings.
Embodiment:
1. Theoretical basis.
1.1 Geographic information retrieval (GIR).
Geographic information retrieval returns documents relevant to a geographic query, subject to the geographic scope of the query. The basic idea is to use a web crawler to collect a set of web pages from the Internet, identify the place names in the pages through named entity recognition, classification, and syntactic analysis, and thereby determine the geographic scope of the query terms and of the documents; finally, the degree of association between each document and the query (both textual and spatial) is calculated, and the retrieval results are ranked and returned. Most current geographic information retrieval relies on keyword matching, in which the place names in both the query and the web documents must have a well-defined coverage area for matching. This is hard to adapt to fuzzy place names (for example, "the middle and lower reaches of the Yangtze River") and so cannot be used directly to estimate an unknown location from web search. Drawing on the ideas of geographic information retrieval, the present invention proposes a multi-scale iterative search algorithm (see Fig. 1): web documents related to the unknown location are obtained from the Internet, the location descriptions that mention the unknown location are extracted, and the reference locations and spatial relations in those descriptions are then used to calculate the approximate geographic range of the unknown location. In the main flow, a set of web pages is obtained from the Internet through a meta search engine, and the location descriptions containing the query term are extracted from the pages on a semantic basis. If the location descriptions do not meet the credibility rate needed to estimate the location of the query term, a new round of Internet retrieval is carried out for the fuzzy locations that were identified. This is an iterative process: as long as the credibility rate condition is not met and the search limit has not been reached, web searches continue in order to obtain reference information from which the geographic range of the fuzzy locations can be estimated.
1.2 Georeferencing locality descriptions (GLD).
Georeferencing a locality description converts a location from a textual description into a numerical description in some coordinate system. An ideal georeferencing process turns the textual description into a numerical one that can be mapped, and expresses both the spatial extent of the location and the uncertainty of its distribution. The algorithms in common use today are the Point-Radius method and the Probability method. The Point-Radius method describes a location and its uncertainty with a point and a maximum error. The main sources of uncertainty it considers are the reference location (its spatial extent, geodetic datum, coordinate precision, and map scale) and the spatial relation (uncertainty in distance and in direction). All uncertainty measures are projected onto a single dimension as the maximum error of the target location, and the target location is expressed as the circular region centred on the point with the maximum error as its radius. The Probability method expresses the target location and its uncertainty as a probability density surface; the main sources of uncertainty it considers are the spatial distribution of the target object, the imprecision and vagueness of the spatial relation, the incompleteness of the reference object, and the uncertainty of the location description itself. The Point-Radius method is a quantitative calculation that yields a geographic coverage containing every point where the target location may lie, and it suits semi-quantitative textual location descriptions. The Probability method cannot calculate the geographic coverage of the target location quantitatively, but it provides the probability distribution of the target location and suits qualitative textual location descriptions.
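To make the point-plus-maximum-error idea concrete, here is a deliberately simplified sketch for a description of the form "distance d in direction θ from a reference location". It is not the full Point-Radius procedure: it works in planar coordinates (metres), ignores datum, coordinate precision, and map scale, and simply adds the remaining error terms, which gives a conservative upper bound by the triangle inequality. The function name `point_radius` and all parameter names and values are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class PointRadius:
    x: float       # estimated easting of the target (metres)
    y: float       # estimated northing of the target (metres)
    radius: float  # maximum error (metres)


def point_radius(ref_x, ref_y, ref_extent, distance, bearing_deg,
                 distance_unc=0.0, bearing_unc_deg=0.0):
    """Offset the reference point and sum the uncertainty terms into one radius."""
    theta = math.radians(bearing_deg)            # bearing clockwise from north
    x = ref_x + distance * math.sin(theta)
    y = ref_y + distance * math.cos(theta)
    # Chord length swept when the bearing is off by bearing_unc_deg.
    direction_error = 2.0 * distance * math.sin(math.radians(bearing_unc_deg) / 2.0)
    radius = ref_extent + distance_unc + direction_error
    return PointRadius(x, y, radius)


# "About 500 m east of a reference location whose own extent is 100 m."
estimate = point_radius(0.0, 0.0, ref_extent=100.0, distance=500.0,
                        bearing_deg=90.0, distance_unc=50.0,
                        bearing_unc_deg=22.5)
print(round(estimate.x, 1), round(estimate.y, 1), round(estimate.radius, 1))
```

The resulting circle is the kind of geographic coverage that the later merging step would combine across several location descriptions.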
2. Implementation procedure.
(1) Check the target location query term entered by the user. Look up the query term in the location database; if the location does not exist or its geographic coverage is missing, actively start a web-search-based query, that is, use a web search engine to crawl information about the target location from the Internet, with the target location as the topic;
(2) Identify and extract the natural language location descriptions (comprising reference location and spatial relation) in the web documents. Recognizing natural language location descriptions mainly comprises recognizing location names and recognizing spatial relations, and the present invention uses a semantics-based multi-scale extraction method to extract them. First, a corpus of location descriptions is built by a combination of manual summarization and machine learning; it stores the feature vocabulary expressing location names and spatial relations, together with the syntactic patterns of location descriptions. Then, with the support of the corpus, pattern matching is applied to the web text to obtain location descriptions. Finally, place name ambiguity is eliminated based on geographic and non-geographic semantics;
(3) Classify the location descriptions. A target location can be estimated from a reference location and a spatial relation only if the reference location has an accurate geographic range in the location database (the precondition). A single location description is expressed according to formula (1), where RO is the name of the reference location, SR is the spatial relation, T is the time at which the location description was made, C is the confidence of the location description, and S is the search reference of the reference object RO. The first K location descriptions Loc_i are taken from the extraction results and classified according to the precondition: when Loc_i.RO satisfies the precondition, Loc_i is stored in the precise description set P; otherwise it is stored in the fuzzy description set A;
$$\mathrm{Loc} = \{RO, SR, T, C, S\} \qquad (1)$$
(4) Calculate the search credibility rate C_s. The confidence of the location descriptions in the search results must reach a certain level before they can be used to estimate the target location. The present invention proposes the search credibility rate C_s as the evaluation index: it is the ratio of the sum of the confidences of all location descriptions in P to the total number of location descriptions, as shown in formula (2), where m is the number of location descriptions in P, K is the total number of location descriptions, and Loc_i.C is the confidence of the i-th location description.
$$C_s = \frac{\sum_{i=0}^{m-1} \mathrm{Loc}_i.C}{K} \qquad (2)$$
The confidence of a location description is calculated according to formula (3), where ε is the decay parameter and n is the search count. The confidence is set to 1 for the first search and decays as the number of searches increases.
$$\mathrm{Loc}_i.C = 1 \cdot \varepsilon^{n} \qquad (3)$$
When C_s meets the minimum credibility threshold C_min, the precise description set P is output directly for target location estimation. When C_s does not meet the condition, the present invention uses repeated iterative Internet search to secure the search credibility rate: a new round of Internet search is carried out for the fuzzy reference locations in A, so that the geographic range of each reference location is first estimated from Internet resources and the reference location is then used to estimate the target location;
(5) Iterative search for fuzzy reference locations. Following the processing of steps (3) and (4), each search result is expressed by formula (4), where n is the search round, m is the index of the location searched in that round, P is the precise description set, A is the fuzzy description set, and C_s is the search credibility rate.
$$WS[n][m] = \{P, A, C_s\} \qquad (4)$$
The iterative search procedure is as follows:
a) Store the fuzzy location descriptions WS[0][0].A of the target location's search result in the search set Q, and set n=0, m=0;
b) Take a fuzzy description set WS[n][m].A from Q and check whether n+1 reaches the search-count limit; if it does, exit the search;
c) Take each location description Loc_i in WS[n][m].A in turn and carry out the (n+1)-th search to obtain the search result WS[n+1][i]; associate this result with the location description as the search reference of its reference object RO, i.e. Loc_i.S=WS[n+1][i];
d) Remove the searched fuzzy description set WS[n][m].A from Q and check whether WS[n+1][i].C_s meets the threshold C_min; if it does not, put WS[n+1][i].A into the search set Q;
e) Check whether any fuzzy description set remains in Q; if so, repeat steps b) to d) to continue the iterative search;
(6) Calculate the approximate geographic range of every location description and its confidence. Because the fuzzy location descriptions of the k-th search result depend on the (k+1)-th search result, the present invention calculates in reverse order, that is, the geographic range calculation starts from the last search. As shown in Fig. 2, the calculation proceeds as follows:
a) Let n be the number of searches in the search results WS and m the number of locations searched in the n-th search, m=WS[n-1].size; define the geographic range set FC to store the geographic range of each search result;
b) Take the search result WS[n-1][m-1] for the m-th location of the n-th search;
c) Take each location Loc_y in WS[n-1][m-1].P in turn, query the coordinates of its reference location in the location database, and use the Point-Radius algorithm to calculate its geographic coverage FP(y) and confidence CP(y);
d) Take each location Loc_x in WS[n-1][m-1].A in turn and use Loc_x.S to look up the coordinates of its reference location in the geographic range set FC; if the coordinates are obtained, use the Point-Radius algorithm to calculate its geographic coverage FA(x) and confidence CA(x);
e) Merge the geographic ranges of all locations in P and A to obtain the geographic range FC(WS[n-1][m-1]) of the current search result;
f) Check whether m-1 is greater than 0. If so, move on to the next search result: set m=m-1 and return to step b); otherwise go to the next step;
g) Check whether n-1 is greater than 0. If so, move on to the previous search: set n=n-1, m=WS[n-1].size, and return to step b); otherwise go to the next step;
h) Output FC(WS[0][0]).
The specific embodiment described herein merely illustrates the spirit of the present invention. Those skilled in the art to which the invention belongs may make various modifications or additions to the described embodiment, or substitute similar approaches, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims (3)

1. An unknown position estimation method based on active iterative Internet detection, characterized in that it comprises the following steps:
Step 1, check the position query word entered by the user; if the geographic coverage of the position cannot be obtained from the spatial database, actively start iterative Internet detection, that is, take the target position as the topic and use a web search engine to crawl information related to the target position from the Internet;
Step 2, perform an initial probe with the position query word as the topic, using the web search engine to obtain from the Internet a set of web pages containing descriptions of the target position;
Step 3, perform geographic position parsing on the web documents obtained in step 2 that describe the target position, that is, extract natural language location expressions from the web documents, each natural language location expression comprising a reference position and a spatial relationship;
Step 4, classify the natural language location expressions obtained in step 3; if the reference position of a location expression can obtain geographic coverage from the location database, store the location expression in the accurate description set P, otherwise store it in the vague description set A;
Step 5, assess the credible rate C_s of the current search; if C_s is less than the search credibility threshold C_min, start a new round of Internet text search with the reference positions in the vague description set A as topics; if C_s is greater than or equal to C_min, skip to step 7. The credible rate of the current search is assessed as follows: the search credible rate C_s is defined as the evaluation index; it is the ratio of the sum of the confidences of all location expressions in P to the total number of location expressions, as shown in Formula 2, where m is the number of location expressions in P, K is the total number of location expressions, and Loc_i.C is the confidence of the i-th location expression:
C_s = \frac{\sum_{i=0}^{m-1} Loc_i.C}{K}    (Formula 2)
The confidence of a location expression is calculated according to Formula 3, where ε is the attenuation parameter and n is the search count; the confidence of a location expression is set to 1 at the first search and decays as the search count increases;
Loc_i.C = 1 \cdot \varepsilon^{n}    (Formula 3)
When C_s meets the minimum credibility threshold C_min, the accurate description set P is output directly for target position estimation; when C_s does not meet the condition, multiple rounds of iterative Internet search are used to ensure the search credible rate: the vague reference positions in A are taken as topics for a new round of Internet search, the geographic range of each reference position is first estimated from Internet resources, and the target position is then estimated from the reference positions;
Step 6, repeat step 1 to step 5 until the credible rate of each round of search results meets the threshold or the search count limit is reached;
Step 7, calculate the approximate geographic range and confidence of all location expressions;
Step 8, integrate and refine the geographic coverage of the multiple location expressions to obtain the geographic range of the target position;
Said step 6 is an iterative search on vague reference positions; following the process of step 4 and step 5, a search result is expressed by Formula 4, where n is the search count, m is the position number within the current search, P is the accurate description set, A is the vague description set, and C_s is the search credible rate:
WS[n][m] = \{P, A, C_s\}    (Formula 4)
Said iterative search procedure comprises the following sub-steps:
Step 4.1, store the vague location expressions WS[0][0].A of the target position search result in the search set Q, and set n=0, m=0;
Step 4.2, take a vague description set WS[n][m].A from Q and check whether n+1 reaches the search count limit; if it does, exit the search;
Step 4.3, take each location expression Loc_i in WS[n][m].A in turn and perform the (n+1)-th search to obtain the search result WS[n+1][i], and associate this search with the location expression as the search reference of its reference object RO, i.e. Loc_i.S = WS[n+1][i];
Step 4.4, remove the searched vague description set WS[n][m].A from Q and check whether WS[n+1][i].C_s meets the threshold C_min; if it does not, put WS[n+1][i].A into the search set Q;
Step 4.5, check whether any vague description set remains in Q; if so, repeat step 4.2 to step 4.4 to continue the iterative search;
Said step 7: because the vague location expressions of the k-th search result need to refer to the (k+1)-th search result, the calculation proceeds in reverse order, that is, geographic range calculation starts from the last search, and specifically comprises the following sub-steps:
Step 5.1, define the search count in the search results WS as n and the number of positions in the n-th search as m, m=WS[n-1].size; define the geographic range set FC to store the geographic range of each search result;
Step 5.2, take the search result WS[n-1][m-1] for the m-th position of the n-th search;
Step 5.3, for each position Loc_y in WS[n-1][m-1].P in turn, query the reference position coordinates from the location database, and use the Point-Radius algorithm to calculate its geographic coverage FP(y) and confidence CP(y);
Step 5.4, for each position Loc_x in WS[n-1][m-1].A in turn, use Loc_x.S to query the reference position coordinates in the geographic range set FC; if the coordinates are obtained, use the Point-Radius algorithm to calculate the geographic coverage FA(y) and confidence CA(y);
Step 5.5, merge the geographic ranges of all positions in P and A to obtain the geographic range FC(WS[n-1][m-1]) of the current search result;
Step 5.6, check whether m-1 is greater than 0; if it is, move on to the next search result, set m=m-1, and return to step 5.2; otherwise, continue to the next step;
Step 5.7, check whether n-1 is greater than 0; if it is, move on to the previous search, set n=n-1 and m=WS[n-1].size, and return to step 5.2; otherwise, continue to the next step;
Step 5.8, output FC(WS[0][0]).
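The credible-rate test of step 5 and the queue-driven search of steps 4.1 to 4.5 can be sketched together in Python. This is an illustrative sketch only: `search` is a hypothetical callable that returns the reference position names found for a query, `known` stands in for the location database, and the values of `eps`, `c_min` and `max_rounds` are arbitrary. One deliberate departure from the claim text: results at each level are indexed by insertion order rather than by the position i within a single set, so that results from different vague sets at the same level do not overwrite each other.

```python
from collections import deque

def confidence(n, eps):
    """Formula 3: confidence is 1 at the first search (n = 0) and decays with n."""
    return 1 * eps ** n

def credible_rate(p_confidences, k):
    """Formula 2: sum of confidences in P divided by the total count K."""
    return sum(p_confidences) / k if k else 0.0

def classify(refs, known, n, eps):
    """Split reference positions into P (known) and A (unknown); compute C_s."""
    P = [r for r in refs if r in known]
    A = [r for r in refs if r not in known]
    c_s = credible_rate([confidence(n, eps) for _ in P], len(refs))
    return {"P": P, "A": A, "Cs": c_s}

def iterative_search(target, search, known, c_min=0.6, max_rounds=3, eps=0.8):
    ws = [[classify(search(target), known, 0, eps)]]  # ws[n][m]
    links = {}                                        # reference name -> (n, m)
    queue = deque()
    if ws[0][0]["Cs"] < c_min:
        queue.append((0, 0))                          # step 4.1
    while queue:                                      # step 4.5
        n, m = queue.popleft()                        # steps 4.2 and 4.4
        if n + 1 >= max_rounds:
            break                                     # search count limit reached
        if len(ws) <= n + 1:
            ws.append([])
        for ref in ws[n][m]["A"]:                     # step 4.3
            result = classify(search(ref), known, n + 1, eps)
            ws[n + 1].append(result)
            links[ref] = (n + 1, len(ws[n + 1]) - 1)  # plays the role of Loc_i.S
            if result["Cs"] < c_min:
                queue.append(links[ref])              # still vague: search again
    return ws, links
```

With `eps = 0.8`, a reference resolved in the second search contributes a confidence of 0.8 and one resolved in the third search about 0.64, so deeper searches count for less in C_s.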
2. The unknown position estimation method based on active iterative Internet detection according to claim 1, characterized in that, in said step 3, the identification of natural language location expressions mainly comprises the identification of location names and the identification of spatial relationships, and a semantics-based multi-scale extraction method is used to extract the natural language location expressions, specifically comprising the following sub-steps:
Step 3.1, build a corpus of location expressions, which stores the feature vocabulary expressing location names and spatial relationships as well as the syntactic patterns of location expressions;
Step 3.2, with the support of the corpus, perform pattern matching on the web text to obtain location expressions;
Step 3.3, eliminate place name ambiguity based on geographic and non-geographic semantics.
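Steps 3.1 and 3.2 can be illustrated with a toy pattern matcher in Python. This is only a sketch: the vocabulary lists, the place names and the single "relation + place" syntactic pattern are hypothetical English examples invented here, and the disambiguation of step 3.3 is not attempted.

```python
import re

# Step 3.1 (toy corpus): feature vocabulary for place names and spatial relations.
PLACE_NAMES = ["Central Station", "City Library", "Old Mill"]
SPATIAL_RELATIONS = ["north of", "south of", "east of", "west of", "near", "opposite"]

def build_pattern(place_names, relations):
    """Compile one syntactic pattern: <spatial relation> [the] <place name>."""
    # Longer alternatives first so that the regex prefers the longest match.
    rel = "|".join(re.escape(r) for r in sorted(relations, key=len, reverse=True))
    place = "|".join(re.escape(p) for p in sorted(place_names, key=len, reverse=True))
    return re.compile(
        rf"\b(?P<sr>{rel})\s+(?:the\s+)?(?P<ro>{place})\b", re.IGNORECASE
    )

def extract(text, pattern):
    """Step 3.2: return (reference position, spatial relation) pairs found in text."""
    return [(m.group("ro"), m.group("sr").lower()) for m in pattern.finditer(text)]
```

For example, `extract("The cafe is north of Central Station and near the Old Mill.", build_pattern(PLACE_NAMES, SPATIAL_RELATIONS))` yields one pair per matched expression, each holding the reference position and its spatial relationship.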
3. The unknown position estimation method based on active iterative Internet detection according to claim 1, characterized in that, in said step 3, the precondition for estimating the target position from a reference position and a spatial relationship is that the reference position can obtain an accurate geographic range from the location database; a single location expression is expressed according to Formula 1, where RO is the reference position name, SR is the spatial relationship, T is the time at which the location expression occurred, C is the confidence of the location expression, and S is the search reference of the reference object RO; the first K location expressions Loc_i in the results are extracted and classified according to the precondition: when Loc_i.RO meets the precondition, Loc_i is stored in the accurate description set P, otherwise it is stored in the vague description set A;
Loc = \{RO, SR, T, C, S\}    (Formula 1)
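The record of Formula 1 and the classification rule of this claim map naturally onto a small data structure. The Python sketch below is an assumption-laden illustration: the field types, the dictionary used as the location database and the function name `classify_top_k` are all hypothetical, and the coordinates in the usage example are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Loc:
    RO: str                              # reference position name
    SR: str                              # spatial relationship
    T: Optional[str] = None              # time of the location expression
    C: float = 1.0                       # confidence
    S: Optional[Tuple[int, int]] = None  # search reference for RO, filled in later

def classify_top_k(expressions: List[Loc], location_db: dict, k: int):
    """Split the first k expressions into accurate (P) and vague (A) sets."""
    P, A = [], []
    for loc in expressions[:k]:
        # Precondition: the reference position resolves in the location database.
        (P if loc.RO in location_db else A).append(loc)
    return P, A

# Usage with made-up data: only "Central Station" is in the database.
location_db = {"Central Station": (12.0, 34.0)}
found = [Loc("Central Station", "north of"), Loc("Old Mill", "near")]
P, A = classify_top_k(found, location_db, k=2)
```

Expressions that land in A are the ones whose reference positions would be searched again in the next round.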
CN201210579579.2A 2012-12-27 2012-12-27 A kind of unknown position evaluation method based on the detection of internet active iteration Expired - Fee Related CN103077201B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210579579.2A CN103077201B (en) 2012-12-27 2012-12-27 A kind of unknown position evaluation method based on the detection of internet active iteration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210579579.2A CN103077201B (en) 2012-12-27 2012-12-27 A kind of unknown position evaluation method based on the detection of internet active iteration

Publications (2)

Publication Number Publication Date
CN103077201A CN103077201A (en) 2013-05-01
CN103077201B true CN103077201B (en) 2016-03-30

Family

ID=48153731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210579579.2A Expired - Fee Related CN103077201B (en) 2012-12-27 2012-12-27 A kind of unknown position evaluation method based on the detection of internet active iteration

Country Status (1)

Country Link
CN (1) CN103077201B (en)

Also Published As

Publication number Publication date
CN103077201A (en) 2013-05-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160330

Termination date: 20211227
