CN100478960C - Method for locating unknown place name in network map service - Google Patents

Info

Publication number
CN100478960C
Authority
CN
China
Prior art keywords
place name
keyword
address
webpage
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2007101205475A
Other languages
Chinese (zh)
Other versions
CN101110080A (en)
Inventor
罗英伟
汪小林
周晓鲁
许卓群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2007-08-21
Filing date
2007-08-21
Publication date
2009-04-15
Application filed by Peking University
Priority to CNB2007101205475A
Publication of CN101110080A
Application granted
Publication of CN100478960C
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for locating unknown place names in a network map service. The method first collects all web pages containing the unknown place-name keyword entered by the user, extracts from them the address information that is already recorded in the spatial database, and calculates the spatial correlation between the place-name keyword and each extracted address. It then corrects the spatial correlation through spatial clustering and finally selects the top-ranked addresses as the positioning result, marks them on the map, and returns them to the user. Without expanding or updating the address data, the method can quickly and effectively provide address information for an unknown place name and locate it on the map from textual address information, thereby improving the quality of map search and positioning services.

Description

A method for locating unknown place names in a network map service
Technical Field
The present invention relates to web information mining and map retrieval services, and in particular to techniques for locating unknown place names in a network map service.
Background
At the end of 2004, Google launched its network map service. Google Maps brought network map services into the daily lives of many Internet users and also spurred the development of domestic network map services. The industry has grown explosively, and map service websites of all kinds have sprung up like mushrooms after rain. The most attractive and most important function of a map service is place-name search and positioning: the user enters a keyword for the target place name, and the map service website marks the destination on the map and displays related information. However, place-name search in existing network map services cannot fully meet users' needs. In particular, if the spatial database of the map service contains no information about the target place name, the place name cannot be located.
The basic process of place-name search and positioning is the same everywhere: the user submits the place-name keyword of a place of interest to the map service website and receives a map on which the target place is marked. At present, nearly all well-known map service websites locate places by matching the place-name keyword against a spatial database, that is, a database containing a large number of place names together with their positions and other attribute information. On the server side, positioning mainly involves the following steps: first, search the spatial database for place names that match the keyword, or for addresses whose attribute information contains the keyword; then mark each matching address on the map using its coordinates and display it to the user. However, a single city typically contains tens or even hundreds of thousands of addresses. Collecting all of them with their coordinates is an extremely laborious task, and new addresses appear and old ones disappear all the time. As a result, because the spatial database is incomplete or not updated promptly, many users encounter place names that cannot be found when they use map search.
Take "Diamond Mansion" as an example. The query service on the server first checks whether the spatial database contains the address "Diamond Mansion". If it does not, the service searches the descriptive information of other addresses for entries containing "Diamond Mansion". For example, if the description of a company contains "located on the 2nd floor, Building A, Diamond Mansion" and the company is in the spatial database, the position of that company is returned to the user as the result. If neither kind of address is found, some websites simply tell the user that the place name was not found; Baidu Maps, for example, displays "Sorry, no place related to 'Diamond Mansion' was found." Other map service websites process the keyword and query again. Google Maps, for example, segments "Diamond Mansion" into "Diamond" and "Mansion", queries the spatial database with these as new keywords, and returns results whose place-name descriptions contain both "Diamond" and "Mansion", even if the two words do not appear next to each other. The latter approach amounts to analyzing the relevance between the keyword and the content of the spatial database: when no exact match can be found, the "closest" addresses are returned to the user. This works to some extent for addresses composed of several place names. For instance, a query for "Zhongguancun Software Center Diamond Mansion" may return nothing, while separate queries for "Zhongguancun Software Center" and "Diamond Mansion" may each find relevant information. But for a non-composite place name such as "Diamond Mansion", segmentation still rarely yields results relevant to the keyword.
In this situation the user usually turns to other means of finding the position of the geographic entity, such as a search engine. But current search engines apply no special strategy to geographic searches. If the user enters a company name and clicks search, the engine returns every web page containing that company name. To find more specific information about the company, such as its address or telephone number, the user has to open the pages one by one and look for it, which greatly reduces the efficiency of searching for and locating geographic entities and increases the time the user needs.
Summary of the Invention
The analysis above shows that there is currently no good method for locating a place name that is absent from the spatial database of a network map service, that is, an unknown place name. The fundamental solution is of course to expand and update the data in the spatial database. But spatial data is at present updated mainly by hand, which is inherently complex and slow. The problem addressed by the present invention is how to locate unknown place names without updating the existing data in the spatial database. Relying only on the existing spatial database, the method uses a search engine to obtain web pages containing the unknown place name, analyzes and mines those pages to obtain address information that both describes the position of the unknown place name and exists in the spatial database, and uses this information to locate the unknown place name, thereby improving the quality of place-name search and positioning services.
The object of the present invention is to provide a method for locating unknown place names in a network map service. To handle unknown place names that do not exist in the spatial database, which current map service websites cannot process, the invention analyzes the large amount of address information available on the Internet, finds the addresses that both exist in the spatial database and describe the unknown place name, and uses these addresses to locate the unknown place name.
The method of the invention comprises the following steps (as shown in Fig. 1):
(1) First, collect all web pages containing the unknown place-name keyword entered by the user. The pages can be retrieved from an existing local web page library, or links to pages containing the keyword can be found through a search engine and the pages downloaded locally. Then extract from each page the context that contains the keyword. The context is plain text, preferably no more than 200 characters long (100 characters before and 100 after the keyword). Our manual investigation showed that, in the set of web page texts containing a given unknown place name, the great majority of the addresses that describe its position (spatially related addresses) appear within 100 characters of the place name, whereas most addresses that do not describe its position (spatially unrelated addresses) appear more than 100 characters away. A context of 100 characters on each side therefore excludes most spatially unrelated addresses with almost no loss of spatially related ones, reducing the adverse effect of irrelevant addresses.
(2) Build a place-name dictionary (gazetteer) from the spatial database of the map service website. Every place-name entry in the dictionary comes from the address information in the spatial database, so every entry has a specific coordinate position. Using a dictionary-based matching method (Zan Hongying, "Research on Chinese Web Page Retrieval Based on Entity Attributes", PhD dissertation, Peking University, 2004), extract from the keyword contexts of all pages every address that the spatial database can locate directly, that is, every address that appears in the spatial database.
(3) Quantitatively calculate the spatial correlation between each extracted address and the unknown place-name keyword entered by the user. Spatial correlation measures how close the geographic position of an identified address is to that of the unknown place name. It is calculated mainly from the textual distance between the address and the unknown place-name keyword in the text (Luo Yingwei et al., "A Method for Extracting Entity Address Information from Text Context", patent application).
(4) Correct the spatial correlation through spatial cluster analysis of the addresses. Addresses that are spatially related to the unknown place-name keyword cluster geographically: addresses related to the same place are also close to one another, whereas spatially unrelated addresses show no such pattern. The identified addresses are therefore converted into specific geographic positions using the spatial database, and the spatial clustering computation of the map service (Alan T. Murray and Vladimir Estivill-Castro, "Cluster discovery techniques for exploratory spatial data analysis", International Journal of Geographical Information Science, 1998, 12(5): 431-443) is used to find the region where addresses are densely distributed and initial correlations are high. The addresses in this region are considered the most likely to be spatially related, and their correlations are raised substantially. A simple correction is as follows: sum the spatial correlations of all addresses in the region, denote the sum ΣR, and set the corrected spatial correlation of each address in the region to its original value plus ΣR.
(5) Rank the addresses by spatial correlation, return the top-ranked addresses to the user as the positioning result, and mark all returned results on the map for the user to choose from. Because every identified address is a known address in the spatial database, it can be located and marked on the map directly.
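Steps (2), (3), and (5) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the gazetteer entries and coordinates are made up, `find_addresses` uses plain substring search in place of the cited dictionary-matching method, and the linear decay in `text_distance_scores` is an assumed formula, since the text only says that the correlation is computed from textual distance.

```python
from collections import defaultdict

# Hypothetical gazetteer: place name -> (longitude, latitude).
# In the method, this dictionary is built from the map service's spatial database.
GAZETTEER = {
    "Wudaokou": (116.338, 39.992),
    "Tsinghua East Gate": (116.339, 39.997),
    "North Third Ring West Road": (116.320, 39.966),
}


def find_addresses(context, gazetteer):
    """Return (name, offset) for every gazetteer entry that occurs in context."""
    hits = []
    for name in gazetteer:
        start = context.find(name)
        while start != -1:
            hits.append((name, start))
            start = context.find(name, start + 1)
    return hits


def text_distance_scores(context, keyword, gazetteer, window=100):
    """Score addresses by textual closeness to the keyword (assumed linear decay)."""
    scores = defaultdict(float)
    kw_pos = context.find(keyword)
    if kw_pos == -1:
        return scores
    for name, pos in find_addresses(context, gazetteer):
        distance = abs(pos - kw_pos)
        # Repeated mentions accumulate; mentions beyond the window add nothing.
        scores[name] += max(0.0, 1.0 - distance / window)
    return scores


def top_addresses(scores, k=3):
    """Return the k highest-scoring (name, score) pairs, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


context = "Punk Cosmetology, Wudaokou, 300 m south of Tsinghua East Gate"
ranked = top_addresses(text_distance_scores(context, "Punk Cosmetology", GAZETTEER))
```

Here the address mentioned closer to the keyword ranks first. A production matcher would also need to prefer the longest match when one gazetteer entry is a substring of another, which this sketch does not attempt.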
To achieve the above object, the present invention adopts the following technical scheme.
A method for locating unknown place names in a network map service, comprising the steps of:
1) collecting, from an existing web page library, web pages that contain the unknown place-name keyword entered by the user;
2) extracting from the web pages the context that contains the place-name keyword;
3) extracting from the place-name keyword contexts of all web pages every address contained in the place-name dictionary;
4) calculating the spatial correlation between each extracted address and the unknown place-name keyword entered by the user;
5) ranking the addresses by spatial correlation, taking the top-ranked addresses as the positioning result, marking them on the map, and returning them to the user.
In the method, web pages are collected either by retrieving pages containing the keyword from an existing local web page library, or by finding links to pages containing the keyword through a search engine and downloading the pages locally.
The place-name keyword context of a web page is plain text consisting of the 100 characters before and the 100 characters after the keyword.
The place-name dictionary is built from the spatial database of the network map service website, and every place-name entry has a specific coordinate position.
In the method, a dictionary-based matching method is used to extract every address contained in the place-name dictionary.
In the method, every address extracted from the web page text can be located in the network map service by its coordinate position.
In the method, spatial clustering is used to correct the spatial correlation.
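The correction by spatial clustering can be sketched as follows. The ΣR rule is the one described in the text; everything else is an assumption made for illustration: the greedy distance-threshold grouping stands in for the cited spatial clustering technique, the 1 km radius is arbitrary, choosing the multi-address cluster with the largest summed correlation is only one reading of "densely distributed with high initial correlation", and the coordinates and scores are invented.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def cluster_addresses(coords, radius_km=1.0):
    """Greedy grouping: an address joins the first cluster that already holds
    a member within radius_km; otherwise it starts a new cluster."""
    clusters = []
    for name, point in coords.items():
        for cluster in clusters:
            if any(haversine_km(point, coords[m]) <= radius_km for m in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def correct_scores(scores, coords, radius_km=1.0):
    """Add sigma_r (the cluster's summed score) to each address in the best cluster."""
    clusters = [c for c in cluster_addresses(coords, radius_km) if len(c) > 1]
    if not clusters:
        return dict(scores)  # no spatial aggregation found, nothing to correct
    best = max(clusters, key=lambda c: sum(scores[m] for m in c))
    sigma_r = sum(scores[m] for m in best)
    return {n: (s + sigma_r if n in best else s) for n, s in scores.items()}


# Hypothetical coordinates (lon, lat) and initial text-distance scores.
coords = {
    "Huaqing Jiayuan": (116.338, 39.992),
    "Tsinghua East Gate": (116.339, 39.997),
    "North Third Ring West Road No. 48": (116.320, 39.966),
}
scores = {
    "Huaqing Jiayuan": 0.6,
    "Tsinghua East Gate": 0.5,
    "North Third Ring West Road No. 48": 0.7,
}
corrected = correct_scores(scores, coords)
```

In this made-up example the isolated address starts with the highest score, but the two neighbouring addresses each gain ΣR = 1.1 and overtake it, which is the effect the correction is meant to produce.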
Advantages and Positive Effects of the Invention
Compared with place-name search and positioning in existing network map services, the proposed method handles the positioning of unknown place names that are absent from the spatial database well, and can provide the true address of an unknown place name or an address near it. To test the method, we took Beijing as an example and used the address recognition method based on place-name dictionary matching to locate 174 place names and organization names absent from the spatial database. Part of the results are listed in Table 1. For the query "many one-tenth Xinda trade Co., Ltd", the most relevant address found is "No. 7 Courtyard, Songyu North Road, Chaoyang District", whose corrected spatial correlation is far higher than that of the other addresses. In the results for "Jing Pu garden bioengineering company limited", the correlations of the first two addresses are close, but the first result, "Chinese Academy of Agricultural Sciences, Haidian District", and the third, "No. 12 Zhongguancun South Street, Haidian District", denote the same address, and users tend to trust addresses that cluster strongly, so accurate positioning is still achieved. For the query "the old palace new great achievement of will Furniture Factory", the first result covers a rather large geographic area, but the second result helps the user locate the place accurately. For "chatterbox coffee shop", the first two results are both highly credible and strongly clustered, so they lead to a correct location.
Table 1. Partial test results for locating place names absent from the spatial database
[Table 1 appears as an image in the original publication: Figure C20071012054700071]
Brief Description of the Drawings
Fig. 1 is a flowchart for locating an unknown place name that is absent from the spatial database.
Fig. 2 shows the map positioning result for an unknown place name.
Detailed Description
A concrete example below illustrates how the method of this patent is used to locate an unknown place name that is absent from the spatial database. Suppose the user queries the place "Punk Cosmetology". First, the web page collection module (module 1 in Fig. 1) obtains all web pages containing "Punk Cosmetology" and saves them in module 2 of Fig. 1. After the web page preprocessing and context interception module (module 3 in Fig. 1) removes the markup from the pages, it extracts the 100 characters before and after every occurrence of "Punk Cosmetology" and passes these contexts to the address information extraction module. That module uses the matching method based on the place-name dictionary to extract from the contexts every address that exists in the spatial database, such as "Tsinghua University East Gate", "Huaqing Jiayuan, Wudaokou, Haidian District", "No. 48 North Third Ring West Road, Haidian District", and "Wudaokou, Haidian, Beijing". The address correlation calculation module (module 5 in Fig. 1) then computes the spatial correlation of each address from its distance to "Punk Cosmetology" in the text.
Consider the context "Punk Cosmetology, 1st floor, north side of Building 8, Huaqing Jiayuan, Wudaokou, Haidian District, 300 meters south of Tsinghua University East Gate", in which "Huaqing Jiayuan, Wudaokou, Haidian District" and "Tsinghua University East Gate" are the identified addresses (underlined in the original). "Huaqing Jiayuan, Wudaokou, Haidian District" is close to the keyword "Punk Cosmetology", so its correlation is high; "Tsinghua University East Gate" is farther away, so its correlation is lower. Some spatially unrelated addresses may obtain a high spatial correlation because they occur many times or lie close to the keyword. However, after the address correlation correction module based on spatial clustering (module 6 in Fig. 1) performs the spatial cluster analysis, we find that "Tsinghua University East Gate", "Huaqing Jiayuan, Wudaokou, Haidian District", and "Wudaokou, Haidian, Beijing" are very close together and cluster clearly (the clustering computation shows that they are only a few hundred meters apart), whereas "No. 48 North Third Ring West Road, Haidian District" is several kilometers from them. We therefore consider the mutually adjacent addresses more likely to be spatially related to the place-name keyword and raise their spatial correlations (specifically, the spatial correlation of each clustered address is increased by the sum of the correlations of all addresses in the cluster). Finally, the result display interface (module 7 in Fig. 1) presents the top-ranked addresses and their spatial correlations to the user as text and on the map, helping the user decide which address to choose as the target (as shown in Fig. 2).
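The preprocessing and context interception performed by module 3 might look like the following sketch. It is an illustration only: the class and function names are hypothetical, Python's standard `html.parser` is used as a stand-in for whatever markup removal the authors implemented, and the sample page is invented.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the content of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_markup(html):
    """Return the plain text of an HTML page with runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def keyword_contexts(text, keyword, window=100):
    """Return the keyword plus up to `window` characters on each side,
    once for every occurrence of the keyword in text."""
    contexts = []
    start = text.find(keyword)
    while start != -1:
        left = max(0, start - window)
        right = start + len(keyword) + window
        contexts.append(text[left:right])
        start = text.find(keyword, start + len(keyword))
    return contexts


page = ("<html><head><style>p{}</style></head><body>"
        "<p>Punk Cosmetology is at <b>Wudaokou</b>.</p></body></html>")
contexts = keyword_contexts(strip_markup(page), "Punk Cosmetology", window=5)
```

Joining text fragments with a space keeps words from adjacent elements apart, which suits space-delimited text; for Chinese pages an implementation would more likely join fragments without a separator.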

Claims (7)

1. A method for locating unknown place names in a network map service, comprising the steps of:
1) collecting, from an existing web page library, web pages that contain the unknown place-name keyword entered by the user;
2) extracting from the web pages the context that contains the place-name keyword;
3) extracting from the place-name keyword contexts of all web pages every address contained in the place-name dictionary;
4) calculating the spatial correlation between each extracted address and the unknown place-name keyword entered by the user;
5) ranking the addresses by spatial correlation, taking the top-ranked addresses as the positioning result, marking them on the map, and returning them to the user.
2. The method of claim 1, characterized in that the web pages are collected either by retrieving pages containing the keyword from an existing local web page library, or by finding links to pages containing the keyword through a search engine and downloading the pages locally.
3. The method of claim 1, characterized in that the place-name keyword context of a web page is plain text consisting of the 100 characters before and the 100 characters after the keyword.
4. The method of claim 1, characterized in that the place-name dictionary is built from the spatial database of the network map service website, and every place-name entry has a specific coordinate position.
5. The method of claim 1, characterized in that a matching method based on the place-name dictionary is used to extract every address contained in the place-name dictionary.
6. The method of claim 1 or 5, characterized in that every address extracted from the web page text can be located in the network map service by its coordinate position.
7. The method of claim 1, characterized in that spatial clustering is used to correct the spatial correlation.
CNB2007101205475A 2007-08-21 2007-08-21 Method for locating unknown place name in network map service Expired - Fee Related CN100478960C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2007101205475A CN100478960C (en) 2007-08-21 2007-08-21 Method for locating unknown place name in network map service

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2007101205475A CN100478960C (en) 2007-08-21 2007-08-21 Method for locating unknown place name in network map service

Publications (2)

Publication Number Publication Date
CN101110080A CN101110080A (en) 2008-01-23
CN100478960C true CN100478960C (en) 2009-04-15

Family

ID=39042153

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2007101205475A Expired - Fee Related CN100478960C (en) 2007-08-21 2007-08-21 Method for locating unknown place name in network map service

Country Status (1)

Country Link
CN (1) CN100478960C (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100573506C (en) * 2008-06-25 2009-12-23 中国科学院地理科学与资源研究所 A kind of space-time fusion method of natural language expressing dynamic traffic information
CN101840406B (en) * 2009-03-20 2015-10-14 富士通株式会社 Place name searching device and system
CN101777082A (en) * 2010-03-01 2010-07-14 苏州数字地图网络科技有限公司 Correlation method of text information and geological information and system
US8812734B2 (en) 2010-09-01 2014-08-19 Microsoft Corporation Network feed content
CN103150313A (en) * 2012-03-05 2013-06-12 苏州盛景数字技术服务有限公司 Address locating method based on space interpolation
CN103955505B (en) * 2014-04-24 2017-09-26 中国科学院信息工程研究所 A kind of event method of real-time and system based on microblogging
CN105224622A (en) * 2015-09-22 2016-01-06 中国搜索信息科技股份有限公司 The place name address extraction of Internet and standardized method
CN105335468B (en) * 2015-09-28 2019-09-13 北京信息科技大学 A kind of geographical location entity norm method based on Baidu map API
CN109827590A (en) * 2019-01-11 2019-05-31 北京猎户星空科技有限公司 A kind of control method of robot, device, equipment and medium
CN111859849B (en) * 2020-07-01 2023-11-24 邦道科技有限公司 Management method and device for electricity utilization address
CN112836146B (en) * 2021-03-09 2024-05-14 威创集团股份有限公司 Geographic space coordinate information acquisition method and device based on network message

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1272657A (en) * 1999-03-23 2000-11-08 索尼国际(欧洲)股份有限公司 Automatic management of map bearing information and map information system related information system and method
CN1770155A (en) * 2005-09-23 2006-05-10 赵忠华 Electronic map making and using method

Also Published As

Publication number Publication date
CN101110080A (en) 2008-01-23

Similar Documents

Publication Publication Date Title
CN100478960C (en) Method for locating unknown place name in network map service
CN101918945B (en) Automatic expanded language search
CN102043833B (en) Search method and device based on query word
CN101647020B (en) Searching structured geographical data
CN101136028B (en) Position enquiring system based on free-running speech and position enquiring system based on key words
Kao et al. WISDOM: Web intrapage informative structure mining based on document object model
US8682882B2 (en) System and method for automatically identifying classified websites
WO2006133538A1 (en) System and method for ranking web content
CN100507918C (en) Automatic positioning method of network key resource page
CN101777082A (en) Correlation method of text information and geological information and system
CN101350013A (en) Method and system for searching geographical information
CN102722498A (en) Search engine and implementation method thereof
US7668859B2 (en) Method and system for enhanced web searching
CN102169503A (en) Method and device for obtaining searching result corresponding with user query sequence
CN101794277B (en) Method for embedding geographical labels in network character information and system
CN102722499A (en) Search engine and implementation method thereof
Ahlers et al. Location-based Web search
CN100470549C (en) Form locating data mining method
CN101676901A (en) Search dispatching method and search server
KR20020022977A (en) Internet resource retrieval and browsing method based on expanded web site map and expanded natural domain names assigned to all web resources
KR20130131657A (en) Method and system for brand naming, and recording medium thereof
JP5639549B2 (en) Information retrieval apparatus, method, and program
CN106649883B (en) cross-language theme website automatic discovery method
KR20170132376A (en) Method and Apparatus for Recommending Service Provider Using Social Data
Asadi et al. Using local popularity of web resources for geo-ranking of search engine results

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090415

Termination date: 20130821