CN109933797A - Geocoding and system based on Jieba participle and address dictionary - Google Patents

Info

Publication number
CN109933797A
Authority
CN
China
Prior art keywords
address
matching
participle
geocoding
character string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910220419.0A
Other languages
Chinese (zh)
Inventor
童蔚苹
张嘉旭
张悦
韦茵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201910220419.0A priority Critical patent/CN109933797A/en
Publication of CN109933797A publication Critical patent/CN109933797A/en
Pending legal-status Critical Current

Abstract

The invention discloses a geocoding method and system based on Jieba word segmentation and an address dictionary. The method comprises: Step 1, collecting address data and building an address database; Step 2, segmenting the address string entered by the user; Step 3, performing two rounds of address matching and standardizing the address; Step 4, mapping the standardized address to geographic coordinates. The system comprises: an address database, which stores the collected eight-level standard address data and its geographic coordinates; a word segmentation module, which splits the address string entered by the user; an exact matching module, which matches the split address array exactly, level by level, and fills in the parent addresses; a fuzzy matching module, which fuzzily matches the address strings that were not matched exactly and completes the standardization of the address; and a mapping module, which maps the standardized address to geographic coordinates and returns them to the user. The algorithm of the invention is easy to understand and easy to implement in code.

Description

Geocoding method and system based on Jieba word segmentation and an address dictionary
Technical field:
The present invention relates to a geocoding method and system based on Jieba word segmentation and an address dictionary, and belongs to the technical field of geocoding and address resolution.
Background art:
Geocoding underlies any geographic information system (GIS) work that converts between addresses and geographic coordinates. The addresses people use in practice are usually non-standard, so any software that derives geographic coordinates from a user-entered address must be able to obtain the correct coordinates from a non-standardized address. To do so, the non-standard address must be standardized and its geographic coordinates resolved, so that further geographic analysis and location-based services can be carried out.
Research on geocoding systems originated in the United States, and the geocoding software developed on that basis has played an important role in fields such as environmental protection and urban planning. However, Chinese addresses are classified differently from English addresses, contain ambiguous words, and have complex grammar, so geocoding software built on English geocoding systems cannot be applied directly to Chinese geographic information databases.
Chinese geocoding standards came into being to promote the construction of urban municipal comprehensive supervision information systems. With the development of GIS technology, more and more work requires obtaining geographic coordinates from non-standardized addresses. Standardizing a user-entered non-standard address according to the Chinese geocoding standard and obtaining the correct geographic coordinates has become a common requirement of many tasks.
The geocoding methods in common use can be classified by matching approach into rule-based and statistics-based matching, street matching, fuzzy address matching, and others. Because of the complexity of Chinese and the regional differences in how addresses are composed, each of these approaches still has shortcomings. To meet today's large demand for geocoding and coordinate conversion, an efficient geocoding method and system is needed.
Summary of the invention
The object of the present invention is to provide a geocoding method and system based on Jieba word segmentation and an address dictionary whose algorithm is easy to understand and easy to implement in code, which facilitates the exchange and dissemination of geographic information and promotes industrial and social development.
The above object is achieved through the following technical solution:
A geocoding method based on Jieba word segmentation and an address dictionary, comprising the following steps:
Step 1: collect address data and build an address database;
Step 2: segment the address string entered by the user;
Step 3: perform two rounds of address matching and standardize the address;
Step 4: map the standardized address to geographic coordinates.
In the geocoding method based on Jieba word segmentation and an address dictionary, the address database is divided into eight levels: country; province or municipality directly under the central government; city; district; township or street; road section; POI; and detailed description. The primary key of each level is its ID, and its foreign key is the ID of its parent.
In the geocoding method based on Jieba word segmentation and an address dictionary, the records in the address database are sorted by word frequency and by initial letter.
In the geocoding method based on Jieba word segmentation and an address dictionary, the address string entered by the user is segmented using the "precise mode" of Jieba, and the dictionary held in the address database is imported through Jieba's "custom dictionary" feature to improve segmentation accuracy.
In the geocoding method based on Jieba word segmentation and an address dictionary, the two rounds of address matching comprise:
First round, exact matching: traverse the address array produced by segmentation and, using string equality, match each string exactly against the address records in the address database level by level, down to the lowest level that can be matched; then fill in all of its parent addresses level by level.
Second round, fuzzy matching: traverse the strings that were not matched exactly in the first round, measure similarity using string edit distance, sort the candidates by matching degree, and select the candidate with the highest similarity as the matching result.
In the geocoding method based on Jieba word segmentation and an address dictionary, during the two rounds of address matching, the matching result at the parent level is used to constrain matching at the next level.
Another aspect of the present invention provides a geocoding system based on Jieba word segmentation and an address dictionary, comprising:
an address database, which stores the collected eight-level standard address data and its geographic coordinates; a word segmentation module, which splits the address string entered by the user; an exact matching module, which matches the split address array exactly, level by level, and fills in the parent addresses; a fuzzy matching module, which fuzzily matches the address strings that were not matched exactly and completes the standardization of the address; and a mapping module, which maps the standardized address to geographic coordinates and returns them to the user.
In the geocoding system based on Jieba word segmentation and an address dictionary, the mapping module puts each address record in one-to-one correspondence with the longitude and latitude of its center.
Beneficial effects:
Compared with the prior art, the present invention has the following technical effects:
1. It is well suited to small and medium-sized areas, where it achieves high retrieval efficiency.
2. The algorithm is easy to understand and easy to implement in code.
3. It improves the user experience of geographic information systems and the quality of geographic information services.
4. It facilitates the exchange and dissemination of geographic information and promotes industrial and social development.
Brief description of the drawings
Fig. 1 is a flow chart of the geocoding method of the present invention;
Fig. 2 is a diagram of the relationships among the data tables of the address database of the present invention;
Fig. 3 is a flow chart of the exact matching algorithm;
Fig. 4 is a flow chart of the fuzzy matching algorithm.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and a specific embodiment:
Fig. 1 is the flow chart of the system. The geocoding method based on Jieba word segmentation and an eight-level address model comprises four main steps. Step 1: build an address database with an eight-level hierarchy. Step 2: segment the address entered by the user using Jieba and the address dictionary. Step 3: perform exact matching, fuzzy matching, and address standardization. Step 4: map the standardized address to geographic coordinates and visualize it on a map.
Step 1: Build the address database with an eight-level hierarchy. Address data are cleaned and processed according to the eight-level address hierarchy model in the following table, and the address database is then built.
Address rank element
The hierarchy model follows the urban construction industry standard of the People's Republic of China, "Urban Municipal Comprehensive Supervision Information System: Geocoding" (CJ/T 215-2005). Address elements are divided into eight top-down levels: country; province or municipality directly under the central government; provincial capital or prefecture-level city; district; street or township; road section; POI; and detailed address. In the standard address database, each level of address consists of seven address elements: ID, the address code (primary key); fatherID, the code of the parent address one level up (foreign key); level, the address element level; name, the name; alias, the address alias; longitude, the longitude; and latitude, the latitude.
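The seven address elements above map naturally onto a relational table. The following is a minimal sketch only: it uses Python's built-in `sqlite3` instead of the SQL Server database named in the embodiment, it folds all levels into a single hypothetical table called `address` (the patent describes one data table per level), and the sample rows and their coordinates are rough placeholders for illustration.

```python
import sqlite3

# Column names follow the seven address elements listed above. The table name
# "address" and the single-table layout are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE address (
    ID        INTEGER PRIMARY KEY,             -- address code (primary key)
    fatherID  INTEGER REFERENCES address(ID),  -- code of the parent address
    level     INTEGER NOT NULL CHECK (level BETWEEN 1 AND 8),
    name      TEXT NOT NULL,
    alias     TEXT,
    longitude REAL,
    latitude  REAL
);
"""

def build_demo_db():
    """Create an in-memory database with a few placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [
        (1, None, 1, "中国", None, 104.0, 35.0),      # placeholder coordinates
        (2, 1, 2, "江苏省", "江苏", 118.8, 32.1),
        (3, 2, 3, "南京市", "南京", 118.8, 32.06),
    ]
    conn.executemany("INSERT INTO address VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn

def parent_chain(conn, address_id):
    """Follow fatherID links upward; return names from the top level down."""
    names = []
    while address_id is not None:
        name, address_id = conn.execute(
            "SELECT name, fatherID FROM address WHERE ID = ?", (address_id,)
        ).fetchone()
        names.append(name)
    return list(reversed(names))

conn = build_demo_db()
print(parent_chain(conn, 3))
```

Storing the parent's ID on every row is what later allows a matched low-level address to be expanded into a full standardized address by walking upward.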
Step 2: Segment the address entered by the user using Jieba and the address dictionary. Jieba is a Chinese word segmentation library that is widely used in natural language processing. Preferably, a custom dictionary containing the names and aliases in the address database is created to improve segmentation accuracy, and the input Chinese address string is segmented using the precise mode of the Jieba library.
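With Jieba itself, this step would presumably amount to registering the address names and aliases through `jieba.load_userdict` or `jieba.add_word` and then calling `jieba.lcut(text, cut_all=False)`. To stay free of third-party dependencies, the sketch below substitutes a much simpler technique, forward maximum matching against a dictionary. It is not Jieba's algorithm and only illustrates why an address dictionary helps keep place names intact; the function name `segment` and the sample dictionary are hypothetical.

```python
def segment(text, dictionary):
    """Forward maximum matching: at each position take the longest dictionary
    word; fall back to a single character when nothing matches."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

# Hypothetical dictionary built from address names and aliases.
address_words = {"江苏省", "江苏", "南京市", "南京", "玄武区", "四牌楼", "东南大学"}
print(segment("江苏省南京市玄武区四牌楼2号", address_words))
```

Characters that are not covered by the dictionary, such as the house number here, come out as single-character tokens; a real segmenter would group them more sensibly.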
Step 3: Exact matching, fuzzy matching, and address standardization. Two rounds of address matching are applied to the segmentation result: exact matching first, then fuzzy matching. Exact matching uses string equality to find an identical name or alias. Fuzzy matching is then carried out under the constraint of the lowest level that was matched exactly, using an edit distance algorithm to compute the string similarity between the address to be matched and the standard address elements. Preferably, the address with the highest matching degree is returned. Optionally, multiple matching results are returned in descending order of matching degree. Finally, the lowest level that could be matched is traced back through fatherID, and the address is completed to obtain the standardized address.
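One way to read "under the constraint of the lowest level that was matched exactly" is that fuzzy matching only considers records that descend from that matched record. The sketch below shows this interpretation; the tuple layout `(id, father_id, name)` and the function name `descendants` are assumptions made for illustration and do not come from the patent.

```python
def descendants(records, root_id):
    """records: iterable of (id, father_id, name) tuples.
    Return the names of every record below root_id in the hierarchy."""
    children = {}
    for record_id, father_id, name in records:
        children.setdefault(father_id, []).append((record_id, name))
    found, stack = [], [root_id]
    while stack:
        for record_id, name in children.get(stack.pop(), []):
            found.append(name)
            stack.append(record_id)   # keep descending below this child
    return found

records = [
    (1, None, "中国"), (2, 1, "江苏省"), (3, 2, "南京市"),
    (4, 3, "玄武区"), (5, 3, "鼓楼区"), (6, 1, "浙江省"),
]
print(sorted(descendants(records, 3)))
```

Restricting candidates this way shrinks the search space for the edit distance comparison and avoids matching a street name that belongs to a different city.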
The exact matching algorithm is as follows:
Step1: Initialize the standard address array S[8] and the list P of strings awaiting fuzzy matching. Define and initialize i = 0 for the i-th string, j = 0 for the j-th level data table, and k = 0 for the k-th record;
Step2: Traverse the list L of address strings produced by segmentation, which contains n strings, and take out the address string w_i;
Step3: Starting from the highest-level address data table, read the address record R_{j,k} from the address database, where j denotes the j-th level data table, k denotes the k-th record, and recordNumber_j denotes the total number of records in the j-th level data table;
Step4: If w_i = R_{j,k}, then store the matched record in S[j] and go to Step5;
If w_i ≠ R_{j,k} and k < recordNumber_j, then k = k+1 and return to Step3;
If w_i ≠ R_{j,k} and k ≥ recordNumber_j and j ≤ 8, then j = j+1 and return to Step3;
If w_i ≠ R_{j,k} and j ≥ 8, then add w_i to P and go to Step5;
Step5: If i < n, then i = i+1, j = 0, k = 0, and return to Step3; if i ≥ n, go to Step6;
Step6: Starting from the lowest-level address in S that was matched exactly (denote this level v), trace back through its parents and fill in all of its parent addresses level by level;
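The steps above can be sketched compactly in Python. This is an illustrative reading rather than the patent's implementation: records are held in memory instead of per-level database tables, the explicit i/j/k counters are replaced by loops, alias matching is included on the assumption that aliases count as exact matches, and the `Record` class and `exact_match` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    id: int
    father_id: Optional[int]
    level: int                    # 1 (country) .. 8 (detailed address)
    name: str
    alias: Optional[str] = None

def exact_match(tokens, records):
    """Return (S, P): S maps level -> matched Record, P lists the tokens
    left over for fuzzy matching."""
    by_id = {r.id: r for r in records}
    ordered = sorted(records, key=lambda r: r.level)   # highest level first
    S, P = {}, []
    for token in tokens:
        hit = next((r for r in ordered if token in (r.name, r.alias)), None)
        if hit is None:
            P.append(token)          # Step4, last case: no table matched
        else:
            S[hit.level] = hit       # Step4, first case: exact match
    # Step6: trace back from the lowest matched level and fill in parents.
    if S:
        node = S[max(S)]
        while node.father_id is not None:
            node = by_id[node.father_id]
            S[node.level] = node
    return S, P

records = [
    Record(1, None, 1, "中国"),
    Record(2, 1, 2, "江苏省", "江苏"),
    Record(3, 2, 3, "南京市", "南京"),
    Record(4, 3, 4, "玄武区"),
]
S, P = exact_match(["南京", "玄武区", "四牌楼2号"], records)
print([S[level].name for level in sorted(S)], P)
```

Note that this sketch does not check that each exact match is consistent with the matches at higher levels; the parent constraint described elsewhere in the text would need an additional filter.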
The fuzzy matching algorithm is as follows:
Step1: Traverse the list P of strings awaiting fuzzy matching, take out the address string w_i, and set j = 0;
Step2: Read all records in the address database that satisfy the level-v matching constraint, and take out the record R_j;
Step3: Compute the similarity s between w_i and R_j. If s is greater than the threshold T (T = 0.2), keep R_j together with its similarity s; if s is less than T, discard R_j. If j < recordNumber, then j = j+1 and return to Step2; if j ≥ recordNumber, return to Step1, until the traversal ends;
Step4: Sort the kept records by similarity, select the top m records with the highest matching degree, write their names into the corresponding positions of S, and return S together with the position coordinates of the lowest-level address that was matched successfully;
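The fuzzy round can be sketched as follows. The patent states that similarity is derived from string edit distance and gives the threshold T = 0.2, but it does not spell out the normalization, so the formula used here (one minus the distance divided by the longer string's length) is an assumption, as are the function names and the sample candidates.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

def fuzzy_match(token, candidates, threshold=0.2, top_m=1):
    """Keep candidates whose similarity exceeds the threshold (Step3), then
    return the top_m best as (similarity, name) pairs (Step4)."""
    scored = [(similarity(token, name), name) for name in candidates]
    kept = [pair for pair in scored if pair[0] > threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_m]

print(fuzzy_match("四牌楼二号", ["四牌楼2号", "进香河路", "成贤街"]))
```

Passing a larger `top_m` corresponds to the optional behaviour of returning several results in descending order of matching degree.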
Step 4: Map the standardized address to geographic coordinates and visualize it on a map. Address standardization means using the address dictionary to convert an address string in general form into a structured group of words. Because every level of address in the database carries the longitude and latitude of its center, if the match does not reach the lowest level, the longitude and latitude of the lowest level that could be matched are returned, and spatial positioning on the map is carried out from that longitude and latitude.
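The fallback rule in this step, returning the coordinates of the deepest level that was actually matched, can be sketched in a few lines. The dictionary layout, the function name `locate`, and the coordinates are hypothetical placeholders, not values from the patent.

```python
def locate(matched):
    """matched maps level (1..8) to (name, longitude, latitude) or None.
    Return (level, entry) for the deepest level that was matched, or None
    when nothing was matched at all."""
    for level in range(8, 0, -1):
        entry = matched.get(level)
        if entry is not None:
            return level, entry
    return None

# Levels 5 to 8 were not matched, so the district's center is returned.
matched = {
    3: ("南京市", 118.80, 32.06),   # placeholder coordinates
    4: ("玄武区", 118.80, 32.05),   # placeholder coordinates
    5: None,
}
print(locate(matched))
```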
The system design is divided into three layers. The system database is implemented in SQL Server 2014, and the relationships among its tables are shown in Fig. 2; the desktop (forms) program is built on the .NET framework; and the front end uses HTML to call Amap for interaction.
The geocoding system of the present invention can handle a variety of geocoding problems, as illustrated in the following table (without being limited thereto):
The above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.

Claims (8)

1. A geocoding method based on Jieba word segmentation and an address dictionary, characterized in that the method comprises the following steps:
Step 1: collect address data and build an address database;
Step 2: segment the address string entered by the user;
Step 3: perform two rounds of address matching and standardize the address;
Step 4: map the standardized address to geographic coordinates.
2. The geocoding method based on Jieba word segmentation and an address dictionary according to claim 1, characterized in that the address database is divided into eight levels: country; province or municipality directly under the central government; city; district; township or street; road section; POI; and detailed description; the primary key of each level is its ID, and its foreign key is the ID of its parent.
3. The geocoding method based on Jieba word segmentation and an address dictionary according to claim 1, characterized in that the records in the address database are sorted by word frequency and by initial letter.
4. The geocoding method based on Jieba word segmentation and an address dictionary according to claim 1, characterized in that the address string entered by the user is segmented using the "precise mode" of Jieba, and the dictionary held in the address database is imported through Jieba's "custom dictionary" feature to improve segmentation accuracy.
5. The geocoding method based on Jieba word segmentation and an address dictionary according to claim 1, characterized in that the two rounds of address matching comprise:
First round, exact matching: traverse the address array produced by segmentation and, using string equality, match each string exactly against the address records in the address database level by level, down to the lowest level that can be matched; then fill in all of its parent addresses level by level.
Second round, fuzzy matching: traverse the strings that were not matched exactly in the first round, measure similarity using string edit distance, sort the candidates by matching degree, and select the candidate with the highest similarity as the matching result.
6. The geocoding method based on Jieba word segmentation and an address dictionary according to claim 1, characterized in that, during the two rounds of address matching, the matching result at the parent level is used to constrain matching at the next level.
7. A geocoding system based on Jieba word segmentation and an address dictionary, characterized by comprising:
an address database, which stores the collected eight-level standard address data and its geographic coordinates; a word segmentation module, which splits the address string entered by the user; an exact matching module, which matches the split address array exactly, level by level, and fills in the parent addresses; a fuzzy matching module, which fuzzily matches the address strings that were not matched exactly and completes the standardization of the address; and a mapping module, which maps the standardized address to geographic coordinates and returns them to the user.
8. The geocoding system based on Jieba word segmentation and an address dictionary according to claim 7, wherein the mapping module puts each address record in one-to-one correspondence with the longitude and latitude of its center.
CN201910220419.0A 2019-03-21 2019-03-21 Geocoding and system based on Jieba participle and address dictionary Pending CN109933797A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910220419.0A CN109933797A (en) 2019-03-21 2019-03-21 Geocoding and system based on Jieba participle and address dictionary

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910220419.0A CN109933797A (en) 2019-03-21 2019-03-21 Geocoding and system based on Jieba participle and address dictionary

Publications (1)

Publication Number Publication Date
CN109933797A true CN109933797A (en) 2019-06-25

Family

ID=66988122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910220419.0A Pending CN109933797A (en) 2019-03-21 2019-03-21 Geocoding and system based on Jieba participle and address dictionary

Country Status (1)

Country Link
CN (1) CN109933797A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569322A (en) * 2019-07-26 2019-12-13 苏宁云计算有限公司 Address information analysis method, device and system and data acquisition method
CN110688851A (en) * 2019-09-26 2020-01-14 税友软件集团股份有限公司 Method, device and medium for extracting key information of address text
CN110826318A (en) * 2019-10-14 2020-02-21 浙江数链科技有限公司 Method, device, computer device and storage medium for logistics information identification
CN111125076A (en) * 2019-12-17 2020-05-08 武汉海云健康科技股份有限公司 Big data based medicine universal name cleaning method and system, server and medium
CN111222345A (en) * 2020-01-15 2020-06-02 合肥慧图软件有限公司 Place name address visualization analysis method based on semantic word segmentation technology
CN111797182A (en) * 2020-05-29 2020-10-20 深圳市跨越新科技有限公司 Address code analysis method and system
CN112115144A (en) * 2020-09-15 2020-12-22 中电科华云信息技术有限公司 Method for comparing address matching based on standard address matrix weighted mapping
CN112612863A (en) * 2020-12-23 2021-04-06 武汉大学 Address matching method and system based on Chinese word segmentation device
WO2021189977A1 (en) * 2020-08-31 2021-09-30 平安科技(深圳)有限公司 Address coding method and apparatus, and computer device and computer-readable storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719128A (en) * 2009-12-31 2010-06-02 浙江工业大学 Fuzzy matching-based Chinese geo-code determination method
CN101882163A (en) * 2010-06-30 2010-11-10 中国科学院地理科学与资源研究所 Fuzzy Chinese address geographic evaluation method based on matching rule
CN104156415A (en) * 2014-07-31 2014-11-19 沈阳锐易特软件技术有限公司 Mapping processing system and method for solving problem of standard code control of medical data
CN105005577A (en) * 2015-05-08 2015-10-28 裴克铭管理咨询(上海)有限公司 Address matching method
CN105404686A (en) * 2015-12-10 2016-03-16 湖南科技大学 Method for matching place name and address in news event based on geographical feature hierarchical segmented words
CN108416062A (en) * 2018-03-26 2018-08-17 国家电网公司客户服务中心 A kind of electric network data correlating method based on address matching technology
CN109033086A (en) * 2018-08-03 2018-12-18 银联数据服务有限公司 A kind of address resolution, matched method and device
CN109145073A (en) * 2018-08-28 2019-01-04 成都市映潮科技股份有限公司 A kind of address resolution method and device based on segmentation methods
CN109344213A (en) * 2018-08-28 2019-02-15 浙江工业大学 A kind of Chinese Geocoding based on dictionary tree
CN109359200A (en) * 2018-10-11 2019-02-19 北京国信达数据技术有限公司 Place name address date intelligently parsing system
CN109359186A (en) * 2018-10-25 2019-02-19 杭州时趣信息技术有限公司 A kind of method, apparatus and computer readable storage medium of determining address information

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569322A (en) * 2019-07-26 2019-12-13 苏宁云计算有限公司 Address information analysis method, device and system and data acquisition method
WO2021017679A1 (en) * 2019-07-26 2021-02-04 苏宁易购集团股份有限公司 Address information parsing method and apparatus, system and data acquisition method
CN110688851A (en) * 2019-09-26 2020-01-14 税友软件集团股份有限公司 Method, device and medium for extracting key information of address text
CN110826318A (en) * 2019-10-14 2020-02-21 浙江数链科技有限公司 Method, device, computer device and storage medium for logistics information identification
CN111125076A (en) * 2019-12-17 2020-05-08 武汉海云健康科技股份有限公司 Big data based medicine universal name cleaning method and system, server and medium
CN111222345A (en) * 2020-01-15 2020-06-02 合肥慧图软件有限公司 Place name address visualization analysis method based on semantic word segmentation technology
CN111797182A (en) * 2020-05-29 2020-10-20 深圳市跨越新科技有限公司 Address code analysis method and system
CN111797182B (en) * 2020-05-29 2024-01-30 深圳市跨越新科技有限公司 Address code analysis method and system
WO2021189977A1 (en) * 2020-08-31 2021-09-30 平安科技(深圳)有限公司 Address coding method and apparatus, and computer device and computer-readable storage medium
CN112115144A (en) * 2020-09-15 2020-12-22 中电科华云信息技术有限公司 Method for comparing address matching based on standard address matrix weighted mapping
CN112612863A (en) * 2020-12-23 2021-04-06 武汉大学 Address matching method and system based on Chinese word segmentation device
CN112612863B (en) * 2020-12-23 2023-03-31 武汉大学 Address matching method and system based on Chinese word segmentation device

Similar Documents

Publication Publication Date Title
CN109933797A (en) Geocoding and system based on Jieba participle and address dictionary
CN109145169B (en) Address matching method based on statistical word segmentation
CN101350012B (en) Method and system for matching address
CN104866593B (en) A kind of database search method of knowledge based collection of illustrative plates
CN107145577A (en) Address standardization method, device, storage medium and computer
US20030165254A1 (en) Adapting point geometry for storing address density
CN111324679B (en) Method, device and system for processing address information
US20030158661A1 (en) Programmatically computing street intersections using street geometry
CN112612863B (en) Address matching method and system based on Chinese word segmentation device
CN103514235B (en) A kind of method for building up of incremental code library and device
CN109145073A (en) A kind of address resolution method and device based on segmentation methods
US6658356B2 (en) Programmatically deriving street geometry from address data
CN106874287A (en) A kind of processing method and processing device of point of interest POI geocodings
CN108062365B (en) Method for improving address resolution accuracy
CN107463711A (en) A kind of tag match method and device of data
CN112528174A (en) Address finishing and complementing method based on knowledge graph and multiple matching and application
CN107908627A (en) A kind of multilingual map POI search systems
CN116414823A (en) Address positioning method and device based on word segmentation model
CN111522892A (en) Geographic element retrieval method and device
CN114168705B (en) Chinese address matching method based on address element index
Mokhtari et al. Tagging address queries in maps search
CN114201480A (en) Multi-source POI fusion method and device based on NLP technology and readable storage medium
CN110060472A (en) Road traffic accident localization method, system, readable storage medium storing program for executing and equipment
CN101567150A (en) Method for accurately positioning digital map
CN111325235B (en) Multilingual-oriented universal place name semantic similarity calculation method and application thereof

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination