CN109933797A - Geocoding method and system based on Jieba word segmentation and an address dictionary - Google Patents
- Publication number: CN109933797A
- Application number: CN201910220419.0A
- Authority: CN (China)
- Prior art keywords: address, matching, word segmentation, geocoding, character string
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a geocoding method and system based on Jieba word segmentation and an address dictionary. The method comprises: step 1, collecting address data and building an address database; step 2, segmenting the address string entered by the user; step 3, performing two rounds of address matching and standardizing the address; step 4, mapping the standardized address to geographic coordinates. The system comprises: an address database, which stores the collected eight-level standard address data and the corresponding geographic coordinates; a word segmentation module, which splits the address string entered by the user; an exact matching module, which matches the segmented address array exactly, level by level, and fills in the parent addresses; a fuzzy matching module, which fuzzily matches the address strings that were not matched exactly and completes the standardization of the address; and a mapping module, which maps the standardized address to geographic coordinates and returns them to the user. The algorithm is easy to understand and easy to implement in code.
Description
Technical field:
The present invention relates to a geocoding method and system based on Jieba word segmentation and an address dictionary, and belongs to the technical field of geocoding and address resolution.
Background:
Geocoding underlies any geographic information system (GIS) work that converts between addresses and geographic coordinates. The addresses people use in practice are usually non-standard, so any software that derives geographic coordinates from a user-entered address has to obtain correct coordinates from a non-standardized address. To do so, the non-standard address must first be standardized and then resolved to its geographic coordinates, which can then support further geographic analysis and location-based services.
Research on geocoding systems originated in the United States, and the geocoding software tools developed on that basis have played an important role in fields such as environmental protection and urban planning. However, Chinese addresses are divided into levels differently from English addresses, and they contain ambiguous words and complex grammar, so geocoding software built on English geocoding systems cannot be applied directly to Chinese geographic information databases.
Chinese geocoding standards emerged to promote the construction of urban municipal supervision information systems. With the development of GIS technology, more and more tasks need to obtain geographic coordinates from non-standardized addresses. Standardizing user-entered non-standardized addresses according to the Chinese geocoding standards and obtaining correct geographic coordinates has become a common requirement in many kinds of work.
Commonly used geocoding approaches can currently be classified by matching method into rule-based and statistics-based matching, street matching, fuzzy address matching, and so on, and many other matching methods exist as well. However, because of the complexity of the Chinese language and the regional differences in how addresses are composed, all of these matching methods still have shortcomings. To meet today's large demand for geocoding and geographic coordinate conversion, an efficient geocoding method and system is needed.
Summary of the invention
The object of the present invention is to provide a geocoding method and system based on Jieba word segmentation and an address dictionary. The algorithm is easy to understand and easy to implement in code, facilitates the exchange and dissemination of geographic information, and promotes industrial and social development.
The above object is achieved through the following technical solutions:
A geocoding method based on Jieba word segmentation and an address dictionary, the method comprising the following steps:
Step 1: collect address data and build an address database;
Step 2: segment the address string entered by the user;
Step 3: perform two rounds of address matching and standardize the address;
Step 4: map the standardized address to geographic coordinates.
In the geocoding method based on Jieba word segmentation and an address dictionary, the address database is divided into eight levels: country, province or municipality directly under the central government, city, district, township or street, road section, POI, and detailed description. The primary key of each level is its ID, and its foreign key is the ID of its parent.
In the geocoding method based on Jieba word segmentation and an address dictionary, the records in the address database are sorted by word frequency and by initial letter.
In the geocoding method based on Jieba word segmentation and an address dictionary, segmenting the address string entered by the user means segmenting the Chinese address string with Jieba's "accurate mode" and using Jieba's "custom dictionary" feature to import the dictionary from the address database, which improves segmentation accuracy.
In the geocoding method based on Jieba word segmentation and an address dictionary, the two rounds of address matching comprise:
First round, exact matching: traverse the address array produced by segmentation and, using string equality, match each element exactly against the address records in the address database level by level, down to the lowest level that can be matched, and then fill in all of its parent addresses level by level.
Second round, fuzzy matching: traverse the strings that were not matched exactly in the first round, measure similarity with the string edit distance to perform fuzzy matching, then sort the candidates by matching degree and select the one with the highest similarity as the matching result.
In the geocoding method based on Jieba word segmentation and an address dictionary, during the two rounds of address matching, the matching result at the parent level is used to constrain matching at the next level.
Another aspect of the present invention provides a geocoding system based on Jieba word segmentation and an address dictionary, comprising:
an address database, which stores the collected eight-level standard address data and the corresponding geographic coordinates; a word segmentation module, which splits the address string entered by the user; an exact matching module, which matches the segmented address array exactly, level by level, and fills in the parent addresses; a fuzzy matching module, which fuzzily matches the address strings that were not matched exactly and completes the standardization of the address; and a mapping module, which maps the standardized address to geographic coordinates and returns them to the user.
In the geocoding system based on Jieba word segmentation and an address dictionary, the mapping module puts each address record in one-to-one correspondence with the longitude and latitude of its centre.
Beneficial effects:
Compared with the prior art, the present invention has the following technical effects:
1. It is well suited to small and medium-sized areas, where it achieves high retrieval precision.
2. The algorithm is easy to understand and easy to implement in code.
3. It improves the user experience of geographic information systems and the quality of geographic information services.
4. It facilitates the exchange and dissemination of geographic information and promotes industrial and social development.
Brief description of the drawings
Fig. 1 is a flow chart of the geocoding method of the present invention;
Fig. 2 is a diagram of the relationships between the data tables of the address database of the present invention;
Fig. 3 is a flow chart of the exact matching algorithm;
Fig. 4 is a flow chart of the fuzzy matching algorithm.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is the flow chart of the system. The geocoding method based on Jieba word segmentation and an eight-level address model comprises four main steps: step 1, build an address database with an eight-level hierarchy; step 2, segment the user-entered address using Jieba word segmentation and the address dictionary; step 3, perform exact matching and fuzzy matching and standardize the address; step 4, map the standardized address to geographic coordinates and display it on a map.
Step 1: build the address database with an eight-level hierarchy. Address data are cleaned and processed according to the eight-level address hierarchy model in the following table and then loaded into the address database.
Address level | Address elements |
---|---|
1 | Country |
2 | Province, municipality directly under the central government |
3 | Provincial capital, prefecture-level city |
4 | District |
5 | Street, township |
6 | Road section |
7 | POI |
8 | Detailed address |
The hierarchy model follows the urban construction industry standard of the People's Republic of China on geocoding for urban municipal supervision information systems (CJ/T 215-2005). Address elements are divided top-down into eight levels: country; province or municipality directly under the central government; provincial capital or prefecture-level city; district; street or township; road section; POI; and detailed address. In the standard address database, each address at every level consists of seven fields: ID, the address code (primary key); fatherID, the code of the parent address one level up (foreign key); level, the address element level; name, the name; alias, the address alias; longitude; and latitude.
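The seven fields described above can be sketched as a table definition. This is a minimal illustration rather than the patent's schema: it uses Python's built-in `sqlite3` as a stand-in database, keeps all eight levels in a single hypothetical `address` table with a `level` column, and fills it with made-up IDs and coordinates. The helper `children_named` is also hypothetical; it shows how a parent ID can restrict a name-or-alias lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE address (
        ID        INTEGER PRIMARY KEY,              -- address code
        fatherID  INTEGER REFERENCES address(ID),   -- parent address code
        level     INTEGER NOT NULL CHECK (level BETWEEN 1 AND 8),
        name      TEXT NOT NULL,
        alias     TEXT,
        longitude REAL,
        latitude  REAL
    )
    """
)
# Illustrative rows only: the IDs and coordinates are made up for this sketch.
conn.executemany(
    "INSERT INTO address VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, None, 1, "中国", None, 104.0, 35.0),
        (2, 1, 2, "江苏省", "江苏", 118.8, 32.1),
        (3, 2, 3, "南京市", "南京", 118.8, 32.06),
        (4, 3, 4, "玄武区", None, 118.8, 32.05),
    ],
)


def children_named(conn, parent_id, text):
    """Return children of parent_id whose name or alias equals text."""
    return conn.execute(
        "SELECT ID, name, level FROM address "
        "WHERE fatherID = ? AND (name = ? OR alias = ?)",
        (parent_id, text, text),
    ).fetchall()


print(children_named(conn, 2, "南京"))  # [(3, '南京市', 3)]
```

Storing the parent ID on every row is what makes it possible both to walk upward from a matched record and to limit a search to the children of an already matched parent.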
Step 2: segment the user-entered address using Jieba word segmentation and the address dictionary. Jieba is a Chinese word segmentation library that is widely used in natural language processing. Preferably, a custom dictionary containing the names and aliases in the address database is created to improve segmentation accuracy, and the input Chinese address string is then segmented with Jieba's accurate mode.
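A minimal sketch of this step follows. It assumes the third-party `jieba` package and uses its `add_word` and `lcut` functions, with `cut_all=False` selecting accurate mode. The function names `segment_address` and `forward_max_match` and the sample names are hypothetical. When `jieba` is not installed, the sketch falls back to a plain greedy dictionary match, which is a substitute technique and not Jieba's algorithm.

```python
from typing import Iterable, List

try:
    import jieba  # third-party package: pip install jieba
except ImportError:  # fall back to the pure-Python segmenter below
    jieba = None


def forward_max_match(text: str, words: Iterable[str]) -> List[str]:
    """Fallback: greedy longest-prefix match against the dictionary."""
    vocab = set(words)
    longest = max((len(w) for w in vocab), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:  # single characters always pass
                tokens.append(piece)
                i += size
                break
    return tokens


def segment_address(text: str, names: Iterable[str]) -> List[str]:
    """Segment an address string using database names as a custom dictionary."""
    names = list(names)
    if jieba is None:
        return forward_max_match(text, names)
    for name in names:
        jieba.add_word(name)  # register each name or alias as a dictionary word
    return jieba.lcut(text, cut_all=False)  # cut_all=False is accurate mode


# Hypothetical names and aliases taken from the address database.
ADDRESS_NAMES = ["江苏省", "南京市", "玄武区", "珠江路"]
tokens = segment_address("江苏省南京市玄武区珠江路", ADDRESS_NAMES)
print(tokens)
```

For a large dictionary, `jieba.load_userdict` can load the words from a file in one call instead of registering them one by one.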
Step 3: exact matching, fuzzy matching, and address standardization. The segmentation result goes through two rounds of address matching: exact matching first, then fuzzy matching. Exact matching uses string equality to find records with an identical name or alias. Fuzzy matching is carried out under the constraint of the lowest level that was matched exactly; it uses the edit distance algorithm to compute the string similarity between the address to be matched and the standard address elements. Preferably, the address with the highest matching degree is returned. Optionally, several matching results are returned in descending order of matching degree. Finally, starting from the lowest level that could be matched, the parents are traced back through fatherID and the address is completed to obtain the standardized address.
The exact matching algorithm is as follows:
Step1: Initialize the standard address array S[8] and the list P of strings awaiting fuzzy matching. Define and initialize i = 0 for the i-th string, j = 0 for the j-th level data table, and k = 0 for the k-th record;
Step2: Traverse the list L of the n address strings produced by segmentation and take out the address string $w_i$;
Step3: Starting from the highest-level address data table, read the address record $R_j^k$ from the address database, where j denotes the j-th level data table, k denotes the k-th record, and $recordNumber_j$ denotes the total number of records in the j-th level data table;
Step4: If $w_i = R_j^k$, then $S[j] = R_j^k$; go to Step5;
If $w_i \neq R_j^k$ and $k < recordNumber_j$, then k = k + 1; return to Step3;
If $w_i \neq R_j^k$ and $k \geq recordNumber_j$ and $j \leq 8$, then j = j + 1; return to Step3;
If $w_i \neq R_j^k$ and $j \geq 8$, add $w_i$ to P; go to Step5;
Step5: If i < n, then i = i + 1, j = 0, k = 0; return to Step3; if i ≥ n, go to Step6;
Step6: Starting from the lowest-level address in S that was matched exactly (denote this level v), trace back its parents and fill in all of its parent addresses level by level.
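The exact matching round can be sketched in Python as follows. This is a simplified reading of the steps above, not the patent's implementation: records are hypothetical tuples `(id, father_id, level, name)` held in memory, levels are numbered 1 to 8, the explicit counters i, j, k are replaced by ordinary iteration, and the parent constraint on lower levels is left out for brevity.

```python
from typing import List, Optional, Tuple

# Hypothetical record layout: (id, father_id, level, name); levels run 1..8.
Record = Tuple[int, Optional[int], int, str]

RECORDS: List[Record] = [
    (1, None, 1, "中国"),
    (2, 1, 2, "江苏省"),
    (3, 2, 3, "南京市"),
    (4, 3, 4, "玄武区"),
    (5, 3, 4, "鼓楼区"),
]


def exact_match(tokens: List[str], records: List[Record]):
    """Return (S, P): the 8-slot standard address and the unmatched tokens."""
    by_id = {r[0]: r for r in records}
    ordered = sorted(records, key=lambda r: r[2])   # highest level first
    standard: List[Optional[Record]] = [None] * 8   # S[8], one slot per level
    pending: List[str] = []                         # P, left for fuzzy matching

    for token in tokens:
        # String equality against every record, scanning level by level.
        hit = next((r for r in ordered if r[3] == token), None)
        if hit is None:
            pending.append(token)
        else:
            standard[hit[2] - 1] = hit

    # Step6: trace back from the lowest matched level and fill in the parents.
    matched = [r for r in standard if r is not None]
    if matched:
        current = by_id.get(matched[-1][1])
        while current is not None:
            standard[current[2] - 1] = current
            current = by_id.get(current[1])
    return standard, pending


standard, pending = exact_match(["南京市", "玄武区", "珠江璐"], RECORDS)
print([r[3] if r else None for r in standard])
print(pending)
```

In this example the misspelled token "珠江璐" matches no record, so it ends up in the pending list, while the country and province slots are filled in by tracing back from "玄武区".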
The fuzzy matching algorithm is as follows:
Step1: Traverse the list P of strings awaiting fuzzy matching, take out the address string $w_i$, and set j = 0;
Step2: Read all records in the address database that satisfy the level-v matching constraint and take out record $R_j$;
Step3: Compute the similarity s between $w_i$ and $R_j$. If s is greater than the threshold T (T = 0.2), store $R_j$ and its similarity s; if s is less than T, discard $R_j$. If j < recordNumber, then j = j + 1 and return to Step2; if j ≥ recordNumber, return to Step1, until the traversal is complete;
Step4: Sort the stored similarities, select the m records with the highest matching degree, insert their names into the corresponding positions of S, and return S together with the position coordinates of the lowest-level address that was matched.
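A minimal sketch of the fuzzy round is shown below. The text specifies the edit distance and the threshold T = 0.2 but not how the distance is turned into a similarity, so the normalization `1 - distance / max(len)` used here is an assumption. The function names are hypothetical, and the candidate list is assumed to have been restricted already to the records under the level-v constraint.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for disjoint ones."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def fuzzy_match(token: str, candidates: List[str],
                threshold: float = 0.2, top_m: int = 1) -> List[Tuple[str, float]]:
    """Keep candidates above the threshold and return the top_m most similar."""
    scored = [(name, similarity(token, name)) for name in candidates]
    kept = [pair for pair in scored if pair[1] > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_m]


# Hypothetical road names under an already matched district.
print(fuzzy_match("珠江璐", ["珠江路", "中山路", "北京东路"]))
```

Here "珠江璐" differs from "珠江路" by one substitution in three characters, giving a similarity of about 0.67, while the other two candidates score 0 and are discarded by the threshold.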
Step 4: map the standardized address to geographic coordinates and display it on a map. Address standardization means using the address dictionary to convert a free-form address string into a structured group of words. Because every address at every level in the database has the longitude and latitude of its centre, if matching does not reach the lowest level, the longitude and latitude of the lowest level that could be matched are returned, and spatial positioning on the map is carried out from these coordinates.
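The fallback described in this step can be sketched as a scan from the lowest level upward. The slot layout, the function name `locate`, and the coordinates below are hypothetical and serve only to illustrate the idea.

```python
from typing import List, Optional, Tuple

# One slot per level: (name, longitude, latitude), or None if unmatched.
Slot = Optional[Tuple[str, float, float]]


def locate(standard: List[Slot]) -> Slot:
    """Return the lowest matched level; its centre serves as the position."""
    for slot in reversed(standard):
        if slot is not None:
            return slot
    return None


# Made-up coordinates; levels 5 to 8 were not matched in this example.
standard_address: List[Slot] = [
    ("中国", 104.0, 35.0),
    ("江苏省", 118.8, 32.1),
    ("南京市", 118.8, 32.06),
    ("玄武区", 118.8, 32.05),
    None, None, None, None,
]
print(locate(standard_address))  # ('玄武区', 118.8, 32.05)
```

The returned position is therefore only as precise as the deepest level that was matched: a district-level match yields the district centre, not a street address.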
The system is designed in three layers. The system database is implemented with SQL Server 2014, and the relationships between its tables are shown in Fig. 2; the forms application is built on the .NET framework; the front end is written in HTML and calls Amap for map interaction.
The geocoding system of the present invention can solve a variety of geocoding problems; the following table gives examples (without being limited to them):
The above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.
Claims (8)
1. A geocoding method based on Jieba word segmentation and an address dictionary, characterized in that the method comprises the following steps:
Step 1: collect address data and build an address database;
Step 2: segment the address string entered by the user;
Step 3: perform two rounds of address matching and standardize the address;
Step 4: map the standardized address to geographic coordinates.
2. The geocoding method based on Jieba word segmentation and an address dictionary according to claim 1, characterized in that the address database is divided into eight levels: country, province or municipality directly under the central government, city, district, township or street, road section, POI, and detailed description; the primary key of each level is its ID, and its foreign key is the ID of its parent.
3. The geocoding method based on Jieba word segmentation and an address dictionary according to claim 1, characterized in that the records in the address database are sorted by word frequency and by initial letter.
4. The geocoding method based on Jieba word segmentation and an address dictionary according to claim 1, characterized in that segmenting the address string entered by the user means segmenting the Chinese address string with Jieba's "accurate mode" and using Jieba's "custom dictionary" feature to import the dictionary from the address database, which improves segmentation accuracy.
5. The geocoding method based on Jieba word segmentation and an address dictionary according to claim 1, characterized in that the two rounds of address matching comprise:
First round, exact matching: traverse the address array produced by segmentation and, using string equality, match each element exactly against the address records in the address database level by level, down to the lowest level that can be matched, and then fill in all of its parent addresses level by level.
Second round, fuzzy matching: traverse the strings that were not matched exactly in the first round, measure similarity with the string edit distance to perform fuzzy matching, then sort the candidates by matching degree and select the one with the highest similarity as the matching result.
6. The geocoding method based on Jieba word segmentation and an address dictionary according to claim 1, characterized in that, during the two rounds of address matching, the matching result at the parent level is used to constrain matching at the next level.
7. A geocoding system based on Jieba word segmentation and an address dictionary, characterized by comprising:
an address database, which stores the collected eight-level standard address data and the corresponding geographic coordinates; a word segmentation module, which splits the address string entered by the user; an exact matching module, which matches the segmented address array exactly, level by level, and fills in the parent addresses; a fuzzy matching module, which fuzzily matches the address strings that were not matched exactly and completes the standardization of the address; and a mapping module, which maps the standardized address to geographic coordinates and returns them to the user.
8. The geocoding system based on Jieba word segmentation and an address dictionary according to claim 7, wherein the mapping module puts each address record in one-to-one correspondence with the longitude and latitude of its centre.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910220419.0A CN109933797A (en) | 2019-03-21 | 2019-03-21 | Geocoding and system based on Jieba participle and address dictionary |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109933797A true CN109933797A (en) | 2019-06-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||