CN101719128B - Fuzzy matching-based Chinese geo-code determination method - Google Patents
- Publication number
- CN101719128B (application CN200910156650A)
- Authority
- CN
- China
- Prior art keywords
- address
- matching
- chinese
- original
- information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a fuzzy matching-based method for determining Chinese geocodes, comprising the following steps: A1, read in descriptive Chinese address information and split the original address with a forward maximum matching method, using administrative-region levels as breakpoints, to obtain an array of original address elements; A2, standardize the original address elements against an address dictionary; A3, read the standard address tree and match the original address element array against it with a branch-and-bound algorithm while fuzzy rules control the matching; once the keywords of the split original address are obtained, the match with the highest evaluation score is taken as the closest match, yielding a more accurate matched address. The method offers a well-structured address model, a higher matching rate, and fast matching.
Description
Technical field
The present invention relates to geographic information data processing and computer applications, and in particular to a geocoding method based on fuzzy matching.
Background art
Geocoding is the process of establishing the correspondence between address descriptions and coordinates; in other words, it converts between descriptions of places and their spatial locations. For a long time, a lack of effective spatial analysis techniques meant that the analysis and processing of spatial data could not meet the needs of scientific decision-making and management, so the value of spatial data in decision management was never fully realized. Address matching makes it possible to integrate geographic information systems with spatial information, advance the informatization of urban spatial data, and in turn carry out spatial analysis and decision applications more effectively and conveniently.
In recent years, as geographic information technology has developed and matured, geocoding technology has kept advancing. Research abroad is relatively mature. Davis proposed a multi-mode cross-positioning theory, but it applies only to regions that already have a geocoding standard, and its multiple spatial databases introduce redundant spatial information and reduce matching efficiency. Duncan proposed a unified coding scheme based on equal-area cells, but geocoding standards differ across Chinese cities, and such a complex coding scheme, once established, is too costly to change because any change requires large-scale revision. Bakshi et al. proposed a geocoding technique based on a text-token segmentation scheme, which works well for English addresses; because Chinese is written very differently from English, however, its effect on Chinese address matching is limited. In China, address matching technology is still in its infancy, and most work has focused on applications, such as Beijing Longways Computer Company's "addressing refreshing" and Founder Digital's Map Searcher. When applied to cities, however, such systems suffer from problems such as a single address model and an insufficient matching rate.
Existing technology therefore has shortcomings in encoding Chinese urban addresses and needs improvement.
Summary of the invention
To overcome the shortcomings of existing Chinese geographic position coding methods, namely a single address model, an insufficient matching rate, and slow speed, the present invention provides a fuzzy matching-based Chinese geocoding determination method with a well-structured address model, a higher matching rate, and good speed.
The technical solution adopted by the present invention to solve the technical problem is as follows:
A fuzzy matching-based Chinese geocoding determination method, comprising the following steps:
A1, read in descriptive Chinese address information and, using administrative-region levels as breakpoints, split the original address with the forward maximum matching method to obtain the original address element array;
A2, standardize the original address elements using the address dictionary;
A3, read the standard address tree and match the original address element array with a branch-and-bound algorithm: an address database in address-tree storage format is built, organized as a tree according to the hierarchy of Chinese administrative divisions, in which the highest-level administrative unit is the root node of the address tree and its subordinate administrative units are stored as child nodes; based on the address elements and house number obtained by splitting the descriptive Chinese address information, the matching process first reads the standard address tree R and matches the highest-level administrative keyword among the split candidate address elements against the address nodes of the corresponding administrative level in R; after a successful match, unrelated branch trees are discarded and the related branch tree is kept for matching at the next administrative level;
At the same time, fuzzy rules control the matching operation; after the keywords of the split original address are obtained, the method further comprises:
Optimizing the matching operation with fuzzy matching rules, defined as follows: let the field to be matched be the string address, of length h, and the standard field be the string std_address, of length H. The set of std_address values satisfying address ∩ std_address ≠ Φ is defined as the set satisfying the matching condition, where address ∩ std_address ≠ Φ means that the characters of address and of the standard string std_address have a non-empty intersection; finally, the elements of this set with the highest membership are kept. The following matching rules are defined:
1. If i characters of the standard string std_address and the string to be matched address are identical, the membership is i/H;
2. If the standard string std_address contains the string to be matched address, the membership is 1;
After the membership is obtained, let μ be the matching membership; it is converted into a quantified score sc by the mapping $f: \mu \to sc$ with $f(\mu) = 10\mu$, and sc is used as the evaluation score of the candidate record;
The match with the highest evaluation score is taken as the closest match, yielding a more accurate matched address.
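A minimal Python sketch of the two membership rules and the score mapping $f(\mu) = 10\mu$. Rule 2 (containment) is checked before rule 1, and the i shared characters are counted with multiplicity through a `collections.Counter` intersection; both choices are assumptions, since the text does not say how repeated characters are counted. The function names and the Chinese example strings are hypothetical.

```python
from collections import Counter


def membership(address: str, std_address: str) -> float:
    """Degree of membership of std_address for the input element address."""
    if address and address in std_address:      # rule 2: containment gives 1
        return 1.0
    # Rule 1: i shared characters give i / H, where H = len(std_address).
    # Counting shared characters with multiplicity is an assumption.
    shared = sum((Counter(address) & Counter(std_address)).values())
    return shared / len(std_address)


def evaluation_score(mu: float) -> float:
    """Quantified score sc = f(mu) = 10 * mu."""
    return 10 * mu


def matching_set(address: str, std_names: list[str]) -> list[tuple[str, float]]:
    """Standard names with address ∩ std_address ≠ Φ, highest membership first."""
    scored = [(name, membership(address, name)) for name in std_names]
    kept = [(name, mu) for name, mu in scored if mu > 0]
    return sorted(kept, key=lambda item: item[1], reverse=True)


print(matching_set("东湖", ["西湖", "滨江", "东湖"]))
```

Here "East Lake" (东湖) against "West Lake" (西湖) shares one of two characters, so μ = 0.5 and the score is 5, while an exact name scores 10 and a name with no shared character is excluded from the set.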
As a preferred scheme, the Chinese geocoding determination method further comprises:
A4, if the matched address contains a house number, locate it spatially: urban road house numbers are assumed to be distributed with odd and even numbers on opposite sides of the road, either odd numbers on the left and even numbers on the right, or odd numbers on the right and even numbers on the left; the house numbers at the road's turning points and their geographic coordinates are recorded; after the house number in the original address is obtained, the two turning points it lies between are determined, say turning points A and B; taking A and B as reference points, least-squares linear interpolation gives the specific geographic coordinates of the house number on the road, which is finally located on the map.
Further, in step A3, after normalization, the candidate address array obtained by standardizing the original address is defined as address[i], 0 < i < N; the matching score of a standard address node against the candidate element of the corresponding level is denoted $sc_i$, where i is the level to which the node belongs and N is the depth of the initial address tree. The matching evaluation rules are as follows:
Rule 1: check whether the address tree node and the candidate element match exactly: Y → exact match; N → fuzzy match;
Rule 2: check whether a feasible solution is found after exact matching: Y → the matching algorithm moves down one level; N → return to the upper-level node and search for an approximate solution;
Rule 3: check whether an element is missing: Y → keep the upper-level branch tree; N → keep the current-level branch tree;
Rule 4: check whether an element is missing: if so, $sc_i = 0$, where i is the level of the missing element;
Rule 5: the final score of a candidate record is the sum of the matching scores of its nodes at all levels: $sc = \sum_i sc_i$.
Furthermore, in step A3, an auxiliary place-name database is set up, in which important, frequently used locations that have a secondary distinctive identity are stored separately.
In step A1, the original address is scanned starting from its first character, and the address database is searched for a corresponding standard address name; if one exists, the address information is read and kept, and those characters are cut from the original address string; otherwise the next character is appended to the previous ones to form a string, and the search for a corresponding standard address name in the address database continues. Reading proceeds in this way until the address elements of all administrative levels are determined.
In step A2, if the split candidate address array has missing elements, the higher-level address is obtained from the address database according to the address element of the next lower level and written into the candidate address element array.
In step A2, an address abbreviation and alias database is designed: a dedicated database that stores all current standard address information together with its aliases and abbreviations.
In step A2, wrongly written characters in the split address elements are corrected: if the input address information contains wrongly written characters, that is, a split address element has no exactly corresponding standard address name in the address dictionary, the standard address name closest to the input is returned and replaces the input address information.
The technical concept of the present invention is as follows: the original input address information is obtained first, and the input original address is then split with a word segmentation algorithm to obtain keywords describing the spatial location corresponding to the original address. The city's standard address data are stored as a K-ary tree, where K is determined by the number of administrative units at each level. The obtained keywords are matched in the standard address tree; during matching, a branch-and-bound algorithm optimizes the search, fuzzy rules precisely control the matching operation, and the matching results are scored and screened to obtain at least one address that fully or approximately matches the original address. The branch-and-bound matching algorithm based on the tree-shaped address storage model reduces the size of the address tree that must be searched, lowers the algorithmic complexity of address matching, and improves its efficiency and accuracy.
Beneficial effects of the present invention: it reduces the algorithmic complexity of the geocoding process and improves the efficiency and accuracy of geocoding.
Description of drawings
Fig. 1 is a flowchart of the fuzzy matching-based Chinese geocoding determination method.
Fig. 2 is a schematic diagram of the standard address tree.
Fig. 3 is a schematic diagram of the matching rules.
Fig. 4 is a schematic diagram of the odd/even house-number distribution along a road.
Fig. 5 is a schematic diagram of loading the initial address tree, extracting the branch tree rooted at "Zhejiang" after an exact match succeeds, and deleting the invalid branch trees.
Fig. 6 is a schematic diagram of checking address[2] = "Hangzhou" and, after an exact match succeeds, extracting the branch tree rooted at "Hangzhou"; then checking address[3] = "East Lake" and, after an exact match succeeds, extracting the branch tree rooted at "East Lake".
Fig. 7 is a schematic diagram of checking address[4] = "Liuxia": the current branch tree has no feasible solution, so matching returns to the parent of the current root node "East Lake", fuzzy matching mode is enabled to obtain the branch trees that satisfy the partial matching condition, and the keyword "Liuxia" is matched again.
Fig. 8 is a schematic diagram of checking address[5] = "Liuhe": the child nodes of the current branch tree's root cannot be matched exactly, so fuzzy matching mode is started to obtain partially matching branch trees; address[6] = "288" is then checked and matched against all partially matching branch trees.
Embodiment
The present invention is further described below with reference to the accompanying drawings.
Referring to Figs. 1 to 8, a fuzzy matching-based Chinese geocoding method, as shown in Fig. 1, comprises the following steps:
A1, read in descriptive Chinese address information and, using administrative-region levels as breakpoints, split the original address with the forward maximum matching method to obtain the original address element array.
A2, standardize the original address elements through the address dictionary, obtaining the address element array after normalization operations such as correcting abbreviations or aliases, fixing misspellings, and filling in missing elements.
A3, read the standard address tree and match the original address element array with a branch-and-bound algorithm, while fuzzy rules control the matching operation, to obtain a more accurate matched address.
A4, for a house number contained in the matched address, perform spatial positioning with a turning-point reference interpolation algorithm.
In step A1 of the method, a standard input pattern for Chinese address information is set with reference to China's administrative division standard:
Administrative address pattern: province (municipality directly under the Central Government) → city → district (county, county-level city); regional address pattern: street (town) → village (road) → house number. An example of standard address information is: No. 288, Liuhe Road, Liuxia Town, Xihu District, Hangzhou, Zhejiang Province.
In step A1 of the method, the original address is scanned starting from its first character, and the address database is searched for a corresponding standard address name. If one exists, the address information is read and kept, and those characters are cut from the original address string; otherwise the next character is appended to the previous ones to form a string, and the search for a corresponding standard address name in the address database continues. Reading proceeds in this way until the address elements of all administrative levels are determined.
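A minimal Python sketch of the splitting step. It uses the common forward-maximum-matching formulation (at each position, try the longest dictionary entry first and shrink the window until there is a hit) rather than literally growing the string one character at a time, so that a full name with its administrative suffix is preferred over a shorter prefix. The dictionary contents, the function name, and the rule that a run of ASCII digits (with an optional trailing "号") is the house number are illustrative assumptions.

```python
DIGITS = "0123456789"


def fmm_segment(text: str, dictionary: set[str]) -> list[str]:
    """Split an address by forward maximum matching against a name dictionary."""
    max_len = max(len(word) for word in dictionary)
    tokens: list[str] = []
    unknown = ""                      # characters that matched no dictionary entry
    i = 0
    while i < len(text):
        if text[i] in DIGITS:         # house number (assumed to be ASCII digits)
            j = i
            while j < len(text) and text[j] in DIGITS:
                j += 1
            if unknown:
                tokens.append(unknown)
                unknown = ""
            tokens.append(text[i:j])
            i = j + 1 if j < len(text) and text[j] == "号" else j
            continue
        # Try the longest window first, shrinking until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in dictionary:
                if unknown:
                    tokens.append(unknown)
                    unknown = ""
                tokens.append(piece)
                i += size
                break
        else:                         # no dictionary entry starts at position i
            unknown += text[i]
            i += 1
    if unknown:
        tokens.append(unknown)
    return tokens


# Hypothetical standard-name dictionary.
NAMES = {"浙江", "浙江省", "杭州市", "西湖区", "留下镇", "留和路"}
print(fmm_segment("浙江省杭州市西湖区留下镇留和路288号", NAMES))
```

Unmatched characters are kept together as one token rather than dropped, so later steps (alias lookup, typo correction) still see them.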
In step A2 of the method, if the split candidate address array has missing elements, the higher-level address is obtained from the address database according to the address element of the next lower level and written into the candidate address element array.
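A minimal sketch of filling missing levels bottom-up. It assumes a hypothetical `parent_of` lookup from a standard name to its parent's name and that each lower-level name has a unique parent; in real data names can repeat across regions, so an actual lookup may need disambiguation.

```python
def fill_missing(elements, parent_of):
    """Fill None entries from the parent of the next lower level, bottom-up."""
    filled = list(elements)
    # Walk upward so that a level filled in this pass can fill the one above it.
    for k in range(len(filled) - 2, -1, -1):
        if filled[k] is None and filled[k + 1] is not None:
            filled[k] = parent_of.get(filled[k + 1])   # stays None if unknown
    return filled


# Hypothetical parent table taken from the address database.
PARENT_OF = {"留和路": "留下镇", "留下镇": "西湖区", "西湖区": "杭州市", "杭州市": "浙江省"}
print(fill_missing([None, None, "西湖区", "留下镇", "留和路"], PARENT_OF))
```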
In step A2 of the method, an address abbreviation and alias database is designed: a dedicated database that stores all current standard address information together with its aliases and abbreviations. If a split candidate address is an alias or abbreviation, it is identified and standardized to the standard name; for example, the one-character abbreviation of Shandong is standardized to "Shandong", and that of Shanghai to "Shanghai".
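A minimal sketch of the alias step as a plain dictionary lookup. The table entries are illustrative, not the patent's actual database: "鲁" and "沪" are the standard one-character abbreviations of Shandong and Shanghai, and the suffix-less forms stand in for other aliases.

```python
# Hypothetical alias/abbreviation table: alias -> standard name.
ALIASES = {
    "鲁": "山东省",      # one-character abbreviation of Shandong
    "沪": "上海市",      # one-character abbreviation of Shanghai
    "浙江": "浙江省",    # name without its administrative suffix
    "杭州": "杭州市",
}


def standardize(elements, aliases):
    """Replace each alias or abbreviation with its standard name; keep the rest.

    Missing elements (None) pass through unchanged for the filling step.
    """
    return [aliases.get(e, e) for e in elements]


print(standardize(["浙江", "杭州", None, "留下镇"], ALIASES))
```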
In step A2 of the method, wrongly written characters in the split address elements are corrected: if the input address information contains wrongly written characters, that is, a split address element has no exactly corresponding standard address name in the address dictionary, the standard address name closest to the input is returned and replaces the input address information. For example, if "Liu He Lu" is entered with a wrong character, that form does not exist in the address dictionary, which contains only the correctly written "Liu He Lu"; the correct form then replaces the input.
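The text does not define "closest". The sketch below substitutes the similarity ratio from Python's `difflib.get_close_matches` (a swapped-in measure, not necessarily the authors') with the library's default cutoff of 0.6. The standard-name list is hypothetical.

```python
import difflib


def correct_element(element: str, standard_names: list[str], cutoff: float = 0.6) -> str:
    """Return the element itself if it is standard, else the closest standard name."""
    if element in standard_names:
        return element
    best = difflib.get_close_matches(element, standard_names, n=1, cutoff=cutoff)
    # No sufficiently similar name: keep the input and let fuzzy matching handle it.
    return best[0] if best else element


NAMES = ["浙江省", "杭州市", "西湖区", "留下镇", "留和路"]
print(correct_element("留合路", NAMES))   # one wrong character in a road name
```

For two three-character names that differ in one character, the ratio is 2·2/6 ≈ 0.67, so the correction fires. Two-character names that share one character score only 0.5, so the cutoff would need tuning if short names matter.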
In step A3 of the method, the address database is read. It is stored in address-tree form, with the highest-level administrative unit as the root node of the address tree and its subordinate administrative units stored as child nodes, as shown in Fig. 2.
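A minimal sketch of the K-ary address tree. Storing children in a dictionary keyed by name, so that an exact child lookup takes constant time, is an assumption, as is the virtual root placed above the provinces so that several provinces can share one tree. The class, functions, and sample records are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AddressNode:
    name: str
    level: int                                   # 0 = virtual root, 1 = province, ...
    children: dict = field(default_factory=dict)  # child name -> AddressNode


def build_address_tree(records):
    """Build the tree from standard addresses given as paths of level names."""
    root = AddressNode("<root>", 0)
    for path in records:
        node = root
        for level, name in enumerate(path, start=1):
            # Reuse an existing child so addresses with a shared prefix share a branch.
            node = node.children.setdefault(name, AddressNode(name, level))
    return root


def count_leaves(node):
    """Number of leaf nodes (N in the complexity estimate)."""
    if not node.children:
        return 1
    return sum(count_leaves(child) for child in node.children.values())


records = [
    ("浙江省", "杭州市", "西湖区", "留下镇", "留和路"),
    ("浙江省", "杭州市", "滨江区"),
    ("江苏省", "南京市"),
]
tree = build_address_tree(records)
print(sorted(tree.children), count_leaves(tree))
```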
In step A3 of the method, with the address information stored as a tree, a branch-and-bound algorithm optimizes the matching process: the highest-level administrative keyword among the candidate address elements is first matched against the address information at the corresponding level of the address tree; if the match succeeds, the matched node and its branch tree are kept, and the other, unrelated address nodes at the same level and their branch trees are discarded. After normalization, the candidate address array obtained by standardizing the original address is defined as address[i], 0 < i < N. The matching score of a standard address node against the candidate element of the corresponding level is denoted $sc_i$, where i is the level to which the node belongs and N is the depth of the initial address tree. The matching evaluation rules are as follows:
Rule 1: check whether the address tree node and the candidate element match exactly: Y → exact match; N → fuzzy match;
Rule 2: check whether a feasible solution is found after exact matching: Y → the matching algorithm moves down one level; N → return to the upper-level node and search for an approximate solution;
Rule 3: check whether an element is missing: Y → keep the upper-level branch tree; N → keep the current-level branch tree;
Rule 4: check whether an element is missing: if so, $sc_i = 0$, where i is the level of the missing element;
Rule 5: the final score of a candidate record is the sum of the matching scores of its nodes at all levels: $sc = \sum_i sc_i$.
In step A3 of the method, fuzzy rules also control the matching operation: if no address node at the same level of the address tree matches exactly, the fuzzy rules are enabled to obtain approximate matching results. For example, if the input county-level keyword is "East Lake" and the county-level nodes in the address tree include only "West Lake", the node "West Lake" and its branch tree are kept as the matching result, and the other nodes at the same level and their branch trees are discarded.
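The matching stage can be sketched, under stated assumptions, as a depth-first branch and bound. At each level, children are scored with the membership rules and $f(\mu) = 10\mu$ and explored best first, so exact matches come before fuzzy ones. Children with an empty character intersection are pruned. A missing element (None) scores 0 and lets every child through. A branch is cut when its score plus 10 points for every remaining level cannot beat the best complete record found so far. When an exact branch dead-ends, the search falls back to the next-best sibling, which is one reading of Rule 2's "approximate solution". This is an interpretation of Rules 1 to 5, not necessarily the authors' exact procedure. The tree contents, including the distractor "East Lake" (东湖) branch, and all names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)


def membership(address: str, std: str) -> float:
    """Rule 2: containment gives 1; rule 1: i shared characters give i / H."""
    if address in std:
        return 1.0
    shared = sum((Counter(address) & Counter(std)).values())
    return shared / len(std)


def best_match(root: Node, elements: list, level_max: float = 10.0):
    """Return (score, path) of the highest-scoring complete record."""
    best_score, best_path = float("-inf"), []

    def search(node: Node, depth: int, score: float, path: list) -> None:
        nonlocal best_score, best_path
        if depth == len(elements):               # complete record: sc = sum of sc_i
            if score > best_score:
                best_score, best_path = score, path
            return
        # Bound: give up if perfect scores on all remaining levels cannot win.
        if score + level_max * (len(elements) - depth) <= best_score:
            return
        element = elements[depth]
        branches = []
        for child in node.children:
            if element is None:                  # missing element: sc_i = 0
                branches.append((0.0, child))
                continue
            mu = membership(element, child.name)
            if mu > 0:                           # keep non-empty intersections only
                branches.append((level_max * mu, child))
        # Best-scoring branches first; a dead end falls back to the next sibling.
        for sc_i, child in sorted(branches, key=lambda b: b[0], reverse=True):
            search(child, depth + 1, score + sc_i, path + [child.name])

    search(root, 0, 0.0, [])
    return best_score, best_path


# Hypothetical tree with a distractor "东湖" branch, as in the embodiment.
tree = Node("<root>", [
    Node("浙江", [Node("杭州", [
        Node("东湖", [Node("白马")]),
        Node("西湖", [Node("留下", [Node("留和", [Node("288")])])]),
    ])]),
    Node("江苏", [Node("南京")]),
])
print(best_match(tree, ["浙江", "杭州", "东湖", "留下", "留合", "288"]))
```

In this toy tree the exact "东湖" branch dead-ends at the next level, so the search falls back to "西湖" (5 points), and the fuzzy road match "留合" → "留和" also earns 5 points, for a total of 50 out of 60. If nothing matches, the function returns negative infinity and an empty path.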
In step A3 of the method, the matching results are also quantified and scored. Exact and approximate matches receive different scores: the highest-scoring result is returned as the closest match, and lower-scoring results are returned as reasonably close matches. The quantization rules are as follows:
Let the field to be matched be the string address, of length h, and the standard field be the string std_address, of length H. The set of std_address values satisfying address ∩ std_address ≠ Φ is defined as the set satisfying the matching condition, where address ∩ std_address ≠ Φ means that the characters of address and of the standard string std_address have a non-empty intersection; finally, the elements of this set with the highest membership are kept. The following matching rules are defined (Fig. 3):
1. If i characters of the standard string std_address and the string to be matched address are identical, the membership is i/H;
2. If the standard string std_address contains the string to be matched address, the membership is 1.
After the membership is obtained, let μ be the matching membership; it is converted into a quantified score sc by the mapping $f: \mu \to sc$ with $f(\mu) = 10\mu$, and sc is used as the evaluation score of the candidate record.
In step A3 of the method, an auxiliary place-name database is also set up, in which certain important, frequently used locations that have a secondary distinctive identity are stored separately. For example, the secondary identity of "No. 288, Liuhe Road, Liuxia Town, Xihu District, Hangzhou, Zhejiang Province" is "Zhejiang University of Technology, Pingfeng Campus"; if the input original address information is "Zhejiang University of Technology, Pingfeng Campus", the method locates the geographic position of "No. 288, Liuhe Road, Liuxia Town, Xihu District, Hangzhou, Zhejiang Province" directly.
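A minimal sketch of consulting the auxiliary place-name database before full matching. The landmark table, the whitespace normalization, and the `match_address` fallback callable (standing in for steps A1 to A3) are hypothetical.

```python
# Hypothetical auxiliary place-name table: secondary identity -> standard address.
LANDMARKS = {
    "浙江工业大学屏峰校区": "浙江省杭州市西湖区留下镇留和路288号",
}


def locate(raw_address, landmarks, match_address):
    """Check the auxiliary place-name table before running full matching."""
    key = "".join(raw_address.split())   # drop any whitespace in the input
    if key in landmarks:
        return landmarks[key]            # direct hit: skip segmentation and matching
    return match_address(key)            # fall back to steps A1 to A3


print(locate("浙江工业大学 屏峰校区", LANDMARKS, lambda s: None))
```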
In step A4 of the method, after the final matching result is obtained, spatial interpolation positioning is performed based on the house number. If there is no house number, the address is located at the geometric center of the smallest administrative unit in the original address information; for example, if the original address is specified down to the street, the location is placed at the geometric center of that street. If there is a house number, urban road house numbers are assumed to be distributed with odd and even numbers on opposite sides of the road: either odd numbers on the left and even numbers on the right, or odd numbers on the right and even numbers on the left (Fig. 4). The house numbers at the road's turning points and their geographic coordinates are recorded. After the house number in the original address is obtained, the two turning points it lies between are determined; supposing the matched house number lies between turning points A and B, least-squares linear interpolation with A and B as reference points gives the specific geographic coordinates of the house number on the road, which are finally located on the map.
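A minimal Python sketch of the house-number interpolation. It assumes planar coordinates (for example, metres in a projected system) and keeps only reference points with the same parity as the queried number, i.e. the same side of the road. With exactly two reference points A and B, the least-squares line passes through both, so the fit reduces to ordinary linear interpolation between them. The function names and sample coordinates are hypothetical.

```python
from bisect import bisect_left


def fit_line(ns, vs):
    """Least-squares fit of v = a + b * n; returns (a, b)."""
    n_mean = sum(ns) / len(ns)
    v_mean = sum(vs) / len(vs)
    var = sum((n - n_mean) ** 2 for n in ns)
    b = sum((n - n_mean) * (v - v_mean) for n, v in zip(ns, vs)) / var
    return v_mean - b * n_mean, b


def locate_house_number(number, turning_points):
    """turning_points: (house_number, x, y) tuples recorded along one road."""
    # Odd and even numbers lie on opposite sides, so keep the query's side only.
    side = sorted(p for p in turning_points if p[0] % 2 == number % 2)
    for n, x, y in side:
        if n == number:                    # the number is itself a turning point
            return x, y
    nums = [p[0] for p in side]
    k = bisect_left(nums, number)
    if k == 0 or k == len(nums):
        raise ValueError("house number lies outside the recorded turning points")
    pa, pb = side[k - 1], side[k]          # turning points A and B around the number
    ns = [pa[0], pb[0]]
    ax, bx = fit_line(ns, [pa[1], pb[1]])
    ay, by = fit_line(ns, [pa[2], pb[2]])
    return ax + bx * number, ay + by * number


# Hypothetical turning points on one road: A = No. 268, B = No. 296.
points = [(268, 0.0, 0.0), (296, 28.0, 14.0), (301, 5.0, 5.0)]
print(locate_house_number(288, points))
```

The odd-numbered point No. 301 is ignored for No. 288 because it lies on the other side of the road.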
The average time complexity of the branch-and-bound matching algorithm based on the tree-shaped address storage model in the present invention is $\log_K N$, where N is the number of leaf nodes of the K-ary address tree.
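The following short derivation shows where the $\log_K N$ figure comes from. It assumes a balanced K-ary tree and that each level's exact match is found by constant-time lookup of a child by name, for example through a hash map. Neither assumption is stated in the text, and fuzzy fallback visits more nodes than this best case:

```latex
N = K^{d} \;\Longrightarrow\; d = \log_K N,
\qquad
T_{\text{exact}} = \sum_{i=1}^{d} O(1) = O(\log_K N)
```

If children are instead scanned linearly, each level costs $O(K)$ and the exact-match path costs $O(K \log_K N)$.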
In this embodiment, the original input address is "No. 288, Liuhe Road, Liuxia, Donghu (East Lake) District, Hangzhou, Zhejiang Province", in which the road name is entered with a character that differs from the standard name. After splitting, this gives the candidate address array address[] shown in Table 1.
Table 1 Candidate address array
| Level | Province | City | District | Town | Road | Number |
|---|---|---|---|---|---|---|
| Value | Zhejiang | Hangzhou | East Lake | Liuxia | Liuhe | 288 |
To better illustrate the idea of the algorithm, some distracting data are added to the address tree. With the branch-and-bound algorithm, the matching process is as follows:
Step 1: load the initial address tree and check address[1] = "Zhejiang"; after an exact match succeeds, extract the branch tree rooted at "Zhejiang" and delete the invalid branch trees, as shown in Fig. 5, where sc denotes the total score after each node is matched against the candidate address field.
Step 2: check address[2] = "Hangzhou"; after an exact match succeeds, extract the branch tree rooted at "Hangzhou". Then check address[3] = "East Lake"; after an exact match succeeds, extract the branch tree rooted at "East Lake", as shown in Fig. 6.
Step 3: check address[4] = "Liuxia"; the current branch tree has no feasible solution, so return to the parent of the current root node "East Lake", enable fuzzy matching mode to obtain the branch trees that satisfy the partial matching condition, and match the keyword "Liuxia" again, as shown in Fig. 7.
Step 4: check address[5] = "Liuhe"; the child nodes of the current branch tree's root cannot be matched exactly, so start fuzzy matching mode to obtain partially matching branch trees; then check address[6] = "288" and match it against all partially matching branch trees, as shown in Fig. 8.
After all fields of the candidate address array have been matched, the final evaluation scores of all address records are sorted, and the highest-scoring record is returned as the final matching result, shown as the solid-line part of Fig. 9.
Step 5: obtain the house number and read the geographic information of the street in the final matched address, including the turning-point house number data, as shown in Fig. 9. The input house number "No. 288" is found to lie between turning point A ("No. 268") and turning point B ("No. 296"). Least-squares interpolation with A and B as reference points gives the spatial position of the original house number on the street, shown at "*" in Fig. 10.
The above describes one embodiment of the present invention and the optimization it achieves. The present invention is clearly not limited to this embodiment; many variations can be made without departing from its essential spirit and without exceeding the scope of its substantive content.
Claims (7)
1. the Chinese geocoding based on fuzzy matching is confirmed method, it is characterized in that: said Chinese geocoding confirms that method may further comprise the steps:
A1, reading in descriptive Chinese address information, is breakpoint with the administrative area rank, adopts the forward maximum searching method, and original address is carried out cutting, obtains the original address element array;
A2, the original address element is carried out standardization through the address dictionary;
A3, read normal address tree; Adopt branch-bound algorithm; The original address element array is mated: set up the address database of number of addresses storage format, divide, set up tree-shaped address storage tree according to the stratification in china administration district; Highest-ranking administrative area unit is as the root node of number of addresses, and preserve as child node in its subordinate administrative area; Foundation is to address key element and number after the cutting of descriptive Chinese address information; In matching process; At first read normal address tree R, judge that through other key word of highest line political affairs level in the candidate site key element after the cutting, the address node of setting the corresponding administrative grade of R with the normal address matees; Give up uncorrelated branch trees after mating successfully, keep the correlated branch tree and carry out next administrative grade coupling;
In addition, fuzzy rules are used to control the matching operation in step A3: after the keywords of the segmented original address are obtained, the step further comprises:
Fuzzy matching rules are used to optimize the matching operation. They are defined as follows: suppose the field to be matched is the string address, of length h, and the standard field is the string std_address, of length H. The set of std_address strings satisfying address ∩ std_address ≠ ∅ is defined as the set that satisfies the matching condition, where address ∩ std_address ≠ ∅ means that the intersection of the string address and the standard string std_address is non-empty. Finally, the set elements with a high membership degree are retained. The following matching rules are defined:
1. If i characters of the standard string std_address and the string to be matched, address, are identical, the membership degree is i/H;
2. If the standard string std_address contains the string to be matched, address, the membership degree is 1;
Once the membership degree μ is obtained, it is converted into a quantified score sc by the mapping rule f: μ → sc, with mapping function f(μ) = 10 × μ, and sc is used as the evaluation score of the candidate record;
The candidate with the highest evaluation score is taken as the closest matching result, which yields a more accurate matched address.
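The fuzzy scoring rules of claim 1 can be sketched as follows. Rule 1 is read here as "i is the number of characters of std_address that also occur in address". The claim does not say whether character positions matter, so this reading is an assumption. Rule 2 is checked first, so a standard string that contains address always gets μ = 1. The non-empty intersection condition is interpreted as "shares at least one character". The function names are hypothetical.

```python
from __future__ import annotations


def membership(address: str, std_address: str) -> float:
    """Membership degree of a standard string for the string being matched.

    Rule 2: if std_address contains address, the degree is 1.
    Rule 1 (assumed reading): i = number of characters of std_address that
    also occur in address; the degree is i / H with H = len(std_address).
    """
    H = len(std_address)
    if H == 0:
        return 0.0
    if address in std_address:
        return 1.0
    i = sum(1 for ch in std_address if ch in address)
    return i / H


def score(mu: float) -> float:
    """Mapping f(mu) = 10 * mu from membership degree to evaluation score sc."""
    return 10 * mu


def best_match(address: str, standards: list[str]) -> str | None:
    """Return the candidate with the highest evaluation score, or None."""
    # Candidate set: standard strings sharing at least one character with address.
    candidates = [s for s in standards if set(address) & set(s)]
    if not candidates:
        return None
    return max(candidates, key=lambda s: score(membership(address, s)))


print(best_match("文三路", ["文二路", "西湖区文三路", "上城区解放路"]))
# 西湖区文三路
```

In this example, "西湖区文三路" contains the input and scores 10. "文二路" shares two of its three characters and scores about 6.7.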
2. The fuzzy matching-based Chinese geocoding determination method according to claim 1, characterized in that the method further comprises:
A4. If the matched address contains a house number, perform spatial positioning. Urban road house numbers are assumed to follow this distribution rule: odd and even numbers lie on opposite sides of the road, so if the odd numbers are on the left, the even numbers are on the right, and if the odd numbers are on the right, the even numbers are on the left. The house numbers at the road's inflection points are recorded together with their geographic coordinates. After the house number in the original address is obtained, determine which two inflection points it lies between; suppose the matched house number lies between inflection points A and B. Using A and B as reference points, perform least-squares linear interpolation to obtain the specific geographic coordinates of the house number on the road, and finally locate it on the map.
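Step A4 can be sketched as a bracket-and-interpolate routine. When there are exactly two reference points, a least-squares straight-line fit passes through both of them. It therefore coincides with ordinary linear interpolation, which is what this sketch computes. The following are assumptions, not details from the claim: inflection points are stored as hypothetical `(house_number, x, y)` tuples in planar coordinates, house numbers are distinct on each side of the road, and numbers change linearly between A and B.

```python
from __future__ import annotations

from bisect import bisect_right


def locate_house_number(
    number: int, inflections: list[tuple[int, float, float]]
) -> tuple[float, float] | None:
    """Estimate (x, y) for a house number from road inflection points.

    inflections holds (house_number, x, y) for each recorded inflection point.
    Only points on the same side of the road (same odd/even parity) are used.
    """
    side = sorted(p for p in inflections if p[0] % 2 == number % 2)
    nums = [p[0] for p in side]
    if len(side) < 2 or not nums[0] <= number <= nums[-1]:
        return None  # the number cannot be bracketed by two reference points
    # Index of the first inflection point B with a number above the target.
    k = min(bisect_right(nums, number), len(side) - 1)
    (na, xa, ya), (nb, xb, yb) = side[k - 1], side[k]
    t = (number - na) / (nb - na)  # fraction of the way from A to B
    return (xa + t * (xb - xa), ya + t * (yb - ya))


# Odd numbers on one side, even numbers on the other (illustrative data).
points = [(1, 0.0, 0.0), (21, 100.0, 0.0), (41, 100.0, 100.0),
          (2, 0.0, 5.0), (40, 100.0, 5.0)]
print(locate_house_number(11, points))  # (50.0, 0.0)
```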
3. The fuzzy matching-based Chinese geocoding determination method according to claim 1 or claim 2, characterized in that in step A3 an auxiliary place-name database is provided, and a separate database is also built for important, frequently used geographic locations that have secondary identifying features.
4. The fuzzy matching-based Chinese geocoding determination method according to claim 1 or claim 2, characterized in that in step A1, the address database is searched for a standard address name corresponding to the obtained original address, starting from its first character. If such a name exists, the address information is read and retained, and the matched characters are cut from the original address string. Otherwise, the next character is read and combined with the preceding characters into a string, and the search for a corresponding standard address name in the address database continues. Reading proceeds in this way until the address elements of all administrative levels are determined.
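The character-by-character segmentation of claim 4 can be sketched as shown below. The dictionary and the function name are hypothetical. This sketch follows the claim literally and accepts the shortest prefix that is found in the dictionary. If the dictionary contained both 杭州 and 杭州市, it would therefore stop at 杭州, and a practical system might prefer the longest match instead.

```python
from __future__ import annotations


def segment_address(original: str, dictionary: set[str]) -> tuple[list[str], str]:
    """Split an address into dictionary entries by growing a prefix one character at a time.

    Returns the recognised address elements and the unmatched remainder
    (for example a trailing house number).
    """
    elements: list[str] = []
    rest = original
    while rest:
        for end in range(1, len(rest) + 1):
            prefix = rest[:end]
            if prefix in dictionary:
                elements.append(prefix)  # keep the matched standard name
                rest = rest[end:]  # cut it from the original string
                break
        else:
            break  # no prefix of the remainder is a standard name
    return elements, rest


DICTIONARY = {"浙江省", "杭州市", "西湖区", "文三路"}  # illustrative entries only
print(segment_address("浙江省杭州市西湖区文三路90号", DICTIONARY))
# (['浙江省', '杭州市', '西湖区', '文三路'], '90号')
```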
5. The fuzzy matching-based Chinese geocoding determination method according to claim 1 or claim 2, characterized in that in step A2, if any element is missing from the segmented candidate address array, its parent-level address is obtained from the address database on the basis of the address element of the next lower level and is written into the candidate address element array.
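Claim 5's completion of missing higher levels can be sketched with a child-to-parent lookup. The `parent_of` dictionary is a hypothetical stand-in for the address database. The sketch walks upward so that a recovered parent can in turn recover its own parent. It also assumes that unit names are unique. In real data, names can repeat across cities (several cities have a 鼓楼区, for example), so a real database would need additional disambiguation.

```python
from __future__ import annotations


def fill_missing_levels(
    elements: list[str | None], parent_of: dict[str, str]
) -> list[str | None]:
    """Fill missing higher-level elements from the next lower level.

    elements[k] is the element at administrative level k (0 = highest);
    None marks a missing element. Unknown names are left unresolved.
    """
    filled = list(elements)
    # Walk upward so a recovered parent can in turn recover its own parent.
    for level in range(len(filled) - 2, -1, -1):
        child = filled[level + 1]
        if filled[level] is None and child is not None:
            filled[level] = parent_of.get(child)
    return filled


parents = {"西湖区": "杭州市", "杭州市": "浙江省", "海曙区": "宁波市", "宁波市": "浙江省"}
print(fill_missing_levels([None, None, "西湖区"], parents))
# ['浙江省', '杭州市', '西湖区']
```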
6. The fuzzy matching-based Chinese geocoding determination method according to claim 5, characterized in that in step A2 an address abbreviation and alias information database is designed, that is, a dedicated database that stores all current standard address information together with its aliases and abbreviations.
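A minimal sketch of how the alias and abbreviation database in claim 6 might be used to normalize segmented elements is shown below. The table contents and function names are hypothetical. 浙 is the standard one-character abbreviation for Zhejiang, but the remaining entries are purely illustrative. Names that are already standard, or that are unknown, are passed through unchanged.

```python
from __future__ import annotations

# Hypothetical alias/abbreviation table: alias -> standard name.
ALIASES = {
    "浙": "浙江省",
    "浙江": "浙江省",
    "杭": "杭州市",
    "杭州": "杭州市",
}
STANDARD_NAMES = {"浙江省", "杭州市", "西湖区"}


def to_standard(element: str,
                aliases: dict[str, str] = ALIASES,
                standard_names: set[str] = STANDARD_NAMES) -> str:
    """Map an alias or abbreviation to its standard name; otherwise return it as-is."""
    if element in standard_names:
        return element
    return aliases.get(element, element)


def normalize_elements(elements: list[str]) -> list[str]:
    """Normalize every segmented address element."""
    return [to_standard(e) for e in elements]


print(normalize_elements(["浙江", "杭州", "西湖区"]))
# ['浙江省', '杭州市', '西湖区']
```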
7. The fuzzy matching-based Chinese geocoding determination method according to claim 6, characterized in that in step A2 typo correction is applied to the segmented address elements: if the entered address information contains a typo, that is, a segmented address element has no exactly corresponding standard address name in the address dictionary, the standard address name closest to the entered address information is returned and replaces the entered address information.
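Claim 7 does not specify how closeness is measured. The sketch below therefore substitutes Python's standard `difflib.get_close_matches`, which ranks candidates by `SequenceMatcher` ratio. The cutoff of 0.6 is difflib's default and does not come from the patent. Passing `cutoff=0.0` always returns the closest name, which is what the claim describes. `correct_typo` and the name list are hypothetical.

```python
import difflib


def correct_typo(element: str, standard_names: list[str], cutoff: float = 0.6) -> str:
    """Replace element with the closest standard name, if one is close enough."""
    if element in standard_names:
        return element  # exact match: nothing to correct
    closest = difflib.get_close_matches(element, standard_names, n=1, cutoff=cutoff)
    return closest[0] if closest else element


NAMES = ["西湖区", "上城区", "拱墅区"]  # illustrative standard names
print(correct_typo("西糊区", NAMES))  # 西湖区
```

Here "西糊区" shares two of its three characters with "西湖区", which gives a ratio of about 0.67. That is above the cutoff, while the other names score about 0.33.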
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009101566504A CN101719128B (en) | 2009-12-31 | 2009-12-31 | Fuzzy matching-based Chinese geo-code determination method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009101566504A CN101719128B (en) | 2009-12-31 | 2009-12-31 | Fuzzy matching-based Chinese geo-code determination method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101719128A CN101719128A (en) | 2010-06-02 |
CN101719128B true CN101719128B (en) | 2012-05-23 |
Family ID: 42433702
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009101566504A Expired - Fee Related CN101719128B (en) | 2009-12-31 | 2009-12-31 | Fuzzy matching-based Chinese geo-code determination method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101719128B (en) |
Families Citing this family (58)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102298585B (en) * | 2010-06-24 | 2016-01-13 | 高德软件有限公司 | A kind of address cutting and rank mask method and address cutting and rank annotation equipment |
CN102402533A (en) * | 2010-09-13 | 2012-04-04 | 方正国际软件有限公司 | Address matching method and system |
CN102446186B (en) * | 2010-10-13 | 2016-03-30 | 上海众恒信息产业股份有限公司 | Chinese geocoding and coding/decoding method and device |
CN101996247B (en) * | 2010-11-10 | 2013-02-20 | 百度在线网络技术(北京)有限公司 | Method and device for constructing address database |
CN102024024B (en) * | 2010-11-10 | 2013-07-10 | 百度在线网络技术(北京)有限公司 | Method and device for constructing address database |
CN101980208A (en) * | 2010-11-10 | 2011-02-23 | 百度在线网络技术(北京)有限公司 | Address query method and system |
CN102169498A (en) * | 2011-04-14 | 2011-08-31 | 中国测绘科学研究院 | Address model constructing method and address matching method and system |
CN102289467A (en) * | 2011-07-22 | 2011-12-21 | 浙江百世技术有限公司 | Method and device for determining target site |
CN102955832B (en) * | 2011-08-31 | 2015-11-25 | 深圳市华傲数据技术有限公司 | A kind of address identification, standardized system |
CN102393937A (en) * | 2011-10-12 | 2012-03-28 | 深圳市络道科技有限公司 | Address matching method and system of address tree based on backward production |
CN103383682B (en) * | 2012-05-01 | 2017-12-26 | 刘龙 | A kind of Geocoding, position enquiring system and method |
CN102880650B (en) * | 2012-08-27 | 2015-11-18 | 中国工商银行股份有限公司 | A kind of data matching method and device |
CN103413215B (en) * | 2013-07-12 | 2017-02-08 | 广州银联网络支付有限公司 | Electronic bank code matching method based on matrix similarity algorithm |
CN103440311A (en) * | 2013-08-27 | 2013-12-11 | 深圳市华傲数据技术有限公司 | Method and system for identifying geographical name entities |
US20150094090A1 (en) * | 2013-09-30 | 2015-04-02 | Samsung Electronics Co., Ltd. | Caching of locations on a device |
CN103558926A (en) * | 2013-11-12 | 2014-02-05 | 金蝶软件(中国)有限公司 | Geographical name entry method and geographical name entry device |
CN103593468B (en) * | 2013-11-27 | 2016-11-16 | 北京金和软件股份有限公司 | A kind of audio content method for pushing |
CN104021184B (en) * | 2014-06-10 | 2017-07-11 | 广州品唯软件有限公司 | A kind of localization method and system |
CN104092613A (en) * | 2014-07-15 | 2014-10-08 | 山东超越数控电子有限公司 | Rapid table lookup method based on fuzzy matching |
CN104182509A (en) * | 2014-08-20 | 2014-12-03 | 国家电网公司 | Object-oriented address modeling method |
CN104182510A (en) * | 2014-08-20 | 2014-12-03 | 国家电网公司 | Object-oriented address modeling method |
CN105528372B (en) | 2014-09-30 | 2019-05-24 | 华为技术有限公司 | A kind of address search method and equipment |
CN105760360B (en) * | 2014-12-16 | 2018-09-11 | 高德软件有限公司 | A kind of address correcting method and device |
CN106156145A (en) * | 2015-04-13 | 2016-11-23 | 阿里巴巴集团控股有限公司 | The management method of a kind of address date and device |
CN106296209B (en) * | 2015-06-05 | 2021-02-02 | 菜鸟智能物流控股有限公司 | Address input control method and device |
CN106055635B (en) * | 2016-05-30 | 2019-11-19 | 深圳市华傲数据技术有限公司 | Address information lookup method and device |
CN106502978A (en) * | 2016-09-19 | 2017-03-15 | 浪潮软件股份有限公司 | A kind of Chinese address segmenting method and device |
CN106649464B (en) * | 2016-09-26 | 2019-08-30 | 深圳市数字城市工程研究中心 | A kind of construction method and device of Chinese address tree |
CN106528605A (en) * | 2016-09-27 | 2017-03-22 | 武汉工程大学 | A rule-based Chinese address resolution method |
CN106874384B (en) * | 2017-01-10 | 2020-12-04 | 航天精一(广东)信息科技有限公司 | Heterogeneous address standard conversion and matching method |
CN106709065B (en) * | 2017-01-19 | 2020-08-04 | 国家电网公司 | Address information standardization processing method and device |
CN106875264A (en) * | 2017-03-31 | 2017-06-20 | 北京京东尚科信息技术有限公司 | Sequence information management method, device and order sorting system |
CN109255564B (en) * | 2017-07-13 | 2022-09-06 | 菜鸟智能物流控股有限公司 | Pick-up point address recommendation method and device |
CN107748778B (en) * | 2017-10-20 | 2021-03-23 | 浪潮软件股份有限公司 | Method and device for extracting address |
CN108369582B (en) * | 2018-03-02 | 2021-06-25 | 福建联迪商用设备有限公司 | Address error correction method and terminal |
CN108959244B (en) * | 2018-06-07 | 2022-08-09 | 北京京东尚科信息技术有限公司 | Address word segmentation method and device |
CN109254964A (en) * | 2018-08-20 | 2019-01-22 | 中国平安人寿保险股份有限公司 | Address Standardization method, apparatus, computer equipment and storage medium |
CN110895651B (en) * | 2018-08-23 | 2024-02-02 | 京东科技控股股份有限公司 | Address standardization processing method, device, equipment and computer readable storage medium |
CN109344213B (en) * | 2018-08-28 | 2021-06-18 | 浙江工业大学 | Chinese geocoding method based on dictionary tree |
CN111414357A (en) * | 2019-01-07 | 2020-07-14 | 阿里巴巴集团控股有限公司 | Address data processing method, device, system and storage medium |
CN109784308B (en) * | 2019-02-01 | 2020-09-29 | 腾讯科技(深圳)有限公司 | Address error correction method, device and storage medium |
CN110099246A (en) * | 2019-02-18 | 2019-08-06 | 深度好奇(北京)科技有限公司 | Monitoring and scheduling method, apparatus, computer equipment and storage medium |
CN109933797A (en) * | 2019-03-21 | 2019-06-25 | 东南大学 | Geocoding and system based on Jieba participle and address dictionary |
CN110674367B (en) * | 2019-09-09 | 2022-02-01 | 广州易起行信息技术有限公司 | Single Chinese character retrieval method and device based on travel industry products |
CN110704564B (en) * | 2019-09-27 | 2024-09-24 | 北京沃东天骏信息技术有限公司 | Address error correction method and device |
CN112925922A (en) * | 2019-12-06 | 2021-06-08 | 农业农村部信息中心 | Method, device, electronic equipment and medium for obtaining address |
CN111144117B (en) * | 2019-12-26 | 2023-08-29 | 同济大学 | Method for disambiguating Chinese address of knowledge graph |
CN111291277A (en) * | 2020-01-14 | 2020-06-16 | 浙江邦盛科技有限公司 | Address standardization method based on semantic recognition and high-level language search |
CN111753515B (en) * | 2020-06-24 | 2024-07-02 | 广东科杰通信息科技有限公司 | Address information extraction and matching method for realizing entity positioning |
CN111859849B (en) * | 2020-07-01 | 2023-11-24 | 邦道科技有限公司 | Management method and device for electricity utilization address |
CN112052413B (en) * | 2020-08-28 | 2024-02-13 | 上海谋乐网络科技有限公司 | URL fuzzy matching method, device and system |
CN112364113A (en) * | 2020-11-13 | 2021-02-12 | 北京明略软件系统有限公司 | Address error correction method and system |
CN112417179A (en) * | 2020-11-23 | 2021-02-26 | 杭州橙鹰数据技术有限公司 | Address processing method and device |
CN113204606A (en) * | 2021-04-30 | 2021-08-03 | 武汉大学 | Address position presumption method based on semantic position network |
CN113656450A (en) * | 2021-07-12 | 2021-11-16 | 大箴(杭州)科技有限公司 | Address processing method and device, electronic equipment and storage medium |
CN114091454A (en) * | 2021-11-29 | 2022-02-25 | 重庆市地理信息和遥感应用中心 | Method for extracting place name information and positioning space in internet text |
CN116910386B (en) * | 2023-09-14 | 2024-02-02 | 深圳市智慧城市科技发展集团有限公司 | Address completion method, terminal device and computer-readable storage medium |
CN117874309B (en) * | 2024-03-12 | 2024-05-24 | 北京全路通信信号研究设计院集团有限公司 | Train control data processing method and device, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101110081A (en) * | 2007-08-21 | 2008-01-23 | 北京大学 | Method for extracting entity address message in text context |
CN101350012A (en) * | 2007-07-18 | 2009-01-21 | 北京灵图软件技术有限公司 | Method and system for matching address |
CN101350013A (en) * | 2007-07-18 | 2009-01-21 | 北京灵图软件技术有限公司 | Method and system for searching geographical information |
CN101393544A (en) * | 2008-10-07 | 2009-03-25 | 南京师范大学 | Chinese address semantic parsing method facing address encode |
- 2009-12-31: CN2009101566504A filed in CN (CN101719128B; not active, Expired - Fee Related)
Also Published As
Publication number | Publication date |
---|---|
CN101719128A (en) | 2010-06-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101719128B (en) | Fuzzy matching-based Chinese geo-code determination method | |
CN103914544A (en) | Method for quickly matching Chinese addresses in multi-level manner on basis of address feature words | |
CN108369582B (en) | Address error correction method and terminal | |
WO2016165538A1 (en) | Address data management method and device | |
CN112528174B (en) | Address trimming and complementing method based on knowledge graph and multiple matching and application | |
CN104679801B (en) | A kind of interest point search method and device | |
CN102375807A (en) | Method and device for proofing characters | |
CN103440311A (en) | Method and system for identifying geographical name entities | |
CN1590964A (en) | Iterative logical renewal of navigable map database | |
CN109933797A (en) | Geocoding and system based on Jieba participle and address dictionary | |
CN103377237B (en) | The neighbor search method of high dimensional data and fast approximate image searching method | |
CN103345496A (en) | Multimedia information searching method and system | |
CN105209858A (en) | Non-deterministic disambiguation and matching of business locale data | |
CN104346438A (en) | Data management service system based on large data | |
CN104376112A (en) | Road network space keyword search method | |
CN108009265B (en) | Spatial data indexing method in cloud computing environment | |
CN111291099B (en) | Address fuzzy matching method and system and computer equipment | |
CN103970842A (en) | Water conservancy big data access system and method for field of flood control and disaster reduction | |
CN104391908A (en) | Locality sensitive hashing based indexing method for multiple keywords on graphs | |
CN114780680A (en) | Retrieval and completion method and system based on place name and address database | |
CN111311173A (en) | National county level unit economic arrangement and spatialization method | |
CN102999548B (en) | Geographical name data extended method and device in electronic chart | |
CN113505190B (en) | Address information correction method, device, computer equipment and storage medium | |
Machanavajjhala et al. | Collective extraction from heterogeneous web lists | |
CN103150632B (en) | Flood control based on water conservation cloud platform is taked precautions against drought the construction method of bulletin generation system |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20120523 |