CN108052609A - A kind of address matching method based on dictionary and machine learning - Google Patents
- Publication number
- CN108052609A (application number CN201711332274.0A)
- Authority
- CN
- China
- Prior art keywords
- address
- module
- dictionary
- matching
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Remote Sensing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an address matching method based on a dictionary and machine learning, comprising an address parsing module, an address standardization module, an address matching module, and an address screening module. The address parsing module parses the input address information; the parsed address data is passed to the address standardization module for standardization; the standardized address information is passed to the address matching module for matching; and the matched address information is processed by the address screening module to obtain the final standard address information. The invention relates to the field of information technology. Compared with existing fuzzy matching based on address dictionary segmentation, the fuzzy address dictionary matching used by the invention is more flexible: it does not require accumulating an address element dictionary, which avoids the heavy manual effort of maintaining an address dictionary and also avoids the drop in matching rate that occurs when address information changes and the address dictionary is not updated in time.
Description
Technical field
The present invention relates to the field of information technology, and specifically to an address matching method based on a dictionary and machine learning.
Background
When mining large volumes of text in the public security sector, it is often necessary to locate the addresses mentioned in case records on a map and to find the distances between them, both to make crime locations easier to visualize and to compute how cases relate to one another. Given a known address, this means comparing it against a standard address database to find the corresponding standard address and its longitude and latitude, then using those coordinates to plot the address on a map and to compute the distance between two addresses. In real projects, however, a standard address database typically holds a million or more standard addresses, and sometimes many millions. Matching a raw input address directly, without any preprocessing, incurs a large time cost and also yields low matching accuracy.
In the big-data setting, a fast and effective address matching method will therefore help promote industry applications of artificial intelligence in the natural language field.
Summary of the invention
The main problem addressed by the invention is to provide an address matching method based on a dictionary and machine learning that quickly finds the exact or closest address in a standard address database, so that the longitude and latitude corresponding to that address can be extracted.
Technical solution
To achieve the above object, the invention adopts the following technical solution: an address matching method based on a dictionary and machine learning, comprising an address parsing module, an address standardization module, an address matching module, and an address screening module. The address parsing module parses the input address information; the parsed address data is passed to the address standardization module for standardization; the processed address information is passed to the address matching module for matching; and the matched address information is processed by the address screening module to obtain the final standard address information.
As a further preferred scheme of the invention, the address parsing module uses an address dictionary to parse the input address information level by level, in the order district/county, township, village group, neighbourhood committee, residential community, building.
As a further preferred scheme of the invention, the address standardization module applies standardized filling and error correction to the parsed address information, and then passes the processed address information to the address matching module.
As a further preferred scheme of the invention, the address matching module queries the standard address database with the standardized address information in a decremental manner to find multiple close address records, and passes the address data found to the address screening module.
As a further preferred scheme of the invention, the address screening module first applies the minimum edit distance method to the queried address data to find the address with the smallest edit distance. If several addresses share the smallest edit distance, it then uses the cosine formula to compute the cosine similarity of each of those addresses and returns the address with the highest cosine similarity.
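The text names the two measures without giving formulas. For reference, the standard definitions are sketched below; the symbols are assumptions introduced here: $s$ and $t$ are the two address strings, $d_{i,j}$ is the edit distance between the first $i$ characters of $s$ and the first $j$ characters of $t$, and $\mathbf{a}$, $\mathbf{b}$ are feature vectors built from the two strings (the text does not say how these vectors are constructed).

```latex
% Edit (Levenshtein) distance recurrence, with d_{i,0} = i and d_{0,j} = j
d_{i,j} = \min\bigl( d_{i-1,j} + 1,\; d_{i,j-1} + 1,\; d_{i-1,j-1} + [\, s_i \neq t_j \,] \bigr)

% Cosine similarity of two feature vectors
\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
```

Under these definitions a smaller $d$ means a closer match, and a larger $\cos\theta$ means a more similar pair.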
Advantageous effects
The main features of the invention are:
1. Adding an address standardization step greatly improves address matching accuracy.
2. Using a fuzzy matching algorithm makes address matching faster.
3. Using minimum edit distance and cosine similarity screens out an accurate, unique standard address.
Compared with the prior art, the invention has the following advantages and beneficial effects:
First, compared with existing fuzzy matching based on address dictionary segmentation, the decremental fuzzy matching algorithm used by the invention is fast and accurate; and because the address screening module is added, the address finally selected is more accurate.
Second, compared with existing address dictionary matching methods, the fuzzy address dictionary matching used by the invention is more flexible. It does not require accumulating an address element dictionary, which avoids the heavy manual effort of maintaining an address dictionary and also avoids the drop in matching rate caused by address information changing while the address dictionary is not updated in time.
Description of the drawings
Fig. 1 is a flow chart of the address matching of the invention;
Fig. 2 is a flow chart of the address parsing module of the invention;
Fig. 3 is a flow chart of the address matching module of the invention;
Fig. 4 is a flow chart of the address screening module of the invention;
In the figures: 1 - address parsing module, 2 - address standardization module, 3 - address matching module, 4 - address screening module.
Specific embodiment
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to Figs. 1-4 and an embodiment. Note that the specific embodiment described here serves only to explain the invention and is not intended to limit it.
The invention has four main implementation steps:
Step 1: Load the address parsing module. It parses the input address using the loaded address element dictionary, splitting the address into district/county, township, village group, neighbourhood committee, residential community, and building. For example, for the address "Jiangsu, Suqian City, Suyu County, Shunhe Town, Caofang Neighbourhood Committee XXXX", the parsed address information is district/county: Suyu County; township: Shunhe Town; neighbourhood committee: Caofang Neighbourhood Committee.
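Step 1 can be illustrated with a minimal sketch, assuming the address element dictionary is a mapping from level to known element names and that elements appear in the address from coarsest to finest level. The names `LEVELS`, `ELEMENT_DICT`, and `parse_address`, the dictionary contents, and the romanized place names are all hypothetical; the patent does not specify the parsing algorithm.

```python
from typing import Dict, List

# Levels in the order the text parses them (coarsest first).
LEVELS = ["district", "township", "village_group",
          "neighbourhood_committee", "community", "building"]

# Hypothetical address element dictionary: level -> known element names.
ELEMENT_DICT: Dict[str, List[str]] = {
    "district": ["Suyu County"],
    "township": ["Shunhe Town"],
    "neighbourhood_committee": ["Caofang Neighbourhood Committee"],
}


def parse_address(raw: str,
                  element_dict: Dict[str, List[str]] = ELEMENT_DICT) -> Dict[str, str]:
    """Return {level: element} for each dictionary element found in raw."""
    parsed: Dict[str, str] = {}
    pos = 0  # only search to the right of the previous match
    for level in LEVELS:
        # Try longer names first so a short name cannot shadow a longer one.
        for name in sorted(element_dict.get(level, []), key=len, reverse=True):
            idx = raw.find(name, pos)
            if idx != -1:
                parsed[level] = name
                pos = idx + len(name)
                break
    return parsed
```

Levels with no dictionary hit are simply left out of the result, which is what later lets the standardization and matching steps work with partial addresses.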
Step 2: Load the address standardization module. It applies error correction and standardized filling to the parsed address information so that the address data becomes uniform and ready for the subsequent matching step. For example, when writing addresses people commonly abbreviate "economic and technological development zone" to "economic development zone"; after address standardization the abbreviation is filled out to "economic and technological development zone", so that the address has the same format as the data in the standard address database, which ensures the accuracy of the data.
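A minimal sketch of the filling operation in Step 2, assuming standardization is driven by a lookup table from common short forms to standard forms. The table `ABBREVIATIONS` and the function `standardize` are hypothetical; the patent does not describe how filling or error correction is implemented, and error-correction entries are assumed here to live in the same kind of table.

```python
from typing import Dict

# Hypothetical table: commonly written short form -> standard form.
ABBREVIATIONS: Dict[str, str] = {
    "economic development zone": "economic and technological development zone",
}


def standardize(parsed: Dict[str, str],
                table: Dict[str, str] = ABBREVIATIONS) -> Dict[str, str]:
    """Normalize whitespace and expand known short forms in each parsed element."""
    result: Dict[str, str] = {}
    for level, value in parsed.items():
        cleaned = " ".join(value.split())  # collapse repeated whitespace
        # Whole-element lookup only; unknown values pass through unchanged.
        result[level] = table.get(cleaned, cleaned)
    return result
```

Whole-element lookup keeps the sketch idempotent: applying it twice gives the same result as applying it once.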
Step 3: Load the address matching module. The standardized address information is assembled into query conditions, which are used to query the standard address database quickly. If the query returns no address, the query conditions are reduced and the standard address database is queried again; this repeats until an address is found, and the standard address information found is returned.
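The decremental query in Step 3 can be sketched as follows. This is only an illustration under stated assumptions: the standard address database is modelled as an in-memory list of dictionaries, and the condition dropped on each round is the finest remaining level. The patent does not say which condition is removed first, and the names `LEVELS`, `query`, and `decremental_match` are hypothetical.

```python
from typing import Dict, List

# Address levels from coarsest to finest.
LEVELS = ["district", "township", "village_group",
          "neighbourhood_committee", "community", "building"]


def query(db: List[Dict[str, str]], conditions: Dict[str, str]) -> List[Dict[str, str]]:
    """Return the rows that satisfy every condition exactly."""
    return [row for row in db
            if all(row.get(level) == value for level, value in conditions.items())]


def decremental_match(db: List[Dict[str, str]],
                      parsed: Dict[str, str]) -> List[Dict[str, str]]:
    """Query with all known levels, relaxing the finest level until rows are found."""
    conditions = {level: parsed[level] for level in LEVELS if level in parsed}
    while conditions:
        hits = query(db, conditions)
        if hits:
            return hits
        # Assumption: drop the finest remaining level and try again.
        finest = [level for level in LEVELS if level in conditions][-1]
        del conditions[finest]
    return []
```

When the full set of conditions matches nothing, the result is a list of several close candidates, which is why a separate screening step is needed afterwards.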
Step 4: Load the address screening module. The matched standard address information may contain one or several addresses, and the most accurate or closest unique address must be selected from it. First, the minimum edit distance method is used to screen the candidates and find the address with the smallest edit distance. If that address is unique, it is returned directly. If several standard addresses share the smallest edit distance, cosine similarity is then computed between the address to be matched and each of those addresses in turn, and the one with the smallest cosine distance (that is, the highest cosine similarity) is selected and returned as the unique address.
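The two-stage screening in Step 4 might look like the sketch below. The edit distance is the standard Levenshtein distance; the cosine similarity is computed over character-count vectors, which is an assumption of this sketch because the patent does not say how addresses are turned into vectors. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of character-count vectors (an assumed vectorization)."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[ch] * vb[ch] for ch in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def screen(target: str, candidates: List[str]) -> str:
    """Pick the closest candidate; candidates must be non-empty."""
    distances = [edit_distance(target, c) for c in candidates]
    best = min(distances)
    tied = [c for c, d in zip(candidates, distances) if d == best]
    if len(tied) == 1:
        return tied[0]
    # Tie on edit distance: fall back to the highest cosine similarity.
    return max(tied, key=lambda c: cosine_similarity(target, c))
```

Edit distance is computed for every candidate, while cosine similarity is only computed for the candidates that tie, mirroring the order of operations described in the step.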
It should be noted that herein, relational terms such as first and second and the like are used merely to a reality
Body or operation are distinguished with another entity or operation, are deposited without necessarily requiring or implying between these entities or operation
In any this actual relation or order.Moreover, term " comprising ", "comprising" or its any other variant are intended to
Non-exclusive inclusion, so that process, method, article or equipment including a series of elements not only will including those
Element, but also including other elements that are not explicitly listed or further include as this process, method, article or equipment
Intrinsic element.In the absence of more restrictions.
It although an embodiment of the present invention has been shown and described, for the ordinary skill in the art, can be with
Understanding without departing from the principles and spirit of the present invention can carry out these embodiments a variety of variations, modification, replace
And modification, the scope of the present invention is defined by the appended.
Claims (5)
1. An address matching method based on a dictionary and machine learning, comprising an address parsing module (1), an address standardization module (2), an address matching module (3), and an address screening module (4), characterized in that the address parsing module (1) parses the input address information; the parsed address data is passed to the address standardization module (2) for standardization; the processed address information is passed to the address matching module (3) for matching; and the matched address information is processed by the address screening module (4) to obtain the final standard address information.
2. The address matching method based on a dictionary and machine learning according to claim 1, characterized in that: in the address parsing module (1), an address dictionary is used to parse the input address information level by level, in the order district/county, township, village group, neighbourhood committee, residential community, building.
3. The address matching method based on a dictionary and machine learning according to claim 1, characterized in that: in the address standardization module (2), standardized filling and error correction are applied to the parsed address information, and the processed address information is then passed to the address matching module (3).
4. The address matching method based on a dictionary and machine learning according to claim 1, characterized in that: in the address matching module (3), the standard address database is queried with the standardized address information in a decremental manner to find multiple close address records, and the address data found is passed to the address screening module (4).
5. The address matching method based on a dictionary and machine learning according to claim 1, characterized in that: in the address screening module (4), the minimum edit distance method is first applied to the queried address data to find the address with the smallest edit distance; if several addresses share the smallest edit distance, the cosine formula is then used to compute the cosine value of each of those addresses, and the address with the largest cosine value (that is, the highest cosine similarity) is returned.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711332274.0A CN108052609A (en) | 2017-12-13 | 2017-12-13 | A kind of address matching method based on dictionary and machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108052609A true CN108052609A (en) | 2018-05-18 |
Family
Family ID: 62132650
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711332274.0A Pending CN108052609A (en) | 2017-12-13 | 2017-12-13 | A kind of address matching method based on dictionary and machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108052609A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112835894A (en) * | 2021-01-25 | 2021-05-25 | 武汉烽火普天信息技术有限公司 | Address matching method based on address coding and similarity calculation |
WO2021114825A1 (en) * | 2020-05-15 | 2021-06-17 | 平安科技(深圳)有限公司 | Method and device for institution standardization, electronic device, and storage medium |
CN113746946A (en) * | 2020-05-29 | 2021-12-03 | Sap欧洲公司 | Global address resolver |
CN114168705A (en) * | 2021-12-03 | 2022-03-11 | 南京大峡谷信息科技有限公司 | Chinese address matching method based on address element index |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090092323A1 (en) * | 2007-10-04 | 2009-04-09 | Weigen Qiu | Systems and methods for character correction in communication devices |
CN103996021A (en) * | 2014-05-08 | 2014-08-20 | 华东师范大学 | Fusion method of multiple character identification results |
EP2974434A2 (en) * | 2013-03-15 | 2016-01-20 | Bell, Tyler | Apparatus, systems, and methods for analyzing movements of target entities |
CN105868305A (en) * | 2016-03-25 | 2016-08-17 | 西安电子科技大学 | A fuzzy matching-supporting cloud storage data dereplication method |
CN106649803A (en) * | 2016-12-29 | 2017-05-10 | 华南师范大学 | Address matching method and system |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021114825A1 (en) * | 2020-05-15 | 2021-06-17 | 平安科技(深圳)有限公司 | Method and device for institution standardization, electronic device, and storage medium |
CN113746946A (en) * | 2020-05-29 | 2021-12-03 | Sap欧洲公司 | Global address resolver |
US11803748B2 (en) * | 2020-05-29 | 2023-10-31 | Sap Se | Global address parser |
CN113746946B (en) * | 2020-05-29 | 2023-12-12 | Sap欧洲公司 | Global address resolver |
CN112835894A (en) * | 2021-01-25 | 2021-05-25 | 武汉烽火普天信息技术有限公司 | Address matching method based on address coding and similarity calculation |
CN114168705A (en) * | 2021-12-03 | 2022-03-11 | 南京大峡谷信息科技有限公司 | Chinese address matching method based on address element index |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108052609A (en) | A kind of address matching method based on dictionary and machine learning | |
CN108369582B (en) | Address error correction method and terminal | |
CN109101474B (en) | Address aggregation method, package aggregation method and equipment | |
CN107609154A (en) | Method and device for processing multi-source heterogeneous data | |
CN106547770B (en) | User classification and user identification method and device based on user address information | |
CN106021336A (en) | A method for automatic administrative district division for mass address information | |
CN102289467A (en) | Method and device for determining target site | |
CN106326303A (en) | Spoken language semantic analysis system and method | |
CN104252507B (en) | A kind of business data matching process and device | |
CN110968654A (en) | Method, equipment and system for determining address category of text data | |
CN107577744A (en) | Nonstandard Address automatic matching model, matching process and method for establishing model | |
CN109635084A (en) | A kind of real-time quick De-weight method of multi-source data document and system | |
CN106991090A (en) | The analysis method and device of public sentiment event entity | |
CN105488471B (en) | A kind of font recognition methods and device | |
CN105224610A (en) | The method and apparatus that a kind of address is compared | |
CN105989015A (en) | Capacity expanding method of database and database accessing method and device | |
CN103246655A (en) | Text categorizing method, device and system | |
CN106469144A (en) | Text similarity computing method and device | |
CN112241458A (en) | Text knowledge structuring processing method, device, equipment and readable storage medium | |
CN106940711B (en) | URL detection method and detection device | |
CN105159885A (en) | Point-of-interest name identification method and device | |
CN109522335B (en) | Information acquisition method and device and computer readable storage medium | |
CN105138708A (en) | Method and device for identifying names of points of interest (POI) | |
CN110046341B (en) | Method and system for matching information | |
CN107577667B (en) | Entity word processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20180518 |