CN108052609A - Address matching method based on dictionary and machine learning - Google Patents

Address matching method based on dictionary and machine learning


Publication number
CN108052609A
Authority
CN
China
Prior art keywords: address, module, dictionary, matching, information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711332274.0A
Other languages
Chinese (zh)
Inventor
金勇�
李元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WUHAN FENGHUO PUTIAN IT Co Ltd
Original Assignee
WUHAN FENGHUO PUTIAN IT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WUHAN FENGHUO PUTIAN IT Co Ltd
Priority to CN201711332274.0A
Publication of CN108052609A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Remote Sensing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an address matching method based on dictionary and machine learning, comprising an address parsing module, an address standardization module, an address matching module, and an address screening module. The address parsing module parses the input address information; the parsed address data is passed to the address standardization module for standardization; the standardized address information is passed to the address matching module for matching; and the matched address information is processed by the address screening module to obtain the final standard address information. The invention relates to the field of information technology. Compared with existing fuzzy matching based on address-dictionary segmentation, the fuzzy address-dictionary matching used by the invention is more flexible and does not require accumulating an address element dictionary. This avoids spending excessive manpower on maintaining the address dictionary, and also effectively avoids the drop in matching rate that occurs when address information changes and the address dictionary is not updated in time.

Description

Address matching method based on dictionary and machine learning
Technical field
The invention relates to the field of information technology, and specifically to an address matching method based on dictionary and machine learning.
Background
When mining large volumes of text in the public security sector, it is often necessary to locate the addresses mentioned in case information on a map and to compute the distances between them, in order to visualize crime locations and to calculate how closely cases are related. Given a known address, this requires comparing it against a standard address database to find its standard address and the corresponding longitude and latitude, then mapping it onto the map by those coordinates and calculating the distance between two addresses. In real projects, however, a standard address database typically holds millions of standard addresses or more. Matching the raw input address directly, without preprocessing, incurs a huge time cost and yields low matching accuracy.
Against the background of big data, a fast and effective address matching method will therefore promote industry applications of artificial intelligence in the natural language field.
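The distance calculation mentioned in the background can be illustrated with a short sketch. The patent does not say which formula is used; the haversine great-circle formula below is one common choice and is an assumption here, as are the function name `haversine_km` and the mean Earth radius constant.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine term: squared half-chord length between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Once two input addresses have been matched to standard addresses, their stored coordinates could be passed to such a function to obtain the distance between them.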
Summary of the invention
The main problem to be solved by the invention is to provide an address matching method based on dictionary and machine learning that quickly matches the exact or closest address information from a standard address database, so that the longitude and latitude corresponding to that address can be extracted.
Technical solution
To achieve the above object, the invention adopts the following technical solution: an address matching method based on dictionary and machine learning, comprising an address parsing module, an address standardization module, an address matching module, and an address screening module. The address parsing module parses the input address information; the parsed address data is input to the address standardization module for standardization; the processed address information is input to the address matching module for matching; and the matched address information is processed by the address screening module to obtain the final standard address information.
As a further preferred scheme of the invention, the address parsing module uses an address dictionary to parse the input address information level by level, in the order district, township, village group, neighbourhood committee, residential community, building.
As a further preferred scheme of the invention, the address standardization module applies standardized filling and error correction to the parsed address information, then inputs the processed address information to the address matching module.
As a further preferred scheme of the invention, the address matching module queries the standard address database with a decrementing query condition built from the standardized address information, finds multiple similar address records, and passes the records found to the address screening module.
As a further preferred scheme of the invention, the address screening module first applies the minimum edit distance method to the queried address records to find the address with the smallest edit distance. If several addresses share the smallest edit distance, cosine similarity is computed for those addresses, and the address with the maximum cosine similarity is returned.
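The patent names the two screening measures without giving formulas. Under the standard definitions (an assumption here, since the source does not specify them), the edit distance between strings $a$ and $b$ is the Levenshtein distance $d(|a|,|b|)$, and cosine similarity is taken between feature vectors $u$ and $v$ built from the two addresses:

```latex
d(i,0) = i, \qquad d(0,j) = j, \\
d(i,j) = \min\bigl\{\, d(i-1,j) + 1,\; d(i,j-1) + 1,\; d(i-1,j-1) + [a_i \neq b_j] \,\bigr\}, \\
\cos(u,v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}, \qquad
\text{cosine distance} = 1 - \cos(u,v).
```

A higher cosine similarity corresponds to a smaller cosine distance, so among candidates tied on edit distance the closest address is the one with the largest $\cos(u,v)$.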
Advantageous effects
The main features of the invention:
1. Adding an address standardization step greatly improves address matching accuracy.
2. Using a fuzzy matching algorithm makes address matching faster.
3. Using minimum edit distance and cosine similarity screens out an accurate and unique standard address.
Compared with the prior art, the invention has the following advantages and beneficial effects:
First, compared with existing fuzzy matching based on address-dictionary segmentation, the decrementing fuzzy matching algorithm used by the invention is fast and accurate, and the added address screening module makes the selected address information more precise.
Second, compared with existing address-dictionary matching methods, the fuzzy address-dictionary matching used by the invention is more flexible and does not require accumulating an address element dictionary. This avoids spending excessive manpower on maintaining the address dictionary, and also effectively avoids the drop in matching rate that occurs when address information changes and the address dictionary is not updated in time.
Brief description of the drawings
Fig. 1 is a flow chart of the address matching of the invention;
Fig. 2 is a flow chart of the address parsing module of the invention;
Fig. 3 is a flow chart of the address matching module of the invention;
Fig. 4 is a flow chart of the address screening module of the invention;
In the figures: 1, address parsing module; 2, address standardization module; 3, address matching module; 4, address screening module.
Detailed description
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to Figs. 1-4 and an embodiment. It should be noted that the specific embodiment described here only serves to explain the invention and is not intended to limit it.
The invention has four main implementation steps:
Step 1: Load the address parsing module and parse the input address using the loaded address element dictionary, splitting it into district, township, village group, neighbourhood committee, residential community, and building. For example, for the address "Jiangsu Suqian City Suyu County Shunhe Town Brewery Neighbourhood Committee XXXX", the parsed address information is district: Suyu County; township: Shunhe Town; neighbourhood committee: Brewery Neighbourhood Committee.
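A minimal sketch of dictionary-driven parsing follows, assuming the address element dictionary maps each level to a list of known names. The level identifiers, `ELEMENT_DICT`, `parse_address`, and the left-to-right longest-match rule are all hypothetical; the patent does not describe the lookup procedure in this detail, and the example uses an English rendering of the address.

```python
# Levels in the parsing order given in the text, coarsest first.
LEVELS = ["district", "township", "village_group",
          "neighbourhood_committee", "community", "building"]

# Hypothetical address element dictionary: level -> known element names.
ELEMENT_DICT = {
    "district": ["Suyu County"],
    "township": ["Shunhe Town"],
    "neighbourhood_committee": ["Brewery Neighbourhood Committee"],
}


def parse_address(raw, element_dict=ELEMENT_DICT):
    """Extract one element per level, scanning the string from left to right."""
    parsed = {}
    pos = 0  # search only after the previously matched element
    for level in LEVELS:
        best = None  # (start index, name) of the longest entry found
        for name in element_dict.get(level, []):
            idx = raw.find(name, pos)
            if idx != -1 and (best is None or len(name) > len(best[1])):
                best = (idx, name)
        if best is not None:
            parsed[level] = best[1]
            pos = best[0] + len(best[1])
    return parsed
```

Levels with no dictionary hit are simply absent from the result, which lets later steps build their query from whatever was recognised.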
Step 2: Load the address standardization module and apply error correction and standardized filling to the parsed address information, so that the address data becomes uniform and easier to match in later steps. For example, people habitually abbreviate "economic and technological development zone" as "economic development zone" when writing addresses. Standardization fills this out to "economic and technological development zone", so that the address matches the data format of the standard address database and data accuracy is ensured.
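The filling part of this step can be sketched as a lookup in an abbreviation table. The table `ABBREVIATIONS`, the function `standardize`, and the whitespace clean-up are assumptions made for illustration; the patent does not describe how corrections are stored, and error correction proper is not covered by this sketch.

```python
# Hypothetical abbreviation table: short form -> standard full form.
ABBREVIATIONS = {
    "economic development zone": "economic and technological development zone",
}


def standardize(parsed, abbreviations=ABBREVIATIONS):
    """Expand known abbreviations in every parsed address element."""
    standardized = {}
    for level, value in parsed.items():
        cleaned = " ".join(value.split())  # collapse stray whitespace
        for short, full in abbreviations.items():
            # Skip values that already contain the full form.
            if full not in cleaned:
                cleaned = cleaned.replace(short, full)
        standardized[level] = cleaned
    return standardized
```

Because values that already contain the full form are left alone, applying the function twice gives the same result as applying it once.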
Step 3: Load the address matching module. The standardized address information is assembled into a query condition and used for a fast query against the standard address database. If the query returns nothing, the query condition is decremented and the standard address database is queried again, repeating until addresses are found. The standard address information found is then returned.
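The decrementing query can be sketched as follows, assuming the standard address database is an in-memory list of records and that the finest-grained condition is dropped first. Both are assumptions: the patent states only that the query condition is decremented until a result is found, and `decrementing_query` and the level names are hypothetical.

```python
# Levels ordered from coarsest to finest, as in the parsing step.
LEVELS = ["district", "township", "village_group",
          "neighbourhood_committee", "community", "building"]


def decrementing_query(parsed, standard_addresses, levels=LEVELS):
    """Return matching records, relaxing the query until something is found."""
    # Start with every parsed level as an equality condition.
    conditions = [(lvl, parsed[lvl]) for lvl in levels if lvl in parsed]
    while conditions:
        hits = [rec for rec in standard_addresses
                if all(rec.get(lvl) == val for lvl, val in conditions)]
        if hits:
            return hits
        conditions.pop()  # drop the finest remaining condition and retry
    return []  # nothing matched even on the coarsest level
```

In a real deployment the list comprehension would presumably be replaced by an indexed database query, and the candidates returned here would be handed to the screening step.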
Step 4: Load the address screening module. The matched standard address information may contain one or several addresses, and the most accurate or closest unique address must be selected from it. The candidates are first screened by computing the minimum edit distance to find the address with the smallest edit distance. If that address is unique, it is returned directly. If it is not unique, cosine similarity is then computed between the address to be matched and each of the standard addresses tied at the smallest edit distance. The candidate with the highest cosine similarity (that is, the smallest cosine distance) is selected and returned as the unique address.
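The two-stage screening can be sketched as below. The edit distance is the standard Levenshtein distance; the cosine similarity is computed over character-count vectors, which is an assumption because the patent does not say how addresses are vectorised. The function names are hypothetical.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cosine_similarity(a, b):
    """Cosine similarity of character-count vectors (assumed vectorisation)."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[ch] * vb[ch] for ch in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def screen(query, candidates):
    """Pick the candidate with the smallest edit distance, breaking ties by cosine similarity."""
    if not candidates:
        return None
    distances = [edit_distance(query, c) for c in candidates]
    best = min(distances)
    tied = [c for c, d in zip(candidates, distances) if d == best]
    if len(tied) == 1:
        return tied[0]
    return max(tied, key=lambda c: cosine_similarity(query, c))
```

Cosine similarity is only computed for candidates tied on edit distance, which keeps the more expensive comparison off the common path.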
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device.
Although embodiments of the invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims (5)

1. An address matching method based on dictionary and machine learning, comprising an address parsing module (1), an address standardization module (2), an address matching module (3), and an address screening module (4), characterized in that the address parsing module (1) parses the input address information; the parsed address data is input to the address standardization module (2) for standardization; the processed address information is input to the address matching module (3) for matching; and the matched address information is processed by the address screening module (4) to obtain the final standard address information.
2. The address matching method based on dictionary and machine learning according to claim 1, characterized in that the address parsing module (1) uses an address dictionary to parse the input address information level by level, in the order district, township, village group, neighbourhood committee, residential community, building.
3. The address matching method based on dictionary and machine learning according to claim 1, characterized in that the address standardization module (2) applies standardized filling and error correction to the parsed address information, then inputs the processed address information to the address matching module (3).
4. The address matching method based on dictionary and machine learning according to claim 1, characterized in that the address matching module (3) queries the standard address database with a decrementing query condition built from the standardized address information, finds multiple similar address records, and passes the records found to the address screening module (4).
5. The address matching method based on dictionary and machine learning according to claim 1, characterized in that the address screening module (4) first applies the minimum edit distance method to the queried address records to find the address with the smallest edit distance; if several addresses share the smallest edit distance, cosine similarity is computed for those addresses, and the address with the maximum cosine similarity is returned.
CN201711332274.0A 2017-12-13 2017-12-13 Address matching method based on dictionary and machine learning Pending CN108052609A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711332274.0A CN108052609A (en) 2017-12-13 2017-12-13 Address matching method based on dictionary and machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711332274.0A CN108052609A (en) 2017-12-13 2017-12-13 Address matching method based on dictionary and machine learning

Publications (1)

Publication Number Publication Date
CN108052609A (en) 2018-05-18

Family

ID=62132650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711332274.0A Pending CN108052609A (en) 2017-12-13 2017-12-13 Address matching method based on dictionary and machine learning

Country Status (1)

Country Link
CN (1) CN108052609A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090092323A1 (en) * 2007-10-04 2009-04-09 Weigen Qiu Systems and methods for character correction in communication devices
EP2974434A2 (en) * 2013-03-15 2016-01-20 Bell, Tyler Apparatus, systems, and methods for analyzing movements of target entities
CN103996021A (en) * 2014-05-08 2014-08-20 华东师范大学 Fusion method of multiple character identification results
CN105868305A (en) * 2016-03-25 2016-08-17 西安电子科技大学 A fuzzy matching-supporting cloud storage data dereplication method
CN106649803A (en) * 2016-12-29 2017-05-10 华南师范大学 Address matching method and system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021114825A1 (en) * 2020-05-15 2021-06-17 平安科技(深圳)有限公司 Method and device for institution standardization, electronic device, and storage medium
CN113746946A (en) * 2020-05-29 2021-12-03 Sap欧洲公司 Global address resolver
US11803748B2 (en) * 2020-05-29 2023-10-31 Sap Se Global address parser
CN113746946B (en) * 2020-05-29 2023-12-12 Sap欧洲公司 Global address resolver
CN112835894A (en) * 2021-01-25 2021-05-25 武汉烽火普天信息技术有限公司 Address matching method based on address coding and similarity calculation
CN114168705A (en) * 2021-12-03 2022-03-11 南京大峡谷信息科技有限公司 Chinese address matching method based on address element index

Similar Documents

Publication Publication Date Title
CN108052609A (en) Address matching method based on dictionary and machine learning
CN108369582B (en) Address error correction method and terminal
CN109101474B (en) Address aggregation method, package aggregation method and equipment
CN107609154A (en) Method and device for processing multi-source heterogeneous data
CN106547770B (en) User classification and user identification method and device based on user address information
CN106021336A (en) A method for automatic administrative district division for mass address information
CN102289467A (en) Method and device for determining target site
CN106326303A (en) Spoken language semantic analysis system and method
CN104252507B (en) Business data matching process and device
CN110968654A (en) Method, equipment and system for determining address category of text data
CN107577744A (en) Nonstandard Address automatic matching model, matching process and method for establishing model
CN109635084A (en) Real-time fast deduplication method and system for multi-source data documents
CN106991090A (en) The analysis method and device of public sentiment event entity
CN105488471B (en) Font recognition method and device
CN105224610A (en) Address comparison method and apparatus
CN105989015A (en) Capacity expanding method of database and database accessing method and device
CN103246655A (en) Text categorizing method, device and system
CN106469144A (en) Text similarity computing method and device
CN112241458A (en) Text knowledge structuring processing method, device, equipment and readable storage medium
CN106940711B (en) URL detection method and detection device
CN105159885A (en) Point-of-interest name identification method and device
CN109522335B (en) Information acquisition method and device and computer readable storage medium
CN105138708A (en) Method and device for identifying names of points of interest (POI)
CN110046341B (en) Method and system for matching information
CN107577667B (en) Entity word processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180518