CN110442603A

CN110442603A - Address matching method, apparatus, computer equipment and storage medium

Info

Publication number: CN110442603A
Application number: CN201910601364.8A
Authority: CN
Inventors: 申超波; 阮晓雯; 徐亮
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-07-03
Filing date: 2019-07-03
Publication date: 2019-11-12
Anticipated expiration: 2039-07-03
Also published as: CN110442603B; WO2021000831A1

Abstract

This application discloses an address matching method, apparatus, computer device, and storage medium. In the address matching method, the first address is the address to be retrieved that the user inputs, and the second address is stored in an index server. The method includes: calling a preset matching algorithm and segmenting the first address and the second address into words according to a first preset rule, obtaining a first token group corresponding to the first address and a second token group corresponding to the second address, where the preset matching algorithm includes a word segmentation calculation and a matching calculation; dividing the first address into multiple first segments according to the first token group, and dividing the second address into multiple second segments according to the second token group; obtaining the matching result of the first segments and the second segments according to a second preset rule, and judging whether the first address and the second address are identical. For the first four administrative levels of a segmented address, exact matching is performed against the national tree-structured address database of provinces, cities, districts and counties, and towns, and missing parts are effectively completed.

Description

Address matching method, apparatus, computer equipment and storage medium

Technical field

This application relates to the computer field, and in particular to an address matching method, apparatus, computer device, and storage medium.

Background technique

Traditional fuzzy address matching usually treats an address as one complete unit and performs NLP-based fuzzy matching on it. This approach has the following defects: 1) An address is a tree structure of address names, and the closer two addresses are at the lower levels of the tree, the more similar they are; comparing address names as flat wholes does not reflect how address names are actually distributed. 2) Comparison works poorly for short addresses, even though most short addresses are of relatively high value. 3) The parts of a single address do not carry equal value as individual words in practice. For example, in Shenzhen / Nanshan District / Tencent Building, the name "Tencent Building" is clearly the more valuable part of the effective address.

Summary of the invention

The main purpose of this application is to provide an address matching method that solves the technical problem of the defects in existing address matching.

This application proposes an address matching method in which the first address is the address to be retrieved that the user inputs and the second address is stored in an index server. The method includes:

calling a preset matching algorithm and segmenting the first address and the second address into words according to a first preset rule, obtaining a first token group corresponding to the first address and a second token group corresponding to the second address, where the preset matching algorithm includes a word segmentation calculation and a matching calculation;

dividing the first address into multiple first segments according to the first token group, and dividing the second address into multiple second segments according to the second token group;

obtaining the matching result of all the first segments and all the second segments according to a second preset rule;

judging whether the first address and the second address are identical according to the matching result.

This application also provides an address matching apparatus in which the first address is the address to be retrieved that the user inputs and the second address is stored in an index server. The apparatus includes:

a word segmentation module, for calling a preset matching algorithm and segmenting the first address and the second address into words according to a first preset rule, obtaining a first token group corresponding to the first address and a second token group corresponding to the second address, where the preset matching algorithm includes a word segmentation calculation and a matching calculation;

a division module, for dividing the first address into multiple first segments according to the first token group, and dividing the second address into multiple second segments according to the second token group;

a second obtaining module, for obtaining the matching result of all the first segments and all the second segments according to a second preset rule;

a judgment module, for judging whether the first address and the second address are identical according to the matching result.

This application also provides a computer device including a memory and a processor. The memory stores a computer program, and the processor implements the steps of the above method when it executes the computer program.

This application also provides a computer-readable storage medium on which a computer program is stored. The steps of the above method are implemented when the computer program is executed by a processor.

For the first four administrative levels of a segmented address, this application performs exact matching against the national tree-structured address database of provinces, cities, districts and counties, and towns, and effectively completes missing parts. The data pre-stored in the index server is unstructured data stored in key-value columnar form. Unstructured data here means text, images, voice, and similar data held in columnar storage built on NoSQL storage technology. Because the data volume is very large, storage and computation require NoSQL technology with a distributed architecture; the index server combines NoSQL distributed storage with an index structure to query and compute over massive data quickly and in real time.

This application proposes a configurable-weight address matching model based on hierarchical address segmentation. A natural language processing model first segments the address name into a token group; the token group is then divided into segments by administrative level, and each segment is mapped to a node of a tree structure. This takes full account of the tree structure of addresses: the address is graded and segmented by administrative level, the segment of each level is matched with its own weight, and the weights can be fine-tuned for the actual business scenario.

By building an index structure over the massive data pre-stored in the index server, and drawing on the computing architecture and distributed computing capability of the Elasticsearch component, this application queries the first address in the preset index structure quickly and in real time. The default weights are obtained by model training: the training parameters, which include each weight value, are adjusted continually during training until the similarity output by the model equals the pre-labeled similarity value or falls within a preset deviation range. Determining each weight value in this way makes the weight settings more reliable.

Brief description of the drawings

Fig. 1 is a schematic flowchart of the address matching method of one embodiment of this application;

Fig. 2 is a schematic structural diagram of the address matching apparatus of one embodiment of this application;

Fig. 3 is a schematic diagram of the internal structure of the computer device of one embodiment of this application.

Specific embodiment

To make the objects, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here only explain the application and do not limit it.

Referring to Fig. 1, in the address matching method of one embodiment of this application, the first address is the address to be retrieved that the user inputs, and the second address is stored in an index server. The method includes:

S1: Call the preset matching algorithm and segment the first address and the second address into words according to the first preset rule, obtaining the first token group corresponding to the first address and the second token group corresponding to the second address, where the preset matching algorithm includes a word segmentation calculation and a matching calculation.

This embodiment compares the similarity of the first address and the second address. Both addresses are written from the highest administrative level to the lowest, that is, from the broad area down to the specific location. The first preset rule applies different word segmentation rules depending on the administrative level of each part of the address. For example, the tokens of the four nationally standardized levels (province / city / district, county / town, township) are usually obtained with the help of the national address database: "Guangdong Province Foshan City Nanhai District Guicheng Town" is segmented as Guangdong Province / Foshan City / Nanhai District / Guicheng Town. Address information outside these four levels is segmented by semantic word segmentation.

S2: Divide the first address into multiple first segments according to the first token group, and divide the second address into multiple second segments according to the second token group.

In this embodiment, an address is divided into segments, or into administrative levels, according to its token group; each segment or administrative level corresponds to one or more tokens. The terms "first", "second", and so on are used only to distinguish, for example, the first segments of the first address from the second segments of the second address. They are not limiting, and similar terms elsewhere have the same effect and are not explained again. A token group is the sequence of tokens of an actual address, arranged in the writing order of the original address. For example, the longer name "certain city development zone" yields two tokens, "certain city / development zone", but segmentation groups tokens by administrative level, so "certain city development zone" forms a single segment.

S3: Obtain the matching result of all the first segments and all the second segments according to the second preset rule.

In this embodiment, the first segments and the second segments are matched one by one according to their administrative levels to obtain the matching result. For example, the province-level first segment of the first address is compared with the province-level second segment of the second address, which keeps the comparison symmetric and makes it more reliable.

S4: Judge whether the first address and the second address are identical according to the matching result.

In this embodiment, the first address and the second address are compared one to one by administrative level. When the matching rate of the two addresses reaches a preset range, the first address and the second address are judged to be identical; otherwise they are judged to be different. In other embodiments, the matching rate must reach the preset range and, in addition, the segments of specified administrative levels must match 100% before the two addresses are judged identical, which improves matching accuracy.
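The decision rule of step S4 can be sketched as follows. This is a minimal illustration only: the function name, the level names, and the threshold value 0.9 are assumptions and do not come from the application, which speaks only of a "preset range".

```python
def is_same_address(match_rate, level_scores, threshold=0.9, required_levels=()):
    """Decide whether two addresses are the same.

    match_rate      -- overall weighted matching rate in [0, 1]
    level_scores    -- dict: level name -> matching value in [0, 1]
    threshold       -- assumed lower bound of the "preset range"
    required_levels -- levels that must match 100% (stricter embodiment)
    """
    if match_rate < threshold:
        return False
    # Every specified level must have matched completely.
    return all(level_scores.get(level, 0.0) == 1.0 for level in required_levels)


scores = {"community": 1.0, "detail": 0.5}
print(is_same_address(0.95, scores))                               # True
print(is_same_address(0.95, scores, required_levels=("detail",)))  # False
```

With no required levels the function reduces to the plain threshold test of the basic embodiment.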

The first address in this embodiment is the address to be queried that the user inputs. Its data composition is not restricted, and the matching calculation can be performed on any address to be queried, which gives users more flexibility and freedom. For example, the first address may consist of data arranged in order across six administrative levels (province / city / district, county / town, township / road, community, mansion / building and room number), or of data lacking one or several of these levels. The preset matching condition of this embodiment includes the matching rate reaching a preset threshold, or the tag data in the first address matching 100%, and so on. Tag data is the data in the first address that pinpoints a geographic location, such as the name of a community or of a mansion; for example, "Jiangnan Mingju Community Rongyuan" in the first address is tag data. In another embodiment of this application, the tag data of the first address is the data that follows the "town, township" level and precedes the building and room number.

Further, the first address and the second address each include a range address and a tag address, and step S1, in which the preset matching algorithm is called and the first address and the second address are segmented into words according to the first preset rule to obtain the first token group corresponding to the first address and the second token group corresponding to the second address, includes:

S11: Segment the range address of the first address and the range address of the second address with the address dictionary pre-associated with the natural language processing model, obtaining the first token part of the first address and the first token part of the second address.

The range address of this embodiment includes at least one of the four administrative levels province / city / district, county / town, township. It is segmented with the pre-associated address dictionary, which is the dictionary corresponding to the national address database and is associated with the natural language processing model in advance. To improve address matching precision, when the open-source segmentation package jieba performs the word segmentation calculation, a crawled address library is added and used together with the national address library to correct the address to be segmented; the address is then segmented by administrative level, which improves segmentation accuracy. The method judges whether the administrative levels contained in the current address are levels covered by the address dictionary and, if so, calls the address dictionary for the segmentation calculation. For example, the address "Guangdong Province Foshan City Nanhai District Guicheng Town Jiangnan Mingju Community Rongyuan Building 1 Room 306" contains the four administrative levels covered by the address dictionary, so those four levels are segmented according to the dictionary, giving: Guangdong Province / Foshan City / Nanhai District / Guicheng Town / Jiangnan Mingju Community Rongyuan Building 1 Room 306. The first token part is therefore Guangdong Province / Foshan City / Nanhai District / Guicheng Town.
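The dictionary-driven segmentation of the range address can be sketched without any third-party package. The tree below is a hypothetical one-branch slice standing in for the national address database, the English place names are stand-ins for the Chinese originals, and the prefix-walking strategy is an assumption rather than the application's actual jieba configuration.

```python
# Hypothetical slice of a national address tree: parent -> children.
# The empty string stands for the root above the province level.
ADDRESS_TREE = {
    "": ["Guangdong Province"],
    "Guangdong Province": ["Foshan City"],
    "Foshan City": ["Nanhai District"],
    "Nanhai District": ["Guicheng Town"],
}


def segment_range_address(address):
    """Peel administrative levels off the front of `address` by walking the tree.

    Returns (range_tokens, remainder), where remainder is the text left over
    for the tag-address and detail-address segmentation steps.
    """
    tokens, parent, rest = [], "", address.strip()
    while True:
        for child in ADDRESS_TREE.get(parent, []):
            if rest.startswith(child):
                tokens.append(child)
                rest = rest[len(child):].strip()
                parent = child
                break
        else:
            # No child of the current level prefixes the remaining text.
            break
    return tokens, rest


full = ("Guangdong Province Foshan City Nanhai District Guicheng Town "
        "Jiangnan Mingju Community Rongyuan Building 1 Room 306")
range_tokens, remainder = segment_range_address(full)
print(range_tokens)
print(remainder)
```

Walking the tree level by level means a token is accepted only if it is a child of the previously matched level, which mirrors the tree-shaped address base described above.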

S12: Segment the tag address of the first address and the tag address of the second address according to the first grammar model in the natural language processing model, obtaining the second token part of the first address and the second token part of the second address.

The tag address of this embodiment contains the information that pinpoints a geographic location, such as the name of a community or of a mansion; "Jiangnan Mingju Community Rongyuan" in the address above is an example. The tag address is segmented according to the first grammar model in the natural language processing model, which includes but is not limited to patterns such as "so-and-so community" and "so-and-so mansion". For example, for "Guicheng Town Jiangnan Mingju Community Rongyuan Building 1 Room 306", the second token part is "Guicheng / Jiangnan Mingju Community / Rongyuan". In another embodiment of this application, the first grammar model extracts the characters after "town, township" and before the building and room number as the tag address.
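A rough sketch of tag-address segmentation follows. It assumes English stand-ins for the "so-and-so community" and "so-and-so mansion" patterns, and it assumes the detail address begins at a "Building N" or "Room N" marker; neither the suffix list nor the regular expression comes from the application.

```python
import re

# Assumed English stand-ins for the suffixes in the first grammar model.
TAG_SUFFIXES = ("Community", "Mansion")
# Assumed marker for where the detail address (building / room number) begins.
DETAIL_START = re.compile(r"\b(?:Building|Room)\s+\d+")


def segment_tag_address(remainder):
    """Split the text that follows the town level into tag tokens.

    Everything from the first building/room number onward is left out,
    and a token is closed whenever a known suffix word is seen.
    """
    m = DETAIL_START.search(remainder)
    tag_text = remainder[:m.start()] if m else remainder
    tokens, current = [], []
    for word in tag_text.split():
        current.append(word)
        if word in TAG_SUFFIXES:
            tokens.append(" ".join(current))
            current = []
    if current:
        tokens.append(" ".join(current))
    return tokens


print(segment_tag_address("Jiangnan Mingju Community Rongyuan Building 1 Room 306"))
# ['Jiangnan Mingju Community', 'Rongyuan']
```

A name without any known suffix stays a single token in this sketch; a real grammar model would need further rules to split it.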

S13: Combine the first token part and the second token part of the first address into the first token group corresponding to the first address, and combine the first token part and the second token part of the second address into the second token group corresponding to the second address.

The first address or the second address of this embodiment includes a range address and a tag address, arranged in order from left to right. For example, the first address is "Guangdong Province Foshan City Nanhai District Guicheng Town Jiangnan Mingju Community Rongyuan" and the second address is "Guangdong Province Foshan City Nanhai District Guicheng Town Jiangnan Mingju Rongyuan". The first token group, for the first address, is "Guangdong Province / Foshan City / Nanhai District / Guicheng Town / Jiangnan Mingju Community / Rongyuan", and the second token group, for the second address, is "Guangdong Province / Foshan City / Nanhai District / Guicheng Town / Jiangnan Mingju / Rongyuan".

Further, the first address and the second address each also include a detail address, and after step S12, in which the tag addresses of the first address and the second address are segmented according to the first grammar model in the natural language processing model to obtain the second token part of the first address and the second token part of the second address, the method includes:

S14: Segment the detail address of the first address and the detail address of the second address according to the second grammar model in the natural language processing model, obtaining the third token part of the first address and the third token part of the second address.

The detail address of this embodiment is the specific building and room number. It has little effect on whether two addresses are judged similar, and other embodiments may ignore it altogether. Some application scenarios, however, need precision down to the detail address to meet business requirements. The second grammar model of this embodiment includes but is not limited to patterns such as "building X", "building X floor Y", and "building X floor Y room Z".
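The detail-address patterns can be approximated with a single regular expression. The unit words "Building", "Floor", and "Room" are assumed English renderings of the patterns in the second grammar model, and the function name is hypothetical.

```python
import re

# Assumed unit words for the "building X", "floor Y", "room Z" patterns.
DETAIL_PATTERN = re.compile(r"\b(?:Building|Floor|Room)\s+(\d+)")


def segment_detail_address(text):
    """Return the numeric fields of the detail address in writing order."""
    return DETAIL_PATTERN.findall(text)


print(segment_detail_address("Rongyuan Building 1 Room 306"))  # ['1', '306']
```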

S15: Combine the first token part, the second token part, and the third token part of the first address into the first token group corresponding to the first address, and combine the first token part, the second token part, and the third token part of the second address into the second token group corresponding to the second address.

The first address or the second address of this embodiment includes a range address, a tag address, and a detail address, arranged in order from left to right. For example, the first address is "Guangdong Province Foshan City Nanhai District Guicheng Town Jiangnan Mingju Community Rongyuan Building 1 Room 306" and the second address is "Guangdong Province Foshan City Nanhai District Guicheng Town Jiangnan Mingju Rongyuan Building 1 Room 502". The first token group, for the first address, is "Guangdong Province / Foshan City / Nanhai District / Guicheng Town / Jiangnan Mingju Community / Rongyuan / 1 / 306", and the second token group, for the second address, is "Guangdong Province / Foshan City / Nanhai District / Guicheng Town / Jiangnan Mingju / Rongyuan / 1 / 502". The first address and the second address are then divided into segments, or administrative levels, according to these token groups.

Further, the range address includes the four administrative levels province / city / district, county / town, township, the tag address includes a community name or a building name, and step S3, in which the matching result of all the first segments and all the second segments is obtained according to the second preset rule, includes:

S31: Map all the first segments and all the second segments, in order of administrative level from high to low, into two structure trees of identical structure, where each structure tree includes multiple nodes and each node corresponds one to one with a first segment or a second segment.

In this embodiment, all the first segments of the first address and all the second segments of the second address are mapped, in order of administrative level from high to low, into two structure trees of identical structure. A node corresponds to at least one segment, or to multiple tokens of the same administrative level. For example, the token "Guangdong Province", which belongs to the highest level "province" in the address, becomes the root node; the token "Foshan City" of the next-level child node "city" is connected below it, and so on down to an end node such as "Building 1 Room 502". Depending on the specific address, the administrative levels of the root node and the end node differ: the tree may represent a full address that covers all administrative levels or a short address that covers only some of them.
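One way to picture the mapping of step S31 is a chain of nodes ordered by administrative level. The sketch below is illustrative: the `Node` class, the level names in `LEVELS`, and the choice of a singly linked chain are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed names for the six levels, from highest to lowest.
LEVELS = ["province", "city", "district", "town", "community", "detail"]


@dataclass
class Node:
    level: str
    tokens: List[str]              # one node may hold several tokens of one level
    child: Optional["Node"] = None


def build_tree(segments):
    """Map {level: tokens} to a chain of nodes ordered from high to low level.

    Levels missing from `segments` are skipped, so a short address simply
    produces a shorter chain whose root is its highest available level.
    """
    root = tail = None
    for level in LEVELS:
        if level not in segments:
            continue
        node = Node(level, list(segments[level]))
        if root is None:
            root = tail = node
        else:
            tail.child = node
            tail = node
    return root


def levels_of(root):
    """List the levels along the chain, root first."""
    out, node = [], root
    while node is not None:
        out.append(node.level)
        node = node.child
    return out


first = build_tree({
    "province": ["Guangdong Province"], "city": ["Foshan City"],
    "district": ["Nanhai District"], "town": ["Guicheng Town"],
    "community": ["Jiangnan Mingju Community", "Rongyuan"],
    "detail": ["1", "306"],
})
print(levels_of(first))
```

Because both addresses are mapped with the same level order, nodes of the two trees can be paired by level name in the matching step.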

S32: Obtain the matching value of each node of the two structure trees.

The matching calculation of this embodiment maps the nodes of the two structure trees to each other according to administrative level and computes the matching value of each node from that correspondence. The matching value of a node is the number of matched segments divided by the total number of segments at that node. For example, if the "province" node of the first address has the value "Guangdong" and the "province" node of the second address also has the value "Guangdong", the nodes match; otherwise they do not.

S33: Obtain the first weight corresponding to the range address, the second weight corresponding to the tag address, and the third weight corresponding to the detail address.

Because the segments of different administrative levels influence an address to different degrees, this embodiment assigns them different weights, which makes it more flexible in meeting business requirements. For example, the second weight of the tag address may be higher than the first weight of the range address.

S34: Calculate the matching rates by multiplying each matching value by its weight, obtaining the first matching rate corresponding to the range address, the second matching rate corresponding to the tag address, and the third matching rate corresponding to the detail address.

The matching rate of this embodiment is calculated as follows: the matching result of each segment multiplied by the weight configured for that segment gives the matching rate of that segment, and summing the matching rates of all segments gives the matching result of the first address and the second address.
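The calculation described above can be written compactly. The symbols are introduced here for illustration and do not appear in the application: $m_i$ is the matching value of segment $i$, $w_i$ is its configured weight, $n$ is the number of segments, and $R$ is the overall matching result.

```latex
R = \sum_{i=1}^{n} w_i \, m_i , \qquad 0 \le m_i \le 1
```

Grouping the terms by range address, tag address, and detail address gives the first, second, and third matching rates, whose sum is again $R$.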

S35: Take the sum of the first matching rate, the second matching rate, and the third matching rate as the matching result of all the first segments and all the second segments.

Further, step S32, in which the matching value of each node of the two structure trees is obtained, includes:

S321: Match each first segment of the range address in the first address against each second segment of the range address in the second address, one to one according to the node correspondence, by exact full matching, obtaining each first matching value.

Nodes of different administrative levels are matched in different ways in this embodiment. The four administrative levels province / city / district, county / town, township are matched exactly: two nodes match only if their characters are 100% identical, and otherwise they do not match. For example, if the "province" node of the first address has the value "Guangdong" and the "province" node of the second address also has the value "Guangdong", the nodes match.

S322: Match each first segment of the tag address in the first address against each second segment of the tag address in the second address, one to one according to the node correspondence, by model keyword matching, obtaining each second matching value.

The segments of the tag address are matched by NLP (Natural Language Processing) model matching, in which a containment relationship in either direction counts as a match. For example, "Jiangnan Mingju Community / Rongyuan" and "Jiangnan Mingju / Rongyuan" are not character-for-character equal, but "Jiangnan Mingju Community" contains the characters "Jiangnan Mingju", so the segments still match one to one.

S323: Match each first segment of the detail address in the first address against each second segment of the detail address in the second address, one to one according to the node correspondence, by numeric matching, obtaining each third matching value.

If the detail address of this embodiment includes a first specified number of segments, of which a second specified number satisfy the matching relationship, then the matching value of the detail address is the second specified number divided by the first specified number.
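The three matching modes of steps S321 to S323 can be sketched as three small functions. The containment test below is a plain substring check standing in for the NLP model matching, which the application does not specify in detail; all function names are hypothetical.

```python
def exact_match(a, b):
    """Range-address levels: 1.0 only when the characters are identical."""
    return 1.0 if a == b else 0.0


def containment_match(tokens_a, tokens_b):
    """Tag address: paired tokens match when either one contains the other.

    A simple substring test used here as a stand-in for model matching.
    """
    if not tokens_a or len(tokens_a) != len(tokens_b):
        return 0.0
    ok = all(a in b or b in a for a, b in zip(tokens_a, tokens_b))
    return 1.0 if ok else 0.0


def detail_match(fields_a, fields_b):
    """Detail address: matched fields divided by the number of fields."""
    if not fields_a:
        return 0.0
    matched = sum(1 for a, b in zip(fields_a, fields_b) if a == b)
    return matched / len(fields_a)


print(exact_match("Guangdong", "Guangdong"))                      # 1.0
print(containment_match(["Jiangnan Mingju Community", "Rongyuan"],
                        ["Jiangnan Mingju", "Rongyuan"]))         # 1.0
print(detail_match(["1", "306"], ["1", "502"]))                   # 0.5
```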

S324: Collect each first matching value, each second matching value, and each third matching value to obtain the matching value of each node of the two structure trees.

For example, the token group of the first address is Guangdong / Foshan City / Nanhai / Guicheng / Jiangnan Mingju Community / Rongyuan / 1 / 306, and the token group of the second address is Guangdong / Foshan City / Nanhai / Guicheng / Jiangnan Mingju / Rongyuan / 1 / 502. After segmentation, both addresses are divided into six administrative levels (province / city / district, county / town, township / road, community, mansion / building and room number), which map to six nodes with default weights 0.1 / 0.1 / 0.1 / 0.1 / 0.5 / 0.1. The first four levels use 100% character matching: Guangdong / Foshan City / Nanhai / Guicheng give 0.1 × 1, 0.1 × 1, 0.1 × 1, and 0.1 × 1. The fifth level uses model matching based on character containment: "Jiangnan Mingju Community / Rongyuan" against "Jiangnan Mingju / Rongyuan" gives 0.5 × 1. The sixth level uses fuzzy matching: of the two fields in "1 / 306" and "1 / 502", only one matches (306 and 502 do not), so the matching value is 0.5 and the result is 0.5 × 0.1 = 0.05. The matching rate of the first address and the second address is therefore 0.1 + 0.1 + 0.1 + 0.1 + 0.5 + 0.05 = 0.95.
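The arithmetic of this worked example can be reproduced in a few lines. The weights and matching values are taken from the example itself; the level names and the function name are assumptions added for readability.

```python
# Assumed level names; weights and matching values come from the example.
LEVELS = ["province", "city", "district", "town", "community", "detail"]
WEIGHTS = [0.1, 0.1, 0.1, 0.1, 0.5, 0.1]
MATCH_VALUES = [1.0, 1.0, 1.0, 1.0, 1.0, 0.5]


def matching_rate(weights, values):
    """Weighted sum of per-level matching values."""
    return sum(w * v for w, v in zip(weights, values))


rate = matching_rate(WEIGHTS, MATCH_VALUES)
print(round(rate, 2))  # 0.95
```

The result is rounded before printing because binary floating-point addition of these decimals is not exact.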

Further, before step S33, in which the first weight corresponding to the range address, the second weight corresponding to the tag address, and the third weight corresponding to the detail address are obtained, the method includes:

S331: Input a specified number of training samples with pre-labeled similarity values into the natural language processing model for training.

S332: Adjust the training parameters to first parameters such that the similarity value output by the natural language processing model is consistent with the pre-labeled similarity value.

S333: Assign the weight values in the first parameters to the first weight, the second weight, and the third weight according to the node correspondence.

The default weights of this embodiment are obtained by model training. The training parameters, which include each weight value, are adjusted continually during training until the similarity output by the model equals the pre-labeled similarity value or falls within a preset deviation range, and each weight value is determined in this way. Other embodiments of this application may also adjust one or more of the default weights for a specific application scenario, so that the matching model fits that scenario better.
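The application does not say how the training parameters are adjusted. As one possible illustration, and not as the application's own procedure, the weights of a linear weighted sum can be fitted to pre-labeled similarity values by plain gradient descent on the mean squared error. The function name, the learning rate, and the epoch count are all assumptions.

```python
def fit_weights(samples, n_levels, lr=0.1, epochs=2000):
    """Fit per-level weights so that sum(w_i * m_i) approximates the label.

    samples -- list of (match_values, labeled_similarity) pairs
    Uses batch gradient descent on the mean squared error, a substitute
    technique chosen for this sketch.
    """
    w = [1.0 / n_levels] * n_levels
    for _ in range(epochs):
        grad = [0.0] * n_levels
        for m, y in samples:
            err = sum(wi * mi for wi, mi in zip(w, m)) - y
            for i, mi in enumerate(m):
                grad[i] += 2.0 * err * mi / len(samples)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


# Toy labels generated from hidden weights 0.2 (range) and 0.8 (tag).
toy = [([1, 1], 1.0), ([1, 0], 0.2), ([0, 1], 0.8), ([0, 0], 0.0)]
print([round(x, 3) for x in fit_weights(toy, 2)])
```

Training stops after a fixed number of epochs here; a stopping test on the deviation between output and labeled similarity would correspond more closely to the "preset deviation range" mentioned above.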

Further, before step S11, in which the range addresses of the first address and the second address are segmented with the address dictionary pre-associated with the natural language processing model to obtain the first token part of the first address and the first token part of the second address, the method includes:

S10: Call the address database according to a third preset rule to correct the first address and the second address.

The first address or the second address of this embodiment may contain address data that does not conform to the national address database. Such an address can be corrected by calling the address database; correction includes address completion, removal of qualifiers, and so on. During address completion, a parent node can be completed from a child node, for example Nanhai District is completed upward with Foshan City; or an intermediate node can be completed from the nodes before and after it, for example Nanhai District is filled in between Foshan City and Guicheng Town.
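Both kinds of completion can be sketched with a child-to-parent lookup. The `PARENT` table is a hypothetical one-branch slice of the address tree, and the sketch assumes that place names are unique, which a real national address database does not guarantee.

```python
# Hypothetical child -> parent slice of the address tree.
PARENT = {
    "Guicheng Town": "Nanhai District",
    "Nanhai District": "Foshan City",
    "Foshan City": "Guangdong Province",
}


def complete_address(tokens):
    """Fill in missing ancestors above and between the given range tokens.

    Climbs from the lowest-level token to the root, which restores both
    missing upper levels and missing intermediate levels.
    """
    if not tokens:
        return []
    chain, node = [], tokens[-1]
    while node is not None:
        chain.append(node)
        node = PARENT.get(node)
    chain.reverse()
    # Every token supplied by the user must lie on the recovered branch.
    missing = [t for t in tokens if t not in chain]
    if missing:
        raise ValueError(f"tokens not on one branch: {missing}")
    return chain


print(complete_address(["Nanhai District"]))
print(complete_address(["Foshan City", "Guicheng Town"]))
```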

Further, before step S1 of calling the preset matching algorithm and word-segmenting the first address and the second address respectively according to the first preset rules, to obtain the first participle group corresponding to the first address and the second participle group corresponding to the second address, the method comprises:

S1a: indexing a specified quantity of unstructured address data pre-stored in the index server, to obtain the default index structure.

The data pre-stored in the index server of this embodiment is unstructured data stored as key-value pairs in a column-oriented form. Unstructured data here refers to text, images, speech and similar content held in column storage built on NoSQL storage technology. Because the data volume is very large, it must be stored and computed with distributed NoSQL technology; the index server combines NoSQL distributed storage with an index structure to support fast, real-time query and computation over massive data. NoSQL, that is, non-relational database technology, is open source. Elasticsearch stores data as key-value pairs with an inverted index and performs most of its computation in memory, which allows fast real-time computation.

S1b: receiving the interface plug-in uploaded to a specified directory of the index server, wherein the interface plug-in is formed by packaging and encapsulating the preset matching algorithm.

The index server of this embodiment is an open source component that supports plug-ins. The interface plug-in can inherit from its org.elasticsearch.plugins.Plugin class to extend the server with a custom address matching algorithm component, which can be loaded and used after the index server is restarted.

S1c: obtaining the configuration parameters of the interface plug-in.

S1d: establishing a computation association between the default index structure and the interface plug-in by running the configuration parameters.

In this embodiment, once the preset matching algorithm has been developed, it is packaged, uploaded to the specified directory of the index server, and configured with the relevant parameters. Loading and running the configuration parameters establishes the computation association between the default index structure and the interface plug-in, so that the address matching algorithm in the plug-in can be called to complete the matching computation of the first address within the default index structure and thereby query the address data.

The index server of this embodiment is the open source Elasticsearch component, a distributed full-text search engine that exposes a RESTful web interface and can query massive data quickly and in real time. The query procedure includes: (1) importing the addresses of the massive address base into the underlying Elasticsearch storage as key-value pairs through the Elasticsearch data import interface, and building an index on the keys; (2) adapting the address matching model to the custom extended search model of Elasticsearch, adding it to the extension module of the Elasticsearch master node, and restarting Elasticsearch, so that the address matching model can use the distributed storage and highly concurrent computation of Elasticsearch; (3) using the custom model to develop a one-to-many massive address matching interface on Elasticsearch; (4) developing an upper-layer interface on Elasticsearch through which a new address can be entered and a massive address base and custom model selected for matching, so that the new address is compared against the addresses in the address base quickly and in real time on Elasticsearch and the top N most similar addresses are returned, where N can be passed in as a parameter. By building an index structure over the massive data pre-stored in the index server and drawing on the computing architecture and distributed computing capability of Elasticsearch, this embodiment queries the first address in the default index structure quickly and in real time.
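Step (4) above, returning the top N most similar addresses, can be sketched locally as follows. This is an in-memory stand-in and does not call Elasticsearch; `top_n` and `token_overlap` are hypothetical names, and the overlap score is a placeholder assumption rather than the matching model of this embodiment.

```python
import heapq


def token_overlap(a, b):
    """Placeholder similarity: shared slash-separated tokens over all tokens."""
    tokens_a, tokens_b = set(a.split("/")), set(b.split("/"))
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def top_n(query, address_base, n=3, score=token_overlap):
    """Return the n candidates with the highest score against the query."""
    return heapq.nlargest(n, address_base, key=lambda candidate: score(query, candidate))
```

The parameter `n` plays the role of the N that the upper-layer interface accepts; in the embodiment the scoring itself would run inside the index server rather than in the caller.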

In this embodiment, the segments of the first address that correspond to different administrative levels are matched by different methods and matching models, and each segment has its own matching weight. The first address is divided into six segments corresponding to six administrative levels, that is, to six nodes in the tree structure. The first four administrative levels share the same matching model, a character-by-character match; the fifth level uses a fuzzy matching model based on containing or being contained; the sixth level uses a digit matching model. A filtering mechanism is applied during the matching computation: the target segments for the four administrative levels "province / city / district or county / town, township or road" are first matched exactly, character by character. When the matching result for these target segments is below a preset threshold, it is determined that the default index structure contains no address data that meets the preset matching condition with the first address, and the matching conclusion is output directly, which reduces the amount of matching computation and improves response speed. With this filtering mechanism at least 90% of addresses can be filtered out, so an address ultimately needs full matching against only the remaining 10% or so, which greatly saves computing resources.
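The filtering mechanism can be sketched as an exact comparison of the first four levels followed by a threshold test. The function names, the default threshold of 1.0 and the representation of an address as a list of level strings are assumptions made for illustration only.

```python
def admin_match_ratio(query_levels, candidate_levels):
    """Share of the first four administrative levels that are identical."""
    pairs = list(zip(query_levels[:4], candidate_levels[:4]))
    return sum(q == c for q, c in pairs) / len(pairs)


def prefilter(query_levels, candidates, threshold=1.0):
    """Keep only candidates whose first four levels reach the threshold.

    Candidates dropped here never reach the more expensive full matching.
    """
    return [c for c in candidates if admin_match_ratio(query_levels, c) >= threshold]
```

Only the candidates that survive `prefilter` would then go through the fifth-level and sixth-level matching.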

Referring to Fig. 2, in the address matching device of one embodiment of the application, the first address is an address to be retrieved that is input by a user, the second address is stored in an index server, and the device includes:

Word segmentation module 1, it is respectively that the first address and the second address is pre- according to first for calling the preset matching algorithm If rule is segmented, corresponding second participle of the corresponding first participle group in first address and second address is obtained Group, wherein the preset matching algorithm includes participle calculating and matching primitives.

Division module 2, for first address to be divided into multiple first segmentations according to the first participle group, according to Second address is divided into multiple second segmentations by the second participle group.

First obtains module 3, for obtaining all first segmentations and all described second according to the second preset rules The matching result of segmentation.

Judgment module 4, for judging whether first address and second address are identical according to the matching result.

Further, the word segmentation module 1, comprising:

First participle unit, for by the corresponding range address in first address and second address, according to Pre-association address dictionary is segmented in Natural Language Processing Models, respectively obtains the corresponding first participle portion in first address Divide first participle part corresponding with second address.

The range address of this embodiment includes at least one of the four administrative levels province / city / district or county / township or town. The range address is word-segmented with a pre-associated address dictionary, namely the dictionary of the national address database, which is associated with the Natural Language Processing Model in advance so that address names can be segmented. To improve address matching accuracy, this embodiment adds a crawled address base when segmenting with the open source segmentation package jieba and uses it together with the national address base to correct the address to be segmented, then segments according to administrative level, which improves segmentation accuracy. Whether to call the address dictionary is decided by judging whether the administrative levels contained in the current address are levels covered by the dictionary; if so, the dictionary is called for segmentation. For example, the address "Guangdong Province Foshan City Nanhai District Guicheng Town Jiangnan Mingju Community Rongyuan Block 1 Room 306" contains the four administrative levels covered by the address dictionary, so those four levels are segmented according to the dictionary, giving: Guangdong Province / Foshan City / Nanhai District / Guicheng Town / Jiangnan Mingju Community Rongyuan Block 1 Room 306. The first participle part is then Guangdong Province / Foshan City / Nanhai District / Guicheng Town.
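Dictionary-based segmentation of the range address can be sketched with forward maximum matching. This is a simplified stand-in and is not the jieba-based procedure of this embodiment; `ADDRESS_DICT` is a hypothetical four-entry dictionary, and the Chinese strings are an assumed back-translation of the example address above.

```python
# Hypothetical extract of the address dictionary (assumed back-translation)
ADDRESS_DICT = {"广东省", "佛山市", "南海区", "桂城镇"}
MAX_LEN = max(len(word) for word in ADDRESS_DICT)


def segment_range_address(address):
    """Split off leading dictionary words; return (tokens, remainder)."""
    tokens, pos = [], 0
    while pos < len(address):
        # Try the longest possible dictionary word first
        for length in range(min(MAX_LEN, len(address) - pos), 0, -1):
            word = address[pos:pos + length]
            if word in ADDRESS_DICT:
                tokens.append(word)
                pos += length
                break
        else:
            # No dictionary word starts here: the range address has ended
            break
    return tokens, address[pos:]
```

The returned tokens correspond to the first participle part, and the remainder is what the first and second syntactic models would go on to segment.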

Second participle unit, for word-segmenting the tag addresses of the first address and the second address according to the first syntactic model in the Natural Language Processing Model, to obtain the second participle part corresponding to the first address and the second participle part corresponding to the second address respectively.

First component units, for the corresponding first participle part in first address and first address is corresponding Second participle part forms the corresponding first participle group in first address, by the corresponding first participle part in second address The second participle corresponding with second address part forms the corresponding second participle group in second address.

Further, first address and second address respectively further comprise details address, the word segmentation module 1, Include:

Third participle unit, for word-segmenting the details addresses of the first address and the second address according to the second syntactic model in the Natural Language Processing Model, to obtain the third participle part corresponding to the first address and the third participle part corresponding to the second address respectively.

Second composition unit, for forming the first participle group corresponding to the first address from the first participle part, the second participle part and the third participle part corresponding to the first address, and forming the second participle group corresponding to the second address from the first participle part, the second participle part and the third participle part corresponding to the second address.

Further, the range address includes the four administrative levels province / city / district or county / township or town, and the first obtaining module 3 comprises:

Mapping unit, for mapping all the first segmentations and all the second segmentations, in order of administrative level from high to low, into two structure trees of identical structure, wherein each structure tree includes multiple nodes and each node corresponds one-to-one to a first segmentation or a second segmentation.

First acquisition unit is used for the corresponding matching value of each node of structure tree of acquisition two.

This embodiment maps the nodes of the two structure trees to one another according to the correspondence of administrative levels, and obtains the matching value of each node from that correspondence; the matching value is the number of matching segments divided by the number of all segments at the node. For example, if the "province" node of the first address has the value "Guangdong" and the "province" node of the second address also has the value "Guangdong", the nodes match; otherwise they do not.

Second acquisition unit, it is corresponding for obtaining corresponding first weight of the range address, the tag addresses respectively The corresponding third weight of the second weight and the details address.

Computing unit respectively obtains the range address pair for calculating matching rate multiplied by respective weights according to matching value The first matching rate, the corresponding third matching rate of corresponding second matching rate of the tag addresses and the details address answered.

Summation unit, for taking the sum of the first matching rate, the second matching rate and the third matching rate as the matching result of all the first segmentations and all the second segmentations.
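The work of the computing unit and the summation unit can be sketched as follows, assuming each address part contributes the share of its matching nodes multiplied by that part's weight. The function name, the dictionary keys and the example weights are hypothetical.

```python
def matching_result(node_matches, weights):
    """Sum the weighted matching rates of the three address parts.

    node_matches: part name -> list of per-node matching values (0 or 1)
    weights:      part name -> weight of that part
    """
    total = 0.0
    for part, values in node_matches.items():
        matching_value = sum(values) / len(values)  # matched nodes / all nodes
        total += matching_value * weights[part]     # matching rate of this part
    return total
```

With weights of 0.5, 0.3 and 0.2 for the range, tag and details addresses, a pair of addresses that agree on every range and tag node but differ in the details address scores about 0.8.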

Further, the first acquisition unit, comprising:

First matching subunit, for performing exact full matching, one-to-one according to the node correspondence, between each first segmentation corresponding to the range address in the first address and each second segmentation corresponding to the range address in the second address, to obtain each first matching value.

Second matching subunit, for performing model keyword matching, one-to-one according to the node correspondence, between each first segmentation corresponding to the tag address in the first address and each second segmentation corresponding to the tag address in the second address, to obtain each second matching value.

Third matching subunit, for performing digit matching, one-to-one according to the node correspondence, between each first segmentation corresponding to the details address in the first address and each second segmentation corresponding to the details address in the second address, to obtain each third matching value.

Summarizing subunit, for collecting each first matching value, each second matching value and each third matching value, to obtain the matching value corresponding to each node of the two structure trees.
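The three kinds of matching performed by the subunits above can be sketched as three small functions. The containment test for keyword matching and the digit-sequence comparison for digit matching are simplifying assumptions; all function names are hypothetical.

```python
import re


def exact_match(a, b):
    """Range-address nodes: the two segments must be identical."""
    return 1.0 if a == b else 0.0


def keyword_match(a, b):
    """Tag-address nodes: one segment contains the other."""
    return 1.0 if a and b and (a in b or b in a) else 0.0


def digit_match(a, b):
    """Details-address nodes: the digit sequences must agree."""
    digits_a, digits_b = re.findall(r"\d+", a), re.findall(r"\d+", b)
    return 1.0 if digits_a and digits_a == digits_b else 0.0
```

Each function returns a per-node matching value that the summarizing subunit would then collect.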

Further, the first obtaining module 3 comprises:

Input unit, for being input to the natural language for the training sample of the specified quantity of pre- mark similarity value It is trained in processing model.

Adjustment unit, for making the Natural Language Processing Models output by adjusting training parameter to the first parameter Similarity value is consistent with the pre- mark similarity value.

Corresponding unit, for being corresponded to according to node corresponding relationship respectively by corresponding weighted value in first parameter First weight, second weight and the third weight.

Further, the word segmentation module 1, comprising:

Call unit, for call address database according to third preset rules, respectively to first address and described Second address carries out address correction.

Further, address matching device, further includes:

Index module, for will in the index server be pre-stored specified quantity non-structured address date into Line index, to obtain the default index structure.

Receiving module, for receiving the interface plug-in uploaded to a specified directory of the index server, wherein the interface plug-in is formed by packaging and encapsulating the preset matching algorithm.

Second obtaining module, for obtaining the configuration parameters of the interface plug-in.

Establishing module, for establishing a computation association between the default index structure and the interface plug-in by running the configuration parameters.

In this embodiment, once the address matching algorithm has been developed, it is packaged, uploaded to the specified directory of the index server, and configured with the relevant parameters. Loading and running the configuration parameters establishes the computation association between the default index structure and the interface plug-in, so that the address matching algorithm in the plug-in can be called to complete the matching computation of the first address within the default index structure and thereby query the address data.

Referring to Fig. 3, an embodiment of the application also provides a computer equipment, which may be a server and whose internal structure may be as shown in Fig. 3. The computer equipment includes a processor, a memory, a network interface and a database connected by a system bus. The processor provides computing and control capability. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database stores all data required by the address matching process. The network interface communicates with external terminals over a network connection. The computer program implements the address matching method when executed by the processor.

Above-mentioned processor executes address above mentioned matching process, and the first address is the address to be retrieved of user's input, the second ground Location is stored in index server, and method includes: to call preset matching algorithm, respectively by first address and second ground Location is segmented according to the first preset rules, obtains the corresponding first participle group in first address and second address is corresponding The second participle group；First address is divided into multiple first segmentations according to the first participle group, according to described second Second address is divided into multiple second segmentations by participle group；According to the second preset rules obtain it is all it is described first segmentation with The matching result of all second segmentations；Judge whether are first address and second address according to the matching result It is identical.

In the above computer equipment, the data pre-stored in the index server is unstructured data stored as key-value pairs in a column-oriented form. Unstructured data here refers to text, images, speech and similar content held in column storage built on NoSQL storage technology. Because the data volume is very large, it must be stored and computed with distributed NoSQL technology; the index server combines NoSQL distributed storage with an index structure to support fast, real-time query and computation over massive data. A configurable-weight address matching model based on hierarchical segmentation of addresses is proposed: address names are first word-segmented by the Natural Language Processing Model into participle groups, the participle groups are divided into segments by administrative level, and the segments are mapped to nodes of a tree structure. This takes full account of the tree structure of addresses: addresses are divided by administrative level, each level's segment is matched with its own weight, and the weights can be fine-tuned for the actual business scenario. By building an index structure over the massive data pre-stored in the index server and drawing on the computing architecture and distributed computing capability of the Elasticsearch component, the first address is queried in the default index structure quickly and in real time. The first four administrative levels of the segmented address are matched exactly against the national province / city / district / county / town address base (a tree), and partially missing levels are completed. The default weights are obtained by training a model: the training parameters, which include each weight value, are adjusted continually during training until the similarity output by the model is consistent with the pre-marked similarity value or falls within a preset deviation range, which determines each weight value and makes the weight settings more reliable.

In one embodiment, first address and second address respectively include range address and tag addresses, Above-mentioned processor calls the preset matching algorithm, is respectively divided the first address and the second address according to the first preset rules Word, the step of obtaining the corresponding first participle group in first address and the corresponding second participle group in second address, comprising: By the corresponding range address in first address and second address, according to pre-association in Natural Language Processing Models Location dictionary is segmented, and the corresponding first participle part in first address and second address corresponding first are respectively obtained Segment part；By the corresponding tag addresses in first address and second address, according to Natural Language Processing Models In the first syntactic model segmented, respectively obtain first address it is corresponding second participle part and second address Corresponding second participle part；The corresponding first participle part in first address and first address is second point corresponding Word part forms the corresponding first participle group in first address, by the corresponding first participle part in second address and described Second address corresponding second participle part forms the corresponding second participle group in second address.

In one embodiment, the first address and the second address each further include a details address. After the step in which the processor word-segments the tag addresses of the first address and the second address according to the first syntactic model in the Natural Language Processing Model to obtain the second participle part corresponding to the first address and the second participle part corresponding to the second address, the method comprises: word-segmenting the details addresses of the first address and the second address according to the second syntactic model in the Natural Language Processing Model, to obtain the third participle part corresponding to the first address and the third participle part corresponding to the second address respectively; forming the first participle group corresponding to the first address from the first participle part, the second participle part and the third participle part corresponding to the first address, and forming the second participle group corresponding to the second address from the first participle part, the second participle part and the third participle part corresponding to the second address.

In one embodiment, the range address includes the four administrative levels province / city / district or county / township or town, the tag address includes a residential community name or a building name, and the step in which the processor obtains the matching result of all the first segmentations and all the second segmentations according to the second preset rules comprises: mapping all the first segmentations and all the second segmentations, in order of administrative level from high to low, into two structure trees of identical structure, wherein each structure tree includes multiple nodes and each node corresponds one-to-one to a first segmentation or a second segmentation; obtaining the matching value corresponding to each node of the two structure trees; obtaining the first weight corresponding to the range address, the second weight corresponding to the tag address and the third weight corresponding to the details address respectively; multiplying the matching values by the respective weights to calculate the first matching rate corresponding to the range address, the second matching rate corresponding to the tag address and the third matching rate corresponding to the details address; and taking the sum of the first matching rate, the second matching rate and the third matching rate as the matching result of all the first segmentations and all the second segmentations.

In one embodiment, above-mentioned processor obtains the step of the corresponding matching value of two each nodes of structure tree Suddenly, comprising: by corresponding each first segmentation of the range address in first address, and by the range in second address Corresponding each second segmentation in location, corresponds according to node corresponding relationship and carries out precisely full matching, obtain each first matching value；It will Corresponding each first segmentation of tag addresses in first address, it is corresponding each with by the tag addresses in second address Second segmentation, corresponds according to node corresponding relationship and carries out model keyword match, obtain each second matching value；By described Corresponding each first segmentation in details address in one address, it is each second point corresponding with by the details address in second address Section corresponds according to node corresponding relationship and carries out digital matching, obtains each third matching value；Summarize each first matching Value, each second matching value and each third matching value, obtain two described each node of structure tree corresponding With value.

In one embodiment, above-mentioned processor obtains corresponding first weight of the range address, the mark respectively Before the step of corresponding second weight in address and the corresponding third weight in the details address, comprising: will mark in advance similar The training sample of the specified quantity of angle value is input in the Natural Language Processing Models and is trained；By adjusting training ginseng Number keeps the similarity value of the Natural Language Processing Models output consistent with the pre- mark similarity value to the first parameter；It will Corresponding weighted value in first parameter corresponds to first weight, second power according to node corresponding relationship respectively Weight and the third weight.

Those skilled in the art will understand that the structure shown in Fig. 3 is only a block diagram of the part of the structure relevant to the scheme of the application and does not limit the computer equipment to which the scheme is applied.

One embodiment of the application also provides a kind of computer readable storage medium, is stored thereon with computer program, calculates Address matching method is realized when machine program is executed by processor, the first address is the address to be retrieved of user's input, the second address It is stored in index server, method includes: to call preset matching algorithm, respectively by first address and second address It is segmented according to the first preset rules, obtains the corresponding first participle group in first address and second address is corresponding Second participle group；First address is divided into multiple first segmentations according to the first participle group, according to described second point Second address is divided into multiple second segmentations by phrase；All first segmentations and institute are obtained according to the second preset rules There is the matching result of second segmentation；According to the matching result judge first address and second address whether phase Together.

In the above computer readable storage medium, the data pre-stored in the index server is unstructured data stored as key-value pairs in a column-oriented form. Unstructured data here refers to text, images, speech and similar content held in column storage built on NoSQL storage technology. Because the data volume is very large, it must be stored and computed with distributed NoSQL technology; the index server combines NoSQL distributed storage with an index structure to support fast, real-time query and computation over massive data. A configurable-weight address matching model based on hierarchical segmentation of addresses is proposed: address names are first word-segmented by the Natural Language Processing Model into participle groups, the participle groups are divided into segments by administrative level, and the segments are mapped to nodes of a tree structure. This takes full account of the tree structure of addresses: addresses are divided by administrative level, each level's segment is matched with its own weight, and the weights can be fine-tuned for the actual business scenario. By building an index structure over the massive data pre-stored in the index server and drawing on the computing architecture and distributed computing capability of the Elasticsearch component, the first address is queried in the default index structure quickly and in real time. The first four administrative levels of the segmented address are matched exactly against the national province / city / district / county / town address base (a tree), and partially missing levels are completed. The default weights are obtained by training a model: the training parameters, which include each weight value, are adjusted continually during training until the similarity output by the model is consistent with the pre-marked similarity value or falls within a preset deviation range, which determines each weight value and makes the weight settings more reliable.

Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).

It should be noted that, in this document, the terms "include", "comprise" and any variants thereof are intended to be non-exclusive, so that a process, device, article or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article or method. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of other identical elements in the process, device, article or method that includes it.

The foregoing describes only preferred embodiments of the present application and does not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of this specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of patent protection of the present application.

Claims

1. An address matching method, characterized in that a first address is an address to be retrieved that is input by a user, a second address is stored in an index server, and the method includes:

Calling a preset matching algorithm to segment the first address and the second address into words according to a first preset rule, obtaining a first word-segmentation group corresponding to the first address and a second word-segmentation group corresponding to the second address, wherein the preset matching algorithm includes a word-segmentation computation and a matching computation;

Dividing the first address into multiple first segments according to the first word-segmentation group, and dividing the second address into multiple second segments according to the second word-segmentation group;

2. The address matching method according to claim 1, characterized in that the first address and the second address each include a range address and a tag address, and the step of calling a preset matching algorithm to segment the first address and the second address into words according to a first preset rule, obtaining a first word-segmentation group corresponding to the first address and a second word-segmentation group corresponding to the second address, includes:

Segmenting the range addresses of the first address and the second address according to an address dictionary pre-associated with a natural language processing model, obtaining a first word-segmentation part corresponding to the first address and a first word-segmentation part corresponding to the second address;

Segmenting the tag addresses of the first address and the second address according to a first syntactic model in the natural language processing model, obtaining a second word-segmentation part corresponding to the first address and a second word-segmentation part corresponding to the second address;

Forming the first word-segmentation group corresponding to the first address from the first word-segmentation part and the second word-segmentation part corresponding to the first address, and forming the second word-segmentation group corresponding to the second address from the first word-segmentation part and the second word-segmentation part corresponding to the second address.

3. The address matching method according to claim 2, characterized in that the first address and the second address each further include a detail address, and after the step of segmenting the tag addresses of the first address and the second address according to the first syntactic model in the natural language processing model, obtaining the second word-segmentation part corresponding to the first address and the second word-segmentation part corresponding to the second address, the method includes:

Segmenting the detail addresses of the first address and the second address according to a second syntactic model in the natural language processing model, obtaining a third word-segmentation part corresponding to the first address and a third word-segmentation part corresponding to the second address;

Forming the first word-segmentation group corresponding to the first address from the first, second, and third word-segmentation parts corresponding to the first address, and forming the second word-segmentation group corresponding to the second address from the first, second, and third word-segmentation parts corresponding to the second address.

4. The address matching method according to claim 3, characterized in that the range address includes four administrative levels, namely province, city/district, county, and township/town, the tag address includes a residential community name or a building name, and the step of obtaining the matching result of all the first segments and all the second segments according to a second preset rule includes:

Mapping all the first segments and all the second segments, in order of administrative level from high to low, into two structure trees with the same structure, wherein each structure tree includes multiple nodes, and each node corresponds one-to-one to a first segment or a second segment;

Obtaining the matching value corresponding to each node of the two structure trees;

Obtaining a first weight corresponding to the range address, a second weight corresponding to the tag address, and a third weight corresponding to the detail address;

Calculating matching rates by multiplying each matching value by its corresponding weight, obtaining a first matching rate corresponding to the range address, a second matching rate corresponding to the tag address, and a third matching rate corresponding to the detail address;

Taking the sum of the first matching rate, the second matching rate, and the third matching rate as the matching result of all the first segments and all the second segments.

5. The address matching method according to claim 4, characterized in that the step of obtaining the matching value corresponding to each node of the two structure trees includes:

Performing exact full matching, node by node according to the node correspondence, between each first segment corresponding to the range address in the first address and each second segment corresponding to the range address in the second address, obtaining each first matching value;

Performing model keyword matching, node by node according to the node correspondence, between each first segment corresponding to the tag address in the first address and each second segment corresponding to the tag address in the second address, obtaining each second matching value;

Performing numeric matching, node by node according to the node correspondence, between each first segment corresponding to the detail address in the first address and each second segment corresponding to the detail address in the second address, obtaining each third matching value;

Aggregating each first matching value, each second matching value, and each third matching value to obtain the matching value corresponding to each node of the two structure trees.

6. The address matching method according to claim 5, characterized in that before the step of obtaining the first weight corresponding to the range address, the second weight corresponding to the tag address, and the third weight corresponding to the detail address, the method includes:

Inputting a specified quantity of training samples pre-labeled with similarity values into the natural language processing model for training;

Adjusting the training parameters to first parameters such that the similarity values output by the natural language processing model are consistent with the pre-labeled similarity values;

Assigning the weight values in the first parameters to the first weight, the second weight, and the third weight, respectively, according to the node correspondence.

7. The address matching method according to claim 2, characterized in that before the step of calling a preset matching algorithm to segment the first address and the second address into words according to a first preset rule, obtaining a first word-segmentation group corresponding to the first address and a second word-segmentation group corresponding to the second address, the method includes:

Indexing a specified quantity of unstructured address data pre-stored in the index server to obtain a preset index structure;

Receiving an interface plug-in uploaded to a specified directory of the index server, wherein the interface plug-in is formed by packaging and encapsulating the preset matching algorithm;

Obtaining the configuration parameters of the interface plug-in;

Establishing a computational association between the preset index structure and the interface plug-in by running the configuration parameters.

8. An address matching device, characterized in that a first address is an address to be retrieved that is input by a user, a second address is stored in an index server, and the device includes:

A word segmentation module, configured to call a preset matching algorithm to segment the first address and the second address into words according to a first preset rule, obtaining a first word-segmentation group corresponding to the first address and a second word-segmentation group corresponding to the second address, wherein the preset matching algorithm includes a word-segmentation computation and a matching computation;

A division module, configured to divide the first address into multiple first segments according to the first word-segmentation group, and to divide the second address into multiple second segments according to the second word-segmentation group;

A second obtaining module, configured to obtain the matching result of all the first segments and all the second segments according to a second preset rule;

9. A computer device, including a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
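
For illustration only, the weighted matching of claims 4 and 5 can be sketched as below. This is a simplified reading under stated assumptions, not the claimed implementation: the structure tree is flattened into a level-to-segment mapping, the matching values of each group are averaged over its nodes, keyword overlap stands in for the model keyword matching, and the weights `(0.5, 0.3, 0.2)` are hypothetical. All class and function names are invented for this sketch.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Administrative levels of the range address, ordered from high to low.
RANGE_LEVELS = ("province", "city", "county", "town")


@dataclass
class SegmentedAddress:
    range_part: Dict[str, Optional[str]]  # level -> segment (None if missing)
    tag: str                              # community or building name
    detail: str                           # e.g. "Tower 1 Floor 20"


def range_matching_value(a: SegmentedAddress, b: SegmentedAddress) -> float:
    """Exact match at each level; averaging over levels is an assumption."""
    hits = sum(
        1 for level in RANGE_LEVELS
        if a.range_part.get(level) is not None
        and a.range_part.get(level) == b.range_part.get(level)
    )
    return hits / len(RANGE_LEVELS)


def tag_matching_value(a: SegmentedAddress, b: SegmentedAddress) -> float:
    """Keyword overlap (Jaccard), a stand-in for model keyword matching."""
    ka, kb = set(a.tag.lower().split()), set(b.tag.lower().split())
    return len(ka & kb) / len(ka | kb) if (ka | kb) else 0.0


def detail_matching_value(a: SegmentedAddress, b: SegmentedAddress) -> float:
    """Numeric matching: compare digit runs position by position."""
    na, nb = re.findall(r"\d+", a.detail), re.findall(r"\d+", b.detail)
    if not na and not nb:
        return 0.0
    hits = sum(1 for x, y in zip(na, nb) if x == y)
    return hits / max(len(na), len(nb))


def matching_result(a: SegmentedAddress, b: SegmentedAddress,
                    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Sum of the three matching rates (matching value times weight)."""
    w1, w2, w3 = weights  # hypothetical weights, not from the application
    return (w1 * range_matching_value(a, b)
            + w2 * tag_matching_value(a, b)
            + w3 * detail_matching_value(a, b))


query = SegmentedAddress(
    {"province": "Guangdong", "city": "Shenzhen", "county": "Futian", "town": "Shatou"},
    "Ping An Finance Centre", "Tower 1 Floor 20")
stored = SegmentedAddress(
    {"province": "Guangdong", "city": "Shenzhen", "county": "Futian", "town": "Meilin"},
    "Ping An Finance Centre", "Tower 1 Floor 21")
score = matching_result(query, stored)
```

A threshold on `score` could then decide whether the two addresses are treated as the same; the application leaves that rule to the second preset rule, so no threshold is assumed here.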