WO2021000831A1

WO2021000831A1 - Address matching method and apparatus, computer device and storage medium

Info

Publication number: WO2021000831A1
Application number: PCT/CN2020/098804
Authority: WO
Inventors: 申超波; 阮晓雯; 徐亮
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-07-03
Filing date: 2020-06-29
Publication date: 2021-01-07
Also published as: CN110442603B; CN110442603A

Abstract

The present application relates to the field of big data, and discloses an address matching method and apparatus, a computer device, and a storage medium, wherein in the address matching method a first address is an address to be retrieved inputted by a user, and a second address is stored in an index server, the method comprising: invoking a preset matching algorithm, and respectively performing word segmentation on a first address and a second address on the basis of a first preset rule to obtain a first word segmentation group corresponding to the first address and a second word segmentation group corresponding to the second address, the preset matching algorithm comprising word segmentation calculation and matching calculation; on the basis of the first word segmentation group, dividing the first address into a plurality of first segments and, on the basis of the second word segmentation group, dividing the second address into a plurality of second segments; on the basis of a second preset rule, acquiring a matching result of the first segments and the second segments, and determining whether the first address and the second address are the same. For the first four administrative level addresses of the segmented address, precise matching is implemented on the basis of an address database (tree type) of nationwide provinces, municipalities, counties and towns, and partial omissions are effectively completed.

Description

Address matching method, device, computer equipment and storage medium

This application claims priority to Chinese patent application No. 201910601364.8, filed with the Chinese Patent Office on July 3, 2019 and titled "Address matching method, device, computer equipment and storage medium", the entire content of which is incorporated into this application by reference.

Technical field

This application relates to the field of big data, in particular to address matching methods, devices, computer equipment and storage media.

Background technique

Traditional fuzzy address matching often treats the address as a single whole and performs fuzzy matching based on NLP, but the inventors realized that this approach has the following defects: 1) An address is a tree structure of address names, and names become more similar the closer they are to the bottom layer of the tree; matching the address as a whole compares the names as a flat, parallel structure, which does not conform to the actual distribution structure of address names. 2) The comparison works relatively poorly for short addresses, yet most short addresses are of relatively high value. 3) Whole-address matching gives every word in an address name the same value, which is inconsistent with practice. For example, in Shenzhen/Nanshan District/Tencent Building, the address name Tencent Building is obviously more valuable as an effective address.

Technical problem

The main purpose of this application is to provide an address matching method, which aims to solve the technical problem of existing address matching defects.

Technical solutions

This application proposes an address matching method. The first address is the address to be retrieved input by the user, and the second address is stored in the index server. The method includes:

Invoking a preset matching algorithm, and performing word segmentation on the first address and the second address respectively according to a first preset rule, to obtain a first word segmentation group corresponding to the first address and a second word segmentation group corresponding to the second address, wherein the preset matching algorithm includes word segmentation calculation and matching calculation;

Dividing the first address into a plurality of first segments according to the first word segmentation group, and dividing the second address into a plurality of second segments according to the second word segmentation group;

Obtaining matching results of all the first segments and all the second segments according to a second preset rule;

Determining whether the first address and the second address are the same according to the matching result.

This application also provides an address matching device, the first address is the address to be retrieved input by the user, the second address is stored in the index server, and the device includes:

A word segmentation module, configured to invoke a preset matching algorithm and perform word segmentation on the first address and the second address respectively according to a first preset rule, to obtain a first word segmentation group corresponding to the first address and a second word segmentation group corresponding to the second address, wherein the preset matching algorithm includes word segmentation calculation and matching calculation;

A dividing module, configured to divide the first address into a plurality of first segments according to the first word segmentation group, and divide the second address into a plurality of second segments according to the second word segmentation group;

A second acquiring module, configured to acquire matching results of all the first segments and all the second segments according to a second preset rule;

The judgment module is configured to judge whether the first address and the second address are the same according to the matching result.

The present application also provides a computer device, including a memory and a processor, wherein the memory stores a computer program and the processor implements an address matching method when executing the computer program, the first address is the address to be retrieved input by the user, the second address is stored in the index server, and the method includes:

This application also provides a computer-readable storage medium on which a computer program is stored, wherein the computer program implements an address matching method when executed by a processor, the first address is the address to be retrieved input by the user, the second address is stored in the index server, and the method includes:

Beneficial effect

In this application, the first four administrative levels of a segmented address are matched exactly against a tree-structured nationwide address database of provinces, municipalities, counties and towns, and partially missing levels are effectively completed. In addition, the massive data pre-stored in the index server is built into an index structure which, combined with the Elasticsearch component's own computing architecture and distributed computing capabilities, enables real-time, fast querying of the first address in the preset index structure.

Description of the drawings

Fig. 1 is a schematic flowchart of an address matching method according to an embodiment of the present application;

Fig. 2 is a schematic structural diagram of an address matching device according to an embodiment of the present application;

Fig. 3 is a schematic diagram of the internal structure of a computer device according to an embodiment of the present application.

The best mode of the invention

Referring to Fig. 1, in an address matching method of an embodiment of the present application, the first address is an address to be retrieved input by a user, and the second address is stored in an index server, and the method includes:

S1: Invoke a preset matching algorithm, and perform word segmentation on the first address and the second address respectively according to the first preset rule, to obtain the first word segmentation group corresponding to the first address and the second word segmentation group corresponding to the second address, wherein the preset matching algorithm includes word segmentation calculation and matching calculation.

In this embodiment, comparing the similarity between the first address and the second address is taken as an example. The first address and the second address are written in order of administrative level from high to low, that is, from the broad range to the specific location. The first preset rule of this embodiment applies different word segmentation rules to different administrative levels in the address. For example, the four administrative levels of province, city, district/county, and township/town are segmented using a general nationwide address database: "Guicheng Town, Nanhai District, Foshan City, Guangdong Province" is segmented as Guangdong Province/Foshan City/Nanhai District/Guicheng Town. Address information outside these four administrative levels is segmented by semantic segmentation.

S2: Divide the first address into a plurality of first segments according to the first word segmentation group, and divide the second address into a plurality of second segments according to the second word segmentation group.

In this embodiment, the address is divided into segments and/or administrative levels according to its word segmentation group, and each segment or administrative level corresponds to one or more segmented words. The terms "first", "second", and the like are used only to distinguish the first segments of the first address from the second segments of the second address and are not limiting; similar terms elsewhere have the same effect and are not explained again. The word segmentation group is the arrangement of the segmented words of the actual address, formed in the writing order of the original address. For example, the long name "development zone of a certain city" corresponds to two segmented words, "a certain city/development zone", but segments are formed from the segmented words on the basis of administrative level, so "development zone of a certain city" belongs to one segment.

S3: Obtain matching results of all the first segments and all the second segments according to a second preset rule.

In this embodiment, the first segment and the second segment are matched one by one according to the corresponding relationship of the administrative level to obtain the matching result. For example, the first segment corresponding to the province level of the first address is compared with the second segment corresponding to the province level of the second address, so as to improve the symmetry and reliability of information comparison.

S4: Determine whether the first address and the second address are the same according to the matching result.

This embodiment compares the first address and the second address level by level through the correspondence of administrative levels. When the matching rate of the first address and the second address reaches the preset range, the two addresses are determined to be the same; otherwise they are determined to be different. In other embodiments of the present application, to improve matching accuracy, the matching rate must reach the preset range and the segment matching degree of a designated administrative level must also reach 100% before the first address and the second address are determined to be the same; otherwise they are determined to be different.
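The decision rule in step S4 can be sketched as follows. This is only an illustration: the function name, the threshold value of 0.9, and the level names are hypothetical and are not specified in the application.

```python
def is_same_address(match_rate, level_match_values, threshold=0.9, required_levels=()):
    """Decide whether two addresses are treated as the same.

    match_rate: overall weighted matching rate in [0, 1].
    level_match_values: mapping from a level name to its matching value in [0, 1].
    threshold: hypothetical lower bound of the "preset range".
    required_levels: levels that must match 100% (the stricter embodiment).
    """
    if match_rate < threshold:
        return False
    # In the stricter embodiment every designated level must match completely.
    return all(level_match_values.get(level, 0.0) >= 1.0 for level in required_levels)


values = {"mark": 1.0, "detail": 0.5}
print(is_same_address(0.95, values))                              # rate alone suffices
print(is_same_address(0.95, values, required_levels=("detail",)))  # detail must be 100%
```

Passing an empty `required_levels` gives the basic embodiment; listing one or more levels gives the stricter variant.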

The first address in this embodiment is the address to be queried entered by the user. The data composition of the first address is not limited, and the matching calculation can be performed in every case, which gives the user more flexibility and freedom. For example, the first address may include data arranged in sequence according to six administrative levels (province, city/district/county/town, township/road, community, building, and house number), or data in which one or several administrative levels are missing. The preset matching condition in this embodiment includes the matching rate reaching a preset threshold, or the mark data in the first address reaching a 100% match, and so on. The mark data refers to the data information in the first address that can identify the geographic location, such as the name of a community or the name of a building. For example, "Rongyuan of Jiangnan Mingju Residential Quarter" included in the first address is mark data. In another embodiment of the present application, the mark data of the first address is the data information located after the "town, township" administrative level and before the "building and house number".

Further, the first address and the second address each include a range address and a mark address, and step S1 of invoking the preset matching algorithm and performing word segmentation on the first address and the second address respectively according to the first preset rule, to obtain the first word segmentation group corresponding to the first address and the second word segmentation group corresponding to the second address, includes:

S11: Perform word segmentation on the range addresses of the first address and the second address respectively according to the pre-associated address dictionary in the natural language processing model, to obtain the first word segmentation part corresponding to the first address and the first word segmentation part corresponding to the second address.

The range address of this embodiment includes at least one of the four administrative levels of province, city, district/county, and township/town. The range address is segmented through a pre-associated address dictionary. The address dictionary is the corresponding vocabulary of a nationwide address database, and it is associated in advance with the natural language processing model so that address names can be segmented. The preset matching algorithm in this embodiment includes word segmentation calculation and matching calculation. To improve the accuracy of address matching, a crawled address library is added when the open-source word segmentation package jieba is used for the word segmentation calculation. It is used together with the nationwide address database to correct the address to be segmented, and word segmentation is then performed by administrative level to improve its accuracy. The method judges whether the administrative levels contained in the current address are the levels covered by the address dictionary and, if so, calls the address dictionary for the word segmentation calculation. For example, the address "306, Building 1, Rongyuan, Jiangnan Mingju Community, Guicheng Town, Nanhai District, Foshan City, Guangdong Province" includes the four administrative levels covered by the address dictionary, so those four levels are segmented according to the address dictionary. The word segmentation result is: Guangdong Province/Foshan City/Nanhai District/Guicheng Town/Jiangnan Mingju Community Rongyuan Building 1 306. The first word segmentation part is Guangdong Province/Foshan City/Nanhai District/Guicheng Town.
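A minimal sketch of dictionary-based segmentation of the range address is given below. It does not use jieba; it simply walks a tiny, hypothetical address tree by prefix matching. The tree contents, the function name, and the assumption that the address string is written from the highest level to the lowest (as in the original Chinese writing order) are illustrative only.

```python
# Hypothetical miniature of the nationwide address tree:
# province > city > district > town.
ADDRESS_TREE = {
    "Guangdong Province": {
        "Foshan City": {
            "Nanhai District": {"Guicheng Town": {}},
        },
    },
}


def segment_range_address(address, tree=ADDRESS_TREE):
    """Split the leading administrative levels off `address`.

    Returns (segments, remainder). The remainder is the text that the
    dictionary does not cover and that is left for semantic segmentation.
    """
    segments = []
    rest = address.strip()
    level = tree
    while level:
        # Pick the child name that prefixes the remaining text (longest wins).
        match = max((name for name in level if rest.startswith(name)),
                    key=len, default=None)
        if match is None:
            break
        segments.append(match)
        rest = rest[len(match):].strip()
        level = level[match]
    return segments, rest


segments, rest = segment_range_address(
    "Guangdong Province Foshan City Nanhai District Guicheng Town "
    "Jiangnan Mingju Community Rongyuan"
)
print("/".join(segments))
print(rest)
```

Walking the tree level by level, rather than looking names up in a flat list, means a name is only accepted when it is a child of the level already matched.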

S12: Perform word segmentation on the mark addresses of the first address and the second address respectively according to the first grammar model in the natural language processing model, to obtain the second word segmentation part corresponding to the first address and the second word segmentation part corresponding to the second address.

The mark address in this embodiment includes information that can identify a geographic location, such as the name of a community or the name of a building, for example "Jiangnan Mingju Community Rongyuan" in the above address. In this embodiment, the mark address is segmented according to the first grammar model in the natural language processing model. The first grammar model includes, but is not limited to, "a certain community" and "a certain building". For example, for "306, Block 1, Rongyuan, Jiangnan Mingju Community, Guicheng Town", the corresponding second word segmentation part is "Guicheng/Jiangnan Mingju Community/Rongyuan". The first grammar model of another embodiment of the present application extracts as the mark address the characters located after "town, township" and before "building and house number".

S13: Combine the first word segmentation part and the second word segmentation part corresponding to the first address into the first word segmentation group corresponding to the first address, and combine the first word segmentation part and the second word segmentation part corresponding to the second address into the second word segmentation group corresponding to the second address.

The first address and the second address in this embodiment each include a range address and a mark address, arranged from left to right. For example, the first address is "Rongyuan, Jiangnan Mingju Community, Guicheng Town, Nanhai District, Foshan City, Guangdong Province" and the second address is "Jiangnan Mingju Rongyuan, Guicheng Town, Nanhai District, Foshan City, Guangdong Province". The first word segmentation group corresponding to the first address is "Guangdong Province/Foshan City/Nanhai District/Guicheng Town/Jiangnan Mingju Community/Rongyuan", and the second word segmentation group corresponding to the second address is "Guangdong Province/Foshan City/Nanhai District/Guicheng Town/Jiangnan Mingju/Rongyuan".

Further, the first address and the second address each include a detail address, and after step S12 of performing word segmentation on the mark addresses of the first address and the second address according to the grammar model in the natural language processing model, to obtain the second word segmentation part corresponding to the first address and the second word segmentation part corresponding to the second address, the method includes:

S14: The detailed addresses corresponding to the first address and the second address are segmented according to the second grammar model in the natural language processing model to obtain the third segmentation part corresponding to the first address and the The third word segmentation part corresponding to the second address.

The detailed address in this embodiment is the specific "building and house number", which has a small effect and influence on matching the similarity of two addresses, and this part of content can even be ignored in other embodiments. However, for some specific application scenarios, the detailed address needs to be accurate to meet business needs. The second grammar model of this embodiment includes but is not limited to "a certain building", "a certain building and a certain floor", "a certain building and a certain room" and so on.

S15: Combine the first word segmentation part, the second word segmentation part, and the third word segmentation part corresponding to the first address into the first word segmentation group corresponding to the first address, and combine the first word segmentation part, the second word segmentation part, and the third word segmentation part corresponding to the second address into the second word segmentation group corresponding to the second address.

The first address and the second address in this embodiment each include a range address, a mark address, and a detail address, arranged from left to right. For example, the first address is "306, Block 1, Rongyuan, Jiangnan Mingju Community, Guicheng Town, Nanhai District, Foshan City, Guangdong Province" and the second address is "502, Block 1, Jiangnan Mingju Rongyuan, Guicheng Town, Nanhai District, Foshan City, Guangdong Province". The first word segmentation group corresponding to the first address is "Guangdong Province/Foshan City/Nanhai District/Guicheng Town/Jiangnan Mingju Community/Rongyuan/Block 1/306", and the second word segmentation group corresponding to the second address is "Guangdong Province/Foshan City/Nanhai District/Guicheng Town/Jiangnan Mingju/Rongyuan/Block 1/502". The first address or the second address can then be divided into segments or administrative levels according to these word segmentation groups.

Further, the range address includes the four administrative levels of province, city, district/county, and township/town, the mark address includes a community name or a building name, and step S3 of obtaining the matching results of all the first segments and all the second segments according to the second preset rule includes:

S31: Map all the first segments and all the second segments into two structure trees with the same structure, in order of administrative level from high to low, where each structure tree includes multiple nodes and each node corresponds one-to-one to a first segment or a second segment.

In this embodiment, all first segments of the first address and all second segments of the second address are mapped, in order of administrative level, into two structure trees with the same structure. Each node corresponds to at least one segment, or to multiple segmented words of the same administrative level. For example, the segmented word "Guangdong Province", which corresponds to the highest administrative level "province" in the first address, is used as the root node; the segmented word "Foshan City", which corresponds to the next-level child node "city", is connected to it; and so on down to the end node "Block 1/502". Depending on the specific address information, the root node and the end node may correspond to different administrative levels. The address may be a full address covering all administrative levels or a short address covering only some of them.
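One way to picture the mapping in S31 is a chain of nodes, one per level, built identically for both addresses so that nodes can later be compared position by position. The sketch below is an assumption about the data structure, not the application's implementation; the level names in `LEVELS` and the `Node` class are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical level names, ordered from the highest level to the lowest.
LEVELS = ["province", "city", "district", "town", "mark", "detail"]


@dataclass
class Node:
    level: str
    segments: List[str] = field(default_factory=list)
    child: Optional["Node"] = None


def build_tree(segments_by_level: Dict[str, List[str]]) -> Node:
    """Build one node per level, from the highest level down to the lowest.

    Levels missing from the address still get an (empty) node, so that the
    trees of two addresses always have the same structure.
    """
    root = None
    tail = None
    for level in LEVELS:
        node = Node(level, list(segments_by_level.get(level, [])))
        if root is None:
            root = node
        else:
            tail.child = node
        tail = node
    return root


tree = build_tree({
    "province": ["Guangdong Province"],
    "city": ["Foshan City"],
    "district": ["Nanhai District"],
    "town": ["Guicheng Town"],
    "mark": ["Jiangnan Mingju", "Rongyuan"],
    "detail": ["Block 1", "502"],
})
node = tree
while node:
    print(node.level, node.segments)
    node = node.child
```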

S32: Obtain matching values corresponding to each node of the two structure trees.

The matching calculation in this embodiment maps the nodes of the two structure trees to each other according to the correspondence of administrative levels, and calculates the matching value of each node according to that correspondence. The matching value of a node is the number of matching segments divided by the total number of segments corresponding to the node. For example, if the "province" node of the first address is assigned the value "Guangdong" and the "province" node of the second address is also assigned the value "Guangdong", the nodes match; otherwise they do not match.

S33: Obtain the first weight corresponding to the range address, the second weight corresponding to the mark address, and the third weight corresponding to the detail address, respectively.

In this embodiment, different weights are set according to how strongly the segments of each administrative level affect the address, which makes it easier to meet different business requirements. For example, the second weight corresponding to the mark address is higher than the first weight corresponding to the range address.

S34: Calculate the matching rate according to the matching value multiplied by the corresponding weight to obtain the first matching rate corresponding to the range address, the second matching rate corresponding to the mark address, and the third matching rate corresponding to the detail address.

The matching rate in this embodiment is calculated as follows: the matching result of each segment multiplied by the configured weight of that segment gives the matching rate of that segment, and the matching rates of all segments are added to obtain the matching result between the first address and the second address.

S35: The sum of the first matching rate, the second matching rate, and the third matching rate is used as a matching result of all the first segments and all the second segments.
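Steps S34 and S35 can be written compactly as below. The symbols are introduced here only for illustration and do not appear in the application: $m_i \in [0, 1]$ is the matching value of node $i$, $w_i$ is its weight, $r_i$ is its matching rate, and $R$ is the overall matching result.

```latex
\[
r_i = w_i \, m_i, \qquad
R = \sum_{i=1}^{n} r_i
  = \underbrace{\sum_{i \in \text{range}} w_i m_i}_{\text{first matching rate}}
  + \underbrace{\sum_{i \in \text{mark}} w_i m_i}_{\text{second matching rate}}
  + \underbrace{\sum_{i \in \text{detail}} w_i m_i}_{\text{third matching rate}}
\]
```

If the weights sum to 1, then $R$ also lies in $[0, 1]$ and can be compared directly with a preset threshold.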

Further, the step S32 of obtaining the matching value corresponding to each node of the two structure trees includes:

S321: According to the node correspondence, perform exact full matching between each first segment of the range address in the first address and the corresponding second segment of the range address in the second address, to obtain each first matching value.

Nodes of different administrative levels are matched in different ways in this embodiment. The four administrative levels of province, city, district/county, and township/town are matched by exact full matching; that is, the nodes match only if the corresponding characters are 100% identical, and otherwise they do not match. For example, if the "province" node of the first address is assigned the value "Guangdong" and the "province" node of the second address is assigned the value "Guangdong", the nodes match.

S322: According to the node correspondence, perform model keyword matching between each first segment of the mark address in the first address and the corresponding second segment of the mark address in the second address, to obtain each second matching value.

In this embodiment, the segments of the mark address are matched by NLP (Natural Language Processing) model matching, and a match can be established when one segment contains the other or is contained in it. For example, "Jiangnan Mingju Community/Rongyuan" and "Jiangnan Mingju/Rongyuan" are not identical character for character, but "Jiangnan Mingju Community" contains the characters "Jiangnan Mingju", so the segments still match one to one.
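The containment relation used for mark-address segments can be sketched with a plain substring test. This is a deliberately simplified stand-in for the NLP model matching mentioned above, and the function name is hypothetical.

```python
def mark_segments_match(first_segment, second_segment):
    """Return True when one segment contains the other (or they are equal)."""
    a = first_segment.strip()
    b = second_segment.strip()
    if not a or not b:
        # An empty segment carries no information and never matches.
        return False
    return a in b or b in a


print(mark_segments_match("Jiangnan Mingju Community", "Jiangnan Mingju"))
print(mark_segments_match("Rongyuan", "Rongyuan"))
print(mark_segments_match("Jiangnan Mingju", "Tencent Building"))
```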

S323: Perform digital matching for each first segment corresponding to the detail address in the first address and each second segment corresponding to the detail address in the second address in a one-to-one correspondence according to the node correspondence. Obtain each third matching value.

The detail address in this embodiment includes a first specified number of segments, of which a second specified number satisfy the matching relationship. The matching value corresponding to the detail address is the second specified number divided by the first specified number.
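The ratio described above can be sketched as follows, assuming the detail segments of both addresses are already aligned position by position. The function name and the choice to return 0.0 for an empty detail address are assumptions.

```python
def detail_match_value(first_segments, second_segments):
    """Matched detail segments divided by the number of detail segments."""
    total = len(first_segments)  # the "first specified number"
    if total == 0:
        return 0.0
    # The "second specified number": positions whose values are identical.
    matched = sum(1 for a, b in zip(first_segments, second_segments) if a == b)
    return matched / total


print(detail_match_value(["1", "306"], ["1", "502"]))
```

For the building and house number pairs "1/306" and "1/502", only the building number matches, so the value is 1 divided by 2.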

S324: Summarize each of the first matching values, each of the second matching values, and each of the third matching values to obtain matching values corresponding to each node of the two structure trees.

For example, the word segmentation group corresponding to the first address is Guangdong/Foshan City/Nanhai/Guicheng/Jiangnan Mingju Community/Rongyuan/1/306, and the word segmentation group corresponding to the second address is Guangdong/Foshan City/Nanhai/Guicheng/Jiangnan Mingju/Rongyuan/1/502. After segmentation, the first address and the second address are each divided by administrative level into six nodes, and the default weights of the nodes are "0.1/0.1/0.1/0.1/0.5/0.1". The first four administrative levels use 100% character matching: for Guangdong/Foshan City/Nanhai/Guicheng, the matching results are 0.1*1/0.1*1/0.1*1/0.1*1. The fifth administrative level uses model matching based on character containment: the matching result of Jiangnan Mingju Community/Rongyuan and Jiangnan Mingju/Rongyuan is 0.5*1. The sixth administrative level uses fuzzy matching: when 1/306 is matched against 1/502, only one of the two corresponding fields matches (306 and 502 do not match), so the matching value is 0.5 and the matching result is 0.5*0.1, that is, 0.05. The matching rate between the first address and the second address is therefore 0.1+0.1+0.1+0.1+0.5+0.05=0.95.
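The arithmetic of this example can be reproduced with a few lines of code. The weights and matching values are taken from the example above; the function name is hypothetical.

```python
def matching_rate(match_values, weights):
    """Weighted sum of per-node matching values."""
    if len(match_values) != len(weights):
        raise ValueError("each node needs exactly one matching value and one weight")
    return sum(m * w for m, w in zip(match_values, weights))


# Nodes in order: four range levels, the mark address, the detail address.
weights = [0.1, 0.1, 0.1, 0.1, 0.5, 0.1]
match_values = [1, 1, 1, 1, 1, 0.5]

rate = matching_rate(match_values, weights)
print(round(rate, 2))  # 0.95
```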

Further, before step S33 of separately acquiring the first weight corresponding to the range address, the second weight corresponding to the mark address, and the third weight corresponding to the detail address, the method includes:

S331: Input a specified number of training samples pre-labeled with similarity values into the natural language processing model for training.

S332: Make the similarity value output by the natural language processing model consistent with the pre-labeled similarity value by adjusting the training parameter to the first parameter.

S333: Take the corresponding weight values in the first parameter as the first weight, the second weight, and the third weight according to the node correspondence.

The default weights in this embodiment are obtained by training the model. The training parameters are adjusted continuously during training until the similarity output by the model is consistent with the pre-labeled similarity value, or falls within a preset deviation range. The training parameters include the weight values, so training determines each weight value. Other embodiments of the present application may also adjust one or more of the default weights for a specific application scenario, so that the matching model better fits that scenario.
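The application does not state which training procedure is used. Purely as an illustration, the sketch below fits the weights with plain stochastic gradient descent on the squared error between the weighted sum and the labeled similarity. The function name, learning rate, epoch count, and sample data are all hypothetical.

```python
def fit_weights(samples, n_weights, learning_rate=0.1, epochs=2000):
    """Fit weights so that sum(w_i * m_i) approximates the labeled similarity.

    samples: list of (match_values, labeled_similarity) pairs.
    """
    weights = [1.0 / n_weights] * n_weights  # start from uniform weights
    for _ in range(epochs):
        for match_values, target in samples:
            predicted = sum(w * m for w, m in zip(weights, match_values))
            error = predicted - target
            # Gradient step for the squared error of this sample.
            weights = [w - learning_rate * error * m
                       for w, m in zip(weights, match_values)]
    return weights


# Toy labels generated from "true" weights of 0.2 (range) and 0.8 (mark).
samples = [([1, 0], 0.2), ([0, 1], 0.8), ([1, 1], 1.0)]
print([round(w, 3) for w in fit_weights(samples, n_weights=2)])
```

Any other regression method could be substituted; the point is only that the weights are the trainable parameters and the labeled similarity is the target.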

Further, before step S11 of performing word segmentation on the range addresses of the first address and the second address respectively according to the pre-associated address dictionary in the natural language processing model, to obtain the first word segmentation part corresponding to the first address and the first word segmentation part corresponding to the second address, the method includes:

S10: Call the address database to perform address correction on the first address and the second address respectively according to the third preset rule.

The first address or the second address in this embodiment may be inconsistent with the address data in the nationwide address database. Address correction, including address completion, removal of qualifiers, and so on, can be performed by calling the address database. When an address is completed in this embodiment, a higher-level node can be completed from its child node; for example, Nanhai District can be completed upward with Foshan City. Alternatively, an intermediate node can be completed from the nodes before and after it; for example, given Foshan City and Guicheng Town, Nanhai District can be filled in between them.

Further, before step S1 of invoking the preset matching algorithm and performing word segmentation on the first address and the second address respectively according to the first preset rule, to obtain the first word segmentation group corresponding to the first address and the second word segmentation group corresponding to the second address, the method includes:
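Both kinds of completion can be sketched with a child-to-parent lookup derived from the address tree. The `PARENT` table below is a tiny hypothetical excerpt, and the sketch assumes that place names are unique; a real nationwide database contains duplicate names and would need disambiguation.

```python
# Hypothetical child -> parent excerpt of the address tree.
PARENT = {
    "Guicheng Town": "Nanhai District",
    "Nanhai District": "Foshan City",
    "Foshan City": "Guangdong Province",
}


def complete_address(segments):
    """Rebuild the full administrative chain from the lowest known segment.

    Walking upward from the lowest level fills in both missing ancestors
    (upward completion) and missing intermediate levels.
    """
    if not segments:
        return []
    chain = [segments[-1]]
    while chain[0] in PARENT:
        chain.insert(0, PARENT[chain[0]])
    return chain


print(complete_address(["Nanhai District"]))
print(complete_address(["Foshan City", "Guicheng Town"]))
```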

S1a: Indexing a specified number of unstructured address data pre-stored in the index server to obtain the preset index structure.

The data pre-stored in the index server of this embodiment is unstructured data, stored in columns as key-value pairs. Unstructured data refers to data such as text, images, and voice, stored in columns using NoSQL storage technology. The amount of data is very large, so the distributed architecture of NoSQL technology is needed for storage and calculation. The index server combines NoSQL distributed storage with an index structure to achieve real-time, fast query and calculation over massive data. NoSQL refers to non-relational databases and is an open-source technology. Elasticsearch is based on key-value storage and inverted indexes, and its calculation is performed mainly in memory to achieve fast real-time calculation.
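The idea of an inverted index can be illustrated in a few lines of plain Python. This is a conceptual sketch only and says nothing about how Elasticsearch implements its index internally; the function names and sample data are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(addresses):
    """Map each segment to the ids of the stored addresses that contain it.

    addresses: mapping from an address id to its list of segments.
    """
    index = defaultdict(set)
    for address_id, segments in addresses.items():
        for segment in segments:
            index[segment].add(address_id)
    return index


def candidate_ids(index, query_segments):
    """Ids of stored addresses that share at least one segment with the query."""
    ids = set()
    for segment in query_segments:
        ids |= index.get(segment, set())
    return ids


stored = {
    1: ["Guangdong Province", "Foshan City", "Nanhai District"],
    2: ["Guangdong Province", "Shenzhen City", "Nanshan District"],
}
index = build_inverted_index(stored)
print(sorted(candidate_ids(index, ["Foshan City"])))
```

Looking up candidates by segment avoids comparing the query against every stored address, which is what makes queries over a large address library fast.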

S1b: Receive the interface plug-in uploaded to the designated directory of the index server, where the interface plug-in is formed by packaging and encapsulating the preset matching algorithm.

The index server in this embodiment is an open source component and supports a plug-in mode. The interface plug-in can inherit the org.elasticsearch.plugins.Plugin class of the index server to customize and extend it. The developed address matching algorithm plug-in is loaded and used after the index server is restarted.

S1c: Obtain the configuration parameters of the interface plug-in.

S1d: Establish a calculation association relationship between the preset index structure and the interface plug-in by running the configuration parameter.

In this embodiment, after the preset matching algorithm is developed, it is packaged, uploaded to the designated directory of the index server, and configured with the related configuration parameters. Loading and running the configuration parameters establishes the calculation association relationship between the preset index structure and the interface plug-in. The address matching algorithm in the plug-in is then called to complete the matching calculation of the first address against the preset index structure, thereby realizing the address data query.

The index server in this embodiment is an open-source Elasticsearch component (Elasticsearch is used for distributed full-text search), which provides a full-text search engine with distributed computing capabilities based on a RESTful web interface and can perform real-time, fast queries on massive data. The query steps include: (1) Import the addresses of the massive address library into the underlying storage of Elasticsearch in the form of key-value pairs through the data import interface of Elasticsearch, and index the keys. (2) Adapt the address matching model to the custom extended search model of Elasticsearch, add it to the extension module of the Elasticsearch master node, and restart Elasticsearch, so that it becomes an address matching model that uses the distributed storage and high-concurrency computing of Elasticsearch. (3) Use this custom model to develop a one-to-many massive address matching interface on Elasticsearch. (4) Through the upper-level interface developed on Elasticsearch, a new address can be entered and the massive address library and custom model to be matched can be selected; that is, based on Elasticsearch, the new address is quickly compared with the addresses in the massive address library and the top N most similar addresses are returned, where N can be passed as a parameter. In this embodiment, by establishing an index structure for the massive data pre-stored in the index server, combined with the computing architecture and powerful distributed computing capabilities of the Elasticsearch component itself, real-time fast querying of the first address in the preset index structure is realized.
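The one-to-many query that returns the top N most similar addresses can be sketched in plain Python. This is not Elasticsearch code: the in-memory `library`, the toy scoring function `shared_prefix_score`, and the sample addresses are illustrative assumptions that stand in for the index and the custom matching model.

```python
import heapq

def shared_prefix_score(query, candidate):
    """Toy similarity (assumption): number of leading segments the two addresses share."""
    score = 0
    for q, c in zip(query, candidate):
        if q != c:
            break
        score += 1
    return score

def top_n(query, library, n=3, score=shared_prefix_score):
    """Return the n library addresses most similar to the query, best first."""
    return heapq.nlargest(n, library, key=lambda addr: score(query, addr))

# Illustrative address library; each address is already split into segments.
library = [
    ("Guangdong Province", "Foshan City", "Nanhai District", "Guicheng Town"),
    ("Guangdong Province", "Foshan City", "Chancheng District"),
    ("Guangdong Province", "Shenzhen City", "Futian District"),
]
query = ("Guangdong Province", "Foshan City", "Nanhai District", "Guicheng Town")
best = top_n(query, library, n=2)
```

Here `n` plays the role of the parameter N passed to the upper-level interface; in a deployment the scoring would run inside the index server rather than in a Python loop.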

In this embodiment, the segments corresponding to different administrative levels of the first address use different matching methods and matching models, and each segment has its own matching weight. The first address in this embodiment is divided into six segments, corresponding to six administrative levels and to six nodes in the tree structure. The first four of the six administrative levels use the same matching model, in which characters are matched one by one; the fifth administrative level uses a fuzzy matching model based on containing or being contained; the sixth administrative level uses a digit matching model. In this embodiment, a filtering mechanism is set in the matching calculation process. First, the target segments corresponding to the four administrative levels of "province/city, district/county/town, township, and road" are matched exactly, character by character. When the matching calculation result for the target segments corresponding to these four administrative levels is lower than a preset threshold, it is determined that no address data in the preset index structure meets the preset matching condition with the first address, and the matching conclusion is output directly, which reduces the amount of matching calculation and improves the response speed. With this filtering mechanism, at least 90% of addresses can be filtered out, so that an address only needs to be fully matched against the remaining 10% of the addresses, which greatly saves computing resources.
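The filtering mechanism can be sketched as a cheap pre-check on the first four segments before any full matching is attempted. The function names, the equal treatment of the four levels, and the default threshold of 1.0 are assumptions made for illustration; the application does not specify them.

```python
def range_score(query, candidate, levels=4):
    """Fraction of the first `levels` segments that are character-for-character equal."""
    hits = sum(1 for q, c in zip(query[:levels], candidate[:levels]) if q == c)
    return hits / levels

def prefilter(query, candidates, threshold=1.0):
    """Keep only candidates whose range-level score reaches the threshold.

    Candidates that fail here are rejected without running the costlier
    fuzzy and digit matching on the remaining segments.
    """
    return [c for c in candidates if range_score(query, c) >= threshold]

query = ["Guangdong Province", "Foshan City", "Nanhai District", "Guicheng Town",
         "Jiangnan Mingju Community", "306"]
candidates = [
    ["Guangdong Province", "Foshan City", "Nanhai District", "Guicheng Town",
     "Another Community", "101"],
    ["Guangdong Province", "Foshan City", "Chancheng District", "Some Town",
     "Jiangnan Mingju Community", "306"],
]
survivors = prefilter(query, candidates)
```

The second candidate shares the community name and house number with the query but is still discarded, because it already disagrees at the district level.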

Referring to FIG. 2, in the address matching device of an embodiment of the present application, the first address is an address to be retrieved input by a user, and the second address is stored in an index server. The device includes:

The word segmentation module 1 is used to call the preset matching algorithm and segment the first address and the second address respectively according to the first preset rule to obtain the first word segmentation group corresponding to the first address and the second word segmentation group corresponding to the second address, wherein the preset matching algorithm includes word segmentation calculation and matching calculation.

The dividing module 2 is configured to divide the first address into a plurality of first segments according to the first word segmentation group, and divide the second address into a plurality of second segments according to the second word segmentation group.

The first obtaining module 3 is configured to obtain the matching results of all the first segments and all the second segments according to a second preset rule.

The judging module 4 is configured to judge whether the first address and the second address are the same according to the matching result.

The first address in this embodiment is the address to be queried entered by the user. The data composition of the first address is not limited, and matching calculation can be performed on the address to be queried in any case, which improves flexibility and freedom for the user. For example, the first address includes data arranged in sequence according to six administrative levels: province, city/district/county/town, township/road, community, building, and house number, or it consists of data in which one or several administrative levels are missing. The preset matching condition in this embodiment includes the matching rate reaching a preset threshold, or the mark data in the first address reaching a 100% match, and so on. The mark data refers to the data information in the first address that can identify a specific geographic location, such as the name of a certain community or the name of a certain building. For example, "Rongyuan of Jiangnan Mingju Residential Quarter" included in the first address is mark data. In another embodiment of the present application, the mark data of the first address is the data information located after the "town, township" administrative level and before the "building and house number".

Further, the word segmentation module 1 includes:

The first word segmentation unit is used to segment the range addresses corresponding to the first address and the second address respectively according to the pre-associated address dictionary in the natural language processing model, to obtain the first word segmentation part corresponding to the first address and the first word segmentation part corresponding to the second address.

The range address of this embodiment includes at least one of the four administrative levels of province, city/district, county, and township/town. The range address in this embodiment is segmented through a pre-associated address dictionary. The address dictionary is the corresponding vocabulary in the national address database, and the address name is segmented by pre-associating the dictionary with the natural language processing model. To improve the accuracy of address matching, this embodiment adds a crawled address library to the word segmentation calculation of the open source word segmentation package jieba and uses it together with the national address library to correct the address to be segmented; word segmentation is then performed according to the administrative level, which improves the accuracy of word segmentation. It is judged whether the administrative levels contained in the current address are the administrative levels covered by the address dictionary; if so, the address dictionary is called for word segmentation. For example, the address "306, Building 1, Rongyuan, Jiangnan Mingju Community, Guicheng Town, Nanhai District, Foshan City, Guangdong Province" includes the four administrative levels covered by the address dictionary, so these four levels of the address are segmented according to the address dictionary. The result of word segmentation is as follows: Guangdong Province/Foshan City/Nanhai District/Guicheng Town/Jiangnan Mingju Residential District Rongyuan Block 306. The first word segmentation part is Guangdong Province/Foshan City/Nanhai District/Guicheng Town.
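Dictionary-driven segmentation of the range address can be sketched without jieba as a greedy longest-prefix match. This is a stand-in technique chosen so the example is self-contained; it is not the jieba algorithm. `ADDRESS_DICT` is a hypothetical four-entry excerpt of the address dictionary, and the input is assumed to be written from the highest administrative level to the lowest.

```python
# Hypothetical excerpt of the address dictionary (range-address vocabulary).
ADDRESS_DICT = {"Guangdong Province", "Foshan City", "Nanhai District", "Guicheng Town"}

def segment_range_address(text, dictionary=ADDRESS_DICT):
    """Peel dictionary entries off the front of `text`, longest entry first.

    Returns (parts, rest): the matched range-address words and the unmatched
    remainder, which would be handed to the mark/detail segmentation.
    """
    entries = sorted(dictionary, key=len, reverse=True)
    parts = []
    rest = text.strip()
    while rest:
        match = next((e for e in entries if rest.startswith(e)), None)
        if match is None:
            break                      # no dictionary word at this position
        parts.append(match)
        rest = rest[len(match):].lstrip(" ,")
    return parts, rest

parts, rest = segment_range_address(
    "Guangdong Province Foshan City Nanhai District Guicheng Town "
    "Jiangnan Mingju Community Rongyuan Building 1 306"
)
```

The returned `parts` corresponds to the first word segmentation part, and `rest` is what remains for the grammar models applied to the mark address and the detail address.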

The second word segmentation unit is used to segment the mark addresses corresponding to the first address and the second address respectively according to the first grammar model in the natural language processing model, to obtain the second word segmentation part corresponding to the first address and the second word segmentation part corresponding to the second address.

The first component unit is configured to combine the first word segmentation part corresponding to the first address and the second word segmentation part corresponding to the first address into the first word segmentation group corresponding to the first address, and to combine the first word segmentation part corresponding to the second address and the second word segmentation part corresponding to the second address into the second word segmentation group corresponding to the second address.

Further, the first address and the second address also respectively include detailed addresses, and the word segmentation module 1 includes:

The third word segmentation unit is used to segment the detail addresses corresponding to the first address and the second address respectively according to the second grammar model in the natural language processing model, to obtain the third word segmentation part corresponding to the first address and the third word segmentation part corresponding to the second address.

The second component unit is used to combine the first word segmentation part, the second word segmentation part, and the third word segmentation part corresponding to the first address into the first word segmentation group corresponding to the first address, and to combine the first word segmentation part, the second word segmentation part, and the third word segmentation part corresponding to the second address into the second word segmentation group corresponding to the second address.

Further, the range address includes the four administrative levels of province, city/district, county, and township/town. The first obtaining module 3 includes:

The mapping unit is configured to map all the first segments and all the second segments, in the order of administrative level from high to low, into two structure trees with the same structure, wherein each structure tree includes multiple nodes and each node corresponds to one of the first segments or one of the second segments.

The first obtaining unit is used to obtain the matching values corresponding to the respective nodes of the two structure trees.

In this embodiment, the correspondence between the nodes of the two structure trees is established according to the correspondence of administrative levels, and the matching value of each node is obtained according to this correspondence. The matching values cover all the corresponding segments, matched node by node. For example, if the "province" node corresponding to the first address is assigned the value "Guangdong" and the "province" node corresponding to the second address is also assigned the value "Guangdong", the nodes match; otherwise they do not match.
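The node-by-node comparison of the two structure trees can be sketched as follows. Because each address forms a single path through the tree, the sketch represents a tree as a mapping from level name to value. The level names in `LEVELS` and the 0/1 matching values are assumptions for illustration, and segments are assumed to be supplied in order with `None` marking a missing level.

```python
# Hypothetical level names, ordered from the highest administrative level down.
LEVELS = ["province", "city", "district", "town", "community", "number"]

def to_nodes(segments):
    """Assign each segment to the node of its administrative level."""
    return dict(zip(LEVELS, segments))

def node_matches(first_segments, second_segments):
    """Return {level: 1 or 0}: 1 where both trees hold the same value at that node."""
    a, b = to_nodes(first_segments), to_nodes(second_segments)
    return {
        level: int(a.get(level) is not None and a.get(level) == b.get(level))
        for level in LEVELS
    }

first = ["Guangdong", "Foshan", "Nanhai", "Guicheng", "Jiangnan Mingju", "306"]
second = ["Guangdong", "Foshan", "Chancheng", None, "Jiangnan Mingju", "306"]
values = node_matches(first, second)
```

A missing level never counts as a match in this sketch, so an address with gaps cannot score higher than a fully specified one.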

The second obtaining unit is configured to obtain the first weight corresponding to the range address, the second weight corresponding to the mark address, and the third weight corresponding to the detail address, respectively.

The calculation unit is configured to calculate the matching rate as the matching value multiplied by the corresponding weight, to obtain the first matching rate corresponding to the range address, the second matching rate corresponding to the mark address, and the third matching rate corresponding to the detail address, respectively.

The summation unit is configured to sum the first matching rate, the second matching rate, and the third matching rate, and use the sum as the matching result of all the first segments and all the second segments.
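The weighted combination can be sketched as follows. The weights 0.5, 0.3, and 0.2 are placeholder values, and averaging the node matching values within each group before applying the weight is an assumption of this sketch; the application only states that each matching rate is the matching value multiplied by the corresponding weight.

```python
# Placeholder weights for the range, mark, and detail addresses (assumed values).
WEIGHTS = {"range": 0.5, "mark": 0.3, "detail": 0.2}

def group_rate(values, weight):
    """Matching rate of one group: weight times the mean node matching value."""
    return weight * (sum(values) / len(values)) if values else 0.0

def match_result(range_values, mark_values, detail_values, weights=WEIGHTS):
    """Sum of the first, second, and third matching rates."""
    return (
        group_rate(range_values, weights["range"])
        + group_rate(mark_values, weights["mark"])
        + group_rate(detail_values, weights["detail"])
    )

# Four range nodes and the mark node match; the house number differs.
result = match_result([1, 1, 1, 1], [1], [0])
```

With weights that sum to 1, the result stays in the interval from 0 to 1 and can be compared directly with a preset threshold.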

Further, the first obtaining unit includes:

The first matching subunit is used to perform exact full matching, in one-to-one correspondence according to the node correspondence, between each first segment corresponding to the range address in the first address and each second segment corresponding to the range address in the second address, to obtain each first matching value.

The second matching subunit is used to perform fuzzy keyword matching, in one-to-one correspondence according to the node correspondence, between each first segment corresponding to the mark address in the first address and each second segment corresponding to the mark address in the second address, to obtain each second matching value.

The third matching subunit is used to perform digit matching, in one-to-one correspondence according to the node correspondence, between each first segment corresponding to the detail address in the first address and each second segment corresponding to the detail address in the second address, to obtain each third matching value.

The summarizing subunit is used to summarize each of the first matching values, each of the second matching values, and each of the third matching values to obtain matching values corresponding to each node of the two structure trees.
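The three kinds of node comparison can be sketched as three small functions. Treating the keyword match as mutual containment and the digit match as equality of the extracted digit runs are interpretations made for this sketch, and the function names are hypothetical.

```python
import re

def exact_match(a, b):
    """Range address: 1.0 only if the two segments are identical."""
    return 1.0 if a == b else 0.0

def containment_match(a, b):
    """Mark address: 1.0 if either non-empty name contains the other."""
    return 1.0 if a and b and (a in b or b in a) else 0.0

def digit_match(a, b):
    """Detail address: 1.0 if both hold the same non-empty sequence of numbers."""
    digits_a = re.findall(r"\d+", a)
    digits_b = re.findall(r"\d+", b)
    return 1.0 if digits_a and digits_a == digits_b else 0.0

mark_value = containment_match("Jiangnan Mingju", "Jiangnan Mingju Community Rongyuan")
same_room = digit_match("Building 1, Room 306", "1-306")
other_building = digit_match("Building 1, Room 306", "Building 2, Room 306")
```

Comparing only the digit runs lets "Building 1, Room 306" and "1-306" agree even though their surrounding wording differs.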

Further, the first obtaining module 3 includes:

The input unit is used to input a specified number of training samples with pre-labeled similarity values into the natural language processing model for training.

The adjustment unit is configured to adjust the training parameter to the first parameter to make the similarity value output by the natural language processing model consistent with the pre-labeled similarity value.

The corresponding unit is configured to correspond the corresponding weight value in the first parameter to the first weight, the second weight, and the third weight according to the node correspondence relationship.

The default weights in this embodiment are obtained by training the training model. By continuously adjusting the training parameters during training, the similarity output by the model is made consistent with the pre-labeled similarity value, or brought within the preset deviation range. The training parameters include each weight value, so that each weight value is determined through training. Other embodiments of the present application may also adjust one or more of the default weights according to the specific application scenario, so that the matching model better fits the current application scenario.
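How weights can be fitted to pre-labeled similarity values can be sketched with ordinary least squares solved by batch gradient descent. This linear model is a simplification swapped in for illustration; the application does not disclose the form of its training model. The samples, learning rate, and epoch count are made-up values, and each sample pairs the (range, mark, detail) matching values with a labeled similarity.

```python
def fit_weights(samples, lr=0.1, epochs=2000):
    """Fit three weights so that w . x approximates the labeled similarity y."""
    w = [1 / 3, 1 / 3, 1 / 3]                  # start from equal weights
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x, y in samples:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i in range(3):
                grad[i] += 2 * err * x[i] / n  # gradient of the mean squared error
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

# Made-up training samples: ((range, mark, detail) matching values, labeled similarity).
samples = [
    ((1, 1, 1), 1.0),
    ((1, 1, 0), 0.8),
    ((1, 0, 0), 0.5),
    ((1, 0, 1), 0.7),
    ((0, 0, 0), 0.0),
]
weights = fit_weights(samples)
```

On these samples the fitted weights approach 0.5, 0.3, and 0.2, the values used to generate the labels; training can stop once the output falls within the preset deviation range.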

Further, the word segmentation module 1 includes:

The calling unit is configured to call the address database to perform address correction on the first address and the second address respectively according to a third preset rule.

Further, the address matching device further includes:

The index module is used for indexing a specified number of unstructured address data pre-stored in the index server to obtain the preset index structure.

The receiving module is configured to receive the interface plug-ins uploaded to the designated directory of the index server, wherein the interface plug-ins are formed by packaging the preset matching algorithm.

The second acquiring module is used to acquire the configuration parameters of the interface plug-in.

The establishment module is used to establish a calculation association relationship between the preset index structure and the interface plug-in by running the configuration parameters.

In this embodiment, after the address matching algorithm is developed, it is packaged, uploaded to the designated directory of the index server, and configured with the related configuration parameters. Loading and running the configuration parameters establishes the calculation association relationship between the preset index structure and the interface plug-in. The address matching algorithm in the plug-in is then called to complete the matching calculation of the first address against the preset index structure, thereby realizing the address data query.

The index server in this embodiment is an open-source Elasticsearch component (Elasticsearch is used for distributed full-text search), which provides a full-text search engine with distributed computing capabilities based on a RESTful web interface and can perform real-time, fast queries on massive data. The query steps include: (1) Import the addresses of the massive address library into the underlying storage of Elasticsearch in the form of key-value pairs through the data import interface of Elasticsearch, and index the keys. (2) Adapt the address matching model to the custom extended search model of Elasticsearch, add it to the extension module of the Elasticsearch master node, and restart Elasticsearch, so that it becomes an address matching model that uses the distributed storage and high-concurrency computing of Elasticsearch. (3) Use this custom model to develop a one-to-many massive address matching interface on Elasticsearch. (4) Through the upper-level interface developed on Elasticsearch, a new address can be entered and the massive address library and custom model to be matched can be selected; that is, based on Elasticsearch, the new address is quickly compared with the addresses in the massive address library and the top N most similar addresses are returned, where N can be passed as a parameter. In this embodiment, an index structure is established on the massive data pre-stored in the index server and combined with the computing architecture and powerful distributed computing capability of the Elasticsearch component itself, to implement real-time fast query of the first address in the preset index structure.

Referring to FIG. 3, an embodiment of the present application also provides a computer device. The computer device may be a server, and its internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is used to provide calculation and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store all the data needed for the address matching process. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program is executed by the processor to realize the address matching method.

The above-mentioned processor executes the above-mentioned address matching method, in which the first address is the address to be retrieved input by the user and the second address is stored in the index server. The method includes: calling a preset matching algorithm and segmenting the first address and the second address respectively according to a first preset rule to obtain a first word segmentation group corresponding to the first address and a second word segmentation group corresponding to the second address; dividing the first address into multiple first segments according to the first word segmentation group, and dividing the second address into multiple second segments according to the second word segmentation group; obtaining the matching result of all the first segments and all the second segments according to a second preset rule; and judging whether the first address and the second address are the same according to the matching result.

For the above computer device, the data pre-stored in the index server is unstructured data, stored in the column storage form of key-value pairs. Unstructured data refers to data such as text, images, and voice held in column storage built on NoSQL storage technology. The amount of data is very large, so the distributed architecture of NoSQL technology is needed for storage and calculation. The index server combines NoSQL distributed storage with an index structure to achieve real-time, fast query and calculation of massive data. An address matching model with configurable weights, based on multi-level address division, is proposed. First, the address name is segmented through the natural language processing model to form a word segmentation group; the word segmentation group is divided into segments according to administrative levels, and the segments are mapped to nodes in a tree structure. Taking full account of the tree structure of addresses, the addresses are divided into segments according to administrative levels; each administrative-level segment is given a different weight, and the weights can be fine-tuned in actual business scenarios. By establishing an index structure for the massive data pre-stored in the index server, combined with the computing architecture and powerful distributed computing capabilities of the Elasticsearch component itself, real-time fast query of the first address in the preset index structure is realized. For the first four administrative levels of the segmented address, exact matching is performed against the tree-type national address database of provinces, cities, counties, and towns, and partially missing levels are effectively completed. The default weights are obtained by training the training model. By continuously adjusting the training parameters during training, the similarity output by the model is made consistent with the pre-labeled similarity value, or brought within the preset deviation range. The training parameters include each weight value, so that each weight value is determined through training and the weight setting is more reliable.

In an embodiment, the first address and the second address each include a range address and a mark address, and the step in which the processor calls the preset matching algorithm and segments the first address and the second address respectively according to the first preset rule to obtain the first word segmentation group corresponding to the first address and the second word segmentation group corresponding to the second address includes: segmenting the range addresses corresponding to the first address and the second address respectively according to the pre-associated address dictionary in the natural language processing model, to obtain the first word segmentation part corresponding to the first address and the first word segmentation part corresponding to the second address; segmenting the mark addresses corresponding to the first address and the second address respectively according to the first grammar model in the natural language processing model, to obtain the second word segmentation part corresponding to the first address and the second word segmentation part corresponding to the second address; and combining the first word segmentation part and the second word segmentation part corresponding to the first address into the first word segmentation group corresponding to the first address, and combining the first word segmentation part and the second word segmentation part corresponding to the second address into the second word segmentation group corresponding to the second address.

In one embodiment, the first address and the second address each further include a detail address, and after the step in which the processor segments the mark addresses corresponding to the first address and the second address respectively according to the first grammar model in the natural language processing model to obtain the second word segmentation part corresponding to the first address and the second word segmentation part corresponding to the second address, the method includes: segmenting the detail addresses corresponding to the first address and the second address respectively according to the second grammar model in the natural language processing model, to obtain the third word segmentation part corresponding to the first address and the third word segmentation part corresponding to the second address; and combining the first word segmentation part, the second word segmentation part, and the third word segmentation part corresponding to the first address into the first word segmentation group corresponding to the first address, and combining the first word segmentation part, the second word segmentation part, and the third word segmentation part corresponding to the second address into the second word segmentation group corresponding to the second address.

In one embodiment, the range address includes the four administrative levels of province, city/district, county, and township/town, the mark address includes the name of a residential community or a building, and the step in which the processor obtains the matching result of all the first segments and all the second segments according to the second preset rule includes: mapping all the first segments and all the second segments, in the order of administrative level from high to low, into two structure trees with the same structure, wherein each structure tree includes a plurality of nodes and each node corresponds to one of the first segments or one of the second segments; obtaining the matching values corresponding to the respective nodes of the two structure trees; obtaining the first weight corresponding to the range address, the second weight corresponding to the mark address, and the third weight corresponding to the detail address, respectively; calculating the matching rate as the matching value multiplied by the corresponding weight, to obtain the first matching rate corresponding to the range address, the second matching rate corresponding to the mark address, and the third matching rate corresponding to the detail address, respectively; and using the sum of the first matching rate, the second matching rate, and the third matching rate as the matching result of all the first segments and all the second segments.

In one embodiment, the step in which the processor obtains the matching values corresponding to the respective nodes of the two structure trees includes: performing exact full matching, in one-to-one correspondence according to the node correspondence, between each first segment corresponding to the range address in the first address and each second segment corresponding to the range address in the second address, to obtain each first matching value; performing fuzzy keyword matching, in one-to-one correspondence according to the node correspondence, between each first segment corresponding to the mark address in the first address and each second segment corresponding to the mark address in the second address, to obtain each second matching value; performing digit matching, in one-to-one correspondence according to the node correspondence, between each first segment corresponding to the detail address in the first address and each second segment corresponding to the detail address in the second address, to obtain each third matching value; and summarizing each of the first matching values, each of the second matching values, and each of the third matching values to obtain the matching values corresponding to the respective nodes of the two structure trees.

In one embodiment, before the step in which the processor obtains the first weight corresponding to the range address, the second weight corresponding to the mark address, and the third weight corresponding to the detail address, respectively, the method includes: inputting a specified number of training samples with pre-labeled similarity values into the natural language processing model for training; adjusting the training parameters to first parameters so that the similarity value output by the natural language processing model is consistent with the pre-labeled similarity value; and mapping the corresponding weight values in the first parameters to the first weight, the second weight, and the third weight according to the node correspondence.

Those skilled in the art can understand that the structure shown in FIG. 3 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.

An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. The computer-readable storage medium may be non-volatile or volatile. When executed by a processor, the computer program implements the address matching method, in which the first address is the address to be retrieved input by the user and the second address is stored in the index server. The method includes: calling a preset matching algorithm and segmenting the first address and the second address respectively according to a first preset rule to obtain a first word segmentation group corresponding to the first address and a second word segmentation group corresponding to the second address; dividing the first address into multiple first segments according to the first word segmentation group, and dividing the second address into multiple second segments according to the second word segmentation group; obtaining the matching result of all the first segments and all the second segments according to a second preset rule; and judging whether the first address and the second address are the same according to the matching result.

For the above computer-readable storage medium, the data pre-stored in the index server is unstructured data, stored in a key-value, column-oriented form. Unstructured data here refers to text, images, voice, and the like held in column storage based on NoSQL storage technology; because the data volume is very large, storage and calculation rely on the distributed architecture of NoSQL technology. The index server combines NoSQL distributed storage with an index structure to achieve real-time, fast query and calculation over massive data. The address matching model is based on multi-level address division with configurable weights. First, the address name is segmented by a natural language processing model to form sub-phrases; the sub-phrases are grouped into segments according to administrative level, and the segments are mapped to nodes in a tree structure, which takes full account of the tree structure of addresses. Each administrative level is matched with a different weight, and the weights can be fine-tuned for actual business scenarios. By establishing an index structure over the massive data pre-stored in the index server, and combining it with the computing architecture and distributed computing capabilities of the Elasticsearch component, real-time fast query of the first address in the preset index structure is realized. For the first four administrative levels of a segmented address, exact matching is performed against a tree-shaped address database of the provinces, municipalities, counties, and towns across the country, and partially missing levels are effectively completed. The default weights are obtained by training the model: the training parameters, which include each weight value, are adjusted continuously during training until the similarity output by the model is consistent with the pre-labeled similarity value or falls within a preset deviation range. Determining each weight value in this way makes the weight setting more reliable.
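The weighted combination described above can be illustrated with a minimal sketch. The weight values, the averaging of node matching values within each group, the threshold, and all function names below are assumptions made for illustration only; the application states only that each matching value is multiplied by its corresponding weight and that the resulting matching rates are summed.

```python
# Hypothetical default weights for illustration; the application obtains its
# defaults by training and allows them to be tuned per business scenario.
DEFAULT_WEIGHTS = {"range": 0.5, "mark": 0.3, "detail": 0.2}


def matching_rate(match_values, weight):
    """Scale the mean of the per-node matching values (each in [0, 1]) by the weight.

    Averaging the nodes within a group is an assumption of this sketch.
    """
    if not match_values:
        return 0.0
    return weight * sum(match_values) / len(match_values)


def matching_result(node_matches, weights=DEFAULT_WEIGHTS):
    """Sum the weighted matching rates of the range, mark, and detail groups."""
    return sum(
        matching_rate(node_matches.get(group, []), weight)
        for group, weight in weights.items()
    )


def is_same_address(node_matches, threshold=0.9, weights=DEFAULT_WEIGHTS):
    """Treat two addresses as the same when the result reaches a hypothetical threshold."""
    return matching_result(node_matches, weights) >= threshold


# Province, city, county and town all match; the mark matches; the detail differs.
example = {"range": [1, 1, 1, 1], "mark": [1], "detail": [0]}
score = matching_result(example)
```

With these assumed weights, the example pair agrees on the range and mark addresses but not on the detail address, so its matching result falls below the assumed threshold and the two addresses are not treated as the same.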

In an embodiment, the first address and the second address each include a range address and a mark address, and the step in which the processor calls the preset matching algorithm and performs word segmentation on the first address and the second address respectively according to the first preset rule, to obtain the first word segmentation group corresponding to the first address and the second word segmentation group corresponding to the second address, includes: segmenting the range addresses corresponding to the first address and the second address according to the pre-associated address dictionary in the natural language processing model, to obtain a first word segmentation part corresponding to the first address and a first word segmentation part corresponding to the second address respectively; segmenting the mark addresses corresponding to the first address and the second address according to the first grammar model in the natural language processing model, to obtain a second word segmentation part corresponding to the first address and a second word segmentation part corresponding to the second address respectively; and forming the first word segmentation group corresponding to the first address from the first and second word segmentation parts corresponding to the first address, and forming the second word segmentation group corresponding to the second address from the first and second word segmentation parts corresponding to the second address.
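As an illustration of dictionary-based segmentation of a range address, the following sketch uses forward maximum matching, a common dictionary segmentation technique chosen here as an assumption; the application does not specify which dictionary algorithm the natural language processing model uses. The function name and the sample dictionary are hypothetical.

```python
def segment_with_dictionary(text, dictionary):
    """Split text into the longest dictionary entries found from left to right.

    Characters that start no dictionary entry are emitted as single-character tokens.
    """
    max_len = max(map(len, dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until an entry matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical address dictionary covering three administrative levels.
address_dictionary = {"广东省", "深圳市", "福田区"}
first_part = segment_with_dictionary("广东省深圳市福田区", address_dictionary)
```

Here `first_part` holds one token per administrative level, which corresponds to the first word segmentation part of the address in this sketch.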

In one embodiment, before the step in which the processor obtains the first weight corresponding to the range address, the second weight corresponding to the mark address, and the third weight corresponding to the detail address, the method includes: inputting a specified number of training samples with pre-labeled similarity values into the natural language processing model for training; adjusting the training parameters to a first parameter, such that the similarity value output by the natural language processing model is consistent with the pre-labeled similarity value; and assigning the weight values in the first parameter to the first weight, the second weight, and the third weight respectively according to the node correspondence.
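The idea of adjusting weights until the output similarity agrees with the pre-labeled similarity can be sketched with a simple linear model trained by stochastic gradient descent. This is an assumed stand-in for the training procedure, which the application does not detail; the function name, learning rate, tolerance, and sample data are all hypothetical.

```python
def fit_weights(samples, learning_rate=0.1, epochs=2000, tolerance=1e-3):
    """Fit three weights so that the weighted sum of matching values tracks the labels.

    samples: list of ((range_match, mark_match, detail_match), labeled_similarity).
    Training stops early once every sample is within the preset deviation (tolerance).
    """
    weights = [1 / 3, 1 / 3, 1 / 3]
    for _ in range(epochs):
        max_error = 0.0
        for features, target in samples:
            predicted = sum(w * x for w, x in zip(weights, features))
            error = predicted - target
            max_error = max(max_error, abs(error))
            # Gradient step for the squared error of this sample.
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
        if max_error <= tolerance:
            break
    return weights


# Hypothetical labeled samples, consistent with weights of about 0.5, 0.3 and 0.2.
training_samples = [
    ((1, 1, 1), 1.0),
    ((1, 0, 0), 0.5),
    ((1, 1, 0), 0.8),
    ((0, 1, 1), 0.5),
    ((1, 0, 1), 0.7),
]
first_weight, second_weight, third_weight = fit_weights(training_samples)
```

The three fitted values play the role of the first, second, and third weights for the range, mark, and detail addresses in this sketch.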

Persons of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it may include the processes of the above method embodiments. Any reference to memory, storage, database, or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

It should be noted that in this document, the terms "including", "comprising", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, device, article, or method that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to the process, device, article, or method. Without further restrictions, an element defined by the phrase "including a..." does not exclude the existence of other identical elements in the process, device, article, or method that includes the element.

The above are only preferred embodiments of this application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of this application, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of patent protection of this application.

Claims

An address matching method, wherein the first address is the address to be retrieved input by the user, and the second address is stored in an index server, and the method includes:

Calling a preset matching algorithm and performing word segmentation on the first address and the second address respectively according to a first preset rule, to obtain a first word segmentation group corresponding to the first address and a second word segmentation group corresponding to the second address, wherein the preset matching algorithm includes word segmentation calculation and matching calculation;

Dividing the first address into a plurality of first segments according to the first word segmentation group, and dividing the second address into a plurality of second segments according to the second word segmentation group;

Obtaining matching results of all the first segments and all the second segments according to a second preset rule;

Determining whether the first address and the second address are the same according to the matching result.
The address matching method according to claim 1, wherein the first address and the second address each include a range address and a mark address, and the step of calling the preset matching algorithm and performing word segmentation on the first address and the second address respectively according to the first preset rule, to obtain the first word segmentation group corresponding to the first address and the second word segmentation group corresponding to the second address, includes:

Segmenting the range addresses corresponding to the first address and the second address according to a pre-associated address dictionary in a natural language processing model, to obtain a first word segmentation part corresponding to the first address and a first word segmentation part corresponding to the second address respectively;

Segmenting the mark addresses corresponding to the first address and the second address according to a first grammar model in the natural language processing model, to obtain a second word segmentation part corresponding to the first address and a second word segmentation part corresponding to the second address respectively;

Forming the first word segmentation group corresponding to the first address from the first word segmentation part and the second word segmentation part corresponding to the first address, and forming the second word segmentation group corresponding to the second address from the first word segmentation part and the second word segmentation part corresponding to the second address.
The address matching method according to claim 2, wherein the first address and the second address each further include a detail address, and after the step of segmenting the mark addresses corresponding to the first address and the second address according to the first grammar model in the natural language processing model, to obtain the second word segmentation part corresponding to the first address and the second word segmentation part corresponding to the second address respectively, the method includes:

Segmenting the detail addresses corresponding to the first address and the second address according to a second grammar model in the natural language processing model, to obtain a third word segmentation part corresponding to the first address and a third word segmentation part corresponding to the second address respectively;

Forming the first word segmentation group corresponding to the first address from the first word segmentation part, the second word segmentation part, and the third word segmentation part corresponding to the first address, and forming the second word segmentation group corresponding to the second address from the first word segmentation part, the second word segmentation part, and the third word segmentation part corresponding to the second address.
The address matching method according to claim 3, wherein the range address includes four administrative levels of province, city/district, county, and township/town, the mark address includes a cell name or a building name, and the step of obtaining the matching results of all the first segments and all the second segments according to the second preset rule includes:

Mapping all the first segments and all the second segments, in order of administrative level from high to low, into two structure trees with the same structure, wherein each structure tree includes multiple nodes, and each node is in one-to-one correspondence with one of the first segments or one of the second segments;

Obtaining matching values corresponding to each node of the two structure trees;

Acquiring the first weight corresponding to the range address, the second weight corresponding to the mark address, and the third weight corresponding to the detail address respectively;

Calculating matching rates by multiplying each matching value by the corresponding weight, to obtain the first matching rate corresponding to the range address, the second matching rate corresponding to the mark address, and the third matching rate corresponding to the detail address respectively;

Using the sum of the first matching rate, the second matching rate, and the third matching rate as the matching result of all the first segments and all the second segments.
The address matching method according to claim 4, wherein the step of obtaining the matching value corresponding to each node of the two structure trees respectively comprises:

Performing exact full matching between each first segment corresponding to the range address in the first address and each second segment corresponding to the range address in the second address, in one-to-one correspondence according to the node correspondence, to obtain each first matching value;

Performing model keyword matching between each first segment corresponding to the mark address in the first address and each second segment corresponding to the mark address in the second address, in one-to-one correspondence according to the node correspondence, to obtain each second matching value;

Matching each first segment corresponding to the detail address in the first address with each second segment corresponding to the detail address in the second address, in one-to-one correspondence according to the node correspondence, to obtain each third matching value;

Summarizing each of the first matching values, each of the second matching values, and each of the third matching values to obtain the matching values corresponding to each node of the two structure trees.
The address matching method according to claim 5, wherein before the step of respectively obtaining the first weight corresponding to the range address, the second weight corresponding to the mark address, and the third weight corresponding to the detail address, the method includes:

Inputting a specified number of training samples with pre-labeled similarity values into the natural language processing model for training;

Adjusting the training parameters to a first parameter, such that the similarity value output by the natural language processing model is consistent with the pre-labeled similarity value;

Assigning the weight values in the first parameter to the first weight, the second weight, and the third weight respectively according to the node correspondence.
The address matching method according to claim 2, wherein before the step of calling the preset matching algorithm and performing word segmentation on the first address and the second address respectively according to the first preset rule, to obtain the first word segmentation group corresponding to the first address and the second word segmentation group corresponding to the second address, the method includes:

Indexing a specified number of unstructured address data pre-stored in the index server to obtain a preset index structure;

Receiving an interface plug-in uploaded to a designated directory of the index server, wherein the interface plug-in is formed by packaging the preset matching algorithm;

Obtaining configuration parameters of the interface plug-in;

Establishing a calculation association between the preset index structure and the interface plug-in by running the configuration parameters.
An address matching device, wherein the first address is the address to be retrieved input by the user, the second address is stored in an index server, and the device includes:

A word segmentation module, configured to call a preset matching algorithm and perform word segmentation on the first address and the second address respectively according to a first preset rule, to obtain a first word segmentation group corresponding to the first address and a second word segmentation group corresponding to the second address, wherein the preset matching algorithm includes word segmentation calculation and matching calculation;

A dividing module, configured to divide the first address into a plurality of first segments according to the first word segmentation group, and divide the second address into a plurality of second segments according to the second word segmentation group;

A second acquiring module, configured to acquire matching results of all the first segments and all the second segments according to a second preset rule;

A judgment module, configured to judge whether the first address and the second address are the same according to the matching result.
A computer device, including a memory and a processor, the memory storing a computer program, wherein the processor implements an address matching method when executing the computer program, the first address is the address to be retrieved input by the user, the second address is stored in an index server, and the method includes:

Calling a preset matching algorithm and performing word segmentation on the first address and the second address respectively according to a first preset rule, to obtain a first word segmentation group corresponding to the first address and a second word segmentation group corresponding to the second address, wherein the preset matching algorithm includes word segmentation calculation and matching calculation;

Dividing the first address into a plurality of first segments according to the first word segmentation group, and dividing the second address into a plurality of second segments according to the second word segmentation group;

Obtaining matching results of all the first segments and all the second segments according to a second preset rule;

Determining whether the first address and the second address are the same according to the matching result.
The computer device according to claim 9, wherein the first address and the second address each include a range address and a mark address, and the step of calling the preset matching algorithm and performing word segmentation on the first address and the second address respectively according to the first preset rule, to obtain the first word segmentation group corresponding to the first address and the second word segmentation group corresponding to the second address, includes:

Segmenting the range addresses corresponding to the first address and the second address according to a pre-associated address dictionary in a natural language processing model, to obtain a first word segmentation part corresponding to the first address and a first word segmentation part corresponding to the second address respectively;

Segmenting the mark addresses corresponding to the first address and the second address according to a first grammar model in the natural language processing model, to obtain a second word segmentation part corresponding to the first address and a second word segmentation part corresponding to the second address respectively;

Forming the first word segmentation group corresponding to the first address from the first word segmentation part and the second word segmentation part corresponding to the first address, and forming the second word segmentation group corresponding to the second address from the first word segmentation part and the second word segmentation part corresponding to the second address.
The computer device according to claim 10, wherein the first address and the second address each further include a detail address, and after the step of segmenting the mark addresses corresponding to the first address and the second address according to the first grammar model in the natural language processing model, to obtain the second word segmentation part corresponding to the first address and the second word segmentation part corresponding to the second address respectively, the method includes:

Segmenting the detail addresses corresponding to the first address and the second address according to a second grammar model in the natural language processing model, to obtain a third word segmentation part corresponding to the first address and a third word segmentation part corresponding to the second address respectively;

Forming the first word segmentation group corresponding to the first address from the first word segmentation part, the second word segmentation part, and the third word segmentation part corresponding to the first address, and forming the second word segmentation group corresponding to the second address from the first word segmentation part, the second word segmentation part, and the third word segmentation part corresponding to the second address.
The computer device according to claim 11, wherein the range address includes four administrative levels of province, city/district, county, and township/town, the mark address includes a cell name or a building name, and the step of obtaining the matching results of all the first segments and all the second segments according to the second preset rule includes:

Mapping all the first segments and all the second segments, in order of administrative level from high to low, into two structure trees with the same structure, wherein each structure tree includes multiple nodes, and each node is in one-to-one correspondence with one of the first segments or one of the second segments;

Obtaining matching values corresponding to each node of the two structure trees;

Acquiring the first weight corresponding to the range address, the second weight corresponding to the mark address, and the third weight corresponding to the detail address respectively;

Calculating matching rates by multiplying each matching value by the corresponding weight, to obtain the first matching rate corresponding to the range address, the second matching rate corresponding to the mark address, and the third matching rate corresponding to the detail address respectively;

Using the sum of the first matching rate, the second matching rate, and the third matching rate as the matching result of all the first segments and all the second segments.
The computer device according to claim 12, wherein the step of obtaining the matching values corresponding to the respective nodes of the two structure trees comprises:

Performing exact full matching between each first segment corresponding to the range address in the first address and each second segment corresponding to the range address in the second address, in one-to-one correspondence according to the node correspondence, to obtain each first matching value;

Performing model keyword matching between each first segment corresponding to the mark address in the first address and each second segment corresponding to the mark address in the second address, in one-to-one correspondence according to the node correspondence, to obtain each second matching value;

Matching each first segment corresponding to the detail address in the first address with each second segment corresponding to the detail address in the second address, in one-to-one correspondence according to the node correspondence, to obtain each third matching value;

Summarizing each of the first matching values, each of the second matching values, and each of the third matching values to obtain the matching values corresponding to each node of the two structure trees.
The computer device according to claim 13, wherein before the step of respectively obtaining the first weight corresponding to the range address, the second weight corresponding to the mark address, and the third weight corresponding to the detail address, the method includes:

Inputting a specified number of training samples with pre-labeled similarity values into the natural language processing model for training;

Adjusting the training parameters to a first parameter, such that the similarity value output by the natural language processing model is consistent with the pre-labeled similarity value;

Assigning the weight values in the first parameter to the first weight, the second weight, and the third weight respectively according to the node correspondence.
A computer-readable storage medium having a computer program stored thereon, wherein the computer program implements an address matching method when executed by a processor, the first address is the address to be retrieved input by the user, the second address is stored in an index server, and the method includes:

Calling a preset matching algorithm and performing word segmentation on the first address and the second address respectively according to a first preset rule, to obtain a first word segmentation group corresponding to the first address and a second word segmentation group corresponding to the second address, wherein the preset matching algorithm includes word segmentation calculation and matching calculation;

Dividing the first address into a plurality of first segments according to the first word segmentation group, and dividing the second address into a plurality of second segments according to the second word segmentation group;

Obtaining matching results of all the first segments and all the second segments according to a second preset rule;

Determining whether the first address and the second address are the same according to the matching result.
The computer-readable storage medium according to claim 15, wherein the first address and the second address each include a range address and a mark address, and the step of calling the preset matching algorithm and performing word segmentation on the first address and the second address respectively according to the first preset rule, to obtain the first word segmentation group corresponding to the first address and the second word segmentation group corresponding to the second address, includes:

Segmenting the range addresses corresponding to the first address and the second address according to a pre-associated address dictionary in a natural language processing model, to obtain a first word segmentation part corresponding to the first address and a first word segmentation part corresponding to the second address respectively;

Segmenting the mark addresses corresponding to the first address and the second address according to a first grammar model in the natural language processing model, to obtain a second word segmentation part corresponding to the first address and a second word segmentation part corresponding to the second address respectively;

Forming the first word segmentation group corresponding to the first address from the first word segmentation part and the second word segmentation part corresponding to the first address, and forming the second word segmentation group corresponding to the second address from the first word segmentation part and the second word segmentation part corresponding to the second address.
The computer-readable storage medium according to claim 16, wherein the first address and the second address each further include a detail address, and after the step of segmenting the mark addresses corresponding to the first address and the second address according to the first grammar model in the natural language processing model, to obtain the second word segmentation part corresponding to the first address and the second word segmentation part corresponding to the second address respectively, the method includes:

Segmenting the detail addresses corresponding to the first address and the second address according to a second grammar model in the natural language processing model, to obtain a third word segmentation part corresponding to the first address and a third word segmentation part corresponding to the second address respectively;

Forming the first word segmentation group corresponding to the first address from the first word segmentation part, the second word segmentation part, and the third word segmentation part corresponding to the first address, and forming the second word segmentation group corresponding to the second address from the first word segmentation part, the second word segmentation part, and the third word segmentation part corresponding to the second address.
The computer-readable storage medium according to claim 17, wherein the range address includes four administrative levels of province, city/district, county, and township/town, the mark address includes a cell name or a building name, and the step of obtaining the matching results of all the first segments and all the second segments according to the second preset rule includes:

Mapping all the first segments and all the second segments, in order of administrative level from high to low, into two structure trees with the same structure, wherein each structure tree includes multiple nodes, and each node is in one-to-one correspondence with one of the first segments or one of the second segments;

Obtaining matching values corresponding to each node of the two structure trees;

Acquiring the first weight corresponding to the range address, the second weight corresponding to the mark address, and the third weight corresponding to the detail address respectively;

Calculating matching rates by multiplying each matching value by the corresponding weight, to obtain the first matching rate corresponding to the range address, the second matching rate corresponding to the mark address, and the third matching rate corresponding to the detail address respectively;

Using the sum of the first matching rate, the second matching rate, and the third matching rate as the matching result of all the first segments and all the second segments.
The computer-readable storage medium according to claim 18, wherein the step of obtaining the matching value corresponding to each node of the two structure trees comprises:

Performing exact full matching between each first segment corresponding to the range address in the first address and each second segment corresponding to the range address in the second address, in one-to-one correspondence according to the node correspondence, to obtain each first matching value;

Performing model keyword matching between each first segment corresponding to the mark address in the first address and each second segment corresponding to the mark address in the second address, in one-to-one correspondence according to the node correspondence, to obtain each second matching value;

Matching each first segment corresponding to the detail address in the first address with each second segment corresponding to the detail address in the second address, in one-to-one correspondence according to the node correspondence, to obtain each third matching value;

Summarizing each of the first matching values, each of the second matching values, and each of the third matching values to obtain the matching values corresponding to each node of the two structure trees.
The computer-readable storage medium according to claim 19, wherein before the step of respectively obtaining the first weight corresponding to the range address, the second weight corresponding to the mark address, and the third weight corresponding to the detail address, the method includes:

Inputting a specified number of training samples with pre-labeled similarity values into the natural language processing model for training;

Adjusting the training parameters to a first parameter, such that the similarity value output by the natural language processing model is consistent with the pre-labeled similarity value;

Assigning the weight values in the first parameter to the first weight, the second weight, and the third weight respectively according to the node correspondence.