CN113515677A

CN113515677A - Address matching method and device and computer readable storage medium

Info

Publication number: CN113515677A
Application number: CN202110834270.2A
Authority: CN
Inventors: 张强; 高恩伟; 闫岩
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Hangzhou Information Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Hangzhou Information Technology Co Ltd
Priority date: 2021-07-22
Filing date: 2021-07-22
Publication date: 2021-10-19
Anticipated expiration: 2041-07-22
Also published as: CN113515677B

Abstract

The invention discloses an address matching method, an address matching device and a computer-readable storage medium. The address matching method comprises: acquiring at least two target addresses in a standard address set that match an address to be matched, wherein the standard address set comprises addresses from at least two data sources and each target address is obtained by a different matching model; determining a confidence for each target address, wherein the more data sources a target address matches, the higher its confidence; and determining, among all the target addresses and according to their confidences, the target address that matches the address to be matched. The invention can improve the accuracy of address matching.

Description

Address matching method and device and computer readable storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to an address matching method and apparatus, and a computer-readable storage medium.

Background

In the field of communications, address matching is often required. For example, address information such as mobile base station cell addresses, residential community addresses, and the addresses of schools, hospitals and other institutions is often collected manually and may be inaccurate. The collected address must therefore be matched against standard addresses to obtain the corresponding correct standard address. When similarity is computed simply by the minimum edit distance, the standard address closest in edit distance to a community name such as "basketball garden" is not necessarily the correct standard name for that community, so the matching accuracy is low. The present invention therefore addresses at least the following technical problem: how to improve the accuracy of address matching.

Disclosure of Invention

The main purpose of the invention is to provide an address matching method, an address matching device and a computer-readable storage medium, so as to solve the technical problem of low address matching accuracy.

In order to achieve the above object, the present invention provides an address matching method, including:

acquiring at least two target addresses in a standard address set that match the address to be matched, wherein the standard address set comprises addresses from at least two data sources, and each target address is obtained by a different matching model;

determining the confidence of each target address, wherein the more data sources a target address matches, the higher its confidence;

and determining, among all the target addresses and according to their confidences, the target address that matches the address to be matched.

Optionally, the step of obtaining at least two target addresses matched with the address to be matched in the standard address set includes:

determining a first target address according to the address to be matched, the standard address set and a preset probability transition matrix model, wherein the preset probability transition matrix model is obtained by training a probability transition matrix training model according to an address training set and the standard address set;

determining a second target address according to the address to be matched, the standard address set and a preset residual error network fusion model, wherein the preset residual error network fusion model comprises an embedding layer, a TextRCNN network, a TextCNN network, a residual error layer and a preset activation function, the preset residual error network fusion model is obtained by training the residual error network fusion training model according to the address training set and the standard address set, and the target addresses are the first target address and the second target address respectively.

Optionally, the step of determining the first target address according to the address to be matched, the standard address set, and a preset probability transition matrix model includes:

acquiring candidate feature words whose frequency in the standard address set is greater than a preset frequency;

constructing a feature word set according to the candidate feature words;

extracting a feature word sequence corresponding to the address to be matched according to the feature word set, wherein the feature word sequence comprises the candidate feature words and the common characters in the address to be matched;

combining the feature word elements in the feature word sequence according to a target combination length and a preset combination order to obtain a feature word substring set of the target combination length;

determining the joint probability corresponding to the feature word substring set according to a preset hidden Markov model and the feature word substring set of the target combination length, wherein the preset hidden Markov model is obtained by training a hidden Markov training model on the standard address set, the joint probability is obtained from the feature word transition probabilities in the hidden Markov model, and the preset probability transition matrix model is the preset hidden Markov model;

when the target combination length is smaller than a preset combination length, increasing the target combination length and returning to the step of combining the feature word elements in the feature word sequence according to the target combination length and the preset combination order to obtain a feature word substring set of the target combination length;

when the target combination length is greater than or equal to the preset combination length, acquiring the feature word substring set with the maximum joint probability;

determining an optimal solution according to the feature word substring set with the maximum joint probability;

determining the optimal solution as the first target address.

Optionally, the step of determining, according to the confidence of each target address, the target address matched with the address to be matched from all the target addresses includes:

determining the matching degree of each target address and the address to be matched;

determining the product of the matching degree and the confidence degree corresponding to each target address;

and determining the target address corresponding to the maximum product as the target address matched with the address to be matched.

Optionally, after the step of obtaining at least two target addresses matched with the address to be matched in the standard address set, the address matching method further includes:

when there are at least two different target addresses, performing the step of determining a confidence level for each of the target addresses;

and when all the target addresses are the same, determining that the target address is the target address matched with the address to be matched.

Optionally, the step of determining the confidence level of each target address includes:

determining the number of the data sources matched with each target address;

determining the confidence level of the target address according to the quantity.

Optionally, before the step of obtaining at least two target addresses matched with the address to be matched in the standard address set, the address matching method further includes:

acquiring an original address sent by a server;

and removing illegal characters, removing redundant address content, correcting wrongly written characters and completing incomplete addresses in the original address to obtain the address to be matched.
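The four cleaning operations can be illustrated with a small Python sketch. The character whitelist, the repeated-segment rule and both lookup tables (`TYPO_FIXES`, `REGION_PREFIXES`) are hypothetical stand-ins chosen for illustration. The description does not specify how each operation is implemented.

```python
import re

# Hypothetical lookup tables for illustration only; a real system would load
# curated dictionaries instead.
TYPO_FIXES = {"小去": "小区"}                  # wrongly written -> correct characters
REGION_PREFIXES = {"西湖区": "浙江省杭州市"}   # leading district -> missing higher levels

# Keep CJK ideographs, ASCII letters/digits, '-' and '#'; everything else is "illegal".
ILLEGAL_CHARS = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9#\-]")
# A segment of two or more characters repeated back-to-back, e.g. "杭州市杭州市".
REPEATED_SEGMENT = re.compile(r"(.{2,}?)\1+")


def clean_address(raw):
    """Turn a raw collected address into an address to be matched (sketch)."""
    addr = ILLEGAL_CHARS.sub("", raw)                  # 1. illegal character cleaning
    addr = REPEATED_SEGMENT.sub(r"\1", addr)           # 2. redundant address cleaning
    for wrong, right in TYPO_FIXES.items():            # 3. wrongly written character replacement
        addr = addr.replace(wrong, right)
    for district, prefix in REGION_PREFIXES.items():   # 4. incomplete address completion
        if addr.startswith(district):
            addr = prefix + addr
            break
    return addr
```

For example, `clean_address("西湖区文三路")` prepends the missing province and city. In practice the dictionaries would be much larger. The repeated-segment rule can also over-collapse legitimate repetitions such as the house number "1212", so it is only a starting point.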

In addition, in order to achieve the above object, the present invention further provides an address matching apparatus, which includes an obtaining module and a determining module, wherein:

the acquisition module is configured to acquire at least two target addresses in a standard address set that match the address to be matched, wherein the standard address set comprises addresses from at least two data sources, and each target address is obtained by a different matching model;

the determining module is configured to determine the confidence of each target address, wherein the more data sources a target address matches, the higher its confidence, and to determine, among all the target addresses and according to their confidences, the target address matched with the address to be matched.

In addition, to achieve the above object, the present invention further provides an address matching apparatus, which includes a memory, a processor, and an address matching program stored in the memory and operable on the processor, wherein the address matching program, when executed by the processor, implements the steps of any one of the above address matching methods.

In addition, to achieve the above object, the present invention further provides a computer-readable storage medium having an address matching program stored thereon, the address matching program implementing the steps of the address matching method according to any one of the above items when executed by a processor.

The embodiments of the present invention provide an address matching method, device and computer-readable storage medium that work as follows. At least two target addresses matching the address to be matched are obtained from a standard address set. The confidence of each target address is determined, and the target address that matches the address to be matched is selected among all the target addresses according to these confidences. The standard address set includes addresses from at least two data sources, and each target address is obtained by a different matching model. The more data sources a target address matches, the higher its confidence. Because the results of different matching models are weighed by how many data sources agree on them, the method avoids the low accuracy of matching against a single address data source using similarity computed simply by the minimum edit distance. It thus effectively improves the accuracy of address matching.

Drawings

FIG. 1 is a schematic diagram of an apparatus in a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a first embodiment of an address matching method according to the present invention;

FIG. 3 is a flowchart illustrating a second embodiment of an address matching method according to the present invention;

FIG. 4 is a flowchart illustrating a third embodiment of an address matching method according to the present invention;

FIG. 5 is a flowchart illustrating a fourth embodiment of an address matching method according to the present invention;

FIG. 6 is a functional block diagram of the address matching apparatus according to the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Fig. 1 is a schematic structural diagram of a device in the hardware operating environment according to an embodiment of the present invention.

The address matching device related to the embodiment of the invention can be a server, a terminal device or other computer devices.

As shown in fig. 1, the apparatus may include a processor 1001 (such as a CPU), a memory 1003, and a communication bus 1002. The communication bus 1002 is used to implement connection and communication among these components. The memory 1003 may be a high-speed RAM memory or a non-volatile memory (e.g., a disk memory), and may alternatively be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the configuration of the device shown in fig. 1 is not intended to be limiting of the device and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

As shown in fig. 1, the memory 1003, which is a kind of computer storage medium, may include therein an operating system and an address matching program.

In the apparatus shown in fig. 1, the processor 1001 may be configured to call the address matching program stored in the memory 1003 and perform the operations of the address matching method described below. These operations include, among others:

constructing a feature word set according to the candidate feature words;

determining the optimal solution as the first target address;

determining the number of the data sources matched with each target address;

acquiring an original address sent by a server.

Referring to fig. 2, a first embodiment of the present invention provides an address matching method, where the address matching method includes:

step S10, acquiring at least two target addresses in a standard address set that match the address to be matched, wherein the standard address set comprises addresses from at least two data sources, and each target address is obtained by a different matching model;

in this embodiment, the execution subject is an address matching device, which may specifically be a server, a terminal device or another computer device. The standard address set is a set of preset standard addresses; in this embodiment, the standard addresses are obtained from two or more data sources. A data source is a provider of standard addresses, such as a map service provider like Amap (Gaode Map), Baidu Map or Tencent Map. When standard addresses are obtained from map service providers, they may be crawled separately from the map Application Programming Interface (API) of each provider.

Each standard address may include several tags. For example, each standard address may correspond to five levels of tags: province, city, district, street/community and residential community (cell). A standard address may also include tags at other levels, such as "road", "street number / village group" and "detailed address", and may include fewer or more levels of tags than in this example.

Take as an example crawling all cell-level tag information in Guizhou Province through the Baidu Map API. The request URL is http://api.map.baidu.com/place/v2/search? followed by the access parameters, which include the longitude and latitude, the developer access key, the search keyword, the output format and so on. The size of the longitude-latitude grid is adjusted until the five-level tag information of all cells has been obtained, which yields the standard addresses corresponding to the Baidu Map API. The five-level tag information of all cells can be obtained in a similar way from the other map APIs, such as the Amap API, to obtain the standard addresses corresponding to each of them. The standard addresses from the different map APIs are then combined into the standard address set. Standard addresses may also be obtained from other data sources that store address information. In either case, the standard address set comprises addresses from at least two data sources.

The address to be matched is the address for which a standard address is sought. Depending on the matching requirements, it may be obtained in various ways; for example, addresses may be taken from the Magic Box life service platform and matched to obtain the corresponding target addresses. A target address is a standard address in the standard address set that matches the address to be matched. In this embodiment, at least two target addresses are obtained for the address to be matched, each by a different matching model. There are therefore two or more matching models, each of which produces its own target address from the address to be matched.

The matching models may be various address matching models implemented with machine learning techniques. Examples include an address matching model built on a deep learning model, an address matching model built by pre-training on a point-of-interest (POI) knowledge graph, an address matching model based on a probability transition matrix, and an address matching model based on a residual error network fusion model. Because different matching models use different matching mechanisms, at least two target addresses can be obtained when matching against the standard addresses of different data sources.
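The grid-based crawl described above can be sketched as follows. Only the endpoint comes from the text. The parameter names `query`, `bounds`, `output` and `ak`, the bounding-box format and both helper functions are assumptions to be checked against the Baidu Maps Place API v2 documentation. The sketch only builds request URLs and performs no network access.

```python
import math
from urllib.parse import urlencode

# Endpoint from the description. The query parameter names below are
# assumptions to verify against Baidu's Place API v2 documentation.
BASE_URL = "http://api.map.baidu.com/place/v2/search"


def grid_cells(lat_min, lat_max, lng_min, lng_max, step):
    """Split a bounding box into grid cells of at most `step` degrees per side."""
    rows = math.ceil((lat_max - lat_min) / step)
    cols = math.ceil((lng_max - lng_min) / step)
    cells = []
    for r in range(rows):
        for c in range(cols):
            south = lat_min + r * step
            west = lng_min + c * step
            north = min(south + step, lat_max)
            east = min(west + step, lng_max)
            cells.append((south, west, north, east))
    return cells


def build_search_urls(keyword, ak, cells, output="json"):
    """Build one search request URL per grid cell (hypothetical helper)."""
    urls = []
    for south, west, north, east in cells:
        params = {
            "query": keyword,                          # retrieval keyword
            "bounds": f"{south},{west},{north},{east}",  # assumed rectangle format
            "output": output,                          # output format
            "ak": ak,                                  # developer access key
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls
```

Shrinking `step` corresponds to adjusting the size of the longitude-latitude grid. A crawler could, for instance, re-split any cell whose response appears truncated, though that policy is an assumption rather than part of the description.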

Step S20, determining the confidence of each target address, wherein the more data sources a target address matches, the higher its confidence;

The confidence indicates how credible a target address is. In this embodiment, the more data sources a target address matches, the higher its confidence, i.e. the more trustworthy it is. After at least two target addresses have been obtained with different matching models, a given target address may match only one data source, or two or more. A target address matches a data source when the standard addresses from that data source include the target address. Because the target addresses are produced by different matching models, their results tend to converge when the models are accurate. This convergence shows up as the target address appearing among the standard addresses of more data sources. Therefore, in this embodiment, the more data sources a target address matches, the higher its confidence.

The confidence of each target address may be determined as follows. First, the standard addresses in the standard address set are classified according to how many data sources they belong to, i.e. how many data sources' intersection contains them. For example, with three data sources:

- a standard address in the intersection of all three data sources is classified into a first type;
- a standard address in the intersection of two data sources is classified into a second type;
- a standard address belonging to only one data source is classified into a third type.

With fewer or more data sources, the standard addresses can be classified in a similar way. Since each target address is a standard address that matches the address to be matched, its type is the type associated with that standard address. A confidence associated in advance with each type then gives the confidence of the target address. Alternatively, the confidence may be obtained directly from the number of data sources the target address matches. In that case, a correspondence between the number of matched data sources and the confidence is set in advance. Once that number is determined for a target address, its confidence is looked up from the correspondence.

When at least two different target addresses exist, the step of determining the confidence of each target address is executed. When all the target addresses are the same, that target address is determined to be the target address matched with the address to be matched. This improves the accuracy of address matching.
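Both ways of assigning confidence can be sketched in Python as follows. The data-source names and the `CONFIDENCE_BY_COUNT` values are hypothetical. The description only requires that a target address matching more data sources receives a higher confidence.

```python
from collections import Counter

# Hypothetical count -> confidence table; the description only requires that
# more matching data sources means a higher confidence.
CONFIDENCE_BY_COUNT = {3: 0.9, 2: 0.7, 1: 0.5}


def source_counts(sources):
    """Map each standard address to the number of data sources containing it.

    `sources` maps a data-source name to its collection of standard addresses.
    """
    counts = Counter()
    for addresses in sources.values():
        counts.update(set(addresses))  # count each source at most once per address
    return counts


def address_type(address, counts, num_sources):
    """Type 1 = in all sources, type 2 = in all but one, ...; 0 if not found."""
    n = counts.get(address, 0)
    return num_sources - n + 1 if n else 0


def confidence(address, counts, table=CONFIDENCE_BY_COUNT):
    """Look up the confidence of a target address from its data-source count."""
    return table.get(counts.get(address, 0), 0.0)
```

A per-type table keyed by `address_type` would work equally well. With a fixed number of sources, type and count carry the same information.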

Step S30, determining the target address matched with the address to be matched from all the target addresses according to the confidence of each target address.

After the confidences of the target addresses are obtained, there are two options. The target address with the highest confidence may be taken directly as the target address matched with the address to be matched. Alternatively, the confidence may be combined with the matching degree between each target address and the address to be matched to select the matched target address, which further improves the accuracy of address matching.

In this embodiment, at least two target addresses matching the address to be matched are obtained from a standard address set. The confidence of each target address is determined, and the target address matched with the address to be matched is selected among all the target addresses according to these confidences. The standard address set includes addresses from at least two data sources, and each target address is obtained by a different matching model. The more data sources a target address matches, the higher its confidence. Because the results of different matching models are weighed during matching by how many data sources agree on them, the method avoids relying on a single address data source. It also avoids the low accuracy of matching addresses by similarity computed simply with the minimum edit distance, and thus effectively improves the accuracy of address matching.
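The selection rules of step S30, together with the shortcut for identical target addresses, might look like the following. The function name and the use of caller-supplied `confidence` and `matching_degree` callables are assumptions made for the sketch.

```python
def select_target(query, targets, confidence, matching_degree=None):
    """Choose the matched target address among candidate target addresses.

    `targets` holds one target address per matching model; `confidence(t)` and
    `matching_degree(query, t)` are callables supplied by the caller.
    """
    if len(set(targets)) == 1:
        # All matching models agree: no confidence comparison is needed.
        return targets[0]
    if matching_degree is None:
        # Option 1: the target address with the highest confidence wins.
        return max(targets, key=confidence)
    # Option 2: maximise matching degree x confidence.
    return max(targets, key=lambda t: matching_degree(query, t) * confidence(t))
```

With two candidates, option 2 is exactly the comparison of the first and second products used in the second embodiment.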

Referring to fig. 3, a second embodiment of the present invention provides an address matching method, based on the first embodiment shown in fig. 2, where the step S10 includes:

step S11, determining a first target address according to the address to be matched, the standard address set and a preset probability transition matrix model, wherein the preset probability transition matrix model is obtained by training a probability transition matrix training model according to an address training set and the standard address set;

when the first target address is determined according to the address to be matched, the standard address set and the preset probability transition matrix model, the following method can be adopted:

acquiring candidate feature words whose frequency in the standard address set is greater than a preset frequency, and constructing a feature word set from the candidate feature words;

here the frequency is the number of occurrences of a character or word in the standard address set. Characters or words whose frequency exceeds the preset frequency are the candidate feature words. For example, with a preset frequency of 10000, the selected candidate feature words form the feature word set Q = {province, city, district, street, community, town, township, cell};
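A minimal sketch of candidate feature word selection follows. It assumes the standard addresses have already been segmented into characters or words, which the description does not specify. `candidate_feature_words` is a hypothetical helper. Note the strict "greater than" comparison with the preset frequency.

```python
from collections import Counter


def candidate_feature_words(tokenized_addresses, min_freq):
    """Return tokens whose frequency across all standard addresses exceeds min_freq."""
    freq = Counter(tok for address in tokenized_addresses for tok in address)
    return {tok for tok, n in freq.items() if n > min_freq}
```

With the preset frequency of 10000 mentioned above, `min_freq` would be 10000. Level markers such as "省" (province) and "市" (city) then survive, while place names that occur rarely do not.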

the probability transition matrix model is based on a hidden Markov model (HMM). Let $\{i_t, t \in T\}$ be the state sequence, where $T$ is a discrete set of times. The state space formed by the possible values of $i_t$ is the discrete feature word set $Q = \{q_1, q_2, \ldots, q_N\}$. Let the transition probability matrix be $A = [p_{ij}]_{N \times N}$. Since the HMM is a probabilistic model of a time sequence in which the next state depends only on the current state:

$$p_{ij} = P(i_{t+1} = q_j \mid i_t = q_i), \quad i, j = 1, 2, \ldots, N;$$

the number of transitions between feature words of adjacent levels from time $t$ to time $t+1$ is counted and recorded as:

$$N(i_{t+1} = q_j \mid i_t = q_i), \quad i, j = 1, 2, \ldots, N;$$

let the corresponding transition weight be $a(i_{t+1} = q_j \mid i_t = q_i)$. To avoid the case where a transition count of 0 cannot be computed, the transition weight is processed as follows:

where $i, j = 1, 2, \ldots, N$ and the parameter $m$ satisfies $m < \log_2 N(i_{t+1} = q_j \mid i_t = q_i)_{\min}$. The value of $m$ is then set by analyzing and comparing experimental results for different values of $m$. The probability $p_{ij}$ corresponding to each feature word transition weight is calculated with a SoftMax function as follows:
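The counting and SoftMax normalization described above can be sketched as follows. The weight-processing formula itself is missing from the text, so the sketch assumes one rule consistent with the stated bound on m. An observed transition gets weight log2 of its count, and a zero-count transition gets weight m. Treat this rule and the function name as illustrative assumptions.

```python
import math
from collections import Counter


def transition_probabilities(sequences, states, m):
    """Estimate p_ij = P(i_{t+1} = q_j | i_t = q_i) from feature-word sequences.

    ASSUMPTION: an observed transition has weight log2(count) and an unobserved
    (zero-count) transition has weight m, with m below log2 of the smallest
    observed count. The source's own weighting formula is not reproduced.
    """
    known = set(states)
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):  # adjacent levels: time t -> t+1
            if a in known and b in known:
                counts[(a, b)] += 1
    probs = {}
    for qi in states:
        weights = [math.log2(counts[(qi, qj)]) if counts[(qi, qj)] else m for qj in states]
        top = max(weights)  # subtract the max for a numerically stable SoftMax
        exps = [math.exp(w - top) for w in weights]
        total = sum(exps)
        for qj, e in zip(states, exps):
            probs[(qi, qj)] = e / total  # row-wise SoftMax over next states q_j
    return probs
```

With m below log2 of the smallest observed count, every unseen transition ends up less probable than any observed transition from the same state, and every row of the matrix still sums to 1.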

and the feature word sequence corresponding to the address to be matched is extracted according to the feature word set $Q$. The address of the cell to be matched is traversed to obtain the feature word sequence $O = (o_1, o_2, \ldots, o_i, \ldots, o_L)$, where $o_i \in Q$ and $L \in \mathbb{N}$. The feature word sequence comprises the candidate feature words and the common characters in the address to be matched; the common characters are the characters of the address other than the candidate feature words.

The feature word elements in the feature word sequence are combined according to the target combination length and a preset combination order to obtain the feature word substring set of the target combination length. The preset combination order may be the extraction order, i.e. the elements are combined in the order in which they were extracted. For a combination length $l$ ($2 \le l \le L-1$), the feature word substrings of equal length are grouped to obtain the feature word substring set of combination length $l$, $\{O_l \mid O_l = (o_1, o_2, \ldots, o_l)\}$. The joint probability of each substring in the set $O_l$ is calculated as follows:

the substring with the maximum joint probability among the feature word substrings of combination length $l$ is calculated as follows:

The joint probability corresponding to the feature word substring set is determined according to the preset hidden Markov model and the feature word substring set of target combination length $l$. The preset hidden Markov model is obtained by training the hidden Markov training model on the standard address set, the joint probability is obtained from the feature word transition probabilities in the hidden Markov model, and the preset probability transition matrix model is the preset hidden Markov model.

When the target combination length is smaller than the preset combination length, which may be set to $L-1$, the target combination length is increased. The method then returns to the step of combining the feature word elements in the feature word sequence according to the target combination length and the preset combination order to obtain a feature word substring set of the target combination length. This yields, for every length $l$, the feature word substring with the maximum joint probability, collected in $MAX = \{MaxQ_2, MaxQ_3, \ldots, MaxQ_{L-1}\}$.

When the target combination length is greater than or equal to the preset combination length, the feature word substring set with the maximum joint probability is acquired. An optimal solution $OS$ is determined from it, and the optimal solution is determined as the first target address. Specifically, starting from $i = 2$, the elements of the maximum joint probability substring set $MAX$ are traversed by comparing $MaxQ_i$ with $MaxQ_{i+1}$. The number of feature words that appear in the same order in these two substrings of adjacent lengths is recorded as $n_i$, $i = 2, 3, \ldots, L-2$. If $n_i \le 1$, then $OS = MaxQ_{i+1}$; if $n_i > 1$, then $OS = MaxQ_i$. This continues until $i = L-2$.
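The feature word extraction, substring enumeration and maximum-joint-probability search can be sketched together. Several details are assumptions because the corresponding formulas are not reproduced in the text:

- runs of common characters are grouped into single elements;
- substrings are order-preserving combinations of the sequence elements;
- the joint probability is the product of adjacent transition probabilities, with a small floor for unseen pairs.

The $n_i$-based choice of the optimal solution is left out of the sketch.

```python
from itertools import combinations


def feature_word_sequence(address, feature_words):
    """Traverse an address, emitting feature words and runs of common characters.

    ASSUMPTION: consecutive common characters are grouped into one element and
    longer feature words are tried before shorter ones.
    """
    words = sorted((w for w in feature_words if w), key=len, reverse=True)
    seq, buf, i = [], "", 0
    while i < len(address):
        hit = next((w for w in words if address.startswith(w, i)), None)
        if hit is None:
            buf += address[i]  # common character
            i += 1
            continue
        if buf:
            seq.append(buf)
            buf = ""
        seq.append(hit)
        i += len(hit)
    if buf:
        seq.append(buf)
    return seq


def joint_probability(sub, trans, floor=1e-6):
    """ASSUMPTION: product of adjacent transition probabilities; unseen pairs get `floor`."""
    p = 1.0
    for a, b in zip(sub, sub[1:]):
        p *= trans.get((a, b), floor)
    return p


def max_joint_substrings(seq, trans):
    """Return {l: MaxQ_l} for l = 2, ..., L-1 using order-preserving combinations."""
    return {
        l: max(combinations(seq, l), key=lambda s: joint_probability(s, trans))
        for l in range(2, len(seq))
    }
```

Enumerating combinations grows exponentially with L. This is tolerable for short address sequences, but longer inputs would call for dynamic programming.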

Step S12, a second target address is determined according to the address to be matched, the standard address set and a preset residual error network fusion model, the preset residual error network fusion model comprises an embedding layer, a TextRCNN network, a TextCNN network, a residual error layer and a preset activation function, the preset residual error network fusion model is obtained by training a residual error network fusion training model according to the address training set and the standard address set, and the target addresses are the first target address and the second target address respectively.

In this embodiment, to improve the accuracy of address matching, the matching models are a preset probability transition matrix model and a preset residual error network fusion model. Each matches the address to be matched, producing the first target address and the second target address, respectively. There are thus two target addresses, and the target address matched with the address to be matched is chosen between them according to the first confidence (of the first target address) and the second confidence (of the second target address). Either the target address with the higher confidence is taken directly as the matched target address, or the matching degrees are also used. In the latter case, the first matching degree is that between the first target address and the address to be matched, and the second matching degree is that between the second target address and the address to be matched. A first product of the first matching degree and the first confidence and a second product of the second matching degree and the second confidence are calculated. The two products are compared, and the target address corresponding to the larger product is taken as the target address matched with the address to be matched.

In this embodiment, the first target address is determined according to the address to be matched, the standard address set and the preset probability transition matrix model, which is obtained by training a probability transition matrix training model on the address training set and the standard address set. The second target address is determined according to the address to be matched, the standard address set and the preset residual error network fusion model, which comprises an embedding layer, a TextRCNN network, a TextCNN network, a residual error layer and a preset activation function and is obtained by training the residual error network fusion training model on the address training set and the standard address set. Matching with these two different models yields the first and second target addresses, which improves the accuracy of address matching.

Referring to fig. 4, a third embodiment of the present invention provides an address matching method based on the first embodiment shown in fig. 2, in which step S30 includes:

Step S31, determining the matching degree between each target address and the address to be matched;

To further improve the accuracy of address matching, this embodiment selects the target address matched with the address to be matched based on both the matching degree between each target address and the address to be matched and the confidence of that target address. The matching degree can be calculated as follows: determine the number of identical characters and the number of different characters between the address to be matched and the target address, calculate the sum of these two numbers, and take the ratio of the number of identical characters to that sum as the matching degree. Other calculation methods can also be used; in every case, the more similar the target address is to the address to be matched, the higher the matching degree.
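A minimal sketch of the character-count matching degree described above. The text does not say how identical and different characters are counted, so this sketch assumes multiset counting: a character counts as identical as many times as it appears in both addresses, and every remaining character in either address counts as different. The function name `matching_degree` is hypothetical.

```python
from collections import Counter


def matching_degree(to_match: str, target: str) -> float:
    """Ratio of identical characters to (identical + different) characters."""
    a, b = Counter(to_match), Counter(target)
    same = sum((a & b).values())  # characters shared by both addresses
    different = sum((a - b).values()) + sum((b - a).values())  # leftovers on either side
    total = same + different
    return same / total if total else 0.0  # two empty strings: nothing to compare


print(matching_degree("abcd", "abce"))                # 0.6
print(matching_degree("金沙县名城", "毕节市金沙县名城"))  # 0.625
```

Under this counting, identical plus different characters equals the size of the multiset union of the two addresses, so the ratio behaves like a multiset Jaccard similarity: 1.0 for identical strings and lower as the strings diverge.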

Step S32, determining the product of the matching degree and the confidence degree corresponding to each target address;

Step S33, taking the target address corresponding to the maximum product as the target address matched with the address to be matched.

The target address corresponding to the maximum product is taken as the target address matched with the address to be matched. For example, suppose there are two target addresses: the address to be matched is a, and the target addresses are b and c. The matching degree of target address b is p(b) and that of target address c is p(c); the confidence of target address b is determined to be 0.9 and the confidence of target address c is 0.8. The product for target address b is therefore 0.9 × p(b) and the product for target address c is 0.8 × p(c). With p(b) = 0.9 and p(c) = 0.8, the product for target address b is larger, so target address b is taken as the target address matched with the address to be matched. Selecting the target address in this way makes the result more accurate and improves the accuracy of address matching.
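Substituting the example values p(b) = 0.9 and p(c) = 0.8 into the two products gives the comparison that decides step S33:

```latex
\begin{aligned}
0.9 \times p(b) &= 0.9 \times 0.9 = 0.81 \\
0.8 \times p(c) &= 0.8 \times 0.8 = 0.64 \\
0.81 > 0.64 &\;\Rightarrow\; \text{target address } b \text{ is selected}
\end{aligned}
```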

In this embodiment, the matching degree between each target address and the address to be matched is determined, the product of the matching degree and the confidence of each target address is calculated, and the target address with the maximum product is taken as the target address matched with the address to be matched. Combining the matching degree with the confidence in this way further improves the accuracy of address matching.

Referring to fig. 5, a fourth embodiment of the present invention provides an address matching method based on the first embodiment shown in fig. 2. Before step S10, the address matching method further includes:

Step S40, acquiring the original address sent by the server;

The original address is an address that needs to be matched and is sent by the server. Because the original address may contain wrong characters or be incorrectly formatted, this embodiment preprocesses it before matching in order to improve the accuracy of address matching. The server may be any server that provides original addresses; it is a different server from the one that performs address matching.

Step S50, performing illegal character cleaning, redundant address cleaning, wrongly written character replacement and incomplete address completion on the original address to obtain the address to be matched.

Illegal character cleaning refers to deleting illegal characters, that is, characters such as ",", "(", ")", "|", "!", "@" and "?" that do not belong to the address information. For example, the original address "Guizhou Province Bijie City Jinsha County Drum Street@Lijing Famous City!" contains the illegal characters "@" and "!", so both are deleted to obtain "Guizhou Province Bijie City Jinsha County Drum Street Lijing Famous City".
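A minimal sketch of illegal character cleaning. The character set is an assumption: it contains the characters listed above plus their full-width forms, which are common in Chinese text; a production system would maintain its own list. The function name and the sample address are illustrative.

```python
# Assumed set of illegal characters: the examples from the text plus full-width variants.
ILLEGAL_CHARS = set(",()|!@?") | set("，（）｜！＠？")


def clean_illegal_characters(address: str) -> str:
    """Delete every illegal character, then collapse any whitespace left behind."""
    kept = "".join(ch for ch in address if ch not in ILLEGAL_CHARS)
    return " ".join(kept.split())


print(clean_illegal_characters("贵州省毕节市金沙县@名城!"))  # 贵州省毕节市金沙县名城
```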

Redundant address cleaning refers to deleting redundant address information, that is, unnecessary details such as road numbers and house numbers. For example, if the original address is "Guizhou Province Bijie City Jinsha County Famous City F1 Unit 5 Building 4", the fragment "F1 Unit 5 Building 4" is deleted to obtain "Guizhou Province Bijie City Jinsha County Famous City".
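A minimal regex sketch of redundant address cleaning, assuming redundant information appears as a trailing run of building, unit, floor, room or house-number tokens in a Chinese address. The suffix list, the function name and the sample addresses are illustrative assumptions, not the patent's rule set.

```python
import re

# Assumed trailing detail tokens, e.g. "4栋" (building 4), "5单元" (unit 5), "3层" (floor 3),
# "12号" (number 12) or "F1". "号楼" is listed before "号" so the longer suffix is tried first.
REDUNDANT_TAIL = re.compile(r"(?:\s*(?:\d+(?:号楼|栋|幢|单元|层|室|号)|[Ff]\d+))+$")


def clean_redundant_address(address: str) -> str:
    """Strip a trailing run of building/unit/floor/room/number tokens."""
    return REDUNDANT_TAIL.sub("", address.strip())


print(clean_redundant_address("金沙县名城4栋5单元F1"))  # 金沙县名城
```

Anchoring the pattern at the end of the string keeps numbers that belong to higher-level elements (for example a road name earlier in the address) from being removed.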

Wrongly written character replacement refers to replacing recognized wrongly written characters, which requires an error correction model. The error correction model can be obtained by training a model on a large-scale data set of samples containing wrongly written characters, and the wrongly written characters are then corrected by this model. For example, when the original address "Guizhou Province Bijie City Jinsha County Drum Street Famous City" contains a wrongly written character in the community name "Famous City", the error correction model replaces that character with the correct one, giving the same address with the community name written correctly.
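The trained error correction model itself cannot be reproduced here. As a clearly simplified stand-in, the sketch below replaces known wrong forms using a hypothetical confusion table (wrong form to correct form); in a real system the table lookup would be replaced by the model's predictions. The table entry and the function name are invented for illustration.

```python
from typing import Mapping

# Hypothetical confusion table. "毕洁市" is an invented homophone misspelling of
# "毕节市" (Bijie City), used for illustration only.
CONFUSION_TABLE = {"毕洁市": "毕节市"}


def replace_wrong_characters(address: str, table: Mapping[str, str] = CONFUSION_TABLE) -> str:
    """Replace every known wrong form with its correct form, longest entries first."""
    for wrong in sorted(table, key=len, reverse=True):
        address = address.replace(wrong, table[wrong])
    return address


print(replace_wrong_characters("贵州省毕洁市金沙县"))  # 贵州省毕节市金沙县
```

Processing longer entries first keeps a specific multi-character fix from being pre-empted by a shorter overlapping entry. Replacements are applied one after another, so a table whose corrections contain other table keys could chain; a real table should avoid that.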

Incomplete address completion refers to supplying address elements that are missing from the address information. For example, if the original address is "Jinsha County Famous City", the province and city to which Jinsha County belongs and the street in which Famous City lies are fixed information, so the address can be completed to "Guizhou Province Bijie City Jinsha County Drum Street Famous City".
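A minimal sketch of incomplete address completion, assuming the address has already been split into elements ordered from highest to lowest level and that a parent table (each element mapped to the element one level above it) is available. The table below is hypothetical and built only from the example above; the function name is illustrative.

```python
from __future__ import annotations

# Hypothetical parent table built from the example above. It must not contain cycles.
PARENT = {
    "Famous City": "Drum Street",
    "Drum Street": "Jinsha County",
    "Jinsha County": "Bijie City",
    "Bijie City": "Guizhou Province",
}


def complete_address(elements: list[str]) -> list[str]:
    """Rebuild the full element chain upward from the lowest-level element."""
    if not elements:
        return []
    chain = [elements[-1]]
    while chain[0] in PARENT:
        chain.insert(0, PARENT[chain[0]])
    # Accept the completion only if it agrees with every element already given.
    if all(e in chain for e in elements):
        return chain
    return list(elements)


print(complete_address(["Jinsha County", "Famous City"]))
# ['Guizhou Province', 'Bijie City', 'Jinsha County', 'Drum Street', 'Famous City']
```

The consistency check leaves an address unchanged when a given element contradicts the table, rather than silently overwriting it.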

After the original address is preprocessed in this way, the resulting address to be matched is more accurate, which improves the accuracy of address matching.

In this embodiment, the original address sent by the server is acquired, and illegal character cleaning, redundant address cleaning, wrongly written character replacement and incomplete address completion are performed on it to obtain the address to be matched. The address to be matched is therefore more accurate, which further improves the accuracy of address matching.

Referring to fig. 6, fig. 6 is a schematic diagram of an address matching apparatus according to an embodiment of the present invention. The address matching apparatus comprises an obtaining module 10 and a determining module 20, wherein:

the obtaining module 10 is configured to obtain at least two target addresses matched with an address to be matched in a standard address set, where the standard address set includes addresses of at least two data sources, and each target address is obtained by matching with a different matching model;

the determining module 20 is configured to determine a confidence of each target address, where the greater the number of data sources matched with the target address, the higher the corresponding confidence is, and determine, according to the confidence of each target address, the target address matched with the address to be matched in all the target addresses.
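The text requires only that the confidence increase with the number of data sources that match the target address. One simple rule with that property, assumed here for illustration, is the fraction of data sources whose addresses contain the target address verbatim; the function name and the source names are hypothetical.

```python
from typing import Mapping, Set


def source_confidence(target: str, sources: Mapping[str, Set[str]]) -> float:
    """Fraction of data sources whose standard addresses include the target address."""
    if not sources:
        return 0.0
    matched = sum(1 for addresses in sources.values() if target in addresses)
    return matched / len(sources)


# Hypothetical standard address set drawn from three data sources.
sources = {
    "source_a": {"Address X", "Address Y"},
    "source_b": {"Address X"},
    "source_c": {"Address Z"},
}
print(source_confidence("Address X", sources))  # contained in two of the three sources
print(source_confidence("Address Y", sources))  # contained in one of the three sources
```

Any other monotonically increasing function of the match count would satisfy the same requirement; the fraction is used here only because it keeps the confidence in [0, 1], which makes it directly comparable with a matching degree in the same range.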

In an embodiment, the obtaining module 10 is further configured to perform the following steps:

constructing a feature word set according to the candidate feature words;

determining the optimal solution as the first address.

In an embodiment, the determining module 20 is further configured to perform the following steps:

determining the number of the data sources matched with each target address;

acquiring an original address sent by a server;

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for causing an address matching apparatus (which may be a server or other computer device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. An address matching method, characterized in that the address matching method comprises:

2. The address matching method according to claim 1, wherein the step of obtaining at least two target addresses in the standard address set that match the address to be matched comprises:

3. The address matching method of claim 2, wherein the step of determining the first address according to the address to be matched, the standard address set and a preset probability transition matrix model comprises:

constructing a feature word set according to the candidate feature words;

determining the optimal solution as the first address.

4. The address matching method according to claim 1, wherein the step of determining the target address matching the address to be matched among all the target addresses according to the confidence of each of the target addresses comprises:

5. The address matching method of claim 1, wherein after the step of obtaining at least two target addresses in the standard address set that match the address to be matched, the address matching method further comprises:

6. The address matching method of claim 1, wherein the step of determining a confidence level for each of the target addresses comprises:

determining the number of the data sources matched with each target address;

7. The address matching method according to claim 1, wherein, before the step of obtaining at least two target addresses in the standard address set that match the address to be matched, the address matching method further comprises:

acquiring an original address sent by a server;

8. An address matching apparatus, comprising an obtaining module and a determining module, wherein:

9. An address matching apparatus, characterized in that the address matching apparatus comprises a memory, a processor and an address matching program stored on the memory and executable on the processor, the address matching program implementing the steps of the address matching method according to any one of claims 1 to 7 when executed by the processor.

10. A computer-readable storage medium, characterized in that an address matching program is stored on the computer-readable storage medium, which when executed by a processor implements the steps of the address matching method according to any one of claims 1 to 7.