CN113515677B

CN113515677B - Address matching method, device and computer readable storage medium

Info

Publication number: CN113515677B
Application number: CN202110834270.2A
Authority: CN
Inventors: 张强; 高恩伟; 闫岩
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Hangzhou Information Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Hangzhou Information Technology Co Ltd
Priority date: 2021-07-22
Filing date: 2021-07-22
Publication date: 2023-10-27
Anticipated expiration: 2041-07-22
Also published as: CN113515677A

Abstract

The invention discloses an address matching method, an address matching device and a computer readable storage medium. The address matching method comprises the following steps: obtaining at least two target addresses in a standard address set that match an address to be matched, wherein the standard address set comprises addresses from at least two data sources and each target address is obtained by matching with a different matching model; determining the confidence of each target address, wherein the more data sources a target address matches, the higher its confidence; and determining, from among all the target addresses and according to their confidences, the target address that matches the address to be matched. The invention can improve the accuracy of address matching.

Description

Address matching method, device and computer readable storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to an address matching method, an address matching device, and a computer readable storage medium.

Background

Address matching is required in the communication field. For example, address information such as mobile base station cell addresses, residential community addresses, and the addresses of schools, hospitals and other organizations is collected manually and may therefore be inaccurate, so each collected address needs to be matched against standard addresses to obtain the corresponding correct standard address. For example, when the cell to be matched is "basket-sky-garden", calculating similarity by minimum edit distance yields "basket-sky-garden", whereas the correct result should be "basket-sky-garden". Address matching that relies solely on minimum edit distance to calculate similarity therefore has low accuracy. The invention addresses at least the following technical problem: how to improve the accuracy of address matching.

Disclosure of Invention

The invention mainly aims to provide an address matching method, an address matching device and a computer readable storage medium, and aims to solve the technical problem of low accuracy of address matching.

In order to achieve the above object, the present invention provides an address matching method, including:

obtaining at least two target addresses in a standard address set that match an address to be matched, wherein the standard address set comprises addresses from at least two data sources, and each target address is obtained by matching with a different matching model;

determining the confidence of each target address, wherein the more data sources a target address matches, the higher its confidence;

and determining, from among all the target addresses and according to the confidence of each target address, the target address that matches the address to be matched.

Optionally, the step of obtaining at least two target addresses in the standard address set that match the address to be matched includes:

determining a first target address according to the address to be matched, the standard address set and a preset probability transition matrix model, wherein the preset probability transition matrix model is obtained by training a probability transition matrix training model on an address training set and the standard address set;

determining a second target address according to the address to be matched, the standard address set and a preset residual network fusion model, wherein the preset residual network fusion model comprises an embedding layer, a TextRCNN network, a TextCNN network, a residual layer and a preset activation function and is obtained by training a residual network fusion training model on the address training set and the standard address set, and the target addresses are the first target address and the second target address.

Optionally, the step of determining the first target address according to the address to be matched, the standard address set and the preset probability transition matrix model includes:

obtaining candidate feature words whose occurrence frequency in the standard address set is greater than a preset frequency;

constructing a feature word set from the candidate feature words;

extracting, according to the feature word set, a feature word sequence corresponding to the address to be matched, wherein the feature word sequence comprises the candidate feature words and the ordinary characters in the address to be matched;

combining the feature word elements in the feature word sequence according to a target combination length and a preset combination order to obtain a feature word substring set of the target combination length;

determining the joint probability corresponding to the feature word substring set according to a preset hidden Markov model and the feature word substring set of the target combination length, wherein the preset hidden Markov model is obtained by training a hidden Markov training model on the standard address set, the joint probability is obtained from the feature word transition probabilities in the hidden Markov model, and the preset probability transition matrix model is the preset hidden Markov model;

when the target combination length is smaller than a preset combination length, increasing the target combination length and returning to the step of combining the feature word elements in the feature word sequence according to the target combination length and the preset combination order to obtain a feature word substring set of the target combination length;

when the target combination length is greater than or equal to the preset combination length, obtaining the feature word substring set with the maximum joint probability;

determining an optimal solution according to the feature word substring set with the maximum joint probability;

and determining the optimal solution as the first target address.

Optionally, the step of determining, from among all the target addresses and according to the confidence of each target address, the target address that matches the address to be matched includes:

determining the matching degree between each target address and the address to be matched;

determining, for each target address, the product of its matching degree and its confidence;

and determining the target address with the largest product as the target address that matches the address to be matched.

Optionally, after the step of obtaining at least two target addresses in the standard address set that match the address to be matched, the address matching method further includes:

performing the step of determining the confidence of each target address when at least two of the target addresses differ;

and when all the target addresses are identical, determining that target address to be the target address that matches the address to be matched.

Optionally, the step of determining the confidence of each of the target addresses includes:

determining the number of the data sources matched with each target address;

and determining the confidence of the target address according to the quantity.

Optionally, before the step of obtaining at least two target addresses in the standard address set that match the address to be matched, the address matching method further includes:

acquiring an original address sent by a server;

and performing illegal character cleaning, redundant address cleaning, wrongly written character replacement and incomplete address completion on the original address to obtain the address to be matched.
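By way of illustration only, the four cleaning operations above might be sketched as follows. The allowed-character pattern, the `TYPO_FIXES` table, the `DEFAULT_PREFIX` value and the whitespace tokenization are all hypothetical choices made for this sketch; the source does not specify how each operation is carried out.

```python
import re

# Hypothetical lookup data; a real deployment would maintain its own tables.
TYPO_FIXES = {"Gardn": "Garden", "Sreet": "Street"}
DEFAULT_PREFIX = "Guizhou Province"  # assumed prefix for completing truncated addresses


def clean_address(raw: str) -> str:
    """Turn an original address into an address to be matched (illustrative only)."""
    # 1. Illegal character cleaning: keep word characters, whitespace and a few separators.
    text = re.sub(r"[^\w\s,\-#]", "", raw)
    # 2. Redundant address cleaning: drop tokens that merely repeat the previous token.
    tokens = text.split()
    deduped = [t for i, t in enumerate(tokens) if i == 0 or t != tokens[i - 1]]
    # 3. Wrongly written character replacement via the lookup table.
    fixed = [TYPO_FIXES.get(t, t) for t in deduped]
    text = " ".join(fixed)
    # 4. Incomplete address completion: prepend the assumed prefix when it is missing.
    if not text.startswith(DEFAULT_PREFIX):
        text = f"{DEFAULT_PREFIX} {text}"
    return text


cleaned = clean_address("Guiyang  Lantian Lantian Gardn!!")
```

The order of the four operations follows the order in which the text lists them; whether that order matters in practice is not stated in the source.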

In addition, in order to achieve the above object, the present invention further provides an address matching device, which includes an acquisition module and a determination module, wherein:

the acquisition module is used for obtaining at least two target addresses in a standard address set that match an address to be matched, wherein the standard address set comprises addresses from at least two data sources, and each target address is obtained by matching with a different matching model;

the determining module is used for determining the confidence of each target address, wherein the more data sources a target address matches, the higher its confidence, and for determining, from among all the target addresses and according to the confidence of each target address, the target address that matches the address to be matched.

In addition, in order to achieve the above object, the present invention also provides an address matching device, which includes a memory, a processor, and an address matching program stored in the memory and executable on the processor, wherein the address matching program, when executed by the processor, implements the steps of the address matching method described in any one of the above.

In addition, in order to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon an address matching program which, when executed by a processor, implements the steps of the address matching method of any one of the above.

According to the address matching method, the address matching device and the computer readable storage medium provided by the invention, at least two target addresses that match the address to be matched are obtained from the standard address set, the confidence of each target address is determined, and the target address that matches the address to be matched is determined from among all the target addresses according to those confidences. The standard address set comprises addresses from at least two data sources, each target address is obtained by matching with a different matching model, and the more data sources a target address matches, the higher its confidence. Because the final target address is selected on the basis of the confidences of target addresses obtained with different matching models, and a target address matched by more data sources has a higher confidence, the method avoids the low accuracy that arises when matching uses a single address data source and the matched address is obtained solely by calculating similarity with the minimum edit distance. The accuracy of address matching can therefore be effectively improved.

Drawings

FIG. 1 is a schematic diagram of a device architecture of a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a flowchart of a first embodiment of an address matching method according to the present invention;

FIG. 3 is a flowchart of a second embodiment of an address matching method according to the present invention;

FIG. 4 is a flowchart of a third embodiment of an address matching method according to the present invention;

FIG. 5 is a flowchart of a fourth embodiment of an address matching method according to the present invention;

FIG. 6 is a schematic diagram of the functional modules of the address matching device of the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Referring to fig. 1, fig. 1 is a schematic diagram of an apparatus structure of a hardware running environment according to an embodiment of the present invention.

The address matching device according to the embodiment of the invention may be a server, a terminal device or other computer devices.

As shown in fig. 1, the apparatus may include: a processor 1001 (such as a CPU), a memory 1002, and a communication bus 1003. The communication bus 1003 is used to enable connection and communication between these components. The memory 1002 may be a high-speed RAM memory or a stable memory (non-volatile memory), such as a disk memory. The memory 1002 may alternatively be a storage device separate from the processor 1001.

It will be appreciated by those skilled in the art that the device structure shown in fig. 1 does not limit the device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.

As shown in fig. 1, the memory 1002, as a type of computer storage medium, may include an operating system and an address matching program.

In the apparatus shown in fig. 1, the processor 1001 may be configured to call the address matching program stored in the memory 1002 and perform the steps of the address matching method described above.

Further, the processor 1001 may call the address matching program stored in the memory 1002 and further perform operations including:

constructing a feature word set according to the candidate feature words;

determining the optimal solution as the first target address;

determining the number of the data sources matched with each target address;

determining the confidence of the target address according to that number;

and acquiring an original address sent by a server.

Referring to fig. 2, a first embodiment of the present invention provides an address matching method, which includes:

Step S10, obtaining at least two target addresses in a standard address set that match an address to be matched, wherein the standard address set comprises addresses from at least two data sources, and each target address is obtained by matching with a different matching model;

In this embodiment, the execution body is an address matching device, which may be a server, a terminal device, or another computer device.

The standard address set is a set of preset standard addresses. In this embodiment, addresses are obtained from two or more data sources as standard addresses, where a data source is a source that provides standard addresses, such as a map service provider, for example Gaode Map, Baidu Map or Tencent Map. When the standard addresses are obtained from map service providers, they may be crawled through the application programming interface (API) of the corresponding map service. Each standard address may include a plurality of labels; for example, each standard address may correspond to five levels of labels: province, city, county, street or community, and cell. A standard address may also include labels of other levels, such as "road patch", "street number, village group" and "detailed address", and may include fewer or more label levels than in the above examples.

Taking as an example the crawling of the label information of all cells in Guizhou Province through the Baidu Map API, the URL of the Baidu Map API is http:// API. The access parameters include longitude and latitude, a developer access key, a search keyword, an output format and the like. The size of the longitude-latitude grid is adjusted until the five-level label information of all cells has been obtained, giving the standard addresses corresponding to the Baidu Map API. The five-level label information of all cells can be obtained from the other map APIs, such as the Gaode Map API, in a similar manner, giving the standard addresses corresponding to each of those APIs. The standard addresses corresponding to the different map APIs are combined into the standard address set. Standard addresses may also be obtained from other data sources that store address information, provided that the standard address set comprises addresses from at least two data sources.

The address to be matched is the address for which a match is sought. Depending on the matching requirement, it may be obtained in various ways; for example, it may be obtained from the "magic hundred box" life service platform so that the addresses of that platform can be matched to their corresponding target addresses. A target address is an address that matches the address to be matched; specifically, it is a standard address in the standard address set that matches the address to be matched. Matching the address to be matched yields at least two target addresses, each obtained with a different matching model. There are two or more matching models, and each can be applied to the address to be matched to obtain a corresponding target address. A matching model may be an address matching model implemented with any of various machine learning techniques, for example an address matching model built on a deep learning model, an address matching model based on point-of-interest knowledge graph pre-training, an address matching model based on a probability transition matrix, or an address matching model based on a residual network fusion model. Because different matching models use different address matching mechanisms, at least two target addresses can be obtained when matching against the standard addresses of the different data sources.
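As a minimal sketch of how the standard addresses of several data sources could be combined while remembering which sources contain each address, one might proceed as below. The function name, the source names `map_a` and `map_b`, and the sample addresses are hypothetical; the crawling itself is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_standard_address_set(sources: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each standard address to the set of data sources that contain it."""
    address_sources: Dict[str, Set[str]] = defaultdict(set)
    for source_name, addresses in sources.items():
        for address in addresses:
            address_sources[address].add(source_name)
    return dict(address_sources)


# Hypothetical crawl results from two map APIs.
standard_set = build_standard_address_set({
    "map_a": ["Guiyang Lantian Garden", "Guiyang Jinyang Community"],
    "map_b": ["Guiyang Lantian Garden", "Zunyi Hongqi Community"],
})
```

Keeping the set of sources per address, rather than a flat merged list, makes the later step of counting the data sources matched by a target address a simple lookup.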

Step S20, determining the confidence of each target address, wherein the more data sources a target address matches, the higher its confidence;

The confidence indicates how trustworthy a target address is. In this embodiment, the more data sources a target address matches, the higher its confidence, that is, the more trustworthy the target address. After at least two target addresses have been obtained with the different matching models, a given target address may match only one data source or may match two or more. A target address matches a data source when the standard addresses corresponding to that data source include the target address.

When determining the confidence of each target address, the standard addresses in the standard address set may first be classified by the data sources to which they belong. For example, with three data sources, a standard address in the intersection of all three data sources may be assigned to a first class, a standard address in the intersection of two data sources to a second class, and a standard address belonging to only one data source to a third class; with fewer or more data sources, standard addresses can be classified in a similar manner. Because each target address is a standard address that matches the address to be matched, the class of a target address follows from the class of that standard address. A confidence may be set in advance for each class, so that the confidence of each target address is determined by its class. Alternatively, the number of data sources matched by each target address may be determined and the confidence of the target address determined from that number.
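The count-based alternative can be sketched as follows. The source only requires that confidence increase with the number of matched data sources; the specific mapping used here (matched sources divided by total sources) is an assumption chosen for illustration, as are the function name and sample data.

```python
from typing import Dict, Set


def confidence_by_source_count(
    target: str, address_sources: Dict[str, Set[str]], total_sources: int
) -> float:
    """Assumed mapping: fraction of data sources whose standard addresses include the target."""
    if total_sources <= 0:
        raise ValueError("total_sources must be positive")
    matched = len(address_sources.get(target, set()))
    return matched / total_sources


# Hypothetical membership of two target addresses across three data sources.
membership = {
    "address_b": {"map_a", "map_b", "map_c"},
    "address_c": {"map_a"},
}
conf_b = confidence_by_source_count("address_b", membership, 3)
conf_c = confidence_by_source_count("address_c", membership, 3)
```

A class-based variant would instead look up a preset confidence for the class (first, second, third) into which the target address falls.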

The step of determining the confidence of each target address is executed when at least two of the target addresses differ. When all the target addresses are identical, that target address is determined to be the target address that matches the address to be matched, which can improve the accuracy of address matching.

Step S30, determining, from among all the target addresses and according to the confidence of each target address, the target address that matches the address to be matched.

After the confidence of each target address is obtained, the target address with the highest confidence may be taken directly as the target address that matches the address to be matched. Alternatively, the confidence may be combined with the matching degree between each target address and the address to be matched to determine the matching target address, thereby improving the accuracy of address matching.
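A minimal sketch of the simpler selection rule, including the shortcut taken when every matching model returns the same target address, is given below. The function name and the sample values are hypothetical, and ties are resolved in favor of the earlier target, which the source does not address.

```python
from typing import Dict, List


def select_by_confidence(targets: List[str], confidences: Dict[str, float]) -> str:
    """Pick the final target address from the per-model results."""
    if not targets:
        raise ValueError("at least one target address is required")
    # When all matching models agree, no confidence comparison is needed.
    if len(set(targets)) == 1:
        return targets[0]
    # Otherwise take the target with the highest confidence (first one wins on ties).
    return max(targets, key=lambda t: confidences[t])


agreed = select_by_confidence(["address_b", "address_b"], {})
chosen = select_by_confidence(
    ["address_b", "address_c"], {"address_b": 0.9, "address_c": 0.8}
)
```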

In this embodiment, at least two target addresses that match the address to be matched are obtained from the standard address set, the confidence of each target address is determined, and the target address that matches the address to be matched is determined from among all the target addresses according to those confidences. The standard address set comprises addresses from at least two data sources, each target address is obtained by matching with a different matching model, and the more data sources a target address matches, the higher its confidence. Because the final target address is selected on the basis of the confidences of target addresses obtained with different matching models, and a target address matched by more data sources has a higher confidence, the method avoids the low accuracy that arises when matching uses a single address data source and the matched address is obtained solely by calculating similarity with the minimum edit distance. The accuracy of address matching can therefore be effectively improved.

Referring to fig. 3, a second embodiment of the present invention provides an address matching method, based on the first embodiment shown in fig. 2, the step S10 includes:

step S11, determining a first target address according to the address to be matched, the standard address set and a preset probability transition matrix model, wherein the preset probability transition matrix model is obtained by training a probability transition matrix training model according to an address training set and the standard address set;

The first target address may be determined according to the address to be matched, the standard address set and the preset probability transition matrix model in the following manner:

Candidate feature words whose occurrence frequency in the standard address set is greater than a preset frequency are obtained, and a feature word set is constructed from the candidate feature words.

The preset frequency is a threshold on the frequency, or number of occurrences, of a keyword; keywords whose frequency exceeds the preset frequency are candidate feature words. For example, with a preset frequency of 10000, the feature word set formed from the selected candidate feature words is: Q = { provincial district street community village and town district };
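As a rough sketch of the frequency filter, keywords can be counted over the standard address set and kept when their count exceeds the preset frequency. Splitting addresses on whitespace is an assumption made for this English-language illustration; the source does not say how keywords are segmented, and the sample addresses and threshold are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def build_feature_word_set(standard_addresses: Iterable[str], preset_frequency: int) -> Set[str]:
    """Keep keywords whose occurrence count is strictly greater than the preset frequency."""
    # Assumed segmentation: whitespace-separated tokens.
    counts = Counter(token for address in standard_addresses for token in address.split())
    return {word for word, count in counts.items() if count > preset_frequency}


feature_words = build_feature_word_set(
    [
        "Guizhou Province Guiyang City",
        "Yunnan Province Kunming City",
        "Guizhou Province Zunyi City",
    ],
    preset_frequency=2,
)
```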

The probability transition matrix model is built on a hidden Markov model (HMM). Let $\{i_n, n \in T\}$ be a process whose parameter set $T$ is a discrete set of times, $T = \{1, 2, \ldots, n, \ldots\}$, and let the state space formed by the possible values of $i_n$ be the discrete feature word set $Q = \{q_1, q_2, \ldots, q_N\}$. Let its transition probability matrix be $A = [p_{ij}]_{N \times N}$. Since the HMM is a probabilistic model of a time sequence in which the state at time $t+1$ depends only on the state at time $t$:

$$p_{ij} = P(i_{t+1} = q_j \mid i_t = q_i), \quad i, j = 1, 2, \ldots, N;$$

The number of feature word transitions between adjacent levels from time $t$ to time $t+1$ is counted and denoted:

$$N(i_{t+1} = q_j \mid i_t = q_i), \quad i, j = 1, 2, \ldots, N;$$

Let the corresponding transfer weight be $a(i_{t+1} = q_j \mid i_t = q_i)$. To avoid the situation in which the calculation cannot be performed because a transition count is 0, the transfer weight is processed using a parameter $m$, where $i, j = 1, 2, \ldots, N$ and $\ldots < m < \log_2 N(i_{t+1} = q_j \mid i_t = q_i)_{\min}$. The value of $m$ is set by analyzing and comparing experimental results obtained with different values of $m$. The probability $p_{ij}$ corresponding to each feature word transfer weight is then calculated with a softmax function.
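The following sketch shows one way the counting, weighting and softmax steps could fit together. The weighting rule used here (the base-2 logarithm of a non-zero count, with the constant `m` substituted for a zero count) and the row-wise softmax are assumptions suggested by the bound on $m$; the exact formulas are not reproduced in the text. The function name, the default `m = -1.0` and the sample sequences are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def transition_probabilities(
    sequences: Sequence[Sequence[str]], states: List[str], m: float = -1.0
) -> Dict[Tuple[str, str], float]:
    """Estimate p_ij from adjacent feature word pairs (assumed weighting, see lead-in)."""
    # Count transitions q_i -> q_j between adjacent positions.
    counts: Counter = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[(prev, nxt)] += 1

    probs: Dict[Tuple[str, str], float] = {}
    for qi in states:
        # Assumed weight: log2(count) for seen transitions, constant m for unseen ones.
        weights = [
            math.log2(counts[(qi, qj)]) if counts[(qi, qj)] > 0 else m for qj in states
        ]
        # Row-wise softmax, shifted by the row maximum for numerical stability.
        peak = max(weights)
        exps = [math.exp(w - peak) for w in weights]
        total = sum(exps)
        for qj, e in zip(states, exps):
            probs[(qi, qj)] = e / total
    return probs


states = ["province", "city", "district", "street"]
training = [
    ["province", "city", "district"],
    ["province", "city", "street"],
    ["province", "city", "district"],
]
p = transition_probabilities(training, states)
```

With this choice, unseen transitions receive a small but non-zero probability, which is the practical effect the parameter $m$ is described as providing.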

The feature word sequence corresponding to the address to be matched is extracted according to the feature word set $Q$: the cell address to be matched is traversed and its feature words are extracted, giving the feature word sequence $O = (o_1, o_2, \ldots, o_i, \ldots, o_L)$, whose elements $o_i$ comprise the candidate feature words and the ordinary characters in the address to be matched. The feature word elements in the sequence are then combined according to a target combination length and a preset combination order to obtain the feature word substring set of the target combination length. The preset combination order may be the extraction order; the elements are combined in extraction order with combination length $l$ ($2 \le l \le L-1$), and feature word substrings of the same length are grouped to give the feature word substring set of combination length $l$: $\{O_l \mid O_l = (o_1, o_2, \ldots, o_l),\ l = 2, 3, \ldots, L-1\}$. The joint probability of each substring in the set $O_l$ is calculated from the feature word transition probabilities.

Among the feature word substrings of combination length $l$, the substring with the largest joint probability, denoted $MaxQ_l$, is then determined.

The joint probability corresponding to the feature word substring set is determined according to the preset hidden Markov model and the feature word substring set of target combination length $l$. The preset hidden Markov model is obtained by training a hidden Markov training model on the standard address set, and the joint probability is obtained from the feature word transition probabilities in the hidden Markov model. When the target combination length is smaller than the preset combination length, which is set to $L-1$, the target combination length is increased and the step of combining the feature word elements in the feature word sequence according to the target combination length and the preset combination order is executed again. This yields the maximum-joint-probability feature word substrings for all lengths $l$, namely $MAX = \{MaxQ_2, MaxQ_3, \ldots, MaxQ_{L-1}\}$. When the target combination length is greater than or equal to the preset combination length, the feature word substring set with the maximum joint probability is obtained, an optimal solution is determined from it, and the optimal solution is determined as the first target address. Specifically, the elements of the set $MAX$ are traversed starting from $i = 2$, comparing $MaxQ_i$ with $MaxQ_{i+1}$; the number of feature words that appear in the same order in the two substrings of adjacent lengths is recorded as $n_i$, where $i = 2, 3, \ldots, L-2$. The optimal solution $OS$ is then judged as follows: starting from $i = 2$, if $n_i \le 1$ then $OS = MaxQ_{i+1}$, and if $n_i > 1$ then $OS = MaxQ_i$, continuing until $i = L-2$.
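The search over combination lengths can be sketched as below, stopping at the set of maximum-joint-probability substrings; the final judgment of the optimal solution is not shown. Two points are assumptions made for this sketch: substrings are formed as order-preserving combinations of the sequence elements, and the joint probability of a substring is taken as the product of the transition probabilities between its adjacent elements. The function names and the sample probabilities are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

Transition = Dict[Tuple[str, str], float]


def joint_probability(substring: Sequence[str], probs: Transition) -> float:
    """Assumed joint probability: product of transition probabilities of adjacent elements."""
    result = 1.0
    for prev, nxt in zip(substring, substring[1:]):
        result *= probs.get((prev, nxt), 0.0)
    return result


def max_substrings(sequence: Sequence[str], probs: Transition) -> Dict[int, Tuple[str, ...]]:
    """For each length l = 2 .. L-1, return the substring with the largest joint probability."""
    best: Dict[int, Tuple[str, ...]] = {}
    for length in range(2, len(sequence)):
        # combinations() keeps the elements in their original (extraction) order.
        candidates = combinations(sequence, length)
        best[length] = max(candidates, key=lambda s: joint_probability(s, probs))
    return best


# Hypothetical transition probabilities and feature word sequence.
probs = {
    ("province", "city"): 0.9,
    ("city", "district"): 0.8,
    ("city", "x"): 0.1,
    ("x", "district"): 0.1,
    ("province", "district"): 0.2,
    ("province", "x"): 0.05,
}
best = max_substrings(["province", "city", "x", "district"], probs)
```

Here the ordinary character `x` is dropped from the length-3 result because every substring that contains it has a much lower joint probability.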

Step S12, determining a second target address according to the address to be matched, the standard address set and a preset residual network fusion model, wherein the preset residual network fusion model comprises an embedding layer, a TextRCNN network, a TextCNN network, a residual layer and a preset activation function and is obtained by training a residual network fusion training model on the address training set and the standard address set, and the target addresses are the first target address and the second target address.

In this embodiment, to improve the accuracy of address matching, the matching models used are the preset probability transition matrix model and the preset residual network fusion model. Each is applied to the address to be matched, giving the first target address and the second target address respectively, so there are two target addresses. The confidence of the first target address and the confidence of the second target address are determined, and the target address that matches the address to be matched is selected from the two according to those confidences. The target address with the higher confidence may be taken directly as the match. Alternatively, the first matching degree between the first target address and the address to be matched and the second matching degree between the second target address and the address to be matched may also be taken into account: a first product of the first matching degree and the confidence of the first target address and a second product of the second matching degree and the confidence of the second target address are determined and compared, and the target address corresponding to the larger product is taken as the target address that matches the address to be matched.

In this embodiment, the first target address is determined according to the address to be matched, the standard address set and the preset probability transition matrix model, which is obtained by training a probability transition matrix training model on the address training set and the standard address set. The second target address is determined according to the address to be matched, the standard address set and the preset residual network fusion model, which comprises an embedding layer, a TextRCNN network, a TextCNN network, a residual layer and a preset activation function and is obtained by training a residual network fusion training model on the address training set and the standard address set. The target addresses are the first target address and the second target address, and using both improves the accuracy of address matching.

Referring to fig. 4, a third embodiment of the present invention provides an address matching method, based on the first embodiment shown in fig. 2, the step S30 includes:

Step S31, determining the matching degree of each target address and the address to be matched;

To further improve the accuracy of address matching, in this embodiment the target address matched with the address to be matched is determined from both the matching degree between each target address and the address to be matched and the confidence of that target address. The matching degree may be calculated as follows: determine the number of characters that the address to be matched and the target address have in common and the number of characters in which they differ, compute the sum of these two counts, and take the ratio of the number of common characters to that sum as the matching degree.

Step S32, determining the product of the matching degree corresponding to each target address and the confidence degree;

Step S33, determining the target address matched with the address to be matched according to the target address corresponding to the maximum product.

The target address corresponding to the maximum product is taken as the target address matched with the address to be matched. For example, suppose there are two target addresses: the address to be matched is a, and the target addresses are b and c. The matching degree of target address b is P(b) and the matching degree of target address c is P(c); the confidence of target address b is 0.9 and the confidence of target address c is 0.8. The product of matching degree and confidence is then 0.9 × P(b) for target address b and 0.8 × P(c) for target address c. With P(b) = 0.9 and P(c) = 0.8, the products are 0.81 and 0.64 respectively, so target address b is taken as the target address matched with a.
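Steps S31 to S33 can be illustrated with a minimal Python sketch. It is not the implementation of this embodiment: the function names `matching_degree` and `pick_target` are hypothetical, and counting "same" and "different" characters as a multiset intersection and its complement in the multiset union is only one plausible reading of the calculation described above.

```python
from collections import Counter


def matching_degree(query, target):
    """Ratio of shared characters to shared plus differing characters.

    Assumption: characters are compared as multisets, ignoring position.
    """
    q, t = Counter(query), Counter(target)
    same = sum((q & t).values())               # characters present in both
    different = sum((q | t).values()) - same   # characters present in only one
    total = same + different
    return same / total if total else 0.0


def pick_target(query, confidences):
    """Return the candidate whose matching degree times confidence is largest.

    `confidences` maps each candidate target address to its confidence.
    """
    return max(
        confidences,
        key=lambda addr: matching_degree(query, addr) * confidences[addr],
    )
```

Under these assumptions, a candidate with a slightly lower matching degree can still be selected when its confidence is sufficiently higher, which is the effect that combining the two quantities is meant to achieve.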

In this embodiment, the matching degree between each target address and the address to be matched is determined, the product of the matching degree and the confidence is determined for each target address, and the target address matched with the address to be matched is determined according to the target address corresponding to the largest product. Combining the matching degree with the confidence in this way further improves the accuracy of address matching.

Referring to fig. 5, a fourth embodiment of the present invention provides an address matching method, based on the first embodiment shown in fig. 2, before step S10, the address matching method further includes:

Step S40, obtaining an original address sent by a server;

In this embodiment, the original address is preprocessed before matching, because the original address may contain wrong characters, an incorrect format or similar problems, and preprocessing it improves the accuracy of address matching. The server may be any server that provides the original address; the server that provides the original address and the server that performs address matching are different servers.

Step S50, performing illegal character cleaning, redundant address cleaning, wrongly written character replacement and incomplete address filling on the original address to obtain the address to be matched.

Illegal character cleaning means deleting characters that do not belong to address information, such as "(", ")", "]", "!" and "?". For example, the original address "Guizhou Province Bijie City Jinsha County Guchang Street @Lijing Mingcheng!" contains the illegal characters "@" and "!"; deleting them yields "Guizhou Province Bijie City Jinsha County Guchang Street Lijing Mingcheng";

Redundant address cleaning means deleting redundant address information, that is, unnecessary details such as an unneeded road number or house number. For example, for the original address "Guizhou Province Bijie City Jinsha County Guchang Street Lijing Mingcheng F1 Unit 5 Building 4", the redundant part "F1 Unit 5 Building 4" is deleted to obtain "Guizhou Province Bijie City Jinsha County Guchang Street Lijing Mingcheng";

Wrongly written character replacement means replacing wrongly written characters once they have been identified. The replacement relies on an error correction model, which may be obtained by a model training method based on a large-scale data set that automatically models samples containing wrongly written characters; the wrongly written characters are then corrected by the error correction model. For example, for an original address "Guizhou Province Bijie City Jinsha County Guchang Street Lijing Mingcheng" in which the character rendered as "Ming" is wrongly written, the error correction model replaces it with the correct character meaning "name", yielding the corrected "Guizhou Province Bijie City Jinsha County Guchang Street Lijing Mingcheng";

Incomplete address filling means filling in address elements that are missing from the address information. For example, the original address is "Jinsha County Lijing Mingcheng"; because the province, city and street to which Jinsha County belongs are fixed information, the address can be filled in to obtain "Guizhou Province Bijie City Jinsha County Guchang Street Lijing Mingcheng".

After the original address is preprocessed in this way, the resulting address to be matched is more accurate, which in turn improves the accuracy of address matching.
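The four preprocessing operations of step S50 can be sketched in Python as a simple pipeline. This is an illustration under stated assumptions only: the regular expressions, the lookup tables `TYPO_FIXES` and `PREFIX_FILL`, the function names and the romanized sample strings are all hypothetical. In particular, a dictionary lookup stands in for the trained error correction model, and a prefix table stands in for whatever administrative-division data a real system would use.

```python
import re

# Hypothetical rules and tables, for illustration only.
ILLEGAL_CHARS = re.compile(r"[@!?()\[\]]")
REDUNDANT_SUFFIX = re.compile(r"\s*(Building|Unit|Room)\s+\w+.*$")
TYPO_FIXES = {"Mingchen": "Mingcheng"}  # stand-in for an error correction model
PREFIX_FILL = {
    "Jinsha County": "Guizhou Province Bijie City Jinsha County Guchang Street",
}


def clean_illegal(addr):
    """Delete characters that do not belong to address information."""
    return re.sub(r"\s+", " ", ILLEGAL_CHARS.sub("", addr)).strip()


def drop_redundant(addr):
    """Delete trailing building/unit/room details."""
    return REDUNDANT_SUFFIX.sub("", addr)


def fix_typos(addr):
    """Replace known wrongly written words, matching whole words only."""
    for wrong, right in TYPO_FIXES.items():
        addr = re.sub(rf"\b{re.escape(wrong)}\b", right, addr)
    return addr


def fill_missing(addr):
    """Prepend the fixed province, city and street for a known county."""
    for short, full in PREFIX_FILL.items():
        if addr.startswith(short):
            return full + addr[len(short):]
    return addr


def preprocess(raw):
    """Apply the four cleaning operations in the order given in step S50."""
    addr = clean_illegal(raw)
    addr = drop_redundant(addr)
    addr = fix_typos(addr)
    return fill_missing(addr)
```

The operations are applied in the order listed in step S50, so that illegal characters and redundant details are removed before typo correction and address filling are attempted.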

In this embodiment, the original address sent by the server is obtained, and illegal character cleaning, redundant address cleaning, wrongly written character replacement and incomplete address filling are performed on the original address to obtain the address to be matched. The address to be matched is therefore more accurate, which further improves the accuracy of address matching.

Referring to fig. 6, fig. 6 is a schematic diagram of an embodiment of the address matching device of the present invention. The address matching device comprises an acquiring module 10 and a determining module 20, where:

the acquiring module 10 is configured to acquire at least two target addresses matched with an address to be matched in a standard address set, where the standard address set includes addresses of at least two data sources, and each target address is obtained by matching according to different matching models;

the determining module 20 is configured to determine a confidence level of each target address, where the greater the number of data sources matched by the target address, the higher the corresponding confidence level, and determine, according to the confidence level of each target address, the target address matched by the address to be matched from all the target addresses.
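The relationship between confidence and the number of matching data sources can be sketched in Python as follows. The description only requires that the confidence increase with the number of data sources matched by the target address; using the fraction of data sources that contain the target address is an assumption made here for illustration, and the function name `confidence_by_sources` is hypothetical.

```python
def confidence_by_sources(target, sources):
    """Return a confidence that grows with the number of matching data sources.

    `sources` maps a data source name to the set of standard addresses it
    provides. Assumption: confidence = matching sources / total sources.
    """
    if not sources:
        return 0.0
    hits = sum(1 for addresses in sources.values() if target in addresses)
    return hits / len(sources)
```

Any other monotonically increasing function of the count would satisfy the same requirement; the fraction is chosen only because it keeps the confidence between 0 and 1.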

In an embodiment, the acquiring module 10 is further configured to perform the following steps:

constructing a feature word set according to the candidate feature words;

and determining the optimal solution as the first target address.

In an embodiment, the determining module 20 is further configured to perform the following steps:

determining the number of the data sources matched with each target address;

and determining the confidence of the target address according to the quantity.

Acquiring an original address sent by a server;

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, and may of course also be implemented by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium as described above (e.g. ROM/RAM, magnetic disk, optical disk), comprising instructions for causing an address matching device (which may be a server or other computer device) to perform the methods according to the embodiments of the present invention.

The foregoing description covers only the preferred embodiments of the present invention and is not intended to limit the scope of the invention. Any equivalent structure or equivalent process derived from the content disclosed herein, whether applied directly or indirectly in other related technical fields, is likewise covered by the scope of the present invention.

Claims

1. An address matching method, characterized in that the address matching method comprises:

obtaining at least two target addresses matched with addresses to be matched in a standard address set, wherein the standard address set comprises addresses of at least two data sources, each target address is obtained by matching according to different matching models, the standard address set is a set formed by preset standard addresses, the data sources provide standard addresses, and each standard address comprises a plurality of labels;

determining the target addresses matched with the addresses to be matched in all the target addresses according to the confidence degrees of the target addresses;

the step of obtaining at least two target addresses matched with the addresses to be matched in the standard address set comprises the following steps:

determining a first target address according to the address to be matched, the standard address set and a preset probability transition matrix model, wherein the preset probability transition matrix model is obtained by training a probability transition matrix model according to an address training set and the standard address set, the preset probability transition matrix model is a preset hidden Markov model, and the preset hidden Markov model is obtained by training a hidden Markov model according to the standard address set;

determining a second target address according to the address to be matched, the standard address set and a preset residual network fusion model, wherein the preset residual network fusion model comprises an embedded layer, a TextRCNN network, a TextCNN network, a residual layer and a preset activation function, the preset residual network fusion model is obtained by training the residual network fusion model according to the address training set and the standard address set, and the target addresses are the first target address and the second target address respectively.

2. The address matching method as claimed in claim 1, wherein the step of determining the first target address according to the address to be matched, the standard address set, and a preset probability transition matrix model comprises:

constructing a feature word set according to the candidate feature words;

determining the joint probability corresponding to the characteristic word sub-string set according to a preset hidden Markov model and the characteristic word sub-string set of the target combination length, wherein the joint probability corresponding to the characteristic word sub-string set is obtained according to the characteristic word transition probability in the hidden Markov model;

When the target combination length is greater than or equal to the preset combination length, acquiring the characteristic word sub-string set with the maximum joint probability;

determining an optimal solution according to the characteristic word sub-string set with the maximum joint probability;

and determining the optimal solution as the first target address.

3. The address matching method as claimed in claim 1, wherein the step of determining the target address matching the address to be matched among all the target addresses according to the confidence of each of the target addresses comprises:

4. The address matching method as claimed in claim 1, wherein after the step of obtaining at least two target addresses in the standard address set that match the addresses to be matched, the address matching method further comprises:

5. The address matching method of claim 1, wherein said step of determining a confidence level for each of said target addresses comprises:

determining the number of the data sources matched with each target address;

and determining the confidence of the target address according to the quantity.

6. The address matching method as claimed in claim 1, wherein before the step of obtaining at least two target addresses in the standard address set that match the addresses to be matched, the address matching method further comprises:

acquiring an original address sent by a server;

7. An address matching device, characterized in that the address matching device comprises an acquisition module and a determination module, wherein:

the acquisition module is used for acquiring at least two target addresses matched with addresses to be matched in a standard address set, the standard address set comprises addresses of at least two data sources, each target address is obtained by matching according to different matching models, the standard address set is a set formed by preset standard addresses, the data sources provide standard addresses, and each standard address comprises a plurality of labels;

The determining module is used for determining the confidence coefficient of each target address, the more the number of data sources matched with the target addresses is, the higher the corresponding confidence coefficient is, and the target addresses matched with the addresses to be matched are determined in all the target addresses according to the confidence coefficient of each target address;

8. An address matching device comprising a memory, a processor and an address matching program stored on the memory and executable on the processor, the address matching program when executed by the processor implementing the steps of the address matching method according to any one of claims 1 to 6.

9. A computer-readable storage medium, on which an address matching program is stored, which when executed by a processor implements the steps of the address matching method according to any one of claims 1 to 6.