CN110895651A - Address standardization processing method, device, equipment and computer readable storage medium - Google Patents

Info

Publication number
CN110895651A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810965153.8A
Other languages
Chinese (zh)
Other versions
CN110895651B (en)
Inventor
王翔
张雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Financial Technology Holding Co Ltd
Original Assignee
Beijing Jingdong Financial Technology Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Financial Technology Holding Co Ltd
Priority to CN201810965153.8A
Publication of CN110895651A
Application granted
Publication of CN110895651B
Legal status: Active
Anticipated expiration

Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides an address standardization processing method, an address standardization processing device, address standardization processing equipment and a computer readable storage medium, wherein the method comprises the following steps: receiving an address text to be processed; marking the level of each sub-address in the address text to be processed through a preset neural network model to obtain the marked address text to be processed; and processing the sub-addresses according to a preset standard address library aiming at each sub-address in the marked address text to be processed to obtain a standard address corresponding to the address text to be processed. Therefore, the standard address information corresponding to the address text to be processed can be quickly and accurately determined, the accuracy of address standardization can be improved, and in addition, the manual maintenance cost of the address text can be reduced.

Description

Address standardization processing method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of data processing, and in particular, to an address standardization processing method, apparatus, device, and computer-readable storage medium.
Background
The same address can be described in many different ways, and address information often suffers from inaccuracy, description errors, homophone substitutions, abbreviations and the like, which makes downstream services that rely on address information very difficult. For example, online shopping has gradually become a mainstream way to shop: a user selects a commodity online, pays, and the commodity is delivered by express to the user's mailing address. The correctness of the user's delivery address is therefore essential to ensure that the purchased commodity arrives accurately and quickly. If the address is written incorrectly or abbreviated, the commodity cannot be delivered to the user's mailing address, which gives the user a poor shopping experience and causes the merchant to lose customers to some degree. How to standardize user address information is therefore a technical problem that urgently needs to be solved.
In the prior art, address information of a user is generally checked manually: erroneous address information is corrected and missing information is supplemented.
However, because the amount of address information to be processed is large and manual processing is slow, standardizing address information in this way suffers from low processing efficiency, wasted human resources and high maintenance cost.
Disclosure of Invention
The invention provides an address standardization processing method, device and equipment and a computer readable storage medium, which are used for solving the technical problems of low address standardization efficiency and high manual maintenance cost caused by manually correcting and supplementing address text information in the prior art.
The first aspect of the present invention provides an address standardization processing method, including:
receiving an address text to be processed;
marking the level of each sub-address in the address text to be processed through a preset neural network model to obtain the marked address text to be processed;
and processing the sub-addresses according to a preset standard address library aiming at each sub-address in the marked address text to be processed to obtain a standard address corresponding to the address text to be processed.
Another aspect of the present invention provides an address normalization processing apparatus, including:
the first receiving module is used for receiving the address text to be processed;
the first labeling module is used for labeling the level of each sub-address in the address text to be processed through a preset neural network model to obtain a labeled address text to be processed;
and the processing module is used for processing the subaddresses in the labeled address text to be processed according to a preset standard address library to obtain a standard address corresponding to the address text to be processed.
Still another aspect of the present invention provides address standardization processing equipment, including: a memory and a processor;
the memory is used for storing instructions executable by the processor;
wherein the processor is configured to execute the address standardization processing method described above.
Yet another aspect of the present invention is to provide a computer-readable storage medium having stored therein computer-executable instructions for implementing the address normalization processing method as described above when the computer-executable instructions are executed by a processor.
The address standardization processing method, the device, the equipment and the computer readable storage medium provided by the invention receive the address text to be processed; marking the level of each sub-address in the address text to be processed through a preset neural network model to obtain the marked address text to be processed; and processing the sub-addresses according to a preset standard address library aiming at each sub-address in the marked address text to be processed to obtain a standard address corresponding to the address text to be processed. Therefore, the standard address information corresponding to the address text to be processed can be quickly and accurately determined, the accuracy of address standardization can be improved, and in addition, the manual maintenance cost of the address text can be reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art according to the drawings.
Fig. 1 is a schematic flowchart of an address standardization processing method according to a first embodiment of the present invention;
Fig. 2 is a schematic flowchart of an address standardization processing method according to a second embodiment of the present invention;
Fig. 3 is a schematic flowchart of an address standardization processing method according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an address standardization processing apparatus according to a fourth embodiment of the present invention;
Fig. 5 is a schematic structural diagram of address standardization processing equipment according to a fifth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other examples obtained based on the examples in the present invention are within the scope of the present invention.
Fig. 1 is a schematic flow chart of an address normalization processing method according to an embodiment of the present invention, as shown in fig. 1, the method includes:
step 101, receiving an address text to be processed.
In this embodiment, the same address can be described in many different ways, and address information often suffers from inaccuracy, description errors, homophone substitutions, abbreviations and the like, which makes downstream services that rely on address information very difficult. For example, if the downstream service is a logistics service and the address information input by the user is inaccurate, the goods may not be delivered to the user normally. If the downstream service is a navigation service and the address information input by the user is inaccurate, it may be impossible to plan a route for the user, or a wrong route may be planned. Therefore, in order to improve service quality and user experience, after the address text input by the user is received, it needs to be standardized, that is, the non-standard address text input by the user is converted into standard and correct address information.
Step 102, marking the level of each sub-address in the address text to be processed through a preset neural network model, and obtaining the marked address text to be processed.
In this embodiment, any address text to be processed includes a plurality of sub-addresses. For example, in the address Building A, Kechuang 11th Street, Yizhuang, Daxing District, Beijing, the parts Beijing, Daxing District, Yizhuang, Kechuang 11th Street, and Building A are different sub-addresses. Different sub-addresses have different levels: for example, Beijing is at the city level and Daxing District is at the district level. Therefore, to make it convenient to standardize the address text to be processed, after the address text input by the user is received, the level of each sub-address in it can be labeled through the preset neural network model, which yields an address text to be processed whose sub-addresses are labeled with their levels.
Step 103, for each sub-address in the labeled address text to be processed, processing the sub-address according to a preset standard address library to obtain a standard address corresponding to the address text to be processed.
In this embodiment, after the level of each sub-address in the address text to be processed is labeled, the labeled address text can be processed against a preset standard address library to standardize it. Specifically, the address text input by the user is compared with the standard address library, so that erroneous information in the address text is corrected and missing information is supplemented, and the standard address corresponding to the address text to be processed is obtained. The standard address library contains the standard names of all current address information, stores the standard names by level, and stores the association relationships among the addresses of the different levels.
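As a minimal sketch of what such a library could look like, the following Python class stores standard names by level together with a parent link for each name. The class name, method names, level strings and sample entries are illustrative assumptions and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class StandardAddressLibrary:
    """Hypothetical in-memory standard address library (illustrative only)."""
    # level -> set of standard names stored at that level
    names_by_level: dict = field(default_factory=dict)
    # (level, name) -> (parent_level, parent_name): association between levels
    parent_of: dict = field(default_factory=dict)

    def add(self, level, name, parent=None):
        self.names_by_level.setdefault(level, set()).add(name)
        if parent is not None:
            self.parent_of[(level, name)] = parent

    def contains(self, level, name):
        return name in self.names_by_level.get(level, set())

    def candidates(self, level):
        # All correct addresses that share the given level.
        return sorted(self.names_by_level.get(level, set()))

    def parent(self, level, name):
        return self.parent_of.get((level, name))


# Assumed sample entries, for illustration only.
library = StandardAddressLibrary()
library.add("city", "Beijing")
library.add("district", "Daxing District", parent=("city", "Beijing"))
library.add("district", "Chaoyang District", parent=("city", "Beijing"))
```

Storing the parent link alongside each name is one way to make it possible to supplement a missing higher-level sub-address from a lower-level one; the patent does not prescribe a storage layout.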
In the address standardization processing method provided by the embodiment, the address text to be processed is received; marking the level of each sub-address in the address text to be processed through a preset neural network model to obtain the marked address text to be processed; and processing the sub-addresses according to a preset standard address library aiming at each sub-address in the marked address text to be processed to obtain a standard address corresponding to the address text to be processed. Therefore, the standard address information corresponding to the address text to be processed can be quickly and accurately determined, the accuracy of address standardization can be improved, and in addition, the manual maintenance cost of the address text can be reduced.
Further, on the basis of the above embodiment, the method further includes:
receiving an address text to be processed;
training a preset model to be trained through the text to be trained after each sub-address is labeled to obtain the preset neural network model;
marking the level of each sub-address in the address text to be processed through a preset neural network model to obtain the marked address text to be processed;
and processing the sub-addresses according to a preset standard address library aiming at each sub-address in the marked address text to be processed to obtain a standard address corresponding to the address text to be processed.
In this embodiment, after the address text to be processed sent by the user is received, the level of each sub-address in the address text needs to be identified through a preset neural network model. Therefore, before the levels are identified, the preset neural network model needs to be established. Specifically, texts to be trained in which each sub-address has been labeled are obtained and randomly divided into a training set and a test set, and the parameters of the model to be trained are adjusted continuously until the recognition results it outputs are sufficiently accurate, which yields the preset neural network model. The address text to be processed can then be labeled according to the neural network model, and the labeled address text is processed against the preset standard address library to achieve standardization of the address text to be processed.
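The random division into a training set and a test set can be sketched as follows. The function name, the 20% test ratio and the fixed seed are assumptions chosen for illustration; the patent does not specify a ratio.

```python
import random


def split_train_test(labeled_texts, test_ratio=0.2, seed=0):
    """Randomly split labeled texts into (train, test). Ratio is an assumption."""
    items = list(labeled_texts)
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    rng.shuffle(items)
    n_test = int(len(items) * test_ratio)
    return items[n_test:], items[:n_test]


# Placeholder samples standing in for labeled address texts.
samples = [f"address-{i}" for i in range(10)]
train_set, test_set = split_train_test(samples)
```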
In the address standardization processing method provided by this embodiment, the preset to-be-trained model is trained through the to-be-trained text labeled on each sub-address, so as to obtain the preset neural network model, so that the neural network model can be used to label the to-be-processed address text, and a basis is provided for subsequent address standardization.
Further, on the basis of any of the above embodiments, the method further includes:
receiving an address text to be processed;
receiving a text to be trained;
removing useless punctuation marks in the text to be trained;
segmenting words of the text to be trained without useless punctuations to obtain sub-addresses corresponding to the text to be trained;
marking the level of each sub-address in the text to be trained;
training a preset model to be trained through the text to be trained after each sub-address is labeled to obtain the preset neural network model;
marking the level of each sub-address in the address text to be processed through a preset neural network model to obtain the marked address text to be processed;
and processing the sub-addresses according to a preset standard address library aiming at each sub-address in the marked address text to be processed to obtain a standard address corresponding to the address text to be processed.
In this embodiment, before the preset model to be trained is trained through the text to be trained in which each sub-address has been labeled, the text to be trained needs to be preprocessed. Specifically, the text to be trained input by the user may contain useless punctuation marks, such as "/", so in order to improve the efficiency of subsequent standardization, the useless punctuation marks in the text to be trained are removed first. Further, during training the model could process each character individually, but since combinations of characters in the text to be trained carry specific meanings, the text to be trained is also segmented into words to improve the accuracy of model recognition, yielding a plurality of sub-addresses corresponding to the text to be trained. For example, the address Building A, Kechuang 11th Street, Yizhuang, Daxing District, Beijing may be segmented into Beijing, Daxing District, Yizhuang, Kechuang, Eleventh, Street, and Building A, giving seven sub-addresses corresponding to the text to be trained. After the text to be trained is segmented into a plurality of sub-addresses, the level of each sub-address can be labeled. The preset model to be trained is then trained through the labeled text to be trained, and the resulting model is used to label the address text to be processed, which is processed against the preset standard address library to achieve standardization.
It should be noted that there are various methods for removing useless punctuations in the text to be trained, and any method may be adopted to remove the useless punctuations, which is not limited herein.
According to the address standardization processing method provided by the embodiment, useless characters in the text to be trained are removed in advance, and the text to be trained is segmented, so that the efficiency of subsequent model training can be improved, and a basis is provided for address standardization.
Further, among the various methods for removing useless punctuation marks from the text to be trained, the useless punctuation marks can specifically be removed by regular-expression matching.
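A minimal sketch of regular-expression removal is shown below. Only "/" is named in the description as a useless mark; the rest of the character set, and the function name, are assumptions for illustration.

```python
import re

# Assumed set of "useless" punctuation; only "/" is named in the description.
USELESS_CHARS = "/\\|,;!?()[]{}<>*#~，。；！？、（）【】"
USELESS_PUNCT = re.compile("[" + re.escape(USELESS_CHARS) + "]+")


def strip_useless_punctuation(text):
    """Remove every run of characters from the assumed punctuation set."""
    return USELESS_PUNCT.sub("", text)


cleaned = strip_useless_punctuation("北京市/大兴区，，亦庄")
# cleaned == "北京市大兴区亦庄"
```

Hyphens are deliberately left out of the set in this sketch, since they can be meaningful in building and room numbers.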
Fig. 2 is a schematic flow chart of an address normalization processing method according to a second embodiment of the present invention, where on the basis of any of the foregoing embodiments, as shown in fig. 2, the method further includes:
step 201, receiving an address text to be processed;
step 202, receiving a text to be trained;
step 203, removing useless punctuation marks in the text to be trained;
step 204, segmenting the text to be trained without useless punctuations to obtain sub-addresses corresponding to the text to be trained;
step 205, encoding each sub-address according to a preset encoding mode;
step 206, converting each sub-address and the code corresponding to each sub-address into a text vector and a coding vector through a preset vector conversion model, and storing the text vector and the coding vector in association;
step 207, for the text vector and the coding vector corresponding to each sub-address, establishing, through a preset association relationship establishing model, the association relationship between the text vectors and coding vectors of adjacent sub-addresses;
step 208, labeling the level of each sub-address for which the association relationship has been established, according to the preset sub-address levels;
step 209, training a preset model to be trained through the text to be trained after each sub-address is labeled, and obtaining the preset neural network model;
step 210, labeling the level of each sub-address in the address text to be processed through a preset neural network model, and obtaining a labeled address text to be processed;
step 211, for each sub-address in the labeled address text to be processed, processing the sub-address according to a preset standard address library to obtain a standard address corresponding to the address text to be processed.
In this embodiment, after the useless characters in the text to be trained are removed and the text to be trained is segmented, each segmented sub-address may be encoded in a preset encoding manner in order to further strengthen the integrity of the sub-address. Specifically, the first character of the sub-address may be coded as 1, the last character as 3, and every character in between as 2. For example, the code corresponding to Beijing (北京) is 13, the code corresponding to Daxing District (大兴区) is 123, and the code corresponding to XX Building (XX大厦) is 1223. It should be noted that there may be multiple encoding manners, and any manner capable of strengthening the integrity of the sub-address may be selected to encode the sub-address, which is not limited herein.
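The 1/2/3 position encoding described above can be sketched in a few lines of Python. The function name is hypothetical, and the handling of a single-character sub-address is an assumption, since the description does not cover that case.

```python
def encode_sub_address(sub_address):
    """Encode each character position: first -> 1, middle -> 2, last -> 3."""
    n = len(sub_address)
    if n == 0:
        return ""
    if n == 1:
        # Assumption: the description does not say how a single character is coded.
        return "1"
    return "1" + "2" * (n - 2) + "3"


codes = [encode_sub_address(s) for s in ("北京", "大兴区", "XX大厦")]
# codes == ["13", "123", "1223"]
```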
Furthermore, because the encoded text to be trained needs to be input into the model to be trained, the text to be trained and its corresponding coding information must be converted into a form that the model can process. The text to be trained and its corresponding coding information can therefore be converted into a text vector and a coding vector through a preset vector conversion model. Because the text vectors and coding vectors of the sub-addresses are in one-to-one correspondence, they are stored in association to represent that correspondence, and the text vectors and coding vectors stored in association are denoted $(v_{11}, v_{12}, \dots, v_{1n})$. It should be noted that there may be a plurality of vector conversion manners, and any manner capable of implementing vector conversion may be selected to convert the text to be trained and its corresponding coding information, which is not limited herein.
Further, since one text to be trained includes at least one sub-address and the sub-addresses are associated with one another, the association structure of the text to be trained can be strengthened as follows. For the text vector and coding vector corresponding to each sub-address, the vectors stored in association, $(v_{11}, v_{12}, \dots, v_{1n})$, are input into a preset association relationship establishing model, which establishes the association between the text vectors and coding vectors of the current sub-address and of its adjacent sub-addresses. The vectors for which the association has been established are denoted $(v_{21}, v_{22}, \dots, v_{2n})$. Subsequently, for each sub-address, the preceding and following sub-addresses can be determined from that sub-address. Taking again the address Building A, Kechuang 11th Street, Yizhuang, Daxing District, Beijing as an example, for the sub-address Daxing District it can be determined from the association that the preceding sub-address is Beijing and the following sub-address is Yizhuang. It should be noted that any association relationship establishing model may be adopted to strengthen the association between sub-addresses, which is not limited herein. For example, a Bi-LSTM model can be used.
Further, after the association between the text vectors and coding vectors of adjacent sub-addresses is established through the preset association relationship establishing model, the level of each sub-address can be labeled according to the preset sub-address levels. Specifically, the vectors for which the association has been established, $(v_{21}, v_{22}, \dots, v_{2n})$, are input into a preset labeling model, which labels the level of each sub-address according to the preset sub-address levels and outputs the labeling result. In particular, the labeling model may be a CRF model. The preset sub-address levels are shown in Table 1:
Level    Meaning
P1       Province
P2       City
P3       District
P4       Street / community / village / town
P5       Road / village
P5_ID    Road number
P6       Residential community
P6_ID    Building number / door plate
P7       Landmark
Table 1
It should be noted that, in order to further increase the relevance of the characters within a sub-address, the sub-address levels may be labeled in the BIOES manner, where B represents begin, I represents inside, O represents outside, E represents end, and S represents single. Because every character in every sub-address is labeled, the relevance of the characters within a sub-address is increased in addition to determining the level of the sub-address. Taking again the address Building A, Kechuang 11th Street, Yizhuang, Daxing District, Beijing as an example: the level corresponding to Beijing (北京) is P1, so the character 北 is labeled B-P1, indicating that its level is P1 and that it is the first character of the sub-address, and the character 京 is labeled E-P1, indicating that its level is P1 and that it is the last character of the sub-address. Likewise, the level corresponding to Daxing District (大兴区) is P3: 大 is labeled B-P3 (level P3, first character), 兴 is labeled I-P3 (level P3, middle character), and 区 is labeled E-P3 (level P3, last character). Every sub-address is labeled in this manner.
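The character-level BIOES labeling can be illustrated with a short sketch. The function names and the input format (a list of sub-address and level pairs, with `None` marking text outside any level) are assumptions made for illustration; in the patent the labels used for training come from annotated texts, not from this helper.

```python
def bioes_tags(sub_address, level):
    """Return one BIOES tag per character of a sub-address at the given level."""
    n = len(sub_address)
    if n == 0:
        return []
    if n == 1:
        return [f"S-{level}"]  # single-character sub-address
    return [f"B-{level}"] + [f"I-{level}"] * (n - 2) + [f"E-{level}"]


def label_address(segments):
    """segments: list of (sub_address, level); level None means outside ("O")."""
    labeled = []
    for text, level in segments:
        tags = ["O"] * len(text) if level is None else bioes_tags(text, level)
        labeled.extend(zip(text, tags))
    return labeled


labeled = label_address([("北京", "P1"), ("大兴区", "P3")])
# labeled == [("北", "B-P1"), ("京", "E-P1"),
#             ("大", "B-P3"), ("兴", "I-P3"), ("区", "E-P3")]
```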
The text to be trained labeled in this manner is used to train the model to be trained, and the neural network model is obtained. The input address text to be processed can then be labeled accurately according to the neural network model, and the labeled address text is processed against the preset standard address library to achieve standardization of the address text to be processed.
In the address standardization processing method provided by this embodiment, each sub-address and the code corresponding to each sub-address are converted into a text vector and a coding vector through a preset vector conversion model, and the text vector and the coding vector are stored in association; for the text vector and the coding vector corresponding to each sub-address, the association relationship between the text vectors and coding vectors of adjacent sub-addresses is established through a preset association relationship establishing model; and the level of each sub-address for which the association relationship has been established is labeled according to the preset sub-address levels, which yields the labeled text to be trained. The model to be trained can then be trained with this text, which provides a basis for subsequently labeling the address text to be processed and improves the labeling accuracy of the neural network model.
Fig. 3 is a schematic flow chart of an address normalization processing method according to a third embodiment of the present invention, where on the basis of any of the foregoing embodiments, as shown in fig. 3, the method includes:
step 301, receiving an address text to be processed;
step 302, marking the level of each sub-address in the address text to be processed through a preset neural network model to obtain a marked address text to be processed;
step 303, sequentially determining a first subaddress with the lowest level in the address text to be processed according to a preset level sequence;
step 304, judging whether the first sub-address is in a preset standard address library or not;
step 305, if not, calculating the similarity between the first sub-address and at least one correct address in the standard address base, wherein the at least one correct address is consistent with the first sub-address in level;
step 306, for each correct address, judging whether the similarity between the correct address and the first sub-address is greater than a preset first threshold value;
step 307, if yes, determining a correct address with the highest similarity, and taking the correct address with the highest similarity as a standard address corresponding to the first sub-address;
step 308, judging whether a second sub-address whose level is one level higher than that of the first sub-address exists in the address text to be processed;
step 309, if so, taking the second sub-address as the first sub-address, and returning to the step of judging whether the first sub-address is in the preset standard address library, until no second sub-address with a level higher than that of the first sub-address exists in the address text to be processed.
In this embodiment, after the address text to be processed sent by the user is received and labeled through the preset neural network model, the labeled address text needs to be processed according to the preset standard address library. Specifically, the first sub-address with the lowest level in the address text to be processed is determined first, where the levels increase step by step from P1 to P7. The first sub-address is compared with the preset standard address library to judge whether it exists in the library. If it does, the first sub-address contains no error. Otherwise, the first sub-address is written incorrectly; in that case, the similarity between the first sub-address and each of the correct addresses in the standard address library that have the same level as the first sub-address is calculated, and it is judged whether each similarity exceeds a preset first threshold. A correct address whose similarity exceeds the threshold may be the standard address corresponding to the first sub-address. Therefore, to improve the accuracy of address standardization, the correct address that has the highest similarity among those exceeding the preset first threshold is taken as the standard address corresponding to the first sub-address.
After the standard address corresponding to the first sub-address is determined, it is judged whether the current address text to be processed includes a second sub-address whose level is one level higher than that of the first sub-address. If so, the second sub-address is taken as the current first sub-address and the above steps are repeated, until no second sub-address with a level higher than the first sub-address exists in the address text to be processed. At that point all sub-addresses in the current address text have been standardized, and the standard address corresponding to the address text to be processed is obtained.
In the address standardization processing method provided in this embodiment, a first sub-address with the lowest level in a text to be processed is determined, the first sub-address is compared with a preset standard address library, whether the first sub-address exists in the standard address library or not is determined, if the first sub-address does not exist in the standard address library, a standard address corresponding to the first sub-address is determined according to a similarity between the first sub-address and a correct address in the standard address library, and the above steps are repeatedly performed for the sub-address of each level, so that the standard address corresponding to the text to be processed can be obtained. The accuracy and efficiency of address standardization are improved.
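The level-by-level loop described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the library contents, the threshold value 0.8, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher

LEVEL_ORDER = ["P1", "P2", "P3", "P4", "P5", "P6", "P7"]

# Hypothetical standard address library: level -> set of correct sub-addresses.
STANDARD_LIBRARY = {
    "P1": {"Beijing"},
    "P2": {"Haidian District", "Chaoyang District"},
}
FIRST_THRESHOLD = 0.8  # assumed value for the "preset first threshold"


def similarity(a, b):
    """Stand-in similarity measure (assumption): difflib's ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def standardize(labeled, library=STANDARD_LIBRARY, threshold=FIRST_THRESHOLD):
    """Walk upward from the lowest labeled level, one level at a time."""
    present = [lv for lv in LEVEL_ORDER if lv in labeled]
    if not present:
        return {}
    idx = LEVEL_ORDER.index(present[0])  # the first sub-address
    result = {}
    # Stop as soon as no sub-address exists one level above the current one.
    while idx < len(LEVEL_ORDER) and LEVEL_ORDER[idx] in labeled:
        level = LEVEL_ORDER[idx]
        sub = labeled[level]
        candidates = library.get(level, set())
        if sub in candidates:
            result[level] = sub  # exact hit: the sub-address has no error
        else:
            scored = [(similarity(sub, c), c) for c in candidates]
            above = [pair for pair in scored if pair[0] > threshold]
            # Best candidate above the threshold, or None if there is none.
            result[level] = max(above)[1] if above else None
        idx += 1
    return result
```

Calling `standardize({"P1": "Beijing", "P2": "Haidian Distrct"})` corrects the misspelled P2 entry to the library value, while a sub-address with no sufficiently similar candidate is returned as `None` so that a caller can handle it separately.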
Further, on the basis of the above embodiment, the method includes:
receiving an address text to be processed;
marking the level of each sub-address in the address text to be processed through a preset neural network model to obtain the marked address text to be processed;
sequentially determining a first subaddress with the lowest level in the address text to be processed according to a preset level sequence;
judging whether the first sub-address is in a preset standard address library or not;
if not, calculating the font similarity and the pinyin similarity between the first sub-address and the correct address;
calculating the similarity between the first sub-address and the correct address according to the font similarity and the pinyin similarity;
for each correct address, judging whether the similarity between the correct address and the first sub-address is greater than a preset first threshold value;
if so, determining a correct address with the highest similarity, and taking the correct address with the highest similarity as a standard address corresponding to the first sub-address;
judging whether a second sub-address with the level larger than the first sub-address by one level exists in the address text to be processed;
if so, taking the second sub-address as the first sub-address, and returning to execute the step of judging whether the first sub-address is in a preset standard address library or not until the second sub-address with the level higher than the first sub-address does not exist in the address text to be processed.
In this embodiment, the address text entered by the user may contain several kinds of errors. For example, a font error may occur, in which Haidian District is entered with a visually similar but incorrect character, or a pinyin error may occur, in which Haidian District is entered with a character that has the same pronunciation. If the similarity between the first sub-address and the correct address were calculated from the font similarity alone, the result would be inaccurate for pinyin errors: a homophone misspelling of Haidian District has low font similarity to the correct name but high pinyin similarity. Therefore, to improve the accuracy of address standardization, the similarity between the first sub-address and the correct addresses in the standard address library that have the same level as the first sub-address can be calculated in two ways. Specifically, the font similarity and the pinyin similarity between the first sub-address and each correct address are calculated, and the similarity between the first sub-address and each correct address of the same level is then calculated from the pinyin similarity and the font similarity. The standard address determined from this similarity is therefore more accurate.
The address standardization processing method provided by this embodiment can improve the accuracy of address standardization by calculating the pinyin similarity and the font similarity between the first sub-address and the correct address.
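A small sketch of combining the two similarities is given below. The character-to-pinyin table is a deliberately tiny, hypothetical stand-in for a real pinyin converter, the example strings are illustrative choices rather than text from the source, and the unweighted mean is only one possible way to combine the two scores.

```python
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny character-to-pinyin table (tones omitted).
PINYIN = {"海": "hai", "淀": "dian", "电": "dian", "定": "ding", "区": "qu"}


def to_pinyin(text):
    """Map each character to its pinyin; unknown characters pass through."""
    return " ".join(PINYIN.get(ch, ch) for ch in text)


def ratio(a, b):
    """Stand-in string similarity in [0, 1] (assumption: difflib ratio)."""
    return SequenceMatcher(None, a, b).ratio()


def font_similarity(sub, correct):
    # Compared character by character on the written form.
    return ratio(sub, correct)


def pinyin_similarity(sub, correct):
    # Compared on the romanized pronunciation.
    return ratio(to_pinyin(sub), to_pinyin(correct))


def combined_similarity(sub, correct):
    # Assumed combination rule: plain mean of the two scores.
    return (font_similarity(sub, correct) + pinyin_similarity(sub, correct)) / 2
```

For the homophone misspelling 海电区 of 海淀区, the written forms differ in one of three characters, so the font similarity is well below 1, while both strings romanize to `hai dian qu`, so the pinyin similarity is 1.0 and the combined score is pulled upward.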
Further, on the basis of any of the above embodiments, the method comprises:
receiving an address text to be processed;
marking the level of each sub-address in the address text to be processed through a preset neural network model to obtain the marked address text to be processed;
sequentially determining a first subaddress with the lowest level in the address text to be processed according to a preset level sequence;
judging whether the first sub-address is in a preset standard address library or not;
if not, calculating the font similarity between the first sub-address and the correct address by at least one preset font similarity calculation method;
calculating the pinyin similarity between the first sub-address and the correct address by at least one preset pinyin similarity calculation method;
calculating the similarity between the first sub-address and the correct address according to the font similarity and the pinyin similarity;
for each correct address, judging whether the similarity between the correct address and the first sub-address is greater than a preset first threshold value;
if so, determining a correct address with the highest similarity, and taking the correct address with the highest similarity as a standard address corresponding to the first sub-address;
judging whether a second sub-address with the level larger than the first sub-address by one level exists in the address text to be processed;
if so, taking the second sub-address as the first sub-address, and returning to execute the step of judging whether the first sub-address is in a preset standard address library or not until the second sub-address with the level higher than the first sub-address does not exist in the address text to be processed.
In this embodiment, to further improve the accuracy of the similarity calculation, multiple font similarity and pinyin similarity calculation methods may be used to compute the similarity between the first sub-address and the correct address. Specifically, the font similarity between the first sub-address and the correct address may be calculated by any one or more of various font similarity calculation methods, which the present invention does not limit; for example, word-level Jaro Distance, word-level Jaro-Winkler Distance, word-level Edit Distance, and the like may be used. Correspondingly, the pinyin similarity between the first sub-address and the correct address may be calculated by any one or more of various pinyin similarity calculation methods, which the present invention likewise does not limit; for example, pinyin-level Jaro Distance, pinyin-level Jaro-Winkler Distance, pinyin-level Edit Distance, and the like may be used.
The address standardization processing method provided by this embodiment calculates the pinyin similarity and the font similarity between the first sub-address and the correct address by using multiple methods, so as to improve the accuracy of address standardization.
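For reference, the three measures named above can be written in a few lines each. The sketch below uses the textbook definitions; the function names, the prefix scaling factor of 0.1, and the normalization of edit distance into a similarity are assumptions of this illustration. The same functions can be applied to character strings (font level) or to romanized strings (pinyin level).

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def edit_similarity(a, b):
    """Edit distance rescaled to [0, 1] (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


def jaro(a, b):
    """Jaro similarity in [0, 1]."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_flags = [False] * len(a)
    b_flags = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        lo, hi = max(0, i - window), min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_flags[j] and b[j] == ca:
                a_flags[i] = b_flags[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_matched = [c for c, f in zip(a, a_flags) if f]
    b_matched = [c for c, f in zip(b, b_flags) if f]
    # Matched characters that appear in a different order, counted in pairs.
    transpositions = sum(x != y for x, y in zip(a_matched, b_matched)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a, b, scaling=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return base + prefix * scaling * (1 - base)
```

Jaro and Jaro-Winkler reward matching characters that appear close to their original positions, so they tolerate transposed characters better than edit distance, which is one reason to compute several measures and combine them.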
Further, on the basis of any of the above embodiments, the method comprises:
receiving an address text to be processed;
marking the level of each sub-address in the address text to be processed through a preset neural network model to obtain the marked address text to be processed;
sequentially determining a first subaddress with the lowest level in the address text to be processed according to a preset level sequence;
judging whether the first sub-address is in a preset standard address library or not;
if not, calculating the font similarity between the first sub-address and the correct address by at least one preset font similarity calculation method;
calculating the pinyin similarity between the first sub-address and the correct address by at least one preset pinyin similarity calculation method;
setting different weights for the font similarity calculated by each font similarity calculation method;
setting different weights for the pinyin similarity calculated by the pinyin similarity calculation method;
calculating the similarity between the first sub-address and the correct address by a weighted average method according to the font similarity and the pinyin similarity;
for each correct address, judging whether the similarity between the correct address and the first sub-address is greater than a preset first threshold value;
if so, determining a correct address with the highest similarity, and taking the correct address with the highest similarity as a standard address corresponding to the first sub-address;
judging whether a second sub-address with the level larger than the first sub-address by one level exists in the address text to be processed;
if so, taking the second sub-address as the first sub-address, and returning to execute the step of judging whether the first sub-address is in a preset standard address library or not until the second sub-address with the level higher than the first sub-address does not exist in the address text to be processed.
In this embodiment, the pinyin similarity and the font similarity between a given sub-address and the standard address differ from one sub-address to another. To further improve the accuracy of the similarity between the sub-address and the standard address, a different weight may be set for the font similarity calculated by each font similarity calculation method, a different weight may be set for the pinyin similarity calculated by each pinyin similarity calculation method, and the similarity between the sub-address and the standard address may then be calculated as a weighted average using the weight corresponding to each method. In general, since the pinyin similarity is higher than the font similarity, a higher weight may be set for the pinyin similarity.
In the address standardization processing method provided by this embodiment, different weights are set for the font similarities calculated by different font similarity calculation methods, different weights are set for the pinyin similarities calculated by the pinyin similarity calculation methods, and the similarity between the sub-address and the standard address is calculated by using a weighted average method according to the weights corresponding to the respective methods, so that the accuracy of calculating the similarity between the sub-address and the standard address can be improved.
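The weighted average can be illustrated with a short sketch. The method names used as dictionary keys and the example weights are hypothetical; the source does not specify concrete weight values.

```python
def weighted_similarity(scores, weights):
    """Weighted average of per-method similarity scores.

    scores  -- mapping of method name to similarity in [0, 1]
    weights -- mapping of method name to a non-negative weight
    """
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total


# Hypothetical scores for one (sub-address, correct address) pair.
example_scores = {"font_edit": 0.5, "pinyin_edit": 1.0}
# Hypothetical weights: pinyin is weighted three times as heavily as font.
example_weights = {"font_edit": 1.0, "pinyin_edit": 3.0}
example_result = weighted_similarity(example_scores, example_weights)
```

With these illustrative numbers the result is (0.5 × 1 + 1.0 × 3) / 4 = 0.875, closer to the pinyin score than an unweighted mean of 0.75 would be.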
Further, on the basis of any of the above embodiments, the method further includes:
receiving an address text to be processed;
marking the level of each sub-address in the address text to be processed through a preset neural network model to obtain the marked address text to be processed;
sequentially determining a first subaddress with the lowest level in the address text to be processed according to a preset level sequence;
judging whether the first sub-address is in a preset standard address library or not;
if not, calculating the similarity between the first sub-address and at least one correct address in the standard address base, wherein the at least one correct address is consistent with the first sub-address in level;
for each correct address, judging whether the similarity between the correct address and the first sub-address is greater than a preset first threshold value;
and if not, sending a service handling request to the user, wherein the service handling request comprises the first sub-address and the address text to be processed, so that the user can manually process the first sub-address according to the service handling request.
In this embodiment, after the address text to be processed is received from the user and labeled by the preset neural network model, the labeled address text is processed against a preset standard address library. Specifically, the first sub-address with the lowest level in the address text to be processed is determined, where levels increase from P1 to P7. The first sub-address is compared with the standard address library to determine whether it exists there. If it does, the first sub-address contains no error. If it does not, the first sub-address is miswritten; in that case, the similarity between the first sub-address and each of the correct addresses in the standard address library that have the same level as the first sub-address is calculated, and each similarity is compared with a preset first threshold. If every similarity is lower than the first threshold, the standard address library contains no standard address corresponding to the sub-address. To standardize the address text to be processed in this case, a service handling request is sent to the user; the request includes the first sub-address and the address text to be processed, so that the user can manually process the first sub-address according to the request. It can be understood that, once the standard address corresponding to the first sub-address has been determined manually, it may be added to the standard address library, thereby expanding the library.
In the address standardization processing method provided in this embodiment, when the similarity between the first sub-address and the plurality of correct addresses in the standard address base, which are consistent with the first sub-address in level, is lower than the preset first threshold, a service handling request is sent to the user, so that the user manually processes the first sub-address according to the service handling request, and thus standardization of all sub-addresses can be achieved.
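The fallback to manual handling might look like the sketch below. The `HandlingRequest` type, the function names, and the threshold value are hypothetical; how the request is actually delivered to the user is outside the scope of this illustration.

```python
from dataclasses import dataclass


@dataclass
class HandlingRequest:
    """Hypothetical payload of the service handling request."""
    sub_address: str    # the sub-address that could not be standardized
    address_text: str   # the full address text to be processed, for context


def resolve_or_escalate(sub, address_text, scored_candidates, threshold=0.8):
    """Return (standard_address, None) or (None, HandlingRequest).

    scored_candidates maps each same-level correct address to its
    similarity with `sub`; the threshold value is an assumption.
    """
    above = {c: s for c, s in scored_candidates.items() if s > threshold}
    if above:
        return max(above, key=above.get), None
    # No candidate is similar enough: hand the sub-address to a person.
    return None, HandlingRequest(sub_address=sub, address_text=address_text)


def add_to_library(library, level, standard_address):
    """After manual resolution, extend the library with the new entry."""
    library.setdefault(level, set()).add(standard_address)
```

Returning the request object instead of sending it directly keeps the decision logic testable and leaves the delivery channel to the caller.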
Further, on the basis of any of the above embodiments, the method further includes:
receiving an address text to be processed;
marking the level of each sub-address in the address text to be processed through a preset neural network model to obtain the marked address text to be processed;
sequentially determining a first subaddress with the lowest level in the address text to be processed according to a preset level sequence;
judging whether the first sub-address is in a preset standard address library or not;
if yes, judging whether a second sub-address with the level larger than the first sub-address by one level exists in the address text to be processed;
if so, taking the second sub-address as the first sub-address, and returning to execute the step of judging whether the first sub-address is in a preset standard address library or not until the second sub-address with the level higher than the first sub-address does not exist in the address text to be processed.
In this embodiment, after the address text to be processed is received from the user and labeled by the preset neural network model, the labeled address text is processed against a preset standard address library. Specifically, the first sub-address with the lowest level in the address text to be processed is determined, where levels increase from P1 to P7. The first sub-address is compared with the standard address library to determine whether it exists there; if it does, the first sub-address contains no error and is taken as the standard address. It is then determined whether the address text to be processed contains a second sub-address whose level is one level greater than that of the first sub-address. If so, the second sub-address is taken as the current first sub-address, and the step of judging whether the first sub-address is in the preset standard address library is executed again, until the address text to be processed contains no second sub-address whose level is one level greater than that of the first sub-address. Accordingly, if no such second sub-address exists, the first sub-address may be output as the current standard address.
In the address standardization processing method provided by this embodiment, when the first sub-address exists in the standard address library, the first sub-address is used as the standard address, so that standardization of the address text to be processed can be achieved.
Further, on the basis of any of the above embodiments, the method further includes:
receiving an address text to be processed;
marking the level of each sub-address in the address text to be processed through a preset neural network model to obtain the marked address text to be processed;
detecting whether the address text to be processed comprises a sub-address corresponding to a first level with the lowest level in the level sequence or not according to a preset level sequence;
if so, taking the sub-address corresponding to the first level as the first sub-address;
if not, taking a second level with the level greater than the first level as the current first level, marking the first level as vacant, returning to execute the step of detecting whether the address text to be processed comprises the sub-address corresponding to the first level with the lowest level in the level sequence according to a preset level sequence until the address text to be processed comprises the sub-address corresponding to the first level;
judging whether the first sub-address is in a preset standard address library or not;
if not, calculating the similarity between the first sub-address and at least one correct address in the standard address base, wherein the at least one correct address is consistent with the first sub-address in level;
for each correct address, judging whether the similarity between the correct address and the first sub-address is greater than a preset first threshold value;
if so, determining a correct address with the highest similarity, and taking the correct address with the highest similarity as a standard address corresponding to the first sub-address;
judging whether a second sub-address with the level larger than the first sub-address by one level exists in the address text to be processed;
if so, taking the second sub-address as the first sub-address, and returning to execute the step of judging whether the first sub-address is in a preset standard address library or not until no second sub-address with the level greater than the first sub-address by one level exists in the address text to be processed;
determining all levels marked as vacant in the address text to be processed;
and supplementing all levels marked as vacant in the address text to be processed according to the standard address library.
In this embodiment, after the address text to be processed is received from the user and labeled by the preset neural network model, the labeled address text is processed against a preset standard address library. When entering the address text, the user may omit part of it; for example, for an address consisting of province, city, and county, the user may leave out the city. The omitted information can be supplemented from the other information. The levels are searched in the preset level order, starting by default from level P1. If no sub-address with level P1 currently exists, that level is vacant: it is marked as vacant, the level being searched is automatically incremented by 1, and the search continues with the sub-address of level P2. If a sub-address with level P2 exists, its similarity to the correct addresses in the preset standard address library is calculated, and the correct address whose similarity exceeds the preset threshold and is the highest is taken as the current standard address. These steps are repeated for each level until every level has been processed. All vacant levels are then determined; each vacant level corresponds to a sub-address that the user did not fill in, and the vacant sub-addresses can be supplemented according to the standard address library.
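Marking vacant levels and supplementing them from the library can be sketched as follows. The three-level order, the row-per-address library layout, and the rule of filling only when exactly one library row matches are assumptions of this example, not details given in the source.

```python
LEVEL_ORDER = ["P1", "P2", "P3"]  # shortened level order for the example

# Hypothetical library layout: one complete standard address per row.
LIBRARY = [
    {"P1": "Hebei Province", "P2": "Shijiazhuang City", "P3": "Zhengding County"},
    {"P1": "Hebei Province", "P2": "Baoding City", "P3": "Dingxing County"},
]


def find_vacant(labeled):
    """Scan levels in order and collect those with no sub-address."""
    return [lv for lv in LEVEL_ORDER if lv not in labeled]


def fill_vacant(labeled, library=LIBRARY):
    """Supplement vacant levels from the unique library row that agrees
    with every sub-address the user did provide (assumed rule)."""
    vacant = find_vacant(labeled)
    matches = [row for row in library
               if all(row.get(lv) == value for lv, value in labeled.items())]
    filled = dict(labeled)
    if len(matches) == 1:  # fill only when the match is unambiguous
        for lv in vacant:
            if lv in matches[0]:
                filled[lv] = matches[0][lv]
    return filled
```

Given `{"P1": "Hebei Province", "P3": "Zhengding County"}`, the missing P2 level is filled with `"Shijiazhuang City"`. Given only `{"P1": "Hebei Province"}`, two rows match, so nothing is filled and the input is returned unchanged.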
As an implementable manner, after the address text to be processed is received from the user and labeled by the preset neural network model, the labeled address text is processed against a preset standard address library. When entering the address text, the user may omit part of it; for example, for an address consisting of province, city, and county, the user may leave out the city, and the omitted information can be supplemented from the other information. Specifically, it is judged whether the level difference between any two sub-addresses in the address text to be processed exceeds a preset second threshold, where the second threshold may be set by the user or set by default in the system. If so, the address text to be processed is compared with the preset standard address library so that it can be supplemented. For example, suppose the received address text to be processed is "Beijing Yizhuang, Kechuang 11th Street, Building A", where the level of Beijing is P1 and the level of Yizhuang is P4. The level difference between the two is 3, which exceeds the preset second threshold, so the address text to be processed is compared with the preset standard address library and supplemented accordingly.
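The level-gap check in this variant reduces to a few lines. The sketch assumes level labels of the form `P<number>`, interprets "any two sub-addresses" as neighbouring sub-addresses after sorting by level, and uses an assumed second threshold of 1; none of these choices is fixed by the source.

```python
def has_level_gap(levels, second_threshold=1):
    """Return True if two neighbouring labeled levels are further apart
    than the second threshold, which suggests omitted sub-addresses."""
    numbers = sorted(int(level[1:]) for level in levels)
    return any(high - low > second_threshold
               for low, high in zip(numbers, numbers[1:]))
```

For the example in the text, the labeled levels start with P1 followed directly by P4, so `has_level_gap(["P1", "P4", "P5", "P6"])` is true and the address would be sent for supplementation, whereas a contiguous sequence such as `["P1", "P2", "P3"]` is not flagged.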
According to the address standardization processing method provided by the embodiment, the sub-addresses are processed according to the preset level sequence, so that the vacant sub-addresses which are not filled by the user can be supplemented on the basis of correcting the error sub-addresses, the filling of the address text to be processed can be further realized, the accuracy of the address text to be processed is improved, and a basis is provided for the subsequent service development.
Fig. 4 is a schematic structural diagram of an address normalization processing apparatus according to a fourth embodiment of the present invention, as shown in fig. 4, the apparatus includes:
the first receiving module 41 is configured to receive the address text to be processed.
And the first labeling module 42 is configured to label, through a preset neural network model, the level of each sub-address in the address text to be processed, so as to obtain a labeled address text to be processed.
And the processing module 43 is configured to, for each sub-address in the labeled address text to be processed, process the sub-address according to a preset standard address library to obtain a standard address corresponding to the address text to be processed.
The address standardization processing device provided by the embodiment receives the address text to be processed; marking the level of each sub-address in the address text to be processed through a preset neural network model to obtain the marked address text to be processed; and processing the sub-addresses according to a preset standard address library aiming at each sub-address in the marked address text to be processed to obtain a standard address corresponding to the address text to be processed. Therefore, the standard address information corresponding to the address text to be processed can be quickly and accurately determined, the accuracy of address standardization can be improved, and in addition, the manual maintenance cost of the address text can be reduced.
Further, on the basis of the above embodiment, the apparatus further includes:
the first receiving module is used for receiving the address text to be processed;
the training module is used for training a preset model to be trained through the text to be trained after the sub-addresses are labeled, so as to obtain the preset neural network model;
the first labeling module is used for labeling the level of each sub-address in the address text to be processed through a preset neural network model to obtain a labeled address text to be processed;
and the processing module is used for processing the subaddresses in the labeled address text to be processed according to a preset standard address library to obtain a standard address corresponding to the address text to be processed.
The address standardization processing device provided by this embodiment trains a preset model to be trained through a text to be trained after labeling each sub-address, so as to obtain the preset neural network model, and thus, the neural network model can be used to label the address text to be processed, thereby providing a basis for subsequent address standardization.
Further, on the basis of any one of the above embodiments, the apparatus further includes:
the first receiving module is used for receiving the address text to be processed;
the second receiving module is used for receiving the text to be trained;
the removing module is used for removing useless punctuation marks in the text to be trained;
the segmentation module is used for segmenting the text to be trained without the useless punctuations to obtain each sub-address corresponding to the text to be trained;
the second labeling module is used for labeling the level of each sub-address in the text to be trained;
the training module is used for training a preset model to be trained through the text to be trained after the sub-addresses are labeled, so as to obtain the preset neural network model;
the first labeling module is used for labeling the level of each sub-address in the address text to be processed through a preset neural network model to obtain a labeled address text to be processed;
and the processing module is used for processing the subaddresses in the labeled address text to be processed according to a preset standard address library to obtain a standard address corresponding to the address text to be processed.
The address standardization processing device provided by the embodiment can improve the efficiency of subsequent model training and provide a basis for address standardization by removing useless characters in the text to be trained in advance and performing word segmentation on the text to be trained.
Further, there are various methods for removing useless punctuation marks in a text to be trained, and specifically, the removing module includes:
and the removing unit is used for removing useless punctuation marks in the text to be trained by a regular matching method.
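A regular-expression pass of this kind might look like the sketch below. Which characters count as useless punctuation is an assumption of the example; a real deployment would choose the set to suit its own address data.

```python
import re

# Assumed set of "useless" punctuation: common ASCII and full-width marks.
USELESS_CHARS = ",.;:!?()[]{}<>#*~\"'" + "，。；：！？、（）【】《》“”‘’"

# Character class built with re.escape so that marks such as ] or * are
# treated literally; whitespace is removed as well.
USELESS_PATTERN = re.compile("[" + re.escape(USELESS_CHARS) + r"\s]+")


def remove_useless_punctuation(text):
    """Delete every run of useless punctuation or whitespace."""
    return USELESS_PATTERN.sub("", text)
```

For instance, `remove_useless_punctuation("北京市，海淀区（上地）。")` yields `"北京市海淀区上地"`, which can then be passed on to the segmentation step.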
Further, on the basis of any one of the above embodiments, the apparatus further includes:
the first receiving module is used for receiving the address text to be processed;
the second receiving module is used for receiving the text to be trained;
the removing module is used for removing useless punctuation marks in the text to be trained;
the segmentation module is used for segmenting the text to be trained without the useless punctuations to obtain each sub-address corresponding to the text to be trained;
the coding module is used for coding each sub-address according to a preset coding mode;
the vector conversion module is used for converting each sub-address and the code corresponding to each sub-address into a text vector and a code vector through a preset vector conversion model, and storing the text vector and the code vector in a correlation manner;
the association module is used for establishing, for the text vector and the coding vector corresponding to each sub-address, the association relationship between the text vectors and the coding vectors of adjacent sub-addresses through a preset association relationship establishing model;
the second labeling module comprises:
the marking unit is used for marking the level of each subaddress after the association relationship is established according to the preset subaddress level;
the training module is used for training a preset model to be trained through the text to be trained after the sub-addresses are labeled, so as to obtain the preset neural network model;
the first labeling module is used for labeling the level of each sub-address in the address text to be processed through a preset neural network model to obtain a labeled address text to be processed;
and the processing module is used for processing the subaddresses in the labeled address text to be processed according to a preset standard address library to obtain a standard address corresponding to the address text to be processed.
The address standardization processing device provided in this embodiment converts each sub-address and the code corresponding to each sub-address into a text vector and a code vector through a preset vector conversion model, and stores the text vector and the code vector in association; for the text vector and the code vector corresponding to each sub-address, it establishes the association relationship between the text vectors and the code vectors of adjacent sub-addresses through a preset association relationship establishing model; and it labels the level of each sub-address, after the association relationship is established, according to the preset sub-address levels, to obtain a labeled text to be trained. The model to be trained can then be trained on this text, which provides a basis for the subsequent labeling of the address text to be processed and improves the labeling accuracy of the neural network model.
Further, on the basis of any of the above embodiments, the apparatus comprises:
the first receiving module is used for receiving the address text to be processed;
the first labeling module is used for labeling the level of each sub-address in the address text to be processed through a preset neural network model to obtain a labeled address text to be processed;
the processing module comprises:
the first sub-address determining unit is used for sequentially determining a first sub-address with the lowest level in the address text to be processed according to a preset level sequence;
the first judging unit is used for judging whether the first sub-address is in a preset standard address library or not;
the similarity calculation unit is used for, if not, calculating the similarity between the first sub-address and at least one correct address in the standard address library whose level is consistent with that of the first sub-address;
a second determining unit, configured to determine, for each correct address, whether a similarity between the correct address and the first sub-address is greater than a preset first threshold;
the standard address determining unit is used for, if so, determining the correct address with the highest similarity, and taking the correct address with the highest similarity as the standard address corresponding to the first sub-address;
a third judging unit, configured to judge whether a second sub-address whose level is greater than the first sub-address by one level exists in the address text to be processed;
and the first circulation unit is used for, if so, taking the second sub-address as the first sub-address, and returning to execute the step of judging whether the first sub-address is in a preset standard address library or not until no second sub-address with the level higher than the first sub-address is present in the address text to be processed.
The address standardization processing apparatus provided in this embodiment determines a first sub-address with a lowest level in a text to be processed, compares the first sub-address with a preset standard address library, determines whether the first sub-address exists in the standard address library, determines a standard address corresponding to the first sub-address according to a similarity between the first sub-address and a correct address in the standard address library if the first sub-address does not exist in the standard address library, and repeats the above steps for each level of sub-address, so that a standard address corresponding to the text to be processed can be obtained. The accuracy and efficiency of address standardization are improved.
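The level-by-level procedure described above can be sketched as follows. This is a minimal illustration only: the level names, the contents of `STANDARD_LIBRARY`, the value of `FIRST_THRESHOLD`, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example and are not specified by this embodiment.

```python
from difflib import SequenceMatcher

# Hypothetical level order, lowest level first (an assumption for illustration).
LEVEL_ORDER = ["street", "district", "city"]

# Hypothetical standard address library: level -> set of correct addresses.
STANDARD_LIBRARY = {
    "street": {"Zhongguancun Street", "Chang'an Avenue"},
    "district": {"Haidian District", "Chaoyang District"},
    "city": {"Beijing"},
}

FIRST_THRESHOLD = 0.8  # assumed value of the preset first threshold


def similarity(a: str, b: str) -> float:
    """Stand-in similarity measure; the embodiment does not prescribe this one."""
    return SequenceMatcher(None, a, b).ratio()


def standardize(labeled: dict) -> dict:
    """Walk the labeled sub-addresses from the lowest level upward."""
    result = {}
    for level in LEVEL_ORDER:
        if level not in labeled:
            continue  # no sub-address was labeled at this level
        sub = labeled[level]
        candidates = STANDARD_LIBRARY[level]
        if sub in candidates:
            # The sub-address is already in the standard address library.
            result[level] = sub
            continue
        # Otherwise compare only against correct addresses of the same level.
        best = max(candidates, key=lambda c: similarity(sub, c))
        if similarity(sub, best) > FIRST_THRESHOLD:
            result[level] = best
        # Below the threshold, the sub-address is left out of the result here.
    return result
```

For example, `standardize({"street": "Zhongguancun Stret", "city": "Beijing"})` corrects the misspelled street and keeps the city unchanged.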
Further, on the basis of the above embodiment, the apparatus includes:
the first receiving module is used for receiving the address text to be processed;
the first labeling module is used for labeling the level of each sub-address in the address text to be processed through a preset neural network model to obtain a labeled address text to be processed;
the processing module comprises:
the first sub-address determining unit is used for sequentially determining a first sub-address with the lowest level in the address text to be processed according to a preset level sequence;
the first judging unit is used for judging whether the first sub-address is in a preset standard address library or not;
the similarity calculation unit is specifically configured to: if not, calculate the font similarity and the pinyin similarity between the first sub-address and the correct address;
and calculate the similarity between the first sub-address and the correct address according to the font similarity and the pinyin similarity;
a second determining unit, configured to determine, for each correct address, whether the similarity between the correct address and the first sub-address is greater than a preset first threshold;
a standard address determining unit, configured to, if so, determine the correct address with the highest similarity and take it as the standard address corresponding to the first sub-address;
a third judging unit, configured to judge whether a second sub-address one level higher than the first sub-address exists in the address text to be processed;
and a first circulation unit, configured to, if so, take the second sub-address as the first sub-address and return to the step of judging whether the first sub-address is in the preset standard address library, until no second sub-address one level higher than the first sub-address exists in the address text to be processed.
The address standardization processing device provided by the embodiment can improve the accuracy of address standardization by calculating the pinyin similarity and the font similarity between the first sub-address and the correct address.
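As a rough sketch of combining the two scores, the example below compares a mistyped district name with candidates from a library. The pinyin table `PINYIN` is a hypothetical stub covering only the characters used here, the character-overlap ratio is merely a stand-in for a real glyph (font) similarity measure, and the plain average used to combine the two scores is an assumption; none of these choices is prescribed by this embodiment.

```python
from difflib import SequenceMatcher

# Hypothetical pinyin table covering only the characters in this example.
# A real system would use a complete pronunciation dictionary.
PINYIN = {"海": "hai", "淀": "dian", "定": "ding", "区": "qu",
          "朝": "chao", "阳": "yang"}


def to_pinyin(text: str) -> str:
    """Spell each character in pinyin; unknown characters are kept as-is."""
    return " ".join(PINYIN.get(ch, ch) for ch in text)


def font_similarity(a: str, b: str) -> float:
    """Stand-in for glyph similarity: ratio of characters shared in order."""
    return SequenceMatcher(None, a, b).ratio()


def pinyin_similarity(a: str, b: str) -> float:
    """Compare the pinyin spellings of the two strings."""
    return SequenceMatcher(None, to_pinyin(a), to_pinyin(b)).ratio()


def combined_similarity(a: str, b: str) -> float:
    """Plain average of the two scores; the equal weighting is an assumption."""
    return (font_similarity(a, b) + pinyin_similarity(a, b)) / 2
```

With this sketch, the mistyped "海定区" scores higher against "海淀区" than against "朝阳区", because the two names sound alike even though one character differs.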
Further, on the basis of any of the above embodiments, the apparatus comprises:
the first receiving module is used for receiving the address text to be processed;
the first labeling module is used for labeling the level of each sub-address in the address text to be processed through a preset neural network model to obtain a labeled address text to be processed;
the processing module comprises:
the first sub-address determining unit is used for sequentially determining a first sub-address with the lowest level in the address text to be processed according to a preset level sequence;
the first judging unit is used for judging whether the first sub-address is in a preset standard address library or not;
the similarity calculation unit is specifically configured to:
if not, calculate the font similarity between the first sub-address and the correct address by at least one preset font similarity calculation method;
calculate the pinyin similarity between the first sub-address and the correct address by at least one preset pinyin similarity calculation method;
and calculate the similarity between the first sub-address and the correct address according to the font similarity and the pinyin similarity;
a second determining unit, configured to determine, for each correct address, whether the similarity between the correct address and the first sub-address is greater than a preset first threshold;
a standard address determining unit, configured to, if so, determine the correct address with the highest similarity and take it as the standard address corresponding to the first sub-address;
a third judging unit, configured to judge whether a second sub-address one level higher than the first sub-address exists in the address text to be processed;
and a first circulation unit, configured to, if so, take the second sub-address as the first sub-address and return to the step of judging whether the first sub-address is in the preset standard address library, until no second sub-address one level higher than the first sub-address exists in the address text to be processed.
The address standardization processing device provided by this embodiment calculates the pinyin similarity and the font similarity between the first sub-address and the correct address by using multiple methods, so as to improve the accuracy of address standardization.
Further, on the basis of any of the above embodiments, the apparatus comprises:
the first receiving module is used for receiving the address text to be processed;
the first labeling module is used for labeling the level of each sub-address in the address text to be processed through a preset neural network model to obtain a labeled address text to be processed;
the processing module comprises:
the first sub-address determining unit is used for sequentially determining a first sub-address with the lowest level in the address text to be processed according to a preset level sequence;
the first judging unit is used for judging whether the first sub-address is in a preset standard address library or not;
a similarity calculation unit, configured to, if not, calculate the font similarity between the first sub-address and the correct address by at least one preset font similarity calculation method,
and calculate the pinyin similarity between the first sub-address and the correct address by at least one preset pinyin similarity calculation method;
the similarity calculation unit is specifically configured to: set a different weight for the font similarity calculated by each font similarity calculation method;
set a different weight for the pinyin similarity calculated by each pinyin similarity calculation method;
and calculate the similarity between the first sub-address and the correct address by a weighted average method according to the font similarities and the pinyin similarities;
a second determining unit, configured to determine, for each correct address, whether the similarity between the correct address and the first sub-address is greater than a preset first threshold;
a standard address determining unit, configured to, if so, determine the correct address with the highest similarity and take it as the standard address corresponding to the first sub-address;
a third judging unit, configured to judge whether a second sub-address one level higher than the first sub-address exists in the address text to be processed;
and a first circulation unit, configured to, if so, take the second sub-address as the first sub-address and return to the step of judging whether the first sub-address is in the preset standard address library, until no second sub-address one level higher than the first sub-address exists in the address text to be processed.
In the address normalization processing apparatus provided in this embodiment, different weights are set for the font similarities calculated by different font similarity calculation methods, different weights are set for the pinyin similarities calculated by the pinyin similarity calculation methods, and the similarity between the sub-address and the standard address is calculated by using a weighted average method according to the weights corresponding to the respective methods, so that the accuracy of calculating the similarity between the sub-address and the standard address can be improved.
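The weighted average described above can be illustrated with a short sketch. The function name `weighted_similarity`, its parameter names, and the sample scores and weights are hypothetical; this embodiment does not state how the weights are chosen.

```python
def weighted_similarity(font_scores, font_weights, pinyin_scores, pinyin_weights):
    """Weighted average over all font and pinyin similarity scores.

    Each score comes from one similarity calculation method and is paired
    with the weight assigned to that method.
    """
    if len(font_scores) != len(font_weights):
        raise ValueError("one weight is required per font similarity method")
    if len(pinyin_scores) != len(pinyin_weights):
        raise ValueError("one weight is required per pinyin similarity method")

    scores = list(font_scores) + list(pinyin_scores)
    weights = list(font_weights) + list(pinyin_weights)
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    # Sum of score * weight, normalized by the total weight.
    return sum(s * w for s, w in zip(scores, weights)) / total_weight
```

For instance, two font methods scoring 0.6 and 0.8 with weight 1 each, and one pinyin method scoring 0.9 with weight 2, give (0.6 + 0.8 + 1.8) / 4 = 0.8.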
Further, on the basis of any one of the above embodiments, the apparatus further includes:
the first receiving module is used for receiving the address text to be processed;
the first labeling module is used for labeling the level of each sub-address in the address text to be processed through a preset neural network model to obtain a labeled address text to be processed;
the processing module comprises:
the first sub-address determining unit is used for sequentially determining a first sub-address with the lowest level in the address text to be processed according to a preset level sequence;
the first judging unit is used for judging whether the first sub-address is in a preset standard address library or not;
a similarity calculation unit, configured to, if not, calculate the similarity between the first sub-address and at least one correct address in the standard address library whose level is consistent with that of the first sub-address;
a second determining unit, configured to determine, for each correct address, whether the similarity between the correct address and the first sub-address is greater than a preset first threshold;
and the processing module is further configured to, if not, send a service handling request to the user, wherein the service handling request comprises the first sub-address and the address text to be processed, so that the user can manually process the first sub-address according to the service handling request.
The address standardization processing apparatus provided in this embodiment sends a service handling request to the user when the similarity between the first sub-address and each correct address of the same level in the standard address library is not greater than the preset first threshold, so that the user manually processes the first sub-address according to the service handling request, thereby standardizing all sub-addresses.
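The fallback to manual handling can be sketched as below. The `ServiceHandlingRequest` class, the `pick_or_escalate` function, and the way candidate scores are passed in as a dictionary are all hypothetical choices for illustration; the embodiment only states that the request carries the first sub-address and the address text to be processed.

```python
from dataclasses import dataclass


@dataclass
class ServiceHandlingRequest:
    """Hypothetical request object carrying the two fields named above."""
    first_sub_address: str
    address_text: str


def pick_or_escalate(sub_address, address_text, scored_candidates, threshold):
    """Return (standard_address, None), or (None, request) for manual handling.

    scored_candidates maps each same-level correct address to its similarity
    with sub_address.
    """
    passing = {c: s for c, s in scored_candidates.items() if s > threshold}
    if passing:
        # At least one candidate exceeds the threshold: take the best one.
        return max(passing, key=passing.get), None
    # No candidate is similar enough: hand the sub-address to the user.
    return None, ServiceHandlingRequest(sub_address, address_text)
```

Returning the request instead of sending it keeps the sketch independent of any particular messaging channel, which the embodiment does not specify.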
Further, on the basis of any one of the above embodiments, the apparatus further includes:
the first receiving module is used for receiving the address text to be processed;
the first labeling module is used for labeling the level of each sub-address in the address text to be processed through a preset neural network model to obtain a labeled address text to be processed;
the processing module comprises:
the first sub-address determining unit is used for sequentially determining a first sub-address with the lowest level in the address text to be processed according to a preset level sequence;
the first judging unit is used for judging whether the first sub-address is in a preset standard address library or not;
a fifth judging unit, configured to, if the first sub-address is in the standard address library, judge whether a second sub-address one level higher than the first sub-address exists in the address text to be processed;
and if so, take the second sub-address as the first sub-address and return to the step of judging whether the first sub-address is in the preset standard address library, until no second sub-address one level higher than the first sub-address exists in the address text to be processed.
The address standardization processing apparatus provided in this embodiment can realize standardization of the address text to be processed by using the first sub-address as the standard address when the first sub-address exists in the standard address library.
Further, on the basis of any one of the above embodiments, the apparatus further includes:
the first receiving module is used for receiving the address text to be processed;
the first labeling module is used for labeling the level of each sub-address in the address text to be processed through a preset neural network model to obtain a labeled address text to be processed;
the processing module comprises:
the first sub-address determining unit is specifically configured to: detecting whether the address text to be processed comprises a sub-address corresponding to a first level with the lowest level in the level sequence or not according to a preset level sequence;
if so, taking the sub-address corresponding to the first level as the first sub-address;
if not, marking the first level as vacant, taking a second level one level higher than the first level as the current first level, and returning to the step of detecting, according to the preset level sequence, whether the address text to be processed comprises a sub-address corresponding to the first level, until the address text to be processed comprises a sub-address corresponding to the first level;
the first judging unit is used for judging whether the first sub-address is in a preset standard address library or not;
a similarity calculation unit, configured to, if not, calculate the similarity between the first sub-address and at least one correct address in the standard address library whose level is consistent with that of the first sub-address;
a second determining unit, configured to determine, for each correct address, whether the similarity between the correct address and the first sub-address is greater than a preset first threshold;
a standard address determining unit, configured to, if so, determine the correct address with the highest similarity and take it as the standard address corresponding to the first sub-address;
a third judging unit, configured to judge whether a second sub-address one level higher than the first sub-address exists in the address text to be processed;
a first circulation unit, configured to, if so, take the second sub-address as the first sub-address and return to the step of judging whether the first sub-address is in the preset standard address library, until no second sub-address one level higher than the first sub-address exists in the address text to be processed;
the processing module further comprises:
a vacancy level determination unit, configured to determine all levels labeled as vacancies in the address text to be processed;
and the supplement unit is used for supplementing all levels marked as vacant in the address text to be processed according to the standard address library.
The address standardization processing device provided by this embodiment processes the sub-addresses according to the preset rank order, so that the vacant sub-addresses that are not filled in by the user can be supplemented on the basis of correcting the wrong sub-addresses, the filling of the address text to be processed can be further realized, the accuracy of the address text to be processed is improved, and a basis is provided for the subsequent service development.
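One possible way to mark vacant levels and supplement them from the standard address library is sketched below. The embodiment does not state how the library encodes the relationship between levels, so the `PARENT` lookup table, the level names, and the rule of filling a vacant level from the known sub-address one level below it are all assumptions made for illustration.

```python
# Hypothetical level order, lowest level first.
LEVEL_ORDER = ["street", "district", "city"]

# Hypothetical slice of a standard address library: each (level, name)
# entry maps to the name of its parent one level higher.
PARENT = {
    ("street", "Zhongguancun Street"): "Haidian District",
    ("district", "Haidian District"): "Beijing",
}


def mark_vacant(labeled: dict) -> list:
    """Return the levels that have no sub-address in the labeled text."""
    return [level for level in LEVEL_ORDER if level not in labeled]


def fill_vacant(labeled: dict) -> dict:
    """Fill each vacant level from the known sub-address one level below it."""
    filled = dict(labeled)
    for lower, upper in zip(LEVEL_ORDER, LEVEL_ORDER[1:]):
        if upper not in filled and lower in filled:
            parent = PARENT.get((lower, filled[lower]))
            if parent is not None:
                filled[upper] = parent
    return filled
```

Under these assumptions a vacant level can only be filled when a lower level is known; a vacant lowest level stays vacant, since a parent entry does not identify a unique child.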
Fig. 5 is a schematic structural diagram of an address normalization processing apparatus according to a fifth embodiment of the present invention, and as shown in fig. 5, the apparatus includes: a memory 51, a processor 52;
the memory 51 is used for storing instructions executable by the processor 52;
wherein the processor 52 is configured to execute the address standardization processing method described above.
Yet another embodiment of the present invention provides a computer-readable storage medium having stored therein computer-executable instructions for implementing the address normalization processing method as described above when the computer-executable instructions are executed by a processor.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (26)

1. An address standardization processing method, comprising:
receiving an address text to be processed;
marking the level of each sub-address in the address text to be processed through a preset neural network model to obtain the marked address text to be processed;
and processing the sub-addresses according to a preset standard address library aiming at each sub-address in the marked address text to be processed to obtain a standard address corresponding to the address text to be processed.
2. The method according to claim 1, wherein before labeling the level of each sub-address in the address text to be processed through a preset neural network model and obtaining the labeled address text to be processed, the method further comprises:
and training a preset model to be trained through the text to be trained after the sub-addresses are labeled, so as to obtain the preset neural network model.
3. The method according to claim 2, wherein before the training of the preset model to be trained through the text to be trained after the sub-addresses are labeled, and the obtaining of the preset neural network model, the method further comprises:
receiving a text to be trained;
removing useless punctuation marks in the text to be trained;
segmenting words of the text to be trained without useless punctuations to obtain sub-addresses corresponding to the text to be trained;
and marking the level of each sub-address in the text to be trained.
4. The method according to claim 3, wherein the removing useless punctuation marks in the text to be trained comprises:
and removing useless punctuation marks in the text to be trained by a regular expression matching method.
5. The method according to claim 3, wherein after the segmenting the text to be trained from which the useless punctuation marks are removed to obtain the sub-addresses corresponding to the text to be trained, the method further comprises:
coding each sub-address according to a preset coding mode;
converting each sub-address and the code corresponding to each sub-address into a text vector and a code vector through a preset vector conversion model, and storing the text vector and the code vector in a correlation manner;
for the text vector and the code vector corresponding to each sub-address, establishing, through a preset association relationship establishing model, the association relationship between the text vectors and the code vectors of adjacent sub-addresses;
the level labeling of each sub-address in the text to be trained includes:
and marking the level of each subaddress after the association relationship is established according to the preset subaddress level.
6. The method according to claim 1, wherein the processing, according to a preset standard address library, each sub-address in the labeled address text to be processed to obtain a standard address corresponding to the address text to be processed, comprises:
sequentially determining a first subaddress with the lowest level in the address text to be processed according to a preset level sequence;
judging whether the first sub-address is in a preset standard address library or not;
if not, calculating the similarity between the first sub-address and at least one correct address in the standard address base, wherein the at least one correct address is consistent with the first sub-address in level;
for each correct address, judging whether the similarity between the correct address and the first sub-address is greater than a preset first threshold value;
if so, determining a correct address with the highest similarity, and taking the correct address with the highest similarity as a standard address corresponding to the first sub-address;
judging whether a second sub-address with the level larger than the first sub-address by one level exists in the address text to be processed;
if so, taking the second sub-address as the first sub-address, and returning to execute the step of judging whether the first sub-address is in a preset standard address library or not until the second sub-address with the level higher than the first sub-address does not exist in the address text to be processed.
7. The method of claim 6, wherein calculating the similarity between the first sub-address and at least one correct address in the standard address base that is consistent with the first sub-address level comprises:
calculating the font similarity and the pinyin similarity between the first sub-address and the correct address;
and calculating the similarity between the first sub-address and the correct address according to the font similarity and the pinyin similarity.
8. The method of claim 7, wherein the calculating the font similarity and pinyin similarity between the first sub-address and the correct address comprises:
calculating the font similarity between the first sub-address and the correct address by at least one preset font similarity calculation method;
and calculating the pinyin similarity between the first sub-address and the correct address by at least one preset pinyin similarity calculation method.
9. The method of claim 8, wherein the calculating the similarity between the first sub-address and the correct address according to the font similarity and the pinyin similarity comprises:
setting different weights for the font similarity calculated by each font similarity calculation method;
setting different weights for the pinyin similarity calculated by the pinyin similarity calculation method;
and calculating the similarity between the first sub-address and the correct address by a weighted average method according to the font similarity and the pinyin similarity.
10. The method according to claim 6, wherein after determining, for each of the correct addresses, whether the similarity between the correct address and the first sub-address is greater than a preset threshold, the method further comprises:
and if not, sending a service handling request to the user, wherein the service handling request comprises the first sub-address and the address text to be processed, so that the user can manually process the first sub-address according to the service handling request.
11. The method according to claim 6, wherein the sequentially determining the first sub-address with the lowest level in the address text to be processed according to a preset level order comprises:
detecting whether the address text to be processed comprises a sub-address corresponding to a first level with the lowest level in the level sequence or not according to a preset level sequence;
if so, taking the sub-address corresponding to the first level as the first sub-address;
if not, taking a second level with the level greater than the first level as the current first level, marking the first level as vacant, returning to execute the step of detecting whether the address text to be processed comprises the sub-address corresponding to the first level with the lowest level in the level sequence according to a preset level sequence until the address text to be processed comprises the sub-address corresponding to the first level;
after the step of, if so, taking the second sub-address as the first sub-address and returning to the step of judging whether the first sub-address is in the preset standard address library, until no second sub-address one level higher than the first sub-address exists in the address text to be processed, the method further comprises:
determining all levels marked as vacant in the address text to be processed;
and supplementing all levels marked as vacant in the address text to be processed according to the standard address library.
12. The method of claim 6, wherein after determining whether the first sub-address is in a preset standard address bank, the method further comprises:
if yes, judging whether a second sub-address with the level larger than the first sub-address by one level exists in the address text to be processed;
if so, taking the second sub-address as the first sub-address, and returning to execute the step of judging whether the first sub-address is in a preset standard address library or not until the second sub-address with the level higher than the first sub-address does not exist in the address text to be processed.
13. An address normalization processing apparatus, comprising:
the first receiving module is used for receiving the address text to be processed;
the first labeling module is used for labeling the level of each sub-address in the address text to be processed through a preset neural network model to obtain a labeled address text to be processed;
and the processing module is used for processing the subaddresses in the labeled address text to be processed according to a preset standard address library to obtain a standard address corresponding to the address text to be processed.
14. The apparatus of claim 13, further comprising:
and the training module is used for training a preset model to be trained through the text to be trained after the sub-addresses are labeled, so as to obtain the preset neural network model.
15. The apparatus of claim 14, further comprising:
the second receiving module is used for receiving the text to be trained;
the removing module is used for removing useless punctuation marks in the text to be trained;
the segmentation module is used for segmenting the text to be trained without the useless punctuations to obtain each sub-address corresponding to the text to be trained;
and the second labeling module is used for labeling the level of each sub-address in the text to be trained.
16. The apparatus of claim 15, wherein the removal module comprises:
and the removing unit is used for removing useless punctuation marks in the text to be trained by a regular expression matching method.
17. The apparatus of claim 15, further comprising:
the coding module is used for coding each sub-address according to a preset coding mode;
the vector conversion module is used for converting each sub-address and the code corresponding to each sub-address into a text vector and a code vector through a preset vector conversion model, and storing the text vector and the code vector in a correlation manner;
the association module is used for establishing, for the text vector and the code vector corresponding to each sub-address and through a preset association relationship establishing model, the association relationship between the text vectors and the code vectors of adjacent sub-addresses;
the second labeling module comprises:
and the marking unit is used for marking the level of each subaddress after the association relationship is established according to the preset subaddress level.
18. The apparatus of claim 13, wherein the processing module comprises:
the first sub-address determining unit is used for sequentially determining a first sub-address with the lowest level in the address text to be processed according to a preset level sequence;
the first judging unit is used for judging whether the first sub-address is in a preset standard address library or not;
a similarity calculation unit, configured to, if not, calculate the similarity between the first sub-address and at least one correct address in the standard address library whose level is consistent with that of the first sub-address;
a second determining unit, configured to determine, for each correct address, whether the similarity between the correct address and the first sub-address is greater than a preset first threshold;
a standard address determining unit, configured to, if so, determine the correct address with the highest similarity and take it as the standard address corresponding to the first sub-address;
a third judging unit, configured to judge whether a second sub-address one level higher than the first sub-address exists in the address text to be processed;
and a first circulation unit, configured to, if so, take the second sub-address as the first sub-address and return to the step of judging whether the first sub-address is in the preset standard address library, until no second sub-address one level higher than the first sub-address exists in the address text to be processed.
19. The apparatus according to claim 18, wherein the similarity calculation unit is specifically configured to:
calculating the font similarity and the pinyin similarity between the first sub-address and the correct address;
and calculating the similarity between the first sub-address and the correct address according to the font similarity and the pinyin similarity.
20. The apparatus according to claim 19, wherein the similarity calculation unit is specifically configured to: calculating the font similarity between the first sub-address and the correct address by at least one preset font similarity calculation method;
and calculating the pinyin similarity between the first sub-address and the correct address by at least one preset pinyin similarity calculation method.
21. The apparatus according to claim 20, wherein the similarity calculation unit is specifically configured to:
setting different weights for the font similarity calculated by each font similarity calculation method;
setting different weights for the pinyin similarity calculated by the pinyin similarity calculation method;
and calculating the similarity between the first sub-address and the correct address by a weighted average method according to the font similarity and the pinyin similarity.
22. The apparatus of claim 18, wherein the processing module is further configured to:
if not, send a service handling request to the user, wherein the service handling request comprises the first sub-address and the address text to be processed, so that the user can manually process the first sub-address according to the service handling request.
23. The apparatus of claim 13, wherein the first sub-address determination unit is specifically configured to:
detecting, according to a preset level sequence, whether the address text to be processed comprises a sub-address corresponding to a first level, the first level being the lowest level in the level sequence;
if so, taking the sub-address corresponding to the first level as the first sub-address;
if not, marking the first level as vacant, taking a second level higher than the first level as the current first level, and returning to the step of detecting whether the address text to be processed comprises a sub-address corresponding to the first level, until the address text to be processed comprises a sub-address corresponding to the first level;
the processing module further comprises:
a vacancy level determination unit, configured to determine all levels labeled as vacancies in the address text to be processed;
and the supplement unit is used for supplementing all levels marked as vacant in the address text to be processed according to the standard address library.
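The detection, vacancy-marking and supplement steps of claim 23 can be sketched as follows. This sketch makes several assumptions: the level sequence is taken to run from province (the lowest level) to street, the standard address library is modelled as a list of complete address records, and a vacant level is filled only when every matching record agrees on its value. None of these details is specified in the claim itself.

```python
# Assumed level sequence, ordered from the lowest level upward.
LEVEL_SEQUENCE = ["province", "city", "district", "street"]

# Hypothetical standard address library: each record is one full standard address.
STANDARD_LIBRARY = [
    {"province": "广东省", "city": "深圳市", "district": "南山区", "street": "科技园路"},
    {"province": "广东省", "city": "广州市", "district": "天河区", "street": "体育西路"},
]


def find_first_sub_address(labeled):
    """Walk the level sequence, marking missing levels vacant until one is present."""
    vacant = []
    for level in LEVEL_SEQUENCE:
        if labeled.get(level):
            return level, vacant
        vacant.append(level)
    return None, vacant


def supplement_vacancies(labeled, vacant):
    """Fill each vacant level when all library records matching the present levels agree."""
    present = {k: v for k, v in labeled.items() if v}
    matches = [e for e in STANDARD_LIBRARY
               if all(e.get(k) == v for k, v in present.items())]
    filled = dict(labeled)
    for level in vacant:
        candidates = {e[level] for e in matches}
        if len(candidates) == 1:
            filled[level] = candidates.pop()
    return filled


labeled = {"district": "南山区", "street": "科技园路"}
first_level, vacant = find_first_sub_address(labeled)
completed = supplement_vacancies(labeled, vacant)
```

With this input the district is the first level found, the province and city are marked vacant, and both are then filled from the single matching library record.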
24. The apparatus of claim 18, wherein the processing module further comprises:
a fifth judging unit, configured to, if so, judge whether a second sub-address whose level is higher than that of the first sub-address exists in the address text to be processed;
and if so, taking the second sub-address as the first sub-address, and returning to the step of judging whether the first sub-address is in the preset standard address library, until no second sub-address with a level higher than that of the first sub-address exists in the address text to be processed.
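The level-by-level loop in this claim can be sketched as follows. It is a simplified illustration under assumptions: the level names, the per-level layout of the standard address library and the sample data are hypothetical, and a sub-address that is not found is merely recorded, whereas the claims route it to the similarity-based correction or to manual handling.

```python
# Assumed level sequence, ordered from the lowest level upward.
LEVEL_SEQUENCE = ["province", "city", "district", "street"]

# Hypothetical standard address library, keyed by level.
STANDARD_LIBRARY = {
    "province": {"广东省"},
    "city": {"深圳市", "广州市"},
    "district": {"南山区", "天河区"},
    "street": {"科技园路"},
}


def check_levels(labeled):
    """Visit each present sub-address in level order; return the levels not in the library."""
    not_found = []
    # Each iteration treats the next higher-level sub-address as the current
    # "first sub-address"; the loop ends when no higher-level one remains.
    for level in LEVEL_SEQUENCE:
        sub_address = labeled.get(level)
        if not sub_address:
            continue  # no sub-address at this level in the address text
        if sub_address not in STANDARD_LIBRARY[level]:
            not_found.append(level)  # placeholder for the correction branch
    return not_found


# "深训市" is a deliberately misspelled city used to trigger the not-found branch.
result = check_levels({"province": "广东省", "city": "深训市", "street": "科技园路"})
```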
25. An address normalization processing apparatus, comprising: a memory and a processor;
the memory being configured to store instructions executable by the processor;
wherein the processor is configured to perform the address normalization processing method of any one of claims 1-12.
26. A computer-readable storage medium having computer-executable instructions stored therein, which when executed by a processor, are configured to implement the address normalization processing method according to any one of claims 1 to 12.
CN201810965153.8A 2018-08-23 2018-08-23 Address standardization processing method, device, equipment and computer readable storage medium Active CN110895651B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810965153.8A CN110895651B (en) 2018-08-23 2018-08-23 Address standardization processing method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810965153.8A CN110895651B (en) 2018-08-23 2018-08-23 Address standardization processing method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110895651A (en) 2020-03-20
CN110895651B (en) 2024-02-02

Family

ID=69784769

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810965153.8A Active CN110895651B (en) 2018-08-23 2018-08-23 Address standardization processing method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110895651B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111639493A (en) * 2020-05-22 2020-09-08 上海微盟企业发展有限公司 Address information standardization method, device, equipment and readable storage medium
CN111881680A (en) * 2020-08-04 2020-11-03 医渡云(北京)技术有限公司 Text standardization processing method and device, electronic equipment and computer medium
CN112231429A (en) * 2020-11-09 2021-01-15 山东健康医疗大数据有限公司 Address matching method based on machine learning classification algorithm
CN112488200A (en) * 2020-11-30 2021-03-12 上海寻梦信息技术有限公司 Logistics address feature extraction method, system, equipment and storage medium
CN112632213A (en) * 2020-12-03 2021-04-09 大箴(杭州)科技有限公司 Address information standardization method and device, electronic equipment and storage medium
CN113190596A (en) * 2021-04-22 2021-07-30 华中科技大学 Method and device for mixing and matching place name and address
CN113505190A (en) * 2021-09-10 2021-10-15 南方电网数字电网研究院有限公司 Address information correction method, device, computer equipment and storage medium
CN113589993A (en) * 2021-07-16 2021-11-02 青岛海尔科技有限公司 Receiving address generation method and device, electronic equipment and storage medium
CN113642313A (en) * 2021-09-02 2021-11-12 阿里巴巴达摩院(杭州)科技有限公司 Address text processing method, device, equipment, storage medium and program product
CN114048797A (en) * 2021-10-20 2022-02-15 盐城金堤科技有限公司 Method, device, medium and electronic equipment for determining address similarity
CN118296405A (en) * 2024-06-05 2024-07-05 深圳航天智慧城市系统技术研究院有限公司 Address similarity calculation method, device, equipment and storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5459717A (en) * 1994-03-25 1995-10-17 Sprint International Communications Corporation Method and apparatus for routing messagers in an electronic messaging system
US6985926B1 (en) * 2001-08-29 2006-01-10 I-Behavior, Inc. Method and system for matching and consolidating addresses in a database
CN101685457A (en) * 2008-09-22 2010-03-31 广州华族文化发展有限公司 Numerical address coding method
CN101719128A (en) * 2009-12-31 2010-06-02 浙江工业大学 Fuzzy matching-based Chinese geo-code determination method
CN102169498A (en) * 2011-04-14 2011-08-31 中国测绘科学研究院 Address model constructing method and address matching method and system
CN102955833A (en) * 2011-08-31 2013-03-06 深圳市华傲数据技术有限公司 Correspondence address identifying and standardizing method
CN104933024A (en) * 2015-05-12 2015-09-23 深圳市华傲数据技术有限公司 Chinese address word segmentation and annotation method
WO2016165538A1 (en) * 2015-04-13 2016-10-20 阿里巴巴集团控股有限公司 Address data management method and device
CN106055650A (en) * 2016-05-31 2016-10-26 深圳市永兴元科技有限公司 Address standardization method and device
CN106161372A (en) * 2015-04-09 2016-11-23 阿里巴巴集团控股有限公司 A kind of Risk Identification Method based on address coupling and device
CN106777300A (en) * 2016-12-30 2017-05-31 深圳市华傲数据技术有限公司 Base address base construction method and system
CN107145577A (en) * 2017-05-08 2017-09-08 上海东方网络金融服务有限公司 Address standardization method, device, storage medium and computer
KR20180057853A (en) * 2016-11-23 2018-05-31 잠쉬딘 허지무하메도브 Method, system and computer program for converting addresses

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5459717A (en) * 1994-03-25 1995-10-17 Sprint International Communications Corporation Method and apparatus for routing messagers in an electronic messaging system
US6985926B1 (en) * 2001-08-29 2006-01-10 I-Behavior, Inc. Method and system for matching and consolidating addresses in a database
CN101685457A (en) * 2008-09-22 2010-03-31 广州华族文化发展有限公司 Numerical address coding method
CN101719128A (en) * 2009-12-31 2010-06-02 浙江工业大学 Fuzzy matching-based Chinese geo-code determination method
CN102169498A (en) * 2011-04-14 2011-08-31 中国测绘科学研究院 Address model constructing method and address matching method and system
CN102955833A (en) * 2011-08-31 2013-03-06 深圳市华傲数据技术有限公司 Correspondence address identifying and standardizing method
CN106161372A (en) * 2015-04-09 2016-11-23 阿里巴巴集团控股有限公司 A kind of Risk Identification Method based on address coupling and device
US20180024943A1 (en) * 2015-04-09 2018-01-25 Alibaba Group Holding Limited Risk identification based on address matching
WO2016165538A1 (en) * 2015-04-13 2016-10-20 阿里巴巴集团控股有限公司 Address data management method and device
CN104933024A (en) * 2015-05-12 2015-09-23 深圳市华傲数据技术有限公司 Chinese address word segmentation and annotation method
CN106055650A (en) * 2016-05-31 2016-10-26 深圳市永兴元科技有限公司 Address standardization method and device
KR20180057853A (en) * 2016-11-23 2018-05-31 잠쉬딘 허지무하메도브 Method, system and computer program for converting addresses
CN106777300A (en) * 2016-12-30 2017-05-31 深圳市华傲数据技术有限公司 Base address base construction method and system
CN107145577A (en) * 2017-05-08 2017-09-08 上海东方网络金融服务有限公司 Address standardization method, device, storage medium and computer

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DW GOLDBERG ET AL: "Address Standardization", 《GIS RESEARCH LABORATORY》 *
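Guo Wenlong; Zhuo Lin: "A Chinese address cleaning method based on coding rules", Journal of Minjiang University, no. 05 *
Ma Zhaoting; Li Zhigang; Sun Wei; Yin Jie: "An automatic geocoding algorithm based on address word segmentation", Bulletin of Surveying and Mapping, no. 02 *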

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111639493A (en) * 2020-05-22 2020-09-08 上海微盟企业发展有限公司 Address information standardization method, device, equipment and readable storage medium
CN111881680A (en) * 2020-08-04 2020-11-03 医渡云(北京)技术有限公司 Text standardization processing method and device, electronic equipment and computer medium
CN112231429A (en) * 2020-11-09 2021-01-15 山东健康医疗大数据有限公司 Address matching method based on machine learning classification algorithm
CN112488200A (en) * 2020-11-30 2021-03-12 上海寻梦信息技术有限公司 Logistics address feature extraction method, system, equipment and storage medium
CN112632213A (en) * 2020-12-03 2021-04-09 大箴(杭州)科技有限公司 Address information standardization method and device, electronic equipment and storage medium
CN113190596A (en) * 2021-04-22 2021-07-30 华中科技大学 Method and device for mixing and matching place name and address
CN113589993A (en) * 2021-07-16 2021-11-02 青岛海尔科技有限公司 Receiving address generation method and device, electronic equipment and storage medium
CN113642313A (en) * 2021-09-02 2021-11-12 阿里巴巴达摩院(杭州)科技有限公司 Address text processing method, device, equipment, storage medium and program product
CN113642313B (en) * 2021-09-02 2024-03-29 阿里巴巴达摩院(杭州)科技有限公司 Address text processing method, device, equipment, storage medium and program product
CN113505190A (en) * 2021-09-10 2021-10-15 南方电网数字电网研究院有限公司 Address information correction method, device, computer equipment and storage medium
CN114048797A (en) * 2021-10-20 2022-02-15 盐城金堤科技有限公司 Method, device, medium and electronic equipment for determining address similarity
CN118296405A (en) * 2024-06-05 2024-07-05 深圳航天智慧城市系统技术研究院有限公司 Address similarity calculation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN110895651B (en) 2024-02-02

Similar Documents

Publication Publication Date Title
CN110895651A (en) Address standardization processing method, device, equipment and computer readable storage medium
CN110674629B (en) Punctuation mark labeling model, training method, training equipment and storage medium thereof
CN111046946B (en) Burma language image text recognition method based on CRNN
CN110795919B (en) Form extraction method, device, equipment and medium in PDF document
CN106649612B (en) Method and device for automatically matching question and answer templates
CN113901797B (en) Text error correction method, device, equipment and storage medium
CN110222330B (en) Semantic recognition method and device, storage medium and computer equipment
CN111858843B (en) Text classification method and device
CN110555206A (en) named entity identification method, device, equipment and storage medium
CN106326233B (en) address prompting method and device
CN113420546A (en) Text error correction method and device, electronic equipment and readable storage medium
CN111382572A (en) Named entity identification method, device, equipment and medium
CN115862040A (en) Text error correction method and device, computer equipment and readable storage medium
KR101143650B1 (en) An apparatus for preparing a display document for analysis
CN111368066A (en) Method, device and computer readable storage medium for acquiring dialogue abstract
CN110705217B (en) Wrongly written or mispronounced word detection method and device, computer storage medium and electronic equipment
CN115438650A (en) Contract text error correction method, system, equipment and medium fusing multi-source characteristics
CN107783958B (en) Target statement identification method and device
CN111737982B (en) Chinese text mispronounced character detection method based on deep learning
CN103310209A (en) Method and device for identification of character string in image
CN110929514B (en) Text collation method, text collation apparatus, computer-readable storage medium, and electronic device
CN115984876A (en) Text recognition method and device, electronic equipment, vehicle and storage medium
CN115718889A (en) Industry classification method and device for company profile
CN115147847A (en) Text recognition result determining method and device, storage medium and computer equipment
CN115577688B (en) Table structuring processing method, device, storage medium and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: Jingdong Digital Technology Holding Co.,Ltd.

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Digital Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.

Address before: Room 221, floor 2, block C, No. 18, Kechuang 11th Street, Beijing Economic and Technological Development Zone, 100175

Applicant before: BEIJING JINGDONG FINANCIAL TECHNOLOGY HOLDING Co.,Ltd.

GR01 Patent grant