CN112818667A - Address correction method, system, device and storage medium - Google Patents

Info

Publication number
CN112818667A
Authority
CN
China
Prior art keywords
correction
address
address data
corrected
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110127696.4A
Other languages
Chinese (zh)
Other versions
CN112818667B (en)
Inventor
周筠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xunmeng Information Technology Co Ltd
Original Assignee
Shanghai Xunmeng Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xunmeng Information Technology Co Ltd filed Critical Shanghai Xunmeng Information Technology Co Ltd
Priority to CN202110127696.4A priority Critical patent/CN112818667B/en
Priority claimed from CN202110127696.4A external-priority patent/CN112818667B/en
Publication of CN112818667A publication Critical patent/CN112818667A/en
Application granted granted Critical
Publication of CN112818667B publication Critical patent/CN112818667B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/387 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping
    • G06Q10/0838 Historical data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Human Resources & Organizations (AREA)
  • Library & Information Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Remote Sensing (AREA)
  • Mathematical Physics (AREA)
  • Character Discrimination (AREA)

Abstract

The invention provides an address correction method, system, device and storage medium. The method comprises the following steps: acquiring address data before correction; inputting the address data before correction into a trained hidden Markov model; and determining corrected address data according to the output data of the hidden Markov model. To address errors in receiving addresses during logistics circulation, the hidden Markov model is used to identify and correct erroneous address content, thereby improving the accuracy and efficiency of address correction.

Description

Address correction method, system, device and storage medium
Technical Field
The invention relates to the technical field of logistics data processing, and in particular to an address correction method, system, device and storage medium.
Background
In an e-commerce scenario, after a user places an order, the circulation links that the parcel must pass through from the delivery place to the receiving place are inferred from the receiving address filled in by the user, and express circulation proceeds accordingly. If the receiving address is filled in incorrectly, the whole logistics circulation is disrupted at the source, so ensuring the accuracy of the receiving address is crucial to logistics circulation.
Disclosure of Invention
In view of the problems in the prior art, the invention aims to provide an address correction method, system, device and storage medium that can effectively identify and correct erroneous content in receiving addresses and improve the accuracy and efficiency of address correction.
The embodiment of the invention provides an address correction method, which comprises the following steps:
acquiring address data before correction;
inputting the address data before correction into a trained hidden Markov model;
and determining corrected address data according to the output data of the hidden Markov model.
In some embodiments, in the hidden Markov model, the address data before correction corresponds to an observed (explicit) state and the corrected address data corresponds to a hidden state.
In some embodiments, the address data is address text, and inputting the address data before correction into the trained hidden Markov model includes the following steps:
extracting an address feature sequence before correction based on the address data before correction according to a preset text feature mapping rule;
and inputting the address feature sequence before correction into the trained hidden Markov model.
In some embodiments, determining corrected address data according to the output data of the hidden Markov model comprises:
acquiring a corrected address feature sequence output by the hidden Markov model;
and obtaining corrected address data based on the corrected address feature sequence according to the text feature mapping rule.
In some embodiments, extracting the address feature sequence before correction based on the address data before correction comprises: performing word segmentation on the address data before correction, mapping each segmented word to a word vector according to the text feature mapping rule, and combining the word vectors to obtain the address feature sequence before correction;
obtaining corrected address data based on the corrected address feature sequence comprises: splitting the corrected address feature sequence to obtain each word vector, mapping each split word vector to a word according to the text feature mapping rule, and combining the mapped words to obtain the corrected address data.
In some embodiments, after determining the corrected address data according to the output data of the hidden Markov model, the method further comprises the following steps:
comparing the corrected address data with the address data before correction to determine correction information;
judging whether the correction information is allowable correction or not based on a preset correction check rule;
if not, rejecting the correction information, and restoring the text of the corrected address data corresponding to the correction information to the text before correction.
In some embodiments, the correction information includes a correction location and correction content, the correction content including pre-correction text and post-correction text at the correction location;
the preset correction checking rule comprises that the text before correction at the correction position and the text after correction accord with a preset correction content mapping relation.
In some embodiments, the preset correction content mapping relationship includes that the text before correction and the text after correction are homophones or homographs; or
the preset correction content mapping relationship includes that the text before correction and the text after correction are harmonic characters or similar-shaped characters.
In some embodiments, the correction information includes a correction position and a correction content, and the preset correction checking rule includes a preset position filtering rule;
judging whether the correction information is allowable correction or not based on a preset correction check rule, including:
and judging whether the correction position is a position allowing correction or not based on a preset position filtering rule.
In some embodiments, after determining the corrected address data according to the output data of the hidden Markov model, the method further comprises the following steps:
comparing the corrected address data with the address data before correction to determine a correction position;
extracting a corrected address fragment corresponding to the correction position from the corrected address data, and classifying the corrected address fragment;
searching the corresponding classified lexicon based on the corrected address fragment, and judging whether the corrected address fragment is a word existing in the classified lexicon;
and if no word corresponding to the corrected address fragment exists in the classified lexicon, rejecting the correction corresponding to the correction position.
In some embodiments, if no word corresponding to the corrected address fragment exists in the classified lexicon, the method further includes the following steps:
extracting an address fragment before correction corresponding to the correction position from the address data before correction, and judging whether the address fragment before correction is a word existing in the classified lexicon;
and if a word corresponding to the address fragment before correction exists in the classified lexicon, rejecting the correction corresponding to the correction position, and restoring the text of the corrected address data at the correction position to the text before correction.
In some embodiments, the method further comprises training the hidden Markov model using the following steps:
acquiring first sample address data and corresponding second sample address data as training samples;
and training the hidden Markov model with the training samples to obtain model parameters of the hidden Markov model, wherein in the hidden Markov model, the first sample address data corresponds to a hidden state and the second sample address data corresponds to an observed (explicit) state.
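As a rough illustration of how such training samples could yield model parameters, the sketch below estimates initial state probabilities, hidden state transition probabilities and confusion (emission) probabilities by counting over aligned sample pairs. The text does not specify the training procedure; maximum-likelihood counting, the function name `estimate_hmm` and the romanized tokens are assumptions made only for this example.

```python
from collections import Counter, defaultdict


def normalize(counter):
    """Turn raw counts into probabilities that sum to 1."""
    total = sum(counter.values())
    return {key: value / total for key, value in counter.items()}


def estimate_hmm(pairs):
    """Estimate HMM parameters from (hidden, observed) token-sequence pairs.

    hidden   = first sample address data (correct address)
    observed = second sample address data (address as typed, possibly wrong)
    Assumes the two sequences of each pair are aligned token by token.
    """
    start = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for hidden, observed in pairs:
        if len(hidden) != len(observed):
            raise ValueError("hidden and observed sequences must be aligned")
        start[hidden[0]] += 1
        for h, o in zip(hidden, observed):
            emit[h][o] += 1
        for prev, nxt in zip(hidden, hidden[1:]):
            trans[prev][nxt] += 1
    return (
        normalize(start),
        {state: normalize(c) for state, c in trans.items()},
        {state: normalize(c) for state, c in emit.items()},
    )


# Hypothetical romanized tokens standing in for address characters.
correct = ["Lou", "shan", "guan", "lu"]
typed_wrong = ["Lou", "Shaan", "guan", "lu"]
start_p, trans_p, emit_p = estimate_hmm([(correct, typed_wrong), (correct, correct)])
```

With these two pairs, the hidden token `shan` is observed half the time as `Shaan`, which is the kind of confusion probability the decoder later relies on.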
In some embodiments, acquiring the first sample address data and the corresponding second sample address data comprises:
collecting correct address data as the first sample address data;
replacing at least part of the first sample address data based on a preset error correction type to obtain extended address data;
and taking the extended address data as the second sample address data.
In some embodiments, replacing at least part of the first sample address data based on a preset error correction type comprises replacing at least part of the characters in the first sample address data with corresponding homophones or homographs; or
replacing at least part of the characters in the first sample address data with corresponding harmonic characters or similar-shaped characters.
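The expansion of correct addresses into erroneous training samples can be sketched as follows. This is a minimal example under stated assumptions: the confusion table, the function name `expand_with_errors` and the romanized tokens are hypothetical, and only single-character substitutions are generated.

```python
def expand_with_errors(correct_tokens, confusions):
    """Return erroneous variants of a correct address.

    Each variant replaces exactly one token with one of its confusable
    counterparts (for example a homophone), as listed in `confusions`.
    """
    variants = []
    for position, token in enumerate(correct_tokens):
        for wrong in confusions.get(token, ()):
            variant = list(correct_tokens)
            variant[position] = wrong
            variants.append(variant)
    return variants


# Hypothetical confusion table: "shan" may be mistyped as its homophone "Shaan".
confusions = {"shan": ["Shaan"]}
first_sample = ["Lou", "shan", "guan", "lu"]
second_samples = expand_with_errors(first_sample, confusions)
```

Each generated variant can then be paired with `first_sample` as one (hidden, observed) training pair.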
In some embodiments, acquiring the first sample address data and the corresponding second sample address data comprises:
collecting erroneous address data as the second sample address data;
correcting the erroneous content in the second sample address data to obtain corresponding correct address data;
and taking the correct address data as the first sample address data.
An embodiment of the present invention further provides an address correction system for implementing the address correction method, the system including:
the data acquisition module is used for acquiring address data before correction;
the model input module is used for inputting the address data before correction into the trained hidden Markov model;
and the correction output module is used for determining corrected address data according to the output data of the hidden Markov model.
An embodiment of the present invention further provides an address correction device, including:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the address correction method via execution of the executable instructions.
An embodiment of the present invention further provides a computer-readable storage medium for storing a program, wherein the steps of the address correction method are implemented when the program is executed.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
The address correction method, system, device and storage medium of the invention have the following beneficial effects:
to address errors in receiving addresses during logistics circulation, the hidden Markov model is used to identify and correct erroneous address content, thereby improving the accuracy and efficiency of address correction.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, with reference to the accompanying drawings.
FIG. 1 is a flow diagram of an address correction method according to an embodiment of the invention;
FIG. 2 is a flow diagram of correction filtering based on correction checking rules, according to an embodiment of the present invention;
FIG. 3 is a flow chart of correction filtering based on a classified lexicon according to an embodiment of the present invention;
FIG. 4 is a flow diagram of training a hidden Markov model according to one embodiment of the invention;
FIG. 5 is a block diagram of an address correction system according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an address correction apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
As shown in FIG. 1, an embodiment of the present invention provides an address correction method, including the following steps:
S100: acquiring address data before correction;
S200: inputting the address data before correction into a trained hidden Markov model;
S300: determining corrected address data according to the output data of the hidden Markov model.
To address errors in receiving addresses during logistics circulation, the invention provides an effective address correction method: the address data before correction is acquired in step S100, the hidden Markov model is used in step S200 to identify and correct erroneous address content, and the corrected address data is obtained in step S300, thereby improving the accuracy and efficiency of address correction.
The address correction method can be applied to a server of a logistics platform, can acquire the address data of a user from an e-commerce platform or other channels, and after error correction is carried out by adopting the address correction method, the corrected address data is used for subsequent logistics express circulation. But the invention is not limited thereto. In another alternative embodiment, the address correction method may also be applied to a server of an e-commerce platform, and after the address data input by the user is acquired, the address data is corrected and provided to the logistics party. In yet another alternative embodiment, the address correction method may also be applied to a separate server. In other alternative embodiments, the address correction method may also be applied to other types of devices, such as notebooks, desktop computers, user terminals, and the like.
A hidden Markov model (HMM) is a statistical model used to describe a Markov process with hidden, unknown parameters. In this embodiment, in the hidden Markov model, the address data before correction corresponds to the observed (explicit) state, and the corrected address data corresponds to the hidden state. In the trained hidden Markov model adopted by the invention, the model parameters are determined during training and mainly comprise the initial state probabilities, the hidden state transition matrix and the confusion matrix (the probability of each observed state given each hidden state); the input data is the observed state sequence, and the output data is the hidden state sequence. That is, the hidden Markov model is used to solve for the hidden state sequence given the observed state sequence and the model parameters. For example, one piece of address data before correction is "Lou/Shaan/guan/lu/Jin/hong/qiao/guo/ji/zhong/xin" (character by character, in romanized form). After it is input to the hidden Markov model as the observed state sequence, the corresponding hidden state sequence "Lou/shan/guan/lu/Jin/hong/qiao/guo/ji/zhong/xin" is output as the corrected address data: the hidden Markov model replaces the erroneous character "Shaan" with "shan" ("mountain"), thereby correcting the address data.
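Solving for the hidden state sequence from an observed sequence and known parameters is commonly done with Viterbi decoding. The sketch below assumes that algorithm (the text does not name one) and uses toy probabilities and hypothetical romanized tokens in place of real address characters.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely hidden-state sequence for `observations`.

    Probabilities missing from the tables are treated as `floor` so that
    log() is always defined.
    """
    if not observations:
        return []

    def lp(p):
        return math.log(max(p, floor))

    # best[t][s] = (log-probability of the best path ending in s, previous state)
    best = [{
        s: (lp(start_p.get(s, 0.0)) + lp(emit_p.get(s, {}).get(observations[0], 0.0)), None)
        for s in states
    }]
    for obs in observations[1:]:
        column = {}
        for s in states:
            score, prev = max(
                (best[-1][p][0] + lp(trans_p.get(p, {}).get(s, 0.0)), p)
                for p in states
            )
            column[s] = (score + lp(emit_p.get(s, {}).get(obs, 0.0)), prev)
        best.append(column)

    # Backtrack from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return list(reversed(path))


# Toy parameters (assumed): hidden states are the correct characters.
states = ["Lou", "shan", "guan", "lu"]
start_p = {"Lou": 1.0}
trans_p = {"Lou": {"shan": 1.0}, "shan": {"guan": 1.0}, "guan": {"lu": 1.0}}
emit_p = {
    "Lou": {"Lou": 1.0},
    "shan": {"shan": 0.9, "Shaan": 0.1},  # "shan" is sometimes typed as "Shaan"
    "guan": {"guan": 1.0},
    "lu": {"lu": 1.0},
}
corrected = viterbi(["Lou", "Shaan", "guan", "lu"], states, start_p, trans_p, emit_p)
```

Log-probabilities are used so that long addresses do not underflow; the floor value is an arbitrary smoothing choice for the sketch.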
In this embodiment, the address data is address text. In step S100, acquiring the address data before correction may include acquiring a receiving address text input by a user on the e-commerce platform, acquiring a receiving address text input by a user on a logistics-related platform, or acquiring a receiving address text from a designated database. The address text generally comprises administrative division information at each level and detailed address information. The detailed address information is generally input manually or by voice, so homophone or homograph errors may occur. For example, "Loushanguan Road Jinhongqiao International Center" may be wrongly input as "Lou-Shaan-guan Road Jinhongqiao International Center", that is, the character "shan" ("mountain") is wrongly input as its homophone "Shaan". As another example, the character rendered as "good" in a detailed address may be wrongly input as its homograph, rendered as "crystal". Homophones here generally refer to two characters with the same initial and final, such as the pairs rendered as "blue" and "blue", or "neutral" and "clock". Homographs generally refer to characters with the same shape, such as the pairs rendered as "Ben" and "Yao", or "thallium" and "Tourbo". Further, because of dialect, nonstandard pronunciation, sloppy handwriting, miswriting and the like, some characters may also be input as their harmonic characters or similar-shaped characters. Harmonic characters are characters with the same or similar pronunciation, such as "shore" and "ice", or "mountain" and "shang". Similar-shaped characters are characters with similar forms, such as "human" and "income", or "memory" and "era".
In this embodiment, the step S200: inputting the address data before correction into the trained hidden Markov model, comprising the following steps:
extracting an address feature sequence before correction based on the address data before correction according to a preset text feature mapping rule;
inputting the address feature sequence before correction into the trained hidden Markov model, wherein the input address feature sequence before correction corresponds to the observed state sequence of the hidden Markov model.
Further, extracting the address feature sequence before correction based on the address data before correction includes: performing word segmentation on the address data before correction, mapping each segmented word to a word vector according to the text feature mapping rule, and combining the word vectors to obtain the address feature sequence before correction. The text feature mapping rule may be a preset mapping from text to word vectors, for example a mapping from address text to one-hot encoded vectors. The mapping may be performed directly based on a preset mapping relationship between a lexicon and word vectors, or may be implemented with a trained word vector model such as a Word2vec model or a GloVe model, but the present invention is not limited thereto. After the address text is mapped to word vectors, the word vectors of all the words are combined to obtain the address feature sequence before correction.
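The segmentation and mapping step can be sketched as follows, assuming forward longest-match segmentation against a small lexicon and one-hot word vectors. The segmentation strategy, the `<unk>` token, the function names and the romanized lexicon are assumptions for illustration; the text leaves the segmentation method open.

```python
UNK = "<unk>"  # hypothetical token for words missing from the lexicon


def segment(text, lexicon):
    """Forward longest-match segmentation; unmatched characters become single tokens."""
    max_len = max(len(word) for word in lexicon)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in lexicon or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at `index`."""
    vector = [0] * size
    vector[index] = 1
    return vector


def encode(text, lexicon_list):
    """Map address text to a feature sequence (a list of one-hot word vectors)."""
    vocab = {word: i for i, word in enumerate(lexicon_list + [UNK])}
    tokens = segment(text, set(lexicon_list))
    return [one_hot(vocab.get(t, vocab[UNK]), len(vocab)) for t in tokens]


# Hypothetical romanized lexicon standing in for address words.
lexicon_list = ["lou", "shan", "shaan", "guan", "lu"]
tokens = segment("loushaanguanlu", set(lexicon_list))
features = encode("loushaanguanlu", lexicon_list)
```

One-hot vectors keep the mapping trivially invertible, which matters because the corrected feature sequence has to be mapped back to text afterwards.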
In this embodiment, the step S300: determining corrected address data from the output data of the hidden Markov model, comprising the steps of:
acquiring a corrected address characteristic sequence output by the hidden Markov model;
and obtaining corrected address data based on the corrected address feature sequence according to the text feature mapping rule.
Further, obtaining corrected address data based on the corrected address feature sequence includes: splitting the corrected address feature sequence to obtain each word vector, mapping each split word vector to a word according to the text feature mapping rule, and combining the mapped words to obtain the corrected address data. Here the text feature mapping rule is applied in reverse: each word vector in the corrected address feature sequence is mapped to its corresponding text, and the texts are then combined in the order of the word vectors in the corrected address feature sequence to obtain the corrected address data.
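The reverse mapping can be sketched in a few lines, assuming one-hot word vectors indexed against a known lexicon. The function name `decode` and the romanized lexicon are hypothetical.

```python
def decode(feature_sequence, lexicon_list, separator=""):
    """Map each one-hot word vector back to its word and join the words in order."""
    words = []
    for vector in feature_sequence:
        index = vector.index(1)  # position of the single 1 in the one-hot vector
        words.append(lexicon_list[index])
    return separator.join(words)


# Hypothetical lexicon and a corrected feature sequence output by the model.
lexicon_list = ["lou", "shan", "shaan", "guan", "lu"]
corrected_features = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]
corrected_text = decode(corrected_features, lexicon_list)
```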
As shown in FIG. 2, in this embodiment, after step S300 (determining the corrected address data according to the output data of the hidden Markov model), the method further comprises the following steps:
S410: comparing the corrected address data with the address data before correction to determine correction information; the correction information includes a correction position and correction content, and the correction content includes the text before correction and the text after correction at the correction position. In the previous example, "Lou-Shaan-guan Road Jinhongqiao International Center" is corrected to "Loushanguan Road Jinhongqiao International Center"; the correction position is the position of the character "Shaan", and the correction content is the replacement of "Shaan" with "shan" ("mountain");
S420: judging whether the correction information is an allowable correction based on a preset correction check rule;
S430: if not, rejecting the correction information and continuing to step S440: restoring the text of the corrected address data corresponding to the correction information to the text before correction;
S450: if so, retaining the correction information.
In an embodiment, the preset correction check rule includes a content check rule. Specifically, the content check rule requires that the text before correction and the text after correction at the correction position conform to a preset correction content mapping relationship. Further, the preset correction content mapping relationship includes that the text before correction and the text after correction are homophones, or that they are homographs. For example, if "Lou-Shaan-guan Road Jinhongqiao International Center" is corrected to "Loushanguan Road Jinhongqiao International Center", the correction content is the replacement of "Shaan" with "shan" ("mountain"); since "Shaan" and "shan" are homophones, the correction information conforms to the preset correction check rule and is retained, that is, the correction is allowed. If instead the character "shan" ("mountain") in "Loushanguan Road Jinhongqiao International Center" is corrected to the character meaning "rock", then, because "mountain" and "rock" are neither homophones nor homographs, the correction information does not conform to the preset correction check rule; the correction information is rejected, and the position of "rock" in the corrected address data is restored to "mountain".
Further, in an alternative embodiment, in order to expand the correction range, the preset correction content mapping relationship may also be set to include that the text before correction and the text after correction are harmonic characters or similar-shaped characters. In other alternative embodiments, the correction content mapping relationship may also be another mapping relationship, all of which fall within the protection scope of the present invention.
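The comparison and content check of steps S410 to S450 can be sketched as below. The sketch assumes the sequences before and after correction are aligned token by token and that the allowed mapping relationship is given as groups of mutually confusable tokens; the group table, function names and romanized tokens are hypothetical.

```python
def find_corrections(before, after):
    """S410: return (position, text_before, text_after) for every changed token."""
    if len(before) != len(after):
        raise ValueError("sequences must be aligned token by token")
    return [(i, b, a) for i, (b, a) in enumerate(zip(before, after)) if b != a]


def is_allowed(text_before, text_after, confusion_groups):
    """S420: a correction is allowed if both texts belong to the same group."""
    return any(text_before in g and text_after in g for g in confusion_groups)


def apply_content_check(before, after, confusion_groups):
    """S430 to S450: keep allowed corrections, restore the rest to the original text."""
    checked = list(after)
    for position, old, new in find_corrections(before, after):
        if not is_allowed(old, new, confusion_groups):
            checked[position] = old
    return checked


# Hypothetical homophone group: "shan" and "Shaan" sound alike.
groups = [{"shan", "Shaan"}]

kept = apply_content_check(
    ["Lou", "Shaan", "guan", "lu"], ["Lou", "shan", "guan", "lu"], groups
)
reverted = apply_content_check(
    ["Lou", "shan", "guan", "lu"], ["Lou", "rock", "guan", "lu"], groups
)
```

In the first call the homophone correction survives; in the second the "rock" substitution is rolled back because the pair is not in any allowed group.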
In another alternative embodiment, the predetermined correction checking rules include predetermined position filtering rules. The determining whether the correction information is an allowable correction based on a preset correction checking rule includes:
and judging whether the correction position is a position allowing correction or not based on a preset position filtering rule.
For example, it may be preset that the first two words in an address do not allow correction, and after the comparison determines the correction position, if the first word or the second word in the address is corrected, the correction is rejected. Specifically, the position allowing correction may be set as needed.
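A position filtering rule of this kind can be sketched as follows, assuming aligned token sequences and a set of protected positions (here the first two tokens). The function name and sample tokens are hypothetical.

```python
def apply_position_filter(before, after, protected_positions):
    """Reject any correction that falls on a protected position.

    Returns the corrected sequence with protected positions restored to the
    text before correction; corrections elsewhere are kept.
    """
    if len(before) != len(after):
        raise ValueError("sequences must be aligned token by token")
    return [
        b if (i in protected_positions and a != b) else a
        for i, (b, a) in enumerate(zip(before, after))
    ]


# The first two words (positions 0 and 1) do not allow correction.
protected = {0, 1}
before = ["Lou", "Shaan", "guang", "lu"]
after = ["Lou", "shan", "guan", "lu"]
filtered = apply_position_filter(before, after, protected)
```

Here the change at position 1 is rejected while the change at position 2 is kept.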
In this embodiment, the content check rule and the position filtering rule may be used individually, that is, only the content check rule or only the position filtering rule, or they may be used in combination. When they are used together to filter the correction information, the correction is allowed only if the content check rule and the position filtering rule are both satisfied.
In this embodiment, the corrected address data is further filtered by the correction check rule, which adds a verification step after correction. This further improves the accuracy of address correction and prevents an erroneous correction from affecting the subsequent use of the address data.
Correction based on the hidden Markov model is a means of correcting local information in the address data. After the address is corrected with the hidden Markov model, the correction can be further filtered based on a classified lexicon, further improving the correction accuracy.
As shown in FIG. 3, after step S300 (determining the corrected address data according to the output data of the hidden Markov model), the method may further include the following steps:
S510: comparing the corrected address data with the address data before correction to determine a correction position;
for example, when the character "shan" ("mountain") in "Loushanguan Road Jinhongqiao International Center" is corrected to the character meaning "rock", the correction position is the position corresponding to "mountain";
S520: extracting a corrected address fragment corresponding to the correction position from the corrected address data, and classifying the corrected address fragment;
for example, when "Lou-Shaan-guan Road Jinhongqiao International Center" is corrected to "Loushanguan Road Jinhongqiao International Center", the corrected address fragment is "Loushanguan Road". The corrected address fragment can be extracted from the corrected address data by searching for a keyword and splitting at it: for example, the keyword "Road" is searched for, and the part in front of the keyword is taken as the road name. Classification can also be based on the keyword, that is, once the keyword "Road" is found, the address fragment "Loushanguan Road" is classified as a road name;
classification here may mean classification according to the kinds of content contained in a typical address. For example, the categories may include "road names", "mall names", "office building names" and the like;
S530: searching the corresponding classified lexicon based on the corrected address fragment, and judging whether the corrected address fragment is a word existing in the classified lexicon;
for example, for the address fragment "Loushanguan Road", the classified lexicon corresponding to road names is searched; this classified lexicon stores the names of all currently known roads;
S540: if a word corresponding to the corrected address fragment exists in the classified lexicon, allowing the correction corresponding to the correction position. For example, when "Lou-Shaan-guan Road" is corrected to "Loushanguan Road" and the same road name is found in the classified lexicon, the correction is allowed.
As shown in Fig. 3, if no word corresponding to the corrected address segment exists in the classified lexicon, the method continues with the following steps:
S550: extracting the address segment before correction corresponding to the correction position from the address data before correction, and judging whether the address segment before correction is a word that exists in the classified lexicon;
S560: if a word corresponding to the address segment before correction exists in the classified lexicon, rejecting the correction at the correction position, and restoring the text of the corrected address data at the correction position to the text before correction;
for example, when "Loushanguan Road Jinhongqiao International Center" is corrected to a variant in which "Loushanguan" is replaced by a similar-sounding name, the address segment before correction is "Loushanguan" and the corrected address segment is the variant. The variant does not exist in the classified lexicon while "Loushanguan" does, so the correction is rejected and the corrected segment is restored to "Loushanguan";
S570: if neither a word corresponding to the corrected address segment nor a word corresponding to the address segment before correction exists in the classified lexicon, the correction at the correction position may be rejected directly, or such cases may be referred to staff for manual review.
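Steps S510 to S570 can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the lexicon contents, the keyword table, the Chinese example strings, and the names `correction_positions`, `segment_at`, and `check_correction` are all assumptions made for the example. It also assumes that the correction is a character substitution (both texts have the same length) and that the segment runs from the start of the text to the first keyword.

```python
# Hypothetical classified lexicon: category -> known names (illustrative data only).
CLASSIFIED_LEXICON = {"road": {"娄山关路", "南京西路"}}
# Hypothetical keyword -> category table used to cut and classify a segment.
KEYWORDS = {"路": "road"}


def correction_positions(before: str, after: str) -> list[int]:
    """S510: character positions at which two equal-length texts differ."""
    return [i for i, (b, a) in enumerate(zip(before, after)) if b != a]


def segment_at(text: str, pos: int):
    """S520: cut the segment ending at the first keyword at or after `pos`."""
    for end in range(pos, len(text)):
        if text[end] in KEYWORDS:
            return text[: end + 1], KEYWORDS[text[end]]
    return None, None  # no keyword found, so the segment cannot be classified


def check_correction(before: str, after: str) -> str:
    """Return 'allow', 'reject' or 'review' for the first correction position."""
    positions = correction_positions(before, after)
    if not positions:
        return "allow"  # nothing was changed
    new_segment, category = segment_at(after, positions[0])
    old_segment, _ = segment_at(before, positions[0])
    words = CLASSIFIED_LEXICON.get(category, set())
    if new_segment in words:  # S530/S540: corrected segment is a known word
        return "allow"
    if old_segment in words:  # S550/S560: only the original segment is known
        return "reject"
    return "review"  # S570: neither is known (reject or hand to manual review)
```

A caller that receives "reject" would restore the pre-correction text at that position; "review" corresponds to the two options left open in S570.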
As shown in Fig. 4, in this embodiment, the address correction method further includes training the hidden Markov model by the following steps:
S610: collecting first sample address data and corresponding second sample address data as training samples;
S620: training the hidden Markov model with the training samples to obtain the model parameters of the hidden Markov model, wherein in the hidden Markov model, the first sample address data corresponds to the hidden state and the second sample address data corresponds to the explicit state. The model parameters of the hidden Markov model mainly comprise the initial state probabilities, the transition matrix of the hidden states, and the confusion matrix. In the training process, what is mainly learned is the transition matrix from explicit states to hidden states.
That is, the first sample address data is correct address data, and the second sample address data is the corresponding erroneous address data.
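The three parameter groups named in S620 can be illustrated with a small sketch. The source does not state how the parameters are estimated; the sketch below assumes the simplest option, relative-frequency counting over character-aligned pairs of correct (hidden) and erroneous (explicit) text. The function names and sample strings are hypothetical.

```python
from collections import Counter, defaultdict


def normalize(counter):
    """Turn raw counts into a probability distribution."""
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}


def train_hmm(pairs):
    """Estimate (initial, transition, confusion) tables by counting.

    `pairs` holds (first_sample, second_sample) strings, i.e. the correct
    address (hidden states) and the erroneous address (explicit states).
    """
    init = Counter()
    trans = defaultdict(Counter)
    confusion = defaultdict(Counter)
    for correct, wrong in pairs:
        if not correct or len(correct) != len(wrong):
            continue  # assumption: only character substitutions are handled
        init[correct[0]] += 1
        for prev, cur in zip(correct, correct[1:]):
            trans[prev][cur] += 1  # hidden state -> next hidden state
        for hidden, observed in zip(correct, wrong):
            confusion[hidden][observed] += 1  # hidden state -> explicit state
    return (
        normalize(init),
        {state: normalize(c) for state, c in trans.items()},
        {state: normalize(c) for state, c in confusion.items()},
    )
```

Including each correct address paired with itself, as in `train_hmm([("娄山关路", "娄衫关路"), ("娄山关路", "娄山关路")])`, keeps the confusion table from treating every character as erroneous.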
In one embodiment, step S610 (collecting first sample address data and corresponding second sample address data) comprises the following steps:
collecting erroneous address data as second sample address data; for example, historical address data is collected in advance from a logistics platform or an e-commerce platform, address data containing input errors is identified in the historical address data, and this erroneous address data is taken as second sample address data;
correcting the erroneous content in the second sample address data to obtain the corresponding correct address data, and then taking the correct address data as first sample address data.
For example, a collected erroneous address, a mistyped form of "Loushanguan Road Jinhongqiao International Center", is taken as second sample address data and is manually corrected to "Loushanguan Road Jinhongqiao International Center", which is taken as the corresponding first sample address data.
In another embodiment, correct sample address data may be collected first and then expanded to obtain erroneous sample address data. Specifically, step S610 (collecting first sample address data and corresponding second sample address data) comprises the following steps:
collecting correct address data as first sample address data;
replacing at least part of the first sample address data based on a preset error correction type to obtain extended address data;
taking the extended address data as second sample address data.
The replacement may be performed in various ways, so one piece of first sample address data may yield multiple pieces of extended address data, that is, multiple pieces of second sample address data. For example, the first sample address data "Loushanguan Road Jinhongqiao International Center" is acquired first. Replacing the character "shan" (mountain) with a homophone pronounced "shan" gives one piece of second sample address data; replacing the character "guan" with the homophone meaning "official" gives another; and replacing the character "hong" (rainbow) with the homophone meaning "red" gives yet another. This expands the sample data and helps optimize the parameters during training of the hidden Markov model.
In this embodiment, replacing at least part of the first sample address data based on a preset error correction type includes replacing at least part of the first sample address data with a corresponding homophone or homograph. To expand the sample data further, the replacement may also include replacing at least part of the first sample address data with a corresponding similar-sounding character or similar-looking character, so that the trained hidden Markov model can correct a wider range of errors.
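The expansion of correct samples into erroneous samples can be sketched as below. The confusion table, the specific substitute characters, and the function name `expand_samples` are assumptions chosen for illustration; the sketch replaces one character per generated sample.

```python
# Hypothetical confusion table: character -> homophones or similar-looking characters.
CONFUSIONS = {"山": ["珊", "衫"], "关": ["官"], "虹": ["红"]}


def expand_samples(correct: str, confusions=CONFUSIONS):
    """Return (first_sample, second_sample) pairs with one substitution each."""
    pairs = []
    for i, ch in enumerate(correct):
        for substitute in confusions.get(ch, []):
            wrong = correct[:i] + substitute + correct[i + 1:]
            pairs.append((correct, wrong))
    return pairs
```

Each returned pair keeps the correct address as first sample address data and the substituted text as second sample address data, so the two stay aligned character by character.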
As shown in Fig. 5, an embodiment of the present invention further provides an address correction system for implementing the address correction method, the system including:
a data acquisition module M100, configured to acquire address data before correction;
a model input module M200, configured to input the address data before correction into a trained hidden Markov model; in this embodiment, in the hidden Markov model, the address data before correction corresponds to the explicit state and the address data after correction corresponds to the hidden state;
and a correction output module M300, configured to determine corrected address data according to the output data of the hidden Markov model.
The invention provides an effective address correction system that addresses the problem of erroneous delivery addresses in logistics. The data acquisition module M100 acquires the address data before correction, the model input module M200 uses the hidden Markov model to recognize and correct erroneous address content, and the correction output module M300 produces the corrected address data, which improves the accuracy and efficiency of address correction.
The address correction system can be deployed in a server of a logistics platform, can acquire the address data of a user from an e-commerce platform or other channels, and after error correction is performed by the address correction system, the corrected address data is used for subsequent logistics express circulation. But the invention is not limited thereto. In another alternative embodiment, the address correction system may be deployed in a server of an e-commerce platform, and after acquiring address data input by a user, perform correction and provide the corrected address data to a logistics party. In yet another alternative embodiment, the address correction system may also be deployed on a separate server. In other alternative embodiments, the address correction system may also be applied to other types of devices, such as notebooks, desktop computers, user terminals, and the like.
In this embodiment, the acquiring, by the data acquisition module M100, address data before correction includes acquiring a receiving address text input by a user in an e-commerce platform, or acquiring a receiving address text input by a user in a platform related to logistics, or acquiring a receiving address text acquired from a designated database.
In this embodiment, the model input module M200 inputting the address data before correction into the trained hidden Markov model includes: extracting an address feature sequence before correction from the address data before correction according to a preset text feature mapping rule; and inputting the address feature sequence before correction into the trained hidden Markov model, wherein the input feature sequence corresponds to the explicit state sequence of the hidden Markov model.
In this embodiment, the correction output module M300 determining the corrected address data according to the output data of the hidden Markov model includes: acquiring the corrected address feature sequence output by the hidden Markov model; and obtaining the corrected address data from the corrected address feature sequence according to the text feature mapping rule.
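The flow through modules M200 and M300 can be sketched as follows. The source does not name a decoding algorithm; this sketch assumes Viterbi decoding, and it assumes a text feature mapping rule that maps each character to an integer index in a vocabulary list. The tiny vocabulary, the probability tables, and the names `viterbi` and `correct_address` are hypothetical.

```python
import math


def viterbi(observed, states, init, trans, emit, floor=1e-6):
    """Most likely hidden-state path for `observed` under a discrete HMM."""

    def lp(table, key):
        # Log-probability, with a small floor for events missing from the table.
        return math.log(table.get(key, floor))

    scores = {s: lp(init, s) + lp(emit.get(s, {}), observed[0]) for s in states}
    backpointers = []
    for obs in observed[1:]:
        prev, scores, pointers = scores, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + lp(trans.get(p, {}), s))
            scores[s] = prev[best] + lp(trans.get(best, {}), s) + lp(emit.get(s, {}), obs)
            pointers[s] = best
        backpointers.append(pointers)
    path = [max(states, key=scores.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


def correct_address(text, vocab, init, trans, emit):
    """Map text to a feature sequence, decode it, and map the result back."""
    if not text:
        return text
    to_id = {ch: i for i, ch in enumerate(vocab)}
    observed = [to_id[ch] for ch in text]  # explicit state sequence
    hidden = viterbi(observed, list(range(len(vocab))), init, trans, emit)
    return "".join(vocab[i] for i in hidden)  # corrected address text


# Illustrative parameters only; a real model would be trained on sample data.
VOCAB = ["娄", "山", "衫", "关", "路"]
INIT = {0: 1.0}
TRANS = {0: {1: 1.0}, 1: {3: 1.0}, 3: {4: 1.0}}
EMIT = {0: {0: 1.0}, 1: {1: 0.8, 2: 0.2}, 2: {2: 1.0}, 3: {3: 1.0}, 4: {4: 1.0}}
```

With these tables, `correct_address("娄衫关路", VOCAB, INIT, TRANS, EMIT)` decodes to "娄山关路", because the transition table strongly favours the sequence in which index 1 follows index 0.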
In this embodiment, the address correction system further includes a correction filtering module, where the correction filtering module is configured to determine whether the correction is an allowable correction based on a preset correction checking rule. Specifically, the correction filtering module compares the corrected address data with the address data before correction to determine correction information, where the correction information includes a correction position and correction content, and the correction content includes a text before correction and a text after correction at the correction position; judging whether the correction information is allowable correction or not based on a preset correction check rule; if not, rejecting the correction information, and restoring the text of the corrected address data corresponding to the correction information to the text before correction; if so, the correction information is retained.
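The behaviour of the correction filtering module can be sketched as below, assuming that the preset correction check rule is a table of allowed (text before correction, text after correction) character pairs plus an optional set of positions where correction is not allowed. The table contents and the name `apply_check_rule` are hypothetical, and the sketch assumes both texts have the same length.

```python
# Hypothetical table of allowed (pre-correction, post-correction) character pairs.
ALLOWED_PAIRS = {("衫", "山"), ("官", "关"), ("红", "虹")}


def apply_check_rule(before, after, allowed=ALLOWED_PAIRS, locked=frozenset()):
    """Keep allowed corrections and restore the original text elsewhere."""
    result = []
    for pos, (old, new) in enumerate(zip(before, after)):
        unchanged = old == new
        permitted = (old, new) in allowed and pos not in locked
        # A rejected correction is reverted to the text before correction.
        result.append(new if unchanged or permitted else old)
    return "".join(result)
```

For instance, `apply_check_rule("娄衫关路金红桥", "娄山关路银虹桥")` keeps the two corrections listed in the table and reverts the change at position 4, which is not an allowed pair.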
In another alternative embodiment, the correction filtering module may be further configured to filter the correction based on the classified lexicon, so as to further improve the accuracy of the correction, and a specific correction manner may adopt the implementation manners of steps S510 to S570 in the address correction method.
In this embodiment, the address correction system further includes a model training module, configured to collect first sample address data and corresponding second sample address data as training samples, and to train the hidden Markov model with the training samples to obtain the model parameters of the hidden Markov model, wherein in the hidden Markov model, the first sample address data corresponds to the hidden state and the second sample address data corresponds to the explicit state.
In one embodiment, the model training module collects first sample address data and corresponding second sample address data, including: collecting error address data as second sample address data; for example, some historical address data are collected from a logistics platform or an e-commerce platform in advance, some error address data with errors in input are found in the historical address data, and the error address data are used as second sample address data; and correcting the error content in the second sample address data to obtain corresponding correct address data, and then taking the correct address data as first sample address data.
In another embodiment, the model training module collecting first sample address data and corresponding second sample address data includes: collecting correct address data as first sample address data; replacing at least part of the first sample address data based on a preset error correction type to obtain extended address data; and taking the extended address data as second sample address data, so that the sample address data is expanded.
The embodiment of the invention also provides address correction equipment, which comprises a processor; a memory having stored therein executable instructions of the processor; wherein the processor is configured to perform the steps of the address correction method via execution of the executable instructions.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module" or "platform."
An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 6. The electronic device 600 shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 6, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 that connects the various system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like.
Wherein the storage unit stores program code executable by the processing unit 610 to cause the processing unit 610 to perform steps according to various exemplary embodiments of the present invention described in the address correction method section above in this specification. For example, the processing unit 610 may perform the steps as shown in fig. 1.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
By adopting the address correction device provided by the invention, the processor executes the address correction method when executing the executable instruction, thereby obtaining the beneficial effect of the address correction method.
An embodiment of the present invention further provides a computer-readable storage medium, which is used for storing a program, and when the program is executed, the method for correcting the address is implemented. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the invention described in the address correction method section above of this specification when the program product is executed on the terminal device.
Referring to fig. 7, a program product 800 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be executed on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
By adopting the computer-readable storage medium provided by the present invention, the program stored therein, when being executed, implements the steps of the address correction method, whereby the advantageous effects of the above-described address correction method can be obtained.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (18)

1. An address correction method, comprising:
acquiring address data before correction;
inputting the address data before correction into a trained hidden Markov model;
and determining corrected address data according to the output data of the hidden Markov model.
2. The address correction method according to claim 1, wherein in the hidden markov model, the address data before correction corresponds to an explicit state, and the address data after correction corresponds to a hidden state.
3. The address correction method according to claim 1, wherein the address data is an address text, and the inputting of the address data before correction into a trained hidden markov model comprises the steps of:
extracting an address feature sequence before correction based on the address data before correction according to a preset text feature mapping rule;
and inputting the characteristic sequence of the address data before correction into a trained hidden Markov model.
4. The address correction method according to claim 3, wherein determining corrected address data from the output data of the hidden Markov model comprises the steps of:
acquiring a corrected address characteristic sequence output by the hidden Markov model;
and obtaining corrected address data based on the corrected address feature sequence according to the text feature mapping rule.
5. The address correction method according to claim 4, wherein extracting a pre-correction address feature sequence based on the pre-correction address data comprises: performing word segmentation on the address data before correction, mapping each word after word segmentation into word vectors according to the text feature mapping rule, and combining the word vectors to obtain an address feature sequence before correction;
obtaining corrected address data based on the corrected address feature sequence, including: splitting the corrected address feature sequence to obtain each word vector, mapping each split word vector to a word according to the text feature mapping rule, and combining the mapped words to obtain the corrected address data.
6. The address correction method according to claim 1, further comprising, after determining corrected address data from the output data of the hidden markov model, the steps of:
comparing the corrected address data with the address data before correction to determine correction information;
judging whether the correction information is allowable correction or not based on a preset correction check rule;
if not, rejecting the correction information, and restoring the text of the corrected address data corresponding to the correction information to the text before correction.
7. The address correction method according to claim 6, wherein the correction information includes a correction position and correction contents, the correction contents including pre-correction text and post-correction text at the correction position;
the preset correction checking rule comprises that the text before correction at the correction position and the text after correction accord with a preset correction content mapping relation.
8. The address correction method according to claim 7, wherein the preset correction content mapping relationship includes that the text before correction and the text after correction are homophones or homographs; or
The preset mapping relation of the corrected contents comprises that the text before correction and the text after correction are harmonious characters or similar characters.
9. The address correction method according to any one of claims 6 to 8, wherein the correction information includes a correction position and a correction content, and the preset correction checking rule includes a preset position filtering rule;
judging whether the correction information is allowable correction or not based on a preset correction check rule, including:
and judging whether the correction position is a position allowing correction or not based on a preset position filtering rule.
10. The address correction method according to claim 1, further comprising, after determining corrected address data from the output data of the hidden markov model, the steps of:
comparing the corrected address data with the address data before correction to determine a correction position;
extracting corrected address segments corresponding to the corrected positions from the corrected address data, and classifying the corrected address segments;
searching in the corresponding classified word stock based on the corrected address fragment, and judging whether the corrected address fragment is a word existing in the classified word stock;
and if the words corresponding to the corrected address segments do not exist in the classified word bank, rejecting correction corresponding to the corrected positions.
11. The address correction method according to claim 10, wherein if there is no word corresponding to the corrected address fragment in the classified thesaurus, further comprising the steps of:
extracting an address fragment before correction corresponding to the correction position from the address data before correction, and judging whether the address fragment before correction is a word existing in the classified word bank;
and if the words corresponding to the address segments before correction exist in the classified word bank, rejecting correction corresponding to the correction position, and restoring the text of the corrected address data corresponding to the correction position into the text before correction.
12. The address correction method of claim 1, further comprising training the hidden markov model using the steps of:
acquiring first sample address data and corresponding second sample address data as training samples;
and training the hidden Markov model by adopting the training sample to obtain model parameters of the hidden Markov model, wherein in the hidden Markov model, the first sample address data corresponds to a hidden state, and the second sample address data corresponds to an explicit state.
13. The address correction method of claim 12, wherein the collecting of the first sample address data and the corresponding second sample address data comprises the steps of:
collecting correct address data as first sample address data;
replacing at least part of the first sample address data based on a preset error correction type to obtain extended address data;
the extended address data is taken as second sample address data.
14. The address correction method of claim 13, wherein replacing at least part of the first sample address data based on a preset error correction type comprises replacing at least part of the first sample address data with a corresponding homophone or homograph; or
replacing at least part of the first sample address data based on a preset error correction type comprises replacing at least part of the first sample address data with a corresponding similar-sounding character or similar-looking character.
15. The address correction method of claim 13, wherein the collecting of the first sample address data and the corresponding second sample address data comprises the steps of:
collecting error address data as second sample address data;
correcting the error content in the second sample address data to obtain corresponding correct address data;
and taking the correct address data as first sample address data.
16. An address correction system applied to the address correction method of any one of claims 1 to 15, the system comprising:
the data acquisition module is used for acquiring address data before correction;
the model input module is used for inputting the address data before correction into the trained hidden Markov model;
and the correction output module is used for determining corrected address data according to the output data of the hidden Markov model.
17. An electronic device, characterized in that the electronic device comprises:
a processor;
memory on which a computer program is stored which, when executed by the processor, performs the address correction method according to any one of claims 1 to 15.
18. A computer storage medium, characterized in that a computer program is stored which, when being executed by a processor, performs an address correction method according to any one of claims 1 to 15.
CN202110127696.4A 2021-01-29 Address correction method, system, device and storage medium Active CN112818667B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110127696.4A CN112818667B (en) 2021-01-29 Address correction method, system, device and storage medium

Publications (2)

Publication Number Publication Date
CN112818667A (en) 2021-05-18
CN112818667B (en) 2024-07-02

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102314417A (en) * 2011-09-22 2012-01-11 西安电子科技大学 Method for identifying Web named entity based on statistical model
CN102750351A (en) * 2012-06-11 2012-10-24 迪尔码国际营销服务(北京)有限公司 Matching method of address information based on rules
CN107526967A (en) * 2017-07-05 2017-12-29 阿里巴巴集团控股有限公司 A kind of risk Address Recognition method, apparatus and electronic equipment
CN110197284A (en) * 2019-04-30 2019-09-03 腾讯科技(深圳)有限公司 A kind of address dummy recognition methods, device and equipment
CN110892394A (en) * 2017-06-29 2020-03-17 亚马逊科技公司 Identification of incorrect addresses for package delivery
CN111222345A (en) * 2020-01-15 2020-06-02 合肥慧图软件有限公司 Place name address visualization analysis method based on semantic word segmentation technology
CN111695355A (en) * 2020-05-26 2020-09-22 平安银行股份有限公司 Address text recognition method, device, medium and electronic equipment
CN111724110A (en) * 2020-06-16 2020-09-29 苏宁云计算有限公司 Address information processing method and device, computer equipment and storage medium
CN113221558A (en) * 2021-05-28 2021-08-06 中邮信息科技(北京)有限公司 Express delivery address error correction method and device, storage medium and electronic equipment

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102314417A (en) * 2011-09-22 2012-01-11 西安电子科技大学 Method for identifying Web named entity based on statistical model
CN102750351A (en) * 2012-06-11 2012-10-24 迪尔码国际营销服务(北京)有限公司 Matching method of address information based on rules
CN110892394A (en) * 2017-06-29 2020-03-17 亚马逊科技公司 Identification of incorrect addresses for package delivery
CN107526967A (en) * 2017-07-05 2017-12-29 阿里巴巴集团控股有限公司 Risk address identification method, apparatus and electronic equipment
CN110197284A (en) * 2019-04-30 2019-09-03 腾讯科技(深圳)有限公司 False address identification method, device and equipment
CN111222345A (en) * 2020-01-15 2020-06-02 合肥慧图软件有限公司 Place name address visualization analysis method based on semantic word segmentation technology
CN111695355A (en) * 2020-05-26 2020-09-22 平安银行股份有限公司 Address text recognition method, device, medium and electronic equipment
CN111724110A (en) * 2020-06-16 2020-09-29 苏宁云计算有限公司 Address information processing method and device, computer equipment and storage medium
CN113221558A (en) * 2021-05-28 2021-08-06 中邮信息科技(北京)有限公司 Express delivery address error correction method and device, storage medium and electronic equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343670A (en) * 2021-05-26 2021-09-03 武汉大学 Address text element extraction method based on hidden Markov and classification algorithm coupling
CN113343670B (en) * 2021-05-26 2023-07-28 武汉大学 Address text element extraction method based on coupling of hidden Markov and classification algorithm
CN113438280A (en) * 2021-06-03 2021-09-24 多点生活(成都)科技有限公司 Vehicle starting control method and device

Similar Documents

Publication Publication Date Title
CN107908635B (en) Method and device for establishing text classification model and text classification
EP3709295B1 (en) Methods, apparatuses, and storage media for generating training corpus
CN108491373B (en) Entity identification method and system
CN110276023B (en) POI transition event discovery method, device, computing equipment and medium
CN107797985B (en) Method and device for establishing synonymous identification model and identifying synonymous text
US9390084B2 (en) Natural language parsers to normalize addresses for geocoding
CN111460250B (en) Image data cleaning method, image data cleaning device, image data cleaning medium, and electronic apparatus
CN112016304A (en) Text error correction method and device, electronic equipment and storage medium
CN111309915A (en) Method, system, device and storage medium for training natural language of joint learning
US11328708B2 (en) Speech error-correction method, device and storage medium
US11031009B2 (en) Method for creating a knowledge base of components and their problems from short text utterances
CN113642316B (en) Chinese text error correction method and device, electronic equipment and storage medium
CN111125317A (en) Model training, classification, system, device and medium for conversational text classification
CN109947924B (en) Dialogue system training data construction method and device, electronic equipment and storage medium
CN113205814B (en) Voice data labeling method and device, electronic equipment and storage medium
CN115328756A (en) Test case generation method, device and equipment
CN111597800B (en) Method, device, equipment and storage medium for obtaining synonyms
CN111339292A (en) Training method, system, equipment and storage medium of text classification network
US11182665B2 (en) Recurrent neural network processing pooling operation
CN111161724B (en) Method, system, equipment and medium for Chinese audio-visual combined speech recognition
CN111401012A (en) Text error correction method, electronic device and computer readable storage medium
CN110929499B (en) Text similarity obtaining method, device, medium and electronic equipment
CN112836497A (en) Address correction method, device, electronic equipment and storage medium
CN112818667B (en) Address correction method, system, device and storage medium
CN112818667A (en) Address correction method, system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant