CN117131861A - Address checking method and device - Google Patents


Info

Publication number
CN117131861A
Authority
CN
China
Prior art keywords
address
administrative
determining
administrative information
relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210535023.7A
Other languages
Chinese (zh)
Inventor
潘明瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Three Hundred And Sixty Degree E Commerce Co ltd
Original Assignee
Beijing Jingdong Three Hundred And Sixty Degree E Commerce Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Three Hundred And Sixty Degree E Commerce Co ltd
Priority to CN202210535023.7A
Publication of CN117131861A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for address verification, and relates to the technical field of computers. One embodiment of the method comprises the following steps: acquiring an address to be checked; determining, from the address to be checked, one or more administrative address keywords, the administrative information corresponding to the one or more administrative address keywords, and a detailed address text; determining the cascade relation of each piece of administrative information; determining a verification result of the detailed address text according to an address verification model; and determining the verification result of the address to be checked according to the cascade relation and/or the verification result of the detailed address text. Because the detailed address text is split out of the address to be checked, the model receives a simple detailed address instead of a complex long address, which reduces the computational burden on the model and keeps processing fast. Meanwhile, judging the cascade relation ensures the accuracy of the verification result.

Description

Address checking method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for address verification.
Background
Address checking technology detects whether an address text is a valid address. At present, it can be used to monitor user order fraud and address cheating, and to monitor address quality in logistics, operator and credit card scenarios, thereby reducing invalid dispatches and the like and lowering the operating cost of enterprises.
In the prior art, either a relatively complex prediction model is usually adopted, or address checking is converted into an anomaly detection problem. A more complex prediction model slows down address checking and makes it difficult to support online services. When address checking is converted into an anomaly detection problem, the proportion of abnormal addresses that are actually found is usually low, which reduces the accuracy of address checking.
Disclosure of Invention
In view of this, an embodiment of the present invention provides a method and an apparatus for address verification, in which the detailed address text and the administrative information are split out of the address text to be verified, the cascade relation of the administrative information and the verification result of the detailed address text are determined, and the verification result of the address to be verified is determined from these two results together. The complex long address is thereby converted into a simple detailed address that serves as the model input, which reduces the computational burden on the model and keeps processing fast. Judging the cascade relation in turn ensures the accuracy of the verification result.
To achieve the above object, according to a first aspect of an embodiment of the present invention, there is provided a method of address verification.
The address checking method of the embodiment of the invention comprises the following steps:
acquiring an address to be checked; determining one or more administrative address keywords, administrative information corresponding to the one or more administrative address keywords and detailed address text from the address to be checked; respectively determining cascading relations of the administrative information; determining the test result of the detailed address text according to an address verification model; and determining the checking result of the address to be checked according to the cascading relation and/or the checking result of the detailed address text.
Optionally, the determining the verification result of the address to be verified according to the cascade relation and/or the verification result of the detailed address text includes:
under the condition that the cascade relation is incorrect, determining that the checking result of the address to be checked is a false address;
and/or under the condition that the cascade relation is correct, taking the detailed address text as the input of the address verification model, and determining the verification result of the address to be verified according to the output of the address verification model.
Optionally, the determining the cascade relation of each administrative information includes: determining whether the number of the administrative information is greater than or equal to a preset number threshold;
if yes, judging whether the cascade relation of the administrative information meets a preset administrative subordinate relation, and determining that the cascade relation is correct under the condition that the subordinate relation is met; determining that the cascade relation is incorrect if the subordinate relation is not satisfied;
if not, determining that the cascade relationship is incorrect.
Optionally, after determining that the number of the administrative information is greater than or equal to the preset number threshold, before the determining whether the cascade relationship of the administrative information meets the preset administrative subordinate relationship, the method further includes: judging whether repeated administrative information exists in the administrative information or not;
If yes, deleting the repeated administrative information, and repeatedly executing the step of judging whether the number of the administrative information is larger than or equal to a preset number threshold;
otherwise, judging whether the cascade relation of the administrative information meets the administrative subordinate relation or not.
Optionally, in the case that the address to be checked includes a plurality of identical administrative address keywords, for the target administrative information corresponding to each of the identical administrative address keywords: determining a cascade relation of the target administrative information according to other keywords, in the address to be checked, whose administrative levels are adjacent to that of the identical administrative address keywords; and respectively determining whether the cascade relation of each piece of target administrative information meets the administrative subordinate relation.
Optionally, the determining the test result of the detailed address text according to the address verification model includes: taking the detailed address as the input of the address verification model, and outputting the confidence; judging whether the confidence coefficient meets a probability threshold value or not; if yes, determining the test result of the detailed address text as a true address; otherwise, determining the test result of the detailed address text as a false address.
Optionally, the address verification model is a fastText model, and the loss function in the fastText model is a focal loss function.
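As an illustration of the loss named above, the following is a minimal sketch of focal loss in its commonly cited form, FL(p) = -alpha * (1 - p)^gamma * log(p), where p is the probability the model assigns to the true class. The values of gamma and alpha are assumptions chosen for illustration; the source does not state them, nor does it describe how the loss is integrated into the fastText model.

```python
import math


def focal_loss(p_true: float, gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss for one sample.

    p_true: probability assigned to the correct class.
    gamma, alpha: illustrative defaults (assumptions, not taken from the source).
    """
    # Clamp to avoid log(0) for degenerate predictions.
    p = min(max(p_true, 1e-12), 1.0)
    # The (1 - p) ** gamma factor down-weights samples that are already
    # classified with high confidence, so hard samples dominate the loss.
    return -alpha * (1.0 - p) ** gamma * math.log(p)
```

With gamma set to 0 the expression reduces to ordinary cross-entropy, which is one way to sanity-check an implementation.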
To achieve the above object, according to a second aspect of the embodiments of the present invention, there is provided an apparatus for address verification.
The address checking device of the embodiment of the invention comprises:
the acquisition module is used for acquiring the address to be checked;
the identification module is used for determining one or more administrative address keywords, administrative information corresponding to the one or more administrative address keywords and detailed address texts from the address to be checked;
the checking module is used for respectively determining cascading relations of the administrative information; determining the test result of the detailed address text according to an address verification model; and determining the checking result of the address to be checked according to the cascading relation and/or the checking result of the detailed address text.
Optionally, the checking module is further configured to determine that the verification result of the address to be checked is a false address if the cascade relation is incorrect.
Optionally, the checking module is further configured to, when the cascade relation is correct, use the detailed address text as an input of the address verification model, and determine the verification result of the address to be checked according to an output of the address verification model.
Optionally, the checking module is further configured to determine whether the number of the administrative information is greater than or equal to a preset number threshold; if yes, judging whether the cascade relation of the administrative information meets a preset administrative subordinate relation, and determining that the cascade relation is correct under the condition that the subordinate relation is met; determining that the cascade relation is incorrect if the subordinate relation is not satisfied; if not, determining that the cascade relation is incorrect.
Optionally, the checking module is further configured to, after determining that the number of the administrative information is greater than or equal to the preset number threshold and before judging whether the cascade relation of the administrative information meets the preset administrative subordinate relation, judge whether repeated administrative information exists in the administrative information; if yes, deleting the repeated administrative information, and repeatedly executing the step of judging whether the number of the administrative information is greater than or equal to the preset number threshold; otherwise, judging whether the cascade relation of the administrative information meets the administrative subordinate relation.
Optionally, the checking module is further configured to, when the address to be checked includes a plurality of identical administrative address keywords, for the target administrative information corresponding to each of the identical administrative address keywords: determine a cascade relation of the target administrative information according to other keywords, in the address to be checked, whose administrative levels are adjacent to that of the identical administrative address keywords; and respectively determine whether the cascade relation of each piece of target administrative information meets the administrative subordinate relation.
Optionally, the checking module is further configured to take the detailed address as the input of the address verification model and output a confidence; judging whether the confidence meets a probability threshold; if yes, determining the verification result of the detailed address text as a true address; otherwise, determining the verification result of the detailed address text as a false address.
Optionally, the address verification model is a fastText model, and the loss function in the fastText model is a focal loss function.
To achieve the above object, according to a third aspect of embodiments of the present invention, there is provided an apparatus for address verification.
The address checking device of the embodiment of the invention comprises: one or more processors; a storage system for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods of address verification of embodiments of the present invention.
To achieve the above object, according to a fourth aspect of the embodiments of the present invention, there is provided a computer-readable medium.
The computer readable medium of the embodiment of the present invention stores a computer program, which when executed by a processor, implements the method of address verification of the embodiment of the present invention.
One embodiment of the above invention has the following advantages or benefits: in the embodiment of the invention, the detailed address text and the administrative information are split out of the address text to be verified, the cascade relation of the administrative information and the verification result of the detailed address text are judged, and the verification result of the address to be checked is determined from these two results together. The complex long address is thereby converted into a simple detailed address that serves as the model input, which reduces the computational burden on the model and keeps processing fast. Judging the cascade relation in turn ensures the accuracy of the verification result.
Further effects of the above optional implementations are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of a main flow of a method for address verification according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a main flow of determining cascade relationships between administrative information according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a main flow of deleting duplicate administrative information according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the main flow when the address to be checked includes a plurality of identical administrative address keywords according to the embodiment of the present invention;
FIG. 5 is a schematic diagram of the main flow of determining the verification result of the detailed address text according to the address verification model in the embodiment of the present invention;
FIG. 6 is a schematic diagram of a main flow of a method for address verification according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the main modules of an address checking apparatus according to an embodiment of the present invention;
FIG. 8 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
FIG. 9 is a schematic diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
According to a first aspect of an embodiment of the present invention, there is provided a method of address verification, applied to a server.
FIG. 1 is a schematic diagram of a main flow of a method for address verification according to an embodiment of the present application. As shown in fig. 1, the method mainly comprises:
step S101: acquiring an address to be checked;
step S102: determining one or more administrative address keywords, administrative information corresponding to the one or more administrative address keywords and detailed address texts from the address to be checked;
step S103: respectively determining cascading relations of the administrative information;
step S104: determining a test result of the detailed address text according to the address verification model;
step S105: and determining the checking result of the address to be checked according to the cascading relation and/or the checking result of the detailed address text.
The address to be checked is a user address awaiting verification, typically a long address containing many words, for example, AA province, BB city, CC city, DD town, XXXXX. For this reason, the prior art needs a complex prediction model to judge the address to be checked accurately, which leads to high latency and low efficiency.
In the present application, an administrative address keyword is a word that represents a specific administrative level, and may be one or more of province (directly administered city), district-level city, county-level city, village-town, street, and community. Because existing malicious addresses are usually falsified in the detailed address text part, while falsification at the province, city, county and town levels is relatively easy to identify, in an alternative embodiment the four administrative levels of province (directly administered city), district-level city, county-level city and village-town are selected as the administrative address keywords of the application, and the part of the address that remains after the province (directly administered city), district-level city, county-level city and village-town are removed is taken as the detailed address text, which achieves the aim of splitting a long address into a short address. Furthermore, each administrative address keyword corresponds to a plurality of pieces of administrative information, where the administrative information is the specific name of a province (directly administered city), district-level city, county-level city or village-town corresponding to the administrative address keyword. Because the association between administrative address keywords and administrative information is pre-stored according to the national administrative division library, that is, it covers all provinces (directly administered cities), district-level cities, county-level cities and village-towns, accuracy can be ensured, and the low accuracy that arises when address checking is converted into anomaly detection can be avoided.
Illustratively, if the address to be checked is AA province, BB city, CC city, DD town, XXXXX, the administrative information determined according to the four administrative address keywords of province (directly administered city), district-level city, county-level city and village-town is: AA province [province], BB city [district-level city], CC city [county-level city], DD town [village-town]; the remaining XXXXX is the detailed address text.
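The split described above can be sketched as follows. This is only an illustration under stated assumptions: the GAZETTEER dictionary is a hypothetical toy stand-in for the national administrative division library, the address is assumed to be comma-separated, and an exact dictionary lookup is used in place of the sequence-labeling model the embodiment relies on.

```python
from typing import List, NamedTuple, Tuple


class AdminEntity(NamedTuple):
    name: str   # administrative information, e.g. "AA province"
    level: str  # administrative address keyword, e.g. "province"


# Hypothetical toy gazetteer (assumption): name -> administrative level.
GAZETTEER = {
    "AA province": "province",
    "BB city": "district-level city",
    "CC city": "county-level city",
    "DD town": "village-town",
}


def split_address(address: str) -> Tuple[List[AdminEntity], str]:
    """Separate administrative information from the detailed address text."""
    entities: List[AdminEntity] = []
    detail_parts: List[str] = []
    for part in (p.strip() for p in address.split(",")):
        if not part:
            continue
        level = GAZETTEER.get(part)
        if level is None:
            # Anything that is not a known administrative name is detail text.
            detail_parts.append(part)
        else:
            entities.append(AdminEntity(part, level))
    return entities, ", ".join(detail_parts)


entities, detail = split_address("AA province, BB city, CC city, DD town, XXXXX")
```

Here `entities` holds the four pieces of administrative information with their levels, and `detail` holds the short text that would be passed to the address verification model.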
In an alternative embodiment, steps S101 to S102 may be implemented by a model trained with CRF (conditional random field, a sequence labeling algorithm). CRF is a lightweight model that is commonly used as a basic model in natural language processing and is widely used for word segmentation. In the application, the CRF is instead used to find the administrative division entities (that is, the administrative information corresponding to the one or more administrative address keywords) in the address to be checked; applying the CRF model ensures the accuracy of the judgment while keeping the average inference speed fast. Further, after the CRF finds the administrative division entities, step S103 is completed by matching the administrative division entities against the administrative division library.
In an alternative embodiment, the cascade relation of a piece of administrative information is the relation between that piece of administrative information and the administrative information at the levels immediately above and below it. Taking the address AA province, BB city, CC city, DD town, XXXXX as an example, since the highest-level administrative address keyword is province (directly administered city), the cascade relation of AA province [province] is AA province [province] - BB city [district-level city]. The level above a district-level city is province (directly administered city) and the level below it is county-level city, so the cascade relation of BB city [district-level city] is AA province [province] - BB city [district-level city] - CC city [county-level city]. Similarly, the corresponding cascade relation may be determined for every piece of administrative information.
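A minimal sketch of this neighbour-window view of the cascade relation is given below. The LEVELS ordering and the (name, level) tuple representation are assumptions made for illustration; the source does not prescribe a data structure.

```python
from typing import Dict, List, Tuple

# Assumed ordering of the four administrative levels, highest first.
LEVELS = ["province", "district-level city", "county-level city", "village-town"]


def cascade_relations(entities: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each administrative name to [upper neighbour, itself, lower neighbour].

    `entities` is a list of (name, level) pairs; a missing neighbour at the
    top or bottom of the hierarchy is simply left out of the list.
    """
    ordered = sorted(entities, key=lambda e: LEVELS.index(e[1]))
    relations: Dict[str, List[str]] = {}
    for i, (name, _level) in enumerate(ordered):
        window = ordered[max(i - 1, 0): i + 2]
        relations[name] = [n for n, _ in window]
    return relations


relations = cascade_relations([
    ("AA province", "province"),
    ("BB city", "district-level city"),
    ("CC city", "county-level city"),
    ("DD town", "village-town"),
])
```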
For step S103, in an alternative embodiment, as shown in fig. 2, it includes:
step S201: determining whether the number of administrative information is greater than or equal to a preset number threshold;
if yes, go to step S202, if not, go to step S204;
step S202: judging whether the cascade relation of the administrative information meets a preset administrative subordinate relation; if the subordinate relation is satisfied, step S203 is performed; if the subordinate relation is not satisfied, step S204 is performed;
step S203: determining that the cascade relation is correct;
step S204: the cascade relationship is determined to be incorrect.
Since the administrative information corresponding to the administrative address keywords may be duplicated or missing in some addresses, a judgment may first be made according to the number of pieces of administrative information. In an alternative embodiment, the preset number threshold is set to 4, to ensure that the four administrative levels of province (directly administered city), district-level city, county-level city and village-town are all present in the address to be checked. When the number of pieces of administrative information is smaller than the preset number threshold, for example, when an address in BB city, AA province contains only 3 pieces of administrative information, the administrative information in the address to be checked is incomplete (for example, the county-level city or the district-level city is missing), so some administrative level may be absent, and the cascade relation is therefore determined to be incorrect. In practice, it has been found that setting the preset number threshold to 4, that is, judging the cascade relation of the administrative information using four administrative levels, balances the amount of computation against the accuracy of address checking; in other words, the verification speed can be ensured while the accuracy of address checking is maintained.
When the number of pieces of administrative information is greater than or equal to the preset number threshold, it may further be judged whether the cascade relation satisfies a preset administrative subordinate relation. The administrative subordinate relation concerns whether the pieces of administrative information in the cascade relation belong to the same link, that is, whether the four administrative levels of province (directly administered city), district-level city, county-level city and village-town are nested layer by layer. In an alternative embodiment, whether the pieces of administrative information belong to the same link may be determined according to a tree structure built from the national administrative division library. For example, the address to be checked is AA province, BB city, CC city, DD town, XXXXX. The cascade relation of AA province [province] is AA province [province] - BB city [district-level city], the cascade relation of BB city [district-level city] is AA province [province] - BB city [district-level city] - CC city [county-level city], and the cascade relation of CC city [county-level city] is BB city [district-level city] - CC city [county-level city] - DD town [village-town]. The tree is divided according to the administrative levels in the national administrative division library, with province (directly administered city) as the root node: it is first judged whether BB city [district-level city] exists under AA province [province], then whether CC city [county-level city] exists under BB city [district-level city], and finally whether DD town [village-town] exists under CC city [county-level city].
When the layer-by-layer judgment is correct according to the tree structure taking province (directly administered city) as the root node, the administrative information is considered to belong to the same link, namely the cascade relationship meets the preset administrative subordinate relationship.
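Steps S201 to S204 together with the layer-by-layer tree walk can be sketched as follows. ADMIN_TREE is a hypothetical one-branch stand-in for the national administrative division library, and the input is assumed to be already ordered from the highest administrative level to the lowest; both are assumptions for illustration.

```python
from typing import Dict, List

# Hypothetical miniature administrative tree (assumption): each key is an
# administrative name and each value holds its subordinate units.
ADMIN_TREE: Dict[str, dict] = {
    "AA province": {"BB city": {"CC city": {"DD town": {}}}},
}

NUM_THRESHOLD = 4  # preset number threshold used in the embodiment


def cascade_is_correct(names: List[str],
                       tree: Dict[str, dict] = ADMIN_TREE,
                       threshold: int = NUM_THRESHOLD) -> bool:
    """Return True only if enough levels are present and they nest correctly."""
    # S201: too few pieces of administrative information -> S204 (incorrect).
    if len(names) < threshold:
        return False
    # S202: walk down the tree from the root, one administrative level at a time.
    node = tree
    for name in names:
        if name not in node:
            return False  # S204: this unit is not subordinate to the previous one
        node = node[name]
    return True  # S203: every level nests inside the level above it
```

For instance, `cascade_is_correct(["AA province", "BB city", "CC city", "DD town"])` passes both checks, while dropping any level fails the count check before the tree is consulted.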
When the number of administrative information is greater than or equal to the preset number threshold, there may be a phenomenon that the administrative information is repeated, so in an alternative embodiment, as shown in fig. 3, after determining that the number of administrative information is greater than or equal to the preset number threshold, before determining whether the cascade relationship of the administrative information satisfies the preset administrative subordinate relationship, the method further includes:
step S301: judging whether repeated administrative information exists in the plurality of administrative information;
if yes, step S302 is executed: deleting the repeated administrative information, and repeatedly executing the step of judging whether the number of the administrative information is greater than or equal to a preset number threshold;
otherwise, step S303 is executed: judging whether the cascade relation of the administrative information meets the administrative subordinate relation.
Steps S301 to S303 handle the case in which, because administrative information is repeated, the number of pieces of administrative information is greater than or equal to the preset number threshold and yet the cascade relation of the pieces of administrative information cannot be determined. For example, the address to be checked is AA province, BB city, BB city, DD town, XXXXX, in which BB city [district-level city] appears twice. Although the number of pieces of administrative information is greater than or equal to the preset number threshold, CC city [county-level city] is missing, so checking the address would require traversing all county-level cities subordinate to BB city [district-level city], which is inefficient and has high latency. It is therefore necessary to delete the repeated administrative information and then judge again whether the number of pieces of administrative information is greater than or equal to the preset number threshold. If the number of pieces of administrative information remaining after the deletion is smaller than the preset number threshold, the cascade relation is determined to be incorrect.
When the number of pieces of administrative information is greater than or equal to the preset number threshold, the address may also contain a plurality of identical administrative address keywords. For example, the address to be checked is DD town XXXXX in BB city, with both AA province [province] and EE city [directly administered city] given at the province level, so that there are 5 pieces of administrative information, two of which correspond to the same administrative address keyword, province (directly administered city); the cascade relations of the administrative information corresponding to these two province (directly administered city) entries then need to be determined separately. In an alternative embodiment, for the target administrative information corresponding to each of the plurality of identical administrative address keywords, as shown in fig. 4, the method includes:
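The deduplicate-then-recount loop of steps S301 to S302 can be sketched in a few lines. The function name and the default threshold of 4 are illustrative assumptions; order-preserving deduplication is one reasonable reading of "deleting the repeated administrative information".

```python
from typing import List, Tuple


def dedupe_then_check_count(names: List[str], threshold: int = 4) -> Tuple[List[str], bool]:
    """Drop repeated administrative names, then re-run the count check.

    Returns the deduplicated list and whether it still meets the threshold.
    """
    # dict.fromkeys keeps the first occurrence of each name, in original order.
    unique = list(dict.fromkeys(names))
    return unique, len(unique) >= threshold


unique, enough = dedupe_then_check_count(
    ["AA province", "BB city", "BB city", "DD town"]
)
```

In this example four names shrink to three after deduplication, so the count check fails and the cascade relation would be judged incorrect without any traversal of subordinate units.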
step S401: determining a cascading relation of target administrative information according to other keywords adjacent to administrative levels of a plurality of identical administrative address keywords in the address to be checked;
step S402: and respectively determining whether the cascade relation of each piece of target administrative information meets the administrative subordinate relation.
In this embodiment, the target administrative information corresponding to province (directly administered city) is AA province [province] and EE city [directly administered city], and the other keyword adjacent in administrative level is district-level city. The cascade relations of the target administrative information are therefore AA province [province] - BB city [district-level city] and EE city [directly administered city] - BB city [district-level city]. It is then necessary to determine separately whether AA province [province] - BB city [district-level city] and EE city [directly administered city] - BB city [district-level city] satisfy the administrative subordinate relation; when either of them satisfies the administrative subordinate relation, the cascade relation is considered to be correct.
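The "any candidate satisfies the subordinate relation" rule can be sketched as below. PARENT_OF is a hypothetical child-to-parent lookup standing in for the administrative division library, and the function name is likewise an assumption.

```python
from typing import Dict, List

# Hypothetical child -> parent lookup (assumption for illustration).
PARENT_OF: Dict[str, str] = {"BB city": "AA province"}


def any_candidate_matches(candidates: List[str], child: str,
                          parent_of: Dict[str, str] = PARENT_OF) -> bool:
    """Check each same-level candidate against the adjacent lower-level unit.

    The cascade relation is accepted as soon as one candidate is the actual
    parent of `child`.
    """
    return any(parent_of.get(child) == candidate for candidate in candidates)


# Both AA province and EE city were given at the province level.
ok = any_candidate_matches(["AA province", "EE city"], "BB city")
```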
For step S105, in an alternative embodiment, the cascade relationship may be judged first, and the test result of the detailed address text is determined only if the cascade relationship is correct. Specifically: if the cascade relationship is incorrect, the checking result of the address to be checked is determined to be a false address; if the cascade relationship is correct, the detailed address text is taken as the input of the address verification model, and the checking result of the address to be checked is determined according to the output of the address verification model. Judging the cascade relationship first screens out a large number of addresses to be checked in advance, so that only addresses with a correct cascade relationship are input into the address verification model, which reduces the computation of the address verification model and improves efficiency. In another alternative embodiment, the order of judging the cascade relationship and determining the test result of the detailed address text is not limited: the checking result of the address to be checked is a true address only when both results are correct, and is a false address when either result is incorrect.
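The first embodiment above, in which the cascade check gates the model, can be sketched as below. The function `check_address` and the keyword-matching `stub_model` are hypothetical stand-ins; the real address verification model is not reproduced here.

```python
def check_address(cascade_ok, detailed_text, model):
    """Return "true address" or "false address" for one address to be checked."""
    if not cascade_ok:
        # Incorrect cascade relation: reject without calling the model.
        return "false address"
    # Correct cascade relation: the model output decides the result.
    return "true address" if model(detailed_text) else "false address"


calls = []  # records every text that actually reaches the model


def stub_model(text):
    """Hypothetical stand-in for the address verification model."""
    calls.append(text)
    return "Building" in text


results = [
    check_address(False, "MN Building", stub_model),
    check_address(True, "MN Building", stub_model),
    check_address(True, "haha", stub_model),
]
```

Only two of the three addresses reach the model, which is the source of the efficiency gain described in the text.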
In an alternative embodiment, determining the test result of the detailed address text according to the address verification model in step S104 may, as shown in fig. 5, specifically include:
Step S501: taking the detailed address text as the input of the address verification model, and outputting a confidence;
Step S502: judging whether the confidence meets a probability threshold; if yes, step S503 is executed: determining the test result of the detailed address text as a true address; otherwise, step S504 is executed: determining the test result of the detailed address text as a false address.
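The threshold decision in steps S502 to S504 can be sketched as below. The default threshold of 0.5 and the reading of "meets the threshold" as "greater than or equal to" are assumptions made for illustration; the text does not fix either.

```python
def classify_detailed_address(confidence, threshold=0.5):
    """Map a model confidence to a test result (steps S502 to S504)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be a probability in [0, 1]")
    # Assumption: "meets the threshold" means greater than or equal to it.
    return "true address" if confidence >= threshold else "false address"


high = classify_detailed_address(0.92)
low = classify_detailed_address(0.10)
```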
Because an existing binary classification model cannot exhaust the distribution of false addresses, it lacks the ability to discriminate false addresses that it was not trained on. To cover as many false-address patterns as possible, the number of negative samples in training is made far larger than the number of positive samples, which causes a sample imbalance problem. For example, in a practical application scenario, the detailed addresses that contain a geographic entity are limited, i.e. the number of positive samples is limited, whereas the detailed addresses that do not contain a geographic entity (negative samples) are unbounded: a user may enter any text, such as "haha", "fine" or "random". With far more negative inputs than positive inputs, the model becomes biased toward the negative class, and when positive and negative content appear together in one address, the model tends to judge the address to be a false address. Therefore, in an alternative embodiment of the present application, the address verification model is a fastText model whose loss function is the focal loss function.
In an alternative embodiment, the fastText model comprises three layers: an input layer (embedding layer), a hidden layer (projection layer), and an output layer (softmax layer). The specific method is as follows:
(1) Data construction: detailed addresses that contain a geographic entity are taken as positive samples, for example "Kechuang 11th Street MN Building"; detailed addresses that do not contain a geographic entity are taken as negative samples and are constructed from corpora such as news, novels, chat logs and classical texts, for example "phone contact". During data construction, the number of negative samples is significantly larger than the number of positive samples.
(2) Model entry: randomly initializing word vectors of training samples;
specifically, the random initialization process is:
step one: a word2id dictionary is built that randomly initializes a unique index number for each word in all samples, for example, if there are two training samples now, the positive sample "kechu eleven street MN building", the negative sample "phone contact", the dictionary can be built as { co: 0, creation: 1, ten: 2, one: 3, street: 4, M:5,N:6, big: 7, building: 8, electricity: 9, if so: 10, combination: 11, the system is: 12};
step two: initializing word vectors through an embellishing layer, firstly searching index numbers of each word in word2id in each training sample, for example, index numbers of 'families' are 0, and then mapping vectors with specified dimensions according to the index numbers of each word by the embellishing layer.
(3) Model structure optimization:
after the embedding layer, an attention mechanism is added that increases the weight of geographic vocabulary in the sample, so that the model focuses more on learning geographic entities. For example, in the positive sample "Kechuang 11th Street MN Building", the weight of "MN Building" is increased.
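The text does not specify the form of the attention mechanism. As one possible reading, the sketch below uses dot-product attention pooling: each token vector is scored against a query vector, the scores are normalized with softmax, and the pooled representation is the weighted sum. The query vector, the toy token vectors and the function names are assumptions for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Weight token vectors by softmax(dot(vector, query)) and sum them."""
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(query)
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors))
              for d in range(dim)]
    return weights, pooled


# Toy vectors: the last token stands in for a geographic term whose
# embedding aligns with the (in practice learned) query vector.
vectors = [[0.1, 0.0], [0.2, 0.1], [1.0, 0.9]]
query = [1.0, 1.0]
weights, pooled = attention_pool(vectors, query)
```

The token whose vector aligns best with the query receives the largest weight, which is the effect the text attributes to geographic vocabulary.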
(4) Model loss optimization:
step one: the softmax loss is replaced with the focal loss. Because this is a binary classification problem, the change essentially converts the binary cross-entropy loss (the special case of softmax with two classes) into the focal loss. The existing softmax loss is given by formula (one), and the embodiment of the invention uses formula (two):
cross-entropy loss: $L_{CE} = -y \log p - (1 - y) \log(1 - p)$ (one)
focal loss: $L_{FL} = -\alpha\, y\, (1 - p)^{\gamma} \log p - (1 - \alpha)(1 - y)\, p^{\gamma} \log(1 - p)$ (two)
where p is the predicted probability that the sample is positive, y is the label value of the sample, γ adjusts the weight of hard-to-classify samples, and α adjusts for the imbalance between positive and negative samples.
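The effect of the two losses can be compared numerically. The sketch below uses the commonly used binary focal loss form, which is assumed to correspond to formula (two); the default values gamma=2.0 and alpha=0.25 are illustrative and are not taken from the text.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for predicted positive probability p and label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: cross-entropy scaled by alpha and (1 - p_t)**gamma."""
    pos = -alpha * y * (1 - p) ** gamma * math.log(p)
    neg = -(1 - alpha) * (1 - y) * p ** gamma * math.log(1 - p)
    return pos + neg


easy = focal_loss(0.9, 1)  # positive sample the model already gets right
hard = focal_loss(0.1, 1)  # positive sample the model gets wrong
```

With gamma set to 0 and alpha set to 0.5 the focal loss reduces to half the cross-entropy, so the two parameters can be read as corrections on top of the original loss: gamma shrinks the contribution of easy samples and alpha rebalances the two classes.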
(5) Model hyperparameter tuning: the model hyperparameters, including the learning rate and the gamma and alpha values of the loss function, are adjusted, and the model is trained and validated.
By optimizing the loss function in the fastText model, namely replacing the binary cross-entropy loss function with the focal loss function, the imbalance between positive and negative samples can be addressed, the training bias that the imbalance introduces into the model is corrected, and a tendency to judge inputs as negative samples is avoided.
FIG. 6 is a schematic diagram of a main flow of a method for address verification according to an embodiment of the present invention. As shown in FIG. 6, the method includes: first, acquiring an address text to be verified; then performing entity recognition with a CRF model, splitting a four-level administrative division and a detailed address text from the address text to be verified, and judging the correctness of the four-level administrative division and of the detailed address text respectively. The correctness of the four-level administrative division is judged first, and the correctness of the detailed address text is judged only if that result is correct. This reduces the computation of the model that judges the detailed address text and improves efficiency.
To judge the correctness of the four-level administrative division, a lookup is performed against an administrative division library to determine whether the four-level administrative division is complete. If it is not complete, the address text to be verified is judged invalid, i.e. a false address. If it is complete, whether the cascade is correct is judged: if the cascade is incorrect, the address text to be verified is judged invalid, i.e. a false address; if the cascade is correct, the correctness of the detailed address text is judged.
To judge the detailed address text, the detailed address text is taken as the input of the improved fastText model, which determines whether it is a positive sample containing a geographic entity. If a geographic entity is included, the address text to be verified is considered valid, i.e. a true address; otherwise, the address text to be verified is considered invalid, i.e. a false address.
According to the address checking method, the detailed address text is split from the address text to be checked, both the cascade relation and the test result of the detailed address text are judged, and the checking result of the address to be checked is determined from the two results together. The complex long address is converted into a simple detailed address that serves as the model input, which reduces the burden of model computation and ensures processing speed. Meanwhile, judging the cascade relation ensures the accuracy of the checking result.
According to a second aspect of an embodiment of the present invention, there is provided an apparatus for address verification.
Fig. 7 is a schematic diagram of main modules of an apparatus 700 for address verification according to a second aspect of an embodiment of the invention. As shown in fig. 7, includes:
an obtaining module 701, configured to obtain an address to be checked;
the identifying module 702 is configured to determine one or more administrative address keywords, administrative information corresponding to the one or more administrative address keywords, and detailed address text from the address to be checked;
a checking module 703, configured to determine cascade relationships of the administrative information respectively; determining the test result of the detailed address text according to an address verification model; and determining the checking result of the address to be checked according to the cascading relation and/or the checking result of the detailed address text.
In an alternative embodiment of the present invention, the checking module 703 is further configured to determine that the checking result of the address to be checked is a false address in the case that the cascade relationship is incorrect.
In an alternative embodiment of the present invention, the verification module 703 is further configured to, in a case that the cascade relationship is correct, use the detailed address text as an input of the address verification model, and determine a verification result of the address to be verified according to an output of the address verification model.
In an alternative embodiment of the present invention, the checking module 703 is further configured to determine whether the amount of the administrative information is greater than or equal to a preset amount threshold; if yes, judging whether the cascade relation of the administrative information meets a preset administrative subordinate relation, and determining that the cascade relation is correct under the condition that the subordinate relation is met; determining that the cascade relationship is incorrect if the dependency relationship is not satisfied; if not, determining that the cascade relationship is incorrect.
In an alternative embodiment of the present invention, the checking module 703 is further configured to, after determining that the number of administrative information is greater than or equal to a preset number threshold, determine whether repeated administrative information exists in a plurality of administrative information before the determining whether the cascade relationship of the administrative information satisfies a preset administrative dependency relationship; if yes, deleting the repeated administrative information, and repeatedly executing the step of judging whether the number of the administrative information is larger than or equal to a preset number threshold; otherwise, judging whether the cascade relation of the administrative information meets the administrative subordinate relation or not.
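The count check and the duplicate removal described above form a small loop, sketched below. The function name `cascade_precheck` and the (name, level) tuple representation are hypothetical; the default threshold of 4 follows the embodiment in which the preset number threshold is 4.

```python
def cascade_precheck(admin_infos, threshold=4):
    """Return (ok, infos). ok is False when fewer than `threshold` items remain."""
    infos = list(admin_infos)
    while True:
        # Too few items: the cascade relation is determined to be incorrect.
        if len(infos) < threshold:
            return False, infos
        # Drop repeated items while keeping the original order.
        deduped = list(dict.fromkeys(infos))
        if len(deduped) == len(infos):
            # No repeats left: proceed to the subordinate-relation check.
            return True, infos
        # Repeats were removed: run the count check again.
        infos = deduped


full = [("AA Province", "province"), ("BB City", "city"),
        ("CC District", "district"), ("DD Town", "town")]
ok_with_repeat, cleaned = cascade_precheck(full + [("BB City", "city")])
ok_too_few, _ = cascade_precheck(full[:3] + [full[0]])
```

The second call starts with four items, but one is a repeat, so only three remain after removal and the precheck fails on the repeated count check.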
In an alternative embodiment of the present invention, the checking module 703 is further configured to, in a case where the address to be checked includes a plurality of identical administrative address keywords, for the target administrative information respectively corresponding to the plurality of identical administrative address keywords: determine a cascading relationship of the target administrative information according to other keywords adjacent to the administrative levels of the administrative address keywords in the address to be checked; and respectively determine whether the cascade relation of each piece of target administrative information meets the administrative subordinate relation.
In an alternative embodiment of the invention, the preset number threshold is 4.
In an alternative embodiment of the present invention, the checking module 703 is further configured to take the detailed address as an input of the address verification model and output a confidence level; judging whether the confidence coefficient meets a probability threshold value or not; if yes, determining the test result of the detailed address text as a true address; otherwise, determining the test result of the detailed address text as a false address.
In an alternative embodiment of the present invention, the address verification model is a fastText model, and the loss function in the fastText model is a focal loss function.
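The division of the apparatus into an obtaining module, an identifying module and a checking module can be sketched as three callables wired together. The class name `AddressCheckApparatus`, the toy " | " and ", " separators, and the keyword-based check are all assumptions for illustration; the actual identifying module would use entity recognition, and the actual checking module would use the cascade check and the address verification model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class AddressCheckApparatus:
    acquire: Callable[[], str]                        # obtaining module
    identify: Callable[[str], Tuple[List[str], str]]  # identifying module
    verify: Callable[[List[str], str], bool]          # checking module

    def run(self) -> bool:
        address = self.acquire()
        admin_infos, detailed_text = self.identify(address)
        return self.verify(admin_infos, detailed_text)


def toy_identify(address):
    """Toy splitter: administrative part and detailed text separated by ' | '."""
    admin_part, detailed_text = address.split(" | ")
    return admin_part.split(", "), detailed_text


def toy_verify(admin_infos, detailed_text):
    """Toy check: four administrative items and a building-like detailed text."""
    return len(admin_infos) >= 4 and "Building" in detailed_text


apparatus = AddressCheckApparatus(
    acquire=lambda: "AA Province, BB City, CC District, DD Town | MN Building",
    identify=toy_identify,
    verify=toy_verify,
)
is_true_address = apparatus.run()
```

Keeping the modules as separate callables mirrors the statement that the modules may be implemented in software or hardware and that their names do not limit them.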
Fig. 8 illustrates an exemplary system architecture 800 of a method of address verification or an apparatus of address verification to which embodiments of the invention may be applied.
As shown in fig. 8, a system architecture 800 may include terminal devices 801, 802, 803, a network 804, and a server 805. The network 804 serves as a medium for providing communication links between the terminal devices 801, 802, 803 and the server 805. The network 804 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user can interact with the server 805 through the network 804 using the terminal devices 801, 802, 803 to transmit a task execution request or receive response information of the request, or the like. Various communication client applications may be installed on the terminal devices 801, 802, 803, such as an online service application, a web browser application, a search class application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 801, 802, 803 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 805 may be a server providing various services, such as a background management server providing support for address verification requests sent by users using the terminal devices 801, 802, 803. The background management server may perform processing such as inspection analysis on the received data such as the address to be inspected, and feed back a processing result (for example, an inspection result of the address to be inspected) to the terminal device.
It should be noted that the method for address verification provided in the first aspect of the embodiment of the present invention is generally performed by the server 805, and accordingly, the device for address verification provided in the second aspect of the embodiment of the present invention is generally disposed in the server 805.
It should be understood that the number of terminal devices, networks and servers in fig. 8 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 9, there is illustrated a schematic diagram of a computer system 900 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 9 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU) 901, which can execute various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the system 900 are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output portion 907 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker, and the like; a storage portion 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on the drive 910 so that a computer program read out therefrom is installed into the storage section 908 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from the network via the communication portion 909 and/or installed from the removable medium 911. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 901.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, for example, as: a processor includes an acquisition module, an identification module, and a verification module. The names of these modules do not in any way limit the module itself, for example, the acquisition module can also be described as "module for acquiring the address to be checked".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist alone without being fitted into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform:
acquiring an address to be checked; determining one or more administrative address keywords, administrative information corresponding to the one or more administrative address keywords and detailed address text from the address to be checked; respectively determining cascading relations of the administrative information; determining the test result of the detailed address text according to an address verification model; and determining the checking result of the address to be checked according to the cascading relation and/or the checking result of the detailed address text.
According to the address checking method and device, detailed address text and administrative information are split from address text to be checked, the cascade relation of the administrative information and the checking result of the detailed address text are judged, and the checking result of the address to be checked is determined according to the common checking result of the two. Therefore, the complex long address is converted into a simple detailed address to be used as a model input, the burden of model operation is reduced, and the processing speed is ensured. And the accuracy of the verification result is ensured through the judgment of the cascade relation.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method of address verification, the method comprising:
acquiring an address to be checked;
determining one or more administrative address keywords, administrative information corresponding to the one or more administrative address keywords and detailed address text from the address to be checked;
respectively determining cascading relations of the administrative information;
determining the test result of the detailed address text according to an address verification model;
and determining the checking result of the address to be checked according to the cascading relation and/or the checking result of the detailed address text.
2. The method according to claim 1, wherein said determining the checking result of the address to be checked according to the cascading relation and/or the checking result of the detailed address text comprises:
Under the condition that the cascade relation is incorrect, determining that the checking result of the address to be checked is a false address;
and/or,
and under the condition that the cascade relation is correct, taking the detailed address text as the input of the address verification model, and determining the verification result of the address to be verified according to the output of the address verification model.
3. The method of claim 2, wherein said separately determining the cascading relationship of each of said administrative information comprises:
determining whether the number of the administrative information is greater than or equal to a preset number threshold;
if yes, judging whether the cascade relation of the administrative information meets a preset administrative subordinate relation, and determining that the cascade relation is correct under the condition that the subordinate relation is met; determining that the cascade relationship is incorrect if the dependency relationship is not satisfied;
if not, determining that the cascade relationship is incorrect.
4. The method of claim 3, wherein,
after determining that the number of the administrative information is greater than or equal to the preset number threshold, before determining whether the cascade relationship of the administrative information satisfies the preset administrative subordinate relationship, the method further includes:
Judging whether repeated administrative information exists in the administrative information or not;
if yes, deleting the repeated administrative information, and repeatedly executing the step of judging whether the number of the administrative information is larger than or equal to a preset number threshold;
otherwise, judging whether the cascade relation of the administrative information meets the administrative subordinate relation or not.
5. A method according to claim 3, wherein, in the case where a plurality of identical administrative address keywords are included in the address to be checked,
for the target administrative information respectively corresponding to the plurality of identical administrative address keywords: determining a cascading relationship of the target administrative information according to other keywords adjacent to the administrative levels of the administrative address keywords in the address to be checked;
and respectively determining whether the cascade relation of each piece of target administrative information meets the administrative subordinate relation.
6. The method of claim 1, wherein said determining a test result of said detailed address text according to an address verification model comprises:
taking the detailed address as the input of the address verification model, and outputting the confidence;
judging whether the confidence coefficient meets a probability threshold value or not; if yes, determining the test result of the detailed address text as a true address; otherwise, determining the test result of the detailed address text as a false address.
7. The method of claim 6, wherein the address verification model is a fastText model, and wherein the loss function in the fastText model is a focal loss function.
8. An apparatus for address verification, comprising:
the acquisition module is used for acquiring the address to be checked;
the identification module is used for determining one or more administrative address keywords, administrative information corresponding to the one or more administrative address keywords and detailed address texts from the address to be checked;
the checking module is used for respectively determining cascading relations of the administrative information; determining the test result of the detailed address text according to an address verification model; and determining the checking result of the address to be checked according to the cascading relation and/or the checking result of the detailed address text.
9. An apparatus for address verification, comprising: one or more processors;
a storage system for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-7.
CN202210535023.7A 2022-05-17 2022-05-17 Address checking method and device Pending CN117131861A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210535023.7A CN117131861A (en) 2022-05-17 2022-05-17 Address checking method and device

Publications (1)

Publication Number Publication Date
CN117131861A (en) 2023-11-28

Family

ID=88849486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210535023.7A Pending CN117131861A (en) 2022-05-17 2022-05-17 Address checking method and device

Country Status (1)

Country Link
CN (1) CN117131861A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination