CN114372094A - Query method and device for user repeated address - Google Patents


Publication number
CN114372094A
Authority
CN
China
Prior art keywords
address
vector
similarity
target
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111586439.3A
Other languages
Chinese (zh)
Inventor
陈庆良
张堉灵
林翰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202111586439.3A
Publication of CN114372094A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a method and a device for querying duplicate addresses of a user, which improve the efficiency of duplicate address queries. The method comprises: in response to a duplicate address query request sent by a user, determining a first address vector from the address information corresponding to the request; obtaining a first similarity between the first address vector and the central vector of each address category; determining a target address category corresponding to the first address vector according to the first similarities; obtaining at least one target address vector according to second similarities between the first address vector and each second address vector corresponding to the target address category, wherein any second address vector is a vector corresponding to pre-stored address information; and determining the target address corresponding to the target address vector as a duplicate address corresponding to the address information.

Description

Query method and device for user repeated address
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to a method and an apparatus for querying a user duplicate address.
Background
Because the same user address may be stored in the system under different address names, such duplicate addresses, which differ in name but refer to the same actual address, need to be found so that each duplicate can be replaced and updated.
In the prior art, the similarity between the vector of the address information input by a user and the vectors of all other address information in a database is calculated in order to determine which other addresses duplicate the input address information. However, comparing the input against every stored vector makes the lookup of duplicate addresses inefficient.
Disclosure of Invention
The exemplary embodiment of the disclosure provides a method and a device for querying a duplicate address of a user, which are used for improving the efficiency of querying the duplicate address.
A first aspect of the present disclosure provides a method for querying a user duplicate address, where the method includes:
in response to a repeated address query request sent by a user, determining a first address vector corresponding to address information based on the address information corresponding to the repeated address query request;
obtaining first similarity between the first address vector and the central vectors of the address classes respectively by using the first address vector and the central vectors of the address classes;
determining a target address category corresponding to the first address vector according to first similarity between the first address vector and the central vectors of the address categories;
obtaining at least one target address vector through second similarity between the first address vector and each second address vector corresponding to the target address category, wherein any one second address vector is a vector corresponding to pre-stored address information;
and determining a target address corresponding to the target address vector as a duplicate address corresponding to the address information.
In this embodiment, a target address category corresponding to the first address vector is determined according to the first similarities between the first address vector and the central vectors of the address categories, at least one target address vector is obtained according to the second similarities between the first address vector and the second address vectors corresponding to the target address category, and the target address corresponding to the target address vector is then determined as a duplicate address corresponding to the address information. Therefore, in this embodiment, second similarities need to be computed only between the first address vector and the second address vectors of the target address category in order to determine the duplicate address, which improves the query efficiency for duplicate addresses.
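To see where the saving comes from, the sketch below counts similarity computations for the exhaustive prior-art approach and for the two-stage approach. The function names and the example sizes (100,000 stored addresses, 100 categories, 1,200 vectors in the selected category) are illustrative assumptions, not figures from the disclosure.

```python
def brute_force_comparisons(num_stored: int) -> int:
    """Prior-art approach: compare the query against every stored vector."""
    return num_stored


def two_stage_comparisons(num_categories: int, target_category_size: int) -> int:
    """Two-stage approach: one comparison per category center, then one per
    stored vector inside the selected target category only."""
    return num_categories + target_category_size


# Hypothetical sizes chosen only for illustration.
print(brute_force_comparisons(100_000))   # 100000
print(two_stage_comparisons(100, 1_200))  # 1300
```

The actual saving depends on how many categories the clustering produces and how evenly the stored vectors are spread across them.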
In one embodiment, the obtaining at least one target address vector according to a second similarity between the first address vector and each second address vector corresponding to the target address category includes:
sorting the second similarities to obtain an order of the second similarities;
determining, from the order of the second similarities, each second target similarity that meets a specified condition;
for any one second target similarity, inputting the second address vector corresponding to the second target similarity and the first address vector into a pre-trained neural network to obtain a third similarity between the second address vector and the first address vector;
and if the third similarity is larger than a specified threshold value, determining the second address vector as the target address vector.
In this embodiment, the second similarities are sorted to obtain an order, the second target similarities meeting the specified condition are determined from that order, and then, for any one second target similarity, the second address vector corresponding to the second target similarity and the first address vector are input into a pre-trained neural network to obtain a third similarity between the second address vector and the first address vector; if the third similarity is larger than a specified threshold, the second address vector is determined as the target address vector. Because the target address vector is determined through two rounds of similarity calculation, the determination of the target address vector is more accurate.
In one embodiment, the determining, according to a first similarity between the first address vector and the central vectors of the address classes, a target address class corresponding to the first address vector includes:
determining a central vector corresponding to the highest similarity from the first similarities; and
and determining the address category corresponding to the central vector as the target address category.
In this embodiment, a central vector corresponding to the highest similarity is determined from the first similarities; and determining the address category corresponding to the central vector as the target address category. Therefore, the accuracy of the target address category is improved.
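Selecting the target address category amounts to taking the arg-max over the first similarities. A minimal sketch follows; the function name and the similarity values are hypothetical and serve only as an illustration.

```python
def pick_target_category(first_similarities: dict[int, float]) -> int:
    """Return the category id whose center vector has the highest first similarity."""
    if not first_similarities:
        raise ValueError("no address categories to choose from")
    return max(first_similarities, key=first_similarities.get)


# Hypothetical first similarities for five address categories.
scores = {1: 0.12, 2: 0.35, 3: 0.91, 4: 0.40, 5: 0.08}
print(pick_target_category(scores))  # 3
```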
In one embodiment, each address class is determined by:
clustering second address vectors corresponding to pre-stored address information by using a preset clustering algorithm to obtain address sets and second address vectors respectively corresponding to the address sets;
sequentially distributing category identifications for each address set, and determining each category identification as each address category;
and for any address class, determining the second address vectors corresponding to a target address set as the second address vectors corresponding to the address class, wherein the target address set is the address set corresponding to the address class.
In this embodiment, the second address vectors corresponding to pre-stored address information are clustered by a preset clustering algorithm to obtain address sets and the second address vectors corresponding to each address set; category identifications are then allocated to the address sets in sequence, and each category identification is determined as an address category; finally, for any address category, the second address vectors corresponding to the target address set are determined as the second address vectors corresponding to that address category. Therefore, the query efficiency is further improved.
In one embodiment, the obtaining, by using the first address vector and the central vectors of the address classes, first similarities between the first address vector and the central vectors of the address classes respectively includes:
for the central vector of any address category, multiplying the first address vector by the central vector of the address category to obtain a first similarity between the first address vector and the central vector of the address category.
In this embodiment, for a center vector of any address category, the first address vector is multiplied by the center vector of the address category to obtain a first similarity between the first address vector and the center vector of the address category. Therefore, the first similarity can be determined more accurately.
A second aspect of the present disclosure provides an apparatus for querying a user duplicate address, the apparatus comprising:
the first address vector determining module is used for responding to a repeated address query request sent by a user and determining a first address vector corresponding to address information based on the address information corresponding to the repeated address query request;
a first similarity determining module, configured to obtain first similarities between the first address vector and the central vectors of the address classes respectively by using the first address vector and the central vectors of the address classes;
a target address category determining module, configured to determine a target address category corresponding to the first address vector according to a first similarity between the first address vector and a central vector of each address category;
a target address vector determining module, configured to obtain at least one target address vector according to a second similarity between the first address vector and each second address vector corresponding to the target address category, where any one second address vector is a vector corresponding to pre-stored address information;
and the repeated address determining module is used for determining the target address corresponding to the target address vector as the repeated address corresponding to the address information.
In one embodiment, the target address vector determination module is specifically configured to:
sorting the second similarities to obtain an order of the second similarities;
determining, from the order of the second similarities, each second target similarity that meets a specified condition;
for any one second target similarity, inputting the second address vector corresponding to the second target similarity and the first address vector into a pre-trained neural network to obtain a third similarity between the second address vector and the first address vector;
and if the third similarity is larger than a specified threshold value, determining the second address vector as the target address vector.
In one embodiment, the target address category determining module is specifically configured to:
determining a central vector corresponding to the highest similarity from the first similarities; and
and determining the address category corresponding to the central vector as the target address category.
In one embodiment, the apparatus further comprises:
an address class determination module for determining each address class by:
clustering second address vectors corresponding to pre-stored address information by using a preset clustering algorithm to obtain address sets and second address vectors respectively corresponding to the address sets;
sequentially distributing category identifications for each address set, and determining each category identification as each address category;
and for any address class, determining the second address vectors corresponding to a target address set as the second address vectors corresponding to the address class, wherein the target address set is the address set corresponding to the address class.
In an embodiment, the first similarity determination module is specifically configured to:
for the central vector of any address category, multiplying the first address vector by the central vector of the address category to obtain a first similarity between the first address vector and the central vector of the address category.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions for execution by the at least one processor; the instructions are executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to a fourth aspect provided by an embodiment of the present disclosure, there is provided a computer storage medium storing a computer program for executing the method according to the first aspect.
Drawings
In order to illustrate the technical solutions in the embodiments of the present disclosure more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present disclosure, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of an application scenario according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a query method for a user duplicate address according to an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of determining address classes according to an embodiment of the present disclosure;
FIG. 4 is a second flowchart of a query method for a user duplicate address according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a query device for a user duplicate address according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
The term "and/or" in the embodiments of the present disclosure describes an association between associated objects and covers three relationships. For example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
The application scenario described in the embodiments of the present disclosure is intended to illustrate the technical solution more clearly and does not limit the technical solution provided in the embodiments of the present disclosure. As a person having ordinary skill in the art knows, as new application scenarios emerge, the technical solution provided in the embodiments of the present disclosure is equally applicable to similar technical problems. In the description of the present disclosure, the term "plurality" means two or more unless otherwise specified.
In the prior art, the similarity between the vector of the address information input by a user and the vectors of all other address information in a database is calculated in order to determine which other addresses duplicate the input address information. However, comparing the input against every stored vector makes the lookup of duplicate addresses inefficient.
Therefore, the present disclosure provides a query method for a user duplicate address, which determines a target address category corresponding to a first address vector according to the first similarities between the first address vector and the central vectors of the address categories, obtains at least one target address vector according to the second similarities between the first address vector and the second address vectors corresponding to the target address category, and then determines the target address corresponding to the target address vector as a duplicate address corresponding to the address information. Second similarities therefore need to be computed only between the first address vector and the second address vectors of the target address category in order to determine the duplicate address, which improves the query efficiency for duplicate addresses. The embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
As shown in fig. 1, an application scenario of the query method for the user duplicate address includes a plurality of terminal devices 110 and a server 120, and three terminal devices 110 are taken as an example in fig. 1, and the number of terminal devices 110 is not limited in practice. The terminal device 110 may be a mobile phone, a tablet computer, a personal computer, and the like. The server 120 may be implemented by a single server or may be implemented by a plurality of servers. The server 120 may be implemented by a physical server or may be implemented by a virtual server.
In one possible application scenario, the server 120, in response to a duplicate address query request sent by a user, determines, based on address information corresponding to the duplicate address query request, a first address vector corresponding to the address information; obtaining first similarity between the first address vector and the central vectors of the address classes respectively by using the first address vector and the central vectors of the address classes; then, the server 120 determines a target address category corresponding to the first address vector according to first similarities between the first address vector and the central vectors of the address categories, and obtains at least one target address vector according to second similarities between the first address vector and second address vectors corresponding to the target address category, where any one of the second address vectors is a vector corresponding to pre-stored address information; finally, the server 120 determines the target address corresponding to the target address vector as the duplicate address corresponding to the address information, and sends the duplicate address corresponding to the address information to the terminal device 110 for display.
As shown in FIG. 2, the query method for the user duplicate address of the present disclosure may include the following steps:
Step 201: in response to a duplicate address query request sent by a user, determining a first address vector corresponding to address information based on the address information corresponding to the duplicate address query request;
wherein the duplicate address query request includes the address information.
In this embodiment, the first address vector is determined by an ALBERT model. The ALBERT model used here is obtained by adding a dropout layer after the last fully connected layer and a further fully connected layer after the dropout layer, which helps prevent the ALBERT model from overfitting.
It should be noted that determining the first address vector through the ALBERT model in this embodiment is only an illustration; the model used to determine the first address vector may be set according to the actual situation and is not limited herein.
Step 202: obtaining first similarity between the first address vector and the central vectors of the address classes respectively by using the first address vector and the central vectors of the address classes;
in one embodiment, the first similarities between the first address vector and the central vectors of the address classes are determined as follows:
for the central vector of any address category, the first address vector is multiplied by the central vector of the address category to obtain the first similarity between the first address vector and that central vector. The first similarity may be determined by equation (1):
$$S_1 = \vec{a} \cdot \vec{c} \tag{1}$$
where $S_1$ is the first similarity, $\vec{a}$ is the first address vector, and $\vec{c}$ is the center vector of the address class.
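The multiplication in step 202 can be read as an inner product of the two vectors. The sketch below takes that reading as an assumption; the function names and the sample vectors are hypothetical. Note that an inner product equals the cosine similarity only when both vectors are normalized to unit length, which the text does not state.

```python
def dot_similarity(first_vector: list[float], center_vector: list[float]) -> float:
    """Multiply two equal-length vectors element-wise and sum the products."""
    if len(first_vector) != len(center_vector):
        raise ValueError("vectors must have the same dimension")
    return sum(a * c for a, c in zip(first_vector, center_vector))


def first_similarities(first_vector: list[float],
                       centers: dict[int, list[float]]) -> dict[int, float]:
    """Map each category id to the first similarity with its center vector."""
    return {cat: dot_similarity(first_vector, c) for cat, c in centers.items()}


# Hypothetical two-dimensional vectors for illustration.
print(first_similarities([1.0, 0.0], {1: [0.0, 1.0], 2: [1.0, 0.0]}))
# {1: 0.0, 2: 1.0}
```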
Step 203: determining a target address category corresponding to the first address vector according to first similarity between the first address vector and the central vectors of the address categories;
in one embodiment, the target address class corresponding to the first address vector is determined by: determining a central vector corresponding to the highest similarity from the first similarities; and determining the address category corresponding to the central vector as the target address category.
For example, the address categories include: address class 1, address class 2, address class 3, address class 4, and address class 5. If the first similarity between the first address vector and the central vector of the address class 3 is the highest, determining the address class 3 as the target address class of the first address vector.
The central vector of any address category is obtained based on the second address vectors in that address category.
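The disclosure does not say how the central vector is derived from the second address vectors. One common choice, assumed here purely for illustration, is the element-wise mean of the vectors in the category, as used for centroids in k-means clustering. The function name is hypothetical.

```python
def center_vector(second_vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of the second address vectors in one category."""
    if not second_vectors:
        raise ValueError("a category needs at least one vector")
    dim = len(second_vectors[0])
    if any(len(v) != dim for v in second_vectors):
        raise ValueError("all vectors must have the same dimension")
    count = len(second_vectors)
    return [sum(v[i] for v in second_vectors) / count for i in range(dim)]


print(center_vector([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```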
Step 204: obtaining at least one target address vector through second similarity between the first address vector and each second address vector corresponding to the target address category, wherein any one second address vector is a vector corresponding to pre-stored address information;
in one embodiment, the at least one target address vector is determined by:
sorting the second similarities to obtain an order of the second similarities; determining, from the order of the second similarities, each second target similarity that meets a specified condition; for any one second target similarity, inputting the second address vector corresponding to the second target similarity and the first address vector into a pre-trained neural network to obtain a third similarity between the second address vector and the first address vector; and if the third similarity is larger than a specified threshold value, determining the second address vector as the target address vector.
If the second similarities are sorted from large to small, the second similarities meeting the specified condition are those in the leading positions of the order, up to a first specified number. If the second similarities are sorted from small to large, they are those in the trailing positions of the order, again up to the first specified number.
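Either sort direction selects the same candidates, as the following sketch illustrates. The function name, the identifiers `v1` to `v4`, and the similarity values are hypothetical.

```python
def select_second_targets(second_similarities: dict[str, float], count: int,
                          descending: bool = True) -> list[str]:
    """Return ids of the `count` stored vectors with the highest second similarity."""
    if count <= 0:
        return []
    ordered = sorted(second_similarities, key=second_similarities.get,
                     reverse=descending)
    # Descending order: the best candidates sit at the head of the list.
    # Ascending order: the best candidates sit at the tail of the list.
    return ordered[:count] if descending else ordered[-count:]


sims = {"v1": 0.2, "v2": 0.9, "v3": 0.5, "v4": 0.7}
print(select_second_targets(sims, 2, descending=True))   # ['v2', 'v4']
print(select_second_targets(sims, 2, descending=False))  # ['v4', 'v2']
```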
For example, the second address vectors corresponding to the second target similarities are address vector 1, address vector 2, address vector 3, address vector 4, and address vector 5. Suppose the pre-trained neural network yields a similarity with the first address vector of 50% for address vector 1, 45% for address vector 2, 90% for address vector 3, 88% for address vector 4, and 40% for address vector 5. Taking a specified threshold of 80% as an example, the target address vectors are determined to be address vector 3 and address vector 4.
It should be noted that: the designated threshold in this embodiment may be set according to actual conditions, and the specific value of the designated threshold is not limited in this embodiment. The neural network trained in advance in this embodiment is the aforementioned ALBERT model.
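The threshold step in the example can be reproduced as follows. The third similarities are taken from the example and written as fractions; the neural network that produces them is outside the scope of this sketch, and the function name is hypothetical.

```python
def filter_by_threshold(third_similarities: dict[str, float],
                        threshold: float) -> list[str]:
    """Keep candidates whose third similarity is strictly greater than the threshold."""
    return [name for name, score in third_similarities.items() if score > threshold]


# Third similarities from the example, expressed as fractions.
third = {
    "address vector 1": 0.50,
    "address vector 2": 0.45,
    "address vector 3": 0.90,
    "address vector 4": 0.88,
    "address vector 5": 0.40,
}
print(filter_by_threshold(third, 0.80))
# ['address vector 3', 'address vector 4']
```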
Step 205: and determining a target address corresponding to the target address vector as a duplicate address corresponding to the address information.
For example, address vector 3 and address vector 4 are target address vectors. If the target address corresponding to address vector 3 is address 3 and the target address corresponding to address vector 4 is address 4, then address 3 and address 4 are determined to be duplicate addresses corresponding to the address information.
In order to determine the address categories more accurately, in one embodiment, as shown in FIG. 3, determining the address categories may include the following steps:
Step 301: clustering second address vectors corresponding to pre-stored address information by using a preset clustering algorithm to obtain address sets and second address vectors respectively corresponding to the address sets;
The preset clustering algorithm in this embodiment may be set according to the actual situation and is not limited herein.
Step 302: sequentially distributing category identifications for each address set, and determining each category identification as each address category;
For example, suppose the address sets obtained by clustering are address set A, address set B, address set C, and address set D. After category identifications are allocated, the address sets are, in order, address set 1, address set 2, address set 3, and address set 4. Then address set 1 is determined to be address class 1, address set 2 is determined to be address class 2, address set 3 is determined to be address class 3, and address set 4 is determined to be address class 4.
Step 303: for any address class, determining the second address vectors corresponding to a target address set as the second address vectors corresponding to the address class, wherein the target address set is the address set corresponding to the address class.
For example, if the second address vectors corresponding to address set 1 are vector 1 and vector 2, the second address vectors corresponding to address class 1 are determined to be vector 1 and vector 2.
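Steps 302 and 303 can be sketched as building an index from sequential category identifiers to the vectors of each clustered set. The clustering itself (step 301) is assumed to have already produced the sets; the function name, the set labels, and the vector names beyond "vector 1" and "vector 2" are hypothetical.

```python
def build_category_index(address_sets: dict[str, list[str]]) -> dict[int, list[str]]:
    """Assign identifiers 1, 2, ... to the clustered address sets, in order,
    and map each resulting address category to its second address vectors."""
    return {
        category_id: list(vectors)
        for category_id, vectors in enumerate(address_sets.values(), start=1)
    }


# Hypothetical clustering output: set label -> names of its second address vectors.
clustered = {
    "A": ["vector 1", "vector 2"],
    "B": ["vector 3"],
    "C": ["vector 4"],
    "D": ["vector 5"],
}
index = build_category_index(clustered)
print(index[1])       # ['vector 1', 'vector 2']
print(sorted(index))  # [1, 2, 3, 4]
```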
To aid understanding of the technical solution of the present disclosure, the method is described in detail below with reference to FIG. 4 and may include the following steps:
Step 401: in response to a duplicate address query request sent by a user, determining a first address vector corresponding to address information based on the address information corresponding to the duplicate address query request;
Step 402: obtaining first similarities between the first address vector and the central vectors of the address classes respectively by using the first address vector and the central vectors of the address classes;
Step 403: determining a central vector corresponding to the highest similarity from the first similarities;
Step 404: determining the address category corresponding to the central vector as the target address category;
Step 405: sorting the second similarities to obtain an order of the second similarities;
Step 406: determining, from the order of the second similarities, each second target similarity that meets a specified condition;
Step 407: for any one second target similarity, inputting the second address vector corresponding to the second target similarity and the first address vector into a pre-trained neural network to obtain a third similarity between the second address vector and the first address vector; if the third similarity is greater than a specified threshold, determining the second address vector as the target address vector;
Step 408: determining a target address corresponding to the target address vector as a duplicate address corresponding to the address information.
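The flow of FIG. 4 can be sketched end to end as below, under several assumptions that are not from the disclosure: the query is already embedded (step 401 is done elsewhere), both the first and second similarities are inner products, and the pre-trained neural network is represented by a caller-supplied function `third_similarity`. All names and sample data are hypothetical.

```python
from typing import Callable

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product, used here as a stand-in similarity measure."""
    return sum(x * y for x, y in zip(a, b))


def find_duplicates(
    first_vector: Vector,
    centers: dict[int, Vector],
    category_members: dict[int, dict[str, Vector]],
    third_similarity: Callable[[Vector, Vector], float],
    top_k: int,
    threshold: float,
) -> list[str]:
    """Steps 402 to 408 for an already-embedded query."""
    # Steps 402-404: pick the category whose center is most similar to the query.
    target = max(centers, key=lambda cat: dot(first_vector, centers[cat]))
    members = category_members[target]
    # Steps 405-406: rank the stored vectors of that category, keep the top_k.
    ranked = sorted(members, key=lambda addr: dot(first_vector, members[addr]),
                    reverse=True)
    candidates = ranked[:max(top_k, 0)]
    # Step 407: re-score each candidate and apply the threshold.
    # Step 408: the surviving addresses are reported as duplicates.
    return [addr for addr in candidates
            if third_similarity(first_vector, members[addr]) > threshold]


centers = {1: [1.0, 0.0], 2: [0.0, 1.0]}
category_members = {
    1: {"addr A": [1.0, 0.0], "addr B": [0.5, 0.5], "addr C": [0.25, 0.75]},
    2: {"addr D": [0.0, 1.0]},
}
# The inner product stands in for the neural network in this toy run.
print(find_duplicates([1.0, 0.0], centers, category_members,
                      third_similarity=dot, top_k=2, threshold=0.75))
# ['addr A']
```

In this toy run only category 1 is searched, so "addr D" is never compared against the query.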
Based on the same inventive concept, the query method for the user duplicate address described above can also be implemented by a query device for the user duplicate address. The effect of the query device is similar to that of the method and is not repeated herein.
FIG. 5 is a schematic structural diagram of a query device for a user duplicate address according to an embodiment of the present disclosure.
As shown in fig. 5, the apparatus 500 for querying a duplicate address of a user of the present disclosure may include a first address vector determination module 510, a first similarity determination module 520, a target address category determination module 530, a target address vector determination module 540, and a duplicate address determination module 550.
a first address vector determination module 510, configured to determine, in response to a duplicate address query request sent by a user, a first address vector corresponding to address information based on the address information corresponding to the duplicate address query request;
a first similarity determination module 520, configured to obtain a first similarity between the first address vector and the central vector of each address category by using the first address vector and the central vectors of the address categories;
a target address category determination module 530, configured to determine, according to the first similarities between the first address vector and the central vectors of the address categories, a target address category corresponding to the first address vector;
a target address vector determination module 540, configured to obtain at least one target address vector according to second similarities between the first address vector and each second address vector corresponding to the target address category, where any one second address vector is a vector corresponding to pre-stored address information;
a duplicate address determination module 550, configured to determine a target address corresponding to the target address vector as a duplicate address corresponding to the address information.
In an embodiment, the target address vector determining module 540 is specifically configured to:
sort the second similarities to obtain an ordering of the second similarities;
determine, from the ordering of the second similarities, each second target similarity that meets the specified condition;
for any second target similarity, input the second address vector corresponding to the second target similarity and the first address vector into a pre-trained neural network to obtain a third similarity between the second address vector and the first address vector;
and if the third similarity is greater than a specified threshold, determine the second address vector as a target address vector.
In an embodiment, the target address category determining module 530 is specifically configured to:
determine the central vector corresponding to the highest similarity among the first similarities; and
determine the address category corresponding to that central vector as the target address category.
In one embodiment, the apparatus further comprises:
an address class determination module 560, configured to determine each address class by:
clustering the second address vectors corresponding to the pre-stored address information by using a preset clustering algorithm to obtain address sets and the second address vectors corresponding to each address set;
assigning a category identifier to each address set in sequence, and taking each category identifier as an address category;
and, for any address category, determining the second address vectors corresponding to a target address set as the second address vectors corresponding to that address category, where the target address set is the address set corresponding to that address category.
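The category-building procedure above can be illustrated with the following Python sketch. The text only refers to "a preset clustering algorithm", so the use of plain k-means here is an assumption, as are the function names, the choice of the first k vectors as initial centers, and the fixed iteration count. Category identifiers are assigned in sequence starting from 1, and the mean of each address set is taken as its central vector.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_address_categories(
    vectors: List[Vector], k: int, iterations: int = 10
) -> Tuple[Dict[int, List[Vector]], Dict[int, List[float]]]:
    """Cluster stored address vectors into k sets (k-means, an assumed
    choice) and label the sets 1..k as address categories."""
    # Assumed initialisation: the first k vectors act as starting centers.
    centers = [list(v) for v in vectors[:k]]
    groups: List[List[Vector]] = [[] for _ in range(k)]

    for _ in range(iterations):
        # Assign every vector to its nearest current center.
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: squared_distance(v, centers[i]))
            groups[nearest].append(v)
        # Recompute each center; an empty set keeps its previous center.
        centers = [
            mean_vector(g) if g else centers[i] for i, g in enumerate(groups)
        ]

    # Sequential category identifiers: address set i becomes category i + 1.
    members = {i + 1: groups[i] for i in range(k)}
    central = {i + 1: centers[i] for i in range(k)}
    return members, central
```

Because this step only depends on the pre-stored addresses, it can be run offline, and only the central vectors and per-category members need to be available at query time.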
In an embodiment, the first similarity determining module 520 is specifically configured to:
for the central vector of any address category, multiply the first address vector by the central vector of that address category to obtain the first similarity between the first address vector and the central vector of that address category.
Having described a method and an apparatus for querying a user duplicate address according to exemplary embodiments of the present disclosure, an electronic device according to another exemplary embodiment of the present disclosure is described next.
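As an illustration of the first-similarity computation, the sketch below reads "multiplying" the two vectors as taking their inner product, which is an assumption. It also L2-normalises both vectors first, which the text does not require; with that extra step the inner product equals the cosine similarity and lies in the range -1 to 1. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def first_similarities(
    query_vec: Vector, centers: Dict[int, Vector]
) -> Dict[int, float]:
    """Inner product of the normalised query vector with each normalised
    central vector, keyed by address category."""
    q = l2_normalize(query_vec)
    return {
        category: sum(a * b for a, b in zip(q, l2_normalize(center)))
        for category, center in centers.items()
    }
```

For instance, a query vector of [3, 4] scores close to 1 against a central vector pointing in the same direction and close to 0 against an orthogonal one such as [-4, 3].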
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."
In some possible implementations, an electronic device in accordance with the present disclosure may include at least one processor and at least one computer storage medium. The computer storage medium stores program code which, when executed by the processor, causes the processor to perform the steps of the user duplicate address query method according to the various exemplary embodiments of the present disclosure described above in this specification. For example, the processor may perform steps 201 to 205 as shown in FIG. 2.
An electronic device 600 according to this embodiment of the disclosure is described below with reference to fig. 6. The electronic device 600 shown in fig. 6 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 is represented in the form of a general electronic device. The components of the electronic device 600 may include, but are not limited to: the at least one processor 601, the at least one computer storage medium 602, and the bus 603 that connects the various system components (including the computer storage medium 602 and the processor 601).
Bus 603 represents one or more of any of several types of bus structures, including a computer storage media bus or computer storage media controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
The computer storage medium 602 may include readable media in the form of volatile memory, such as random access memory (RAM) 621 and/or cache memory 622, and may further include read-only memory (ROM) 623.
The computer storage medium 602 may also include a program/utility 625 having a set (at least one) of program modules 624, such program modules 624 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 600 may also communicate with one or more external devices 604 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other electronic devices. Such communication may occur via input/output (I/O) interfaces 605. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 606. As shown, the network adapter 606 communicates with the other modules for the electronic device 600 over the bus 603. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In some possible embodiments, various aspects of a user duplicate address query method provided by the present disclosure may also be implemented in the form of a program product including program code for causing a computer device to perform the steps in the user duplicate address query method according to various exemplary embodiments of the present disclosure described above in this specification when the program product is run on the computer device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of the user repeat address query of the embodiments of the present disclosure may employ a portable compact disc read-only computer storage medium (CD-ROM) and include program code, and may be run on an electronic device. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's electronic device, partly on the user's electronic device, as a stand-alone software package, partly on the user's electronic device and partly on a remote electronic device, or entirely on the remote electronic device or server. In the case of a remote electronic device, the remote electronic device may be connected to the user's electronic device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external electronic device (for example, through the internet using an internet service provider).
It should be noted that although several modules of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the modules described above may be embodied in one module, in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module described above may be further divided among, and embodied by, a plurality of modules.
Further, while the operations of the disclosed methods are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk computer storage media, CD-ROMs, optical computer storage media, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the present disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable computer storage medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable computer storage medium produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications can be made in the present disclosure without departing from the spirit and scope of the disclosure. Thus, if such modifications and variations of the present disclosure fall within the scope of the claims of the present disclosure and their equivalents, the present disclosure is intended to include such modifications and variations as well.

Claims (12)

1. A query method for a user duplicate address, characterized by comprising the following steps:
in response to a repeated address query request sent by a user, determining a first address vector corresponding to address information based on the address information corresponding to the repeated address query request;
obtaining a first similarity between the first address vector and the central vector of each address category by using the first address vector and the central vectors of the address categories;
determining a target address category corresponding to the first address vector according to the first similarities between the first address vector and the central vectors of the address categories;
obtaining at least one target address vector according to second similarities between the first address vector and each second address vector corresponding to the target address category, wherein any one second address vector is a vector corresponding to pre-stored address information;
and determining a target address corresponding to the target address vector as a duplicate address corresponding to the address information.
2. The method of claim 1, wherein obtaining at least one target address vector according to the second similarities between the first address vector and each second address vector corresponding to the target address category comprises:
sorting the second similarities to obtain an ordering of the second similarities;
determining, from the ordering of the second similarities, each second target similarity that meets a specified condition;
for any second target similarity, inputting the second address vector corresponding to the second target similarity and the first address vector into a pre-trained neural network to obtain a third similarity between the second address vector and the first address vector;
and if the third similarity is greater than a specified threshold, determining the second address vector as the target address vector.
3. The method according to claim 1, wherein determining the target address category corresponding to the first address vector according to the first similarities between the first address vector and the central vectors of the address categories comprises:
determining the central vector corresponding to the highest similarity among the first similarities; and
determining the address category corresponding to that central vector as the target address category.
4. The method of claim 1, wherein each address category is determined by:
clustering the second address vectors corresponding to the pre-stored address information by using a preset clustering algorithm to obtain address sets and the second address vectors corresponding to each address set;
assigning a category identifier to each address set in sequence, and taking each category identifier as an address category;
and, for any address category, determining the second address vectors corresponding to a target address set as the second address vectors corresponding to that address category, wherein the target address set is the address set corresponding to that address category.
5. The method according to claim 1, wherein obtaining the first similarity between the first address vector and the central vector of each address category by using the first address vector and the central vectors of the address categories comprises:
for the central vector of any address category, multiplying the first address vector by the central vector of that address category to obtain the first similarity between the first address vector and the central vector of that address category.
6. An apparatus for querying a duplicate address of a user, the apparatus comprising:
a first address vector determining module, configured to determine, in response to a repeated address query request sent by a user, a first address vector corresponding to address information based on the address information corresponding to the repeated address query request;
a first similarity determining module, configured to obtain a first similarity between the first address vector and the central vector of each address category by using the first address vector and the central vectors of the address categories;
a target address category determining module, configured to determine a target address category corresponding to the first address vector according to the first similarities between the first address vector and the central vectors of the address categories;
a target address vector determining module, configured to obtain at least one target address vector according to second similarities between the first address vector and each second address vector corresponding to the target address category, where any one second address vector is a vector corresponding to pre-stored address information;
and a duplicate address determining module, configured to determine a target address corresponding to the target address vector as a duplicate address corresponding to the address information.
7. The apparatus of claim 6, wherein the target address vector determination module is specifically configured to:
sort the second similarities to obtain an ordering of the second similarities;
determine, from the ordering of the second similarities, each second target similarity that meets a specified condition;
for any second target similarity, input the second address vector corresponding to the second target similarity and the first address vector into a pre-trained neural network to obtain a third similarity between the second address vector and the first address vector;
and if the third similarity is greater than a specified threshold, determine the second address vector as the target address vector.
8. The apparatus of claim 6, wherein the target address category determining module is specifically configured to:
determine the central vector corresponding to the highest similarity among the first similarities; and
determine the address category corresponding to that central vector as the target address category.
9. The apparatus of claim 6, further comprising:
an address category determining module, configured to determine each address category by:
clustering the second address vectors corresponding to the pre-stored address information by using a preset clustering algorithm to obtain address sets and the second address vectors corresponding to each address set;
assigning a category identifier to each address set in sequence, and taking each category identifier as an address category;
and, for any address category, determining the second address vectors corresponding to a target address set as the second address vectors corresponding to that address category, where the target address set is the address set corresponding to that address category.
10. The apparatus of claim 6, wherein the first similarity determination module is specifically configured to:
for the central vector of any address category, multiply the first address vector by the central vector of that address category to obtain the first similarity between the first address vector and the central vector of that address category.
11. An electronic device comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions for execution by the at least one processor; the instructions are executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A computer storage medium, characterized in that the computer storage medium stores a computer program for performing the method according to any one of claims 1-5.
CN202111586439.3A 2021-12-23 2021-12-23 Query method and device for user repeated address Pending CN114372094A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111586439.3A CN114372094A (en) 2021-12-23 2021-12-23 Query method and device for user repeated address

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111586439.3A CN114372094A (en) 2021-12-23 2021-12-23 Query method and device for user repeated address

Publications (1)

Publication Number Publication Date
CN114372094A true CN114372094A (en) 2022-04-19

Family

ID=81140692

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111586439.3A Pending CN114372094A (en) 2021-12-23 2021-12-23 Query method and device for user repeated address

Country Status (1)

Country Link
CN (1) CN114372094A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination