WO2022152018A1 - Method and device for identifying multiple accounts belonging to the same person - Google Patents


Info

Publication number
WO2022152018A1
Authority
WO
WIPO (PCT)
Prior art keywords
account
accounts
pair
information
account pair
Application number
PCT/CN2022/070277
Other languages
French (fr)
Chinese (zh)
Inventor
李佳璐
Original Assignee
北京沃东天骏信息技术有限公司
北京京东世纪贸易有限公司
Application filed by 北京沃东天骏信息技术有限公司 and 北京京东世纪贸易有限公司
Publication of WO2022152018A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/908 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the embodiments of the present application relate to the field of computer technologies, and in particular, to a method and an apparatus for identifying one person with multiple accounts.
  • on e-commerce platforms, it is very common for the same person to have multiple accounts. For example, in addition to a personal account on the JD.com platform, a user may also have a corporate account for conducting corporate business. As another example, a user may have registered multiple personal accounts on different devices.
  • the embodiments of the present application propose a method and device for identifying one person with multiple accounts.
  • the embodiments of the present application provide a method for identifying one person with multiple accounts, including: acquiring the user information and the involved identification information corresponding to the accounts in an account pair; determining whether the accounts in the account pair satisfy preset association information, where the preset association information is used to represent whether the accounts in the account pair may belong to the same person; in response to determining that the accounts in the account pair satisfy the preset association information, determining whether the accounts in the account pair satisfy a preset judgment condition, where the preset judgment condition is used to determine whether the accounts in the account pair belong to the same person; in response to determining that the accounts in the account pair do not satisfy the preset judgment condition, processing the user information and identification information of the accounts in the account pair to obtain a feature vector corresponding to the account pair; and inputting the feature vector into a pre-trained identification model to determine whether the accounts in the account pair belong to the same person.
  • the preset judgment condition includes a first judgment condition and a second judgment condition
  • the first judgment condition is used to determine that the accounts in the account pair belong to the same person
  • the second judgment condition is used to determine that the accounts in the account pair do not belong to the same person.
  • processing the user information and identification information of the accounts in the account pair to obtain the feature vector corresponding to the account pair includes: processing each item of corresponding information between the accounts in the account pair to obtain a sub-feature vector corresponding to that item; and splicing (concatenating) the sub-feature vectors to obtain the feature vector.
  • the user information includes user portrait information, consumption habit information, and receipt information; processing the corresponding item information between the accounts in the account pair to obtain the sub-feature vector corresponding to each item includes: determining, according to the user information of the accounts in the account pair, a sub-feature vector representing the similarity of each item of corresponding information between the accounts.
  • processing the corresponding item information between the accounts in the account pair to obtain the sub-feature vector corresponding to each item includes performing the following operations for each type of identification information: determining, according to this type of identification information of the accounts in the account pair, a sub-feature vector representing the number of intersection identification information items (identifiers of this type involved in both accounts); for each account in the account pair, determining, according to the number of associations between the account and the intersection identification information and the number of associations between the account and all identification information of this type, a sub-feature vector representing the attribution degree of the intersection identification information relative to the account; determining a sub-feature vector representing the sum of the attribution degrees of the accounts in the account pair and a sub-feature vector representing the difference between the attribution degrees of the accounts in the account pair; and, for each item of identification information of this type, determining, according to the number of accounts associated with the identification information, a sub-feature vector representing the sharing degree of the identification information.
  • the identification model is obtained by training in the following manner: obtaining the user information and the involved identification information corresponding to each account in an account set; combining accounts that satisfy the preset association information into account pairs to obtain multiple account pairs; screening out, from the multiple account pairs, the account pairs that satisfy the preset judgment condition to obtain multiple training account pairs, and setting for each training account pair, according to the preset judgment condition, a label indicating whether the accounts in the training account pair belong to the same person; processing the user information and identification information of the accounts in each training account pair to obtain the feature vector corresponding to the training account pair; and, using a machine learning method, training an initial recognition model with the feature vector corresponding to each training account pair as the input and the label corresponding to the input training account pair as the expected output, to obtain the recognition model.
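The pair-construction step of the training procedure above ("combining accounts that satisfy the preset association information into account pairs") can be sketched with an inverted index, so that accounts are only compared when they actually share something. This is a minimal sketch, assuming the example association rule given later in this document (the two accounts share at least one identifier) and assuming each identifier is stored as a type-prefixed string such as `"imei:A"` so identifiers of different types never collide; the function name `candidate_pairs` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def candidate_pairs(account_ids: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Return every pair of accounts that shares at least one identifier.

    account_ids maps an account name to its type-prefixed identifiers,
    e.g. {"u1": {"imei:A", "wifi:w1"}}. Each pair is returned in sorted
    order, so (a, b) and (b, a) are never both produced.
    """
    # Inverted index: identifier -> accounts that involve it.
    index: Dict[str, Set[str]] = defaultdict(set)
    for account, ids in account_ids.items():
        for ident in ids:
            index[ident].add(account)

    pairs: Set[Tuple[str, str]] = set()
    for accounts in index.values():
        # Any two accounts sharing this identifier form a candidate pair.
        for a, b in combinations(sorted(accounts), 2):
            pairs.add((a, b))
    return pairs


accounts = {
    "u1": {"imei:A", "wifi:w1"},
    "u2": {"imei:B", "wifi:w1"},
    "u3": {"imei:C"},
}
print(candidate_pairs(accounts))  # {('u1', 'u2')}
```

An identifier shared by very many accounts (for example, a public device) yields a quadratic number of pairs; capping or down-weighting such identifiers would be a design choice that the source does not describe.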
  • an embodiment of the present application provides an apparatus for identifying one person with multiple accounts, including: an obtaining unit configured to obtain the user information and the involved identification information corresponding to the accounts in an account pair; a first determining unit configured to determine whether the accounts in the account pair satisfy preset association information, where the preset association information is used to represent whether the accounts in the account pair may belong to the same person; a second determining unit configured to, in response to determining that the accounts in the account pair satisfy the preset association information, determine whether the accounts in the account pair satisfy a preset judgment condition, where the preset judgment condition is used to determine whether the accounts in the account pair belong to the same person; a processing unit configured to, in response to determining that the accounts in the account pair do not satisfy the preset judgment condition, process the user information and identification information of the accounts in the account pair to obtain a feature vector corresponding to the account pair; and a third determining unit configured to input the feature vector into a pre-trained identification model to determine whether the accounts in the account pair belong to the same person.
  • the preset judgment condition includes a first judgment condition and a second judgment condition
  • the first judgment condition is used to determine that the accounts in the account pair belong to the same person
  • the second judgment condition is used to determine that the accounts in the account pair do not belong to the same person.
  • the processing unit includes: a processing subunit configured to process each item of corresponding information between the accounts in the account pair to obtain a sub-feature vector corresponding to that item; and a splicing unit configured to concatenate the sub-feature vectors to obtain the feature vector.
  • the user information includes user portrait information, consumption habit information, and receipt information; the processing subunit is further configured to determine, according to the user information of the accounts in the account pair, a sub-feature vector representing the similarity of each item of corresponding information between the accounts.
  • the processing subunit is further configured to perform the following operations for each type of identification information: determine, according to this type of identification information of the accounts in the account pair, a sub-feature vector representing the number of intersection identification information items involved in the accounts in the account pair; for each account in the account pair, determine, according to the number of associations between the account and the intersection identification information and the number of associations between the account and all identification information of this type, a sub-feature vector representing the attribution degree of the intersection identification information relative to the account; determine a sub-feature vector representing the sum of the attribution degrees of the accounts in the account pair and a sub-feature vector representing the difference between the attribution degrees of the accounts in the account pair; and, for each item of identification information of this type, determine, according to the number of accounts associated with the identification information, a sub-feature vector representing the sharing degree of the identification information.
  • the above-mentioned apparatus further includes a training unit configured to obtain the recognition model by training in the following manner: acquiring the user information and the involved identification information corresponding to each account in an account set; combining accounts that satisfy the preset association information into account pairs to obtain multiple account pairs; screening out, from the multiple account pairs, the account pairs that satisfy the preset judgment condition to obtain multiple training account pairs, and setting for each training account pair, according to the preset judgment condition, a label indicating whether the accounts in the training account pair belong to the same person; processing the user information and identification information of the accounts in each training account pair to obtain the feature vector corresponding to the training account pair; and, using a machine learning method, training an initial recognition model with the feature vector corresponding to each training account pair as the input and the label corresponding to the input training account pair as the expected output, to obtain the recognition model.
  • the embodiments of the present application provide a computer-readable medium on which a computer program is stored, where the computer program, when executed by a processor, implements the method described in any implementation of the first aspect.
  • an embodiment of the present application provides an electronic device, including: one or more processors; and a storage device on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.
  • according to the method and device for identifying one person with multiple accounts provided by the embodiments of the present application, the user information and the involved identification information corresponding to the accounts in an account pair are acquired; whether the accounts in the account pair satisfy preset association information is determined, where the preset association information is used to represent whether the accounts in the account pair may belong to the same person; in response to determining that the accounts in the account pair satisfy the preset association information, whether the accounts in the account pair satisfy a preset judgment condition is determined, where the preset judgment condition is used to determine whether the accounts in the account pair belong to the same person; in response to determining that the accounts in the account pair do not satisfy the preset judgment condition, the user information and identification information of the accounts in the account pair are processed to obtain a feature vector corresponding to the account pair; and the feature vector is input into the pre-trained identification model to determine whether the accounts in the account pair belong to the same person, where the identification model is used to represent the correspondence between the feature vector corresponding to the account pair and the result of whether the accounts in the account pair belong to the same person. This provides a method for identifying one person with multiple accounts and improves identification accuracy.
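The three-stage flow summarized above (association check, preset judgment condition, then the identification model) can be sketched as a short cascade. This is a hedged illustration, not the patent's implementation: the callables `associated`, `judge`, `featurize`, and `model` are hypothetical placeholders for the corresponding steps, and returning `False` when the association information is not satisfied follows the statement that such accounts have no possibility of belonging to the same person.

```python
from typing import Callable, Optional, Sequence, TypeVar

Pair = TypeVar("Pair")


def identify(
    pair: Pair,
    associated: Callable[[Pair], bool],
    judge: Callable[[Pair], Optional[bool]],
    featurize: Callable[[Pair], Sequence[float]],
    model: Callable[[Sequence[float]], bool],
) -> bool:
    """Decide whether the two accounts in `pair` belong to the same person."""
    # Stage 1: without the preset association information, the accounts
    # cannot belong to the same person.
    if not associated(pair):
        return False
    # Stage 2: the preset judgment condition may settle the question outright
    # (True = same person, False = not the same person, None = undecided).
    verdict = judge(pair)
    if verdict is not None:
        return verdict
    # Stage 3: only undecided pairs reach the pre-trained identification model.
    return model(featurize(pair))


print(identify(("x", "y"), lambda p: True, lambda p: None,
               lambda p: [0.9], lambda v: v[0] > 0.5))  # True
```

Passing the stages in as functions keeps the cascade order explicit: the cheap rule-based checks filter most pairs, and only the remaining ambiguous pairs pay the cost of feature extraction and model inference.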
  • FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present application may be applied;
  • FIG. 2 is a flowchart of an embodiment of a method for identifying one person with multiple accounts according to the present application
  • FIG. 3 is a schematic diagram of an application scenario of the method for identifying one person with multiple accounts according to the present embodiment
  • FIG. 4 is a flow chart of another embodiment of a method for identifying one person with multiple accounts according to the present application.
  • FIG. 5 is a structural diagram of an embodiment of an apparatus for identifying one person with multiple accounts according to the present application.
  • FIG. 6 is a schematic structural diagram of a computer system suitable for implementing the embodiments of the present application.
  • FIG. 1 shows an exemplary architecture 100 to which the method and apparatus for identifying one person with multiple accounts of the present application can be applied.
  • the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105.
  • the communication connections among the terminal devices 101, 102, and 103 constitute a topology network, and the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, and 103 and the server 105.
  • the network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
  • the terminal devices 101, 102, and 103 may be hardware devices or software that support network connection for data interaction and data processing.
  • when the terminal devices 101, 102, and 103 are hardware, they can be various electronic devices that support network connection, information acquisition, interaction, display, processing, and other functions, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, and desktop computers.
  • when the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above. They can be implemented, for example, as multiple pieces of software or software modules for providing distributed services, or as a single piece of software or software module. There is no specific limitation here.
  • the server 105 may be a server that provides various services, for example, a background processing server that obtains the user information and the involved identification information corresponding to the accounts associated with the terminal devices 101, 102, and 103, and identifies multiple accounts belonging to one person.
  • as an example, the background processing server may form an account pair from two accounts for which it needs to be determined whether they belong to the same person, and then determine whether the accounts in the account pair belong to the same person through a three-stage identification: whether the preset association information is satisfied, whether the preset judgment condition is satisfied, and identification by the identification model.
  • the server 105 may be a cloud server.
  • the server may be hardware or software.
  • when the server is hardware, it can be implemented as a distributed server cluster composed of multiple servers, or as a single server.
  • when the server is software, it can be implemented as multiple pieces of software or software modules (for example, for providing distributed services), or as a single piece of software or software module. There is no specific limitation here.
  • the method for identifying one person with multiple accounts may be executed by a server, a terminal device, or a server and a terminal device in cooperation with each other.
  • correspondingly, the parts of the apparatus for identifying one person with multiple accounts (for example, each unit, sub-unit, module, and sub-module) may all be arranged in the server, may all be arranged in the terminal device, or may be distributed between the server and the terminal device.
  • it should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers according to implementation needs.
  • the system architecture may include only the electronic device on which the method for identifying one person with multiple accounts is executed (for example, a server or a terminal device).
  • a flow 200 of an embodiment of a method for identifying one person with multiple accounts is shown, including the following steps:
  • Step 201 Obtain the user information and the involved identification information corresponding to the accounts in the account pair.
  • the execution body of the method for identifying one person with multiple accounts can obtain, remotely or locally, through a wired or wireless connection, the user information and the involved identification information corresponding to the accounts in the account pair.
  • the account may be any type of account, such as a JD (Jingdong) account, a Taobao account, or a QQ account.
  • the two accounts in the account pair are accounts for which it needs to be determined whether they belong to the same person.
  • the above-mentioned execution body may obtain the user information corresponding to the account and the involved identification information.
  • User information represents any information of the user that can be obtained based on the account.
  • the user information includes registration information filled in by the user when registering an account.
  • the identification information represents any type of hardware identification information and software identification information associated with the account.
  • the identification information includes a device ID (identity document, i.e., identification number), and the Open ID (open identification number) and Union ID (joint identification number) of a mini program (applet), and the like.
  • the device ID includes but is not limited to the IMEI (International Mobile Equipment Identity) of the Android system, the AID (Android ID, the identifier of the Android system), the IDFA (Identifier for Advertising) of iOS (the mobile operating system developed by Apple), and the Open UDID (Unique Device Identifier), etc.
  • for example, when the account is a JD account, the above-mentioned execution body may collect the user information and identification information corresponding to the account from the various business channels of the entire JD e-commerce platform.
  • Step 202 Determine whether the accounts in the account pair satisfy preset association information.
  • the above-mentioned execution body can determine whether the account in the account pair satisfies the preset association information.
  • the preset association information is used to represent whether the accounts in the account pair have the possibility of belonging to the same person.
  • the preset association information can be specifically set according to the accounts in the account pair.
  • as an example, the preset association information may be that the two accounts share at least one identical item of identification information; for example, the two accounts have logged in to the same device, or have connected to the same wireless hotspot.
  • the association between accounts of the same person can be represented by preset association information.
  • when the two accounts in the account pair satisfy the preset association information, the above-mentioned execution body may determine that the accounts in the account pair may belong to the same person; when the two accounts in the account pair do not satisfy the preset association information, the above-mentioned execution body may determine that the accounts in the account pair have no possibility of belonging to the same person.
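The example association rule of Step 202 (the two accounts share at least one identifier, such as the same device or the same wireless hotspot) can be sketched as a set-intersection test. This is a minimal sketch, assuming identification information is kept as sets of identifier strings grouped under type names; the type names (`"imei"`, `"wifi"`) and the function name `satisfies_association` are hypothetical.

```python
from typing import Dict, Set

# Identification information grouped by type, e.g. {"imei": {"A", "B"}, "wifi": {"w1"}}.
# The type names are illustrative, not taken from the source.
IdInfo = Dict[str, Set[str]]


def satisfies_association(ids_x: IdInfo, ids_y: IdInfo) -> bool:
    """Example association rule: the accounts share at least one identifier of some type."""
    for id_type in ids_x.keys() & ids_y.keys():
        if ids_x[id_type] & ids_y[id_type]:
            return True
    return False


x = {"imei": {"A", "B"}, "wifi": {"w1"}}
y = {"imei": {"C"}, "wifi": {"w1"}}
print(satisfies_association(x, y))  # True: both connected to hotspot "w1"
```

Comparing only within the same type avoids treating, say, an IMEI string and a hotspot name as a match just because they happen to be equal.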
  • Step 203 in response to determining that the accounts in the account pair satisfy the preset association information, determine whether the accounts in the account pair satisfy the preset determination condition.
  • the above-mentioned execution body may determine whether the account in the account pair satisfies the preset determination condition in response to determining that the account in the account pair satisfies the preset association information.
  • the preset judgment condition is used to determine whether the accounts in the account pair belong to the same person.
  • the preset judgment condition may be specifically set according to the accounts in the account pair.
  • as an example, when the account is a user account of an e-commerce platform, the preset judgment condition may be whether the registration information of the accounts is the same. Specifically, when the ID numbers in the registration information of the two accounts are different, it can be determined that the two accounts do not belong to the same person.
  • the preset determination condition includes a first determination condition and a second determination condition.
  • the first determination condition is used to determine that the accounts in the account pair belong to the same person.
  • the second judgment condition is used to determine that the accounts in the account pair do not belong to the same person.
  • as an example, the first judgment condition is that the phone numbers in the most frequently used receipt information corresponding to the accounts are the same, and the users corresponding to the accounts have the same gender and the same age.
  • when the phone numbers in the most frequently used receipt information corresponding to the accounts in the account pair are the same, and the users corresponding to the accounts have the same gender and age, it is determined that the accounts in the account pair belong to the same person;
  • when the first judgment condition is not satisfied, for example, when the genders and ages of the users corresponding to the accounts are different, it cannot be concluded that the accounts in the account pair do not belong to the same person, since they may still belong to the same person.
  • as an example, the second judgment condition is that the accounts in the account pair are the accounts of different members of a registered family account, or that the ID numbers in the registration information of the accounts in the account pair are different.
  • a family account is a set of multiple accounts associated through family relationships, where the family relationships among these accounts have been authenticated.
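The two example judgment conditions of Step 203 can be sketched as a three-valued check: same person, not the same person, or undecided (the undecided pairs go on to Step 204). This is a sketch under stated assumptions: the field names of `AccountProfile` are hypothetical, two distinct accounts in the same registered family account are assumed to belong to different members, and because the source does not say which condition takes precedence when both could apply, the sketch checks the second (negative) condition first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccountProfile:
    # Hypothetical field names for the information the judgment conditions use.
    top_receipt_phone: Optional[str] = None  # phone in the most frequently used receipt info
    gender: Optional[str] = None
    age: Optional[int] = None
    id_number: Optional[str] = None          # ID number from the registration information
    family_account: Optional[str] = None     # registered family account, if any


def judge(x: AccountProfile, y: AccountProfile) -> Optional[bool]:
    """True: same person; False: not the same person; None: undecided."""
    # Second judgment condition (checked first here, an assumption): accounts of
    # the same registered family account, or different registered ID numbers.
    if x.family_account is not None and x.family_account == y.family_account:
        return False
    if x.id_number and y.id_number and x.id_number != y.id_number:
        return False
    # First judgment condition: same top receipt phone, same gender, same age.
    if (
        x.top_receipt_phone is not None
        and x.top_receipt_phone == y.top_receipt_phone
        and x.gender is not None
        and x.gender == y.gender
        and x.age is not None
        and x.age == y.age
    ):
        return True
    # Failing the first condition does not prove the accounts are different people.
    return None
```

Returning `None` rather than `False` for the fall-through case mirrors the text: failing the first condition leaves open the possibility that the accounts belong to the same person.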
  • Step 204 in response to determining that the account in the account pair does not meet the preset judgment condition, process the user information and identification information of the account in the account pair to obtain a feature vector corresponding to the account pair.
  • the execution body may process the user information and identification information of the account in the account pair to obtain a feature vector corresponding to the account pair.
  • the above-mentioned execution body may digitize the user information and identification information of the accounts in the account pair based on the same standard, and use the digitized vector as the feature vector corresponding to the account pair.
  • the foregoing execution body performs the foregoing step 204 in the following manner:
  • the corresponding item information between the accounts in the account pair is processed to obtain a sub-feature vector corresponding to each corresponding item information.
  • the corresponding item information represents the same item of information as held by each account in the account pair.
  • an account pair includes an account x and an account y, and the registration information corresponding to the account x and the registration information corresponding to the account y may be regarded as corresponding items of information.
  • the above-mentioned execution body may digitize each group of corresponding item information to obtain a sub-feature vector corresponding to the corresponding item information.
  • the above-mentioned execution body may concatenate the sub-feature vectors in a preset order to obtain the feature vector.
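Splicing sub-feature vectors in a preset order can be sketched as follows. The fixed order matters because the model sees only positions: position j must mean the same thing for every account pair. The sub-feature names in `PRESET_ORDER` are hypothetical; the source only states that the order is preset.

```python
from typing import Dict, List, Sequence

# Hypothetical preset order; the source only says the order is preset.
PRESET_ORDER: Sequence[str] = ("gender_similarity", "receipt_overlap", "imei_features")


def splice(sub_features: Dict[str, List[float]],
           order: Sequence[str] = PRESET_ORDER) -> List[float]:
    """Concatenate named sub-feature vectors in a fixed order into one feature vector."""
    feature_vector: List[float] = []
    for name in order:
        # A missing sub-feature raises KeyError instead of silently shifting positions.
        feature_vector.extend(sub_features[name])
    return feature_vector


print(splice({"receipt_overlap": [0.2, 0.5],
              "gender_similarity": [1.0],
              "imei_features": [3.0, 1.2, 0.4]}))
# [1.0, 0.2, 0.5, 3.0, 1.2, 0.4]
```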
  • the user information corresponding to the account includes user portrait information, consumption habit information, and receipt information.
  • the execution body may determine a sub-feature vector representing the similarity of the corresponding item information between the accounts according to the user information of the accounts in the account pair.
  • generally, when the accounts in the account pair belong to the same person, the similarity of the user information corresponding to the accounts is relatively high; when the accounts in the account pair do not belong to the same person, the similarity is relatively low.
  • as an example, the above-mentioned execution body can compute the proportion of orders whose receipt information overlaps between the accounts in the account pair. Taking the recipient name as an example:
  • the number of orders of account x is $o_x$;
  • the number of orders of account y is $o_y$;
  • the number of orders of account x and account y with the same recipient name is $o_{xy}$;
  • the proportion of orders of account x with an overlapping recipient name is then $r_x = o_{xy} / o_x$, and the proportion of orders of account y with an overlapping recipient name is $r_y = o_{xy} / o_y$.
  • the proportion of orders with overlapping receipt information reflects the likelihood that the receipt information belongs to the account holder: the higher the proportion, the more likely the receipt information is the account holder's own information, and the higher the similarity between the two accounts.
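The overlap proportions for the two accounts, the overlapping order count divided by each account's own order count (o_xy / o_x for account x and o_xy / o_y for account y), can be computed directly from the three counts. This is a minimal sketch; the function name is hypothetical, and returning 0.0 for an account with no orders is an assumption, since the source does not cover that case.

```python
from typing import Tuple


def receipt_overlap_ratios(o_x: int, o_y: int, o_xy: int) -> Tuple[float, float]:
    """Return (o_xy / o_x, o_xy / o_y), the overlap proportions for accounts x and y.

    Returning 0.0 for an account with no orders is an assumption; the source
    does not cover that case.
    """
    r_x = o_xy / o_x if o_x else 0.0
    r_y = o_xy / o_y if o_y else 0.0
    return r_x, r_y


print(receipt_overlap_ratios(o_x=10, o_y=4, o_xy=2))  # (0.2, 0.5)
```

Keeping both ratios, rather than a single symmetric score, lets the model see that the shared recipient accounts for half of account y's orders but only a fifth of account x's.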
  • for some items, the similarity between the accounts in the account pair is expressed as a specific numerical value. Taking gender as an example: when the genders of account x and account y are both non-empty and the same, the similarity is 1; when the gender of account x or account y is empty, the similarity is 0; when the genders of account x and account y are both non-empty and different, the similarity is -1.
  • for each type of identification information, the above-mentioned execution body may perform the following operations:
  • according to this type of identification information of the accounts in the account pair, a sub-feature vector representing the number of intersection identification information items involved in the accounts in the account pair is determined.
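The three-valued similarity described for gender can be sketched as below, treating a missing value (None or an empty string) on either side as "empty". That reading of "empty" and the function name are assumptions for illustration.

```python
from typing import Optional


def categorical_similarity(a: Optional[str], b: Optional[str]) -> int:
    """1 if both values are present and equal, -1 if both present and different, else 0."""
    if not a or not b:  # None or an empty string counts as "empty"
        return 0
    return 1 if a == b else -1


print(categorical_similarity("F", "F"),
      categorical_similarity("F", None),
      categorical_similarity("F", "M"))  # 1 0 -1
```

Using 0 for missing data keeps "unknown" distinct from "known to differ", which a plain equality flag (1/0) could not express.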
  • for example, suppose this type of identification information is the IMEI of the Android system, the identification information of this type for account x in the account pair includes A, B, C, and D, and that for account y includes B, C, D, and E; then the intersection identification information of accounts x and y is B, C, and D, and the number is 3.
  • based on this number, the above-mentioned execution body can determine the sub-feature vector representing the number of intersection identification information items involved in the accounts in the account pair.
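The intersection count for each identification type, reproducing the IMEI example above (A, B, C, D versus B, C, D, E gives 3), can be sketched as a set intersection per type. The dictionary-of-sets layout and the function name are hypothetical.

```python
from typing import Dict, Set


def intersection_counts(ids_x: Dict[str, Set[str]],
                        ids_y: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of shared identifiers for each identification type present in either account."""
    types = ids_x.keys() | ids_y.keys()
    return {t: len(ids_x.get(t, set()) & ids_y.get(t, set())) for t in sorted(types)}


x = {"imei": {"A", "B", "C", "D"}}
y = {"imei": {"B", "C", "D", "E"}}
print(intersection_counts(x, y))  # {'imei': 3}
```

Emitting a count for every type seen in either account (0 when nothing is shared) keeps the resulting sub-feature vector the same length for every pair, as long as the set of types is fixed.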
  • each time an account logs in to the hardware device corresponding to a piece of hardware identification information, or to the software corresponding to a piece of software identification information, it can be determined that the account has one association with that hardware device or software; the number of associations is therefore the number of logins.
  • taking IMEI as an example, let $I_x$ be the set of IMEIs of the devices logged in by account x, let $I_y$ be the set of IMEIs of the devices logged in by account y, and let $I_{xy} = I_x \cap I_y$ be the set of IMEIs of the devices logged in by both account x and account y. Let $x_i$ be the number of times account x has logged in on the device corresponding to $imei_i$, and $y_i$ the number of times account y has logged in on the device corresponding to $imei_i$. The total number of times account x has logged in on all of its IMEIs is then $\sum_{imei_i \in I_x} x_i$.
  • the attribution degree of the shared IMEIs to account x is defined as $a_x = \frac{\sum_{imei_i \in I_{xy}} x_i}{\sum_{imei_i \in I_x} x_i}$, and likewise the attribution degree to account y is $a_y = \frac{\sum_{imei_i \in I_{xy}} y_i}{\sum_{imei_i \in I_y} y_i}$. Intuitively, for accounts x and y in an account pair, the attribution degree of the IMEIs to account x is the ratio of the number of logins of account x on the IMEIs shared by the two accounts to the number of logins of account x on all of its IMEIs; the attribution degree of the IMEIs to account y is the ratio of the number of logins of account y on the shared IMEIs to the number of logins of account y on all of its IMEIs.
  • a small attribution degree usually means that the account logged in to the device corresponding to the identifier only occasionally, so there is no strong ownership relationship between them.
  • a sub-feature vector representing the sum of the attribution degrees of the accounts in the account pair, and a sub-feature vector representing the difference between the attribution degrees of the accounts in the account pair are determined.
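  • the sum of the attribution degrees of the accounts in the account pair is $a_x + a_y$, and the difference between the attribution degrees is $a_x - a_y$, where $a_x$ and $a_y$ denote the attribution degrees of the shared identification information to accounts x and y, respectively.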
  • a sub-feature vector representing the sharing degree of the identification information is determined.
  • the sharing degree of an identifier is the number of accounts that have logged in on the device or software it identifies. The higher the sharing degree of an identifier, the more likely it is that the corresponding device is a public device.
  • Step 205 Input the feature vector into the pre-trained recognition model to determine whether the accounts in the account pair belong to the same person.
  • the above-mentioned execution body may input the feature vector into the pre-trained recognition model to determine whether the accounts in the account pair belong to the same person.
  • the identification model is used to represent the correspondence between the feature vector corresponding to the account pair and the determination result of whether the accounts in the account pair belong to the same person.
  • the recognition model can be any network model with recognition function, including but not limited to a convolutional neural network model, a recurrent neural network model, and a residual neural network model.
  • FIG. 3 is a schematic diagram 300 of an application scenario of the method for identifying one person with multiple accounts according to this embodiment.
  • the account pair includes an e-commerce account x and an e-commerce account y.
  • the server 301 first obtains, from the database server 302, the user information and the identification information involved for account x and account y in the account pair. The server then determines whether the identification information related to account x and the identification information related to account y contain at least one identical identifier (that is, the preset association information).
  • the preset association information is used to represent whether the account x and the account y in the account pair have the possibility of belonging to the same person.
  • the server 301 then determines whether account x and account y in the account pair satisfy the following: the phone numbers in their recipient information are the same, the genders and ages of the corresponding users are the same, and the ID numbers in the registration information of the accounts are different (that is, the preset judgment condition).
  • the preset judgment condition is used to determine whether the accounts in the account pair belong to the same person.
  • the user information and identification information of the account in the account pair are processed to obtain a feature vector corresponding to the account pair.
  • the server 301 inputs the feature vector into the pre-trained recognition model, and determines that the account x and the account y in the account pair do not belong to the same person.
  • the identification model is used to represent the correspondence between the feature vector corresponding to the account pair and the determination result of whether the accounts in the account pair belong to the same person.
  • the method provided by the above embodiment obtains the user information corresponding to the accounts in an account pair and the identification information involved. It then determines whether the accounts satisfy the preset association information, which represents whether the accounts in the account pair may belong to the same person. In response to determining that they satisfy the preset association information, it determines whether they satisfy the preset judgment condition, which is used to determine whether the accounts in the account pair belong to the same person. In response to determining that they do not satisfy the preset judgment condition, it processes the user information and identification information of the accounts to obtain the feature vector corresponding to the account pair. Finally, it inputs the feature vector into the pre-trained identification model to determine whether the accounts in the account pair belong to the same person. The identification model represents the correspondence between the feature vector corresponding to an account pair and the judgment result of whether the accounts in that pair belong to the same person. This improves the identification accuracy.
  • the recognition model is obtained by training in the following manner:
  • the account set includes a large number of accounts.
  • the manner of acquiring the user information corresponding to the account and the involved identification information can be performed with reference to the manner of step 201, and details are not described herein again.
  • the accounts in the account set that satisfy the preset association information are combined into account pairs to obtain multiple account pairs.
  • the preset judgment condition can determine the account pair that belongs to the same person and the account pair that does not belong to the same person.
  • the above-mentioned execution entity may set a label that the accounts in the account pair belong to the same person, and use such account pairs as positive samples.
  • the above-mentioned execution entity may set a label that the accounts in the account pair do not belong to the same person, and use such account pairs as negative samples.
  • the user information and identification information of the accounts in each training account pair in the multiple account pairs are processed to obtain a feature vector corresponding to the training account pair.
  • the feature vector corresponding to the training account pair is used as the input, and the label corresponding to the input training account pair is used as the expected output to train the initial recognition model to obtain the recognition model.
  • the above-mentioned execution subject can cyclically train the initial model through a large number of training account pairs, and complete the training of the initial model in response to reaching the preset end condition.
  • the preset end condition may be, for example, that the loss function of the model converges, or that the number of training iterations reaches a preset number.
  • the above-mentioned execution body may also perform information processing based on user granularity for multiple accounts belonging to the same person.
  • the user granularity representation is in units of users, not accounts.
  • the information processing may be, for example, information push.
  • for example, information is pushed with the user to whom account x and account y belong, rather than each individual account, as the push target.
  • specifically, information may be pushed to one account of the user; once that account has received the pushed information, the information is not pushed again to the user's other account. It can be understood that processing information at user granularity in this way can improve user experience.
  • referring to FIG. 4, a schematic flow 400 of an embodiment of the method for identifying one person with multiple accounts according to the present application is shown, which includes the following steps:
  • Step 401 Obtain the user information and the involved identification information corresponding to the accounts in the account pair.
  • Step 402 Determine whether the accounts in the account pair satisfy the preset association information.
  • the preset association information is used to represent whether the accounts in the account pair have the possibility of belonging to the same person.
  • Step 403 In response to determining that the accounts in the account pair satisfy the preset association information, determine whether they satisfy the first judgment condition.
  • the first judgment condition is used to determine that the accounts in the account pair belong to the same person. When the accounts in the account pair satisfy the first judgment condition, this proves that they belong to the same person. Note that when the accounts do not satisfy the first judgment condition, this does not prove that they do not belong to the same person.
  • Step 404 In response to determining that the accounts in the account pair do not satisfy the first judgment condition, determine whether they satisfy the second judgment condition.
  • the second judgment condition is used to determine that the accounts in the account pair do not belong to the same person. When the accounts in the account pair satisfy the second judgment condition, this proves that they do not belong to the same person. Note that when the accounts do not satisfy the second judgment condition, this does not prove that they belong to the same person.
  • Step 405 In response to determining that the accounts in the account pair satisfy neither the first judgment condition nor the second judgment condition, perform the following operations:
  • Step 4051 according to the user information of the accounts in the account pair, determine a sub-feature vector representing the similarity of the corresponding item information between the accounts.
  • Step 4052 for each type of identification information, perform the following operations:
  • Step 40521 Determine, according to this type of identification information of the accounts in the account pair, a sub-feature vector representing the number of intersection identifiers involved in both accounts in the account pair.
  • Step 40522 For each account in the account pair, determine a sub-feature vector representing the attribution degree of the intersection identifiers relative to the account. This is based on the number of associations between the account and the intersection identifiers, and the number of associations between the account and all identifiers of this type.
  • Step 40523 Determine a sub-feature vector representing the sum of the attribution degrees of the accounts in the account pair, and a sub-feature vector representing the difference between the attribution degrees of the accounts in the account pair.
  • Step 40524 For each identification information in the type, according to the number of accounts associated with the identification information, determine a sub-feature vector representing the sharing degree of the identification information.
  • Step 406 splicing each sub-feature vector to obtain a feature vector.
  • Step 407 Input the feature vector into the pre-trained recognition model to determine whether the accounts in the account pair belong to the same person.
  • the identification model is used to represent the correspondence between the feature vector corresponding to the account pair and the determination result of whether the accounts in the account pair belong to the same person.
  • as can be seen, the process 400 of the method for identifying one person with multiple accounts in this embodiment specifically describes how the feature vector is obtained and how it is determined whether the accounts in the account pair belong to the same person. This further improves the recognition accuracy.
  • the present disclosure provides an embodiment of an apparatus for identifying one person with multiple accounts, and the apparatus embodiment corresponds to the method embodiment shown in FIG. 2 .
  • the device can be specifically applied to various electronic devices.
  • the device for identifying one person with multiple accounts includes the following units. An obtaining unit 501 is configured to obtain the user information corresponding to the accounts in the account pair and the identification information involved. A first determining unit 502 is configured to determine whether the accounts in the account pair satisfy the preset association information, which is used to represent whether the accounts in the account pair may belong to the same person. A second determining unit 503 is configured to, in response to determining that the accounts in the account pair satisfy the preset association information, determine whether they satisfy the preset judgment condition, which is used to determine whether the accounts in the account pair belong to the same person;
  • a processing unit 504 is configured to, in response to determining that the accounts in the account pair do not satisfy the preset judgment condition, process the user information and identification information of the accounts in the account pair to obtain a feature vector corresponding to the account pair;
  • and a third determining unit 505 is configured to input the feature vector into the pre-trained identification model to determine whether the accounts in the account pair belong to the same person. The identification model is used to represent the correspondence between the feature vector corresponding to the account pair and the judgment result of whether the accounts in the account pair belong to the same person.
  • the preset judgment condition includes a first judgment condition and a second judgment condition:
  • the first judgment condition is used to determine that the accounts in the account pair belong to the same person;
  • the second judgment condition is used to determine that the accounts in the account pair do not belong to the same person.
  • the processing unit 504 includes: a processing subunit (not shown in the figure), configured to process each item of corresponding information between the accounts in the account pair to obtain the sub-feature vector corresponding to that item; and a splicing unit (not shown in the figure), configured to splice the sub-feature vectors to obtain the feature vector.
  • the user information includes user portrait information, consumption habit information, and receipt information. The processing subunit (not shown in the figure) is further configured to determine, according to the user information of the accounts in the account pair, a sub-feature vector representing the similarity of each item of corresponding information between the accounts.
  • the processing subunit (not shown in the figure) is further configured to perform the following operations for each type of identification information:
    1. Determine, according to this type of identification information of the accounts in the account pair, a sub-feature vector representing the number of intersection identifiers involved in both accounts.
    2. For each account in the account pair, determine a sub-feature vector representing the attribution degree of the intersection identifiers relative to the account. This is based on the number of associations between the account and the intersection identifiers, and the number of associations between the account and all identifiers of this type.
    3. Determine a sub-feature vector representing the sum of the attribution degrees of the accounts in the account pair, and a sub-feature vector representing the difference between them.
    4. For each identifier of this type, determine, according to the number of accounts associated with the identifier, a sub-feature vector representing the sharing degree of the identifier.
  • the above-mentioned apparatus further includes a training unit (not shown in the figure), configured to obtain the identification model by training in the following manner:
    1. Acquire the user information corresponding to each account in an account set and the identification information involved.
    2. Combine the accounts in the account set that satisfy the preset association information into account pairs to obtain multiple account pairs.
    3. Select the account pairs that satisfy the preset judgment condition to obtain multiple training account pairs. According to the preset judgment condition, set a label for each training account pair indicating whether its accounts belong to the same person.
    4. Process the user information and identification information of the accounts in each training account pair to obtain the feature vector corresponding to that training account pair.
    5. Using a machine learning method, train an initial recognition model with the feature vector of each training account pair as input and its label as the expected output, to obtain the recognition model.
  • in the device for identifying one person with multiple accounts provided by the above embodiment, the units operate as follows. The obtaining unit obtains the user information corresponding to the accounts in an account pair and the identification information involved. The first determining unit determines whether the accounts in the account pair satisfy the preset association information, which represents whether the accounts may belong to the same person. The second determining unit, in response to determining that the accounts satisfy the preset association information, determines whether they satisfy the preset judgment condition, which is used to determine whether the accounts in the account pair belong to the same person;
  • the processing unit, in response to determining that the accounts in the account pair do not satisfy the preset judgment condition, processes the user information and identification information of the accounts to obtain the feature vector corresponding to the account pair. The third determining unit inputs the feature vector into the pre-trained identification model to determine whether the accounts belong to the same person. The identification model represents the correspondence between the feature vector corresponding to the account pair and the judgment result of whether its accounts belong to the same person. This provides a device for identifying one person with multiple accounts and improves the identification accuracy.
  • FIG. 6 shows a schematic structural diagram of a computer system 600 suitable for implementing the devices of the embodiments of the present application (e.g., devices 101, 102, 103, and 105 shown in FIG. 1).
  • the device shown in FIG. 6 is only an example, and should not impose any limitations on the functions and scope of use of the embodiments of the present application.
  • a computer system 600 includes a processor (e.g., a central processing unit, CPU) 601. It can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602, or a program loaded from a storage section 608 into a random access memory (RAM) 603.
  • various programs and data necessary for the operation of the system 600 are also stored.
  • the processor 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to bus 604.
  • the following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, etc.; an output section 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker, etc.; a storage section 608 including a hard disk, etc.; and a communication section 609 including a network interface card such as a LAN card, a modem, and the like. The communication section 609 performs communication processing via a network such as the Internet.
  • a drive 610 is also connected to the I/O interface 605 as needed.
  • a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is mounted on the drive 610 as needed so that a computer program read therefrom is installed into the storage section 608 as needed.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from a network via the communication section 609, and/or installed from the removable medium 611.
  • when the computer program is executed by the processor 601, the above-mentioned functions defined in the method of the present application are performed.
  • the computer-readable medium of the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to:
    - an electrical connection having one or more wires;
    - a portable computer disk or a hard disk;
    - a random access memory (RAM), a read-only memory (ROM), or an erasable programmable read-only memory (EPROM or flash memory);
    - an optical fiber or a portable compact disc read-only memory (CD-ROM);
    - an optical storage device or a magnetic storage device;
    - any suitable combination of the foregoing.
  • a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof. These include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may execute entirely on the client computer, partly on the client computer, as a stand-alone software package, partly on the client computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer may be connected to the client computer through any kind of network, including a local area network (LAN) or a wide area network (WAN). Alternatively, it may be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • each block in the flowchart or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by dedicated hardware-based systems that perform the specified functions or operations. They can also be implemented by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments of the present application may be implemented in a software manner, and may also be implemented in a hardware manner.
  • the described unit may also be provided in the processor, for example, it may be described as: a processor including an acquisition unit, a first determination unit, a second determination unit, a processing unit and a third determination unit.
  • the names of these units do not constitute a limitation of the unit itself under certain circumstances.
  • for example, the third determination unit can also be described as "a unit that inputs the feature vector into the pre-trained recognition model to determine whether the accounts in the account pair belong to the same person".
  • the present application also provides a computer-readable medium.
  • the computer-readable medium may be included in the device described in the above embodiments, or may exist alone without being assembled into the device.
  • the above-mentioned computer-readable medium carries one or more programs which, when executed by the device, cause the device to:
    1. Obtain the user information corresponding to the accounts in an account pair and the identification information involved.
    2. Determine whether the accounts in the account pair satisfy the preset association information, which is used to represent whether the accounts may belong to the same person.
    3. In response to determining that the accounts satisfy the preset association information, determine whether they satisfy the preset judgment condition, which is used to determine whether the accounts belong to the same person.
    4. In response to determining that the accounts do not satisfy the preset judgment condition, process the user information and identification information of the accounts to obtain the feature vector corresponding to the account pair.
    5. Input the feature vector into the pre-trained identification model to determine whether the accounts belong to the same person. The identification model is used to represent the correspondence between the feature vector corresponding to the account pair and the judgment result of whether its accounts belong to the same person.
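The training procedure described above can be sketched as follows. It builds account pairs from an account set, labels them with the judgment conditions, and trains until an end condition is met. This is a minimal illustration under stated assumptions, not the applicant's implementation:

- The association rule is taken to be "the two accounts share at least one identifier", as in the FIG. 3 scenario.
- `first_condition` and `second_condition` are hypothetical predicates standing in for the first and second judgment conditions.
- Plain logistic regression trained by batch gradient descent stands in for the recognition model. The description leaves that model open and mentions convolutional, recurrent, and residual networks among others.

Training stops when the change in loss falls below a tolerance or after a maximum number of iterations, mirroring the example end conditions.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Set, Tuple


def build_labeled_pairs(
    accounts: Sequence[str],
    identifiers: Dict[str, Set[str]],
    first_condition: Callable[[str, str], bool],
    second_condition: Callable[[str, str], bool],
) -> List[Tuple[str, str, int]]:
    """Pair associated accounts and keep only the pairs the rules can label."""
    labeled = []
    for x, y in combinations(accounts, 2):
        if not identifiers[x] & identifiers[y]:
            continue  # assumed association rule: at least one shared identifier
        if first_condition(x, y):
            labeled.append((x, y, 1))  # positive sample: same person
        elif second_condition(x, y):
            labeled.append((x, y, 0))  # negative sample: different people
    return labeled


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(
    samples: Sequence[Tuple[Sequence[float], int]],
    lr: float = 0.5,
    tol: float = 1e-6,
    max_iter: int = 5000,
) -> Tuple[List[float], float]:
    """Batch gradient descent on log loss; stops when the loss change drops
    below `tol` (convergence) or after `max_iter` iterations."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    prev_loss = math.inf
    for _ in range(max_iter):
        grad_w, grad_b, loss = [0.0] * dim, 0.0, 0.0
        for feats, label in samples:
            p = _sigmoid(b + sum(wi * fi for wi, fi in zip(w, feats)))
            loss -= label * math.log(p + 1e-12) + (1 - label) * math.log(1 - p + 1e-12)
            err = p - label  # gradient of log loss w.r.t. the logit
            for j, f in enumerate(feats):
                grad_w[j] += err * f
            grad_b += err
        n = len(samples)
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
        loss /= n
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return w, b


def predict(w: Sequence[float], b: float, feats: Sequence[float]) -> float:
    """Probability that the pair's accounts belong to the same person."""
    return _sigmoid(b + sum(wi * fi for wi, fi in zip(w, feats)))
```

In practice, the samples passed to `train_logistic` would be the spliced feature vectors of the labeled training pairs.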

Abstract

The present application discloses a method and device for identifying multiple accounts belonging to the same person. One specific embodiment of the method comprises the following steps. First, acquire the user information corresponding to the accounts in an account pair and the identification information involved in those accounts. Then determine whether the accounts satisfy preset association information, which represents whether the accounts in the account pair may belong to the same person. In response to determining that they satisfy the preset association information, determine whether they satisfy a preset determination condition, which is used for determining whether the accounts belong to the same person. In response to determining that they do not satisfy the preset determination condition, process the user information and identification information of the accounts to obtain a feature vector corresponding to the account pair. Finally, input the feature vector into an identification model to determine whether the accounts belong to the same person. The present application thus provides a method for identifying multiple accounts belonging to the same person that improves identification accuracy.

Description

Method and device for identifying one person with multiple accounts
Cross-reference to related applications
This patent application claims priority to Chinese patent application No. 202110049322.5, filed on January 14, 2021 and entitled "Method and Device for Identifying One Person with Multiple Accounts", the entire content of which is incorporated herein by reference.
Technical field
The embodiments of the present application relate to the field of computer technologies, and in particular, to a method and an apparatus for identifying one person with multiple accounts.
Background
On e-commerce platforms, it is very common for one person to own multiple accounts. For example, in addition to a personal account on the JD.com platform, a user may also own a corporate account for conducting corporate business. As another example, a user may have registered multiple personal accounts on different devices.
Summary
The embodiments of the present application propose a method and device for identifying one person with multiple accounts.
In a first aspect, the embodiments of the present application provide a method for identifying one person with multiple accounts. The method includes the following steps:
1. Acquire the user information corresponding to the accounts in an account pair and the identification information involved.
2. Determine whether the accounts in the account pair satisfy preset association information, which is used to represent whether the accounts may belong to the same person.
3. In response to determining that the accounts satisfy the preset association information, determine whether they satisfy a preset judgment condition, which is used to determine whether the accounts belong to the same person.
4. In response to determining that the accounts do not satisfy the preset judgment condition, process the user information and identification information of the accounts to obtain a feature vector corresponding to the account pair.
5. Input the feature vector into a pre-trained identification model to determine whether the accounts belong to the same person. The identification model is used to represent the correspondence between the feature vector corresponding to the account pair and the judgment result of whether its accounts belong to the same person.
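The control flow of the first aspect can be summarized in a short sketch. All names here (`identify_pair`, `is_associated`, `judge`, `featurize`, `model`) are hypothetical.

- The judgment condition is modeled as a function returning `True`, `False`, or `None` (undecided). This is an assumption, chosen to fit embodiments that use a first and a second judgment condition.
- Returning `None` for pairs that fail the association check is likewise an assumption, since the text does not say what happens to them.

```python
from typing import Callable, Optional, Sequence, TypeVar

A = TypeVar("A")  # whatever object represents an account (hypothetical)


def identify_pair(
    x: A,
    y: A,
    is_associated: Callable[[A, A], bool],
    judge: Callable[[A, A], Optional[bool]],
    featurize: Callable[[A, A], Sequence[float]],
    model: Callable[[Sequence[float]], bool],
) -> Optional[bool]:
    """Return True (same person), False (different people), or None (not evaluated)."""
    if not is_associated(x, y):
        return None  # assumption: pairs without association are simply not evaluated
    verdict = judge(x, y)  # preset judgment condition; None means undecided
    if verdict is not None:
        return verdict  # the rules settled the pair without the model
    return model(featurize(x, y))  # otherwise fall back to the pre-trained model
```

The design keeps the cheap rule checks in front, so the model is only consulted for pairs the rules cannot decide.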
In some embodiments, the preset judgment condition includes a first judgment condition and a second judgment condition. The first judgment condition is used to determine that the accounts in the account pair belong to the same person. The second judgment condition is used to determine that they do not belong to the same person.
In some embodiments, processing the user information and identification information of the accounts in the account pair to obtain the corresponding feature vector includes two steps. First, process each item of corresponding information between the accounts in the account pair to obtain a sub-feature vector for each item. Second, concatenate the sub-feature vectors to obtain the feature vector.
In some embodiments, the user information includes user profile information, consumption habit information and recipient information. In these embodiments, processing the corresponding item information between the accounts in the account pair to obtain a sub-feature vector for each item includes: determining, from the user information of the accounts in the account pair, sub-feature vectors representing the similarity of corresponding items of information between the accounts.
In some embodiments, processing the corresponding item information between the accounts in the account pair to obtain a sub-feature vector for each item includes performing the following operations for each type of identification information:
- From the identification information of this type for the accounts in the pair, determine a sub-feature vector representing the number of intersection identifiers, that is, identifiers involved with both accounts.
- For each account in the pair, determine a sub-feature vector representing the attribution degree of the intersection identifiers relative to that account. This is based on the number of times the account was associated with the intersection identifiers and the number of times it was associated with all identifiers of this type.
- Determine a sub-feature vector representing the sum of the two accounts' attribution degrees, and one representing the difference between them.
- For each identifier of this type, determine a sub-feature vector representing the sharing degree of that identifier, according to the number of accounts that have been associated with it.
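The identifier-based sub-features can be computed per identifier type roughly as follows. This sketch assumes association counts are available as `assoc[account][identifier]`. Two further choices are illustrative assumptions not stated in the text: the difference is taken as an absolute value, and per-identifier sharing degrees are reduced to their maximum over the intersection so that the vector has a fixed length.

```python
from typing import Dict, List

# assoc[account][identifier] = number of times the account was associated with
# (e.g. logged in on) that identifier; one such mapping per identifier type.
Assoc = Dict[str, Dict[str, int]]


def identifier_features(assoc: Assoc, a: str, b: str) -> List[float]:
    ids_a, ids_b = assoc.get(a, {}), assoc.get(b, {})
    shared = set(ids_a) & set(ids_b)  # intersection identifiers

    def attribution(ids: Dict[str, int]) -> float:
        # Share of this account's associations that fall on the shared identifiers.
        total = sum(ids.values())
        return sum(ids[i] for i in shared) / total if total else 0.0

    attr_a, attr_b = attribution(ids_a), attribution(ids_b)

    # Sharing degree: number of accounts each identifier has been associated with.
    accounts_per_id: Dict[str, int] = {}
    for ids in assoc.values():
        for i in ids:
            accounts_per_id[i] = accounts_per_id.get(i, 0) + 1
    # Assumption: keep only the largest sharing degree among shared identifiers.
    max_sharing = max((accounts_per_id[i] for i in shared), default=0)

    return [
        float(len(shared)),     # number of intersection identifiers
        attr_a + attr_b,        # sum of attribution degrees
        abs(attr_a - attr_b),   # difference of attribution degrees
        float(max_sharing),     # sharing degree (aggregated)
    ]
```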
In some embodiments, the identification model is trained as follows:
1. Acquire the user information corresponding to each account in an account set and the identification information involved with it.
2. Combine accounts in the set that satisfy the preset association information into account pairs, obtaining multiple account pairs.
3. From these, select the pairs that satisfy the preset judgment condition as training account pairs. According to the preset judgment condition, give each training account pair a label indicating whether its accounts belong to the same person.
4. Process the user information and identification information of the accounts in each training account pair to obtain that pair's feature vector.
5. Using a machine learning method, train an initial identification model to obtain the identification model. The feature vectors of the training account pairs are the input, and their labels are the expected output.
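The training procedure can be sketched in three parts: keep only pairs the preset judgment condition can decide, use the rule's verdict as the label, then fit a classifier. The text does not name the machine learning method. Plain logistic regression trained by stochastic gradient descent is used here only as a self-contained stand-in, and all function names are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]
Vector = List[float]


def build_training_set(
    pairs: Sequence[Pair],
    judge: Callable[[Pair], Optional[bool]],
    featurize: Callable[[Pair], Sequence[float]],
) -> Tuple[List[Vector], List[float]]:
    """Keep only pairs a judgment condition can decide and label them."""
    X: List[Vector] = []
    y: List[float] = []
    for pair in pairs:
        verdict = judge(pair)  # True / False if a condition applies, else None
        if verdict is None:
            continue  # undecided pairs are not used for training
        X.append(list(featurize(pair)))
        y.append(1.0 if verdict else 0.0)  # 1 = same person, 0 = different
    return X, y


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(
    X: List[Vector], y: List[float], lr: float = 0.5, epochs: int = 500
) -> Callable[[Sequence[float]], bool]:
    """Stand-in model: logistic regression fitted by stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            grad = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * grad * xj for wj, xj in zip(w, xi)]
            b -= lr * grad

    def predict(x: Sequence[float]) -> bool:
        return _sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5

    return predict
```

Because the labels come from the rules, no manual annotation is needed. The model then generalizes to the pairs the rules leave undecided.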
In a second aspect, embodiments of the present application provide an apparatus for identifying multiple accounts belonging to one person. The apparatus includes the following units:
- An acquisition unit, configured to acquire the user information corresponding to the accounts in an account pair and the identification information involved with them.
- A first determination unit, configured to determine whether the accounts in the account pair satisfy preset association information, which indicates whether the accounts could possibly belong to the same person.
- A second determination unit, configured to determine, in response to determining that the accounts satisfy the preset association information, whether they satisfy a preset judgment condition. The preset judgment condition is used to determine whether the accounts belong to the same person.
- A processing unit, configured to process the user information and identification information of the accounts, in response to determining that the accounts do not satisfy the preset judgment condition, to obtain a feature vector corresponding to the account pair.
- A third determination unit, configured to input the feature vector into a pre-trained identification model to determine whether the accounts in the account pair belong to the same person. The identification model characterizes the correspondence between the feature vector of an account pair and the judgment of whether the accounts in that pair belong to the same person.
In some embodiments, the preset judgment condition includes a first judgment condition and a second judgment condition. The first judgment condition is used to determine that the accounts in the account pair belong to the same person. The second judgment condition is used to determine that they do not belong to the same person.
In some embodiments, the processing unit includes two subunits. A processing subunit is configured to process each item of corresponding information between the accounts in the account pair to obtain a sub-feature vector for each item. A concatenation unit is configured to concatenate the sub-feature vectors to obtain the feature vector.
In some embodiments, the user information includes user profile information, consumption habit information and recipient information. The processing subunit is further configured to determine, from the user information of the accounts in the account pair, sub-feature vectors representing the similarity of corresponding items of information between the accounts.
In some embodiments, the processing subunit is further configured to perform the following operations for each type of identification information:
- From the identification information of this type for the accounts in the pair, determine a sub-feature vector representing the number of intersection identifiers involved with both accounts.
- For each account in the pair, determine a sub-feature vector representing the attribution degree of the intersection identifiers relative to that account. This is based on the number of times the account was associated with the intersection identifiers and the number of times it was associated with all identifiers of this type.
- Determine sub-feature vectors representing the sum of, and the difference between, the two accounts' attribution degrees.
- For each identifier of this type, determine a sub-feature vector representing the sharing degree of that identifier, according to the number of accounts that have been associated with it.
In some embodiments, the apparatus further includes a training unit configured to obtain the identification model by training as follows:
1. Acquire the user information corresponding to each account in an account set and the identification information involved with it.
2. Combine accounts in the set that satisfy the preset association information into account pairs, obtaining multiple account pairs.
3. Select the pairs that satisfy the preset judgment condition as training account pairs. According to the preset judgment condition, give each a label indicating whether its accounts belong to the same person.
4. Process the user information and identification information of the accounts in each training account pair to obtain that pair's feature vector.
5. Using a machine learning method, train an initial identification model to obtain the identification model. The feature vectors are the input, and the corresponding labels are the expected output.
In a third aspect, embodiments of the present application provide a computer-readable medium storing a computer program. When executed by a processor, the program implements the method described in any implementation of the first aspect.
In a fourth aspect, embodiments of the present application provide an electronic device. The device includes one or more processors and a storage apparatus storing one or more programs. When the programs are executed by the one or more processors, they cause the processors to implement the method described in any implementation of the first aspect.
The method and apparatus provided by the embodiments of the present application work as follows:
1. Acquire the user information corresponding to the accounts in an account pair and the identification information involved with them.
2. Determine whether the accounts satisfy the preset association information, which indicates whether they could belong to the same person.
3. If they do, determine whether they satisfy the preset judgment condition, which is used to determine whether they belong to the same person.
4. If they do not satisfy that condition, process their user information and identification information to obtain the feature vector of the account pair.
5. Input the feature vector into the pre-trained identification model, which characterizes the correspondence between an account pair's feature vector and the judgment of whether its accounts belong to the same person.

This provides a way to identify multiple accounts belonging to one person and improves identification accuracy.
BRIEF DESCRIPTION OF THE DRAWINGS
Other features, objects and advantages of the present application will become more apparent on reading the following detailed description of non-limiting embodiments, made with reference to the drawings:
FIG. 1 is a diagram of an exemplary system architecture to which an embodiment of the present application may be applied;
FIG. 2 is a flowchart of an embodiment of the method for identifying multiple accounts belonging to one person according to the present application;
FIG. 3 is a schematic diagram of an application scenario of the method for identifying multiple accounts belonging to one person according to this embodiment;
FIG. 4 is a flowchart of another embodiment of the method for identifying multiple accounts belonging to one person according to the present application;
FIG. 5 is a structural diagram of an embodiment of the apparatus for identifying multiple accounts belonging to one person according to the present application;
FIG. 6 is a schematic structural diagram of a computer system suitable for implementing embodiments of the present application.
DETAILED DESCRIPTION
The present application is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here serve only to explain the related invention, not to limit it. For ease of description, the drawings show only the parts related to the invention.
The embodiments of the present application, and the features in them, may be combined with each other where they do not conflict. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
FIG. 1 shows an exemplary architecture 100 to which the method and apparatus of the present application for identifying multiple accounts belonging to one person may be applied.
As shown in FIG. 1, system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The terminal devices 101, 102 and 103 are communicatively connected to one another to form a topological network. Network 104 is the medium providing communication links between the terminal devices and server 105. Network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
Terminal devices 101, 102 and 103 may be hardware devices or software that support network connections for data interaction and data processing. When they are hardware, they may be any electronic devices that support network connection, information acquisition, interaction, display, processing and similar functions. These include, but are not limited to, smartphones, tablet computers, e-book readers, laptop computers and desktop computers. When they are software, they may be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. No specific limitation is imposed here.
Server 105 may be a server that provides various services. For example, it may be a background processing server that identifies multiple accounts belonging to one person. It acquires the user information corresponding to the accounts associated with terminal devices 101, 102 and 103, together with the identification information involved with those accounts. The background processing server groups two accounts to be judged into an account pair. It then decides whether they belong to the same person through a three-stage identification:
1. Check whether the pair satisfies the preset association information.
2. Check whether the pair satisfies the preset judgment condition.
3. Apply the identification model.

As an example, server 105 may be a cloud server.
The server may be hardware or software. As hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. As software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. No specific limitation is imposed here.
The method for identifying multiple accounts belonging to one person provided by embodiments of the present disclosure may be performed by the server, by a terminal device, or by the server and a terminal device in cooperation. Correspondingly, the parts of the apparatus (for example, units, subunits, modules and submodules) may all be disposed in the server, all in a terminal device, or distributed between the two.
The numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative; an implementation may have any number of each. If the electronic device running the method does not need to exchange data with other electronic devices, the system architecture may consist of that device alone (for example, a server or a terminal device).
Referring now to FIG. 2, a flow 200 of an embodiment of the method for identifying multiple accounts belonging to one person is shown. It includes the following steps:
Step 201: acquire the user information corresponding to the accounts in the account pair and the identification information involved with them.
In this embodiment, the execution body of the method (for example, the server in FIG. 1) may acquire this user information and identification information, either remotely or locally, through a wired or wireless connection.
The accounts may be of various types, such as JD accounts, Taobao accounts or QQ accounts. The two accounts in an account pair are the accounts for which it must be determined whether they belong to the same person. For each account in the pair, the execution body may acquire the user information corresponding to the account and the identification information involved with it.
User information is any information about the user that can be obtained through the account; for example, it includes the registration information the user filled in when registering the account. Identification information is any hardware or software identifier with which the account has been associated; for example, it includes device IDs and the Open ID and Union ID of mini programs. Device IDs include, but are not limited to, the following:
- IMEI (International Mobile Equipment Identity) on Android;
- AID (Android ID) on Android;
- IDFA (Identifier for Advertisers) on iOS, Apple's mobile operating system;
- Open UDID (a unique device identifier).

In this embodiment, an identifier is regarded as involved with, or associated with, the account if the account has logged in on the hardware device corresponding to a hardware identifier, or in the software corresponding to a software identifier.
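One way to turn login records into per-account association counts for each type of identification information is sketched below. The event layout `(account, identifier type, identifier)` and the function name are hypothetical. Each login on a device or in an app counts as one association.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

# Hypothetical login record: (account, identifier type, identifier value),
# e.g. ("acct_1", "IMEI", "356938035643809").
LoginEvent = Tuple[str, str, str]


def build_associations(events: Iterable[LoginEvent]) -> Dict[str, Dict[str, Dict[str, int]]]:
    """Return assoc[id_type][account][identifier] = number of logins."""
    counts: DefaultDict[str, DefaultDict[str, DefaultDict[str, int]]] = defaultdict(
        lambda: defaultdict(lambda: defaultdict(int))
    )
    for account, id_type, identifier in events:
        counts[id_type][account][identifier] += 1  # one login = one association
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {
        id_type: {account: dict(ids) for account, ids in accounts.items()}
        for id_type, accounts in counts.items()
    }
```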
As an example, where the account is a JD account, the execution body may collect the account's user information and identification information from the various business channels across the JD e-commerce platform.
Step 202: determine whether the accounts in the account pair satisfy the preset association information.
In this embodiment, the execution body may determine whether the accounts in the account pair satisfy the preset association information, which indicates whether the accounts could possibly belong to the same person.
The preset association information may be set according to the specific accounts in the account pair. As an example, it may require that the two accounts share at least one identifier in their identification information. For example, both accounts have logged in on the same device, or both have connected to the same wireless hotspot.
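The example association rule (at least one shared identifier) can be evaluated without comparing every pair of accounts, by building an inverted index from identifier to accounts. This is a minimal sketch. It assumes identifiers are already prefixed with their type (for example `dev:` or `wifi:`) so that values of different types cannot collide. The function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def candidate_pairs(account_ids: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Return account pairs that share at least one identifier."""
    # Inverted index: identifier -> accounts that have been associated with it.
    index: Dict[str, Set[str]] = {}
    for account, ids in account_ids.items():
        for identifier in ids:
            index.setdefault(identifier, set()).add(account)
    pairs: Set[Tuple[str, str]] = set()
    for accounts in index.values():
        # Sorting gives each unordered pair one canonical (smaller, larger) form.
        for a, b in combinations(sorted(accounts), 2):
            pairs.add((a, b))
    return pairs
```

Widely shared identifiers, such as a public hotspot, produce many candidate pairs. This is one reason the sharing degree of an identifier is useful as a feature.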
Two accounts belonging to the same person are generally correlated, and the preset association information captures this correlation. If the two accounts in the pair satisfy the preset association information, the execution body may determine that they could belong to the same person. If they do not, it may determine that they cannot belong to the same person.
Step 203: in response to determining that the accounts in the account pair satisfy the preset association information, determine whether they satisfy the preset judgment condition.
In this embodiment, once the accounts in the account pair are found to satisfy the preset association information, the execution body may determine whether they satisfy the preset judgment condition. This condition is used to determine whether the accounts belong to the same person.
The preset judgment condition may be set according to the specific accounts in the account pair. For example, if the accounts are user accounts on an e-commerce platform, the condition may test whether the accounts' registration information matches. In particular, if the ID card numbers in the two accounts' registration information differ, the two accounts can be determined not to belong to the same person.
In some optional implementations of this embodiment, the preset judgment condition includes a first judgment condition and a second judgment condition.
- The first judgment condition is used to determine that the accounts in the account pair belong to the same person. If the accounts satisfy it, they can be determined to belong to the same person. If they do not, it cannot be concluded that they belong to different people.
- The second judgment condition is used to determine that the accounts do not belong to the same person. If the accounts satisfy it, they can be determined not to belong to the same person. If they do not, it cannot be concluded that they belong to the same person.
Continuing with the example of user accounts on an e-commerce platform, the first judgment condition requires three things: the phone numbers in the two accounts' most frequently used recipient information are the same, and the corresponding users have the same gender and the same age. If all three hold, the accounts are deemed to belong to the same person. If at least one of the phone number, gender or age differs, it still cannot be concluded that the accounts belong to different people, because they may yet belong to the same person.

The second judgment condition holds if either of the following is true: the accounts in the pair belong to different members of a registered family account, or the ID card numbers in their registration information differ. A family account is a group of accounts associated through family relationships and authenticated as such.
Conversely, suppose the accounts are not accounts of members of the same family account, or their ID card numbers cannot be shown to differ. Then the second judgment condition is not satisfied, but that alone does not establish that the accounts belong to the same person.
The preset judgment condition may therefore fail to decide whether the accounts in the account pair belong to the same person. This happens when the accounts satisfy neither the first judgment condition, so they cannot be determined to belong to the same person, nor the second judgment condition, so they cannot be determined to belong to different people.
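The example first and second judgment conditions can be expressed as a function that returns `True` (same person), `False` (different people) or `None` (undetermined). The field names are hypothetical. Checking the second condition before the first is also an assumption, since the text does not say which takes precedence if both hold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    # Hypothetical fields standing in for the data described in the text.
    top_phone: Optional[str]          # phone in the most frequent recipient information
    gender: Optional[str]
    age: Optional[int]
    id_number: Optional[str] = None   # ID card number from registration, if known
    family_id: Optional[str] = None   # registered family account, if any


def judge(a: Account, b: Account) -> Optional[bool]:
    # Second judgment condition: members of the same registered family account,
    # or differing ID card numbers, mean the accounts are NOT the same person.
    if a.family_id is not None and a.family_id == b.family_id:
        return False
    if a.id_number and b.id_number and a.id_number != b.id_number:
        return False
    # First judgment condition: same top recipient phone, gender and age.
    keys_a = (a.top_phone, a.gender, a.age)
    keys_b = (b.top_phone, b.gender, b.age)
    if None not in keys_a and keys_a == keys_b:
        return True
    return None  # undetermined: left for the identification model
```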
Step 204: in response to determining that the accounts in the account pair do not satisfy the preset judgment condition, process their user information and identification information to obtain the feature vector corresponding to the account pair.
In this embodiment, once the accounts in the account pair are found not to satisfy the preset judgment condition, the execution body may process their user information and identification information to obtain the feature vector corresponding to the account pair.
As an example, the execution body may digitize the user information and identification information of both accounts in the account pair according to the same standard, and use the resulting vector as the feature vector corresponding to the account pair.
In some optional implementations of this embodiment, the execution body performs step 204 as follows:
First, the corresponding items of information of the accounts in the account pair are processed to obtain a sub-feature vector for each corresponding item. A corresponding item is a piece of information that the accounts in the pair have in correspondence. For example, if an account pair consists of account x and account y, the registration information of account x and the registration information of account y form a corresponding item. The execution body may digitize each group of corresponding item information to obtain the sub-feature vector for that item.
Second, the sub-feature vectors are concatenated to obtain the feature vector.
As an example, the execution body may concatenate the sub-feature vectors in a preset order to obtain the feature vector.
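A minimal sketch of the concatenation step, assuming each corresponding item has already been turned into a list of numbers. The item names and their order in `FEATURE_ORDER` are hypothetical; the text only requires that the order be fixed in advance so that every account pair yields comparable vectors.

```python
from typing import Dict, List

# Hypothetical preset order of corresponding items (not specified in the text).
FEATURE_ORDER = ["recipient", "gender", "imei"]


def concat_features(sub_vectors: Dict[str, List[float]]) -> List[float]:
    """Splice per-item sub-feature vectors into one pair-level feature vector."""
    vector: List[float] = []
    for name in FEATURE_ORDER:
        # Raises KeyError if an expected item is missing, rather than
        # silently producing a shorter vector.
        vector.extend(sub_vectors[name])
    return vector
```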
In some optional implementations of this embodiment, the user information of an account includes user profile information, consumption habit information, and recipient information. For this user information, the execution body may determine, from the user information of the accounts in the account pair, sub-feature vectors representing the similarity of each corresponding item between the accounts.
It should be understood that when the accounts in an account pair belong to the same person, their user information tends to be highly similar; when they do not belong to the same person, the similarity of their user information tends to be low.
Take recipient information as an example. The execution body may compute the proportion of orders whose recipient information overlaps between the accounts in the account pair. For the recipient name, let $o_x$ be the number of orders of account x, $o_y$ the number of orders of account y, and $o_{xy}$ the number of orders of accounts x and y that have the same recipient name. The share of the overlapping recipient name among the orders of account x is $r_x = o_{xy}/o_x$, and among the orders of account y is $r_y = o_{xy}/o_y$. This share reflects how likely the recipient information belongs to the account holder: the higher the share, the more likely the recipient information is the account holder's own, and the more similar the two accounts are.
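The two ratios can be computed directly from the order counts. The sketch below assumes the counts $o_x$, $o_y$, and $o_{xy}$ are already available; returning 0.0 for an account with no orders is an assumption the text does not address.

```python
from typing import Tuple


def overlap_ratios(o_x: int, o_y: int, o_xy: int) -> Tuple[float, float]:
    """Return (r_x, r_y) = (o_xy / o_x, o_xy / o_y) for one recipient field."""
    # Zero-order guard is an assumption, not part of the described method.
    r_x = o_xy / o_x if o_x else 0.0
    r_y = o_xy / o_y if o_y else 0.0
    return r_x, r_y
```

For example, `overlap_ratios(10, 4, 2)` returns `(0.2, 0.5)`: the shared recipient name accounts for a fifth of account x's orders and half of account y's.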
For categorical items of user information, such as gender, purchasing power, and preferred payment method, the similarity between the accounts in the account pair is encoded with fixed values. Taking gender as an example: if the genders of account x and account y are both non-empty and identical, the similarity is 1; if the gender of either account is empty, the similarity is 0; if both are non-empty but different, the similarity is -1.
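A sketch of the 1 / 0 / -1 encoding for a categorical field such as gender. Treating both `None` and the empty string as "empty" is an assumption about how missing values are stored.

```python
from typing import Optional


def categorical_similarity(value_x: Optional[str], value_y: Optional[str]) -> int:
    """1 if both present and equal, 0 if either is empty, -1 if both present and different."""
    if not value_x or not value_y:  # None or "" counts as empty (assumption)
        return 0
    return 1 if value_x == value_y else -1
```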
In some optional implementations of this embodiment, for each type of identification information, the execution body may perform the following operations:
First, from the identification information of this type for the accounts in the account pair, determine a sub-feature vector representing the number of intersection identifiers, that is, identifiers involved with both accounts in the pair.
As an example, suppose this type of identification information is the IMEI on the Android system. If the identifiers of this type for account x in the account pair are A, B, C, and D, and those for account y are B, C, D, and E, then the intersection identifiers of accounts x and y are B, C, and D, and their number is 3. The execution body can then determine the sub-feature vector representing the number of intersection identifiers involved with both accounts in the pair.
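A sketch of the intersection count, generalized to several identifier types by keying each account's identifiers by a hypothetical type name such as `"imei"`. The test data reproduces the A, B, C, D versus B, C, D, E example above.

```python
from typing import Dict, Set


def shared_id_counts(ids_x: Dict[str, Set[str]],
                     ids_y: Dict[str, Set[str]]) -> Dict[str, int]:
    """For each identifier type, count the identifiers involved with both accounts."""
    types = sorted(set(ids_x) | set(ids_y))
    return {t: len(ids_x.get(t, set()) & ids_y.get(t, set())) for t in types}
```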
Second, for each account in the account pair, determine a sub-feature vector representing the attribution degree of the intersection identifiers to that account, based on the number of times the account was associated with the intersection identifiers and the number of times it was associated with all identifiers of this type.
In this implementation, each time an account logs in on the hardware device corresponding to a hardware identifier, or in the software corresponding to a software identifier, the account is counted as associated once with that hardware device or software.
Continuing with the IMEI on the Android system as the identification type, let $I_x$ denote the set of IMEIs of the devices on which account x has logged in and $I_y$ the set of IMEIs of the devices on which account y has logged in, with $|I_x| = m$ and $|I_y| = n$, where m and n are both positive integers. The set of IMEIs of the devices on which both accounts have logged in is then $I_x \cap I_y$.

Let $x_i$ be the number of times account x has logged in on the device corresponding to $imei_i$. The total number of logins of account x over all its IMEIs is

$$N_x = \sum_{imei_i \in I_x} x_i.$$

Similarly, let $y_i$ be the number of times account y has logged in on the device corresponding to $imei_i$; the total number of logins of account y is

$$N_y = \sum_{imei_i \in I_y} y_i.$$

The numbers of logins of account x and account y on the IMEIs they share are, respectively,

$$C_x = \sum_{imei_i \in I_x \cap I_y} x_i \quad \text{and} \quad C_y = \sum_{imei_i \in I_x \cap I_y} y_i.$$

The attribution degree of the shared IMEIs to account x is defined as

$$a_x = \frac{C_x}{N_x},$$

and the attribution degree of the shared IMEIs to account y as

$$a_y = \frac{C_y}{N_y}.$$

Intuitively, for accounts x and y in an account pair, the attribution degree to account x is the share of account x's logins on all of its IMEIs that took place on the IMEIs shared by the two accounts; the attribution degree to account y is the corresponding share for account y. The greater the attribution degree, the more likely the devices behind the shared IDs are owned by the user of that account. Conversely, a small attribution degree usually means the account only occasionally logged in on those devices and has no strong ownership relationship with them.
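The attribution degrees can be computed from per-IMEI login counts. In this sketch each account's logins are given as a hypothetical mapping from IMEI to login count ($x_i$ or $y_i$); returning 0.0 for an account with no logins is an assumption.

```python
from typing import Dict, Tuple


def attribution_degrees(logins_x: Dict[str, int],
                        logins_y: Dict[str, int]) -> Tuple[float, float]:
    """Return (a_x, a_y) for one identifier type such as IMEI.

    logins_x maps each IMEI that account x has logged in on to its login
    count x_i; logins_y does the same for account y.
    """
    shared = logins_x.keys() & logins_y.keys()  # IMEIs used by both accounts

    def degree(logins: Dict[str, int]) -> float:
        total = sum(logins.values())                       # logins on all IMEIs
        on_shared = sum(logins[imei] for imei in shared)   # logins on shared IMEIs
        return on_shared / total if total else 0.0         # zero guard: assumption

    return degree(logins_x), degree(logins_y)
```

With `logins_x = {"A": 6, "B": 2, "C": 2}` and `logins_y = {"B": 1, "C": 1, "E": 2}`, account x made 4 of its 10 logins on the shared IMEIs B and C ($a_x = 0.4$) and account y made 2 of its 4 ($a_y = 0.5$).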
Third, determine a sub-feature vector representing the sum of the attribution degrees of the accounts in the account pair, and a sub-feature vector representing the difference between them.
Continuing the example above, let $a_x$ and $a_y$ denote the attribution degrees of the shared IMEIs to account x and to account y. The sum of the attribution degrees of the accounts in the account pair is $a_x + a_y$, and their difference is $a_x - a_y$.
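A sketch that collects, for one identifier type, the shared-identifier count together with the sum and difference of the attribution degrees. Grouping the three values into one list is an assumption for illustration. The difference changes sign if the two accounts are swapped, so an implementation would presumably fix the order of the accounts within a pair (for example by account ID); the text does not say how.

```python
from typing import List


def identifier_type_features(shared_count: int, a_x: float, a_y: float) -> List[float]:
    """Per-type sub-features: [number of shared IDs, a_x + a_y, a_x - a_y]."""
    return [float(shared_count), a_x + a_y, a_x - a_y]
```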
Fourth, for each identifier of this type, determine a sub-feature vector representing the sharing degree of that identifier, based on the number of accounts that have been associated with it.
The sharing degree of an identifier is the number of accounts that have logged in with it. The higher the sharing degree, the more likely the device corresponding to the identifier is a public device.
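A sketch of the sharing degree, computed from a hypothetical stream of `(account_id, identifier)` login events. Repeated logins by the same account count once, since the degree is the number of distinct accounts.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def sharing_degree(login_events: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map each identifier to the number of distinct accounts that logged in with it."""
    accounts: Dict[str, Set[str]] = defaultdict(set)
    for account_id, identifier in login_events:
        accounts[identifier].add(account_id)
    return {identifier: len(accs) for identifier, accs in accounts.items()}
```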
Step 205: input the feature vector into a pre-trained recognition model to determine whether the accounts in the account pair belong to the same person.
In this embodiment, the execution body may input the feature vector into the pre-trained recognition model to determine whether the accounts in the account pair belong to the same person. The recognition model represents the correspondence between the feature vector of an account pair and the determination result of whether the accounts in the pair belong to the same person.
The recognition model may be any network model with a recognition function, including but not limited to a convolutional neural network model, a recurrent neural network model, or a residual neural network model.
Continuing to refer to FIG. 3, FIG. 3 is a schematic diagram 300 of an application scenario of the method for identifying multiple accounts belonging to one person according to this embodiment. In the application scenario of FIG. 3, the account pair consists of e-commerce account x and e-commerce account y. The server 301 first obtains, from the database server 302, the user information of account x and account y and the identification information each account involves. The server then determines whether the identification information of account x and account y contains at least one identical identifier (the preset association information in this scenario). The preset association information indicates whether account x and account y could belong to the same person. In response to determining that account x and account y share at least one identifier (here, the second identifier and the third identifier), the server 301 determines whether they satisfy the preset determination conditions, namely identical phone numbers in the recipient information together with the same user gender and age, and different ID numbers in the registration information. The preset determination conditions are used to determine whether the accounts in the account pair belong to the same person. In response to determining that account x and account y do not satisfy the preset determination conditions, the user information and identification information of the two accounts are processed to obtain the feature vector of the account pair. Finally, the server 301 inputs the feature vector into the pre-trained recognition model and determines that account x and account y do not belong to the same person. The recognition model represents the correspondence between the feature vector of an account pair and the determination result of whether the accounts in the pair belong to the same person.
The method provided by the above embodiments of the present disclosure obtains the user information of the accounts in an account pair and the identification information they involve; determines whether the accounts satisfy the preset association information, which indicates whether they could belong to the same person; in response to determining that the preset association information is satisfied, determines whether the accounts satisfy the preset determination condition, which is used to determine whether they belong to the same person; in response to determining that the preset determination condition is not satisfied, processes the user information and identification information of the accounts to obtain the feature vector of the account pair; and inputs the feature vector into a pre-trained recognition model, which represents the correspondence between the feature vector of an account pair and the determination result of whether its accounts belong to the same person, to determine whether the accounts belong to the same person. This provides a method for identifying multiple accounts belonging to one person and improves identification accuracy.
In some optional implementations of this embodiment, the recognition model is trained as follows:
First, obtain the user information of each account in an account set and the identification information it involves.
In this implementation, the account set contains a large number of accounts. The user information and identification information of each account can be obtained as in step 201, which is not repeated here.
Second, combine the accounts in the account set that satisfy the preset association information into account pairs, obtaining multiple account pairs.
It should be understood that only accounts satisfying the preset association information have a possibility of belonging to the same person, and only accounts with such a possibility are combined into account pairs.
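The pairing step can be done without comparing every account against every other one by building an inverted index from identifiers to accounts. This sketch assumes, as in the FIG. 3 scenario, that the preset association information is "the two accounts share at least one identifier"; the input format is hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def candidate_pairs(ids_by_account: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Return every unordered pair of accounts that share at least one identifier."""
    accounts_by_id: Dict[str, Set[str]] = defaultdict(set)
    for account, ids in ids_by_account.items():
        for identifier in ids:
            accounts_by_id[identifier].add(account)
    pairs: Set[Tuple[str, str]] = set()
    for accounts in accounts_by_id.values():
        # Sorting gives each unordered pair one canonical (a, b) form.
        for a, b in combinations(sorted(accounts), 2):
            pairs.add((a, b))
    return pairs
```

Identifiers with a very high sharing degree (likely public devices) produce a quadratic number of pairs; capping or skipping them would be a natural refinement, but the text does not specify one.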
Third, select from the multiple account pairs those that satisfy the preset determination condition as training account pairs, and, according to the preset determination condition, label each training account pair with whether the accounts in it belong to the same person.
In this implementation, the preset determination condition can identify both account pairs that belong to the same person and account pairs that do not. For an account pair whose accounts belong to the same person, the execution body may assign a "same person" label and use the pair as a positive sample. For an account pair whose accounts do not belong to the same person, the execution body may assign a "not the same person" label and use the pair as a negative sample.
Fourth, process the user information and identification information of the accounts in each training account pair to obtain the feature vector corresponding to that training account pair.
In this implementation, the information can be processed as in step 204, which is not repeated here.
Fifth, using a machine learning method, train an initial recognition model with the feature vector of each training account pair as input and the label of that training account pair as the expected output, obtaining the recognition model.
In this implementation, the execution body may train the initial model iteratively on a large number of training account pairs and finish training once a preset end condition is reached. The preset end condition may be, for example, that the loss function of the model converges or that the number of training iterations reaches a set number.
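The text lists convolutional, recurrent, and residual networks as candidate models. To keep the example self-contained, the sketch below substitutes a plain logistic regression trained by batch gradient descent in pure Python; it illustrates the training loop over positive and negative samples and the two end conditions named above (loss convergence or an iteration cap), not the model the disclosure actually uses. The learning rate, tolerance, and threshold are arbitrary choices.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    z = max(-500.0, min(500.0, z))  # clip to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples: Sequence[Tuple[List[float], int]],
                   lr: float = 0.5, max_epochs: int = 1000,
                   tol: float = 1e-6) -> Tuple[List[float], float]:
    """Batch gradient descent on the log loss.

    samples: (feature_vector, label) pairs; label 1 = same person (positive
    sample), 0 = different people (negative sample).
    Stops when the epoch loss changes by less than tol (treated as
    convergence) or after max_epochs iterations.
    """
    dim = len(samples[0][0])
    n = len(samples)
    w, b = [0.0] * dim, 0.0
    prev_loss = math.inf
    for _ in range(max_epochs):
        grad_w, grad_b, loss = [0.0] * dim, 0.0, 0.0
        for x, y in samples:
            p = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            err = p - y  # derivative of the log loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
        loss /= n
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return w, b


def predict_same_person(w: List[float], b: float, x: List[float],
                        threshold: float = 0.5) -> bool:
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold
```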
In some optional implementations of this embodiment, the execution body may also process information at user granularity for multiple accounts that belong to the same person, that is, treating the user rather than the account as the unit. For example, the processing may be information push: when account x and account y belong to the same person, the user who owns both accounts is the push target. Specifically, information may be pushed to one account of that user; once that account has received the pushed information, it is no longer pushed to the user's other account. Processing information at user granularity can improve the user experience.
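A sketch of user-granularity push. It groups accounts with a union-find structure and picks one account per user as the push target. Treating "belongs to the same person" as transitive (x with y and y with z implies x with z) and choosing the alphabetically first account are assumptions, not part of the text.

```python
from typing import Dict, Iterable, List, Tuple


def group_by_person(accounts: Iterable[str],
                    same_person_pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Merge accounts judged to belong to the same person into user groups."""
    parent: Dict[str, str] = {a: a for a in accounts}

    def find(a: str) -> str:
        parent.setdefault(a, a)  # accounts seen only in pairs are added on the fly
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in same_person_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
    groups: Dict[str, List[str]] = {}
    for a in list(parent):
        groups.setdefault(find(a), []).append(a)
    return [sorted(g) for g in groups.values()]


def push_targets(groups: List[List[str]]) -> List[str]:
    """One representative account per user; picking the first is arbitrary."""
    return [g[0] for g in groups]
```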
Continuing to refer to FIG. 4, a schematic flow 400 of an embodiment of the method for identifying multiple accounts belonging to one person according to the present application is shown, including the following steps:
Step 401: obtain the user information of the accounts in the account pair and the identification information they involve.
Step 402: determine whether the accounts in the account pair satisfy the preset association information.
The preset association information indicates whether the accounts in the account pair could belong to the same person.
Step 403: in response to determining that the accounts in the account pair satisfy the preset association information, determine whether they satisfy the first determination condition.
The first determination condition is used to determine that the accounts in the account pair belong to the same person: if it is satisfied, the accounts are established as belonging to the same person. Note that failing the first determination condition does not establish that the accounts do not belong to the same person.
Step 404: in response to determining that the accounts in the account pair do not satisfy the first determination condition, determine whether they satisfy the second determination condition.
The second determination condition is used to determine that the accounts in the account pair do not belong to the same person: if it is satisfied, the accounts are established as not belonging to the same person. Note that failing the second determination condition does not establish that the accounts belong to the same person.
Step 405: in response to determining that the accounts in the account pair satisfy neither the first nor the second determination condition, perform the following operations:
Step 4051: from the user information of the accounts in the account pair, determine sub-feature vectors representing the similarity of each corresponding item between the accounts.
Step 4052: for each type of identification information, perform the following operations:
Step 40521: from the identification information of this type for the accounts in the account pair, determine a sub-feature vector representing the number of intersection identifiers involved with both accounts.
Step 40522: for each account in the account pair, determine a sub-feature vector representing the attribution degree of the intersection identifiers to that account, based on the number of times the account was associated with the intersection identifiers and the number of times it was associated with all identifiers of this type.
Step 40523: determine a sub-feature vector representing the sum of the attribution degrees of the accounts in the account pair, and a sub-feature vector representing the difference between them.
Step 40524: for each identifier of this type, determine a sub-feature vector representing its sharing degree, based on the number of accounts associated with it.
Step 406: concatenate the sub-feature vectors to obtain the feature vector.
Step 407: input the feature vector into the pre-trained recognition model to determine whether the accounts in the account pair belong to the same person.
The recognition model represents the correspondence between the feature vector of an account pair and the determination result of whether the accounts in the pair belong to the same person.
It can be seen that, compared with the embodiment corresponding to FIG. 2, flow 400 of the method in this embodiment details how the feature vector is built and how it is decided whether the accounts of an account pair belong to the same person, which further improves identification accuracy.
Continuing to refer to FIG. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for identifying multiple accounts belonging to one person. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus may be applied to various electronic devices.
As shown in FIG. 5, the apparatus for identifying multiple accounts belonging to one person includes: an obtaining unit 501, configured to obtain the user information of the accounts in an account pair and the identification information they involve; a first determining unit 502, configured to determine whether the accounts in the account pair satisfy the preset association information, which indicates whether the accounts could belong to the same person; a second determining unit 503, configured to determine, in response to determining that the accounts satisfy the preset association information, whether the accounts satisfy the preset determination condition, which is used to determine whether the accounts belong to the same person; a processing unit 504, configured to process, in response to determining that the accounts do not satisfy the preset determination condition, the user information and identification information of the accounts to obtain the feature vector of the account pair; and a third determining unit 505, configured to input the feature vector into a pre-trained recognition model to determine whether the accounts in the account pair belong to the same person, where the recognition model represents the correspondence between the feature vector of an account pair and the determination result of whether the accounts in the pair belong to the same person.
In some embodiments, the preset determination condition includes a first determination condition and a second determination condition; the first determination condition is used to determine that the accounts in the account pair belong to the same person, and the second determination condition is used to determine that the accounts in the account pair do not belong to the same person.
In some embodiments, the processing unit 504 includes: a processing subunit (not shown in the figure), configured to process the corresponding items of information of the accounts in the account pair to obtain a sub-feature vector for each corresponding item; and a concatenation unit (not shown in the figure), configured to concatenate the sub-feature vectors to obtain the feature vector.
In some embodiments, the user information includes user profile information, consumption habit information, and recipient information; the processing subunit (not shown in the figure) is further configured to determine, from the user information of the accounts in the account pair, sub-feature vectors representing the similarity of each corresponding item between the accounts.
In some embodiments, the processing subunit (not shown in the figure) is further configured to perform, for each type of identification information, the following operations: determining, from the identification information of this type for the accounts in the account pair, a sub-feature vector representing the number of intersection identifiers involved with both accounts; for each account in the account pair, determining, from the number of times the account was associated with the intersection identifiers and the number of times it was associated with all identifiers of this type, a sub-feature vector representing the attribution degree of the intersection identifiers to that account; determining a sub-feature vector representing the sum of the attribution degrees of the accounts in the account pair and a sub-feature vector representing the difference between them; and, for each identifier of this type, determining, from the number of accounts associated with that identifier, a sub-feature vector representing its sharing degree.
In some embodiments, the above apparatus further includes a training unit (not shown in the figure) configured to obtain the identification model by training as follows: acquire the user information corresponding to, and the identification information involved with, each account in an account set; combine the accounts in the account set that satisfy the preset association information into account pairs, obtaining a plurality of account pairs; select, from the plurality of account pairs, the account pairs that satisfy the preset judgment condition to obtain a plurality of training account pairs, and set for each training account pair, according to the preset judgment condition, a label characterizing whether the accounts in that training account pair belong to the same person; process the user information and identification information of the accounts in each training account pair to obtain the feature vector corresponding to that training account pair; and, using a machine learning method, train an initial identification model with the feature vector corresponding to each training account pair as input and the label of that training account pair as the expected output, thereby obtaining the identification model.
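A minimal sketch of this training procedure is given below. It assumes that the preset association information, the preset judgment condition, and the feature processing are supplied as callables (`is_associated`, `rule_label`, `featurize`, all hypothetical). Because the application does not specify the machine learning method, the sketch substitutes plain logistic regression trained by stochastic gradient descent. Enumerating every pair of accounts is quadratic in the number of accounts and serves only as an illustration.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Optional, Tuple

Account = dict
Model = Tuple[List[float], float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_training_set(
    accounts: Dict[str, Account],
    is_associated: Callable[[Account, Account], bool],
    rule_label: Callable[[Account, Account], Optional[int]],
    featurize: Callable[[Account, Account], List[float]],
) -> Tuple[List[List[float]], List[int]]:
    """Keep associated pairs that a judgment rule can label; the rule's verdict becomes the label."""
    X, y = [], []
    for a, b in combinations(sorted(accounts), 2):  # all pairs: fine for a sketch, not at scale
        u, v = accounts[a], accounts[b]
        if not is_associated(u, v):
            continue
        label = rule_label(u, v)  # 1 = same person, 0 = different people, None = no rule fires
        if label is None:
            continue
        X.append(featurize(u, v))
        y.append(label)
    return X, y


def train_logistic(X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 500) -> Model:
    """SGD logistic regression, a stand-in for the unspecified machine learning method."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi  # d(log-loss)/dz
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def predict_proba(model: Model, x: List[float]) -> float:
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def device_jaccard(u: Account, v: Account) -> List[float]:
    return [len(u["devices"] & v["devices"]) / len(u["devices"] | v["devices"])]


accounts = {  # hypothetical toy data
    "a": {"phone": "p1", "devices": {"d1", "d2"}},
    "b": {"phone": "p1", "devices": {"d1", "d2"}},
    "c": {"phone": "p2", "devices": {"d2", "d3", "d4", "d5"}},
    "d": {"phone": None, "devices": {"d1", "d2"}},
}
X, y = build_training_set(
    accounts,
    is_associated=lambda u, v: bool(u["devices"] & v["devices"]),
    rule_label=lambda u, v: None if u["phone"] is None or v["phone"] is None else int(u["phone"] == v["phone"]),
    featurize=device_jaccard,
)
model = train_logistic(X, y)
```

In the toy data, the hypothetical rule ("same phone number means same person, different phone numbers mean different people") produces one positive and two negative training pairs. Pairs involving account `d`, which has no phone number, match neither judgment condition and are excluded from training.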
In this embodiment, the acquisition unit of the apparatus for identifying multiple accounts belonging to the same person acquires the user information corresponding to, and the identification information involved with, the accounts in an account pair. The first determination unit determines whether the accounts in the account pair satisfy the preset association information, which characterizes whether the accounts could possibly belong to the same person. In response to determining that they do, the second determination unit determines whether the accounts satisfy the preset judgment condition, which is used to determine whether the accounts belong to the same person. In response to determining that the accounts do not satisfy the preset judgment condition, the processing unit processes the user information and identification information of the accounts to obtain the feature vector corresponding to the account pair. The third determination unit then inputs the feature vector into the pre-trained identification model to determine whether the accounts belong to the same person. The identification model characterizes the correspondence between the feature vector of an account pair and the determination result of whether the accounts in that pair belong to the same person. The embodiment thus provides an apparatus for identifying multiple accounts belonging to the same person and improves identification accuracy.
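The decision cascade described in the preceding paragraph could be sketched as follows. The callables are hypothetical hooks for the preset association information, the preset judgment condition, the feature processing, and a trained identification model. Two further choices are assumptions of this sketch rather than statements of the application: a pair that fails the association check is treated as belonging to different people, and the model probability is cut at 0.5.

```python
from typing import Callable, List, Optional, Tuple

Account = dict


def identify_pair(
    u: Account,
    v: Account,
    is_associated: Callable[[Account, Account], bool],
    rule_label: Callable[[Account, Account], Optional[int]],
    featurize: Callable[[Account, Account], List[float]],
    model_proba: Callable[[List[float]], float],
    threshold: float = 0.5,
) -> Tuple[bool, str]:
    """Return (same_person, stage that decided) for one account pair."""
    # Stage 1: preset association information; no plausible link -> different people (assumption)
    if not is_associated(u, v):
        return False, "association"
    # Stage 2: preset judgment conditions (first -> same person, second -> different people)
    label = rule_label(u, v)
    if label is not None:
        return bool(label), "rule"
    # Stage 3: only undecided pairs are turned into a feature vector and scored by the model
    return model_proba(featurize(u, v)) >= threshold, "model"


def is_assoc(u: Account, v: Account) -> bool:
    return bool(u["devices"] & v["devices"])


def rule(u: Account, v: Account) -> Optional[int]:  # hypothetical phone-number rule
    if u["phone"] is None or v["phone"] is None:
        return None
    return int(u["phone"] == v["phone"])


def feat(u: Account, v: Account) -> List[float]:
    return [len(u["devices"] & v["devices"]) / len(u["devices"] | v["devices"])]


def toy_model(x: List[float]) -> float:  # hypothetical stand-in for a trained model
    return x[0]


u = {"phone": None, "devices": {"d1", "d2"}}
v = {"phone": "p1", "devices": {"d1", "d2"}}
w = {"phone": "p2", "devices": {"d9"}}
x = {"phone": "p1", "devices": {"d2"}}
print(identify_pair(u, v, is_assoc, rule, feat, toy_model))  # (True, 'model')
print(identify_pair(u, w, is_assoc, rule, feat, toy_model))  # (False, 'association')
print(identify_pair(v, x, is_assoc, rule, feat, toy_model))  # (True, 'rule')
```

One plausible reading of this ordering is that association checks and rule-based judgments settle many pairs directly. Under that reading, only the remaining ambiguous pairs incur the cost of feature extraction and model inference.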
Referring next to FIG. 6, a schematic structural diagram is shown of a computer system 600 suitable for implementing the devices of the embodiments of the present application (e.g., devices 101, 102, 103, and 105 shown in FIG. 1). The device shown in FIG. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 6, the computer system 600 includes a processor 601 (e.g., a CPU, central processing unit), which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The processor 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, etc.; an output section 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker, etc.; a storage section 608 including a hard disk, etc.; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods illustrated in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 609 and/or installed from the removable medium 611. When the computer program is executed by the processor 601, the above-described functions defined in the method of the present application are performed.
It should be noted that the computer-readable medium of the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the foregoing.
Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof. These include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the client computer, partly on the client computer, as a stand-alone software package, partly on the client computer and partly on a remote computer, or entirely on the remote computer or server. Where a remote computer is involved, it may be connected to the client computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatuses, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including an acquisition unit, a first determination unit, a second determination unit, a processing unit, and a third determination unit. The names of these units do not, in some cases, limit the units themselves. For example, the third determination unit may also be described as "a unit that inputs the feature vector into the pre-trained identification model to determine whether the accounts in the account pair belong to the same person".
In another aspect, the present application further provides a computer-readable medium. The medium may be included in the device described in the above embodiments, or it may exist separately without being assembled into that device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: acquire the user information corresponding to, and the identification information involved with, the accounts in an account pair; determine whether the accounts in the account pair satisfy preset association information, the preset association information characterizing whether the accounts could possibly belong to the same person; in response to determining that the accounts satisfy the preset association information, determine whether they satisfy a preset judgment condition, the preset judgment condition being used to determine whether the accounts belong to the same person; in response to determining that the accounts do not satisfy the preset judgment condition, process the user information and identification information of the accounts to obtain the feature vector corresponding to the account pair; and input the feature vector into a pre-trained identification model to determine whether the accounts belong to the same person. The identification model characterizes the correspondence between the feature vector of an account pair and the determination result of whether the accounts in that pair belong to the same person.
The above description covers only preferred embodiments of the present application and explains the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combination of the above technical features. It should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by substituting the above features with technical features of similar function disclosed in (but not limited to) this application.

Claims (14)

  1. A method for identifying multiple accounts belonging to the same person, comprising:
    acquiring user information corresponding to, and identification information involved with, the accounts in an account pair;
    determining whether the accounts in the account pair satisfy preset association information, the preset association information characterizing whether the accounts in the account pair could possibly belong to the same person;
    in response to determining that the accounts in the account pair satisfy the preset association information, determining whether the accounts in the account pair satisfy a preset judgment condition, wherein the preset judgment condition is used to determine whether the accounts in the account pair belong to the same person;
    in response to determining that the accounts in the account pair do not satisfy the preset judgment condition, processing the user information and identification information of the accounts in the account pair to obtain a feature vector corresponding to the account pair; and
    inputting the feature vector into a pre-trained identification model to determine whether the accounts in the account pair belong to the same person, wherein the identification model characterizes a correspondence between the feature vector corresponding to an account pair and a determination result of whether the accounts in that account pair belong to the same person.
  2. The method according to claim 1, wherein the preset judgment condition comprises a first judgment condition and a second judgment condition, the first judgment condition being used to determine that the accounts in the account pair belong to the same person, and the second judgment condition being used to determine that the accounts in the account pair do not belong to the same person.
  3. The method according to claim 1, wherein processing the user information and identification information of the accounts in the account pair to obtain the feature vector corresponding to the account pair comprises:
    processing each corresponding item of information between the accounts in the account pair to obtain a sub-feature vector corresponding to each corresponding item of information; and
    concatenating the sub-feature vectors to obtain the feature vector.
  4. The method according to claim 3, wherein the user information comprises user profile information, consumption habit information, and recipient information; and
    processing each corresponding item of information between the accounts in the account pair to obtain the sub-feature vector corresponding to each corresponding item of information comprises:
    determining, according to the user information of the accounts in the account pair, a sub-feature vector characterizing the similarity of the corresponding item of information between the accounts.
  5. The method according to claim 3, wherein processing each corresponding item of information between the accounts in the account pair to obtain the sub-feature vector corresponding to each corresponding item of information comprises:
    performing the following operations for each type of identification information:
    determining, according to the identification information of that type of the accounts in the account pair, a sub-feature vector characterizing the number of items of intersection identification information involved with both accounts in the account pair;
    for each account in the account pair, determining, according to the number of associations between that account and the intersection identification information and the number of associations between that account and all identification information of that type, a sub-feature vector characterizing the attribution degree of the intersection identification information relative to that account;
    determining a sub-feature vector characterizing the sum of the attribution degrees of the accounts in the account pair, and a sub-feature vector characterizing the difference between the attribution degrees of the accounts in the account pair; and
    for each item of identification information of that type, determining, according to the number of accounts that item has been associated with, a sub-feature vector characterizing the sharing degree of that item of identification information.
  6. The method according to any one of claims 1 to 5, wherein the identification model is obtained by training as follows:
    acquiring user information corresponding to, and identification information involved with, each account in an account set;
    combining the accounts in the account set that satisfy the preset association information into account pairs to obtain a plurality of account pairs;
    selecting, from the plurality of account pairs, the account pairs that satisfy the preset judgment condition to obtain a plurality of training account pairs, and setting for each training account pair, according to the preset judgment condition, a label characterizing whether the accounts in that training account pair belong to the same person;
    processing the user information and identification information of the accounts in each training account pair among the plurality of account pairs to obtain a feature vector corresponding to that training account pair; and
    training, using a machine learning method, an initial identification model with the feature vector corresponding to each training account pair as input and the label corresponding to the input training account pair as expected output, to obtain the identification model.
  7. An apparatus for identifying multiple accounts belonging to the same person, comprising:
    an acquisition unit configured to acquire user information corresponding to, and identification information involved with, the accounts in an account pair;
    a first determination unit configured to determine whether the accounts in the account pair satisfy preset association information, the preset association information characterizing whether the accounts in the account pair could possibly belong to the same person;
    a second determination unit configured to, in response to determining that the accounts in the account pair satisfy the preset association information, determine whether the accounts in the account pair satisfy a preset judgment condition, wherein the preset judgment condition is used to determine whether the accounts in the account pair belong to the same person;
    a processing unit configured to, in response to determining that the accounts in the account pair do not satisfy the preset judgment condition, process the user information and identification information of the accounts in the account pair to obtain a feature vector corresponding to the account pair; and
    a third determination unit configured to input the feature vector into a pre-trained identification model to determine whether the accounts in the account pair belong to the same person, wherein the identification model characterizes a correspondence between the feature vector corresponding to an account pair and a determination result of whether the accounts in that account pair belong to the same person.
  8. The apparatus according to claim 7, wherein the preset judgment condition comprises a first judgment condition and a second judgment condition, the first judgment condition being used to determine that the accounts in the account pair belong to the same person, and the second judgment condition being used to determine that the accounts in the account pair do not belong to the same person.
  9. The apparatus according to claim 7, wherein the processing unit comprises:
    a processing subunit configured to process each corresponding item of information between the accounts in the account pair to obtain a sub-feature vector corresponding to each corresponding item of information; and
    a concatenation unit configured to concatenate the sub-feature vectors to obtain the feature vector.
  10. The apparatus according to claim 9, wherein the user information comprises user profile information, consumption habit information, and recipient information; and
    the processing subunit is further configured to:
    determine, according to the user information of the accounts in the account pair, a sub-feature vector characterizing the similarity of the corresponding item of information between the accounts.
  11. The apparatus according to claim 9, wherein the processing subunit is further configured to:
    perform the following operations for each type of identification information:
    determine, according to the identification information of that type of the accounts in the account pair, a sub-feature vector characterizing the number of items of intersection identification information involved with both accounts in the account pair;
    for each account in the account pair, determine, according to the number of associations between that account and the intersection identification information and the number of associations between that account and all identification information of that type, a sub-feature vector characterizing the attribution degree of the intersection identification information relative to that account;
    determine a sub-feature vector characterizing the sum of the attribution degrees of the accounts in the account pair, and a sub-feature vector characterizing the difference between the attribution degrees of the accounts in the account pair; and
    for each item of identification information of that type, determine, according to the number of accounts that item has been associated with, a sub-feature vector characterizing the sharing degree of that item of identification information.
  12. The apparatus according to any one of claims 7 to 11, further comprising a training unit configured to obtain the identification model by training as follows:
    acquiring user information corresponding to, and identification information involved with, each account in an account set;
    combining the accounts in the account set that satisfy the preset association information into account pairs to obtain a plurality of account pairs;
    selecting, from the plurality of account pairs, the account pairs that satisfy the preset judgment condition to obtain a plurality of training account pairs, and setting for each training account pair, according to the preset judgment condition, a label characterizing whether the accounts in that training account pair belong to the same person;
    processing the user information and identification information of the accounts in each training account pair among the plurality of account pairs to obtain a feature vector corresponding to that training account pair; and
    training, using a machine learning method, an initial identification model with the feature vector corresponding to each training account pair as input and the label corresponding to the input training account pair as expected output, to obtain the identification model.
  13. A computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 6.
  14. An electronic device, comprising:
    one or more processors; and
    a storage apparatus having one or more programs stored thereon,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 6.
PCT/CN2022/070277 2021-01-14 2022-01-05 Method and device for identifying multiple accounts belonging to the same person WO2022152018A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110049322.5A CN113779346A (en) 2021-01-14 2021-01-14 Method and device for identifying one person with multiple accounts
CN202110049322.5 2021-01-14

Publications (1)

Publication Number Publication Date
WO2022152018A1 true WO2022152018A1 (en) 2022-07-21

Family

ID=78835432

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/070277 WO2022152018A1 (en) 2021-01-14 2022-01-05 Method and device for identifying multiple accounts belonging to the same person

Country Status (2)

Country Link
CN (1) CN113779346A (en)
WO (1) WO2022152018A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115730251A (en) * 2022-12-06 2023-03-03 贝壳找房(北京)科技有限公司 Relationship recognition method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779346A (en) * 2021-01-14 2021-12-10 北京沃东天骏信息技术有限公司 Method and device for identifying one person with multiple accounts

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102768659A (en) * 2011-05-03 2012-11-07 阿里巴巴集团控股有限公司 Method and system for identifying repeated account
CN110555451A (en) * 2018-05-31 2019-12-10 北京京东尚科信息技术有限公司 information identification method and device
CN110704776A (en) * 2019-09-12 2020-01-17 北京百度网讯科技有限公司 Account type identification method and device and electronic equipment
US20200201966A1 (en) * 2018-12-21 2020-06-25 Oath Inc. Biometric based self-sovereign information management
CN113779346A (en) * 2021-01-14 2021-12-10 北京沃东天骏信息技术有限公司 Method and device for identifying one person with multiple accounts

Also Published As

Publication number Publication date
CN113779346A (en) 2021-12-10

Similar Documents

Publication Title
CN109522483B (en) Method and device for pushing information
WO2020238320A1 (en) Method and device for generating emoticon
CN109993150B (en) Method and device for identifying age
US11327960B1 (en) Systems and methods for data parsing
US11373251B1 (en) System and method to augment electronic documents with externally produced metadata to improve processing
WO2019228494A1 (en) Method and device for determining type of wireless access point
WO2022152018A1 (en) Method and device for identifying multiple accounts belonging to the same person
CN108491267B (en) Method and apparatus for generating information
CN109359194B (en) Method and apparatus for predicting information categories
US20220229980A1 (en) Systems and methods for data parsing
CN108280200B (en) Method and device for pushing information
US20220067580A1 (en) Dynamic analysis and monitoring of machine learning processes
WO2021189789A1 (en) Inquiry information processing method and device
US20220198579A1 (en) System and method for dimensionality reduction of vendor co-occurrence observations for improved transaction categorization
CN110059172B (en) Method and device for recommending answers based on natural language understanding
CN113962401A (en) Federal learning system, and feature selection method and device in federal learning system
US11640613B2 (en) Motion-enabled transaction system using air sign symbols
CN115374207A (en) Service processing method and device, electronic equipment and computer readable storage medium
CN112669000A (en) Government affair item processing method and device, electronic equipment and storage medium
CN113486749A (en) Image data collection method, device, electronic equipment and computer readable medium
WO2022047571A1 (en) Dynamic analysis and monitoring machine learning processes
CN111782776A (en) Method and device for realizing intention identification through slot filling
CN112131502A (en) Data processing method, data processing apparatus, electronic device, and medium
WO2022057270A1 (en) Information processing method and apparatus
WO2022062507A1 (en) Information recommendation method and device

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 22738899
    Country of ref document: EP
    Kind code of ref document: A1

NENP Non-entry into the national phase
    Ref country code: DE

122 EP: PCT application non-entry in European phase
    Ref document number: 22738899
    Country of ref document: EP
    Kind code of ref document: A1