WO2021213069A1 - Account identification method and apparatus, electronic device, and computer-readable medium - Google Patents

Account identification method and apparatus, electronic device, and computer-readable medium

Info

Publication number
WO2021213069A1
WO2021213069A1 PCT/CN2021/080687 CN2021080687W WO2021213069A1 WO 2021213069 A1 WO2021213069 A1 WO 2021213069A1 CN 2021080687 W CN2021080687 W CN 2021080687W WO 2021213069 A1 WO2021213069 A1 WO 2021213069A1
Authority
WO
WIPO (PCT)
Prior art keywords
account
accounts
resource
node
identified
Prior art date
Application number
PCT/CN2021/080687
Other languages
English (en)
French (fr)
Inventor
赵可
Original Assignee
北京京东振世信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京京东振世信息技术有限公司 filed Critical 北京京东振世信息技术有限公司
Priority to JP2022563061A priority Critical patent/JP2023523191A/ja
Priority to US17/996,629 priority patent/US20230230081A1/en
Priority to KR1020227036298A priority patent/KR20220155377A/ko
Publication of WO2021213069A1 publication Critical patent/WO2021213069A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00Payment architectures, schemes or protocols
    • G06Q20/38Payment protocols; Details thereof
    • G06Q20/40Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00Payment architectures, schemes or protocols
    • G06Q20/22Payment schemes or models
    • G06Q20/227Payment schemes or models characterised in that multiple accounts are available, e.g. to the payer
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00Payment architectures, schemes or protocols
    • G06Q20/38Payment protocols; Details thereof
    • G06Q20/40Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401Transaction verification
    • G06Q20/4016Transaction verification involving fraud or risk level assessment in transaction processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/018Certifying business or products
    • G06Q30/0185Product, service or business identity fraud
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present disclosure relates to the field of Internet technology, and in particular, to an account identification method, an account identification device, electronic equipment, and computer-readable media.
  • the purpose of the present disclosure is to provide an account identification method, an account identification device, an electronic device, and a computer-readable medium, so as to improve the efficiency of target account identification at least to a certain extent.
  • an account identification method including:
  • Sample accounts are sampled from the accounts to be identified by the model training server, and a target account recognition model is obtained by training using the sample accounts;
  • the acquiring, through the account processing server, resource transfer records in which the resource pre-acquisition account and the resource receiving account are different, and generating an account relationship data table according to the resource transfer records includes:
  • the account data of the resource transfer record is put into the account relationship data table.
  • the dividing the resource pre-acquisition account and the resource receiving account in the resource transfer record into multiple connected account sets according to the account relationship data table includes:
  • putting the connection points corresponding to the same vertex in the account node table into the same set as the adjacency set corresponding to the vertex, and generating a node adjacency table according to the adjacency set;
  • if the candidate node adjacency table is different from the node adjacency table, using the candidate node adjacency table as the node adjacency table, and updating the candidate node adjacency table;
  • if the candidate node adjacency table is the same as the node adjacency table, obtaining multiple connected account sets according to the node adjacency table.
  • the obtaining a candidate node adjacency table according to each adjacency set in the node adjacency table includes:
  • a candidate adjacency set is obtained by taking a union set of each adjacency set corresponding to the same vertex, and a candidate node adjacency table is generated according to the candidate adjacency set.
  • the determining the account to be identified in each connected account set according to the connected relationship between each account in the connected account set includes:
  • the sample account obtained by sampling from the account to be identified by the model training server, and the target account recognition model obtained by training using the sample account includes:
  • the training a target account recognition model with the account data indicator of the sample account as input and the label corresponding to the sample account as output includes:
  • Training the target account recognition model constructed by the random forest algorithm by taking the multiple model training data sets as input and the label corresponding to the sample account as output.
  • the judging whether the account to be identified is a target account through the target account identification model includes:
  • if the output of the target account identification model is the first label, it is determined that the account to be identified is the target account.
  • an account identification device including:
  • the account relationship data table generation module is configured to acquire, through the account processing server, resource transfer records in which the resource pre-acquisition account and the resource receiving account are different, and to generate an account relationship data table according to the resource transfer records;
  • the connected account set dividing module is configured to execute dividing the resource pre-acquisition account and the resource receiving account in the resource transfer record into multiple connected account sets according to the account relationship data table;
  • the to-be-recognized account determination module is configured to determine the to-be-recognized account in each connected account set according to the connected relationship between the respective accounts in the connected account set, and send the to-be-recognized account to the model training server;
  • the account recognition model training module is configured to sample accounts from the accounts to be recognized through the model training server, and to train a target account recognition model using the sample accounts;
  • the target account judgment module is configured to execute the judgment of whether the account to be identified is a target account through the target account identification model.
  • an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method as described in the first aspect by executing the executable instructions.
  • a computer-readable medium having a computer program stored thereon, and the computer program, when executed by a processor, implements the method as described in the first aspect.
  • FIG. 1 shows a schematic flowchart of an account identification method according to an exemplary embodiment of the present disclosure
  • FIG. 2 shows a schematic diagram of a process of generating an account relationship data table in an exemplary embodiment of the present disclosure
  • FIG. 3 shows a schematic flowchart of determining a set of connected accounts in an exemplary embodiment of the present disclosure
  • FIG. 4 shows a schematic diagram of obtaining user relationship edges according to a specific embodiment of the present disclosure
  • FIG. 5 shows a schematic diagram of obtaining a node adjacency table according to a specific embodiment of the present disclosure
  • FIG. 6 shows a schematic flowchart of determining a candidate node adjacency table according to an exemplary embodiment of the present disclosure
  • FIG. 7 shows a schematic diagram of obtaining node class labels according to a specific embodiment of the present disclosure
  • FIG. 8 shows a schematic diagram of merging node class labels with the distributed union-find method according to a specific embodiment of the present disclosure
  • FIG. 9 shows a schematic diagram of a process of determining an account to be identified in an exemplary embodiment of the present disclosure
  • FIG. 10 shows a schematic flowchart of training a target account recognition model according to an exemplary embodiment of the present disclosure
  • FIG. 11 shows a schematic flowchart of training a target account recognition model constructed by a random forest algorithm according to an exemplary embodiment of the present disclosure
  • FIG. 12 shows a schematic diagram of a process of identifying a target account according to an exemplary embodiment of the present disclosure
  • FIG. 13 shows a complete block diagram of an account identification method according to a specific embodiment of the present disclosure
  • FIG. 14 shows a block diagram of an account identification device according to an exemplary embodiment of the present disclosure
  • FIG. 15 shows a schematic structural diagram of a computer system suitable for implementing an electronic device of an embodiment of the present disclosure.
  • This exemplary embodiment first provides an account identification method, which can be used to identify, among multiple accounts, the accounts that place orders on behalf of others.
  • the above-mentioned account identification method may include the following steps:
  • Step S110 Obtain the resource transfer records of the resource pre-acquisition account and the resource receiving account through the account processing server, and generate an account relationship data table according to the resource transfer record.
  • the resource transfer record may refer to the order record during the shopping process.
  • the resource pre-acquisition account may refer to the account with which the user places the order (the order account)
  • the resource receiving account may refer to the account with which the user receives the goods (the receiving account).
  • the account processing server, here an order processing server, is the part of the server used to obtain order data from terminal devices and process the order data.
  • Terminal devices refer to electronic devices such as smartphones and computers that can place orders for goods on the Internet.
  • the order account number may refer to the mobile phone number used by the user who placed an order for a certain product on the online shopping platform, and may also include a login account and other accounts that can be used to determine the user who placed the order.
  • the receiving account can refer to the mobile phone number of the receiving user corresponding to the order, or other account that can be used to determine the receiving user.
  • one order corresponds to one order account and one delivery account
  • the order account and delivery account of the same order may be the same account or different accounts. Since this exemplary embodiment is used to identify accounts that place orders on behalf of others, when acquiring the order data it is only necessary to obtain the orders whose order account and receiving account differ, and to generate an account relationship data table based on that order data.
  • the account relationship data table may include the order number, the order account number, the receiving account number, the number of orders placed, and some other order data indicators.
  • Step S120 Divide the resource pre-acquisition account and the resource receiving account in the resource transfer record into multiple connected account sets according to the account relationship data table.
  • a user connected group is a set of users in which any pair of users is linked, directly or through other users, by an order-placing relationship, that is, a connected account set.
  • Step S130 Determine the account to be identified in each connected account set according to the connected relationship between each account in the connected account set, and send the account to be identified to the model training server.
  • connection relationship between accounts can be represented by the closeness between one account and multiple other accounts
  • the account to be identified can be determined by the closeness between the account and other accounts.
  • Determining the account to be identified from each connected account set means finding the account with the highest closeness in that set, that is, the account most likely to be placing orders on behalf of others.
  • the account to be identified is sent to the model training server, and the target account recognition model is trained in the model training server through the account to be identified.
  • the model training server is a part of the server used to process training data and train the target account identification model based on the training data.
  • Step S140 Sample accounts are sampled from the accounts to be recognized by the model training server, and the target account recognition model is obtained by training the sample accounts.
  • After obtaining the account to be identified in each connected account set, the model training server extracts some of the accounts to be identified as sample accounts and determines whether each sample account is a target account. The target account recognition model is then trained on the account data indicators of the sample accounts, obtained from the account relationship data table, together with the result of that determination. The trained model can be used to determine whether an account is a target account; when the target account is an account that places orders on behalf of others, the model identifies such accounts.
  • Step S150 Determine whether the account to be identified is the target account through the target account recognition model.
  • the account data indicators of the account to be recognized are input into the trained target account recognition model, and it can be judged whether the account to be recognized is the target account.
  • a plurality of accounts to be identified can be determined according to the connection relationship between the accounts, the target account identification model is trained on sample accounts extracted from the accounts to be identified, and the model is then used to determine which of the accounts to be identified are target accounts.
  • the account identification method in the exemplary embodiment of the present disclosure can train the account identification model on sample accounts obtained by sampling, and thereby identify the accounts in multiple resource transfer records and determine the target accounts among them. This improves the efficiency of account identification and greatly reduces the workload of the staff. With the above method, it is therefore possible to examine the accounts of multiple orders, determine the accounts that place orders on behalf of others, and then identify the real consumer group.
  • step S110 the account processing server obtains resource transfer records with different resource pre-acquisition accounts and resource receiving accounts, and generates an account relationship data table according to the resource transfer records, which may specifically include the following steps:
  • Step S210 Obtain account data in all resource transfer records through the account processing server, and determine whether the resource pre-acquisition account and the resource receiving account in the account data in the resource transfer record are the same.
  • the order processing server can obtain the account data in all resource transfer records sent by the terminal device, that is, the account data of all orders and store it in the data storage module of the server, and then obtain the account data from the data storage module of the server and perform data processing.
  • the data storage module can include the order number, the mobile phone number of the user who placed the order, the mobile phone number of the receiving user, the number of orders placed, and some other data information in the order.
  • the account data of orders within one month can be obtained, and the account data of orders within one quarter can also be obtained for analysis, which is not specifically limited.
  • Step S220 If the resource pre-acquisition account and the resource receiving account in the resource transfer record are the same, the account data of the resource transfer record is filtered out.
  • Step S230 If the resource pre-acquisition account and the resource receiving account in the resource transfer record are different, the account data of the resource transfer record is put into the account relationship data table.
  • the account data corresponding to the order is put into the account relationship data table.
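  • Steps S210 to S230 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the record layout `(order_id, ordering_account, receiving_account)`, the sample data, and the function name `build_account_relationship_table` are all assumptions, and the table is simplified to an order count per account pair.

```python
from collections import Counter

# Hypothetical order records: (order_id, ordering_account, receiving_account).
orders = [
    ("A", "1", "2"),
    ("B", "2", "3"),
    ("C", "1", "2"),
    ("G", "5", "5"),  # same account on both sides, so it is filtered out (S220)
]


def build_account_relationship_table(records):
    """Keep records whose two accounts differ and count orders per pair."""
    counts = Counter()
    for _order_id, ordering, receiving in records:
        if ordering != receiving:  # S230: only differing accounts are kept
            counts[(ordering, receiving)] += 1
    return dict(counts)


relationship_table = build_account_relationship_table(orders)
```

The order count kept per pair is the quantity later used as the number of resource transfers between an order account and a receiving account.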
  • the accounts can be divided into multiple connected account sets according to the relationship between the order account and the receiving account of each order in the account relationship data table.
  • the specific method is explained below with reference to FIG. 3 and FIG. 4.
  • step S120 as shown in FIG. 3, dividing the resource pre-acquisition account and the resource receiving account in the resource transfer record into multiple connected account sets according to the account relationship data table, which may specifically include the following steps:
  • Step S310 Obtain the resource pre-acquisition account and the resource receiving account of each resource transfer record from the account relationship data table, and use the two accounts of each record as account nodes to generate multiple account node relationship pairs.
  • the accounts can be divided into multiple connected account sets using a distributed union-find method, or the connected account sets may be obtained through other methods.
  • This example embodiment imposes no specific restriction, and only uses the distributed union-find method as an example.
  • the distributed union-find method obtains a connected graph by merging pairs of connected nodes.
  • the distributed union-find method uses MapReduce (mapping and reduction) distributed operations: a label function assigns class labels to connected account nodes, and the node label data is then merged block by block, iterating according to a judgment condition, until the class label of each node no longer changes.
  • the order user table 401 is obtained from the account relationship data table.
  • the order user table 401 includes the account of the ordering user and the account of the receiving user for each order. Because the ordering user and the receiving user of order G are the same, the data of order G is excluded and not considered.
  • Step S320 Take one account node in each group of account node relationship pairs as a vertex, and the other account node as a connection point corresponding to the vertex, to obtain an account node table.
  • One account node in the account node relationship pair is used as a vertex, and the other account node is used as a connection point corresponding to the vertex to expand sequentially to obtain the account node table of each account node, as shown in the account node table 501 in FIG. 5.
  • Step S330 Put the connection points corresponding to the same vertex in the account node table into the same set as the adjacency set corresponding to the vertex, and generate the node adjacency table according to the adjacency set.
  • the node adjacency table 502 is obtained according to the account node table 501.
  • the connection points corresponding to the same vertex in the account node table 501, together with the vertex itself, can be put into the same set as the adjacency set corresponding to the vertex. For example, if the connection points corresponding to mobile phone 2 are mobile phone 1 and mobile phone 3, the vertex mobile phone 2 and the connection points mobile phone 1 and mobile phone 3 are put into the adjacency set corresponding to mobile phone 2, which is {1, 2, 3}, and so on.
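  • Steps S320 and S330 can be sketched on a single machine as follows. The function name `build_node_adjacency_table` and the sample pairs are assumptions for illustration; the disclosure describes a MapReduce-based distributed operation, which this sketch does not reproduce.

```python
from collections import defaultdict


def build_node_adjacency_table(pairs):
    """Map every account node to the set holding itself and its neighbors."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        # Each node of a pair acts once as the vertex and once as the
        # connection point, and every vertex is part of its own adjacency set.
        adjacency[a].update((a, b))
        adjacency[b].update((a, b))
    return dict(adjacency)


# Hypothetical account node relationship pairs (ordering, receiving).
pairs = [("1", "2"), ("2", "3"), ("4", "5")]
node_adjacency_table = build_node_adjacency_table(pairs)
```

With these pairs, node "2" receives the adjacency set {"1", "2", "3"}, matching the mobile phone 2 example in the text.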
  • Step S340 Obtain a candidate node adjacency table according to each adjacency set in the node adjacency table, and determine whether the candidate node adjacency table and the node adjacency table are the same.
  • the node adjacency table 502 is used as the initial node adjacency table, and the MapReduce distributed operation is used again: a label function F is constructed so that each node obtains its adjacency set from the node adjacency table as its class label L, which yields the candidate node adjacency table; the candidate node adjacency table is then compared with the node adjacency table to judge whether they are the same.
  • Step S350 If the candidate node adjacency table is different from the node adjacency table, the candidate node adjacency table is used as the node adjacency table, and the candidate node adjacency table is updated.
  • the candidate node adjacency table replaces the initial node adjacency table, the candidate node adjacency table is updated again, and the iteration judgment flag is increased by 1. The flag is reset to 0 at the beginning of each iteration: if the candidate node adjacency table is the same as the node adjacency table, the flag remains unchanged; if they differ, the flag is increased by 1.
  • Step S360 If the candidate node adjacency table is the same as the node adjacency table, obtain multiple connected account sets according to the node adjacency table.
  • if the candidate node adjacency table is the same as the node adjacency table, that is, the iteration judgment flag equals 0, the iteration ends and the node adjacency table obtained in this iteration is used as the final node adjacency table.
  • after the final node adjacency table is deduplicated, multiple connected account sets are obtained, each a connected group of users linked by shopping relationships.
  • step S340 obtaining a candidate node adjacency table according to each adjacency set in the node adjacency table, which may specifically include the following steps:
  • Step S610 Use each account node in the adjacency set as a vertex, and use the adjacency set where the account node is located as the adjacency set corresponding to the vertex.
  • the label function F obtains the adjacency set of each account node from the node adjacency table 502 as the class label of that node, and each adjacency set of the node adjacency table 502 is expanded in turn to obtain the node class label set 701.
  • Step S620 The adjacency sets corresponding to the same vertex are merged to obtain a candidate adjacency set, and a candidate node adjacency table is generated according to the candidate adjacency set.
  • each vertex in the node class label set 701 and its corresponding class label are traversed.
  • the class labels of the same vertex are merged, which finally yields a candidate adjacency set for each account node, and the candidate node adjacency table 801 is generated from the candidate adjacency sets.
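  • The iteration of steps S340 to S360, with the candidate table built as in steps S610 and S620, can be sketched as follows. This is a single-process illustration under assumptions: the function names and the sample table are hypothetical, and a plain loop stands in for the MapReduce rounds.

```python
def candidate_table(table):
    """One round: every node in an adjacency set receives that set as a class
    label (S610), and the labels of the same vertex are unioned (S620)."""
    candidate = {}
    for adjacency_set in table.values():
        for node in adjacency_set:
            candidate.setdefault(node, set()).update(adjacency_set)
    return candidate


def connected_account_sets(table):
    """Repeat until the candidate table equals the current table (S340-S360),
    then deduplicate the adjacency sets."""
    while True:
        candidate = candidate_table(table)
        if candidate == table:  # nothing changed: the iteration ends
            break
        table = candidate       # S350: the candidate replaces the table
    return {frozenset(s) for s in table.values()}


# Hypothetical initial node adjacency table.
initial_table = {
    "1": {"1", "2"},
    "2": {"1", "2", "3"},
    "3": {"2", "3"},
    "4": {"4", "5"},
    "5": {"4", "5"},
}
components = connected_account_sets(initial_table)
```

Because every adjacency set contains its own vertex, the sets only grow from round to round, so the loop stops once each set equals the full connected group of its vertex.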
  • each connected account set yields one account to be identified, the account most likely to be the target account; here, that is the account most likely to be placing orders on behalf of others.
  • the closeness centrality algorithm can be used to mine key nodes in a network. It takes the reciprocal of the average shortest distance from a node to all other reachable nodes, which measures how far the node is from the other nodes (i.e., its closeness, referred to below as tightness).
  • the account to be identified in each connected account set can be determined by the closeness centrality algorithm.
  • the specific method is as follows:
  • determining the account to be identified in each connected account set according to the connected relationship between each account in the connected account set may specifically include the following steps:
  • Step S910 Obtain the number of resource transfers between each group of resource pre-acquisition accounts and resource receiving accounts in the connected account set through the account relationship data table.
  • the number of resource transfers between the resource pre-acquisition account and the resource receiving account is the number of orders placed between the order account and the receiving account.
  • a directed graph of the user relationships within each connected account set (user connected group) is constructed. An edge from ordering user a to receiving user b, counted in the out-degree of a, represents the receiving relationship, and the number of orders placed between a and b is obtained for it.
  • Step S920 Obtain the total number of accounts in the connected account set, and the number of connected accounts that have a resource acquisition relationship with the resource pre-acquired account in the connected account set.
  • the total number of accounts in the connected account set can be represented by N
  • the total number of connected accounts that have a receiving relationship with account v can be represented by R(v).
  • Step S930 Obtain the tightness of the resource pre-acquisition account according to the number of resource transfers and the number of connected accounts and the total number of accounts in the connected account set.
  • the tightness weight of the resource pre-acquisition account can be obtained from the number of resource transfers: the tightness weight w_out is the reciprocal of the number of orders.
  • the shortest distance from user v to user u is denoted d(v,u).
  • Step S940 Determine an account to be identified in each connected account set according to the tightness of all resource pre-acquired accounts in the connected account set.
  • the user i with the maximum closeness centrality C_max(i) in the connected account set may be used as the account to be identified in the set, that is, the account suspected of placing orders on behalf of others.
  • the target account recognition model can be trained on sample accounts drawn from the accounts to be identified, and the model can then judge all the accounts to be identified to obtain the target accounts, that is, the accounts that place orders on behalf of others.
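  • Steps S910 to S940 can be sketched as follows. The text names the inputs (N, R(v), w_out, d(v,u)) but the scoring formula itself is not shown here, so the normalized form C(v) = (R(v) / (N - 1)) * (R(v) / sum of d(v,u)) used below is an assumption, as are the reading of R(v) as the number of accounts reachable from v, the function names, and the sample order counts.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra over a directed graph {node: {neighbor: weight}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        for u, w in graph.get(v, {}).items():
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    del dist[source]
    return dist  # distances to every account reachable from source


def closeness(order_counts, accounts):
    """Score each account of one connected account set (assumed formula)."""
    graph = {}
    for (src, dst), n_orders in order_counts.items():
        graph.setdefault(src, {})[dst] = 1.0 / n_orders  # w_out = 1 / orders
    total = len(accounts)  # N
    scores = {}
    for v in accounts:
        dist = shortest_distances(graph, v)
        reachable = len(dist)  # R(v), read here as reachable accounts
        if reachable == 0 or total < 2:
            scores[v] = 0.0
        else:
            scores[v] = (reachable / (total - 1)) * (reachable / sum(dist.values()))
    return scores


# Hypothetical order counts within one connected account set.
order_counts = {("1", "2"): 2, ("1", "3"): 4, ("2", "3"): 1}
scores = closeness(order_counts, ["1", "2", "3"])
account_to_identify = max(scores, key=scores.get)  # S940
```

Taking the reciprocal of the order count as the edge weight means that accounts ordering often for the same receivers sit at a short distance from them and therefore score higher.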
  • step S140 as shown in FIG. 10, sample accounts are sampled from the accounts to be identified by the model training server, and the target account recognition model is obtained by training of the sample accounts, which may specifically include the following steps:
  • Step S1010 Sort the accounts to be identified according to the closeness through the model training server, and divide all the accounts to be identified into multiple sets of accounts to be identified according to the sorting result.
  • the model training server sorts all the accounts to be identified by closeness centrality, and divides them into multiple sets of accounts to be identified according to the closeness centrality value.
  • Step S1020 Extract a preset number of accounts to be recognized from each set of accounts to be recognized as sample accounts, and determine whether the sample account is a target account.
  • a preset number of to-be-identified accounts are selected from each set of to-be-identified accounts as sample accounts, and it is determined whether these sample accounts are target accounts.
  • the specific method of judgment can be through outbound calls to the ordering users corresponding to these sample accounts to determine whether the sample account is an ordering account, or other methods can be used to determine the sample account. There are no specific restrictions in this example implementation. .
  • Step S1030 Add a first label to the target account in the sample account, and add a second label to the remaining sample accounts in the sample account.
  • Step S1040 Obtain the account data indicator of the sample account through the account relationship data table, and use the account data indicator of the sample account as input and the label corresponding to the sample account as output to train the target account recognition model.
  • Training the target account recognition model may specifically include the following steps:
  • Step S1110. Obtain multiple model training data sets from the account data indicators of the sample accounts, and construct the target account recognition model with the random forest algorithm.
  • The random forest algorithm draws N training samples from the data set randomly and with replacement, and considers only M randomly chosen indicator features each time it splits the data.
  • The random forest algorithm performs T rounds of sampling in total, obtains T training sets, and trains T decision trees independently. Each decision tree outputs its own classification result, and the results of the T decision trees are put to a vote to obtain the final classification.
  • Each model training data set is used to train one of the T decision trees.
  • Step S1120. Train the target account recognition model constructed by the random forest algorithm with the multiple model training data sets as input and the labels corresponding to the sample accounts as output.
  • For each decision tree in the model, the account data indicators of the sample accounts in the corresponding model training data set are used as input, and the labels corresponding to the sample accounts are used as output.
  • Each decision tree in the model is trained independently; finally, the outputs of the decision trees are put to a vote, and the result is taken as the model output, completing the training of the target account recognition model.
  • In step S150, determining whether the account to be identified is the target account through the target account recognition model may specifically include the following steps:
  • Step S1210. Obtain the account data indicators of the account to be identified from the account relationship data table, and input them into the target account recognition model.
  • Step S1220. If the output of the target account recognition model is the first label, determine that the account to be identified is the target account.
  • If the output is the first label, the account to be identified is determined to be the target account; if the output is the second label, it is determined not to be the target account. The indicators of every account to be identified are input into the target account recognition model, and the target accounts, that is, the proxy-ordering accounts, are identified from the model output.
  • FIG. 13 is a complete block diagram of a specific implementation of the present disclosure.
  • The block diagram may include three modules, and the specific steps performed in each module are as follows:
  • Step S1302. Data processing.
  • Analyze the number of orders placed between users and remove order data in which the ordering user and the receiving user share the same mobile phone number; output a user relationship data table containing the ordering user, the receiving user, the number of orders, and so on.
  • Step S1303. Obtain connected user groups by distributed union-find.
  • The accounts are partitioned by the distributed union-find method to obtain multiple connected account sets.
  • The specific method has been explained in the foregoing embodiment and is not repeated here.
  • Step S1304. Directed graph of user shopping relationships.
  • A directed graph of the user relationships within the connected user group of each connected account set is constructed from the connected account sets and the account relationship data table.
  • Step S1305. Identification of suspected users by closeness centrality.
  • The user with the largest closeness centrality is selected from each connected account set as the suspected user of that set.
  • Step S1306. Sampling and labeling through customer service outbound calls.
  • Step S1307. Construction of the random forest classifier.
  • A proxy-ordering account recognition model is constructed with the random forest algorithm and trained on the account data indicators of the labeled sample accounts. After training, the model can be used to identify proxy-ordering accounts.
  • The present disclosure also provides an account identification device.
  • The account identification device may include an account relationship data table generation module 1410, a connected account set division module 1420, a to-be-identified account determination module 1430, an account recognition model training module 1440, and a target account determination module 1450, wherein:
  • the account relationship data table generation module 1410 may be configured to obtain, through the account processing server, resource transfer records in which the resource pre-acquisition account differs from the resource receiving account, and to generate an account relationship data table from the resource transfer records;
  • the connected account set division module 1420 may be configured to divide the resource pre-acquisition accounts and resource receiving accounts in the resource transfer records into multiple connected account sets according to the account relationship data table;
  • the to-be-identified account determination module 1430 may be configured to determine the account to be identified in each connected account set according to the connectivity relationships among the accounts in that set, and to send the accounts to be identified to the model training server;
  • the account recognition model training module 1440 may be configured to sample sample accounts from the accounts to be identified through the model training server, and to train the target account recognition model on the sample accounts;
  • the target account determination module 1450 may be configured to determine, through the target account recognition model, whether an account to be identified is a target account.
  • The account relationship data table generation module 1410 may include an account judgment unit, an account filtering unit, and a data table generation unit, wherein:
  • the account judgment unit may be configured to obtain, through the account processing server, the account data in all resource transfer records, and to determine whether the resource pre-acquisition account and the resource receiving account in the account data of each resource transfer record are the same;
  • the account filtering unit may be configured to filter out the account data of a resource transfer record if its resource pre-acquisition account and resource receiving account are the same;
  • the data table generation unit may be configured to put the account data of a resource transfer record into the account relationship data table if its resource pre-acquisition account and resource receiving account are different.
  • The connected account set division module 1420 may include a node relationship pair generation unit, an account node table generation unit, a node adjacency table generation unit, a node adjacency table judgment unit, a node adjacency table update unit, and a connected account set determination unit, wherein:
  • the node relationship pair generation unit may be configured to obtain the resource pre-acquisition account and the resource receiving account of each resource transfer record from the account relationship data table, and to generate multiple account node relationship pairs with the resource pre-acquisition account and the resource receiving account of each resource transfer record as account nodes;
  • the account node table generation unit may be configured to take one account node of each account node relationship pair as a vertex and the other account node as the connection point corresponding to that vertex, to obtain the account node table;
  • the node adjacency table generation unit may be configured to put the connection points that correspond to the same vertex in the account node table into one set as the adjacency set of that vertex, and to generate the node adjacency table from the adjacency sets;
  • the node adjacency table judgment unit may be configured to obtain a candidate node adjacency table from the adjacency sets in the node adjacency table, and to determine whether the candidate node adjacency table is the same as the node adjacency table;
  • the node adjacency table update unit may be configured to take the candidate node adjacency table as the node adjacency table and update the candidate node adjacency table if the two tables differ;
  • the connected account set determination unit may be configured to obtain multiple connected account sets from the node adjacency table if the candidate node adjacency table is the same as the node adjacency table.
  • The node adjacency table judgment unit may include an adjacency set expansion unit and a candidate adjacency table generation unit, wherein:
  • the adjacency set expansion unit may be configured to take each account node in an adjacency set as a vertex, and to take the adjacency set containing that account node as an adjacency set corresponding to the vertex;
  • the candidate adjacency table generation unit may be configured to take the union of the adjacency sets corresponding to the same vertex to obtain a candidate adjacency set, and to generate the candidate node adjacency table from the candidate adjacency sets.
  • The to-be-identified account determination module 1430 may include a closeness weight determination unit, a closeness parameter acquisition unit, a closeness calculation unit, and a to-be-identified account determination unit, wherein:
  • the closeness weight determination unit may be configured to obtain, from the account relationship data table, the number of resource transfers between each pair of resource pre-acquisition account and resource receiving account in the connected account set;
  • the closeness parameter acquisition unit may be configured to obtain the total number of accounts in the connected account set, and the number of connected accounts in the set that have a resource acquisition relationship with the resource pre-acquisition account;
  • the closeness calculation unit may be configured to obtain the closeness of the resource pre-acquisition account from the number of resource transfers, the number of connected accounts, and the total number of accounts in the connected account set;
  • the to-be-identified account determination unit may be configured to determine one account to be identified in each connected account set according to the closeness of all resource pre-acquisition accounts in that set.
  • The account recognition model training module 1440 may include an account set allocation unit, a target account judgment unit, an account label adding unit, and a recognition model training unit, wherein:
  • the account set allocation unit may be configured to sort the accounts to be identified by closeness through the model training server, and to divide all the accounts to be identified into multiple sets of accounts to be identified according to the sorting result;
  • the target account judgment unit may be configured to extract a preset number of accounts to be identified from each set as sample accounts, and to determine whether the sample accounts are target accounts;
  • the account label adding unit may be configured to add a first label to the target accounts among the sample accounts, and a second label to the remaining sample accounts;
  • the recognition model training unit may be configured to obtain the account data indicators of the sample accounts from the account relationship data table, and to train the target account recognition model with the account data indicators of the sample accounts as input and the labels corresponding to the sample accounts as output.
  • The recognition model training unit may include a recognition model construction unit and a multi-model training unit, wherein:
  • the recognition model construction unit may be configured to obtain multiple model training data sets from the account data indicators of the sample accounts, and to construct the target account recognition model with the random forest algorithm;
  • the multi-model training unit may be configured to train the target account recognition model constructed by the random forest algorithm with the multiple model training data sets as input and the labels corresponding to the sample accounts as output.
  • The target account determination module 1450 may include an account data input unit and a target account identification unit, wherein:
  • the account data input unit may be configured to obtain the account data indicators of the account to be identified from the account relationship data table, and to input them into the target account recognition model;
  • the target account identification unit may be configured to determine that the account to be identified is the target account if the output of the target account recognition model is the first label.
  • FIG. 15 shows a schematic structural diagram of a computer system suitable for implementing an electronic device of an embodiment of the present disclosure.
  • The computer system 1500 includes a central processing unit (CPU) 1501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1502 or a program loaded from a storage portion 1508 into a random access memory (RAM) 1503.
  • The RAM 1503 also stores various programs and data required for system operation.
  • the CPU 1501, ROM 1502, and RAM 1503 are connected to each other through a bus 1504.
  • An input/output (I/O) interface 1505 is also connected to the bus 1504.
  • The following components are connected to the I/O interface 1505: an input portion 1506 including a keyboard, a mouse, and the like; an output portion 1507 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage portion 1508 including a hard disk and the like; and a communication portion 1509 including a network interface card such as a LAN card or a modem. The communication portion 1509 performs communication processing via a network such as the Internet.
  • A drive 1510 is also connected to the I/O interface 1505 as needed.
  • A removable medium 1511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1510 as needed, so that the computer program read from it can be installed into the storage portion 1508 as needed.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication part 1509, and/or installed from the removable medium 1511.
  • the computer-readable medium shown in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
  • Each block in the flowchart or block diagram may represent a module, program segment, or part of code, and that module, program segment, or part of code contains one or more executable instructions for realizing the specified logic function.
  • In some alternative implementations, the functions marked in the blocks may also occur in an order different from that marked in the drawings. For example, two blocks shown one after another can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved.
  • Each block in the block diagram or flowchart, and any combination of blocks in the block diagram or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the present disclosure also provides a computer-readable medium.
  • The computer-readable medium may be included in the electronic device described in the above embodiments, or it may exist alone without being assembled into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs. When the above-mentioned one or more programs are executed by an electronic device, the electronic device realizes the method described in the following embodiments.

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Engineering & Computer Science (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computer Security & Cryptography (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

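An account identification method, an account identification device, an electronic device, and a computer-readable medium, belonging to the field of Internet technology. The method includes: obtaining resource transfer records in which the resource pre-acquisition account differs from the resource receiving account, and generating an account relationship data table from the resource transfer records; dividing the resource pre-acquisition accounts and resource receiving accounts into multiple connected account sets according to the account relationship data table; determining the account to be identified in each connected account set according to the connectivity relationships among the accounts in that set; sampling sample accounts from the accounts to be identified, and training a target account recognition model on the sample accounts; and determining, through the target account recognition model, whether an account to be identified is a target account. The accounts most likely to be target accounts are selected through the connectivity relationships among accounts and used to train the target account recognition model, which improves the efficiency of target account identification.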

Description

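Account identification method, device, electronic device, and computer-readable medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202010328202.4, filed on April 23, 2020 and entitled "Account identification method, device, electronic device, and computer-readable medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of Internet technology, and in particular to an account identification method, an account identification device, an electronic device, and a computer-readable medium.
Background
With the spread of online shopping, proxy ordering often occurs, for example when a store on one online shopping platform offers to place orders on another online shopping platform. Such stores may obtain coupons by improper means and use them to attract users of other platforms to a proxy-ordering service, may offer proxy ordering to users who are accustomed to other platforms, or may place orders on behalf of consumers who do not know how to shop online.
At present, no dedicated risk control system identifies the users who provide proxy-ordering services. This may lead to a series of after-sales problems and harm the user experience of the online shopping platform, while identifying proxy-ordering accounts manually is very inefficient. An account identification method is therefore needed to solve these problems and improve the efficiency of identifying proxy-ordering accounts.
It should be noted that the information disclosed in the Background section above is only intended to enhance understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.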
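Summary
The purpose of the present disclosure is to provide an account identification method, an account identification device, an electronic device, and a computer-readable medium that improve, at least to some extent, the efficiency of target account identification.
According to a first aspect of the present disclosure, an account identification method is provided, including:
obtaining, through an account processing server, resource transfer records in which the resource pre-acquisition account differs from the resource receiving account, and generating an account relationship data table from the resource transfer records;
dividing the resource pre-acquisition accounts and resource receiving accounts in the resource transfer records into multiple connected account sets according to the account relationship data table;
determining the account to be identified in each connected account set according to the connectivity relationships among the accounts in the connected account set, and sending the accounts to be identified to a model training server;
sampling sample accounts from the accounts to be identified through the model training server, and training a target account recognition model on the sample accounts; and
determining, through the target account recognition model, whether an account to be identified is a target account.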
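In an exemplary embodiment of the present disclosure, obtaining, through the account processing server, the resource transfer records in which the resource pre-acquisition account differs from the resource receiving account, and generating the account relationship data table from the resource transfer records, includes:
obtaining, through the account processing server, the account data in all resource transfer records, and determining whether the resource pre-acquisition account and the resource receiving account in the account data of each resource transfer record are the same;
if the resource pre-acquisition account and the resource receiving account in a resource transfer record are the same, filtering out the account data of that resource transfer record; and
if the resource pre-acquisition account and the resource receiving account in a resource transfer record are different, putting the account data of that resource transfer record into the account relationship data table.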
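In an exemplary embodiment of the present disclosure, dividing the resource pre-acquisition accounts and resource receiving accounts in the resource transfer records into multiple connected account sets according to the account relationship data table includes:
obtaining the resource pre-acquisition account and the resource receiving account of each resource transfer record from the account relationship data table, and generating multiple account node relationship pairs with the resource pre-acquisition account and the resource receiving account of each resource transfer record as account nodes;
taking one account node of each account node relationship pair as a vertex and the other account node as the connection point corresponding to that vertex, to obtain an account node table;
putting the connection points that correspond to the same vertex in the account node table into one set as the adjacency set of that vertex, and generating a node adjacency table from the adjacency sets;
obtaining a candidate node adjacency table from the adjacency sets in the node adjacency table, and determining whether the candidate node adjacency table is the same as the node adjacency table;
if the candidate node adjacency table differs from the node adjacency table, taking the candidate node adjacency table as the node adjacency table and updating the candidate node adjacency table; and
if the candidate node adjacency table is the same as the node adjacency table, obtaining multiple connected account sets from the node adjacency table.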
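In an exemplary embodiment of the present disclosure, obtaining the candidate node adjacency table from the adjacency sets in the node adjacency table includes:
taking each account node in an adjacency set as a vertex, and taking the adjacency set containing that account node as an adjacency set corresponding to the vertex; and
taking the union of the adjacency sets corresponding to the same vertex to obtain a candidate adjacency set, and generating the candidate node adjacency table from the candidate adjacency sets.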
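In an exemplary embodiment of the present disclosure, determining the account to be identified in each connected account set according to the connectivity relationships among the accounts in the connected account set includes:
obtaining, from the account relationship data table, the number of resource transfers between each pair of resource pre-acquisition account and resource receiving account in the connected account set;
obtaining the total number of accounts in the connected account set, and the number of connected accounts in the set that have a resource acquisition relationship with the resource pre-acquisition account;
obtaining the closeness of the resource pre-acquisition account from the number of resource transfers, the number of connected accounts, and the total number of accounts in the connected account set; and
determining one account to be identified in each connected account set according to the closeness of all resource pre-acquisition accounts in that set.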
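In an exemplary embodiment of the present disclosure, sampling sample accounts from the accounts to be identified through the model training server and training the target account recognition model on the sample accounts includes:
sorting the accounts to be identified by closeness through the model training server, and dividing all the accounts to be identified into multiple sets of accounts to be identified according to the sorting result;
extracting a preset number of accounts to be identified from each set as sample accounts, and determining whether the sample accounts are target accounts;
adding a first label to the target accounts among the sample accounts, and adding a second label to the remaining sample accounts; and
obtaining the account data indicators of the sample accounts from the account relationship data table, and training the target account recognition model with the account data indicators of the sample accounts as input and the labels corresponding to the sample accounts as output.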
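In an exemplary embodiment of the present disclosure, training the target account recognition model with the account data indicators of the sample accounts as input and the labels corresponding to the sample accounts as output includes:
obtaining multiple model training data sets from the account data indicators of the sample accounts, and constructing the target account recognition model with the random forest algorithm; and
training the target account recognition model constructed by the random forest algorithm with the multiple model training data sets as input and the labels corresponding to the sample accounts as output.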
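In an exemplary embodiment of the present disclosure, determining, through the target account recognition model, whether the account to be identified is a target account includes:
obtaining the account data indicators of the account to be identified from the account relationship data table, and inputting them into the target account recognition model; and
if the output of the target account recognition model is the first label, determining that the account to be identified is a target account.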
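According to a second aspect of the present disclosure, an account identification device is provided, including:
an account relationship data table generation module, configured to obtain, through an account processing server, resource transfer records in which the resource pre-acquisition account differs from the resource receiving account, and to generate an account relationship data table from the resource transfer records;
a connected account set division module, configured to divide the resource pre-acquisition accounts and resource receiving accounts in the resource transfer records into multiple connected account sets according to the account relationship data table;
a to-be-identified account determination module, configured to determine the account to be identified in each connected account set according to the connectivity relationships among the accounts in the connected account set, and to send the accounts to be identified to a model training server;
an account recognition model training module, configured to sample sample accounts from the accounts to be identified through the model training server, and to train a target account recognition model on the sample accounts; and
a target account determination module, configured to determine, through the target account recognition model, whether an account to be identified is a target account.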
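According to a third aspect of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to perform the method of the first aspect by executing the executable instructions.
According to a fourth aspect of the present disclosure, a computer-readable medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the method of the first aspect.
It should be understood that the general description above and the detailed description below are only exemplary and explanatory, and do not limit the present disclosure.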
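Brief Description of the Drawings
The drawings here are incorporated into and form part of this specification, show embodiments consistent with the present disclosure, and serve together with the specification to explain the principles of the present disclosure. Obviously, the drawings described below show only some embodiments of the present disclosure, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the account identification method of an example embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of generating the account relationship data table in an example embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of determining the connected account sets in an example embodiment of the present disclosure;
FIG. 4 schematically shows obtaining user relationship edges according to a specific implementation of the present disclosure;
FIG. 5 schematically shows obtaining the node adjacency table according to a specific implementation of the present disclosure;
FIG. 6 is a schematic flowchart of determining the candidate node adjacency table in an example embodiment of the present disclosure;
FIG. 7 schematically shows obtaining node class labels according to a specific implementation of the present disclosure;
FIG. 8 schematically shows the distributed union-find merging of node class labels according to a specific implementation of the present disclosure;
FIG. 9 is a schematic flowchart of determining the accounts to be identified in an example embodiment of the present disclosure;
FIG. 10 is a schematic flowchart of training the target account recognition model in an example embodiment of the present disclosure;
FIG. 11 is a schematic flowchart of training the target account recognition model constructed by the random forest algorithm in an example embodiment of the present disclosure;
FIG. 12 is a schematic flowchart of identifying target accounts in an example embodiment of the present disclosure;
FIG. 13 is a complete block diagram of the account identification method in a specific embodiment of the present disclosure;
FIG. 14 is a block diagram of the account identification device of an example embodiment of the present disclosure;
FIG. 15 is a schematic structural diagram of a computer system suitable for implementing the electronic device of an embodiment of the present disclosure.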
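Detailed Description
Example embodiments will now be described more fully with reference to the drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth here; rather, these embodiments are provided so that the present disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of the present disclosure. Those skilled in the art will appreciate, however, that the technical solutions of the present disclosure may be practiced while omitting one or more of the specific details, or with other methods, components, devices, steps, and so on. In other cases, well-known technical solutions are not shown or described in detail, to avoid obscuring aspects of the present disclosure.
In addition, the drawings are only schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so their repeated description is omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.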
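This example embodiment first provides an account identification method that can be used to identify proxy-ordering accounts among multiple accounts. Referring to FIG. 1, the account identification method may include the following steps:
Step S110. Obtain, through the account processing server, resource transfer records in which the resource pre-acquisition account differs from the resource receiving account, and generate an account relationship data table from the resource transfer records.
In this example embodiment, a resource transfer record may refer to an order record in the shopping process; correspondingly, the resource pre-acquisition account may refer to the ordering account with which the user places the order, and the resource receiving account may refer to the receiving account used when the goods are received.
The order processing server is the part of the servers that obtains order data from terminal devices and processes it; a terminal device is an electronic device, such as a smartphone or a computer, on which goods can be ordered over the network.
The ordering account may be the mobile phone number used by the ordering user who orders a product on the online shopping platform, or another account, such as a login account, that can identify the ordering user. The receiving account may be the mobile phone number of the receiving user of the order, or another account that can identify the receiving user.
In this example embodiment, one order corresponds to one ordering account and one receiving account, which may be the same account or different accounts. Because this example embodiment is used to identify proxy-ordering accounts, only orders whose ordering account differs from the receiving account need to be obtained, and the account relationship data table is generated from this order data. The account relationship data table may include the order number, the ordering account, the receiving account, the number of orders, and other ordering data indicators.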
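Step S120. Divide the resource pre-acquisition accounts and resource receiving accounts in the resource transfer records into multiple connected account sets according to the account relationship data table.
In an undirected graph, if there is a path from vertex u to vertex v, then u and v are said to be connected. If every pair of vertices in an undirected graph is connected, the graph is called a connected graph. A connected user group is a set of users in which proxy ordering exists between any pair of users, that is, a connected account set.
The ordering account and receiving account of each order are obtained from the account relationship data table, and the user accounts are divided into multiple connected account sets according to the relationships between the ordering accounts and receiving accounts of the orders; the accounts in each connected account set are linked to one another by shopping relationships.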
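Step S130. Determine the account to be identified in each connected account set according to the connectivity relationships among the accounts in the connected account set, and send the accounts to be identified to the model training server.
The connectivity relationship among accounts can be expressed by the closeness between one account and multiple other accounts, and the account to be identified can be determined from this closeness. Determining the account to be identified in each connected account set means finding the account with the highest closeness in that set, that is, the account with the highest probability of proxy ordering.
After the account to be identified in each connected account set is determined, the accounts to be identified are sent to the model training server, where they are used to train the target account recognition model. The model training server is the part of the servers that processes training data and trains the target account recognition model on it.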
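Step S140. Sample sample accounts from the accounts to be identified through the model training server, and train the target account recognition model on the sample accounts.
After obtaining the account to be identified in each connected account set, the model training server draws some of these accounts as sample accounts and determines whether each sample account is a target account. The target account recognition model is trained on the account data indicators of the sample accounts, obtained from the account relationship data table, together with the result of determining whether each is a target account. The model can be used to determine whether an account is a target account; when the target account is a proxy-ordering account, the target account recognition model can be used to identify proxy-ordering accounts.
Step S150. Determine, through the target account recognition model, whether an account to be identified is a target account.
Inputting the account data indicators of an account to be identified into the trained target account recognition model determines whether that account is a target account.
In the account identification method of the example embodiment of the present disclosure, multiple accounts to be identified can be determined from the connectivity relationships among the accounts, the target account recognition model is trained on a portion of sample accounts drawn from the accounts to be identified, and the model is then used to determine which of the accounts to be identified are target accounts. Because the account recognition model is trained on sampled accounts only, the accounts in a large number of resource transfer records can be identified and the target accounts among them found, which improves identification efficiency and greatly reduces staff workload. With this method, the accounts of many orders can be screened for proxy-ordering accounts, and the real consumer group can then be identified.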
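The above steps of this example embodiment are described in more detail below with reference to FIG. 2 to FIG. 11.
In step S110, as shown in FIG. 2, obtaining, through the account processing server, the resource transfer records in which the resource pre-acquisition account differs from the resource receiving account and generating the account relationship data table from the resource transfer records may specifically include the following steps:
Step S210. Obtain, through the account processing server, the account data in all resource transfer records, and determine whether the resource pre-acquisition account and the resource receiving account in the account data of each resource transfer record are the same.
The order processing server can obtain the account data in all resource transfer records sent by the terminal devices, that is, the account data of all orders, and store it in the server's data storage module; the account data is then read from the data storage module and processed. In general, the data storage module may contain the order number, the ordering user's mobile phone number, the receiving user's mobile phone number, the number of orders, and other data in the order. In this example embodiment, the account data of the orders within one month, or within one quarter, may be obtained for analysis; no specific limitation is imposed.
Step S220. If the resource pre-acquisition account and the resource receiving account in a resource transfer record are the same, filter out the account data of that resource transfer record.
Determining whether the resource pre-acquisition account and the resource receiving account in a resource transfer record are the same means determining whether the ordering account and the receiving account of an order are the same account. If they are the same, the order does not meet the precondition for proxy ordering, and its account data is deleted to reduce the computational workload.
Step S230. If the resource pre-acquisition account and the resource receiving account in a resource transfer record are different, put the account data of that resource transfer record into the account relationship data table.
If the ordering account and the receiving account of an order are different, the order may be a proxy order, and its account data is put into the account relationship data table.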
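Steps S210 to S230 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of the disclosure: the field names `buyer` and `receiver` and the in-memory list of orders are assumptions made for the example, and the resulting table keeps only the number of orders per account pair.

```python
from collections import Counter


def build_relationship_table(orders):
    """Count orders per (ordering account, receiving account) pair.

    Orders whose ordering and receiving accounts are identical are skipped,
    mirroring the filter of step S220. Field names are hypothetical.
    """
    counts = Counter()
    for order in orders:
        buyer, receiver = order["buyer"], order["receiver"]
        if buyer == receiver:
            continue  # same account on both sides: cannot be a proxy order
        counts[(buyer, receiver)] += 1  # step S230: keep the record
    return dict(counts)


orders = [
    {"order_id": "A", "buyer": "1", "receiver": "2"},
    {"order_id": "B", "buyer": "1", "receiver": "2"},
    {"order_id": "C", "buyer": "3", "receiver": "2"},
    {"order_id": "G", "buyer": "4", "receiver": "4"},  # filtered out
]
table = build_relationship_table(orders)
```

Here `table` maps each remaining account pair to its order count, which is the quantity later used as the number of resource transfers.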
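After the account relationship data table is generated, the accounts can be divided into multiple connected account sets according to the relationship between the ordering account and the receiving account of each order in the table. The specific method is described below with reference to FIG. 3 and FIG. 4.
In step S120, as shown in FIG. 3, dividing the resource pre-acquisition accounts and resource receiving accounts in the resource transfer records into multiple connected account sets according to the account relationship data table may specifically include the following steps:
Step S310. Obtain the resource pre-acquisition account and the resource receiving account of each resource transfer record from the account relationship data table, and generate multiple account node relationship pairs with the resource pre-acquisition account and the resource receiving account of each resource transfer record as account nodes.
In this example embodiment, the accounts may be divided into multiple connected account sets by a distributed union-find method or by other methods; no specific limitation is imposed, and the distributed union-find method is used only as an example.
The distributed union-find method obtains connected graphs by merging node pairs that have a connectivity relationship. In this example embodiment, the distributed union-find method uses MapReduce distributed operations: a label function assigns labels to account nodes that have a connectivity relationship, and the node class label data is then merged block by block, iterating according to a judgment condition, until the class label of every node no longer changes.
To divide the account nodes into multiple connected account sets with the distributed union-find method, the account node relationship pairs are first obtained from the account relationship data table and arranged in order; for example, the account with the numerically smaller mobile phone number may be placed first. As shown in FIG. 4, the order user table 401 is obtained from the account relationship data table; it contains the accounts of the ordering user and the receiving user of each order. Because the ordering user and the receiving user of order G are the same, the data of order G is removed and not considered. After the order user table 401 is obtained, multiple account node relationship pairs, that is, the user relationship edge table 402 in FIG. 4, are generated from the ordering and receiving accounts of each order in the table, and the pairs in the table are arranged by the numerical value of the mobile phone numbers.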
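The construction of the ordered relationship pairs can be sketched as follows. The sketch assumes that accounts are represented by integers standing in for mobile phone numbers and that each pair is stored once with the smaller number first; both choices are illustrative assumptions rather than details taken from FIG. 4.

```python
def relationship_pairs(order_user_table):
    """Turn (ordering account, receiving account) rows into sorted, de-duplicated pairs.

    Rows with identical accounts are dropped; within a pair the smaller
    number comes first, and the pairs themselves are sorted.
    """
    pairs = set()
    for buyer, receiver in order_user_table:
        if buyer == receiver:
            continue  # e.g. an order like order G, where both users are the same
        pairs.add((min(buyer, receiver), max(buyer, receiver)))
    return sorted(pairs)


# Hypothetical rows: small integers stand in for phone numbers.
pairs = relationship_pairs([(2, 1), (1, 2), (3, 2), (4, 4)])
```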
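Step S320. Take one account node of each account node relationship pair as a vertex and the other account node as the connection point corresponding to that vertex, to obtain the account node table.
One account node of each relationship pair is taken as a vertex and the other as its connection point, and the pairs are expanded in turn to obtain the account node table of the account nodes, shown as account node table 501 in FIG. 5.
Step S330. Put the connection points that correspond to the same vertex in the account node table into one set as the adjacency set of that vertex, and generate the node adjacency table from the adjacency sets.
As shown in FIG. 5, the node adjacency table 502 is obtained from the account node table 501 by putting the connection points that correspond to the same vertex, together with the vertex itself, into one set as the adjacency set of that vertex. For example, the connection points of phone 2 are phone 1 and phone 3, so the vertex phone 2 and the connection points phone 1 and phone 3 are put into the adjacency set of phone 2, which is {1,2,3}, and so on.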
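A minimal single-machine sketch of steps S320 and S330 follows. The edge list is hypothetical (the contents of FIG. 5 are not reproduced in the text), but it is chosen so that phone 2 ends up with the adjacency set {1, 2, 3} mentioned above.

```python
def build_adjacency(pairs):
    """Build the node adjacency table: each vertex maps to itself plus its connection points."""
    adjacency = {}
    for a, b in pairs:
        # Each pair is expanded in both directions, so both nodes become vertices.
        adjacency.setdefault(a, {a}).add(b)
        adjacency.setdefault(b, {b}).add(a)
    return adjacency


# Hypothetical relationship pairs; integers stand in for phone numbers.
adjacency = build_adjacency([(1, 2), (2, 3), (4, 5)])
```

Including the vertex in its own adjacency set matters for the later iteration: it guarantees that every node appears in at least one set and therefore always receives a label.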
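Step S340. Obtain a candidate node adjacency table from the adjacency sets in the node adjacency table, and determine whether the candidate node adjacency table is the same as the node adjacency table.
The node adjacency table 502 is taken as the initial node adjacency table, and MapReduce distributed operations are applied again: a label function F is constructed so that each node obtains the adjacency sets of that node as its class label L, which yields the candidate node adjacency table. The candidate node adjacency table is then compared with the node adjacency table.
Step S350. If the candidate node adjacency table differs from the node adjacency table, take the candidate node adjacency table as the node adjacency table and update the candidate node adjacency table.
If at least one adjacency set differs between the candidate node adjacency table and the node adjacency table, the candidate node adjacency table replaces the initial node adjacency table, the candidate node adjacency table is updated again, and the iteration flag is incremented by 1. The iteration flag is reset to 0 at the start of every iteration; it stays unchanged if the candidate node adjacency table is the same as the node adjacency table, and is incremented by 1 if they differ.
Step S360. If the candidate node adjacency table is the same as the node adjacency table, obtain multiple connected account sets from the node adjacency table.
If the candidate node adjacency table is the same as the node adjacency table, that is, the iteration flag equals 0, the iteration ends. The node adjacency table obtained in this iteration is taken as the final node adjacency table, and de-duplicating it yields the multiple connected account sets, that is, the connected user groups whose users are linked by shopping relationships.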
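In step S340, as shown in FIG. 6, obtaining the candidate node adjacency table from the adjacency sets in the node adjacency table may specifically include the following steps:
Step S610. Take each account node in an adjacency set as a vertex, and take the adjacency set containing that account node as an adjacency set corresponding to the vertex.
Each account node in each adjacency set is traversed and taken as a vertex. As shown in FIG. 7, the label function F obtains from the node adjacency table 502 the adjacency set containing the account node as the class label of that node; expanding the adjacency sets in the node adjacency table 502 in turn yields the node class label set 701.
Step S620. Take the union of the adjacency sets corresponding to the same vertex to obtain a candidate adjacency set, and generate the candidate node adjacency table from the candidate adjacency sets.
As shown in FIG. 8, the vertices in the node class label set 701 and their class labels are traversed. The class labels that share the same vertex are merged, which finally gives the candidate adjacency set of each account node, and the candidate node adjacency table 801 is generated from the candidate adjacency sets.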
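The iteration of steps S340 to S360, with the expansion and merge of steps S610 and S620 inside each round, can be sketched on a single machine as follows. The disclosure describes a MapReduce implementation; this sketch only mirrors the logic with plain dictionaries, and the input adjacency table is a hypothetical example.

```python
def connected_account_sets(adjacency):
    """Repeat the expand-and-merge step until the adjacency table stops changing."""
    current = {vertex: frozenset(members) for vertex, members in adjacency.items()}
    while True:
        merged = {}
        # "Map" (S610): every node in an adjacency set receives the whole set as a class label.
        # "Reduce" (S620): labels that share the same vertex are merged by union.
        for members in current.values():
            for node in members:
                merged.setdefault(node, set()).update(members)
        candidate = {vertex: frozenset(members) for vertex, members in merged.items()}
        if candidate == current:  # the iteration flag would stay at 0 (S360)
            break
        current = candidate  # S350: the candidate table becomes the node adjacency table
    # De-duplicating the final adjacency sets yields the connected account sets.
    return set(current.values())


# Hypothetical adjacency table: a chain 1-2-3-4 and a separate pair 5-6.
adjacency = {1: {1, 2}, 2: {1, 2, 3}, 3: {2, 3, 4}, 4: {3, 4}, 5: {5, 6}, 6: {5, 6}}
groups = connected_account_sets(adjacency)
```

On this input the chain needs three rounds: labels spread one hop further in each round until every node of the chain carries the full set {1, 2, 3, 4}.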
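After the multiple connected account sets are obtained by the method of FIG. 3 to FIG. 8, the next steps determine, in each connected account set, the one account to be identified that is most likely to be a target account; when identifying proxy-ordering accounts, this is the account with the highest probability of proxy ordering.
The closeness centrality algorithm can be used to find key nodes in a network. By computing the reciprocal of the average shortest distance from a node to all other reachable nodes, it measures how short the distances from that node to the other nodes are (that is, its closeness).
In this example embodiment, the account to be identified in each connected account set can be determined by the closeness centrality algorithm. The specific method is as follows: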
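In step S130, as shown in FIG. 9, determining the account to be identified in each connected account set according to the connectivity relationships among the accounts in the connected account set may specifically include the following steps:
Step S910. Obtain, from the account relationship data table, the number of resource transfers between each pair of resource pre-acquisition account and resource receiving account in the connected account set.
The number of resource transfers between a resource pre-acquisition account and a resource receiving account is the number of orders between the ordering account and the receiving account. A directed graph of the user relationships within the connected user group of each connected account set is constructed from the connected account sets obtained in the above steps and the account relationship data table. If there is an out-degree relationship, that is, a receiving relationship, between ordering user a and receiving user b, the number of orders between ordering user a and receiving user b is obtained.
Step S920. Obtain the total number of accounts in the connected account set, and the number of connected accounts in the set that have a resource acquisition relationship with the resource pre-acquisition account.
In this example embodiment, the total number of accounts in the connected account set can be denoted N, and the total number of connected accounts that have a receiving relationship with account v can be denoted R(v).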
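Step S930. Obtain the closeness of the resource pre-acquisition account from the number of resource transfers, the number of connected accounts, and the total number of accounts in the connected account set.
The closeness weight of the resource pre-acquisition account can be obtained from the number of resource transfers; that is, the closeness weight w_out is the reciprocal of the number of orders.
The shortest distance from user v to user u, denoted d(v,u), is:
Figure PCTCN2021080687-appb-000001
The closeness centrality C(v) of user v can then be expressed as:
Figure PCTCN2021080687-appb-000002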
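The two formulas are embedded as images and are not reproduced in this text. As an illustration only, one common form that is consistent with the surrounding definitions of N, R(v), and w_out is the normalized (Wasserman-Faust style) closeness centrality on a weighted directed graph. Everything below is an assumption rather than the formula of the disclosure: n_{ab} denotes the number of orders from a to b, P(v,u) the set of directed paths from v to u, and R(v) is read as the number of accounts reachable from v.

```latex
% Assumed edge weight: reciprocal of the number of orders from a to b
w_{out}(a,b) = \frac{1}{n_{ab}}

% Assumed shortest distance: minimum total weight over directed paths from v to u
d(v,u) = \min_{p \in P(v,u)} \sum_{(a,b) \in p} w_{out}(a,b)

% Assumed normalized closeness centrality of v
C(v) = \frac{R(v)}{N-1} \cdot \frac{R(v)}{\sum_{u \ne v} d(v,u)}
```

Under this reading, an account that places many orders for many different receivers has short distances to many accounts and therefore a high C(v).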
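Step S940. Determine one account to be identified in each connected account set according to the closeness of all resource pre-acquisition accounts in that set.
In this example embodiment, the user i with the maximum closeness centrality C_max(i) in the connected account set may be taken as the account to be identified in that set, that is, the suspected proxy-ordering account.
After the account to be identified in each set is obtained, the target account recognition model can be trained on sample accounts drawn from the accounts to be identified, and the model can then be applied to all accounts to be identified to obtain the target accounts, that is, the proxy-ordering accounts.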
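Steps S910 to S940 can be sketched as below. The original closeness formula is only available as an image, so this sketch assumes a normalized closeness on a directed graph whose edge weight is the reciprocal of the order count, computed with Dijkstra's algorithm; the function names and the sample data are hypothetical.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra over a dict-of-dicts graph; returns distances to reachable nodes."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    del dist[source]
    return dist


def closeness(order_counts, accounts):
    """Assumed form: C(v) = R(v)/(N-1) * R(v)/sum of d(v,u), with weight 1/order count."""
    graph = {}
    for (buyer, receiver), count in order_counts.items():
        graph.setdefault(buyer, {})[receiver] = 1.0 / count
    total_accounts = len(accounts)
    scores = {}
    for v in accounts:
        dist = shortest_distances(graph, v)
        reachable = len(dist)
        if reachable == 0 or total_accounts < 2:
            scores[v] = 0.0  # accounts that never place orders for others
        else:
            scores[v] = (reachable / (total_accounts - 1)) * (reachable / sum(dist.values()))
    return scores


# One hypothetical connected account set: a orders for b and c, b orders for c.
order_counts = {("a", "b"): 4, ("a", "c"): 2, ("b", "c"): 1}
scores = closeness(order_counts, {"a", "b", "c"})
suspect = max(scores, key=scores.get)  # step S940: highest closeness in the set
```

In this example account a reaches both other accounts at a total distance of 0.75, so it gets the highest score and becomes the account to be identified for the set.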
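In step S140, as shown in FIG. 10, sampling sample accounts from the accounts to be identified through the model training server and training the target account recognition model on the sample accounts may specifically include the following steps:
Step S1010. Through the model training server, sort the accounts to be identified by closeness, and divide all the accounts to be identified into multiple sets of accounts to be identified according to the sorting result.
The model training server sorts all the accounts to be identified by closeness centrality and segments them by closeness centrality value, dividing them into multiple sets of accounts to be identified.
Step S1020. Extract a preset number of accounts to be identified from each set as sample accounts, and determine whether the sample accounts are target accounts.
A preset number of accounts to be identified are selected from each set by stratified sampling as sample accounts, and it is determined whether these sample accounts are target accounts. The determination can be made by placing outbound calls to the ordering users corresponding to these sample accounts to find out whether a sample account is a proxy-ordering account; other methods may also be used, and this example embodiment imposes no specific limitation.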
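The sorting, segmentation, and stratified sampling of steps S1010 and S1020 can be sketched as follows. The text segments accounts by closeness centrality value without giving the boundaries, so this sketch assumes equal-sized strata by rank; the function name, the number of strata, and the sample size are illustrative assumptions.

```python
import random


def stratified_sample(scores, n_strata, per_stratum, seed=0):
    """Rank accounts by closeness, cut the ranking into strata, and sample from each."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if not ranked:
        return [], []
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    size = -(-len(ranked) // n_strata)  # ceiling division: accounts per stratum
    strata = [ranked[i:i + size] for i in range(0, len(ranked), size)]
    sample = []
    for stratum in strata:
        # Sample without replacement; small strata contribute all of their accounts.
        sample.extend(rng.sample(stratum, min(per_stratum, len(stratum))))
    return strata, sample


# Hypothetical closeness scores for ten accounts to be identified.
scores = {f"acct{i}": i / 10 for i in range(10)}
strata, sample = stratified_sample(scores, n_strata=3, per_stratum=2)
```

The sampled accounts would then be labeled, for example through outbound customer service calls, before training.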
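Step S1030. Add a first label to the target accounts among the sample accounts, and add a second label to the remaining sample accounts.
After the sample accounts are judged, the first label is added to the target accounts among them and the second label to the remaining sample accounts, for use in model training.
Step S1040. Obtain the account data indicators of the sample accounts from the account relationship data table, and train the target account recognition model with the account data indicators of the sample accounts as input and the labels corresponding to the sample accounts as output.
The account data indicators of all sample accounts are obtained from the account relationship data table, including the number of ordering addresses, the number of coupons used, the proportion of non-registered users, the number of orders, the number of product categories, the ordering time, and other indicators. These indicators are joined to construct the model data set from which the target account recognition model is learned.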
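As an illustration of how per-account indicators might be assembled into model inputs, the sketch below aggregates four of the listed indicators from order rows. The row fields (`buyer`, `address`, `category`, `coupons_used`) and the order of the resulting feature vector are hypothetical; the disclosure does not specify a schema.

```python
from collections import defaultdict


def account_indicators(orders):
    """Aggregate per-account indicators from order rows (hypothetical fields)."""
    stats = defaultdict(
        lambda: {"addresses": set(), "categories": set(), "coupons": 0, "orders": 0}
    )
    for order in orders:
        s = stats[order["buyer"]]
        s["addresses"].add(order["address"])
        s["categories"].add(order["category"])
        s["coupons"] += order["coupons_used"]
        s["orders"] += 1
    # Feature vector: [address count, coupons used, order count, category count]
    return {
        buyer: [len(s["addresses"]), s["coupons"], s["orders"], len(s["categories"])]
        for buyer, s in stats.items()
    }


orders = [
    {"buyer": "a", "address": "addr1", "category": "books", "coupons_used": 1},
    {"buyer": "a", "address": "addr2", "category": "toys", "coupons_used": 0},
    {"buyer": "b", "address": "addr3", "category": "books", "coupons_used": 2},
]
features = account_indicators(orders)
```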
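In step S1040, as shown in FIG. 11, training the target account recognition model with the account data indicators of the sample accounts as input and the labels corresponding to the sample accounts as output may specifically include the following steps:
Step S1110. Obtain multiple model training data sets from the account data indicators of the sample accounts, and construct the target account recognition model with the random forest algorithm.
The random forest algorithm draws N training samples from the data set randomly and with replacement, and considers only M randomly chosen indicator features each time it splits the data. The algorithm performs T rounds of sampling in total, obtains T training sets, and trains T decision trees independently; each decision tree outputs its own classification result, and the results of the T decision trees are put to a vote to obtain the final classification.
After the account data indicators of the sample accounts are obtained, they are combined with the labels added to the sample accounts in step S1030 and divided into T corresponding model training data sets, each of which is used to train one of the T decision trees.
Step S1120. Train the target account recognition model constructed by the random forest algorithm with the multiple model training data sets as input and the labels corresponding to the sample accounts as output.
Each decision tree in the model is trained independently, with the account data indicators of the sample accounts in its model training data set as input and the labels corresponding to the sample accounts as output. Finally, the outputs of the decision trees are put to a vote, and the result is taken as the model output, completing the training of the target account recognition model.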
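The sampling-and-voting scheme can be illustrated with a deliberately simplified stand-in: bootstrap sampling with replacement, a random feature subset per tree, and majority voting, where each "tree" is a one-split decision stump instead of a full decision tree. This is a sketch under those stated assumptions, not the model of the disclosure; the feature values, the label encoding (1 for the first label, 0 for the second), and the parameters are hypothetical.

```python
import random
from collections import Counter


def train_stump(rows, labels, features):
    """Pick the (feature, threshold) split with the fewest misclassified samples."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in rows}):
            left = [l for row, l in zip(rows, labels) if row[f] <= threshold]
            right = [l for row, l in zip(rows, labels) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]


def train_forest(rows, labels, n_trees, n_features, seed=0):
    """T rounds of bootstrap sampling, each with M randomly chosen features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        features = rng.sample(range(len(rows[0])), n_features)
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], features))
    return forest


def predict(forest, row):
    """Majority vote over the individual stump outputs."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Hypothetical indicators: [number of ordering addresses, number of coupons used].
rows = [[1, 1], [2, 1], [3, 2], [4, 2], [20, 9], [21, 8], [22, 9], [23, 10]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = first label (target account), 0 = second label
forest = train_forest(rows, labels, n_trees=25, n_features=1)
```

A production system would normally use full decision trees from an established library; the stump version is kept here only so that the sampling and voting logic stays visible in a few lines.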
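In step S150, as shown in FIG. 12, determining whether an account to be identified is a target account by the target account identification model may specifically include the following steps:
Step S1210. Obtain the account data indicators of the account to be identified from the account relationship data table, and input them into the target account identification model.
Based on the account relationship data table, the account data indicators of all accounts to be identified are obtained, including the number of order addresses, the number of coupons used, the proportion of non-registered users, the number of orders, the number of product categories, and the order time, and the indicators of each account are input into the trained target account identification model.
Step S1220. If the output of the target account identification model is the first label, determine that the account to be identified is a target account.
If the model outputs the first label after the indicators of an account to be identified are input, the account is determined to be a target account; if it outputs the second label, the account is determined not to be a target account. The indicators of all accounts to be identified are input into the target account identification model one by one, and the target accounts among them, i.e., the proxy-ordering accounts, are identified from the model outputs.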
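Step S1220 amounts to filtering the accounts to be identified by the model output. A minimal sketch, assuming the model is any callable that maps a feature row to a label and that the first and second labels are encoded as 1 and 0 (both assumptions):

```python
FIRST_LABEL, SECOND_LABEL = 1, 0  # assumed label encoding

def identify_targets(model, indicators_by_account):
    """Return the accounts for which the model outputs the first label."""
    return [account for account, features in indicators_by_account.items()
            if model(features) == FIRST_LABEL]

def toy_model(features):
    # Hypothetical stand-in for the trained target account identification model.
    return FIRST_LABEL if features[0] > 3 else SECOND_LABEL

targets = identify_targets(toy_model, {"a": [5, 20], "x": [1, 2], "q": [4, 0]})
```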
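FIG. 13 shows a complete block diagram of a specific implementation applying the present disclosure. The block diagram may include three modules, and the specific steps performed in each module are as follows:
1. The following steps may be performed in the data module 1310:
Step S1301. Data storage.
Stores data including order numbers, ordering users' mobile phone numbers, and recipients' mobile phone numbers.
Step S1302. Data processing.
Analyzes the number of orders between users, removes order data in which the ordering user and the receiving user share the same mobile phone number, and so on; outputs a user relationship data table containing the ordering user, the receiving user, and the number of orders.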
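The data processing in step S1302 can be sketched as a filter followed by a count. The order record fields `orderer_phone` and `receiver_phone` and the sample phone numbers are hypothetical.

```python
from collections import Counter

def build_relation_table(orders):
    """Drop self-orders, then count orders per (orderer, receiver) pair."""
    counts = Counter(
        (o["orderer_phone"], o["receiver_phone"])
        for o in orders
        if o["orderer_phone"] != o["receiver_phone"]  # same phone: not a relationship
    )
    return [(orderer, receiver, n) for (orderer, receiver), n in sorted(counts.items())]

orders = [
    {"order_id": 1, "orderer_phone": "1380000", "receiver_phone": "1390000"},
    {"order_id": 2, "orderer_phone": "1380000", "receiver_phone": "1390000"},
    {"order_id": 3, "orderer_phone": "1380000", "receiver_phone": "1380000"},
    {"order_id": 4, "orderer_phone": "1370000", "receiver_phone": "1380000"},
]
relation_table = build_relation_table(orders)
```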
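2. The following steps may be performed in the connected user group identification module 1320:
Step S1303. Obtain connected user groups with a distributed union-find.
That is, the accounts are partitioned with a distributed union-find method to obtain multiple connected account sets. The specific method has been described in the foregoing embodiments and is not repeated here.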
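For orientation, the sketch below shows a plain single-machine union-find with path compression. It is not the distributed variant described in the disclosure; it only illustrates how ordering/receiving pairs collapse into connected account sets. The function name and sample pairs are hypothetical.

```python
def connected_sets(pairs):
    """Group accounts into connected sets using union-find with path halving."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two sets

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())

sets = connected_sets([("a", "b"), ("b", "c"), ("x", "y")])
```

Edge direction is ignored at this stage, since connectivity only asks whether two accounts are linked at all.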
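3. The following steps may be performed in the proxy-ordering user identification module 1330:
Step S1304. Directed graph of user shopping relationships.
A directed graph of user relationships within the connected user group of each connected account set is constructed from the multiple connected account sets and the account relationship data table.
Step S1305. Suspected user identification by closeness centrality.
According to closeness centrality, the user with the largest closeness centrality in each connected account set is selected as the suspected user of that set.
Step S1306. Sampling and labeling through customer service outbound calls.
Stratified sampling is performed over all suspected users, and a portion of sample accounts is selected for customer service outbound calls and labeling.
Step S1307. Random forest classifier construction.
A proxy-ordering account identification model is constructed with the random forest algorithm and trained on the account data indicators of the labeled sample accounts. After training, the model can be used to identify proxy-ordering accounts.
It should be noted that although the steps of the method in the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in that order, or that all of the illustrated steps must be performed, to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step, and/or one step may be split into multiple steps.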
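Further, the present disclosure also provides an account identification apparatus. Referring to FIG. 14, the account identification apparatus may include an account relationship data table generation module 1410, a connected account set division module 1420, a to-be-identified account determination module 1430, an account identification model training module 1440, and a target account judgment module 1450. Specifically:
The account relationship data table generation module 1410 may be configured to obtain, through an account processing server, resource transfer records in which the resource pre-acquisition account differs from the resource receiving account, and to generate an account relationship data table according to the resource transfer records;
The connected account set division module 1420 may be configured to divide, according to the account relationship data table, the resource pre-acquisition accounts and resource receiving accounts in the resource transfer records into multiple connected account sets;
The to-be-identified account determination module 1430 may be configured to determine the account to be identified in each connected account set according to the connectivity relationships among the accounts in the connected account set, and to send the accounts to be identified to a model training server;
The account identification model training module 1440 may be configured to sample, through the model training server, sample accounts from the accounts to be identified, and to train a target account identification model using the sample accounts;
The target account judgment module 1450 may be configured to determine, through the target account identification model, whether an account to be identified is a target account.
In some exemplary embodiments of the present disclosure, the account relationship data table generation module 1410 may include an account judgment unit, an account filtering unit, and a data table generation unit. Specifically:
The account judgment unit may be configured to obtain, through the account processing server, the account data in all resource transfer records, and to determine whether the resource pre-acquisition account and the resource receiving account in the account data of a resource transfer record are the same;
The account filtering unit may be configured to filter out the account data of a resource transfer record if the resource pre-acquisition account and the resource receiving account in the record are the same;
The data table generation unit may be configured to put the account data of a resource transfer record into the account relationship data table if the resource pre-acquisition account and the resource receiving account in the record are different.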
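In some exemplary embodiments of the present disclosure, the connected account set division module 1420 may include a node relationship pair generation unit, an account node table generation unit, a node adjacency table generation unit, a node adjacency table judgment unit, a node adjacency table update unit, and a connected account set determination unit. Specifically:
The node relationship pair generation unit may be configured to obtain, from the account relationship data table, the resource pre-acquisition account and the resource receiving account in each resource transfer record, and to generate multiple account node relationship pairs with the resource pre-acquisition account and the resource receiving account of each record as account nodes;
The account node table generation unit may be configured to take one account node of each account node relationship pair as a vertex and the other account node as the connection point of that vertex, to obtain an account node table;
The node adjacency table generation unit may be configured to put the connection points that correspond to the same vertex in the account node table into one set, as the adjacency set of that vertex, and to generate a node adjacency table from the adjacency sets;
The node adjacency table judgment unit may be configured to obtain a candidate node adjacency table from the adjacency sets in the node adjacency table, and to determine whether the candidate node adjacency table is the same as the node adjacency table;
The node adjacency table update unit may be configured to, if the candidate node adjacency table differs from the node adjacency table, take the candidate node adjacency table as the node adjacency table and update the candidate node adjacency table;
The connected account set determination unit may be configured to obtain the multiple connected account sets from the node adjacency table if the candidate node adjacency table is the same as the node adjacency table.
In some exemplary embodiments of the present disclosure, the node adjacency table judgment unit may include an adjacency set expansion unit and a candidate adjacency table generation unit. Specifically:
The adjacency set expansion unit may be configured to take each account node in an adjacency set as a vertex, and to take the adjacency set containing that account node as the adjacency set of the vertex;
The candidate adjacency table generation unit may be configured to take the union of the adjacency sets that correspond to the same vertex to obtain a candidate adjacency set, and to generate the candidate node adjacency table from the candidate adjacency sets.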
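The iterate-until-unchanged procedure carried out by these units can be sketched on a single machine as follows. Two details are assumptions made so that the iteration converges to whole connected sets: each vertex is included in its own adjacency set, and each relationship pair is recorded in both directions. The function name and sample pairs are hypothetical.

```python
def connected_sets_by_adjacency(pairs):
    """Expand and union adjacency sets until the adjacency table stops changing."""
    # Node adjacency table: vertex -> adjacency set (assumed to include the vertex).
    table = {}
    for a, b in pairs:
        table.setdefault(a, {a}).add(b)
        table.setdefault(b, {b}).add(a)

    while True:
        # Every node in an adjacency set becomes a vertex that inherits the set;
        # sets inherited by the same vertex are unioned into a candidate set.
        candidate = {}
        for adjacency in table.values():
            for node in adjacency:
                candidate.setdefault(node, set()).update(adjacency)
        if candidate == table:  # fixed point: the table holds the connected sets
            break
        table = candidate

    return sorted({tuple(sorted(s)) for s in table.values()})

sets = connected_sets_by_adjacency([("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")])
```

Each pass roughly doubles how far membership information travels along a chain, so the number of passes grows with the logarithm of the longest chain, which is what makes the scheme attractive for a distributed implementation.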
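In some exemplary embodiments of the present disclosure, the to-be-identified account determination module 1430 may include a closeness weight determination unit, a closeness parameter acquisition unit, a closeness calculation unit, and a to-be-identified account determination unit. Specifically:
The closeness weight determination unit may be configured to obtain, from the account relationship data table, the number of resource transfers between each pair of resource pre-acquisition account and resource receiving account in the connected account set;
The closeness parameter acquisition unit may be configured to obtain the total number of accounts in the connected account set, and the number of connected accounts in the set that have a resource acquisition relationship with the resource pre-acquisition account;
The closeness calculation unit may be configured to obtain the closeness of the resource pre-acquisition account according to the number of resource transfers, the number of connected accounts, and the total number of accounts in the connected account set;
The to-be-identified account determination unit may be configured to determine one account to be identified in each connected account set according to the closeness of all resource pre-acquisition accounts in the set.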
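In some exemplary embodiments of the present disclosure, the account identification model training module 1440 may include an account set allocation unit, a target account judgment unit, an account label adding unit, and an identification model training unit. Specifically:
The account set allocation unit may be configured to sort, through the model training server, the accounts to be identified by closeness, and to divide all accounts to be identified into multiple sets of accounts to be identified according to the sorting result;
The target account judgment unit may be configured to draw a preset number of accounts to be identified from each set as sample accounts, and to determine whether each sample account is a target account;
The account label adding unit may be configured to add a first label to the target accounts among the sample accounts, and to add a second label to the remaining sample accounts;
The identification model training unit may be configured to obtain the account data indicators of the sample accounts from the account relationship data table, and to train the target account identification model with the account data indicators of the sample accounts as input and the labels of the sample accounts as output.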
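In some exemplary embodiments of the present disclosure, the identification model training unit may include an identification model construction unit and a multi-model training unit. Specifically:
The identification model construction unit may be configured to obtain multiple model training datasets from the account data indicators of the sample accounts, and to construct the target account identification model using the random forest algorithm;
The multi-model training unit may be configured to train the target account identification model constructed by the random forest algorithm, with the multiple model training datasets as input and the labels of the sample accounts as output.
In some exemplary embodiments of the present disclosure, the target account judgment module 1450 may include an account data input unit and a target account identification unit. Specifically:
The account data input unit may be configured to obtain the account data indicators of the account to be identified from the account relationship data table, and to input them into the target account identification model;
The target account identification unit may be configured to determine that the account to be identified is a target account if the output of the target account identification model is the first label.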
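The specific details of the modules and units of the above account identification apparatus have been described in detail in the corresponding method embodiments and are not repeated here.
FIG. 15 shows a schematic structural diagram of a computer system of an electronic device suitable for implementing embodiments of the present disclosure.
It should be noted that the computer system 1500 of the electronic device shown in FIG. 15 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 15, the computer system 1500 includes a central processing unit (CPU) 1501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1502 or a program loaded from a storage section 1508 into a random access memory (RAM) 1503. The RAM 1503 also stores various programs and data required for system operation. The CPU 1501, the ROM 1502, and the RAM 1503 are connected to one another via a bus 1504. An input/output (I/O) interface 1505 is also connected to the bus 1504.
The following components are connected to the I/O interface 1505: an input section 1506 including a keyboard, a mouse, and the like; an output section 1507 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage section 1508 including a hard disk and the like; and a communication section 1509 including a network interface card such as a LAN card or a modem. The communication section 1509 performs communication processing via a network such as the Internet. A drive 1510 is also connected to the I/O interface 1505 as needed. A removable medium 1511, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 1510 as needed, so that a computer program read from it can be installed into the storage section 1508 as needed.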
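In particular, according to embodiments of the present disclosure, the processes described with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network through the communication section 1509 and/or installed from the removable medium 1511. When the computer program is executed by the central processing unit (CPU) 1501, the various functions defined in the system of the present disclosure are performed.
It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but without limitation, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, in the present disclosure, may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
In another aspect, the present disclosure also provides a computer-readable medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the embodiments of the present disclosure.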
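It should be noted that although several modules of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more modules described above may be embodied in one module. Conversely, the features and functions of one module described above may be further divided and embodied in multiple modules.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common general knowledge or customary technical means in the art not disclosed herein.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.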

Claims (11)

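  1. An account identification method, comprising:
    obtaining, by an account processing server, resource transfer records in which the resource pre-acquisition account differs from the resource receiving account, and generating an account relationship data table according to the resource transfer records;
    dividing, according to the account relationship data table, the resource pre-acquisition accounts and resource receiving accounts in the resource transfer records into a plurality of connected account sets;
    determining an account to be identified in each connected account set according to the connectivity relationships among the accounts in the connected account set, and sending the accounts to be identified to a model training server;
    sampling, by the model training server, sample accounts from the accounts to be identified, and training a target account identification model using the sample accounts;
    determining, by the target account identification model, whether the account to be identified is a target account.
  2. The account identification method according to claim 1, wherein obtaining, by the account processing server, resource transfer records in which the resource pre-acquisition account differs from the resource receiving account, and generating the account relationship data table according to the resource transfer records, comprises:
    obtaining, by the account processing server, account data in all resource transfer records, and determining whether the resource pre-acquisition account and the resource receiving account in the account data of each resource transfer record are the same;
    if the resource pre-acquisition account and the resource receiving account in the resource transfer record are the same, filtering out the account data of the resource transfer record;
    if the resource pre-acquisition account and the resource receiving account in the resource transfer record are different, putting the account data of the resource transfer record into the account relationship data table.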
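  3. The account identification method according to claim 1, wherein dividing, according to the account relationship data table, the resource pre-acquisition accounts and resource receiving accounts in the resource transfer records into a plurality of connected account sets comprises:
    obtaining, from the account relationship data table, the resource pre-acquisition account and the resource receiving account in each resource transfer record, and generating a plurality of account node relationship pairs with the resource pre-acquisition account and the resource receiving account of each resource transfer record as account nodes;
    taking one account node of each account node relationship pair as a vertex and the other account node as a connection point corresponding to the vertex, to obtain an account node table;
    putting the connection points corresponding to the same vertex in the account node table into one set as the adjacency set corresponding to the vertex, and generating a node adjacency table according to the adjacency sets;
    obtaining a candidate node adjacency table according to the adjacency sets in the node adjacency table, and determining whether the candidate node adjacency table is the same as the node adjacency table;
    if the candidate node adjacency table differs from the node adjacency table, taking the candidate node adjacency table as the node adjacency table, and updating the candidate node adjacency table;
    if the candidate node adjacency table is the same as the node adjacency table, obtaining the plurality of connected account sets according to the node adjacency table.
  4. The account identification method according to claim 3, wherein obtaining the candidate node adjacency table according to the adjacency sets in the node adjacency table comprises:
    taking each account node in an adjacency set as a vertex, and taking the adjacency set containing the account node as the adjacency set corresponding to the vertex;
    taking the union of the adjacency sets corresponding to the same vertex to obtain a candidate adjacency set, and generating the candidate node adjacency table according to the candidate adjacency sets.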
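  5. The account identification method according to claim 1, wherein determining the account to be identified in each connected account set according to the connectivity relationships among the accounts in the connected account set comprises:
    obtaining, from the account relationship data table, the number of resource transfers between each pair of resource pre-acquisition account and resource receiving account in the connected account set;
    obtaining the total number of accounts in the connected account set, and the number of connected accounts in the connected account set that have a resource acquisition relationship with the resource pre-acquisition account;
    obtaining the closeness of the resource pre-acquisition account according to the number of resource transfers, the number of connected accounts, and the total number of accounts in the connected account set;
    determining one account to be identified in each connected account set according to the closeness of all resource pre-acquisition accounts in the connected account set.
  6. The account identification method according to claim 5, wherein sampling, by the model training server, sample accounts from the accounts to be identified, and training the target account identification model using the sample accounts, comprises:
    sorting, by the model training server, the accounts to be identified by the closeness, and dividing all accounts to be identified into a plurality of sets of accounts to be identified according to the sorting result;
    drawing a preset number of accounts to be identified from each set of accounts to be identified as sample accounts, and determining whether each sample account is a target account;
    adding a first label to the target accounts among the sample accounts, and adding a second label to the remaining sample accounts;
    obtaining the account data indicators of the sample accounts from the account relationship data table, and training the target account identification model with the account data indicators of the sample accounts as input and the labels corresponding to the sample accounts as output.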
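  7. The account identification method according to claim 6, wherein training the target account identification model with the account data indicators of the sample accounts as input and the labels corresponding to the sample accounts as output comprises:
    obtaining a plurality of model training datasets according to the account data indicators of the sample accounts, and constructing the target account identification model using a random forest algorithm;
    training the target account identification model constructed by the random forest algorithm, with the plurality of model training datasets as input and the labels corresponding to the sample accounts as output.
  8. The account identification method according to claim 6, wherein determining, by the target account identification model, whether the account to be identified is a target account comprises:
    obtaining the account data indicators of the account to be identified from the account relationship data table, and inputting the account data indicators of the account to be identified into the target account identification model;
    if the output of the target account identification model is the first label, determining that the account to be identified is a target account.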
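  9. An account identification apparatus, comprising:
    an account relationship data table generation module, configured to obtain, through an account processing server, resource transfer records in which the resource pre-acquisition account differs from the resource receiving account, and to generate an account relationship data table according to the resource transfer records;
    a connected account set division module, configured to divide, according to the account relationship data table, the resource pre-acquisition accounts and resource receiving accounts in the resource transfer records into a plurality of connected account sets;
    a to-be-identified account determination module, configured to determine an account to be identified in each connected account set according to the connectivity relationships among the accounts in the connected account set, and to send the accounts to be identified to a model training server;
    an account identification model training module, configured to sample, through the model training server, sample accounts from the accounts to be identified, and to train a target account identification model using the sample accounts;
    a target account judgment module, configured to determine, through the target account identification model, whether the account to be identified is a target account.
  10. An electronic device, comprising:
    a processor; and
    a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the account identification method according to any one of claims 1 to 8.
  11. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the account identification method according to any one of claims 1 to 8.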
PCT/CN2021/080687 2020-04-23 2021-03-15 Account identification method, apparatus, electronic device and computer readable medium WO2021213069A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2022563061A JP2023523191A (ja) 2020-04-23 2021-03-15 アカウントの識別方法、装置、電子機器及びコンピュータ読み取り可能な媒体
US17/996,629 US20230230081A1 (en) 2020-04-23 2021-03-15 Account identification method, apparatus, electronic device and computer readable medium
KR1020227036298A KR20220155377A (ko) 2020-04-23 2021-03-15 계정의 식별방법, 식별장치, 전자 디바이스 및 컴퓨터 판독 가능한 매체

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010328202.4A CN113554438B (zh) 2020-04-23 2020-04-23 Account identification method, apparatus, electronic device and computer readable medium
CN202010328202.4 2020-04-23

Publications (1)

Publication Number Publication Date
WO2021213069A1 true WO2021213069A1 (zh) 2021-10-28

Family

ID=78101060

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/080687 WO2021213069A1 (zh) 2020-04-23 2021-03-15 Account identification method, apparatus, electronic device and computer readable medium

Country Status (5)

Country Link
US (1) US20230230081A1 (zh)
JP (1) JP2023523191A (zh)
KR (1) KR20220155377A (zh)
CN (1) CN113554438B (zh)
WO (1) WO2021213069A1 (zh)

Also Published As

Publication number Publication date
US20230230081A1 (en) 2023-07-20
CN113554438A (zh) 2021-10-26
KR20220155377A (ko) 2022-11-22
JP2023523191A (ja) 2023-06-02
CN113554438B (zh) 2023-12-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21791883

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022563061

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20227036298

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06.02.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21791883

Country of ref document: EP

Kind code of ref document: A1