CN116362737B - Account clustering method and device, computer readable storage medium and terminal - Google Patents
Account clustering method and device, computer readable storage medium and terminal
- Publication number
- CN116362737B (application CN202310625405.3A)
- Authority
- CN
- China
- Prior art keywords
- account
- preliminary
- account information
- association
- clustered
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/08—Payment architectures
- G06Q20/10—Payment architectures specially adapted for electronic funds transfer [EFT] systems; specially adapted for home banking systems
- G06Q20/102—Bill distribution or payments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0633—Lists, e.g. purchase orders, compilation or processing
- G06Q30/0635—Processing of requisition or of purchase orders
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Development Economics (AREA)
- Economics (AREA)
- General Business, Economics & Management (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Marketing (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An account clustering method and device, a computer readable storage medium and a terminal. The method comprises the following steps: determining a plurality of pieces of account information to be clustered; performing preliminary grouping on the account information to be clustered to obtain a plurality of preliminary account groups, wherein the account information contained in each preliminary account group belongs to the same user; performing intra-group pairing on the account information in at least a part of the preliminary account groups, wherein each preliminary account group subjected to intra-group pairing yields one or more corresponding account pairs; inputting each account pair into a preset graph calculation model to generate an account association relationship graph; and splitting the account association relationship graph to obtain a plurality of initialized account association subgraphs, then performing iterative operations until no connection exists between any node of one account association subgraph and any node of another, at which point iteration stops and the clustered account association subgraphs are obtained. With this scheme, more accurate and complete account clustering results can be obtained.
Description
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a method and apparatus for account clustering, a computer readable storage medium, and a terminal.
Background
With the development of internet technology and the rise of e-commerce platforms, the same natural person (user) often uses different member accounts (or identity accounts) when transacting online or offline with different business service providers (e.g., merchants or shops of different brands). For example, when user A enrolls as a member or places an order at online store A1 of e-commerce platform A, a first member account is used; when user A enrolls as a member or places an order at online store B1 of e-commerce platform B, a second member account is used … If the association between different identity accounts can be determined accurately, that is, if a plurality of different identity accounts can be accurately attributed to the user they jointly belong to, business service providers can be assisted in carrying out multi-channel operation and marketing activities, and the data-island problem can be solved.
In the prior art, different identity accounts are generally associated or clustered according to information shared across different service data (for example, transaction orders signed between users and different merchants), such as commonly used devices, communication numbers and co-occurring geographic locations. The resulting account association or clustering result is a set of identity accounts belonging to the same user.
However, the prior art is limited in that, once a plurality of clusters has been obtained, associations between accounts in different clusters cannot be identified. For example, consider an account cluster Q1 obtained from a shared communication number and an account cluster Q2 obtained from a co-occurring geographic location. The account information within Q1 (or within Q2) belongs to the same user, but the account information in Q1 may also be associated with the account information in Q2 (i.e., all the account information in Q1 and Q2 may belong to the same user). Existing clustering schemes cannot identify this association, so their accuracy and reliability need to be improved.
Disclosure of Invention
The technical problem solved by the embodiment of the application is how to obtain more accurate and complete account clustering results.
In order to solve the above technical problem, an embodiment of the present application provides an account clustering method, including the following steps: determining a plurality of pieces of account information to be clustered; performing preliminary grouping on the account information to be clustered to obtain a plurality of preliminary account groups, wherein the account information contained in each preliminary account group belongs to the same user; performing intra-group pairing on the account information in at least a part of the preliminary account groups, wherein each preliminary account group subjected to intra-group pairing yields one or more corresponding account pairs; inputting the obtained account pairs into a preset graph calculation model to generate an account association relationship graph; and splitting the account association relationship graph to obtain a plurality of initialized account association subgraphs, then performing iterative operations starting from the initialized account association subgraphs until no connection exists between any node of one account association subgraph and any node of another, at which point iteration stops and the clustered account association subgraphs are obtained, wherein in each iterative operation, account association subgraphs containing nodes that are connected to each other are merged into a single account association subgraph. The nodes indicate the account information, and nodes that are connected to each other indicate account information belonging to the same user.
Optionally, determining the plurality of pieces of account information to be clustered includes: acquiring a plurality of pieces of service data from different service platforms, wherein each piece of service data contains a primary identity of the user to which it belongs, and each service platform includes a plurality of offline physical stores and/or a plurality of online virtual stores; and, for each piece of service data, extracting the primary identity of the user to which it belongs and taking the extracted primary identities as the account information to be clustered.
Optionally, the service data is selected from: trade order data, member enrollment data, and interaction data.
Optionally, each piece of service data further contains one or more secondary identities of the user to which it belongs; performing preliminary grouping on the account information to be clustered to obtain a plurality of preliminary account groups includes: determining the service data from which each piece of account information to be clustered originates, and placing the account information extracted from service data containing the same secondary identity into one group, to obtain the plurality of preliminary account groups.
Optionally, the secondary identity is selected from: a communication number, a social software account, and an identity of a service platform.
Optionally, performing intra-group pairing on the account information in at least a part of the preliminary account groups includes: for each of the at least a part of the preliminary account groups, selecting one piece of account information as the account information to be paired, and pairing each remaining piece of account information in that preliminary account group with the account information to be paired to form account pairs.
Optionally, the graph calculation model is a Spark-graph model.
Optionally, merging the account association subgraphs containing connected nodes into a single account association subgraph includes: merging the account association subgraphs containing connected nodes by using the graph traversal algorithm Pregel, so as to determine the clustered account association subgraphs.
Optionally, after the clustered account association subgraphs are obtained, the method further includes: for each clustered account association subgraph, generating a unique identity identifier (OneID) of the user to which that subgraph belongs.
The embodiment of the application also provides an account clustering device, which comprises: a to-be-clustered account information determining module, used for determining a plurality of pieces of account information to be clustered; a preliminary grouping module, used for performing preliminary grouping on the account information to be clustered to obtain a plurality of preliminary account groups, wherein the account information contained in each preliminary account group belongs to the same user; an intra-group pairing module, used for performing intra-group pairing on the account information in at least a part of the preliminary account groups, wherein each preliminary account group subjected to intra-group pairing yields one or more corresponding account pairs; a graph generating module, used for inputting the obtained account pairs into a preset graph calculation model to generate an account association relationship graph; and a clustering module, used for splitting the account association relationship graph to obtain a plurality of initialized account association subgraphs, then performing iterative operations starting from the initialized account association subgraphs until no connection exists between any node of one account association subgraph and any node of another, at which point iteration stops and the clustered account association subgraphs are obtained, wherein in each iterative operation, account association subgraphs containing nodes that are connected to each other are merged into a single account association subgraph. The nodes indicate the account information, and nodes that are connected to each other indicate account information belonging to the same user.
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, which when being run by a processor, performs the steps of the account clustering method.
The embodiment of the application also provides a terminal, which comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor executes the steps of the account clustering method when running the computer program.
Compared with the prior art, the technical scheme of the embodiment of the application has the following beneficial effects:
In the embodiment of the application, the account information to be clustered is first preliminarily grouped (also called primary clustering), so that account information belonging to the same user is initially placed in one group; intra-group pairing is then performed on the preliminary grouping result to obtain account pairs; and secondary clustering is performed using a graph calculation model and iterative operations to obtain the clustered account association subgraphs (i.e., the final account clustering result). Existing account clustering methods generally cluster on information such as commonly used devices, communication numbers and co-occurring geographic locations, and cannot determine whether account information in different resulting clusters belongs to the same user. By contrast, the present method can identify, through iterative operations on the graph, whether account information in different preliminary clusters belongs to the same user, thereby refining the preliminary clustering result. For a given set of account information to be clustered, it can enlarge the number of accounts in a single account cluster, so that a more accurate and complete account clustering result is obtained.
Further, determining the plurality of pieces of account information to be clustered includes: acquiring a plurality of pieces of service data from different service platforms, wherein each piece of service data contains a primary identity of the user to which it belongs, and each service platform includes a plurality of offline physical stores and/or a plurality of online virtual stores; and, for each piece of service data, extracting the primary identity of the user to which it belongs and taking the extracted primary identities as the account information to be clustered. The prior art generally clusters account information obtained from a single channel (for example, a single service platform). The embodiment of the application instead clusters account information obtained from a plurality of service platforms, so the clustering result supports cross-channel user identification: the pieces of account information in each account cluster belong to the same user but originate from different platforms. This embodiment therefore helps business service providers carry out cross-platform, multi-channel operation and marketing activities aimed at customers or consumers.
Drawings
FIG. 1 is a flowchart of an account clustering method in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an account clustering device in an embodiment of the present application.
Detailed Description
In order to make the above objects, features and advantages of the present application more comprehensible, embodiments accompanied with figures are described in detail below.
Referring to fig. 1, fig. 1 is a flowchart of an account clustering method in an embodiment of the present application. The method may include steps S11 to S15:
Step S11: determining a plurality of pieces of account information to be clustered;
Step S12: performing preliminary grouping on the account information to be clustered to obtain a plurality of preliminary account groups, wherein the account information contained in each preliminary account group belongs to the same user;
Step S13: performing intra-group pairing on the account information in at least a part of the preliminary account groups, wherein each preliminary account group subjected to intra-group pairing yields one or more corresponding account pairs;
Step S14: inputting the obtained account pairs into a preset graph calculation model to generate an account association relationship graph;
Step S15: splitting the account association relationship graph to obtain a plurality of initialized account association subgraphs, then performing iterative operations starting from the initialized account association subgraphs until no connection exists between any node of one account association subgraph and any node of another, at which point iteration stops and the clustered account association subgraphs are obtained, wherein in each iterative operation, account association subgraphs containing nodes that are connected to each other are merged into a single account association subgraph.
The nodes indicate the account information, and nodes that are connected to each other indicate account information belonging to the same user.
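As a rough end-to-end illustration of steps S11 to S15, the following Python sketch clusters a handful of made-up records. It is only a sketch under stated assumptions: the record layout `(primary_id, [secondary_ids])`, the function name `cluster_accounts` and the sample identifiers are hypothetical, and a union-find structure is swapped in for the graph calculation model and the iterative subgraph merging, since both end with accounts grouped by connectivity.

```python
from collections import defaultdict


def cluster_accounts(records):
    """records: iterable of (primary_id, [secondary_ids]); returns a list of account sets."""
    # S11: the distinct primary identities are the account information to cluster.
    accounts = sorted({primary_id for primary_id, _ in records})

    # S12: preliminary groups, one per shared secondary identity.
    groups = defaultdict(list)
    for primary_id, secondary_ids in records:
        for secondary_id in secondary_ids:
            if primary_id not in groups[secondary_id]:
                groups[secondary_id].append(primary_id)

    # S13: pair the first account of each group with every other member.
    pairs = [(g[0], other) for g in groups.values() if len(g) > 1 for other in g[1:]]

    # S14 + S15 (assumption): union-find stands in for the graph model and merging.
    parent = {account: account for account in accounts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for account in accounts:
        clusters[find(account)].add(account)
    return sorted(clusters.values(), key=min)


records = [
    ("ID_a", ["m1"]),
    ("ID_b", ["m1", "wx_7"]),
    ("ID_c", ["wx_7"]),
    ("ID_d", ["m9"]),
]
print(cluster_accounts(records))
```

In this sample, `ID_a` and `ID_c` share no secondary identity, yet they end up in one cluster because `ID_b` links the two preliminary groups.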
In a specific implementation of step S11, determining the plurality of pieces of account information to be clustered may include: acquiring a plurality of pieces of service data from different service platforms, wherein each piece of service data contains a primary identity of the user to which it belongs, and each service platform includes a plurality of offline physical stores and/or a plurality of online virtual stores; and, for each piece of service data, extracting the primary identity of the user to which it belongs and taking the extracted primary identities as the account information to be clustered.
In particular, the service platform may be, for example, an e-commerce platform in any of various forms, including but not limited to an online shopping platform, an online meal ordering platform, an online financial business platform, an online medical services platform, an online educational training platform, and various online life/entertainment services platforms. The e-commerce platform may take the form of an application (APP), a mini program, an official account, a website, and so on. As another example, the service platform may be an offline physical service platform, including but not limited to an offline shopping mall, an offline dining venue, an offline financial service center, an offline medical facility, an offline educational training facility, and an offline beauty/hair/health maintenance venue.
Without limitation, the service data may be selected from: trade order data, member enrollment data, and interaction data (e.g., user login data, course selection data, comment data, etc.). The service data contains a primary identity of the user/customer (also referred to as a service object). The primary identity is usually a member identity and can be used to uniquely identify the service object while it interacts with the service platform.
For example, after customer A places an order on online shopping platform A, trade order data is created or generated, and this data should contain at least the primary identity of customer A (typically an order identity account or member account). It may also contain one or more secondary identities (e.g., a mobile phone number, an email address, a commonly used social software account, etc.) and other transaction-related information.
As another example, after customer B becomes a member of offline beauty institution B, member enrollment data (e.g., a membership agreement) is created or generated, and this data should contain at least the primary identity of customer B (typically a member account). It may also contain one or more secondary identities (e.g., a mobile phone number, an email address, a commonly used social software account, etc.) and other enrollment-related information.
The prior art generally clusters account information obtained from a single channel (for example, a single service platform). The embodiment of the application instead clusters account information obtained from a plurality of service platforms, so the clustering result supports cross-channel user identification: the pieces of account information in each account cluster belong to the same user but originate from different platforms. This embodiment therefore helps business service providers carry out cross-platform, multi-channel operation and marketing activities aimed at customers or consumers.
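The two examples above can be pictured as records that each carry one primary identity plus optional secondary identities. The following minimal Python sketch shows step S11 under that picture; the `ServiceRecord` type, its field names and the sample values are assumptions made purely for illustration and are not defined by the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRecord:
    """Hypothetical shape of one piece of service data."""
    platform: str                  # e.g. an online shop or an offline institution
    primary_id: str                # member account or order identity account
    secondary_ids: tuple = ()      # phone number, email address, social account, ...


def collect_accounts(records):
    """Return the distinct primary identities in first-seen order."""
    seen = {}
    for record in records:
        seen.setdefault(record.primary_id, None)
    return list(seen)


records = [
    ServiceRecord("shopping_platform_A", "ID_a", ("m1",)),
    ServiceRecord("beauty_institution_B", "ID_b", ("m1", "wx_7")),
    ServiceRecord("shopping_platform_A", "ID_a", ("m1",)),  # a second order, same account
]
accounts = collect_accounts(records)
print(accounts)
```

Repeated orders from the same account contribute one entry only, so the later grouping step works on distinct account information.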
It should be noted that, besides being able to be derived from different service data, the account information to be clustered may also be extracted from other data sources including user account information, for example, network traffic data.
In a specific implementation, the traffic data within a preset network range can be obtained by packet capture at the main network interface of that network range, wherein the main network interface includes the main internal-network port or the main external-network port of the preset network range, and a switch or an adapter connecting the preset network range to the external network is arranged between the main internal-network port and the main external-network port. For example, a data backup device or program is set up at the main network interface to back up the traffic data passing through it, thereby obtaining the traffic data within the network range; account information of different users is then extracted from the traffic data and taken as the account information to be clustered.
Specifically, keywords that identify account information of various communication software, social software and websites during network data transmission can be stored in advance; if a stored keyword appears in the traffic data, the corresponding account information can be determined.
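A minimal sketch of the keyword lookup described above, assuming the traffic payload is available as text and that account values appear as `keyword=value`. The keyword list, the payload format and the function name `extract_accounts` are hypothetical; real traffic would need protocol-aware parsing.

```python
import re

# Hypothetical stored keywords that mark account information in traffic payloads.
KEYWORDS = ("member_id", "uid")


def extract_accounts(payload, keywords=KEYWORDS):
    """Return (keyword, value) pairs for every stored keyword found in the payload."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    # \b keeps "uid" from matching inside longer names such as "guid".
    pattern = re.compile(r"\b(" + alternatives + r")=([^&\s;]+)")
    return [(m.group(1), m.group(2)) for m in pattern.finditer(payload)]


payload = "GET /order?member_id=ID_a&item=42 HTTP/1.1"
found = extract_accounts(payload)
print(found)
```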
In the implementation of step S12, preliminary grouping is performed on the account information to be clustered to obtain a plurality of preliminary account groups, wherein the account information contained in each preliminary account group belongs to the same user.
Further, in step S12, performing preliminary grouping on the account information to be clustered to obtain a plurality of preliminary account groups includes: determining the service data from which each piece of account information to be clustered originates, and placing the account information extracted from service data containing the same secondary identity into one group, to obtain the plurality of preliminary account groups; wherein each piece of service data also contains one or more secondary identities of the user to which it belongs.
The secondary identity may include, but is not limited to, a communication number (e.g., a mobile phone number or an email address), a commonly used social software account, an identity of a service platform, and the like.
For example, service data A contains the primary identity ID_a of a user and the communication number m1; service data B contains the primary identity ID_b of a user and also the communication number m1. Since service data A and service data B contain the same communication number m1, the primary identity ID_a in service data A and the primary identity ID_b in service data B can be considered to belong to the same user.
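The grouping rule in this example can be sketched in Python as follows. The input layout `(primary_id, [secondary_ids])` and the function name `preliminary_groups` are assumptions for illustration; one group is produced per secondary identity.

```python
from collections import defaultdict


def preliminary_groups(records):
    """Map each secondary identity to the sorted primary identities that share it."""
    groups = defaultdict(set)
    for primary_id, secondary_ids in records:
        for secondary_id in secondary_ids:
            groups[secondary_id].add(primary_id)
    return {sid: sorted(ids) for sid, ids in groups.items()}


records = [
    ("ID_a", ["m1"]),          # service data A
    ("ID_b", ["m1", "wx_7"]),  # service data B
    ("ID_c", ["wx_7"]),
]
groups = preliminary_groups(records)
print(groups)
```

Here `ID_b` lands in two preliminary groups, which is exactly the kind of cross-group link that the later graph-based step is meant to resolve.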
Specifically, the "identity of a service platform" may refer to an identity (may be referred to as a "primary identity") given to a client by a certain service platform. Since the business platform may include multiple stores, each customer may have a member identification (which may be referred to as a "secondary identification") at each store. Thus, each "primary identity" may correspond to a plurality of "secondary identities".
It should be noted that, in a specific implementation, the preliminary grouping may also be performed in other suitable ways depending on the source of the account information to be clustered. For example, when the account information to be clustered is extracted from traffic data as described for step S11 above, the preliminary grouping may include: determining the traffic data from which each piece of account information to be clustered originates, and placing the account information extracted from traffic data containing the same keyword (such as an IP address, a communication number or a social software account) into one group, to obtain a plurality of preliminary account groups.
In the implementation of step S13, intra-group pairing is performed on the account information in at least a part of the preliminary account groups, and each preliminary account group subjected to intra-group pairing yields one or more corresponding account pairs.
Further, in step S13, performing intra-group pairing on the account information in at least a part of the preliminary account groups includes: for each of the at least a part of the preliminary account groups, selecting one piece of account information as the account information to be paired, and pairing each remaining piece of account information in that preliminary account group with the account information to be paired to form account pairs.
The at least a part of the preliminary account groups may be those preliminary account groups, among the plurality obtained by the preliminary grouping in step S12, that contain at least two pieces of account information (i.e., the number of pieces of account information is greater than 1).
Selecting one piece of account information to be paired from each preliminary account group may specifically include: randomly selecting one piece of account information from the preliminary account group as the account information to be paired; or assigning a sequence number to each piece of account information in the preliminary account group and then selecting the account information with a preset sequence number as the account information to be paired.
For example, suppose a preliminary account group contains ID_1, ID_2, ID_3 and ID_4, and ID_1 is randomly selected as the account information to be paired. The result of intra-group pairing for this preliminary account group is: account pair 1 (ID_1 and ID_2), account pair 2 (ID_1 and ID_3), and account pair 3 (ID_1 and ID_4).
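The pairing in this example can be sketched as below. The function name `pair_within_group` is hypothetical, and defaulting to the first element when no anchor is given is an assumption standing in for the random or preset-number selection described above.

```python
def pair_within_group(group, anchor=None):
    """Pair one chosen account with every other account in the group."""
    if len(group) < 2:
        return []  # groups with a single account produce no pairs
    if anchor is None:
        anchor = group[0]  # assumption: stand-in for random / preset-number choice
    return [(anchor, other) for other in group if other != anchor]


pairs = pair_within_group(["ID_1", "ID_2", "ID_3", "ID_4"], anchor="ID_1")
print(pairs)
```

A group of n accounts yields n - 1 pairs this way, which is enough to keep all of its members connected in the graph built from the pairs.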
In the implementation of step S14, the obtained account pairs are input into a preset graph calculation model to generate an account association relationship graph.
The account association relationship graph is a graph data structure, that is, a network-like data structure composed of a set of vertices (or nodes) and a set of relationships (edges) between the vertices. Each vertex or node of the account association relationship graph indicates a piece of account information (nodes correspond one-to-one with account information), and nodes that are connected (i.e., share an edge) indicate account information belonging to the same user. For example, if two nodes in the graph are joined by an edge, the account information indicated by those two nodes belongs to the same user.
In a specific implementation, the graph calculation model may be a Spark-graph model. The model is not limited thereto, however; other graph calculation models that achieve the same or similar functions may also be adopted.
In the implementation of step S15, splitting the account association relationship graph to obtain the plurality of initialized account association subgraphs may include: randomly splitting the account association relationship graph into a first preset number of initialized account association subgraphs; or assigning a node sequence number to every node in the account association relationship graph and, in ascending order of node sequence number, forming one subgraph from every second preset number of nodes, to obtain the plurality of initialized account association subgraphs.
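As a small illustration of this structure, the following Python sketch turns account pairs into an undirected adjacency map. The function name `build_graph` and the sample pairs are hypothetical, and a plain dictionary is used here in place of any particular graph calculation model.

```python
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected adjacency map: node -> set of directly connected nodes."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


graph = build_graph([("ID_1", "ID_2"), ("ID_1", "ID_3"), ("ID_3", "ID_5")])
print(graph["ID_1"])
```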
The specific values of the first preset number and the second preset number may be set in combination with actual needs, which is not limited in the embodiment of the present application.
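The second splitting option can be sketched as follows. Using the sorted order of node names as the node sequence numbers is an assumption for illustration, as are the function name `split_by_sequence` and the parameter `chunk_size` (playing the role of the second preset number).

```python
def split_by_sequence(nodes, chunk_size):
    """Form one initial subgraph from every `chunk_size` nodes, in ascending order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    ordered = sorted(nodes)  # assumption: sorted order stands in for sequence numbers
    return [set(ordered[i:i + chunk_size]) for i in range(0, len(ordered), chunk_size)]


subgraphs = split_by_sequence(["ID_3", "ID_1", "ID_4", "ID_2", "ID_5"], 2)
print(subgraphs)
```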
It can be understood that in each clustered account association subgraph, the account information indicated by each node belongs to the same user, that is, the account information set formed by the account information indicated by each node is used as an account cluster.
Further, in step S15, merging the account association subgraphs containing connected nodes into a single account association subgraph includes: merging the account association subgraphs containing connected nodes by using the graph traversal algorithm Pregel, so as to determine the clustered account association subgraphs.
Existing account clustering methods generally cluster on information such as commonly used devices, communication numbers and co-occurring geographic locations, and cannot determine whether account information in different resulting clusters belongs to the same user. By contrast, the embodiment of the application can refine the preliminary clustering result and enlarge the number of accounts in a preliminary cluster, so that a more accurate and complete account clustering result is obtained.
Further, after the clustered account association subgraphs are obtained, the method further includes: for each clustered account association subgraph, generating a unique identity identifier (OneID) of the user to which that subgraph belongs.
In the embodiment of the application, because the account information contained in each clustered account association subgraph may originate from a plurality of different channels (for example, a plurality of different online and/or offline service platforms), a OneID of the user is generated for each clustered account association subgraph, and the OneID can be used to identify a customer whose accounts come from multiple channels. In this way, service objects can be identified and their data connected, the data-island problem is solved, and business service providers are helped to carry out operation and marketing activities across all channels.
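To give a feel for a Pregel-style merge, the following plain-Python sketch imitates vertex-centric supersteps: every node starts in its own subgraph, and in each superstep a node adopts the smallest label offered by a neighbour until no label changes. It does not use Spark, the minimum-label rule is a common choice rather than something the source specifies, and the function name and sample data are hypothetical.

```python
def pregel_style_components(nodes, edges):
    """Group nodes by connectivity using synchronous min-label supersteps."""
    label = {node: node for node in nodes}  # each node starts as its own subgraph
    while True:
        # One superstep: collect, per node, the smallest strictly lower neighbour label.
        inbox = {}
        for a, b in edges:
            for src, dst in ((a, b), (b, a)):
                if label[src] < label[dst]:
                    inbox[dst] = min(inbox.get(dst, label[src]), label[src])
        if not inbox:
            break  # no edge joins two different labels, so iteration stops
        label.update(inbox)

    clusters = {}
    for node, lab in label.items():
        clusters.setdefault(lab, set()).add(node)
    return sorted(clusters.values(), key=min)


nodes = ["ID_1", "ID_2", "ID_3", "ID_4", "ID_5", "ID_6"]
edges = [("ID_1", "ID_2"), ("ID_1", "ID_3"), ("ID_3", "ID_5"), ("ID_4", "ID_6")]
clusters = pregel_style_components(nodes, edges)
print(clusters)
```

The stopping test mirrors the termination condition in step S15: once no edge connects nodes carrying different labels, no two subgraphs are connected and the labels define the clustered subgraphs.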
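The source does not say how a OneID is produced. One possible sketch, assuming a deterministic identifier derived from a hash of the cluster's sorted members is acceptable, is shown below; the `oneid_` prefix, the 16-character length and both function names are hypothetical.

```python
import hashlib


def one_id(cluster):
    """Derive a deterministic identifier from the sorted members of one cluster."""
    canonical = "|".join(sorted(cluster))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "oneid_" + digest[:16]  # assumption: a truncated hash is long enough


def assign_one_ids(clusters):
    """Map every account to the OneID of the cluster it belongs to."""
    return {account: one_id(cluster) for cluster in clusters for account in cluster}


mapping = assign_one_ids([{"ID_a", "ID_b", "ID_c"}, {"ID_d"}])
print(mapping["ID_a"] == mapping["ID_c"])
```

A hash of the members changes whenever a cluster gains an account, so a production system would more likely keep a persistent mapping; that trade-off is outside what the source describes.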
Referring to fig. 2, fig. 2 is a schematic structural diagram of an account clustering device in an embodiment of the present application. The account clustering device may include:
the account information to be clustered determining module 21 is configured to determine a plurality of account information to be clustered;
a preliminary grouping module 22, configured to perform preliminary grouping on the account information to be clustered to obtain a plurality of preliminary account groups, where account information included in each preliminary account group belongs to the same user;
an intra-group pairing module 23, configured to perform intra-group pairing on account information in at least a part of the preliminary account groups, where each preliminary account group on which intra-group pairing is performed yields one or more corresponding account pairs;
the graph generating module 24 is configured to input the obtained account pairs into a preset graph calculation model, so as to generate an account association relationship graph;
the clustering module 25 is configured to split the account association relationship graph to obtain a plurality of initialized account association subgraphs, and then to perform iterative operations based on the initialized account association subgraphs, where in each iterative operation the account association subgraphs containing nodes that have a connection relationship with one another are merged into a single account association subgraph; the iteration stops when no connection relationship exists between any node of one account association subgraph and any node of another, and the clustered account association subgraphs are thereby obtained;
the nodes indicate the account information, and nodes that have a connection relationship indicate account information belonging to the same user.
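The pairing, graph generation and clustering modules above can be sketched in plain Python as follows. This is a simplified single-machine stand-in under stated assumptions, not the graph calculation model of the application: the anchor account is assumed to be the lexicographically smallest one, every node is assumed to start as its own initialized subgraph, and merging is done by propagating the smallest label between connected nodes. The names `pair_within_group` and `cluster_accounts` are hypothetical.

```python
def pair_within_group(group):
    """Star pairing: choose one account and pair every other account with it.

    Assumption: the chosen account is the smallest one in sorted order.
    """
    members = sorted(group)
    if not members:
        return []
    anchor, rest = members[0], members[1:]
    return [(anchor, other) for other in rest]


def cluster_accounts(preliminary_groups):
    """Return the clustered account subgraphs as a list of sets of accounts."""
    # Build an undirected graph: nodes are accounts, edges are account pairs.
    neighbours = {}
    for group in preliminary_groups:
        for account in group:
            neighbours.setdefault(account, set())
        for a, b in pair_within_group(group):
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Initialization (assumed): each node is its own subgraph, labelled by itself.
    label = {node: node for node in neighbours}

    # Iterate until no edge joins nodes that carry different subgraph labels.
    changed = True
    while changed:
        changed = False
        new_label = {}
        for node, adjacent in neighbours.items():
            best = min([label[node]] + [label[n] for n in adjacent])
            new_label[node] = best
            if best != label[node]:
                changed = True
        label = new_label

    # Nodes sharing a label form one clustered subgraph.
    subgraphs = {}
    for node, lab in label.items():
        subgraphs.setdefault(lab, set()).add(node)
    return list(subgraphs.values())


groups = [{"a1", "a2", "a3"}, {"a3", "b1"}, {"c1", "c2"}, {"d1"}]
clusters = cluster_accounts(groups)
```

Star pairing produces n-1 pairs for a group of n accounts instead of the n(n-1)/2 pairs of all-to-all pairing, while still leaving every account in the group connected to every other through the chosen account.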
Regarding the principle, implementation and beneficial effects of the account clustering device, please refer to the foregoing and the related description about the account clustering method shown in fig. 1, which are not repeated herein.
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, the computer program executing the steps of the account clustering method shown in fig. 1 when being run by a processor. The computer readable storage medium may include non-volatile memory (non-volatile) or non-transitory memory, and may also include optical disks, mechanical hard disks, solid state disks, and the like.
Specifically, in the embodiment of the present application, the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor.
It should also be appreciated that the memory in embodiments of the present application may be volatile memory or nonvolatile memory, or may include both. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
The embodiment of the application also provides a terminal comprising a memory and a processor, where the memory stores a computer program capable of running on the processor, and the processor executes the steps of the account clustering method shown in fig. 1 when running the computer program. The terminal may include, but is not limited to, terminal equipment such as a mobile phone, a computer, a tablet computer, a server, or a cloud platform.
It should be understood that the term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B both exist, or B exists alone. The character "/" herein indicates an "or" relationship between the objects before and after it.
The term "plurality" as used in the embodiments of the present application means two or more.
Descriptions such as "first" and "second" in the embodiments of the present application are used only to illustrate and distinguish the objects described. They imply no order, do not limit the number of devices in the embodiments of the present application, and should not be construed as limiting the embodiments of the present application in any way.
It should be noted that the serial numbers of the steps in the present embodiment do not represent a limitation on the execution sequence of the steps.
Although the present application is disclosed above, it is not limited thereto. Those skilled in the art may make various changes and modifications without departing from the spirit and scope of the application, and the scope of protection of the application shall therefore be as defined by the appended claims.
Claims (11)
1. An account clustering method is characterized by comprising the following steps:
determining a plurality of account information to be clustered;
preliminary grouping is carried out on the account information to be clustered to obtain a plurality of preliminary account groups, wherein the account information contained in each preliminary account group belongs to the same user;
performing intra-group pairing on account information in at least a part of the preliminary account groups, wherein each preliminary account group subjected to intra-group pairing obtains one or more corresponding account pairs;
inputting the obtained account pairs into a preset graph calculation model to generate an account association relationship graph;
splitting the account association relationship graph to obtain a plurality of initialization account association subgraphs, and then carrying out iterative operation based on each initialization account association subgraph until no connection relationship exists between any two nodes between every two account association subgraphs, stopping iteration and obtaining clustered account association subgraphs, wherein in each iterative operation, the plurality of account association subgraphs with the nodes with the connection relationship are combined into a single account association subgraph;
the node is used for indicating the account information, and each node with a connection relationship is used for indicating the account information belonging to the same user;
wherein performing intra-group pairing on account information in at least a part of the preliminary account groups comprises:
for at least a part of the preliminary account groups, selecting one piece of account information to be paired from each preliminary account group, and pairing each remaining piece of account information in that preliminary account group with the account information to be paired to form the account pairs.
2. The method of claim 1, wherein the determining a plurality of account information to be clustered comprises:
acquiring a plurality of pieces of business data from different business platforms, wherein each piece of business data comprises a primary identity of the user to which it belongs, and each business platform comprises a plurality of offline physical stores and/or a plurality of online virtual stores;
and for each piece of business data, extracting the primary identity of the user to which it belongs, and taking the extracted primary identities as the account information to be clustered.
3. The method of claim 2, wherein the business data is selected from the group consisting of:
trade order data, member meeting data, interaction data.
4. The method of claim 2, wherein each piece of business data further comprises one or more secondary identities of the respective user;
wherein performing preliminary grouping on the account information to be clustered to obtain a plurality of preliminary account groups comprises:
determining the business data from which each piece of account information to be clustered originates, and placing the account information extracted from pieces of business data containing the same secondary identity into one group, so as to obtain the plurality of preliminary account groups.
5. The method of claim 4, wherein the secondary identity is selected from the group consisting of:
communication number, social software account number, identity of service platform.
6. The method of claim 1, wherein the graph calculation model is a Spark-graph model.
7. The method according to claim 1, wherein merging the plurality of account association subgraphs containing nodes that have a connection relationship into a single account association subgraph comprises:
merging, by means of the graph traversal algorithm Pregel, the plurality of account association subgraphs containing nodes that have a connection relationship, so as to determine the plurality of account association subgraphs.
8. The method of claim 1, wherein after obtaining the clustered individual account association subgraphs, the method further comprises:
and generating a unique identity identifier OneID of the user to which the account associated subgraph belongs for each clustered account associated subgraph.
9. An account clustering device, comprising:
the account information to be clustered determining module is used for determining a plurality of account information to be clustered;
the preliminary grouping module is used for carrying out preliminary grouping on the account information to be clustered to obtain a plurality of preliminary account groups, wherein the account information contained in each preliminary account group belongs to the same user;
the intra-group pairing module is used for performing intra-group pairing on account information in at least a part of the preliminary account groups, and each preliminary account group on which intra-group pairing is performed obtains one or more corresponding account pairs;
the graph generating module is used for inputting the obtained account pairs into a preset graph calculation model so as to generate an account association relationship graph;
the clustering module is used for splitting the account association relationship graph to obtain a plurality of initialization account association subgraphs, then carrying out iterative operation on the basis of each initialization account association subgraph until no connection relationship exists between any two nodes between every two account association subgraphs, stopping iteration and obtaining each clustered account association subgraph, wherein in each iterative operation, the plurality of account association subgraphs with the nodes with the connection relationship are combined into a single account association subgraph;
the node is used for indicating the account information, and each node with a connection relationship is used for indicating the account information belonging to the same user;
wherein performing intra-group pairing on account information in at least a part of the preliminary account groups comprises:
for at least a part of the preliminary account groups, selecting one piece of account information to be paired from each preliminary account group, and pairing each remaining piece of account information in that preliminary account group with the account information to be paired to form the account pairs.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the account clustering method of any one of claims 1 to 8.
11. A terminal comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, characterized in that the processor executes the steps of the account clustering method according to any one of claims 1 to 8 when the computer program is executed.
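As an illustration of the preliminary grouping recited in claim 4, the following minimal Python sketch groups primary identities by the secondary identities they share. The record layout (`primary_id`, `secondary_ids`), the function name `preliminary_groups` and the sample values are assumptions made for illustration only and do not appear in the application.

```python
from collections import defaultdict


def preliminary_groups(records):
    """Group primary identities by each secondary identity found in the business data.

    Each record is assumed to be a dict with a "primary_id" string and a
    "secondary_ids" list (for example a communication number or a social
    software account).
    """
    by_secondary = defaultdict(set)
    for record in records:
        for secondary in record["secondary_ids"]:
            by_secondary[secondary].add(record["primary_id"])
    return dict(by_secondary)


records = [
    {"primary_id": "shop_a:u1", "secondary_ids": ["phone:555-0100"]},
    {"primary_id": "shop_b:u9", "secondary_ids": ["phone:555-0100", "social:zed"]},
    {"primary_id": "offline:u3", "secondary_ids": ["social:zed"]},
]
groups = preliminary_groups(records)
```

In this example the account `shop_b:u9` falls into two preliminary groups, which is what later allows the two groups to be joined into one account association subgraph.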
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310625405.3A CN116362737B (en) | 2023-05-29 | 2023-05-29 | Account clustering method and device, computer readable storage medium and terminal |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116362737A (en) | 2023-06-30 |
CN116362737B (en) | 2023-10-13 |
Family
ID=86910677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310625405.3A (Active) CN116362737B (en) | Account clustering method and device, computer readable storage medium and terminal | 2023-05-29 | 2023-05-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116362737B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117725441A (en) * | 2023-12-21 | 2024-03-19 | 北京火山引擎科技有限公司 | Rights management method and device, readable storage medium and electronic equipment |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105630904A (en) * | 2015-12-21 | 2016-06-01 | 中国电子科技集团公司第十五研究所 | Internet account information mining method and device |
CN109447177A (en) * | 2018-11-12 | 2019-03-08 | 南京中孚信息技术有限公司 | Account clustering method, device and server |
CN110852739A (en) * | 2018-08-20 | 2020-02-28 | 北京嘀嘀无限科技发展有限公司 | Account number merging method, device, equipment and computer readable storage medium |
CN111125469A (en) * | 2019-12-09 | 2020-05-08 | 重庆邮电大学 | User clustering method and device for social network and computer equipment |
CN111368013A (en) * | 2020-06-01 | 2020-07-03 | 深圳市卡牛科技有限公司 | Unified identification method, system, equipment and storage medium based on multiple accounts |
CN111701247A (en) * | 2020-07-13 | 2020-09-25 | 腾讯科技(深圳)有限公司 | Method and equipment for determining unified account |
CN113641657A (en) * | 2021-08-23 | 2021-11-12 | 苏州良医汇网络科技有限公司 | Method, device and equipment for merging user accounts |
CN114254278A (en) * | 2021-11-19 | 2022-03-29 | 中国建设银行股份有限公司 | User account merging method and device, computer equipment and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100057580A1 (en) * | 2008-08-28 | 2010-03-04 | Radha Raghunathan | Unified payment card |
US10061841B2 (en) * | 2015-10-21 | 2018-08-28 | International Business Machines Corporation | Fast path traversal in a relational database-based graph structure |
Also Published As
Publication number | Publication date |
---|---|
CN116362737A (en) | 2023-06-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |