CN112084422A - Intelligent processing method and device for account data - Google Patents



Publication number
CN112084422A
Authority
CN
China
Prior art keywords
account
information
community
account information
subgroup
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010896462.1A
Other languages
Chinese (zh)
Other versions
CN112084422B (en)
Inventor
赖茂立
吴翰昌
陈龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010896462.1A
Publication of CN112084422A
Application granted
Publication of CN112084422B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

The application discloses an intelligent processing method and device for account data. The method includes: obtaining a first relationship network graph between account information of users and login device information; converting the first relationship network graph into a second relationship network graph between account information; and determining the connection weight between every two pieces of account information in the second relationship network graph. Community division is performed on the account information in the second relationship network graph based on the connection weights to obtain a target account community set. According to user portrait information, cluster analysis is performed on each target account community in the set to obtain account cluster subgroups, and rationality verification is performed on the account cluster subgroups. When an account cluster subgroup passes the rationality verification, user identification information is generated for it. By combining graph computation with information entropy, the method improves the accuracy of identifying the account information belonging to the same user and reduces errors in the identification result.

Description

Intelligent processing method and device for account data
Technical Field
The application relates to the field of account data processing, in particular to an intelligent account data processing method and device.
Background
When users take part in gaming, shopping, and online social activities, the virtual and anonymous nature of the network means that one user may hold multiple accounts and log in on different devices. To protect user privacy, operators anonymize privacy-related user device information, so complete user device information usually cannot be collected.
In the prior art, device information is used as the unique identification information of a user: accounts logged in on a device are attributed to the same user according to the device information, and abnormal users are identified from the device information and the corresponding user. Because complete device information cannot be acquired, and because users sometimes borrow the mobile phones of colleagues, friends, and others to log in to application software, the collected correspondence between user accounts and devices is scattered. It is therefore difficult to attribute accounts to the same user, and user accounts are difficult to manage.
Disclosure of Invention
The application provides an account data intelligent processing method and device, which can improve accuracy of identifying account information under the same user, reduce errors of identification results and improve effectiveness of user management.
In one aspect, the present application provides an account data intelligent processing method, including:
acquiring a first relation network diagram between account information of a user and login equipment information;
converting the first relation network graph into a second relation network graph among account information;
determining the connection weight between every two pieces of account information in the second relationship network graph;
based on the connection weight, carrying out community division on account information in the second relationship network graph to obtain a target account community set;
acquiring user portrait information corresponding to the account information in the second relationship network diagram;
according to the user portrait information, performing cluster analysis on each target account community in the target account community set to obtain at least one account cluster subgroup;
acquiring an attribute label corresponding to each piece of account information in the at least one account cluster subgroup;
performing rationality verification on the at least one account cluster subgroup based on the attribute labels corresponding to the account information in the at least one account cluster subgroup;
and when any account cluster subgroup passes the rationality verification, generating user identification information for the account cluster subgroup that passed the rationality verification.
Another aspect provides an account data intelligent processing device, which includes: the system comprises a first relation network generation module, a second relation network acquisition module, a connection weight determination module, an account community acquisition module, a user portrait acquisition module, an account cluster subgroup acquisition module, an attribute tag acquisition module, an account cluster subgroup verification module and a user identifier generation module;
the first relation network generating module is used for acquiring a first relation network diagram between account information of a user and login equipment information;
the second relation network acquisition module is used for converting the first relation network diagram into a second relation network diagram among account information;
the connection weight determining module is used for determining the connection weight between every two pieces of account information in the second relationship network graph;
the account community acquisition module is used for carrying out community division on account information in the second relationship network graph based on the connection weight to obtain a target account community set;
the user portrait acquisition module is used for acquiring user portrait information corresponding to the account information in the second relationship network diagram;
the account clustering subgroup acquisition module is used for carrying out clustering analysis on each account community in the target account community set according to the user portrait information to obtain at least one account clustering subgroup;
the attribute tag acquisition module is used for acquiring an attribute tag corresponding to each account information in the at least one account cluster subgroup;
the account clustering subgroup verification module is used for verifying the rationality of at least one account clustering subgroup based on attribute labels corresponding to each account information in the at least one account clustering subgroup;
and the user identification generation module is used for generating user identification information of the account cluster subgroup with the passing rationality verification when the rationality verification of any account cluster subgroup passes.
In another aspect, an electronic device is provided, where the electronic device includes a processor and a memory, where at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the above account data intelligent processing method.
Another aspect provides a computer-readable storage medium, where the storage medium includes a processor and a memory, where at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded and executed by the processor to implement an account data intelligent processing method described above.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the method provided in the various alternative implementation modes of the account data processing or the user management.
The application provides an intelligent processing method and device for account data. The method includes: obtaining account information and login device information of users, constructing a first relationship network graph between the account information and the login device information, and converting the first relationship network graph into a second relationship network graph between account information. Connection weights are added to the connections between account information in the second relationship network graph, and community division is performed on the account information of the second relationship network graph based on the connection weights to obtain an account community set. The account community set is clustered to obtain at least one account cluster subgroup. The rationality of the account cluster subgroups is verified by means of information entropy, and corresponding user identification information is generated for the subgroups that pass verification. By combining graph computation with information entropy, the accuracy of identifying the account information of the same user is improved and errors in the identification result are reduced.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic view of an application scenario of an account data intelligent processing method according to an embodiment of the present application;
fig. 2 is a flowchart of an account data intelligent processing method according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a first relationship network diagram formed by account information and login device information in an account data intelligent processing method according to an embodiment of the present application;
fig. 4 is a flowchart of a method for converting a first relationship network diagram into a second relationship network diagram in an account data intelligent processing method according to an embodiment of the present application;
fig. 5 is a schematic diagram illustrating a distance between an account and a device in an account data intelligent processing method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a second relationship network diagram formed between account information in an account data intelligent processing method according to an embodiment of the present application;
fig. 7 is a flowchart of a method for determining connection weight in an account data intelligent processing method according to an embodiment of the present application;
fig. 8 is a flowchart of a method for performing pruning operation in an account data intelligent processing method according to an embodiment of the present application;
fig. 9 is a flowchart of a method for acquiring a target account community set in an account data intelligent processing method according to an embodiment of the present disclosure;
fig. 10 is a flowchart of a method for verifying the rationality of an account cluster subgroup in an account data intelligent processing method according to an embodiment of the present application;
fig. 11 is a flowchart of a method for verifying the consistency degree of an account cluster subgroup in an account data intelligent processing method according to an embodiment of the present application;
fig. 12 is a flowchart of a method for verifying credibility of an account cluster subgroup in an account data intelligent processing method according to an embodiment of the present disclosure;
fig. 13 is a flowchart of a method for managing an account according to user identification information in an account data intelligent processing method according to an embodiment of the present application;
fig. 14 is a data processing flow chart of an account data intelligent processing method applied in a game scene according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of an account data intelligent processing apparatus according to an embodiment of the present application;
fig. 16 is a hardware structural diagram of an apparatus for implementing the method provided in the embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, the present application will be further described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the description of the present application, it is to be understood that the terms "first", "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. Moreover, the terms "first," "second," and the like, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein.
Please refer to fig. 1, which shows an application scenario diagram of an account data intelligent processing method provided in an embodiment of the present application, where the application scenario may include a user terminal 110 and a server 120, and the server 120 obtains account information and login device information of a user corresponding to the user terminal 110, constructs a first relationship network diagram of the account information and the login device information, and converts the first relationship network diagram into a second relationship network diagram of the account information. The server 120 performs community division after adding the connection weight to the second relationship network graph, so as to obtain a set of target account communities. The server 120 clusters the target account communities in the set of target account communities according to the user portrait information to obtain account cluster subgroups. The server 120 verifies the rationality of the account cluster subgroup through the information entropy and generates user identification information for the verified account cluster subgroup. The server 120 analyzes the account information in the corresponding account cluster subgroup based on the user identification information and the user behavior information.
In the embodiment of the present application, the user terminal 110 may include a physical device such as a smart phone, a desktop computer, a tablet computer, a notebook computer, a digital assistant, or a smart wearable device, and may also include software running on the physical device, such as an application program. The operating system running on the physical device in the embodiment of the present application may include, but is not limited to, Android, iOS, Linux, Unix, Windows, and the like. The user terminal 110 includes a User Interface (UI) layer; the user terminal 110 collects user data from the outside through the UI layer and sends the data required for data analysis to the server 120 through an Application Programming Interface (API).
In the embodiment of the present application, the server 120 may include a server operating independently, or a distributed server, or a server cluster composed of a plurality of servers. The server 120 may include a network communication unit, a processor, a memory, and the like. Specifically, the server 120 may be configured to perform operations such as data analysis processing, community division, clustering, and rationality verification based on user data sent by a user side, to obtain an account cluster subgroup and generate user identification information corresponding to the account cluster subgroup, and the server 120 may be further configured to manage the account information in the account cluster subgroup based on user behavior information and the user identification information.
In the embodiment of the application, the account data processing may be built using machine learning. Machine Learning (ML) is a multi-disciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It studies how a computer can simulate or implement human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
Referring to fig. 2, an account data intelligent processing method is shown, which can be applied to a server side, and the method includes:
S210, acquiring a first relationship network graph between account information of a user and login device information;
Specifically, as shown in fig. 3, account information and login device information of users are acquired, and all of the account information and login device information are connected to form a heterogeneous graph between account information and login device information, that is, the first relationship network graph. The account information and the login device information can be obtained from data of the service platform, the client, the background, and third parties. In a specific embodiment, the service platform may be a game recommendation platform. In a specific embodiment, the account information may be game account information, social account information, or shopping account information, and the login device information may describe the login device used when logging in with that account information. The game account information may include a user name, assigned account identification information, an application identifier, and the like. The login device information may include an International Mobile Equipment Identity (IMEI), a Google Advertising ID, the anonymous device identifier proposed by the Ministry of Industry and Information Technology, a device identifier corresponding to the iOS system or the Android system, and the like.
After the account information and the login device information are obtained, they can be preprocessed. The account information and the login device information are encoded using Message-Digest Algorithm 5 (MD5) and a hash algorithm, and the first relationship network graph is formed from the encoded account information and login device information. In a specific embodiment, the account information and the login device information are converted into character string data, and a hashed binary character string is obtained.
The encoding formula is hash(col) × 10000000 + hash(md5(col)) % 10000000, where col is the column holding the account information or login device information. hash(col) × 10000000 is the result of hashing the data directly, hash(md5(col)) % 10000000 is the result of hashing the data after MD5 has been applied, and the two results are combined to obtain the encoding result.
Encoding the account information and the login device information unifies their identifier formats, which makes them easier to manage and to use in constructing the first relationship network graph.
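The encoding formula above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the application: the source does not say which hash function is meant, so `zlib.crc32` is used here purely as an assumed, deterministic stand-in, and the names `stable_hash`, `encode_id`, and `SCALE` are hypothetical.

```python
import hashlib
import zlib

SCALE = 10_000_000  # the constant 10000000 from the formula


def stable_hash(text: str) -> int:
    """Deterministic non-negative hash (assumption: CRC32 stands in for hash())."""
    return zlib.crc32(text.encode("utf-8"))


def encode_id(col: str) -> int:
    """Compute hash(col) * 10000000 + hash(md5(col)) % 10000000."""
    md5_hex = hashlib.md5(col.encode("utf-8")).hexdigest()
    return stable_hash(col) * SCALE + stable_hash(md5_hex) % SCALE


if __name__ == "__main__":
    # Account identifiers and device identifiers go through the same encoder,
    # so both end up in one integer format.
    print(encode_id("account:alice"))
    print(encode_id("device:imei-placeholder"))
```

Because the second term is always smaller than 10000000, the two hash results occupy separate digit ranges of the encoded value and can be recovered with integer division and modulo.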
S220, converting the first relation network graph into a second relation network graph among account information;
Further, referring to fig. 4, the converting the first relationship network graph into a second relationship network graph between account information includes:
S410, determining non-directly connected accounts and directly connected accounts in the first relationship network graph, wherein non-directly connected accounts are two pieces of account information that are connected through the same login device information in the first relationship network graph, and directly connected accounts are two pieces of account information that are directly connected in the first relationship network graph;
S420, acquiring the account distance between the non-directly connected accounts;
S430, when the account distance between non-directly connected accounts is equal to a preset account distance, taking those non-directly connected accounts as indirectly connected accounts;
S440, constructing a second relationship network graph between account information according to the directly connected accounts and the indirectly connected accounts.
Specifically, in the embodiment of the present specification, non-directly connected accounts are connected to each other through at least one login device, and there is a connection distance between them. In a specific embodiment, the connection distance between two non-directly connected accounts can be the minimum number of login devices on a path connecting the two accounts. Two pieces of account information whose connection distance matches the preset connection distance are determined to be indirectly connected accounts. Each pair of indirectly connected accounts is then connected directly, and the login devices that lay between them are removed. In a specific embodiment, as shown in fig. 5, an open circle represents login device information and a solid circle represents account information. Account information a and account information b are not directly connected; there are two paths between them, each containing only one login device, so the connection distance between account information a and account information b is 1. If the preset connection distance is 1, a connection between account information a and account information b can be established.
After the directly connected accounts and the indirectly connected accounts are determined in the first relationship network graph, the remaining login device nodes and account nodes are removed and their connections are deleted, which yields a homogeneous graph containing only account information. As shown in fig. 6, a solid dot represents account information. This homogeneous graph is the second relationship network graph, and at this point it is an equal-weight network.
Converting the first relationship network graph into the second relationship network graph by graph computation determines the connections between account information and gives an abstract description of the relationships between accounts, which facilitates subsequent steps such as community division.
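The conversion described above can be illustrated with a short Python sketch. It assumes a preset connection distance of 1, so two accounts are linked whenever they share at least one login device, and it does not model accounts that are already directly connected. The function name `project_accounts` and the edge-list input format are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_accounts(login_edges):
    """Turn (account, device) edges into account-account edges.

    Returns a dict mapping each account pair (a, b), with a < b, to the set
    of login devices the two accounts have in common. Device nodes do not
    appear in the result, so the output is a graph over accounts only.
    """
    accounts_by_device = defaultdict(set)
    for account, device in login_edges:
        accounts_by_device[device].add(account)

    shared_devices = defaultdict(set)
    for device, accounts in accounts_by_device.items():
        # Every pair of accounts seen on the same device is at distance 1.
        for a, b in combinations(sorted(accounts), 2):
            shared_devices[(a, b)].add(device)
    return dict(shared_devices)


if __name__ == "__main__":
    # Mirrors fig. 5: accounts a and b both logged in on devices d1 and d2.
    edges = [("a", "d1"), ("b", "d1"), ("a", "d2"), ("b", "d2"), ("c", "d3")]
    print(project_accounts(edges))
```

Keeping the set of shared devices on each edge, rather than only a yes/no link, is a design choice made here so that the count of common login devices is available later when connection weights are computed.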
S230, determining the connection weight between every two pieces of account information in the second relationship network graph;
Further, referring to fig. 7, the determining the connection weight between every two pieces of account information in the second relationship network graph includes:
S710, acquiring login information and/or user portrait information corresponding to the account information in the second relationship network graph;
S720, determining the similarity between every two pieces of account information in the second relationship network graph based on the login information and/or the user portrait information;
and S730, determining the connection weight between every two pieces of account information in the second relationship network graph according to the similarity between every two pieces of account information.
Specifically, when weights are assigned to the connections in the second relationship network graph, the connection weight may be calculated from the login information, the user portrait information, or both, corresponding to the account information in the second relationship network graph. The login information may include the total number of active days on the login device model in the last half year and the number of login devices on which both pieces of account information have logged in. The user portrait information may include the user's age information, preference information, login device type information, and the like. The similarity between every two pieces of account information in the second relationship network graph is calculated from this information, and the connection weight between them is determined from the similarity. The greater the similarity between two pieces of account information, the more likely they belong to the same user, and the larger the connection weight. To calculate the similarity and then the connection weight, a weight calculation model can be established that takes the login information, the user portrait information, or both as input and outputs the connection weight; alternatively, the connection weight can be calculated in a preset manner. In a specific embodiment, if the connection weight is calculated from the login information, it may be obtained by dividing the total number of active days of the login device model in the last half year by a preset error correction value and then adding the number of login devices on which both pieces of account information have logged in.
The error correction value may be 30 or another value. Taking account information a and account information b in fig. 5 as an example, the total number of active days of the login device model in the last half year is the cumulative number of active days of account information a and b on the two login devices, and the number of login devices logged in between the two pieces of account information is the number of login devices common to account information a and account information b, which is 2 as shown in fig. 5. The connection weight for the connection between account information a and account information b can then be calculated from these values.
The introduction of the connection weight can show the association degree between the account information, and the community division can be better performed according to the association degree between the account information, so that the accuracy of the community division is improved.
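As a worked illustration of the login-information variant, the sketch below reads the description as: weight = total active days / error correction value + number of common login devices. This reading of the translated text is an assumption, and the function and parameter names are hypothetical.

```python
def connection_weight(total_active_days, shared_device_count, error_correction=30):
    """Connection weight from login information (one reading of the text).

    total_active_days:   cumulative active days of the two accounts on their
                         common login devices in the last half year
    shared_device_count: number of login devices both accounts logged in on
    error_correction:    preset error correction value (30 in the example)
    """
    if error_correction <= 0:
        raise ValueError("error_correction must be positive")
    return total_active_days / error_correction + shared_device_count


if __name__ == "__main__":
    # Accounts a and b of fig. 5 share 2 devices; 90 active days is a made-up
    # input chosen only to make the arithmetic easy to follow.
    print(connection_weight(total_active_days=90, shared_device_count=2))
```

With an error correction value of 30, the first term roughly converts active days into active months, so a longer shared history and more shared devices both raise the weight.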
Further, referring to fig. 8, after determining the connection weight between every two pieces of account information in the second relationship network graph, the method further includes:
S810, traversing each piece of account information in the second relationship network graph, and executing the following steps when each piece of account information is traversed:
S820, determining the account information connected with the currently traversed account information in the second relationship network graph;
S830, sorting the connected account information from large to small according to the value of the connection weight between the connected account information and the currently traversed account information to obtain a connected account information sequence;
S840, determining a preset number of pieces of account information in the connected account information sequence;
and S850, updating the second relation network graph based on the information of the account numbers with the preset number.
In some embodiments, after the connection weights are determined, pruning may be performed according to the connection weights to disconnect the connection relationships with lower connection weights. The pruning operation is performed for each piece of account information in the second relationship network graph: the account information in the second relationship network graph is traversed, and the account information connected with the currently traversed account information is obtained. The connected account information is sorted in descending order of the connection weight of its connection relationship with the currently traversed account information to obtain a connected account information sequence. A preset number of pieces of account information is then taken from the connected account information sequence, for example the account information whose connection weight ranks in the top 100 of the sequence, and the connection relationships between the currently traversed account information and the connected account information outside the top 100 are disconnected, which completes one pruning operation.
The pruning processing is performed on each piece of account information in the second relationship network graph, and the second relationship network graph is updated based on the preset number of pieces of account information acquired from each connected account information sequence.
Through the pruning processing, the connection density of the account information in the second relationship network graph can be reduced, which reduces the processing complexity.
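The pruning steps S810 to S850 can be sketched as a top-k filter over each account's neighbours. This is a minimal sketch: the function name `prune_top_k` and the edge-dictionary representation are illustrative, and the rule that an edge is cut as soon as either endpoint drops it is an assumption, since the text does not say how the two endpoints' rankings are reconciled.

```python
from collections import defaultdict


def prune_top_k(weights: dict, k: int) -> dict:
    """Keep only edges that rank in the top k (by weight) for both endpoints.

    `weights` maps an undirected edge (u, v) to its connection weight.
    Assumption: an edge is cut as soon as either endpoint drops it.
    """
    neighbours = defaultdict(list)
    for (u, v), w in weights.items():
        neighbours[u].append((w, v))
        neighbours[v].append((w, u))

    kept = {}
    for node, edges in neighbours.items():
        # Sort connected accounts from largest to smallest connection weight.
        edges.sort(key=lambda e: e[0], reverse=True)
        kept[node] = {other for _, other in edges[:k]}

    return {
        (u, v): w
        for (u, v), w in weights.items()
        if v in kept[u] and u in kept[v]
    }


# Hypothetical weighted graph with four accounts; keep the top 2 per account.
edges = {("a", "b"): 5.0, ("a", "c"): 1.0, ("a", "d"): 3.0, ("c", "d"): 2.0}
pruned = prune_top_k(edges, k=2)
print(pruned)  # the weak edge ("a", "c") is removed
```

The text's example uses a preset number of 100; `k=2` is used here only to keep the example small.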
S240, performing community division on the account information in the second relationship network graph based on the connection weight to obtain a target account community set;
Further, referring to fig. 9, the performing community division on the target account information in the second relationship network graph based on the connection weight to obtain a target account community set includes:
S910, creating an account community set, wherein each account community in the account community set comprises account information;
S920, selecting one account community from the account community set as a target account community;
S930, calculating a first modularity of the target account community based on the connection weights between the account information in the target account community and the account information connected to it, and the connection weights between the pieces of account information within the target account community;
S940, adding the target account information in the target account community to an account community corresponding to account information connected with the target account information to obtain an adjacent account community;
S950, calculating a second modularity of the adjacent account community based on the connection weights between the account information in the adjacent account community and the account information connected to it, and the connection weights between the pieces of account information within the adjacent account community;
S960, acquiring the difference between the second modularity and the first modularity;
S970, updating the account community set according to the difference;
S980, selecting one account community from the updated account community set as the target account community, and repeating the step of updating the account community set until, for any target account community in the currently updated account community set, the difference between the second modularity of the corresponding adjacent account community and the corresponding first modularity meets a preset condition;
S990, taking the account community set at the time when all the differences meet the preset condition as the target account community set.
Specifically, community division is performed on the second relationship network graph with the connection weights, for which the fast unfolding algorithm (commonly known as the Louvain method) or the like may be adopted, to obtain the target account community set.
When dividing communities, whether a division is reasonable can be determined through the modularity. The modularity is the proportion of the total connection weight in the second relationship network graph that falls on edges connecting two nodes within the same community, minus the expected value of that proportion if the nodes were connected at random under the same community structure. The calculation formula of the modularity is as follows:
$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{i,j} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$
where
$$m = \frac{1}{2} \sum_{i,j} A_{i,j}$$
represents the sum of all connection weights in the network, $A_{i,j}$ denotes the connection weight between node i and node j, $k_i = \sum_j A_{i,j}$ represents the sum of the connection weights of the edges connected to node i, $c_i$ denotes the community to which node i is assigned, and $\delta(c_i, c_j)$ judges whether node i and node j are divided into the same community, returning 1 if so and 0 otherwise.
The above calculation of the modularity can be simplified to the following formula:
$$Q = \sum_{c} \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{tot}}{2m} \right)^2 \right]$$
where $\Sigma_{in}$ represents the connection weight inside community c, that is, $\sum_{i,j \in c} A_{i,j}$, and $\Sigma_{tot}$ represents the connection weight of the edges connected to the nodes inside community c, including both edges inside the community and edges leading outside it.
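The two forms of the modularity can be checked against each other numerically. The sketch below assumes a symmetric weighted adjacency stored as nested dictionaries. The function names and the example graph (two triangles joined by one edge) are illustrative and not taken from the source. In the per-community form, the weight inside a community is summed over ordered pairs, so each internal edge is counted once per endpoint.

```python
def build_adj(edges):
    """Build a symmetric adjacency dict from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def modularity_pairwise(adj, community):
    """Q = 1/(2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j)."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    k = {i: sum(nbrs.values()) for i, nbrs in adj.items()}
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] == community[j]:
                q += adj[i].get(j, 0.0) - k[i] * k[j] / two_m
    return q / two_m


def modularity_by_community(adj, community):
    """Q = sum_c [sigma_in/(2m) - (sigma_tot/(2m))**2]."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    sigma_in, sigma_tot = {}, {}
    for i, nbrs in adj.items():
        c = community[i]
        sigma_tot[c] = sigma_tot.get(c, 0.0) + sum(nbrs.values())
        sigma_in[c] = sigma_in.get(c, 0.0) + sum(
            w for j, w in nbrs.items() if community[j] == c
        )
    return sum(sigma_in[c] / two_m - (sigma_tot[c] / two_m) ** 2
               for c in sigma_tot)


# Two triangles joined by the single edge (2, 3); every weight is 1.
adj = build_adj([(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
                 (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)])
split = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}
print(round(modularity_pairwise(adj, split), 4))      # 0.3571
print(round(modularity_by_community(adj, split), 4))  # 0.3571
```

Both functions return 5/14 for this partition, and both return 0 when every node is placed in a single community.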
The fast unfolding algorithm is an iterative algorithm. A corresponding account community is first established for each piece of target account information in the second relationship network graph. One account community is selected as the target account community, and the account information in the target account community is taken as the object to be divided.
A first modularity corresponding to the target account community, namely the modularity before the target account information is divided, is calculated from the connection weights between the account information in the target account community and the account information connected to it, whether inside or outside the target account community, and from the connection weights between the pieces of account information inside the target account community. If the account community contains only one piece of target account information, the connection weight inside the account community is the weight of the connection from the target account information to itself, that is, the two connected pieces of account information are the same account information.
The target account information to be divided is then moved into any account community corresponding to account information connected with it, to obtain an adjacent account community. A second modularity corresponding to the adjacent account community is calculated from the connection weights corresponding to the account information in the adjacent account community, which include the connection weights to connected account information inside and outside the adjacent account community and the connection weights between the pieces of account information inside the adjacent account community. When the second modularity is calculated, the target account information is already contained in the adjacent account community.
The second modularity is compared with the first modularity. If the difference is positive, the modularity increases after the division and the division is reasonable; the divided result is taken as the new account community assignment, that is, the target account information in the target account community is moved into the adjacent account community. If the difference is negative, the modularity decreases after the division and the division is unreasonable; the original relationship between the target account community and the adjacent account community is maintained.
These steps are repeated until, whichever account community is selected as the target account community, the difference between the second modularity after division and the first modularity before division is negative. This indicates that no further division can increase the modularity of the network, so the modularity has reached its maximum value, and the account community set at this moment is taken as the target account community set.
Performing the community division with a community division algorithm and the connection weights improves the accuracy of the community division result, so that account information with a high degree of association is classified into the same account community and the accuracy of identifying the same user is improved.
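The node-moving loop described above can be sketched in a few lines. This is a simplified illustration, not a full implementation of the fast unfolding algorithm: it covers only the repeated "move a node to a neighbouring community if the modularity rises" step, and it recomputes the whole-graph modularity for every candidate move for clarity rather than speed. The function names and the example graph are assumptions made for illustration.

```python
def modularity(adj, community):
    """Q = sum over communities of in/(2m) - (tot/(2m))**2 (symmetric adj)."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    if two_m == 0:
        return 0.0
    sigma_in, sigma_tot = {}, {}
    for i, nbrs in adj.items():
        c = community[i]
        sigma_tot[c] = sigma_tot.get(c, 0.0) + sum(nbrs.values())
        sigma_in[c] = sigma_in.get(c, 0.0) + sum(
            w for j, w in nbrs.items() if community[j] == c
        )
    return sum(sigma_in[c] / two_m - (sigma_tot[c] / two_m) ** 2
               for c in sigma_tot)


def local_moving(adj):
    """Greedy node moving: accept a move only if it raises the modularity."""
    community = {node: node for node in adj}  # one community per account
    current_q = modularity(adj, community)    # modularity before any move
    improved = True
    while improved:
        improved = False
        for node in adj:
            original = community[node]
            best_target, best_q = original, current_q
            targets = {community[n] for n in adj[node]} - {original}
            for target in sorted(targets, key=str):
                community[node] = target              # tentative move
                candidate_q = modularity(adj, community)
                if candidate_q > best_q + 1e-12:      # positive difference
                    best_target, best_q = target, candidate_q
            community[node] = best_target             # keep best or revert
            if best_target != original:
                current_q = best_q
                improved = True
    return community, current_q


def build_adj(edges):
    """Build a symmetric adjacency dict from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


# Two triangles joined by the single edge (2, 3); every weight is 1.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
community, q = local_moving(build_adj(edges))
print(community, round(q, 4))
```

On this graph the loop settles on the two triangles as separate communities. The loop stops once a full pass over all nodes produces no move with a positive difference, which mirrors the stopping condition described above.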
S250, acquiring user portrait information corresponding to the account information in the second relation network diagram;
S260, performing cluster analysis on each target account community in the target account community set according to the user portrait information to obtain at least one account cluster subgroup;
Specifically, user portrait information of the user is introduced for each piece of account information in the target account community set. The user portrait information may include the user's province, city, preferences, and the like.
If several pieces of account information are used by the same user, the user portrait information corresponding to them should be consistent. Therefore, based on the user portrait information, each target account community is clustered through a clustering algorithm, and all account information in the target account community with matching user portrait information is divided into the same account cluster subgroup. At least one account cluster subgroup exists in one target account community. An account cluster subgroup can also be understood as the group of accounts belonging to a certain natural person, and the user portrait information corresponding to each piece of account information in the account cluster subgroup is consistent.
Through clustering, the account information of the same user can be preliminarily obtained, which improves the accuracy of identifying the same user.
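The text does not name a particular clustering algorithm. As a minimal sketch, exact matching on a few portrait fields can stand in for it, so that accounts with identical portraits fall into the same account cluster subgroup. The function name `cluster_by_portrait`, the field names, and the sample data are illustrative assumptions.

```python
from collections import defaultdict


def cluster_by_portrait(community, portraits,
                        fields=("province", "city", "preference")):
    """Split one account community into subgroups with identical portraits.

    Exact matching on `fields` is an assumed stand-in for the unspecified
    clustering algorithm.
    """
    groups = defaultdict(list)
    for account in community:
        key = tuple(portraits[account].get(f) for f in fields)
        groups[key].append(account)
    return list(groups.values())


# Hypothetical community of three accounts with portrait information.
portraits = {
    "a": {"province": "Guangdong", "city": "Shenzhen", "preference": "MOBA"},
    "b": {"province": "Guangdong", "city": "Shenzhen", "preference": "MOBA"},
    "c": {"province": "Beijing", "city": "Beijing", "preference": "puzzle"},
}
subgroups = cluster_by_portrait(["a", "b", "c"], portraits)
print(subgroups)  # [['a', 'b'], ['c']]
```

A production system would more likely use a distance-based method that tolerates missing or noisy portrait fields. Exact matching is used here only to make the grouping idea concrete.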
S270, acquiring attribute labels corresponding to the account information in the at least one account cluster subgroup;
S280, performing rationality verification on the at least one account cluster subgroup based on the attribute labels corresponding to the account information in the at least one account cluster subgroup;
Further, referring to fig. 10, the performing rationality verification on the at least one account cluster subgroup based on the attribute label corresponding to each piece of account information in the at least one account cluster subgroup includes:
S1010, calculating the information entropy of the at least one account cluster subgroup according to the attribute labels;
S1020, determining the consistency degree of the account information in the at least one account cluster subgroup according to the information entropy;
S1030, determining the credibility of the at least one account cluster subgroup according to the number of pieces of account information corresponding to the different attribute labels;
S1040, performing rationality verification on the at least one account cluster subgroup according to the consistency degree and the credibility.
Specifically, whether the classification of an account cluster subgroup is accurate can be verified based on the information entropy. The attribute labels of an account cluster subgroup may include the cities corresponding to the account information in the account cluster subgroup, or the account cluster subgroups to which the account information belongs according to research data of a third party. From the attribute labels corresponding to the different pieces of account information, the information entropy of the account cluster subgroup to which the account information belongs can be calculated. The information entropy is computed from the probabilities with which the attribute values occur and describes the degree of disorder of the data: the more consistent the account information in an account cluster subgroup, the lower the information entropy of that subgroup.
The calculation formula of the information entropy is as follows:
$$H(Q) = -\sum_{i} p_i \log_2 p_i$$
where $p_i$ represents the proportion of the account information corresponding to the i-th attribute label in the total account information of the account cluster subgroup.
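The entropy of a subgroup's attribute labels can be computed directly from the label counts. The sketch below uses the base-2 logarithm, as in the worked embodiments of this description. The function name `information_entropy` and the sample labels are illustrative.

```python
import math
from collections import Counter


def information_entropy(labels):
    """H(Q) = -sum(p_i * log2(p_i)) over the distinct attribute labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    probs = [n / total for n in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in probs)


# Three accounts: two labelled Shanghai, one with no city (None).
mixed = information_entropy(["Shanghai", "Shanghai", None])
uniform = information_entropy(["Shanghai", "Shanghai", "Shanghai"])
print(round(mixed, 4))  # 0.9183
print(uniform == 0.0)   # True: a fully consistent subgroup has zero entropy
```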
The consistency degree of the account information in the account cluster subgroup is verified according to the information entropy and can be represented by R1. Meanwhile, the credibility of the account cluster subgroup can be determined according to the number of pieces of account information under each attribute label and the total number of pieces of account information in the account cluster subgroup, and can be represented by R2.
When verification is performed, the rationality verification metric of the account cluster subgroup can be calculated as the product of the consistency degree R1 and the credibility R2, namely R = R1 × R2.
Here R is the rationality verification metric: R > 0 indicates that the account cluster subgroup is reasonable, and R ≤ 0 indicates that the account cluster subgroup may have problems and needs to be readjusted.
Verifying the account cluster subgroups through the information entropy improves the accuracy of identifying the same user and reduces the error of the identification result.
Further, referring to fig. 11, the determining, according to the information entropy, a degree of consistency of the account information in the at least one account cluster subgroup includes:
S1110, determining a first penalty coefficient according to the type information of the attribute label;
S1120, determining the consistency degree of the account information in the at least one account cluster subgroup according to the information entropy and the first penalty coefficient.
Specifically, R1, as the measure of the consistency degree, can be calculated as R1 = H(Q)/B - A1.
Here H(Q) is the information entropy of the account cluster subgroup. B is the penalty coefficient of the consistency measure, which reduces the interference caused in the information entropy calculation by too many types of selected attribute labels; in a specific embodiment, the penalty coefficient may be the number of types of the attribute labels. A1 is the rejection threshold of the consistency measure; it is a hyperparameter that can be set from practical experience, and preferably its value is 0.4.
Further, referring to fig. 12, the determining the credibility of the at least one account cluster subgroup according to the account information numbers corresponding to the different attribute labels includes:
S1210, acquiring the number of pieces of account information of the at least one account cluster subgroup;
S1220, acquiring the number of pieces of account information matched with the different attribute labels in the at least one account cluster subgroup;
S1230, determining the maximum value of the numbers of pieces of account information matched with the different attribute labels;
S1240, determining the credibility of the at least one account cluster subgroup according to the ratio of the maximum value to the number of pieces of account information of the at least one account cluster subgroup.
Specifically, R2, as the measure of the credibility, can be calculated as R2 = MAX(X1, X2, ..., Xn) - A2.
Here X1, X2, ..., Xn are the ratios of the number of pieces of account information corresponding to each type of selected attribute label to the total number of pieces of account information, and MAX(X1, X2, ..., Xn) is the maximum of these ratios. A2 is the rejection threshold of the credibility measure; it is a hyperparameter that can be set from practical experience, and preferably its value is 0.6.
In a specific embodiment, if the attribute label of a certain account cluster subgroup is city information, and the account cluster subgroup contains 3 pieces of account information in total, where the city information of two pieces of account information is Shanghai and the city information of the other piece is null, then according to the calculation formula of the information entropy, H(Q) = -(P(Shanghai) × log2(P(Shanghai)) + P(null) × log2(P(null))), with P(Shanghai) = 2/3 and P(null) = 1/3. Because there are only two values, Shanghai and null, the penalty coefficient B is 2. Since the city information of two of the three pieces of account information is Shanghai, MAX(Shanghai, null) is 2/3. R1 and R2 can then be obtained from H(Q), the penalty coefficient B and MAX(Shanghai, null), and finally R > 0, which indicates that the division of the account information is feasible.
In a specific embodiment, if the attribute label of a certain account cluster subgroup is third-party data, the account cluster subgroup contains 3 pieces of account information in total, and in the third-party data two of the 3 pieces of account information are included in the same account cluster subgroup while the remaining piece is included in another account cluster subgroup, then according to the information entropy formula, H(Q) = -(P(C) × log2(P(C)) + P(D) × log2(P(D))), where C denotes the account cluster subgroup in the third-party data that contains the two pieces of account information and D denotes the account cluster subgroup that contains the remaining piece, so that P(C) = 2/3 and P(D) = 1/3. Since the three pieces of account information fall into only two different account cluster subgroups in the third-party data, the penalty coefficient B is 2. Correspondingly, the subgroup containing the most account information in the third-party data is C, which contains two pieces of account information, so MAX(P(C), P(D)) is 2/3. Finally, R1 and R2 can be obtained from the information entropy H(Q) of the account cluster subgroup, the penalty coefficient B and MAX(P(C), P(D)), and R > 0 is obtained, which indicates that the division of the account information is feasible.
Verifying the account cluster subgroups through the information entropy improves the accuracy of identifying the same user and reduces the error of the identification result.
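The two worked embodiments can be reproduced numerically. The sketch below follows the formulas as stated in this description (R1 = H(Q)/B - A1, R2 = MAX - A2, R = R1 × R2) with the preferred thresholds 0.4 and 0.6, and takes B as the number of distinct label values. The function name `rationality` is illustrative and not from the source.

```python
import math
from collections import Counter


def rationality(labels, a1=0.4, a2=0.6):
    """Return (R1, R2, R) for the attribute labels of one cluster subgroup."""
    if not labels:
        raise ValueError("labels must not be empty")
    total = len(labels)
    counts = Counter(labels)
    probs = [n / total for n in counts.values()]
    entropy = sum(-p * math.log2(p) for p in probs)  # H(Q)
    b = len(counts)             # penalty coefficient: number of label types
    r1 = entropy / b - a1       # consistency degree
    r2 = max(probs) - a2        # credibility
    return r1, r2, r1 * r2


# City embodiment: two accounts in Shanghai, one with a null city.
r1, r2, r = rationality(["Shanghai", "Shanghai", None])
print(round(r1, 4), round(r2, 4), r > 0)  # 0.0591 0.0667 True
```

The third-party embodiment has the same label distribution (2/3 and 1/3 over two values), so it yields the same R1, R2 and R.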
S290, when the rationality verification of any account cluster subgroup passes, generating user identification information for the account cluster subgroup that passes the rationality verification.
Specifically, a unique primary key can be obtained by combining the account cluster subgroup identification information with the account community identification information, to refer to the natural person corresponding to the account cluster subgroup.
For example, if the identification information of a certain account community is 82392382923823828 and an account cluster subgroup under that account community is class1, the corresponding natural person may be numbered class1|82392382923823828.
The identification information of the account cluster subgroups that pass the rationality verification is taken as the user identification information; alternatively, that identification information is renumbered starting from 0, and the renumbered identification information is determined as the user identification information. The account information corresponding to one piece of user identification information belongs to the same natural person.
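Both ways of forming the user identification information can be sketched briefly. The helper names `make_user_key` and `renumber` are hypothetical. The `|` separator follows the example in the text, and the second subgroup name `class2` is invented for illustration.

```python
def make_user_key(subgroup_id: str, community_id: str) -> str:
    """Combine subgroup and community identifiers into one primary key."""
    return f"{subgroup_id}|{community_id}"


def renumber(keys):
    """Map each verified subgroup key to a sequential identifier from 0."""
    return {key: index for index, key in enumerate(keys)}


key = make_user_key("class1", "82392382923823828")
print(key)  # class1|82392382923823828

# Hypothetical second subgroup in the same account community.
user_ids = renumber([key, make_user_key("class2", "82392382923823828")])
print(user_ids[key])  # 0
```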
In a specific embodiment of the present application, account information in the account cluster subgroups and corresponding login device information may also be numbered to obtain a list of account information and a list of login device information, which is convenient for account management.
Further, referring to fig. 13, the method further includes:
S1310, acquiring user behavior data corresponding to the account information;
S1320, determining the user identification information corresponding to the account information;
S1330, managing the account cluster subgroup corresponding to the user identification information based on the user behavior data.
Specifically, new account information corresponding to the login device information in the account cluster subgroup corresponding to the user identification information is acquired, or new device information corresponding to the account information in that account cluster subgroup is acquired, and the account information in the account cluster subgroup and the corresponding login device information are updated accordingly. The login location information at the time of login corresponding to the new account information and the new device information, such as province information or city information, is also acquired; if the login location information has changed compared with the historical login location information, the corresponding login location information is updated as well.
According to the behavior data of the account information in the account cluster subgroup, main-account and alternate-account analysis, login device usage frequency analysis, abnormal node analysis and the like can be performed. If the method is applied to a game scene, strategic intervention can be performed in game operation according to the analysis result; if the method is applied to other scenes, such as a shopping scene, adaptive pushing and the like can be performed for each piece of account information of the user.
When the main account and alternate accounts of the user are determined, the account information in the account cluster subgroup corresponding to the user identification information can be sorted according to its activity level, consumption amount and the like in the application software, to obtain a ranking of main and alternate accounts under the user identification information. Account information with high activity and a high consumption amount indicates the user's main account, and account information with low activity and a low consumption amount indicates an alternate account of the user.
When the usage frequency of the login devices is determined, the type of login device used when each piece of account information in the account cluster subgroup corresponding to the user identification information logs in can be obtained; the login device types may include mobile phones, computers, iPads and the like, as well as the models of the various mobile phones. All login devices on which the account information in the account cluster subgroup corresponding to the user identification information has logged in are counted and sorted by the number of logins; the login device with the most logins is determined to be the frequently used device, and the login device with the fewest logins is determined to be the rarely used device.
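The account ranking and the device-frequency analysis for one user can be sketched as follows. The function names, the record layout, and the sample data are illustrative assumptions. Sorting by activity first and consumption second is likewise an assumption, since the text does not say how the two criteria are combined.

```python
from collections import Counter


def rank_accounts(stats):
    """Order accounts from most to least active; the first is the main account.

    `stats` maps account -> (active_days, spend). Ties on active days are
    broken by spend (assumed ordering).
    """
    return sorted(stats, key=lambda account: stats[account], reverse=True)


def device_usage(logins):
    """Return (most used device, least used device) from (account, device) pairs."""
    counts = Counter(device for _, device in logins)
    if not counts:
        raise ValueError("logins must not be empty")
    ordered = counts.most_common()  # devices sorted by login count, descending
    return ordered[0][0], ordered[-1][0]


# Hypothetical behavior data for one account cluster subgroup.
stats = {"acc1": (120, 500.0), "acc2": (15, 0.0), "acc3": (60, 30.0)}
logins = [("acc1", "phoneA"), ("acc1", "phoneA"),
          ("acc2", "phoneA"), ("acc3", "tabletB")]

print(rank_accounts(stats))  # ['acc1', 'acc3', 'acc2']
print(device_usage(logins))  # ('phoneA', 'tabletB')
```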
When abnormal account information or abnormal login device information is identified, the users filtered out in the pruning operation are merged, through a join operation, into the account communities output by the fast unfolding algorithm, and the account cluster subgroups are reconstructed. The user behavior of each piece of account information in the newly constructed account cluster subgroups is then analyzed to determine whether abnormal account information or abnormal login device information exists.
The account information marked as the same user is managed in a unified manner, so that the account information belonging to the same user can be intervened in a targeted manner, and the accuracy of data analysis and the effectiveness of user management are improved.
In a specific embodiment, the account data intelligent processing method provided by the embodiment of the application can be applied to a game scene. Referring to fig. 14, user data, including the account information, login device information, user portrait information and user behavior information of the user, is collected through the game recommendation platform, the game client, the game background and other third-party data. The account information and login device information of the user are encoded, a heterogeneous relationship graph is constructed from the account information and the corresponding login device information, the heterogeneous relationship graph is converted into a homogeneous relationship graph among the account information, and connection weights are introduced into the homogeneous graph. The homogeneous graph is pruned according to the connection weights to reduce the connection density, and community division is then performed on the account information in the homogeneous graph through the fast unfolding algorithm according to the connection weights, to obtain the set of account communities with the maximum modularity in the homogeneous graph, which is taken as the target account community set. The target account communities in the target account community set are clustered according to the user portrait information, such as the user's province, city, game preferences, and device model and brand preferences, to obtain account cluster subgroups. Whether the account information in an account cluster subgroup belongs to the same user is verified according to the information entropy. A unified account table and account identification information are generated for each verified account cluster subgroup, together with a set of sub-account identification information and a set of sub-device identification information within the account cluster subgroup.
The game accounts of the user in the account cluster subgroup are analyzed according to the user behavior information: the user's main and alternate accounts are determined according to the payment amount and the activity, the user's frequently used devices are determined according to the device usage frequency, and abnormal account information or abnormal login device information is determined through abnormal node analysis. The account information or the analysis results in the account cluster subgroup can also be updated according to the verification of the account cluster subgroup and its analysis results in the online game application.
The embodiment of the application provides an account data intelligent processing method, which comprises the following steps: the method comprises the steps of obtaining account information and login equipment information of a user, constructing a first relation network diagram between the account information and the login equipment information, and converting the first relation network diagram into a second relation network diagram between the account information. And adding connection weight to the connection relation between the account information in the second relation network graph, and carrying out community division on the account information of the second relation network graph based on the connection weight to obtain an account community set. And clustering the account number community set to obtain at least one account number cluster subgroup. And verifying the rationality of the account number cluster subgroups in an information entropy mode, and generating corresponding user identification information for the verified account number cluster subgroups. According to the method, the accuracy of identifying the account information of the same user is improved and the error of the identification result is reduced in a mode of combining graph calculation and information entropy. The method uniformly manages the account information marked as the same user, can intervene the account information belonging to the same user in a targeted manner, and improves the accuracy of data analysis and the effectiveness of user management.
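The first step of the method summarized above, converting the account-device graph into an account-account graph, can be sketched as a projection that links two accounts whenever they share a login device. This is a simplification made for illustration: it does not model the account-distance condition on indirectly connected accounts. The function name `project_to_accounts` and the sample login records are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def project_to_accounts(logins):
    """Project (account, device) pairs onto an account-account graph.

    Returns {(account_u, account_v): number of shared login devices}.
    """
    by_device = defaultdict(set)
    for account, device in logins:
        by_device[device].add(account)

    shared = defaultdict(int)
    for accounts in by_device.values():
        # Every pair of accounts seen on the same device gets one more link.
        for u, v in combinations(sorted(accounts), 2):
            shared[(u, v)] += 1
    return dict(shared)


# Hypothetical login records: a and b share two devices, c shares one.
logins = [("a", "d1"), ("b", "d1"), ("a", "d2"), ("b", "d2"), ("c", "d2")]
second_graph = project_to_accounts(logins)
print(second_graph[("a", "b")])  # 2
```

The shared-device count produced here is one of the inputs to the login-based connection weight described earlier.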
An embodiment of the present application further provides an account data intelligent processing apparatus, please refer to fig. 15, where the apparatus includes: a first relationship network generation module 1510, a second relationship network acquisition module 1520, a connection weight determination module 1530, an account community acquisition module 1540, a user portrait acquisition module 1550, an account cluster subgroup acquisition module 1560, an attribute tag acquisition module 1570, an account cluster subgroup verification module 1580, and a user identifier generation module 1590;
the first relationship network generation module 1510 is configured to obtain a first relationship network diagram between account information of a user and login device information;
the second relationship network obtaining module 1520 is configured to convert the first relationship network graph into a second relationship network graph among account information;
the connection weight determining module 1530 is configured to determine a connection weight between every two pieces of account information in the second relationship network graph;
the account community obtaining module 1540 is configured to perform community division on account information in the second relationship network graph based on the connection weight to obtain a target account community set;
the user portrait acquisition module 1550 is configured to acquire user portrait information corresponding to the account information in the second relationship network diagram;
the account cluster subgroup acquisition module 1560 is configured to perform cluster analysis on each account community in the target account community set according to the user portrait information to obtain at least one account cluster subgroup;
the attribute tag obtaining module 1570 is configured to obtain an attribute tag corresponding to each account information in the at least one account cluster subgroup;
the account cluster subgroup verification module 1580 is configured to perform rationality verification on at least one account cluster subgroup based on an attribute tag corresponding to each account information in the at least one account cluster subgroup;
the user identification generation module 1590 is configured to, when the rationality verification of any account cluster subgroup passes, generate user identification information of the account cluster subgroup whose rationality verification passes.
Further, the apparatus further comprises a user behavior acquisition module and a user management module;
the user behavior acquisition module is used for acquiring user behavior data corresponding to the account information passing the rationality verification;
and the user management module is used for managing the account cluster subgroups corresponding to the user identification information based on the user behavior data.
Further, the second relationship network obtaining module 1520 includes an account type determining unit, an account distance obtaining unit, an indirectly connected account determining unit, and a second relationship network constructing unit:
the account type determining unit is used for determining non-directly connected accounts and directly connected accounts in the first relationship network graph, wherein non-directly connected accounts are two pieces of account information that are connected through the same login device information in the first relationship network graph, and directly connected accounts are two pieces of account information that are directly connected in the first relationship network graph;
the account distance acquisition unit is used for acquiring the account distance between the non-directly connected accounts;
the indirectly connected account determining unit is used for taking the non-directly connected accounts whose account distance is equal to a preset account distance as indirectly connected accounts;
the second relationship network construction unit is used for constructing a second relationship network graph among the account information according to the directly connected accounts and the indirectly connected accounts.
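As a non-limiting illustration, the conversion from the first relationship network graph to the second relationship network graph can be sketched as a projection of an account-device graph onto accounts. The sketch assumes that the preset account distance corresponds to two hops (account, shared device, account); the function name `project_to_account_graph` and the sample identifiers are hypothetical and do not come from the embodiments.

```python
from collections import defaultdict
from itertools import combinations


def project_to_account_graph(login_edges, direct_edges=()):
    """Build account-to-account edges from (account, device) login edges.

    Assumption: two accounts that logged in on the same device are at the
    preset distance and therefore count as indirectly connected accounts.
    """
    accounts_by_device = defaultdict(set)
    for account, device in login_edges:
        accounts_by_device[device].add(account)

    # Start from the accounts that are already directly connected.
    edges = {frozenset(pair) for pair in direct_edges}

    # Add one edge for every pair of accounts sharing a login device.
    for accounts in accounts_by_device.values():
        for a, b in combinations(sorted(accounts), 2):
            edges.add(frozenset((a, b)))
    return edges


login_edges = [("acc1", "devA"), ("acc2", "devA"), ("acc3", "devB")]
graph = project_to_account_graph(login_edges, direct_edges=[("acc2", "acc3")])
```

In this example, acc1 and acc2 become connected because they share devA, while acc2 and acc3 remain connected through their direct connection.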
Further, the connection weight determination module 1530 includes a similarity information acquisition unit, a similarity determination unit and a connection weight determination unit;
the similarity information acquisition unit is used for acquiring login information and/or user portrait information corresponding to the account information in the second relationship network diagram;
the similarity determining unit is used for determining the similarity between every two pieces of account information in the second relationship network diagram based on the login information and/or the user portrait information;
the connection weight determining unit is used for determining the connection weight between every two pieces of account information in the second relationship network graph according to the similarity between every two pieces of account information.
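The embodiments above do not fix a particular similarity measure. A minimal sketch, assuming that the login information of each account is available as a set of tokens and that Jaccard similarity is used directly as the connection weight, could look as follows; `jaccard`, `connection_weights` and the sample data are hypothetical names chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def connection_weights(edges, features):
    """Map each (account, account) edge to the similarity of its endpoints.

    Assumption: the similarity itself is used as the connection weight.
    """
    return {(a, b): jaccard(features[a], features[b]) for a, b in edges}


features = {
    "acc1": {"ip1", "ip2", "wifiA"},
    "acc2": {"ip2", "wifiA"},
    "acc3": {"ip9"},
}
weights = connection_weights([("acc1", "acc2"), ("acc2", "acc3")], features)
```

Other measures, such as cosine similarity over user portrait vectors, would fit the same interface.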
In some specific embodiments, the apparatus further comprises a pruning module, which comprises a connection information determining unit, an account information sorting unit, an account information screening unit and an updating unit;
the pruning module traverses each piece of account information in the second relationship network graph and performs the following steps for each piece of account information traversed:
the connection information determining unit is used for determining the account information connected with the currently traversed account information in the second relationship network graph;
the account information sorting unit is used for sorting the connected account information in descending order of the connection weight between the connected account information and the currently traversed account information to obtain a connected account information sequence;
the account information screening unit is used for determining the first preset number of pieces of account information in the connected account information sequence;
the updating unit is used for updating the second relationship network graph based on the preset number of pieces of account information.
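The pruning steps above can be sketched as keeping, for each account, only its strongest connections. The sketch assumes that an edge survives if it is among the top `k` connections of at least one of its two endpoints; the embodiments do not state how the two endpoints' selections are reconciled, so this rule and the name `prune_top_k` are assumptions.

```python
def prune_top_k(weights, k):
    """Keep edges ranked in the top k (by weight) for either endpoint."""
    neighbors = {}
    for (a, b), w in weights.items():
        neighbors.setdefault(a, []).append((w, b))
        neighbors.setdefault(b, []).append((w, a))

    kept = set()
    for account, connected in neighbors.items():
        # Sort connected accounts from largest to smallest connection weight.
        connected.sort(key=lambda item: item[0], reverse=True)
        for _, other in connected[:k]:
            kept.add(tuple(sorted((account, other))))

    return {edge: w for edge, w in weights.items() if tuple(sorted(edge)) in kept}


weights = {("a", "b"): 0.9, ("a", "c"): 0.5, ("a", "d"): 0.1, ("b", "c"): 0.2}
pruned = prune_top_k(weights, k=1)
```

With `k=1`, the edge between b and c is dropped because neither b nor c ranks it first, while the weak edge between a and d is kept because it is the only connection of d.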
Further, the account community obtaining module 1540 includes an initial account community creating unit, a target account community selecting unit, a first modularity calculating unit, a community dividing unit, a second modularity calculating unit, a difference value calculating unit and an account community updating unit;
the initial account community creating unit is used for creating an account community set, and each account community in the account community set comprises account information;
the target account community selection unit is used for selecting one account community from the account community set as a target account community;
the first modularity degree calculating unit is used for calculating the first modularity degree of the target account community based on the connection weight between the account information in the target account community and the connected account information and the connection weight between the account information in the target account community;
the community dividing unit is used for adding target account information in the target account community into an account community corresponding to the account information connected with the target account information to obtain an adjacent account community;
the second modularity degree calculating unit is used for calculating the second modularity degree of the adjacent account community based on the connection weight between the account information in the adjacent account community and the connected account information and the connection weight between the account information in the adjacent account community;
the difference calculation unit is used for acquiring a difference between the second modularity and the first modularity;
the account community updating unit is used for updating the account community set according to the difference value;
selecting one account community from the updated account community set as a target account community, and repeating the step of updating the account community set until the difference value between the second modularity and the corresponding first modularity of the adjacent account community corresponding to any target account community in the currently updated account community set meets a preset condition;
and taking the current account community set when the difference values all meet the preset conditions as the target account community set.
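The iterative procedure above resembles a greedy modularity optimization. The following is a simplified sketch under several assumptions: each account starts in its own community, modularity is computed with the standard weighted formula, an account is moved only when the modularity difference exceeds a hypothetical threshold `min_gain`, and modularity is recomputed from scratch for clarity rather than efficiency. The function names are illustrative only.

```python
def modularity(weights, community):
    """Weighted modularity: sum over communities of L_c/m - (d_c/(2m))^2."""
    m = sum(weights.values())
    if m == 0:
        return 0.0
    internal, degree = {}, {}
    for (a, b), w in weights.items():
        ca, cb = community[a], community[b]
        degree[ca] = degree.get(ca, 0.0) + w
        degree[cb] = degree.get(cb, 0.0) + w
        if ca == cb:
            internal[ca] = internal.get(ca, 0.0) + w
    return sum(internal.get(c, 0.0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


def divide_communities(weights, min_gain=1e-12):
    """Greedily move accounts to neighbouring communities while modularity grows."""
    nodes = sorted({n for edge in weights for n in edge})
    neighbors = {n: set() for n in nodes}
    for a, b in weights:
        neighbors[a].add(b)
        neighbors[b].add(a)

    community = {n: n for n in nodes}  # one community per account initially
    improved = True
    while improved:
        improved = False
        for node in nodes:
            first = modularity(weights, community)
            best_gain, best = min_gain, community[node]
            candidates = {community[n] for n in neighbors[node]} - {community[node]}
            for cand in sorted(candidates):
                trial = dict(community)
                trial[node] = cand
                gain = modularity(weights, trial) - first  # second minus first
                if gain > best_gain:
                    best_gain, best = gain, cand
            if best != community[node]:
                community[node] = best
                improved = True
    return community


edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
         ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1, ("c", "d"): 0.1}
communities = divide_communities(edges)
```

On this toy graph of two triangles joined by one weak edge, the accounts of each triangle end up in the same community.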
Further, the account cluster subgroup verification module 1580 comprises an information entropy calculation unit, a consistency degree determination unit, a credibility degree determination unit and a rationality verification unit;
the information entropy calculation unit is used for calculating the information entropy of the at least one account cluster subgroup according to the attribute label;
the consistency degree determining unit is used for determining the consistency degree of the account information in the at least one account cluster subgroup according to the information entropy;
the credibility degree determining unit is used for determining the credibility degree of the at least one account cluster subgroup according to the account information number corresponding to different attribute labels;
and the rationality verifying unit is used for verifying the rationality of the at least one account cluster subgroup according to the consistency degree and the credibility degree.
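A minimal sketch of the rationality verification is given below. Following the description, the information entropy is computed over the attribute labels of a subgroup, and the credibility degree is taken as the ratio of the largest label count to the subgroup size. How entropy maps to the consistency degree, and the thresholds `max_entropy` and `min_credibility`, are assumptions made for illustration; the penalty coefficient is omitted.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the attribute labels in one subgroup."""
    if not labels:
        raise ValueError("subgroup must contain at least one account")
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def credibility(labels):
    """Share of accounts that carry the most common attribute label."""
    if not labels:
        raise ValueError("subgroup must contain at least one account")
    return max(Counter(labels).values()) / len(labels)


def is_reasonable(labels, max_entropy=0.9, min_credibility=0.7):
    """Assumed rule: low entropy (high consistency) and high credibility."""
    return (label_entropy(labels) <= max_entropy
            and credibility(labels) >= min_credibility)


subgroup = ["gamer", "gamer", "gamer", "gamer", "student"]
mixed = ["gamer", "student", "retiree", "merchant"]
results = (is_reasonable(subgroup), is_reasonable(mixed))
```

A subgroup dominated by one label passes under these assumed thresholds, whereas a subgroup whose labels are evenly spread does not.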
The device provided in the above embodiments can execute the method provided in any embodiment of the present application, and has the corresponding functional modules and beneficial effects for executing the method. For technical details not described in the above embodiments, refer to the account data intelligent processing method provided in any embodiment of the present application.
The embodiment also provides a computer-readable storage medium in which computer-executable instructions are stored, the computer-executable instructions being loaded by a processor to execute the account data intelligent processing method of the embodiment.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the method provided in the various alternative implementation modes of the account data processing or the user management.
The present embodiment also provides an apparatus, which includes a processor and a memory, where the memory stores a computer program, and the computer program is suitable for being loaded by the processor and executing the account data intelligent processing method of the present embodiment.
The device may be a computer terminal, a mobile terminal or a server, and the device may also participate in constituting the apparatus or system provided by the embodiments of the present application. As shown in fig. 16, the server 16 (or computer terminal 16 or mobile terminal 16) may include one or more processors 1602 (shown here as 1602a, 1602b, ..., 1602n; the processors 1602 may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 1604 for storing data, and a transmission device 1606 for communication functions. In addition, the server 16 may further include a display, an input/output interface (I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 16 is merely illustrative and is not intended to limit the structure of the electronic device. For example, server 16 may also include more or fewer components than shown in fig. 16, or have a different configuration than shown in fig. 16.
It should be noted that the one or more processors 1602 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the server 16 (or computer terminal). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
The memory 1604 may be used to store software programs and modules of application software, such as the program instructions/data storage devices corresponding to the methods described in the embodiments of the present application. The processor 1602 executes various functional applications and data processing by running the software programs and modules stored in the memory 1604, thereby implementing the account data intelligent processing method described above. The memory 1604 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1604 may further include memory located remotely from the processor 1602, which may be connected to the server 16 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 1606 is used for receiving or transmitting data via a network. Specific examples of such networks may include wireless networks provided by the communications provider of server 16. In one example, the transmission device 1606 includes a Network Interface Controller (NIC), which can be connected to other network devices through a base station so as to communicate with the internet. In one example, the transmission device 1606 can be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the server 16 (or computer terminal).
The present specification provides the method steps as described in the embodiments or flowcharts, but more or fewer steps may be included based on routine or non-inventive labor. The order of steps recited in the embodiments is only one of many possible execution orders and does not represent the only order of execution. When an actual system or product executes the method, the steps may be performed sequentially or in parallel (e.g., in the context of parallel processors or multi-threaded processing) according to the embodiments or the methods shown in the figures.
The configurations shown in the present embodiment are only the partial configurations related to the present application and do not limit the devices to which the present application is applied; a specific device may include more or fewer components than those shown, combine some components, or arrange the components differently. It should be understood that the methods, apparatuses, and the like disclosed in the embodiments may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative: the division of the modules is merely a division by logical function, and other divisions are possible in an actual implementation; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or unit modules.
Based on such understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and various other media capable of storing program code.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. An account data intelligent processing method is characterized by comprising the following steps:
acquiring a first relation network diagram between account information of a user and login equipment information;
converting the first relation network graph into a second relation network graph among account information;
determining the connection weight between every two pieces of account information in the second relationship network graph;
based on the connection weight, carrying out community division on account information in the second relationship network graph to obtain a target account community set;
acquiring user portrait information corresponding to the account information in the second relationship network diagram;
according to the user portrait information, performing cluster analysis on each target account community in the target account community set to obtain at least one account cluster subgroup;
acquiring attribute labels corresponding to the information of the account numbers in the at least one account number cluster subgroup;
performing rationality verification on the at least one account cluster subgroup based on attribute labels corresponding to the account information in the at least one account cluster subgroup;
and when the rationality verification of any account number cluster subgroup passes, generating user identification information of the account number cluster subgroup with the passing rationality verification.
2. The intelligent account data processing method according to claim 1, further comprising:
acquiring user behavior data corresponding to the account information passing the rationality verification;
and managing an account cluster subgroup corresponding to the user identification information based on the user behavior data.
3. The method for intelligently processing account data according to claim 1, wherein the converting the first relationship network diagram into a second relationship network diagram between account information includes:
determining non-directly connected accounts and directly connected accounts in the first relationship network graph, wherein non-directly connected accounts are two pieces of account information that are connected through the same login device information in the first relationship network graph, and directly connected accounts are two pieces of account information that are directly connected in the first relationship network graph;
acquiring the account distance between the non-directly connected accounts;
taking the non-directly connected accounts whose account distance is equal to a preset account distance as indirectly connected accounts;
and constructing a second relationship network graph between the account information according to the directly connected account and the indirectly connected account.
4. The account data intelligent processing method according to claim 1, wherein the determining the connection weight between every two pieces of account information in the second relationship network graph includes:
acquiring login information and/or user portrait information corresponding to the account information in the second relationship network diagram;
determining the similarity between every two pieces of account information in the second relationship network diagram based on the login information and/or the user portrait information;
and determining the connection weight between every two pieces of account information in the second relationship network graph according to the similarity between every two pieces of account information.
5. The account data intelligent processing method according to claim 1, after determining the connection weight between every two pieces of account information in the second relationship network graph, further comprising:
traversing each account number information in the second relationship network graph, and executing the following steps when traversing each account number information:
determining account information connected with the currently traversed account information in the second relationship network graph; sequencing the connected account information from large to small according to the numerical value of the connection weight between the connected account information and the currently traversed account information to obtain a connected account information sequence;
determining the first preset number of pieces of account information in the connected account information sequence;
and updating the second relation network diagram based on the information of the account numbers with the preset number.
6. The intelligent account data processing method according to claim 1, wherein the performing community division on the target account information in the second relationship network graph based on the connection weight to obtain a target account community set comprises:
creating an account community set, wherein each account community in the account community set comprises account information;
selecting one account community from the account community set as a target account community;
calculating a first modularity degree of the target account community based on a connection weight between account information in the target account community and connected account information and a connection weight between account information in the target account community;
adding target account information in the target account community into an account community corresponding to the account information connected with the target account information to obtain an adjacent account community;
calculating a second modularity of the adjacent account community based on a connection weight between the account information in the adjacent account community and the connected account information and a connection weight between the account information in the adjacent account community;
obtaining a difference between the second modularity and the first modularity;
updating the account community set according to the difference value;
selecting one account community from the updated account community set as a target account community, and repeating the step of updating the account community set until the difference value between the second modularity and the corresponding first modularity of the adjacent account community corresponding to any target account community in the currently updated account community set meets a preset condition;
and taking the current account community set when the difference values all meet the preset conditions as the target account community set.
7. The intelligent account data processing method according to claim 1, wherein the performing the rationality verification on the at least one account cluster subgroup based on the attribute label corresponding to each account information in the at least one account cluster subgroup comprises:
calculating the information entropy of the at least one account cluster subgroup according to the attribute label;
determining the consistency degree of the account information in the at least one account cluster subgroup according to the information entropy;
determining the credibility degree of the at least one account cluster subgroup according to the account information number corresponding to different attribute labels;
and performing rationality verification on the at least one account number cluster subgroup according to the consistency degree and the credibility degree.
8. The intelligent account data processing method according to claim 7, wherein the determining, according to the information entropy, the degree of consistency of the account information in the at least one account cluster subgroup includes:
determining a first penalty coefficient according to the type information of the attribute label;
and determining the consistency degree of the account information in the at least one account cluster subgroup according to the information entropy and the first penalty coefficient.
9. The intelligent account data processing method according to claim 7, wherein the determining the credibility of the at least one cluster subgroup of accounts according to the number of account information corresponding to different attribute tags comprises:
acquiring the account information number of the at least one account cluster subgroup;
acquiring the number of account information matched with different attribute labels in the at least one account cluster subgroup;
determining the maximum value of the account information numbers matched with the different attribute labels;
and determining the credibility degree of the at least one account number cluster subgroup according to the ratio of the maximum value to the account number information number of the at least one account number cluster subgroup.
10. An account data intelligent processing device, characterized in that the device comprises: a first relation network generation module, a second relation network acquisition module, a connection weight determination module, an account community acquisition module, a user portrait acquisition module, an account cluster subgroup acquisition module, an attribute tag acquisition module, an account cluster subgroup verification module and a user identifier generation module;
the first relation network generating module is used for acquiring a first relation network diagram between account information of a user and login equipment information;
the second relation network acquisition module is used for converting the first relation network diagram into a second relation network diagram among account information;
the connection weight determining module is used for determining the connection weight between every two pieces of account information in the second relationship network graph;
the account community acquisition module is used for carrying out community division on account information in the second relationship network graph based on the connection weight to obtain a target account community set;
the user portrait acquisition module is used for acquiring user portrait information corresponding to the account information in the second relationship network diagram;
the account clustering subgroup acquisition module is used for carrying out clustering analysis on each account community in the target account community set according to the user portrait information to obtain at least one account clustering subgroup;
the attribute tag acquisition module is used for acquiring an attribute tag corresponding to each account information in the at least one account cluster subgroup;
the account clustering subgroup verification module is used for verifying the rationality of at least one account clustering subgroup based on attribute labels corresponding to each account information in the at least one account clustering subgroup;
and the user identification generation module is used for generating user identification information of the account cluster subgroup with the passing rationality verification when the rationality verification of any account cluster subgroup passes.
CN202010896462.1A 2020-08-31 Account data intelligent processing method and device Active CN112084422B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010896462.1A CN112084422B (en) 2020-08-31 Account data intelligent processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010896462.1A CN112084422B (en) 2020-08-31 Account data intelligent processing method and device

Publications (2)

Publication Number Publication Date
CN112084422A (en) 2020-12-15
CN112084422B (en) 2024-05-10

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103136303A (en) * 2011-11-24 2013-06-05 北京千橡网景科技发展有限公司 Method and equipment of dividing user group in social network service website
CN103365893A (en) * 2012-03-31 2013-10-23 百度在线网络技术(北京)有限公司 Method and device for searching individual information of user
US20140143407A1 (en) * 2012-11-21 2014-05-22 Telefonaktiebolaget L M Ericsson (Publ) Multi-objective server placement determination
CN108734479A (en) * 2018-04-12 2018-11-02 阿里巴巴集团控股有限公司 Data processing method, device, equipment and the server of Insurance Fraud identification
CN109063966A (en) * 2018-07-03 2018-12-21 阿里巴巴集团控股有限公司 The recognition methods of adventure account and device
CN111046300A (en) * 2019-12-17 2020-04-21 智者四海(北京)技术有限公司 Method and device for determining crowd attributes of users

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763193A (en) * 2021-01-25 2021-12-07 北京沃东天骏信息技术有限公司 Group detection method, group detection device, electronic equipment and computer storage medium
CN112861967A (en) * 2021-02-07 2021-05-28 中国电子科技集团公司电子科学研究院 Social network abnormal user detection method and device based on heterogeneous graph neural network
CN112948673A (en) * 2021-02-22 2021-06-11 网易(杭州)网络有限公司 Game content pushing method and device, electronic equipment and storage medium
CN113709092A (en) * 2021-03-03 2021-11-26 腾讯科技(深圳)有限公司 Data detection method and device, computer equipment and storage medium
CN113191912A (en) * 2021-05-20 2021-07-30 公安部第三研究所 Method, system, device, processor and storage medium for realizing social group relationship closeness calculation processing based on common group relationship
CN113271315A (en) * 2021-06-08 2021-08-17 工银科技有限公司 Virtual private network abnormal use detection method and device and electronic equipment
CN113326064A (en) * 2021-06-10 2021-08-31 深圳前海微众银行股份有限公司 Method for dividing business logic module, electronic equipment and storage medium
CN113987087A (en) * 2021-10-27 2022-01-28 北京达佳互联信息技术有限公司 Account processing method and device, electronic equipment and storage medium
WO2023184831A1 (en) * 2022-03-31 2023-10-05 京东科技信息技术有限公司 Method and apparatus for determining target object, and method and apparatus for constructing identifier association graph
CN114742479A (en) * 2022-06-10 2022-07-12 深圳竹云科技股份有限公司 Account identification method, device, server and storage medium
CN114742479B (en) * 2022-06-10 2022-09-06 深圳竹云科技股份有限公司 Account identification method, account identification device, server and storage medium
CN117235654A (en) * 2023-11-15 2023-12-15 中译文娱科技(青岛)有限公司 Artificial intelligence data intelligent processing method and system
CN117235654B (en) * 2023-11-15 2024-03-22 中译文娱科技(青岛)有限公司 Artificial intelligence data intelligent processing method and system

Similar Documents

Publication Publication Date Title
CN110543586B (en) Multi-user identity fusion method, device, equipment and storage medium
CN112235384B (en) Data transmission method, device, equipment and storage medium in distributed system
CN108022171B (en) Data processing method and equipment
Wang et al. App-net: A hybrid neural network for encrypted mobile traffic classification
TWI705341B (en) Feature relationship recommendation method and device, computing equipment and storage medium
Chopade et al. A framework for community detection in large networks using game-theoretic modeling
CN112221159B (en) Virtual item recommendation method and device and computer readable storage medium
CN110851706B (en) Training method and device for user click model, electronic equipment and storage medium
CN103530428A (en) Same-occupation type recommendation method based on developer practical skill similarity
CN112566093B (en) Terminal relation identification method and device, computer equipment and storage medium
CN110046297B (en) Operation and maintenance violation identification method and device and storage medium
CN113259972A (en) Data warehouse construction method, system, equipment and medium based on wireless communication network
CN111932386A (en) User account determining method and device, information pushing method and device, and electronic equipment
CN110224859B (en) Method and system for identifying a group
CN110119477A (en) Information pushing method, device and storage medium
CN108985954A (en) Method and related device for establishing association relationships between identifiers
CN114692007A (en) Method, device, equipment and storage medium for determining representation information
CN112307308A (en) Data processing method, device, equipment and medium
CN110457387B (en) Method and related device applied to user tag determination in network
CN112084422B (en) Account data intelligent processing method and device
CN112084422A (en) Intelligent processing method and device for account data
Gamage et al. Common randomized shortest paths (C-RSP): A simple yet effective framework for multi-view graph embedding
Jiang et al. Efficiency improvements in social network communication via MapReduce
CN114390550A (en) Network type identification method, related device, equipment and storage medium
CN117555905B (en) Service processing method, device, equipment, storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant