WO2022247955A1

WO2022247955A1 - Abnormal account identification method, apparatus and device, and storage medium

Info

Publication number: WO2022247955A1
Application number: PCT/CN2022/096060
Authority: WO
Inventors: Cao Ke (曹轲); Zhong Qinghua (钟清华); Huang Qun (黄群)
Original assignee: Bigo Technology Pte. Ltd. (百果园技术(新加坡)有限公司); Cao Ke (曹轲)
Priority date: 2021-05-28
Filing date: 2022-05-30
Publication date: 2022-12-01
Also published as: CN113378899A; CN113378899B

Abstract

Disclosed in embodiments of the present application are an abnormal account identification method, apparatus and device, and a storage medium. The method comprises: obtaining a plurality of user accounts and device attribute information associated with the user accounts, and determining a user association relationship between the plurality of user accounts according to the device attribute information; obtaining service data corresponding to each user account and, taking each user account as a user node, the service data corresponding to each user account as a user node attribute feature, and the user association relationship as an edge, calculating a node vector of each user node by means of a graph convolutional network algorithm; and performing clustering on the basis of the node vectors of the user nodes, and determining an abnormal account according to the clustering result. According to the present solution, abnormal users can be identified efficiently in batches, with high identification accuracy and efficiency.

Description

Abnormal account identification method, apparatus, device and storage medium

Technical field

The embodiments of the present application relate to the field of computers, and in particular to an abnormal account identification method, apparatus, device and storage medium.

Background

Prior-art methods for identifying abnormal user accounts usually adopt either a machine-learning classification algorithm or a graph algorithm with community mining. A classification algorithm learns the characteristics of known abnormal accounts in order to predict further abnormal accounts, but it tends to ignore the community characteristics of accounts. For example, if account A and account B are active on the same device, they can be considered to be operated by the same natural person; if account A has cheated but account B has not yet cheated, account B is difficult to predict. In community mining with a graph algorithm, account A and account B are connected into a community because they share the same attributes, and the entire community is then judged to be abnormal. However, this approach must build the graph from user and device-environment data accumulated over a historical period and then classify and predict the community types of the users in the graph. Because the volume of historical data is large and training takes a long time, most community division is applied offline, and newly added nodes that do not yet exist in the graph cannot be accurately assigned to a community.

Summary

Embodiments of the present application provide an abnormal account identification method, apparatus, device and storage medium. This solution can efficiently identify abnormal users in batches, with high identification accuracy and efficiency.

In a first aspect, an embodiment of the present application provides an abnormal account identification method, including:

Obtaining a plurality of user accounts and device attribute information associated with the user accounts, and determining a user association relationship between the user accounts according to the device attribute information;

Obtaining business data corresponding to each user account and, taking each user account as a user node, the business data corresponding to each user account as the attribute feature of that user node, and the user association relationship as an edge, calculating a node vector of each user node by means of a graph convolutional network algorithm; and

Performing clustering based on the node vector of each user node, and determining an abnormal account according to the clustering result.

In a second aspect, an embodiment of the present application further provides an abnormal account identification apparatus, including:

A data acquisition module, configured to acquire a plurality of user accounts, device attribute information associated with the user accounts, and business data corresponding to each user account;

A user association relationship determining module, configured to determine a user association relationship between the user accounts according to the device attribute information;

A vector calculation module, configured to take each user account as a user node, the business data corresponding to each user account as the attribute feature of that user node, and the user association relationship as an edge, and to calculate the node vector of each user node by means of a graph convolutional network algorithm;

A clustering calculation module, configured to perform clustering based on the node vector of each user node; and

A result analysis module, configured to determine an abnormal account according to the clustering result.

In a third aspect, an embodiment of the present application further provides an abnormal account identification device, including:

one or more processors; and

a storage apparatus, configured to store one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the abnormal account identification method described in the embodiments of the present application.

In a fourth aspect, an embodiment of the present application further provides a storage medium storing computer-executable instructions which, when executed by a computer processor, are used to perform the abnormal account identification method described in the embodiments of the present application.

In the embodiments of the present application, a plurality of user accounts and the device attribute information associated with them are obtained, and the user association relationship between the user accounts is determined according to the device attribute information. The business data corresponding to each user account is then obtained; with each user account as a user node, its business data as the node's attribute feature, and the user association relationship as an edge, the node vector of each user node is calculated by a graph convolutional network algorithm. Clustering is then performed on the node vectors, and abnormal accounts are determined according to the clustering result. In this way, abnormal users can be identified efficiently and in batches, with high identification accuracy and efficiency.

Brief description of the drawings

FIG. 1 is a flowchart of an abnormal account identification method provided by an embodiment of the present application;

FIG. 1a is a schematic diagram of the association between user accounts and device attribute information provided by an embodiment of the present application;

FIG. 2 is a flowchart of another abnormal account identification method provided by an embodiment of the present application;

FIG. 2a is a schematic diagram of a framework of a graph convolutional network algorithm provided by an embodiment of the present application;

FIG. 3 is a flowchart of another abnormal account identification method provided by an embodiment of the present application;

FIG. 4 is a flowchart of another abnormal account identification method provided by an embodiment of the present application;

FIG. 5 is a structural block diagram of an abnormal account identification apparatus provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a device provided by an embodiment of the present application.

Detailed description

FIG. 1 is a flowchart of an abnormal account identification method provided by an embodiment of the present application. This embodiment is applicable to detecting and identifying abnormal accounts in application software scenarios such as user login, registration and social networking. Abnormal accounts are accounts that differ from normal user accounts, such as malicious accounts, sockpuppet ("vest") accounts and protocol accounts, and they exhibit behaviors such as batch operations, order brushing and malicious operations. The abnormal account identification method may be executed by a computing device such as a server or a system application host, and specifically includes the following steps:

Step S101, acquiring a plurality of user accounts and device attribute information associated with the user accounts, and determining a user association relationship between each user account in the plurality of user accounts according to the device attribute information;

Step S102. Obtain the business data corresponding to each user account; taking each user account as a user node, the business data corresponding to each user account as the attribute feature of the user node, and the user association relationship as an edge, calculate the node vector of each user node by means of a graph convolutional network algorithm; and

Step S103, performing clustering based on the node vector of each user node, and determining an abnormal account according to the clustering result.

Here, a user account may be the account a user uses when using a piece of software or logging in to a forum or video website, for example the unique user ID (UID) assigned at registration. A user can register one or more user accounts; each account may log in from the same or different devices, and the network address used for each login may be the same or different. After logging in, the user can perform related operations such as sending bullet-screen (danmaku) comments, posting comments and following streamers.

In one embodiment, a plurality of user accounts and the device attribute information associated with them are first obtained. Both may be information recorded by the system backend during user registration and login. Device attribute information is data associated with a user account, such as the login device, the IP address used and the bound mobile phone number. Optionally, acquisition is bounded by a time window, for example three months; that is, the user accounts active within the last three months and their associated device attribute information are acquired.

Here, the user accounts and device attribute information may be stored as a database table whose records take the following form:

User account	Login device	IP address	Login time
uid1	Device 1	ip1	aaa
uid2	Device 1	ip1	bbb
uid3	Device 2	ip2	ccc
uid4	Device 1	ip1	ddd
uid1	Device 1	ip3	eee
uid3	Device 2	ip3	fff
...	...	...	...

In one embodiment, the user association relationship between the user accounts is determined according to the device attribute information. The user association relationship indicates whether two users are associated, namely whether the two user accounts have used the same login device, IP address, mobile phone number or other device attribute information. If the two accounts share the same device attribute information, they are determined to be associated; otherwise, the user association relationship between them is determined to be non-associated.

Specifically, determining the user association relationship according to the device attribute information includes: determining the device attribute association relationship between each user account and the device attribute information; and determining the user association relationship between the user accounts according to the device attribute association relationships. The device attribute association relationship indicates whether a given user account is associated with a given piece of device attribute information: if a user account has used a certain login device or IP address, it has a device attribute association relationship with that device or IP address; otherwise, it does not. Taking the records in the table above as an example, uid1 has logged in from device 1 using ip1 and ip3, so uid1 is associated with ip1, ip3 and device 1; uid2 has logged in from device 1 using ip1, so uid2 is associated with ip1 and device 1; uid3 has logged in from device 2 using ip2 and ip3, so uid3 is associated with ip2, ip3 and device 2; and uid4 has logged in from device 1 using ip1, so uid4 is associated with ip1 and device 1. FIG. 1a, a schematic diagram of the association between user accounts and device attribute information provided by an embodiment of the present application, represents these relationships as a graph. The user association relationship between the user accounts is then determined from the device attribute association relationships: two user accounts are judged to be associated when both have device attribute association relationships with one or more identical pieces of device attribute information.

Taking FIG. 1a as an example, uid1, uid2 and uid4 are all associated with ip1, that is, they share the same device attribute information (ip1), so uid1, uid2 and uid4 are determined to be associated with one another; uid1 and uid3 are both associated with ip3, so uid1 is determined to be associated with uid3. Optionally, once determined, the user association relationships can be stored separately as a list in a database or cache, or merged into the previously stored database table.
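The two-stage rule above (account-to-attribute associations first, then account-to-account associations) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the record tuples reproduce the example table, and the function name `build_user_edges` is hypothetical. Bound mobile phone numbers or other attributes could be added as further keys in the same way.

```python
from collections import defaultdict
from itertools import combinations

# Login records reproduced from the example table: (user account, login device, IP address).
LOGIN_RECORDS = [
    ("uid1", "device1", "ip1"),
    ("uid2", "device1", "ip1"),
    ("uid3", "device2", "ip2"),
    ("uid4", "device1", "ip1"),
    ("uid1", "device1", "ip3"),
    ("uid3", "device2", "ip3"),
]


def build_user_edges(records):
    """Return undirected user-user edges implied by shared device attributes (hypothetical helper)."""
    # Stage 1: device attribute association, i.e. attribute -> accounts that used it.
    # Keys are typed ("device", ...) / ("ip", ...) so a device name cannot collide with an IP string.
    accounts_by_attr = defaultdict(set)
    for uid, device, ip in records:
        accounts_by_attr[("device", device)].add(uid)
        accounts_by_attr[("ip", ip)].add(uid)

    # Stage 2: two accounts are associated if they share at least one attribute.
    edges = set()
    for uids in accounts_by_attr.values():
        for a, b in combinations(sorted(uids), 2):
            edges.add((a, b))
    return edges


if __name__ == "__main__":
    for edge in sorted(build_user_edges(LOGIN_RECORDS)):
        print(edge)
```

With the example records, the sketch yields the associations described above: uid1, uid2 and uid4 are pairwise associated through device 1 and ip1, and uid1 is associated with uid3 through ip3.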

Here, business data refers to business-attribute data related to a user account. Taking a live-streaming application as an example, the business data may include: the user's country code, the model of the device used for registration, the number of private chat messages sent within 3 days of registration, the number of other users followed within 3 days of registration, the duration of live streams watched within 3 days of registration, gifts sent within 3 days of registration, and so on. In one embodiment, statistics are computed over 52 business-data dimensions, forming a 52-dimensional attribute feature that can be represented as a vector.

In one embodiment, after the user accounts and business data are acquired and the user association relationships are determined, a graph convolutional network algorithm is used to calculate a node vector for each user account. Specifically, each user account is taken as a user node, its business data as the node's attribute feature, and the user association relationship as an edge, and the node vector of each user node is calculated by the graph convolutional network algorithm. Each uid can be converted into an index; the business data, i.e. the user node attribute features, can be converted into numerical variables with a label encoder (a string encoding function such as LabelEncoder) to form an attribute vector such as (2, 53, 234, 1, ..., 4); and each user association relationship is represented as an edge connecting the two associated user accounts.
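A minimal sketch of this encoding step, assuming business data arrives as a per-account dictionary. The column names in the usage example (`country`, `device_model`, `follows_3d`, `watch_minutes_3d`) and the helper names are hypothetical; a real deployment would use the 52 dimensions mentioned above, and could call scikit-learn's `LabelEncoder` instead of the hand-written `label_encode`.

```python
def label_encode(values):
    """Integer codes for strings, assigned in sorted order of the distinct values
    (the same convention scikit-learn's LabelEncoder follows)."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes


def build_node_inputs(uids, business_data, categorical_cols):
    """Map uids to node indices and business data to numeric attribute vectors.

    business_data: uid -> {column name: value}; every uid is assumed to have the same columns.
    categorical_cols: columns holding strings that must be label-encoded.
    """
    uid_to_index = {uid: i for i, uid in enumerate(uids)}  # uid -> node index
    columns = sorted(business_data[uids[0]])                # fixed column order
    codes = {col: label_encode([business_data[u][col] for u in uids])[0]
             for col in categorical_cols}
    vectors = []
    for i, uid in enumerate(uids):
        vectors.append([float(codes[col][i]) if col in codes else float(business_data[uid][col])
                        for col in columns])
    return uid_to_index, columns, vectors


if __name__ == "__main__":
    # Hypothetical business data for three accounts.
    data = {
        "uid1": {"country": "IN", "device_model": "X1", "follows_3d": 12, "watch_minutes_3d": 30},
        "uid2": {"country": "ID", "device_model": "X1", "follows_3d": 250, "watch_minutes_3d": 2},
        "uid3": {"country": "IN", "device_model": "Z9", "follows_3d": 3, "watch_minutes_3d": 95},
    }
    print(build_node_inputs(["uid1", "uid2", "uid3"], data, ["country", "device_model"]))
```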

The graph convolutional network algorithm may be based on the frequency (spectral) domain or on the spatial domain. Exemplary spectral-domain algorithms include ChebNet and GCN; an exemplary spatial-domain algorithm is the GraphSAGE model. Taking GraphSAGE as an example, the model is trained on the user nodes, the user node attribute feature vectors and the edges described above to calculate an embedding vector for each user node.

In one embodiment, the node vectors of the user nodes are clustered with a clustering algorithm to obtain a clustering result, for example a plurality of clusters. The clustering algorithm may be, for example, k-means clustering, hierarchical clustering, self-organizing map (SOM) clustering or fuzzy c-means (FCM) clustering.

After the clustering result is obtained, abnormal accounts are determined according to it. Specifically, abnormal accounts may be determined in any one or more of the following ways: determining the user accounts in the cluster that contains an already-known abnormal account as abnormal accounts; analyzing the business data of the user accounts in each cluster and determining abnormal accounts according to the analysis result; or manually identifying and labeling clusters and determining the user accounts in the labeled clusters as abnormal accounts.

In one embodiment, determining abnormal accounts according to the clustering result includes: calculating the average value of the business data of all user accounts in each cluster, and marking clusters according to the calculation result and preset logical judgment conditions; and determining the user accounts in the clusters marked as abnormal as abnormal accounts. For example, with the average number of users followed and the average viewing time as the marking criteria, these averages are computed for the user accounts in each cluster; if a cluster's averages differ significantly from those of the other clusters, the user accounts in that cluster are determined to be abnormal accounts.
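The text leaves "differ significantly from the other clusters" open. One possible reading, shown here purely as an assumption, is to compare each cluster's mean with the median of all cluster means and flag clusters that are at least `factor` times higher or lower. The rule, the default factor of 2 and the function names are hypothetical.

```python
from statistics import mean, median


def cluster_means(labels, values):
    """Average one business metric per cluster; labels[i] is the cluster of account i."""
    groups = {}
    for label, value in zip(labels, values):
        groups.setdefault(label, []).append(value)
    return {label: mean(vals) for label, vals in groups.items()}


def flag_deviating_clusters(labels, metrics, factor=2.0):
    """Hypothetical rule: flag a cluster if, for any metric, its mean is >= factor x
    or <= 1/factor x the median of all cluster means for that metric."""
    flagged = set()
    for values in metrics.values():
        means = cluster_means(labels, values)
        typical = median(means.values())
        for label, m in means.items():
            if typical > 0 and (m >= factor * typical or m <= typical / factor):
                flagged.add(label)
    return flagged


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 2, 2]
    metrics = {
        "follows_3d": [10, 14, 12, 8, 300, 320],      # hypothetical values
        "watch_minutes_3d": [40, 50, 35, 45, 1, 3],
    }
    flagged = flag_deviating_clusters(labels, metrics)
    print(flagged, [i for i, label in enumerate(labels) if label in flagged])
```

Here cluster 2 follows far more users and watches far less than the others, so only its accounts would be flagged. A fixed threshold per metric is an equally valid choice of preset logical judgment condition.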

Correspondingly, once an abnormal account is determined, corresponding risk-control measures are applied to it.

As the above solution shows, the user association relationships between the user accounts are determined from their device attribute information; with each user account as a user node, its business data as the node's attribute feature, and the user association relationship as an edge, the node vector of each user node is calculated by a graph convolutional network algorithm; and the node vectors are clustered and abnormal accounts are determined according to the clustering result. Abnormal users can thus be identified efficiently and in batches, with high identification accuracy and efficiency.

FIG. 2 is a flowchart of another abnormal account identification method provided by an embodiment of the present application, showing a specific way of calculating the node vector of each user node through the graph convolutional network algorithm. As shown in FIG. 2, the technical solution is as follows:

Step S201, acquiring a plurality of user accounts and device attribute information associated with the user accounts, and determining a user association relationship between each user account in the plurality of user accounts according to the device attribute information;

Step S202. Obtain the business data corresponding to each user account; taking each user account as a user node, the business data corresponding to each user account as the attribute feature of the user node, and the user association relationship as an edge, train an unsupervised inductive learning model to obtain the node vector of each user node; and

Step S203, performing clustering based on the node vector of each user node, and determining an abnormal account according to the clustering result.

In one embodiment, an unsupervised inductive learning model, for example the GraphSAGE model, is trained to obtain the node vector of each user node. As an algorithmic framework, GraphSAGE makes it easy to obtain representations of new nodes: rather than learning a fixed embedding for each node, it learns how a node's information is aggregated from the features of its neighbor nodes. In this solution, the attribute features and user association relationships of every user node are known, so representations of new nodes can be obtained efficiently. Suppose neighbor information is aggregated K times. In each aggregation, the features of each node's neighbors obtained at the previous layer are aggregated and then combined with the node's own feature from the previous layer to obtain the node's feature at the current layer. Repeating this K times yields the node's final feature, and the features at the bottom layer are the input user node attribute features. An example is shown in FIG. 2a, which is a schematic diagram of a framework of a graph convolutional network algorithm provided by an embodiment of the present application.
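The layer-by-layer aggregation just described can be sketched in plain Python. The sketch follows the GraphSAGE forward pass (concatenate a node's previous-layer feature with an aggregate of its neighbors' previous-layer features, apply a weight matrix and a ReLU, then L2-normalize), but to stay short it substitutes a mean aggregator for the LSTM aggregator used in this embodiment, uses full neighbor lists instead of sampled ones, and uses untrained random weights. All function names are illustrative.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, xi) for xi in x]


def l2_normalize(x):
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x] if norm > 0 else x


def mean_aggregate(vectors, dim):
    """Element-wise mean of neighbor vectors (zero vector if there are none)."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def graphsage_forward(features, adjacency, weights):
    """K = len(weights) rounds of GraphSAGE-style neighbor aggregation.

    features:  node -> layer-0 attribute vector
    adjacency: node -> list of neighbor nodes (full lists; no sampling here)
    weights:   weights[k] has shape (out_dim, 2 * in_dim) and maps
               CONCAT(h_v, mean of neighbors' h) to layer k + 1
    """
    h = {v: list(x) for v, x in features.items()}
    for W in weights:
        dim = len(next(iter(h.values())))
        new_h = {}
        for v in h:
            neigh = mean_aggregate([h[u] for u in adjacency.get(v, [])], dim)
            combined = h[v] + neigh  # list concatenation = CONCAT(h_v, h_N(v))
            new_h[v] = l2_normalize(relu(matvec(W, combined)))
        h = new_h  # layer k output becomes layer k + 1 input
    return h


def random_weights(in_dim, out_dim, rng):
    """Untrained placeholder weights; real weights would come from training."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(2 * in_dim)] for _ in range(out_dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    adjacency = {"uid1": ["uid2", "uid3", "uid4"], "uid2": ["uid1", "uid4"],
                 "uid3": ["uid1"], "uid4": ["uid1", "uid2"]}
    features = {u: [rng.random() for _ in range(3)] for u in adjacency}
    weights = [random_weights(3, 4, rng), random_weights(4, 4, rng)]  # K = 2
    for node, z in graphsage_forward(features, adjacency, weights).items():
        print(node, [round(x, 3) for x in z])
```

Because the weights are shared across nodes, the same function can embed a node that was not present during training, which is the inductive property the text relies on.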

The unsupervised loss function of GraphSAGE is as follows:

$$J_{\mathcal{G}}(z_u) = -\log\left(\sigma\left(z_u^{\top} z_v\right)\right) - Q \cdot \mathbb{E}_{v_n \sim P_n(v)} \log\left(\sigma\left(-z_u^{\top} z_{v_n}\right)\right)$$

where $u$ is the current node; $v$ is a neighbor that co-occurs with $u$ on a random walk; $v_n$ is a negative sample drawn from the negative sampling distribution $P_n(v)$; $Q$ is the number of negative samples; $z$ denotes an embedding vector output by the GraphSAGE model; and $\sigma$ is the sigmoid function. The similarity between two embedding vectors is computed as their dot product. Each GraphSAGE layer uses an aggregation function to aggregate neighbor information; this embodiment uses the LSTM aggregator: the neighbors are first randomly permuted, and the embedding vectors of the permuted neighbor sequence are then fed into the LSTM as input.
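To make the unsupervised loss concrete, the following sketch evaluates it for a single positive pair $(u, v)$ and a list of negative samples: a positive term $-\log\sigma(z_u^{\top} z_v)$ plus $Q$ times the expected negative term $-\log\sigma(-z_u^{\top} z_{v_n})$, with the expectation over $P_n(v)$ estimated by the sample mean of the $Q$ drawn negatives. It uses a numerically stable log-sigmoid; the function names are hypothetical and no gradient computation is included.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def graphsage_unsupervised_loss(z_u, z_v, z_negatives):
    """Loss for one positive pair (u, v) and Q negative embeddings drawn from P_n(v).

    The expectation over P_n(v) is estimated by the sample mean of the Q
    negatives, so Q * E[...] reduces to a sum over the negatives.
    """
    if not z_negatives:
        raise ValueError("at least one negative sample is required")
    q = len(z_negatives)
    positive_term = -log_sigmoid(dot(z_u, z_v))  # pull co-occurring nodes together
    expectation = sum(log_sigmoid(-dot(z_u, z_n)) for z_n in z_negatives) / q
    return positive_term - q * expectation       # push negatives apart


if __name__ == "__main__":
    print(graphsage_unsupervised_loss([1.0, 0.0], [0.9, 0.1], [[-1.0, 0.0], [0.0, -1.0]]))
```

The loss is small when $z_u$ and $z_v$ point in the same direction and the negatives point away from $z_u$, which is what drives densely connected accounts toward nearby embeddings.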

In one embodiment, the parameter settings of the unsupervised inductive learning model include: aggregating the features of neighbor nodes within two hops, using a long short-term memory (LSTM) network as the aggregator; and, when sampling around a user node, sampling a first preset number of one-hop neighbor nodes and a second preset number of two-hop neighbor nodes, where the second preset number is greater than the first preset number. Specifically, taking the GraphSAGE model as an example, the parameter settings and their meanings are as follows:

K = 2: aggregate the features of neighbors within two hops; S1 = 3 (the first preset number) and S2 = 5 (the second preset number): during sampling, fewer one-hop neighbors and more two-hop neighbors are sampled; 50 random walks of length 5 are performed from each node; 20 negative samples are drawn for each node; neighbors are aggregated with an LSTM; and the embedding dimension is 50. The result is a 50-dimensional embedding vector for each user node. These parameter values were obtained through multiple experiments and gave better abnormal-account identification results.
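The parameter values above can be collected into a configuration object, and the fixed-size neighbor sampling and random-walk pair generation they control can be sketched as below. This is an illustrative sketch, not the patented implementation: the field and function names are hypothetical, sampling is uniform with replacement as in GraphSAGE, and S2 = 5 is assumed to mean five two-hop neighbors per sampled one-hop neighbor, which the text does not state explicitly.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SageConfig:
    # Values quoted in the text; the field names themselves are illustrative.
    k_hops: int = 2          # K: aggregate neighbors within two hops
    s1: int = 3              # one-hop neighbors sampled per node
    s2: int = 5              # two-hop neighbors per sampled one-hop neighbor (assumed)
    num_walks: int = 50      # random walks started from each node
    walk_length: int = 5     # steps per random walk
    num_negatives: int = 20  # Q, negative samples per node
    embedding_dim: int = 50  # size of the output embedding


def sample_neighbors(adjacency, node, size, rng):
    """Fixed-size uniform sample with replacement; empty if the node has no neighbors."""
    neighbors = adjacency.get(node, [])
    return [rng.choice(neighbors) for _ in range(size)] if neighbors else []


def sample_two_hop(adjacency, node, cfg, rng):
    """Sample S1 one-hop neighbors, then S2 neighbors of each of them."""
    one_hop = sample_neighbors(adjacency, node, cfg.s1, rng)
    two_hop = [sample_neighbors(adjacency, u, cfg.s2, rng) for u in one_hop]
    return one_hop, two_hop


def random_walk_pairs(adjacency, node, cfg, rng):
    """Positive (u, v) pairs: nodes visited by short random walks from `node`."""
    pairs = []
    for _ in range(cfg.num_walks):
        current = node
        for _ in range(cfg.walk_length):
            neighbors = adjacency.get(current, [])
            if not neighbors:
                break
            current = rng.choice(neighbors)
            if current != node:
                pairs.append((node, current))
    return pairs


if __name__ == "__main__":
    adj = {"uid1": ["uid2", "uid3", "uid4"], "uid2": ["uid1", "uid4"],
           "uid3": ["uid1"], "uid4": ["uid1", "uid2"]}
    cfg, rng = SageConfig(), random.Random(42)
    print(sample_two_hop(adj, "uid1", cfg, rng))
    print(len(random_walk_pairs(adj, "uid1", cfg, rng)))
```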

As the above solution shows, taking each user account as a user node, its business data as the node's attribute feature, and the user association relationship as an edge, an unsupervised inductive learning model is trained to obtain the node vector of each user node. Using the GraphSAGE model exploits its strong inductive learning ability together with unsupervised training. In the parameter settings, the features of neighbor nodes within two hops are aggregated with an LSTM aggregator, and a first preset number of one-hop neighbors and a larger second preset number of two-hop neighbors are sampled for each user node. This produces user node vectors efficiently, quickly and accurately, ultimately improving the accuracy and efficiency of abnormal account identification.

FIG. 3 is a flowchart of another abnormal account identification method provided by an embodiment of the present application, showing a specific way of clustering based on the node vector of each user node. As shown in FIG. 3, the technical solution is as follows:

Step S301, acquiring a plurality of user accounts and device attribute information associated with the user accounts, and determining a user association relationship between each user account in the plurality of user accounts according to the device attribute information;

Step S302. Obtain the business data corresponding to each user account; taking each user account as a user node, the business data corresponding to each user account as the attribute feature of the user node, and the user association relationship as an edge, calculate the node vector of each user node by means of a graph convolutional network algorithm; and

Step S303, clustering the node vectors of each user node through a density-based spatial clustering algorithm to obtain a plurality of clusters, and determining abnormal accounts according to the clustering results.

Optionally, the node vector of each user node may be obtained through the graph convolutional network algorithm by training an unsupervised inductive learning model. Other models may of course also be used, but their results are relatively worse than those of the unsupervised inductive learning model. For details of this model, refer to the explanation of step S202, which is not repeated here.

In one embodiment, the clustering algorithm is a density-based spatial clustering algorithm, specifically DBSCAN (Density-Based Spatial Clustering of Applications with Noise), which divides regions of sufficiently high density into clusters and can find clusters of arbitrary shape in a spatial database containing noise. DBSCAN defines a cluster as a maximal set of density-connected points. Specifically, DBSCAN is applied to the embedding vectors of the user nodes; it clusters according to the Euclidean distance between vectors and groups the nodes of the entire graph into N categories. Because the embedding vectors of abnormal accounts lie densely together, they are assigned to the same cluster.
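For illustration, a minimal pure-Python DBSCAN over embedding vectors with Euclidean distance is sketched below; in practice a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN` would normally be used. The `eps` and `min_samples` values in the usage example are arbitrary assumptions, since the text does not give the values used. As in scikit-learn, `min_samples` counts the point itself, and noise points are labeled -1.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within Euclidean distance eps of points[i] (including i)."""
    return [j for j, p in enumerate(points) if math.dist(points[i], p) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: cluster ids 0, 1, ... or NOISE."""
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:  # j is also a core point: keep expanding
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels


if __name__ == "__main__":
    # Two tight groups of 2-D "embeddings" and one isolated point.
    pts = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, -10)]
    print(dbscan(pts, eps=0.5, min_samples=3))
```

The number of clusters N is not fixed in advance; it emerges from the density parameters, which suits the text's goal of letting dense groups of abnormal accounts form their own clusters.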

Correspondingly, after the plurality of clusters is obtained, the data in the clusters is analyzed to determine abnormal accounts. Optionally, as explained for step S103, the average value of the business data of all user accounts in each cluster is calculated, clusters are marked according to the calculation result and preset logical judgment conditions, and the user accounts in clusters marked as abnormal are determined as abnormal accounts. For example, take the business data to be the average number of users followed, and the logical judgment condition to be marking clusters whose average exceeds a preset value, for example 200. Assume 50 clusters are currently determined and, after the average number of users followed is calculated for each cluster, the averages for cluster 20 and cluster 31 are found to be 300 and 500 respectively; cluster 20 and cluster 31 are then marked. Note that the logical judgment condition above is based on a single item of business data; it may also combine multiple items of business data, and the specific type of business data is not limited. After cluster 20 and cluster 31 are marked, the user accounts in them are determined as abnormal accounts.
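The marking rule in this example can be sketched as follows, assuming cluster labels come from DBSCAN with -1 for noise. Skipping noise points, which belong to no cluster, is an assumption, as are the function names and the sample data; the threshold of 200 comes from the example above.

```python
from statistics import mean

NOISE = -1  # DBSCAN label for points that belong to no cluster


def mark_clusters(labels, values, threshold):
    """Cluster ids whose mean business value exceeds `threshold` (noise points skipped)."""
    groups = {}
    for label, value in zip(labels, values):
        if label != NOISE:
            groups.setdefault(label, []).append(value)
    return {label for label, vals in groups.items() if mean(vals) > threshold}


def abnormal_accounts(uids, labels, marked):
    """User accounts whose cluster has been marked."""
    return [uid for uid, label in zip(uids, labels) if label in marked]


if __name__ == "__main__":
    uids = ["u1", "u2", "u3", "u4", "u5", "u6"]
    labels = [20, 20, 31, 31, 7, NOISE]
    follows = [280, 320, 450, 550, 15, 900]  # hypothetical follow counts
    marked = mark_clusters(labels, follows, threshold=200)
    print(marked, abnormal_accounts(uids, labels, marked))
```

In this toy data clusters 20 and 31 average 300 and 500, matching the example, while the noise account u6 is not flagged by this rule even though its own value is high; whether noise points deserve separate handling is a design decision the text leaves open.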

As the above solution shows, the node vectors of the user nodes are clustered by a density-based spatial clustering algorithm into a plurality of clusters, and abnormal accounts are determined according to the clustering result. Because DBSCAN divides regions of sufficiently high density into clusters and finds clusters of arbitrary shape in a spatial database with noise, it clusters the node vectors of user account nodes efficiently, which in turn allows abnormal accounts to be identified efficiently and quickly.

FIG. 4 is a flowchart of another abnormal account identification method provided by an embodiment of the present application, which provides a real-time online way of determining whether a newly added user account is an abnormal account. As shown in FIG. 4, the technical solution is as follows:

Step S401, acquiring a plurality of user accounts and device attribute information associated with the user accounts, and determining a user association relationship between each user account in the plurality of user accounts according to the device attribute information;

Step S402. Obtain the business data corresponding to each user account; taking each user account as a user node, the business data corresponding to each user account as the attribute feature of the user node, and the user association relationship as an edge, train an unsupervised inductive learning model to obtain the node vector of each user node, and output the trained graph model file;

Step S403, cluster the node vectors of the user nodes through a density-based spatial clustering algorithm to obtain a plurality of clusters, and output the trained clustering model file;

Step S404, calculate the average value of the business data of all user accounts in each cluster, mark clusters according to the calculation result and preset logical judgment conditions, and determine the user accounts in clusters marked as abnormal as abnormal accounts; and

Step S405, obtain newly added user nodes in real time, output each new node's vector through the trained model recorded in the graph model file, and calculate the cluster to which the node vector belongs through the trained model recorded in the clustering model file, so as to determine whether the user account corresponding to the newly added user node is an abnormal account.

In one embodiment, step S405 is performed after step S403, that is, once the trained graph model file and clustering model file have been output, they are used for real-time online abnormal account identification. Exemplarily, the graph model file and the clustering model file can be stored in a cache. When a new user node is added, the trained model recorded in the graph model file outputs the node vector of the user account, and the trained model recorded in the clustering model file calculates the cluster to which the node vector belongs. If the node vector falls into a cluster of abnormal accounts, the newly added user account is determined to be an abnormal account and corresponding risk control processing is performed.
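DBSCAN itself has no prediction step, so the sketch below assumes the cached clustering model file stores the core samples, their cluster labels, `eps`, and the set of clusters marked abnormal, and assigns a new node vector to the cluster of the nearest core sample within `eps` (otherwise to noise). This assignment rule and the `ClusterModel` structure are assumptions for illustration; computing the new node's vector with the cached graph model is assumed to have happened already.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Set

NOISE = -1


@dataclass
class ClusterModel:
    """Hypothetical content of a cached clustering model file."""
    core_vectors: List[Sequence[float]]
    core_labels: List[int]
    eps: float
    abnormal_clusters: Set[int] = field(default_factory=set)


def assign_cluster(model: ClusterModel, vector: Sequence[float]) -> int:
    """Cluster of the nearest core sample within eps, or NOISE if none is close enough."""
    best_label, best_dist = NOISE, math.inf
    for core, label in zip(model.core_vectors, model.core_labels):
        d = math.dist(core, vector)
        if d <= model.eps and d < best_dist:
            best_label, best_dist = label, d
    return best_label


def is_abnormal_account(model: ClusterModel, vector: Sequence[float]) -> bool:
    """True if the new node vector lands in a cluster marked abnormal."""
    return assign_cluster(model, vector) in model.abnormal_clusters
```

If `is_abnormal_account` returns True, the caller would trigger the risk control processing described above. Treating a vector assigned to noise as not abnormal, as this sketch does, is itself a policy choice.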

In another embodiment, step S405 is performed after step S404, that is, after abnormal accounts have been identified among the currently processed user accounts, newly added user nodes are evaluated to determine whether their corresponding user accounts are abnormal accounts. Steps S403 to S405 may be executed sequentially (step S403, then step S404, then step S405), or step S404 and step S405 may be executed in parallel after step S403; the specific execution order is not limited.

It can be seen from the above scheme that newly added user nodes are acquired in real time, their node vectors are output through the trained model recorded in the graph model file, and the cluster to which each node vector belongs is calculated through the trained model recorded in the clustering model file, so as to determine whether the corresponding user account is an abnormal account. The graph model file is obtained by unsupervised GraphSage training, and the clustering model file is obtained by clustering the node vectors with the DBSCAN algorithm, so whether a user account is abnormal can be identified online in real time.

FIG. 5 is a structural block diagram of an abnormal account identification apparatus provided by an embodiment of the present application. The apparatus is used to implement the abnormal account identification method provided in the above embodiments and has the functional modules and beneficial effects corresponding to that method. As shown in FIG. 5, the apparatus specifically includes: a data acquisition module 101, a user association relationship determination module 102, a vector calculation module 103, a clustering calculation module 104 and a result analysis module 105, wherein:

A data acquisition module 101, configured to acquire multiple user accounts and device attribute information associated with the user accounts, as well as business data corresponding to each user account;

A user association relationship determination module 102, configured to determine a user association relationship between each user account among the plurality of user accounts according to the device attribute information;

The vector calculation module 103 is configured to take each user account as a user node, the service data corresponding to each user account as the attribute feature of that user node, and the user association relationship as an edge, and to calculate the node vector of each user node by a graph convolutional network algorithm;

A clustering calculation module 104, configured to perform clustering based on the node vector of each user node;

The result analysis module 105 is configured to determine an abnormal account according to the clustering result.
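As a rough illustration of how modules 101 to 105 hand data to one another, the following sketch wires them together as injected callables. The class name, the callable signatures, and the data shapes (accounts keyed by string ID, service data as field-to-value dictionaries) are hypothetical and only mirror the data flow described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

AccountAttrs = Dict[str, List[str]]        # account -> device attributes
ServiceData = Dict[str, Dict[str, float]]  # account -> field -> value
Edges = Set[Tuple[str, str]]               # user association relationships
Vectors = Dict[str, List[float]]           # account -> node vector
Labels = Dict[str, int]                    # account -> cluster id


@dataclass
class AbnormalAccountIdentifier:
    acquire: Callable[[], Tuple[AccountAttrs, ServiceData]]  # data acquisition module 101
    associate: Callable[[AccountAttrs], Edges]               # association module 102
    embed: Callable[[ServiceData, Edges], Vectors]           # vector calculation module 103
    cluster: Callable[[Vectors], Labels]                     # clustering calculation module 104
    analyse: Callable[[Labels, ServiceData], Set[str]]       # result analysis module 105

    def run(self) -> Set[str]:
        """Run the batch pipeline and return the accounts judged abnormal."""
        account_attrs, service_data = self.acquire()
        edges = self.associate(account_attrs)
        vectors = self.embed(service_data, edges)
        labels = self.cluster(vectors)
        return self.analyse(labels, service_data)
```

Keeping each module behind a plain callable makes it easy to swap, for example, the graph model or the clustering algorithm without touching the rest of the flow.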

In a possible embodiment, the user association relationship determination module 102 is specifically configured to:

Determine the device attribute association relationship between each of the plurality of user accounts and the device attribute information;

The user association relationship between each user account is determined according to the device attribute association relationship.
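A minimal sketch of this two-step derivation: accounts are first mapped to the device attributes they were observed with, and any two accounts sharing at least one attribute are then linked by an edge. The attribute encoding (strings such as `device:...` or `ip:...`) and the rule that a single shared attribute suffices are assumptions; the patent does not specify which attributes are used or how they are weighted.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def build_user_edges(account_attrs: Dict[str, Iterable[str]]) -> Set[Tuple[str, str]]:
    """Link every pair of accounts that share at least one device attribute."""
    # Step 1: invert the account -> device-attribute association relationship
    attr_to_accounts: Dict[str, Set[str]] = defaultdict(set)
    for account, attrs in account_attrs.items():
        for attr in attrs:
            attr_to_accounts[attr].add(account)

    # Step 2: accounts seen with the same attribute become associated users
    edges: Set[Tuple[str, str]] = set()
    for accounts in attr_to_accounts.values():
        for a, b in combinations(sorted(accounts), 2):
            edges.add((a, b))  # sorted order makes each undirected edge unique
    return edges


accounts = {
    "A": ["device:d1", "ip:10.0.0.1"],
    "B": ["device:d1"],
    "C": ["ip:10.0.0.1"],
    "D": ["device:d9"],
}
print(sorted(build_user_edges(accounts)))  # [('A', 'B'), ('A', 'C')]
```

An attribute shared by very many accounts (for example a public IP address) yields a quadratic number of edges in this sketch, so a practical system might filter or cap such attributes.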

In a possible embodiment, the vector calculation module 103 is specifically configured to:

The node vector of each user node is obtained by training with an inductive learning model of unsupervised learning.

In a possible embodiment, the parameter settings of the unsupervised inductive learning model include:

Aggregate the features of neighbor nodes within two hops, using a long short-term memory (LSTM) neural network as the aggregation method;

When sampling for a user node, a first preset number of one-hop neighbor nodes and a second preset number of two-hop neighbor nodes are extracted, the second preset number being greater than the first preset number.
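The neighbor sampling in this parameter setting can be sketched as below, under the assumption (not stated in the patent) that it follows GraphSAGE's layer-wise scheme: `s1` one-hop neighbors are drawn uniformly with replacement for the target node, and `s2` neighbors are drawn for each sampled one-hop node, with `s2 > s1` as the embodiment requires. The LSTM aggregator is not shown; in GraphSAGE it is applied to a random permutation of the sampled neighbors' features, since an LSTM is not inherently permutation-invariant.

```python
import random
from typing import Dict, List, Tuple


def sample_two_hop(
    adj: Dict[str, List[str]], node: str, s1: int, s2: int, rng: random.Random
) -> Tuple[List[str], List[List[str]]]:
    """Fixed-size, uniform, with-replacement sampling of one- and two-hop neighbors."""
    if s2 <= s1:
        raise ValueError("the embodiment requires s2 > s1")

    def sample(neighbors: List[str], k: int) -> List[str]:
        # An isolated node contributes no samples
        return [rng.choice(neighbors) for _ in range(k)] if neighbors else []

    hop1 = sample(adj.get(node, []), s1)
    hop2 = [sample(adj.get(n, []), s2) for n in hop1]  # s2 samples per one-hop node
    return hop1, hop2
```

Fixed-size sampling bounds the cost of embedding one node regardless of its degree, which matters for the real-time scoring of newly added user nodes.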

In a possible embodiment, the cluster calculation module 104 is specifically configured to:

The node vectors of each user node are clustered through a density-based spatial clustering algorithm to obtain multiple clusters.

In a possible embodiment, the result analysis module 105 is specifically configured to:

Calculate the average value of the service data of all user accounts in each cluster, and mark the clusters according to the calculation result and preset logical judgment conditions;

Determine the user accounts in clusters marked as abnormal to be abnormal accounts.

In a possible embodiment, the vector calculation module 103 is further configured to:

After the node vector of each user node is calculated through the graph convolutional network algorithm, output the trained graph model file;

The clustering calculation module 104 is further configured to:

After clustering is performed based on the node vector of each user node, output the trained clustering model file.

In a possible embodiment, the data acquisition module 101 is further configured to acquire newly added user nodes in real time; the vector calculation module 103 is further configured to output the node vectors of these nodes through the trained model recorded in the graph model file; and the clustering calculation module 104 is further configured to calculate the cluster to which each node vector belongs through the trained model recorded in the clustering model file, so that the result analysis module 105 can determine whether the user account corresponding to the newly added user node is an abnormal account.

FIG. 6 is a schematic structural diagram of an abnormal account identification device provided by an embodiment of the present application. As shown in FIG. 6, the device includes a processor 201, a memory 202, an input device 203 and an output device 204. The number of processors 201 in the device may be one or more; one processor 201 is taken as an example in FIG. 6. The processor 201, memory 202, input device 203 and output device 204 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 6. As a computer-readable storage medium, the memory 202 can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the abnormal account identification method in the embodiments of the present application. The processor 201 executes the various functional applications and data processing of the device by running the software programs, instructions and modules stored in the memory 202, thereby implementing the abnormal account identification method described above. The input device 203 can be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the device. The output device 204 may include a display device such as a display screen.

The embodiments of the present application also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform an abnormal account identification method, the method including:

Obtaining the service data corresponding to each user account; taking each user account as a user node, the service data corresponding to each user account as the attribute feature of that user node, and the user association relationship as an edge, calculating the node vector of each user node through a graph convolutional network algorithm;

Claims

1. An abnormal account identification method, characterized by comprising:

Obtaining multiple user accounts and device attribute information associated with the user accounts, and determining a user association relationship between each of the multiple user accounts according to the device attribute information;

Obtaining the business data corresponding to each user account, and, taking each user account as a user node, the business data corresponding to each user account as the attribute feature of that user node, and the user association relationship as an edge, calculating the node vector of each user node through a graph convolutional network algorithm; and

Performing clustering based on the node vector of each user node, and determining an abnormal account according to the clustering result.
2. The abnormal account identification method according to claim 1, wherein determining the user association relationship among the plurality of user accounts according to the device attribute information comprises:

Determining the device attribute association relationship between each of the plurality of user accounts and the device attribute information;

The user association relationship between each user account is determined according to the device attribute association relationship.
3. The abnormal account identification method according to claim 1, wherein calculating the node vector of each user node through the graph convolutional network algorithm comprises:

The node vector of each user node is obtained by training with an inductive learning model of unsupervised learning.
4. The abnormal account identification method according to claim 3, wherein the parameter settings of the unsupervised inductive learning model comprise:

Aggregating the features of neighbor nodes within two hops, using a long short-term memory (LSTM) neural network as the aggregation method;

When sampling for a user node, extracting a first preset number of one-hop neighbor nodes and a second preset number of two-hop neighbor nodes, the second preset number being greater than the first preset number.
5. The abnormal account identification method according to claim 1, wherein performing clustering based on the node vector of each user node comprises:

The node vectors of each user node are clustered through a density-based spatial clustering algorithm to obtain multiple clusters.
6. The abnormal account identification method according to claim 5, wherein determining the abnormal account according to the clustering result comprises:

Calculating the average value of the business data of all user accounts in each cluster, and marking the clusters according to the calculation result and preset logical judgment conditions; and

Determining the user accounts in clusters marked as abnormal to be abnormal accounts.
7. The abnormal account identification method according to any one of claims 1 to 6, further comprising, after calculating the node vector of each user node through the graph convolutional network algorithm:

Outputting the trained graph model file;

and further comprising, after performing clustering based on the node vector of each user node:

Outputting the trained clustering model file.
8. The abnormal account identification method according to claim 7, further comprising:

Obtaining newly added user nodes in real time, and outputting their node vectors through the trained model recorded in the graph model file; and

Calculating, through the trained model recorded in the clustering model file, the cluster to which each node vector belongs, so as to determine whether the user account corresponding to the newly added user node is an abnormal account.
9. An abnormal account identification apparatus, comprising:

A data acquisition module, configured to acquire multiple user accounts and device attribute information associated with the user accounts, as well as business data corresponding to each user account;

A user association relationship determining module, configured to determine a user association relationship between each user account among the plurality of user accounts according to the device attribute information;

A vector calculation module, configured to take each user account as a user node, the business data corresponding to each user account as the attribute feature of that user node, and the user association relationship as an edge, and to calculate the node vector of each user node through a graph convolutional network algorithm;

A clustering calculation module, configured to perform clustering based on the node vector of each user node; and

The result analysis module is used to determine the abnormal account according to the clustering result.
10. An abnormal account identification device, comprising: one or more processors; and a storage apparatus configured to store one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the abnormal account identification method according to any one of claims 1 to 8.
11. A storage medium storing computer-executable instructions which, when executed by a computer processor, are used to perform the abnormal account identification method according to any one of claims 1 to 8.