WO2022247955A1 - Abnormal account identification method, apparatus and device, and storage medium - Google Patents

Abnormal account identification method, apparatus and device, and storage medium

Info

Publication number
WO2022247955A1
WO2022247955A1 PCT/CN2022/096060 CN2022096060W WO2022247955A1 WO 2022247955 A1 WO2022247955 A1 WO 2022247955A1 CN 2022096060 W CN2022096060 W CN 2022096060W WO 2022247955 A1 WO2022247955 A1 WO 2022247955A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
node
account
abnormal
accounts
Prior art date
Application number
PCT/CN2022/096060
Other languages
French (fr)
Chinese (zh)
Inventor
曹轲
钟清华
黄群
Original Assignee
百果园技术(新加坡)有限公司
曹轲
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百果园技术(新加坡)有限公司, 曹轲
Publication of WO2022247955A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking

Definitions

  • The embodiments of the present application relate to the field of computers, and in particular to a method, apparatus, device, and storage medium for identifying an abnormal account.
  • Methods for identifying abnormal user accounts in the prior art usually adopt a machine learning classification algorithm, or a graph algorithm with community mining.
  • With a machine learning classification algorithm, more abnormal user accounts are predicted by learning the characteristics of known abnormal user accounts, but the classification algorithm tends to ignore the community characteristics of accounts. For example, if account A and account B are active on the same device, they can be considered to be operated by the same natural person; but if account A has cheated and account B has not yet cheated, account B is difficult to identify.
  • Community mining connects account A and account B into one community based on their shared attributes, and then judges the entire community to be an abnormal community.
  • Embodiments of the present application provide a method, apparatus, device, and storage medium for identifying abnormal accounts. This solution can identify abnormal users in batches, with higher identification accuracy and efficiency.
  • the embodiment of the present application provides a method for identifying an abnormal account, which includes:
  • Clustering is performed based on the node vector of each user node, and an abnormal account is determined according to the clustering result.
  • The embodiment of the present application also provides an abnormal account identification apparatus, which includes:
  • a data acquisition module configured to acquire multiple user accounts and device attribute information associated with the user accounts, as well as business data corresponding to each user account;
  • a user association relationship determining module configured to determine a user association relationship between each user account among the plurality of user accounts according to the device attribute information
  • a vector calculation module configured to use each user account as a user node, the business data corresponding to each user account as the attribute feature of the user node, and the user association relationship as an edge, and to calculate the node vector of each user node through a graph convolutional network algorithm;
  • a clustering calculation module configured to perform clustering based on the node vector of each user node
  • the result analysis module is used to determine the abnormal account according to the clustering result.
  • the embodiment of the present application also provides an abnormal account identification device, the device includes:
  • one or more processors;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the abnormal account identification method described in the embodiment of the present application.
  • The embodiment of the present application also provides a storage medium storing computer-executable instructions, which, when executed by a computer processor, are used to execute the abnormal account identification method described in the embodiment of the present application.
  • The user association relationship between each user account among the multiple user accounts is determined according to the device attribute information, and the business data corresponding to each user account is then obtained; each user account is used as a user node, the business data corresponding to each user account is the attribute feature of the user node, and the user association relationship is the edge.
  • clustering is performed based on the node vector of each user node, and abnormal accounts are determined according to the clustering results, so that abnormal users can be efficiently identified in batches, and the identification accuracy and identification efficiency are higher.
  • FIG. 1 is a flowchart of a method for identifying an abnormal account provided by an embodiment of the present application
  • Fig. 1a is a schematic diagram of association between user account and device attribute information provided by the embodiment of the present application
  • FIG. 2 is a flow chart of another abnormal account identification method provided by the embodiment of the present application.
  • Figure 2a is a schematic diagram of a framework of a graph convolutional network algorithm provided by an embodiment of the present application
  • FIG. 3 is a flow chart of another abnormal account identification method provided by the embodiment of the present application.
  • FIG. 4 is a flow chart of another abnormal account identification method provided by the embodiment of the present application.
  • FIG. 5 is a structural block diagram of an abnormal account identification device provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a device provided by an embodiment of the present application.
  • Fig. 1 is a flow chart of an abnormal account identification method provided by the embodiment of the present application.
  • This embodiment can be applied to detecting and identifying abnormal accounts in application software scenarios such as user login, registration, and social networking. Abnormal accounts are accounts that differ from normal user accounts, such as malicious accounts, sockpuppet ("vest") accounts, and protocol accounts; they exhibit behaviors such as batch operations, order brushing (fake orders), and malicious operations.
  • the abnormal account identification method can be executed by a computing device such as a server, a system application host, etc., and specifically includes the following steps:
  • Step S101 acquiring a plurality of user accounts and device attribute information associated with the user accounts, and determining a user association relationship between each user account in the plurality of user accounts according to the device attribute information;
  • Step S102 Obtain the service data corresponding to each user account, take each user account as a user node, the service data corresponding to each user account is the attribute feature of the user node, and the user association relationship is an edge, Obtaining the node vector of each user node by computing a graph convolutional network algorithm;
  • Step S103 performing clustering based on the node vector of each user node, and determining an abnormal account according to the clustering result.
  • the user account may be an account used by the user when using a certain software, logging in a forum or a video website, etc., such as a unique user ID (UID) assigned during registration.
  • a user can register one or more user accounts, the user accounts can use the same or different login devices to log in, and the network addresses used for each login can be the same or different.
  • After logging in with the user account, the user can perform related operations, such as sending bullet-screen (barrage) messages, leaving comments, and following a host.
  • the user account and device attribute information may be information recorded by the system background during user registration and login.
  • the device attribute information is data associated with the user account, such as account login device, IP address used, bound mobile phone number, etc.
  • The user accounts and device attribute information are acquired over a time window. The window may be three months, that is, the user accounts active within three months and their associated device attribute information are acquired.
  • the user account and device attribute information may be stored in the form of a database table.
  • the form and content of its records are shown in the table below:
  • the user association relationship between each user account among the multiple user accounts is determined according to the device attribute information.
  • The user association relationship indicates whether two users are associated, that is, whether the two user accounts have used the same login device, IP address, mobile phone number, etc. If the two user accounts share the same device attribute information, they are determined to be associated; otherwise, the user association relationship between them is determined to be a non-association relationship.
  • Determining the user association relationship between each user account in the plurality of user accounts according to the device attribute information includes: determining the device attribute association relationship between each user account in the plurality of user accounts and the device attribute information; and determining the user association relationship between each user account according to the device attribute association relationship.
  • The device attribute association relationship indicates whether a given user account is associated with a given piece of device attribute information. For example, if a user account has logged in from a certain IP address, the user account and that IP address have a device attribute association relationship; otherwise, they do not.
  • For example, uid1 has logged in using ip1, ip3 and device 1, so uid1 is associated with ip1, ip3 and device 1; uid2 has logged in using ip1 and device 1, so uid2 is associated with ip1 and device 1; uid3 has logged in using ip2, ip3 and device 2, so uid3 is associated with ip2, ip3 and device 2; uid4 has logged in using ip1 and device 1, so uid4 is associated with ip1 and device 1.
  • See Fig. 1a, which is a schematic diagram of the association between user accounts and device attribute information provided by an embodiment of the present application.
  • The judgment condition includes: when two user accounts have association relationships with one or more identical pieces of device attribute information, they are judged to be associated with each other.
  • For example, uid1, uid2 and uid4 are each associated with ip1, that is, they share the same device attribute information (ip1), so uid1, uid2 and uid4 are determined to be associated with one another; uid1 is associated with ip3 and uid3 is also associated with ip3, so uid1 is determined to be associated with uid3.
  • the association relationship can be stored in the database or cache separately in the form of a list, or can be integrated with a previously stored database table.
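  • The derivation of user association relationships from shared device attribute information can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `logins` mapping reproduces the uid1 to uid4 example above, while the `"kind:value"` key format and the function name `build_user_edges` are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Login records from the example above: account -> device attribute values.
# The "kind:value" key format is an illustrative assumption.
logins = {
    "uid1": {"ip:ip1", "ip:ip3", "device:1"},
    "uid2": {"ip:ip1", "device:1"},
    "uid3": {"ip:ip2", "ip:ip3", "device:2"},
    "uid4": {"ip:ip1", "device:1"},
}


def build_user_edges(logins):
    """Return the set of undirected account pairs that share an attribute."""
    # Invert the mapping: attribute value -> accounts that used it.
    accounts_by_attr = defaultdict(set)
    for uid, attrs in logins.items():
        for attr in attrs:
            accounts_by_attr[attr].add(uid)

    edges = set()
    for uids in accounts_by_attr.values():
        # Every pair of accounts sharing this attribute is associated.
        for a, b in combinations(sorted(uids), 2):
            edges.add((a, b))
    return edges


edges = build_user_edges(logins)
```

  • Inverting the mapping first avoids comparing every pair of accounts directly; only accounts that actually share an attribute value are paired.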
  • the business data refers to data of business attributes related to the user account.
  • The business data can be, for example: the user's country code, the registered device model, the number of private chat messages sent within 3 days of registration, the number of other users followed within 3 days of registration, the duration of live broadcasts watched within 3 days of registration, the gifts given within 3 days of registration, etc.
  • 52 dimensions of business data are selected for total statistics, that is, a 52-dimensional attribute feature is formed, and the attribute feature can be represented in the form of a vector.
  • a graph convolutional network algorithm is used for calculation to obtain a node vector of each user account.
  • Each user account is used as a user node, the business data corresponding to each user account is the attribute feature of the user node, and the user association relationship is the edge; the node vector of each user node is then calculated through the graph convolutional network algorithm.
  • The uid can be converted into an index representation, and string-valued business data (that is, user node attribute features) can be encoded with a LabelEncoder string encoding function.
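  • A minimal sketch of this encoding step, assuming a hand-written label encoder in place of a library one: distinct strings are mapped to integer codes in sorted order, and each account becomes one numeric feature vector. The record fields (`country`, `model`, `follows_3d`) and their values are hypothetical and stand in for the 52 business dimensions.

```python
def fit_label_encoder(values):
    """Map each distinct string to an integer code (sorted for determinism)."""
    return {value: index for index, value in enumerate(sorted(set(values)))}


# Hypothetical records; field names and values are illustrative only.
accounts = [
    {"uid": "u_a", "country": "SG", "model": "phone_x", "follows_3d": 12},
    {"uid": "u_b", "country": "ID", "model": "phone_y", "follows_3d": 340},
    {"uid": "u_c", "country": "SG", "model": "phone_y", "follows_3d": 5},
]

# uid -> node index, so edges can refer to nodes by integer position.
uid_index = {rec["uid"]: i for i, rec in enumerate(accounts)}

country_codes = fit_label_encoder(rec["country"] for rec in accounts)
model_codes = fit_label_encoder(rec["model"] for rec in accounts)

# One numeric feature vector per user node, ordered by node index.
features = [
    [
        float(country_codes[rec["country"]]),
        float(model_codes[rec["model"]]),
        float(rec["follows_3d"]),
    ]
    for rec in accounts
]
```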
  • The graph convolutional network algorithm can be an algorithm based on the spectral (frequency) domain or the spatial domain.
  • Exemplary spectral-domain algorithms include the ChebNet algorithm, GCN, and so on.
  • Taking an algorithm implemented in the spatial domain as an example, exemplary algorithms include the GraphSAGE model.
  • The above-mentioned user nodes, user node attribute feature vectors, and edge relationships are used to train the model and calculate the embedding vector of each user node.
  • the node vector of each user node is clustered by using a clustering algorithm to obtain a clustering result, such as obtaining multiple clusters.
  • the clustering algorithm used may exemplarily be k-means clustering algorithm, hierarchical clustering algorithm, SOM clustering algorithm or FCM clustering algorithm, etc.
  • the abnormal account is finally determined according to the clustering result.
  • The abnormal accounts can be determined in any one or more of the following ways: determining that the user accounts in a cluster are abnormal accounts when a known abnormal account falls in that cluster; analyzing the business data of the user accounts in each cluster and determining the abnormal accounts according to the analysis results; or determining the user accounts in clusters calibrated by manual identification to be abnormal accounts.
  • Determining abnormal accounts according to the clustering results includes: calculating the average value of the business data of all user accounts in each cluster, and marking the clusters according to the calculation results and preset logical judgment conditions; and determining the user accounts in the clusters marked as abnormal to be abnormal accounts.
  • Illustratively, taking the average number of follows and the average viewing time as the marking conditions, the average number of follows and the average viewing time of the user accounts in each cluster are calculated; if the statistics of a cluster differ significantly from those of the other clusters, the user accounts in that cluster are determined to be abnormal accounts.
  • FIG. 2 is a flow chart of another abnormal account identification method provided by the embodiment of the present application, which shows a specific method of calculating the node vector of each user node through the graph convolutional network algorithm. As shown in Figure 2, the technical solution is as follows:
  • Step S201 acquiring a plurality of user accounts and device attribute information associated with the user accounts, and determining a user association relationship between each user account in the plurality of user accounts according to the device attribute information;
  • Step S202 Obtain the service data corresponding to each user account, take each user account as a user node, the service data corresponding to each user account is the attribute feature of the user node, and the user association relationship is an edge, training with an inductive learning model of unsupervised learning to obtain the node vector of each user node; and
  • Step S203 performing clustering based on the node vector of each user node, and determining an abnormal account according to the clustering result.
  • an inductive learning model of unsupervised learning is used for training to obtain the node vector of each user node.
  • the GraphSage model is used for training to obtain the node vector of each user node.
  • The GraphSage model is used as the algorithm framework, which makes it easy to obtain representations of new nodes.
  • The GraphSage model learns how the information of a node is aggregated from the features of its neighbor nodes.
  • Once the user node attribute features and user association relationships of each user node are known, the representation of a new node can be obtained efficiently. Assume that the information of the surrounding neighbor nodes needs to be aggregated K times.
  • Each aggregation step aggregates the neighbor user node features obtained in the previous layer, and then combines the result with the features of the user node itself from the previous layer to obtain the features of the user node in the current layer.
  • The final features of the user node are obtained by repeating this aggregation K times; the user node features of the lowest layer are the input user node features.
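  • The K-round aggregation can be illustrated with the sketch below. It is deliberately simplified and is not the trained model: it uses a plain mean over neighbor vectors rather than a learned aggregator, and it omits weight matrices and nonlinearities, so each round simply concatenates a node's previous-layer vector with the pooled neighbor vector. The function names and the three-node graph are assumptions for illustration.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def aggregate(features, neighbors, rounds):
    """Run `rounds` aggregation steps over every node.

    Each step concatenates a node's own vector from the previous layer with
    the mean of its neighbors' vectors from the previous layer. Learned
    weights and nonlinearities are omitted, so the vector length doubles
    on every round.
    """
    hidden = dict(features)  # layer 0 is the input feature of each node
    for _ in range(rounds):
        updated = {}
        for node, own in hidden.items():
            nbr_vectors = [hidden[n] for n in neighbors.get(node, [])]
            # An isolated node falls back to its own vector (an assumption).
            pooled = mean_vector(nbr_vectors) if nbr_vectors else own
            updated[node] = own + pooled  # list concatenation
        hidden = updated
    return hidden


features = {"a": [1.0, 0.0], "b": [3.0, 2.0], "c": [5.0, 4.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
embeddings = aggregate(features, neighbors, rounds=2)
```

  • After two rounds, the vector of node `a` already contains information from nodes two hops away, which is the effect the repeated aggregation is meant to achieve.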
  • FIG. 2a is a schematic diagram of a framework of a graph convolutional network algorithm provided by an embodiment of the present application.
  • Here, v_n is sampled from the negative sampling distribution P_n(v) of node u, Q indicates the number of negative samples, u indicates the current node, v indicates a neighbor reachable by random walk, v_n indicates a negative sampling node, and z indicates the node representation output by GraphSage.
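  • The formula to which these symbols belong is not reproduced in this text. The symbols match the unsupervised, graph-based loss of the published GraphSAGE model; assuming that is the loss intended here, it can be written as below, where σ denotes the sigmoid function (a symbol not defined in the text above).

```latex
J(z_u) = -\log\bigl(\sigma(z_u^{\top} z_v)\bigr)
         - Q \cdot \mathbb{E}_{v_n \sim P_n(v)} \log\bigl(\sigma(-z_u^{\top} z_{v_n})\bigr)
```

  • Under this reading, the first term pulls the representations of nodes that co-occur on random walks together, and the second term pushes the representation of u away from those of the Q negative samples.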
  • Each layer of GraphSage uses an aggregation function for the aggregation of neighbor node information.
  • the LSTM aggregation method is used. First, the neighbors are randomly sorted, and then the randomly sorted neighbor sequence embedding vectors are used as LSTM input.
  • The parameter settings of the unsupervised inductive learning model include: aggregating the features of neighbor nodes within two hops, with a long short-term memory (LSTM) neural network as the aggregation method; and, when sampling for a user node, extracting a first preset number of one-hop neighbor nodes and a second preset number of two-hop neighbor nodes, where the second preset number is greater than the first preset number.
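  • The two-hop neighbor extraction can be sketched as follows. The text does not state whether the second preset number counts two-hop neighbors in total or per one-hop neighbor; this sketch assumes a total and draws with replacement when too few candidates exist. The function names, the toy graph, and the counts 2 and 3 are illustrative assumptions.

```python
import random


def sample_neighbors(adjacency, node, count, rng):
    """Draw `count` neighbors of `node`, with replacement if it has fewer."""
    nbrs = adjacency.get(node, [])
    if not nbrs:
        return []
    if len(nbrs) >= count:
        return rng.sample(nbrs, count)
    return [rng.choice(nbrs) for _ in range(count)]


def sample_two_hops(adjacency, node, first_count, second_count, rng):
    """Sample one-hop neighbors, then two-hop neighbors reached through them."""
    if second_count <= first_count:
        raise ValueError("second_count must be greater than first_count")
    one_hop = sample_neighbors(adjacency, node, first_count, rng)
    # Candidate two-hop nodes: neighbors of the sampled one-hop nodes,
    # excluding the start node itself.
    candidates = [n for hop in one_hop for n in adjacency.get(hop, []) if n != node]
    two_hop = [rng.choice(candidates) for _ in range(second_count)] if candidates else []
    return one_hop, two_hop


adjacency = {
    "u": ["a", "b", "c"],
    "a": ["u", "d"], "b": ["u", "e"], "c": ["u", "f"],
    "d": ["a"], "e": ["b"], "f": ["c"],
}
one_hop, two_hop = sample_two_hops(adjacency, "u", 2, 3, random.Random(0))
```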
  • Taking the GraphSage model as an example, its parameter settings and their meanings are as follows:
  • the setting of the above-mentioned parameters is a parameter value obtained after multiple experiments and has a better effect of identifying abnormal accounts.
  • Each user account is used as a user node, the business data corresponding to each user account is the attribute feature of the user node, and the user association relationship is the edge; the unsupervised inductive learning model is trained to obtain the node vector of each user node.
  • The aggregation is performed with a long short-term memory neural network, and when sampling for a user node, a first preset number of one-hop neighbor nodes and a second preset number of two-hop neighbor nodes are extracted, where the second preset number is greater than the first preset number. This generates the node vectors of user nodes efficiently, quickly, and accurately, and ultimately improves the accuracy and efficiency of abnormal account identification.
  • FIG. 3 is a flow chart of another abnormal account identification method provided by the embodiment of the present application, which shows a specific clustering method based on the node vector of each user node. As shown in Figure 3, the technical solution is as follows:
  • Step S301 acquiring a plurality of user accounts and device attribute information associated with the user accounts, and determining a user association relationship between each user account in the plurality of user accounts according to the device attribute information;
  • Step S302 Obtain the service data corresponding to each user account, take each user account as a user node, the service data corresponding to each user account is the attribute feature of the user node, and the user association relationship is an edge, Obtaining the node vector of each user node by computing a graph convolutional network algorithm;
  • Step S303 clustering the node vectors of each user node through a density-based spatial clustering algorithm to obtain a plurality of clusters, and determining abnormal accounts according to the clustering results.
  • The node vector of each user node may be obtained through the graph convolutional network algorithm by training an unsupervised inductive learning model; other models may also be used, but their results are relatively worse than those of the unsupervised inductive learning model.
  • The clustering algorithm is a density-based spatial clustering algorithm, specifically DBSCAN (Density-Based Spatial Clustering of Applications with Noise), which divides regions of sufficient density into clusters and finds clusters of arbitrary shape in a noisy spatial database.
  • The DBSCAN algorithm defines a "cluster" as a maximal set of density-connected points.
  • The embedding vectors determined for the user nodes are clustered using DBSCAN.
  • DBSCAN clusters according to the Euclidean distance between vectors and groups the nodes of the entire graph into N clusters. Because the embedding vectors of abnormal accounts are densely concentrated, they are assigned to the same cluster.
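  • For illustration, a compact pure-Python DBSCAN over Euclidean distance is sketched below. It follows the standard textbook procedure and is not tuned for large graphs (each neighborhood query scans all points); the parameter values `eps=0.3` and `min_pts=3` and the sample points are assumptions chosen for the example, not values taken from the application.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = region(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels


points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (9, 0)]
labels = dbscan(points, eps=0.3, min_pts=3)
```

  • The two dense groups receive their own cluster ids, while the isolated point is labeled as noise and belongs to no cluster.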
  • As explained for step S103, the average value of the business data of all user accounts in each cluster can be calculated, and the clusters can be marked according to the calculation results and the preset logical judgment conditions; the user accounts in the clusters marked as abnormal are determined to be abnormal accounts.
  • For example, the logical judgment condition is that clusters whose average number of follows is greater than a preset value are marked; the preset average number of follows is, for example, 200.
  • The above logical judgment condition marks clusters based on a single item of business data; a combined judgment over multiple items of business data may also be used, and the specific type of business data is not limited. After cluster 20 and cluster 31 are marked, the user accounts in cluster 20 and cluster 31 are determined to be abnormal accounts.
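  • The marking rule can be sketched as follows, using the exemplary threshold of 200 for the average number of follows. The account ids, cluster ids, and follow counts are made-up illustrative data, and `mark_abnormal_clusters` is a hypothetical helper name.

```python
from collections import defaultdict


def mark_abnormal_clusters(cluster_of, follows, threshold=200):
    """Return ids of clusters whose mean follow count exceeds `threshold`."""
    totals = defaultdict(lambda: [0.0, 0])  # cluster id -> [sum, count]
    for uid, cluster in cluster_of.items():
        totals[cluster][0] += follows[uid]
        totals[cluster][1] += 1
    return {c for c, (total, n) in totals.items() if total / n > threshold}


# Illustrative data only: account -> cluster id, account -> follow count.
cluster_of = {"u1": 0, "u2": 0, "u3": 1, "u4": 1, "u5": 1}
follows = {"u1": 3, "u2": 9, "u3": 250, "u4": 310, "u5": 190}

abnormal_clusters = mark_abnormal_clusters(cluster_of, follows)
abnormal_accounts = sorted(u for u, c in cluster_of.items() if c in abnormal_clusters)
```

  • Note that account `u5` is flagged even though its own follow count is below the threshold, because the judgment is made per cluster rather than per account.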
  • The node vectors of the user nodes are clustered through a density-based spatial clustering algorithm to obtain multiple clusters, and the abnormal accounts are determined according to the clustering results. The DBSCAN clustering algorithm is applied in the clustering process; because it divides regions of sufficient density into clusters and finds clusters of arbitrary shape in a noisy spatial database, it can cluster the node vectors of the user account nodes efficiently, which facilitates efficient and fast identification of abnormal accounts.
  • FIG. 4 is a flow chart of another abnormal account identification method provided by the embodiment of the present application, which provides a real-time online method for determining whether a newly added user account is an abnormal account. As shown in Figure 4, the technical solution is as follows:
  • Step S401 acquiring a plurality of user accounts and device attribute information associated with the user accounts, and determining a user association relationship between each user account in the plurality of user accounts according to the device attribute information;
  • Step S402, obtaining the business data corresponding to each user account, taking each user account as a user node, the business data corresponding to each user account as the attribute feature of the user node, and the user association relationship as an edge, using an unsupervised inductive learning model for training to obtain the node vector of each user node, and outputting the trained graph model file;
  • Step S403 clustering the node vectors of each user node through a density-based spatial clustering algorithm to obtain multiple clusters, and output the trained clustering model file;
  • Step S404, calculating the average value of the business data of all user accounts in each cluster, marking the clusters according to the calculation results and the preset logical judgment conditions, and determining the user accounts in the clusters marked as abnormal to be abnormal accounts;
  • Step S405 obtain the newly added user node in real time, output the node vector through the training model recorded in the graph model file, and calculate the cluster cluster to which the node vector belongs through the training model recorded in the cluster model file to determine Whether the user account corresponding to the newly added user node is an abnormal account.
  • In one embodiment, step S405 is performed after step S403, that is, after the trained graph model file and clustering model file are output, they are used for real-time online abnormal account identification.
  • The graph model file and the clustering model file can be stored in the cache.
  • For a newly added user account, the trained model recorded in the graph model file outputs the node vector of the user account, and the trained model recorded in the clustering model file calculates the cluster to which the node vector belongs; if it falls in a cluster of abnormal accounts, the newly added user account is determined to be an abnormal account, and corresponding risk control processing is performed.
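  • The text does not specify how the stored clustering model assigns a new node vector to a cluster. One possible rule, shown here purely as an assumption, is to assign the vector to the cluster of the nearest stored core point within the DBSCAN radius and to leave it unassigned otherwise. The stored core points, the radius of 0.5, and the set of abnormal cluster ids are hypothetical.

```python
import math


def assign_cluster(vector, core_points, eps):
    """Return the cluster id of the nearest stored core point within eps, else None."""
    best_cluster, best_dist = None, eps
    for point, cluster in core_points:
        d = math.dist(vector, point)
        if d <= best_dist:
            best_cluster, best_dist = cluster, d
    return best_cluster


# Hypothetical contents of a stored clustering model: (core point, cluster id).
core_points = [((0.0, 0.0), 0), ((5.0, 5.0), 1)]
abnormal_clusters = {1}  # cluster ids previously marked as abnormal


def is_abnormal(vector, eps=0.5):
    """Flag a new node vector if it falls into a cluster marked as abnormal."""
    return assign_cluster(vector, core_points, eps) in abnormal_clusters
```

  • A vector that lies far from every stored core point is treated as belonging to no cluster and is therefore not flagged under this rule.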
  • In another embodiment, step S405 is performed after step S404, that is, after abnormal accounts have been identified among the currently processed user accounts, the newly added user node is evaluated to determine whether its corresponding user account is an abnormal account.
  • Steps S403 to S405 may be executed in the order of step S403, step S404, step S405, or step S404 and step S405 may be executed in parallel; the specific execution order is not limited.
  • FIG. 5 is a structural block diagram of an abnormal account identification device provided by an embodiment of the present application.
  • the device is used to implement the abnormal account identification method provided in the above embodiment, and has corresponding functional modules and beneficial effects for executing the method.
  • the device specifically includes: a data acquisition module 101, a user association determination module 102, a vector calculation module 103, a cluster calculation module 104 and a result analysis module 105, wherein,
  • a data acquisition module 101 configured to acquire multiple user accounts and device attribute information associated with the user accounts, as well as business data corresponding to each user account;
  • a user association relationship determination module 102 configured to determine a user association relationship between each user account among the plurality of user accounts according to the device attribute information
  • the vector calculation module 103 is configured to use each user account as a user node, the service data corresponding to each user account is the attribute feature of the user node, and the user association relationship is an edge, which is calculated by a graph convolutional network algorithm The node vector of each user node;
  • a clustering calculation module 104 configured to perform clustering based on the node vector of each user node
  • the result analysis module 105 is configured to determine an abnormal account according to the clustering result.
  • the user association determination module 102 is specifically configured to:
  • the user association relationship between each user account is determined according to the device attribute association relationship.
  • the vector calculation module 103 is specifically configured to:
  • the node vector of each user node is obtained by training with an inductive learning model of unsupervised learning.
  • the parameter setting of the inductive learning model of the unsupervised learning includes:
  • the aggregation method uses a long short-term memory neural network for aggregation;
  • one-hop neighbor nodes for a first preset number of times and two-hop neighbor nodes for a second preset number of times are extracted, and the second preset number of times is greater than the first preset number of times.
  • the cluster calculation module 104 is specifically configured to:
  • the node vectors of each user node are clustered through a density-based spatial clustering algorithm to obtain multiple clusters.
  • the result analysis module 105 is specifically used for:
  • the vector calculation module 103 is also used for:
  • the cluster calculation module 104 is also used for:
  • the trained clustering model file is output.
  • the data acquisition module 101 is also used to acquire newly added user nodes in real time, and the vector calculation module 103 is also used to output node vectors through the training model recorded in the graph model file;
  • the clustering calculation module 104 is also used to calculate the clustering cluster to which the node vector belongs through the training model recorded in the clustering model file, so that the result analysis module 105 can determine the corresponding Whether the user account is an abnormal account.
  • FIG. 6 is a schematic structural diagram of an abnormal account identification device provided in the embodiment of the present application.
  • the device includes a processor 201, a memory 202, an input device 203, and an output device 204;
  • the number of processors 201 in the device may be one or more; FIG. 6 takes one processor 201 as an example.
  • the processor 201, memory 202, input device 203 and output device 204 in the device can be connected by a bus or in other ways; FIG. 6 takes a bus connection as an example.
  • the memory 202 can be used to store software programs, computer-executable programs and modules, such as program instructions/modules corresponding to the abnormal account identification method in the embodiment of the present application.
  • the processor 201 executes various functional applications and data processing of the device by running the software programs, instructions and modules stored in the memory 202, that is, realizes the above-mentioned abnormal account identification method.
  • the input device 203 can be used to receive numeric or character input, and to generate key signal input related to user settings and function control of the device.
  • the output device 204 may include a display device such as a display screen.
  • the embodiment of the present application also provides a storage medium containing computer-executable instructions, the computer-executable instructions are used to perform an abnormal account identification method when executed by a computer processor, and the method includes:
  • the node vector of each user node is calculated by means of a graph convolutional network algorithm
  • Clustering is performed based on the node vector of each user node, and an abnormal account is determined according to the clustering result.

Abstract

Disclosed in embodiments of the present application are an abnormal account identification method, apparatus and device, and a storage medium. The method comprises: obtaining a plurality of user accounts and device attribute information associated with the user accounts, and determining a user association relationship between the plurality of user accounts according to the device attribute information; obtaining service data corresponding to each user account, and calculating to obtain a node vector of each user node by means of a graph convolutional network algorithm by taking each user account as a user node, the service data corresponding to each user account as a user node attribute feature, and the user association relationship as an edge; and performing clustering on the basis of the node vector of each user node, and determining an abnormal account according to a clustering result. According to the present solution, abnormal users can be efficiently identified in batches, and the identification accuracy and the identification efficiency are high.

Description

Abnormal account identification method, apparatus, device, and storage medium
Technical field
The embodiments of the present application relate to the field of computers, and in particular to an abnormal account identification method, apparatus, device, and storage medium.
Background
In the prior art, abnormal user accounts are usually identified either with a machine learning classification algorithm or with community mining based on a graph algorithm. A classification algorithm learns the features of known abnormal user accounts in order to predict further abnormal accounts, but it tends to ignore the community characteristics of accounts. For example, if account A and account B are active on the same device, they can be regarded as operated by the same natural person; however, if account A has already cheated and account B has not yet cheated, account B is difficult to predict. In graph-based community mining, accounts A and B are connected into one community on the basis of their shared attributes, and the whole community is then judged to be abnormal. However, building the graph nodes and mining communities in this way requires constructing a graph from user and device environment data over a historical period, and then classifying and predicting the community type of the users in the graph. Because the volume of historical data is large and training takes a long time, most community partitioning is applied in offline scenarios and cannot accurately classify newly added nodes that are not present in the graph.
Summary of the invention
Embodiments of the present application provide an abnormal account identification method, apparatus, device, and storage medium. The solution can efficiently identify abnormal users in batches, with higher identification accuracy and efficiency.
In a first aspect, an embodiment of the present application provides an abnormal account identification method, the method comprising:
obtaining a plurality of user accounts and device attribute information associated with the user accounts, and determining a user association relationship between the plurality of user accounts according to the device attribute information;
obtaining business data corresponding to each user account and, taking each user account as a user node, the business data corresponding to each user account as the user node attribute feature, and the user association relationship as an edge, calculating a node vector of each user node by means of a graph convolutional network algorithm; and
performing clustering based on the node vector of each user node, and determining abnormal accounts according to the clustering result.
In a second aspect, an embodiment of the present application further provides an abnormal account identification apparatus, the apparatus comprising:
a data acquisition module, configured to obtain a plurality of user accounts, device attribute information associated with the user accounts, and business data corresponding to each user account;
a user association relationship determination module, configured to determine a user association relationship between the plurality of user accounts according to the device attribute information;
a vector calculation module, configured to take each user account as a user node, the business data corresponding to each user account as the user node attribute feature, and the user association relationship as an edge, and to calculate a node vector of each user node by means of a graph convolutional network algorithm;
a clustering calculation module, configured to perform clustering based on the node vector of each user node; and
a result analysis module, configured to determine abnormal accounts according to the clustering result.
In a third aspect, an embodiment of the present application further provides an abnormal account identification device, the device comprising:
one or more processors; and
a storage apparatus for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the abnormal account identification method described in the embodiments of the present application.
In a fourth aspect, an embodiment of the present application further provides a storage medium storing computer-executable instructions which, when executed by a computer processor, perform the abnormal account identification method described in the embodiments of the present application.
In the embodiments of the present application, a plurality of user accounts and the device attribute information associated with them are obtained, and the user association relationship between the user accounts is determined according to the device attribute information. The business data corresponding to each user account is then obtained. With each user account as a user node, the business data corresponding to each user account as the user node attribute feature, and the user association relationship as an edge, the node vector of each user node is calculated by means of a graph convolutional network algorithm. Clustering is then performed on the node vectors, and abnormal accounts are determined according to the clustering result. In this way abnormal users can be identified efficiently in batches, with higher identification accuracy and efficiency.
Brief description of the drawings
FIG. 1 is a flowchart of an abnormal account identification method provided by an embodiment of the present application;
FIG. 1a is a schematic diagram of the association between user accounts and device attribute information provided by an embodiment of the present application;
FIG. 2 is a flowchart of another abnormal account identification method provided by an embodiment of the present application;
FIG. 2a is a schematic diagram of the framework of a graph convolutional network algorithm provided by an embodiment of the present application;
FIG. 3 is a flowchart of another abnormal account identification method provided by an embodiment of the present application;
FIG. 4 is a flowchart of another abnormal account identification method provided by an embodiment of the present application;
FIG. 5 is a structural block diagram of an abnormal account identification apparatus provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a device provided by an embodiment of the present application.
Detailed description
FIG. 1 is a flowchart of an abnormal account identification method provided by an embodiment of the present application. This embodiment is applicable to detecting and identifying abnormal accounts during user login, registration, social interaction, and other stages of using application software. Abnormal accounts are malicious accounts, sockpuppet accounts, protocol accounts, and the like; unlike the accounts of normal users, they exhibit behaviors such as batch operations, order brushing, and malicious operations. The method can be executed by a computing device such as a server or a system application host, and specifically includes the following steps:
Step S101: obtaining a plurality of user accounts and device attribute information associated with the user accounts, and determining a user association relationship between the plurality of user accounts according to the device attribute information;
Step S102: obtaining business data corresponding to each user account and, taking each user account as a user node, the business data corresponding to each user account as the user node attribute feature, and the user association relationship as an edge, calculating a node vector of each user node by means of a graph convolutional network algorithm; and
Step S103: performing clustering based on the node vector of each user node, and determining abnormal accounts according to the clustering result.
A user account may be the account a user uses when running a piece of software or logging in to a forum or video website, for example the unique user ID (UID) assigned at registration. One user can register one or more user accounts; the accounts can log in from the same or different login devices, and the network address used for each login may be the same or different. After logging in with a user account, the user can perform related operations such as sending bullet comments, posting comments, and following streamers.
In one embodiment, a plurality of user accounts and the device attribute information associated with them are obtained first. The user accounts and device attribute information may be information recorded by the system backend during user registration, login, and use. The device attribute information is data associated with a user account, such as the login device, the IP address used, and the bound mobile phone number. Optionally, user accounts and device attribute information are obtained for a time window, for example three months, in which case the user accounts active within the last three months and their associated device attribute information are obtained.
The user accounts and device attribute information may be stored in the form of a database table. An example of the record format and content is shown in the following table:
User account    Login device    IP address    Login time
uid1            device 1        ip1           aaa
uid2            device 1        ip1           bbb
uid3            device 2        ip2           ccc
uid4            device 1        ip1           ddd
uid1            device 1        ip3           eee
uid3            device 2        ip3           fff
...             ...             ...           ...
In one embodiment, the user association relationship between the plurality of user accounts is determined according to the device attribute information. The user association relationship indicates whether two users are associated. Two user accounts are associated if they have used the same login device, IP address, mobile phone number, or the like, that is, if they share any identical device attribute information. If they do, the two accounts are determined to be associated; if they do not, the user association relationship between the two accounts is determined to be non-associated.
Specifically, determining the user association relationship between the plurality of user accounts according to the device attribute information includes: determining a device attribute association relationship between each of the user accounts and the device attribute information; and determining the user association relationship between the user accounts according to the device attribute association relationship. The device attribute association relationship indicates whether a given user account is associated with a given item of device attribute information: if a user account has logged in through a certain login device or from a certain IP address, a device attribute association relationship exists between that account and that device and IP address; otherwise no such relationship exists. Taking the records in the table above as an example, uid1 has logged in using ip1, ip3 and device 1, so uid1 is associated with ip1, ip3 and device 1; uid2 has logged in using ip1 and device 1, so uid2 is associated with ip1 and device 1; uid3 has logged in using ip2, ip3 and device 2, so uid3 is associated with ip2, ip3 and device 2; uid4 has logged in using ip1 and device 1, so uid4 is associated with ip1 and device 1. This can be represented as a graph; see FIG. 1a, which is a schematic diagram of the association between user accounts and device attribute information provided by an embodiment of the present application. The user association relationship between user accounts is determined on the basis of the device attribute association relationship. Specifically, when determining whether two user accounts are associated, the judgment condition is that the two accounts are associated if they share one or more identical items of device attribute information. Taking FIG. 1a as an example, uid1, uid2 and uid4 are each associated with ip1, that is, they share the same device attribute information (ip1), so uid1, uid2 and uid4 are determined to be associated; uid1 and uid3 are both associated with ip3, so uid1 and uid3 are determined to be associated. Optionally, after the user association relationship is determined, it can be stored separately as a list in a database or cache, or merged into the previously stored database table.
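The two-step derivation described above (account-to-attribute associations first, then account-to-account associations) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the record layout, `LOGIN_RECORDS`, and `build_user_edges` are hypothetical names, and only the login device and IP address from the example table are used.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical login records mirroring the example table: (account, device, ip).
LOGIN_RECORDS = [
    ("uid1", "device1", "ip1"),
    ("uid2", "device1", "ip1"),
    ("uid3", "device2", "ip2"),
    ("uid4", "device1", "ip1"),
    ("uid1", "device1", "ip3"),
    ("uid3", "device2", "ip3"),
]


def build_user_edges(records):
    """Return the set of associated account pairs (undirected edges)."""
    # Step 1: device attribute association, attribute value -> accounts seen with it.
    accounts_by_attribute = defaultdict(set)
    for account, device, ip in records:
        accounts_by_attribute[("device", device)].add(account)
        accounts_by_attribute[("ip", ip)].add(account)

    # Step 2: accounts sharing at least one attribute value are associated.
    edges = set()
    for accounts in accounts_by_attribute.values():
        for a, b in combinations(sorted(accounts), 2):
            edges.add((a, b))
    return edges


user_edges = build_user_edges(LOGIN_RECORDS)
print(sorted(user_edges))
```

On the example records this links uid1, uid2 and uid4 (shared ip1 and device 1) and links uid1 with uid3 (shared ip3), which agrees with the associations read off FIG. 1a in the text.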
Business data refers to data on business attributes related to a user account. Taking a live-streaming application as an example, the business data may include: the user's country code, the model of the device used for registration, the number of private chat messages sent within 3 days of registration, the number of users to whom private chat messages were sent within 3 days of registration, the number of other users followed within 3 days of registration, the live-stream viewing time within 3 days of registration, the gifts given within 3 days of registration, and so on. In one embodiment, a total of 52 dimensions of business data are selected, forming a 52-dimensional attribute feature that can be represented as a vector.
In one embodiment, after the user accounts and business data are obtained and the user association relationship is determined, a graph convolutional network algorithm is used to calculate the node vector of each user account. Specifically, each user account is taken as a user node, the business data corresponding to each user account as the user node attribute feature, and the user association relationship as an edge, and the node vector of each user node is calculated by the graph convolutional network algorithm. For user nodes, the uid can be converted into an index. The business data, that is, the user node attribute features, can be converted into numerical variables with a label encoder (labelencoder, a string encoding function) to form an attribute vector such as (2, 53, 234, 1, …, 4). The user association relationship is represented by constructing a connecting edge between two associated user accounts.
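The conversion of accounts to indices and of business data to numeric attribute vectors can be illustrated with the sketch below. It uses a small hand-written `label_encode` helper as a stand-in for a library label encoder; the helper, the sample values, and the three feature columns are assumptions made for illustration (the embodiment uses 52 dimensions).

```python
def label_encode(values):
    """Map each distinct value to an integer code, assigned in sorted order."""
    codes = {value: code for code, value in enumerate(sorted(set(values)))}
    return [codes[value] for value in values], codes


# Hypothetical business data for three accounts (3 illustrative dimensions only).
accounts = ["uid1", "uid2", "uid3"]
country_code = ["SG", "CN", "SG"]              # categorical
device_model = ["modelB", "modelA", "modelB"]  # categorical
messages_sent = [120, 3, 98]                   # already numeric

# uid -> node index
node_index = {uid: i for i, uid in enumerate(accounts)}

country_ids, _ = label_encode(country_code)
model_ids, _ = label_encode(device_model)

# One attribute vector per user node, ordered by node index.
features = [list(row) for row in zip(country_ids, model_ids, messages_sent)]

# An edge between two associated accounts, expressed with node indices.
edges = [(node_index["uid1"], node_index["uid2"])]
print(features, edges)
```

The resulting `features` and `edges` are the kind of inputs a graph model would consume: a feature row per node and index pairs for the edges.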
The graph convolutional network algorithm may be a spectral-domain (frequency-domain) or a spatial-domain algorithm. Examples of spectral-domain algorithms include ChebNet and GCN. An example of a spatial-domain algorithm is the GraphSAGE model. Taking GraphSAGE as an example, the model is trained on the user nodes, the user node attribute feature vectors, and the edge relationships described above, and the embedding vector of each user node is calculated.
In one embodiment, a clustering algorithm is applied to the node vectors of the user nodes to obtain a clustering result, for example a plurality of clusters. The clustering algorithm may be, for example, the k-means clustering algorithm, a hierarchical clustering algorithm, the SOM clustering algorithm, or the FCM clustering algorithm.
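As an illustration of clustering node vectors, here is a minimal k-means sketch in plain Python. It is only one of the algorithms named above, and the 2-dimensional vectors and the choice of initial centroids are hypothetical; a production system would more likely rely on a library implementation.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical 2-dimensional node vectors, for illustration only.
node_vectors = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
initial_centroids = [list(node_vectors[0]), list(node_vectors[3])]
labels, centroids = kmeans(node_vectors, initial_centroids)
print(labels)
```

Each entry of `labels` is the cluster id of the corresponding node vector, which is the input the later analysis step needs.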
After the clustering result is obtained, abnormal accounts are determined from it. Specifically, abnormal accounts may be determined in any one or more of the following ways: treating the user accounts in a cluster that contains already confirmed abnormal accounts as abnormal accounts; analyzing the business data of the user accounts in each cluster and determining abnormal accounts from the analysis result; or treating the user accounts in clusters that have been marked through manual review as abnormal accounts.
In one embodiment, determining abnormal accounts according to the clustering result includes: calculating the average value of the business data of all user accounts in each cluster, labeling each cluster according to the calculation result and a preset logical judgment condition, and determining the user accounts in clusters labeled abnormal to be abnormal accounts. For example, taking the average number of follows and the average viewing time as labeling conditions, these two averages are computed over the user accounts in each cluster; if a cluster's values differ markedly from those of the other clusters, the user accounts in that cluster are determined to be abnormal accounts.
Correspondingly, after abnormal accounts are determined, the corresponding risk control measures are applied to them.
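The cluster labeling step can be sketched as below. The thresholds and the rule "many follows but almost no viewing time" are hypothetical examples of a preset logical judgment condition; the text does not specify the actual condition, and the account data is invented for illustration.

```python
from statistics import mean

# Hypothetical per-account business data: (cluster id, follow count, viewing minutes).
ACCOUNTS = {
    "uid1": (0, 4, 35.0),
    "uid2": (0, 6, 42.0),
    "uid3": (1, 250, 0.5),
    "uid4": (1, 310, 0.2),
    "uid5": (2, 3, 28.0),
}

# Hypothetical preset judgment condition: many follows but almost no viewing time.
MAX_AVG_FOLLOWS = 100
MIN_AVG_VIEW_MINUTES = 1.0


def find_abnormal_accounts(accounts):
    """Label each cluster by its averages and return the accounts in abnormal clusters."""
    by_cluster = {}
    for uid, (cluster, follows, minutes) in accounts.items():
        by_cluster.setdefault(cluster, []).append((uid, follows, minutes))

    abnormal = []
    for members in by_cluster.values():
        avg_follows = mean(m[1] for m in members)
        avg_minutes = mean(m[2] for m in members)
        if avg_follows > MAX_AVG_FOLLOWS and avg_minutes < MIN_AVG_VIEW_MINUTES:
            abnormal.extend(m[0] for m in members)
    return sorted(abnormal)


abnormal_accounts = find_abnormal_accounts(ACCOUNTS)
print(abnormal_accounts)
```

Because the decision is made per cluster, every account in a flagged cluster is returned, which is what allows batch identification.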
As can be seen from the above solution, a plurality of user accounts and the device attribute information associated with them are obtained, and the user association relationship between the user accounts is determined according to the device attribute information. The business data corresponding to each user account is then obtained. With each user account as a user node, the business data corresponding to each user account as the user node attribute feature, and the user association relationship as an edge, the node vector of each user node is calculated by means of a graph convolutional network algorithm. Clustering is then performed on the node vectors, and abnormal accounts are determined according to the clustering result. In this way abnormal users can be identified efficiently in batches, with higher identification accuracy and efficiency.
FIG. 2 is a flowchart of another abnormal account identification method provided by an embodiment of the present application, which gives a specific method for calculating the node vector of each user node by means of a graph convolutional network algorithm. As shown in FIG. 2, the technical solution is as follows:
Step S201: obtaining a plurality of user accounts and device attribute information associated with the user accounts, and determining a user association relationship between the plurality of user accounts according to the device attribute information;
Step S202: obtaining business data corresponding to each user account and, taking each user account as a user node, the business data corresponding to each user account as the user node attribute feature, and the user association relationship as an edge, training an unsupervised inductive learning model to obtain the node vector of each user node; and
Step S203: performing clustering based on the node vector of each user node, and determining abnormal accounts according to the clustering result.
In one embodiment, an unsupervised inductive learning model is trained to obtain the node vector of each user node; for example, the GraphSAGE model is used. As an algorithmic framework, GraphSAGE makes it easy to obtain the representation of a new node, because it learns how a node's information is aggregated from the features of its neighbor nodes. In this solution, the user node attribute features and the user association relationships of the user nodes are known, so the representation of a new node can be obtained efficiently. Suppose the information of surrounding neighbor nodes is to be aggregated K times. Each aggregation aggregates the features of the neighboring user nodes obtained at the previous layer and combines the result with the user node's own feature from the previous layer to obtain the node's feature at the current layer. Repeating this aggregation K times yields the final feature of the user node; the features at the bottom layer are the input user node features. An example is shown in FIG. 2a, which is a schematic diagram of the framework of a graph convolutional network algorithm provided by an embodiment of the present application.
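The K-round aggregation described above can be sketched in plain Python. Note the simplifications, all of which are assumptions for illustration: a mean over neighbors replaces the LSTM aggregator used in the embodiment, no learned weights or nonlinearity are applied, and the tiny graph and features are hypothetical.

```python
def aggregate_layer(features, neighbors):
    """One round: concatenate each node's own feature with the mean of its neighbors' features."""
    updated = {}
    for node, own in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            nbr_mean = [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(len(own))]
        else:
            nbr_mean = [0.0] * len(own)
        # A trained model would apply a weight matrix and a nonlinearity here.
        updated[node] = own + nbr_mean
    return updated


# Hypothetical graph: uid1 is linked to uid2 and uid3.
neighbors = {"uid1": ["uid2", "uid3"], "uid2": ["uid1"], "uid3": ["uid1"]}
features = {"uid1": [1.0], "uid2": [2.0], "uid3": [4.0]}

K = 2  # number of aggregation rounds
hidden = features
for _ in range(K):
    hidden = aggregate_layer(hidden, neighbors)
print(hidden["uid1"])
```

After two rounds, each node's vector depends on nodes up to two hops away, which is the property the text relies on. Because the rule is a function of neighbor features rather than a per-node lookup table, it can be applied to a node that was not in the graph during training.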
The loss function of GraphSAGE with unsupervised representation learning is as follows:
Figure PCTCN2022096060-appb-000001
Here v_n is a negative sample drawn from the negative sampling distribution P_n(v) of node u, Q is the number of negative samples, u is the current node, v is a neighbor reachable from u by random walk, and z is the embedding vector output by the GraphSAGE model. The similarity of two embedding vectors is obtained as their dot product. Each layer of GraphSAGE uses an aggregation function to aggregate neighbor node information. This embodiment uses the LSTM aggregation method: the neighbors are first randomly ordered, and the embedding vectors of the randomly ordered neighbor sequence are then used as the LSTM input.
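The formula itself appears only as an image in this record. For reference, the unsupervised loss published for GraphSAGE, which uses the same symbols as the definitions above (σ denotes the sigmoid function), has the following form; that the application's figure is identical to it is an assumption.

```latex
J_G(z_u) = -\log\!\big(\sigma(z_u^{\top} z_v)\big)
           - Q \cdot \mathbb{E}_{v_n \sim P_n(v)} \log\!\big(\sigma(-z_u^{\top} z_{v_n})\big)
```

The first term pulls together the embeddings of nodes that co-occur on random walks, and the second term pushes the embedding of u away from the embeddings of its Q negative samples.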
In one embodiment, the parameter settings of the unsupervised inductive learning model include: aggregating neighbor node features within two hops, with a long short-term memory (LSTM) neural network as the aggregation method; and, when sampling user nodes, sampling a first preset number of one-hop neighbor nodes and a second preset number of two-hop neighbor nodes, the second preset number being greater than the first preset number. Specifically, taking the GraphSAGE model as an example, the parameter settings and their meanings are as follows:
K=2: aggregate neighbor features within two hops; S1=3 (the first preset number) and S2=5 (the second preset number): sample few one-hop neighbors and more two-hop nodes; 50 random walks of length 5 are performed from each node; 20 negative samples are drawn per node; LSTM is used for neighbor aggregation; the embedding dimension is 50. The result is a 50-dimensional embedding vector for each user node. These parameter values were obtained through repeated experiments and gave comparatively good results in identifying abnormal accounts.
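A minimal sketch of how these settings might drive neighbor sampling and random walks. The parameter values are the ones quoted above; everything else is an assumption made for illustration: the dictionary layout, the helper names, sampling with replacement, and reading S2 as the number of two-hop neighbors drawn per sampled one-hop neighbor.

```python
import random

# Values quoted from the embodiment; the key names are illustrative.
PARAMS = {
    "K": 2,                  # aggregate neighbours within two hops
    "S1": 3,                 # one-hop sample size
    "S2": 5,                 # two-hop sample size
    "walks_per_node": 50,
    "walk_length": 5,
    "negative_samples": 20,
    "aggregator": "lstm",
    "embedding_dim": 50,
}


def sample_neighbors(adjacency, node, size, rng):
    """Draw `size` neighbours with replacement, so the sample has a fixed length."""
    neigh = adjacency.get(node, [])
    if not neigh:
        return []
    return [rng.choice(neigh) for _ in range(size)]


def random_walk(adjacency, start, length, rng):
    """Walk up to `length` steps from `start`, stopping early at a dead end."""
    walk = [start]
    for _ in range(length):
        neigh = adjacency.get(walk[-1], [])
        if not neigh:
            break
        walk.append(rng.choice(neigh))
    return walk


rng = random.Random(0)
adjacency = {
    "u1": ["u2", "u3"],
    "u2": ["u1", "u3"],
    "u3": ["u1", "u2", "u4"],
    "u4": ["u3"],
}
one_hop = sample_neighbors(adjacency, "u1", PARAMS["S1"], rng)
two_hop = [sample_neighbors(adjacency, n, PARAMS["S2"], rng) for n in one_hop]
walks = [random_walk(adjacency, "u1", PARAMS["walk_length"], rng)
         for _ in range(PARAMS["walks_per_node"])]
```

Fixed-size samples keep the cost of each aggregation step bounded regardless of how many accounts share a device.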
As the above scheme shows, the service data corresponding to each user account is obtained; each user account is taken as a user node, its service data as the user node attribute features, and the user association relationships as edges; and an unsupervised inductive learning model is trained to obtain the node vector of each user node. Using the GraphSAGE model exploits its strong inductive learning capability together with unsupervised training. In the parameter settings, neighbor node features within two hops are aggregated with an LSTM aggregator, and when sampling user nodes a first preset number of one-hop neighbor nodes and a larger second preset number of two-hop neighbor nodes are drawn. This yields the node vectors of the user nodes efficiently, quickly and accurately, and ultimately improves the accuracy and efficiency of abnormal account identification.
FIG. 3 is a flowchart of another abnormal account identification method provided by an embodiment of the present application, giving a specific method of clustering based on the node vector of each user node. As shown in FIG. 3, the technical solution is as follows:
Step S301: acquire multiple user accounts and the device attribute information associated with the user accounts, and determine, according to the device attribute information, the user association relationships among the user accounts;
Step S302: acquire the service data corresponding to each user account, and, taking each user account as a user node, the service data corresponding to each user account as the user node attribute features, and the user association relationships as edges, compute the node vector of each user node by a graph convolutional network algorithm; and
Step S303: cluster the node vectors of the user nodes by a density-based spatial clustering algorithm to obtain multiple clusters, and determine abnormal accounts according to the clustering result.
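Step S301 can be pictured as linking accounts that share a device attribute value. The sketch below is a simplified illustration under that assumption; the rule "shared attribute implies an edge", the function name `build_user_edges` and the sample attribute strings are hypothetical and are not prescribed by the text above.

```python
from collections import defaultdict
from itertools import combinations


def build_user_edges(account_devices):
    """Return undirected account-account edges for accounts sharing an attribute."""
    # Invert the mapping: attribute value -> accounts seen with it.
    by_attribute = defaultdict(set)
    for account, attributes in account_devices.items():
        for attribute in attributes:
            by_attribute[attribute].add(account)

    # Every pair of accounts behind the same attribute value becomes an edge.
    edges = set()
    for accounts in by_attribute.values():
        for a, b in combinations(sorted(accounts), 2):
            edges.add((a, b))
    return edges


account_devices = {
    "alice": {"device:D1", "ip:10.0.0.1"},
    "bob": {"device:D1"},
    "carol": {"ip:10.0.0.1", "device:D2"},
    "dave": {"device:D3"},
}
edges = build_user_edges(account_devices)
```

Accounts with no shared attribute, such as "dave" here, remain isolated nodes in the resulting graph.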
Optionally, the node vector of each user node may be computed by the graph convolutional network algorithm by training an unsupervised inductive learning model. Other models may also be used, although their results are relatively worse than those of the unsupervised inductive learning model. For details of the model, see the explanation of step S202, which is not repeated here.
In one embodiment, the clustering algorithm is a density-based spatial clustering algorithm, specifically DBSCAN (Density-Based Spatial Clustering of Applications with Noise). DBSCAN partitions regions of sufficient density into clusters and can find clusters of arbitrary shape in spatial data containing noise; it defines a cluster as a maximal set of density-connected points. Specifically, DBSCAN is fitted on the embedding vectors determined for the user nodes and groups them according to the Euclidean distance between vectors, dividing the nodes of the whole graph into N classes. The embedding vectors of abnormal accounts are densely concentrated and are therefore assigned to the same cluster.
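To make the density-based grouping concrete, here is a textbook DBSCAN in plain Python using Euclidean distance. It is a sketch for illustration, not the implementation used in the embodiment: neighbor search is brute force, and the `eps` and `min_samples` values in the example are arbitrary assumptions, since the text above does not state them.

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_samples:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        cluster += 1                     # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
```

The two dense groups become clusters 0 and 1, and the isolated point is labelled as noise. In practice a library implementation with an indexed neighbor search would be the more likely choice for 50-dimensional embeddings.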
Accordingly, once the clusters have been obtained, the data in them is analyzed to determine abnormal accounts. Optionally, the approach mentioned in the explanation of step S103 may be used: compute the average of the service data of all user accounts in each cluster, mark the clusters according to the computed results and a preset logical judgment condition, and determine the user accounts in clusters marked as abnormal to be abnormal accounts. As a concrete example, let the service data be the average follow count and the logical judgment condition be that clusters whose average exceeds a preset average follow count are marked, the preset value being, for example, 200. Suppose 50 clusters have been determined and, after averaging the follow counts of the users in each cluster, clusters 20 and 31 are found to have average follow counts of 300 and 500 respectively; clusters 20 and 31 are then marked. Note that this condition marks clusters on the basis of a single kind of service data; a combined judgment over several kinds of service data is also possible, and the specific type of service data is not limited. Once clusters 20 and 31 are marked, the user accounts in them are determined to be abnormal accounts.
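The worked example above (threshold 200, clusters 20 and 31 averaging 300 and 500) can be sketched as follows. The function name, the account identifiers and the individual follow counts are made up for illustration; only the threshold and the two cluster averages come from the text.

```python
from collections import defaultdict
from statistics import mean


def flag_abnormal_accounts(cluster_of, follow_count, threshold):
    """Return (flagged cluster ids, accounts belonging to those clusters)."""
    members = defaultdict(list)
    for account, cluster in cluster_of.items():
        members[cluster].append(account)

    # Mark every cluster whose average follow count exceeds the threshold.
    flagged = {
        cluster
        for cluster, accounts in members.items()
        if mean(follow_count[a] for a in accounts) > threshold
    }
    abnormal = {a for a, c in cluster_of.items() if c in flagged}
    return flagged, abnormal


cluster_of = {"u1": 20, "u2": 20, "u3": 31, "u4": 7, "u5": 7}
follow_count = {"u1": 250, "u2": 350, "u3": 500, "u4": 40, "u5": 60}
flagged, abnormal = flag_abnormal_accounts(cluster_of, follow_count, threshold=200)
```

A combined condition over several kinds of service data would replace the single comparison with a predicate over several per-cluster averages.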
As the above scheme shows, the node vectors of the user nodes are clustered by a density-based spatial clustering algorithm to obtain multiple clusters, and abnormal accounts are determined according to the clustering result. Applying the DBSCAN algorithm to the clustering step is effective because the algorithm partitions regions of sufficient density into clusters and finds clusters of arbitrary shape in noisy spatial data. The node vectors of the user account nodes can therefore be clustered efficiently, which makes the final identification of abnormal accounts efficient and fast.
FIG. 4 is a flowchart of another abnormal account identification method provided by an embodiment of the present application, giving a method for determining online and in real time whether a newly added user account is an abnormal account. As shown in FIG. 4, the technical solution is as follows:
Step S401: acquire multiple user accounts and the device attribute information associated with the user accounts, and determine, according to the device attribute information, the user association relationships among the user accounts;
Step S402: acquire the service data corresponding to each user account; taking each user account as a user node, the service data corresponding to each user account as the user node attribute features, and the user association relationships as edges, train an unsupervised inductive learning model to obtain the node vector of each user node, and output the trained graph model file;
Step S403: cluster the node vectors of the user nodes by a density-based spatial clustering algorithm to obtain multiple clusters, and output the trained clustering model file;
Step S404: compute the average of the service data of all user accounts in each cluster, mark the clusters according to the computed results and a preset logical judgment condition, and determine the user accounts in clusters marked as abnormal to be abnormal accounts; and
Step S405: acquire a newly added user node in real time, output its node vector through the trained model recorded in the graph model file, and compute the cluster to which the node vector belongs through the trained model recorded in the clustering model file, so as to determine whether the user account corresponding to the newly added user node is an abnormal account.
In one embodiment, step S405 is executed after step S403, that is, after the trained graph model file and the trained clustering model file have been output; the two files are output for use in real-time online identification of abnormal accounts. For example, the graph model file and the clustering model file may be stored in a cache. When a user node is added, the trained model recorded in the graph model file outputs the node vector of that user account, and the trained model recorded in the clustering model file computes the cluster to which the node vector belongs. If the vector falls into a cluster of abnormal accounts, the newly added user account is determined to be an abnormal account and the corresponding risk control handling is carried out.
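The text does not say how the clustering model file maps a new node vector to a cluster, and DBSCAN by itself has no prediction step for unseen points. One possible approach, shown here purely as an assumption, is to store the core points with their labels and give a new vector the label of the nearest core point within `eps`, treating it as noise otherwise. All names and values below are hypothetical.

```python
import math

NOISE = -1


def assign_cluster(vector, core_points, core_labels, eps):
    """Return the label of the nearest stored core point within eps, else NOISE."""
    best_label, best_dist = NOISE, eps
    for point, label in zip(core_points, core_labels):
        d = math.dist(vector, point)
        if d <= best_dist:
            best_label, best_dist = label, d
    return best_label


def is_abnormal(vector, core_points, core_labels, eps, abnormal_clusters):
    """True if the vector falls into a cluster previously marked as abnormal."""
    return assign_cluster(vector, core_points, core_labels, eps) in abnormal_clusters


# Stand-in for the contents of a cached clustering model file.
core_points = [(0.0, 0.0), (10.0, 10.0)]
core_labels = [0, 1]
abnormal_clusters = {1}

hit = is_abnormal((10.5, 10.2), core_points, core_labels, 1.5, abnormal_clusters)
miss = is_abnormal((0.3, 0.1), core_points, core_labels, 1.5, abnormal_clusters)
far = is_abnormal((50.0, 50.0), core_points, core_labels, 1.5, abnormal_clusters)
```

Only the first vector lands in the abnormal cluster; the third is too far from any core point and is treated as noise rather than flagged.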
In another embodiment, step S405 is executed after step S404; that is, after abnormal accounts have been identified among the user accounts currently being processed, the trained graph model file and clustering model file that were output are further used to judge a newly added user node and determine whether its user account is an abnormal account. Steps S403 to S405 may be executed sequentially (step S403, then step S404, then step S405), or steps S404 and S405 may be executed in parallel; the specific execution order is not limited.
As the above scheme shows, a newly added user node is acquired in real time, its node vector is output through the trained model recorded in the graph model file, and the cluster to which the node vector belongs is computed through the trained model recorded in the clustering model file, so as to determine whether the user account corresponding to the newly added user node is an abnormal account. The graph model file is obtained by unsupervised GraphSAGE training, and the clustering model file is obtained by clustering the node vectors with the DBSCAN algorithm. This enables real-time, online identification of whether a user account is abnormal.
FIG. 5 is a structural block diagram of an abnormal account identification apparatus provided by an embodiment of the present application. The apparatus is used to execute the abnormal account identification method provided by the above embodiments and has the functional modules and beneficial effects corresponding to the method. As shown in FIG. 5, the apparatus comprises a data acquisition module 101, a user association relationship determination module 102, a vector calculation module 103, a cluster calculation module 104 and a result analysis module 105, wherein:
the data acquisition module 101 is configured to acquire multiple user accounts, the device attribute information associated with the user accounts, and the service data corresponding to each user account;
the user association relationship determination module 102 is configured to determine, according to the device attribute information, the user association relationships among the user accounts;
the vector calculation module 103 is configured to compute the node vector of each user node by a graph convolutional network algorithm, taking each user account as a user node, the service data corresponding to each user account as the user node attribute features, and the user association relationships as edges;
the cluster calculation module 104 is configured to perform clustering based on the node vector of each user node; and
the result analysis module 105 is configured to determine abnormal accounts according to the clustering result.
As the above scheme shows, multiple user accounts and the device attribute information associated with them are acquired, and the user association relationships among the user accounts are determined according to the device attribute information. The service data corresponding to each user account is then acquired; with each user account as a user node, its service data as the user node attribute features and the user association relationships as edges, the node vector of each user node is computed by a graph convolutional network algorithm. Clustering is performed based on these node vectors, and abnormal accounts are determined according to the clustering result. Abnormal users can thus be identified efficiently in batches, with higher identification accuracy and efficiency.
In a possible embodiment, the user association relationship determination module 102 is specifically configured to:
determine the device attribute association relationship between each of the user accounts and the device attribute information; and
determine the user association relationships among the user accounts according to the device attribute association relationships.
In a possible embodiment, the vector calculation module 103 is specifically configured to:
train an unsupervised inductive learning model to obtain the node vector of each user node.
In a possible embodiment, the parameter settings of the unsupervised inductive learning model include:
aggregating neighbor node features within two hops, using a long short-term memory neural network as the aggregator; and
when sampling user nodes, drawing a first preset number of one-hop neighbor nodes and a second preset number of two-hop neighbor nodes, the second preset number being greater than the first preset number.
In a possible embodiment, the cluster calculation module 104 is specifically configured to:
cluster the node vectors of the user nodes by a density-based spatial clustering algorithm to obtain multiple clusters.
In a possible embodiment, the result analysis module 105 is specifically configured to:
compute the average of the service data of all user accounts in each cluster, and mark the clusters according to the computed results and a preset logical judgment condition; and
determine the user accounts in clusters marked as abnormal to be abnormal accounts.
In a possible embodiment, the vector calculation module 103 is further configured to:
output the trained graph model file after the node vector of each user node has been computed by the graph convolutional network algorithm;
and the cluster calculation module 104 is further configured to:
output the trained clustering model file after clustering has been performed based on the node vector of each user node.
In a possible embodiment, the data acquisition module 101 is further configured to acquire a newly added user node in real time; the vector calculation module 103 is further configured to output a node vector through the trained model recorded in the graph model file; and the cluster calculation module 104 is further configured to compute, through the trained model recorded in the clustering model file, the cluster to which the node vector belongs, so that the result analysis module 105 can determine whether the user account corresponding to the newly added user node is an abnormal account.
FIG. 6 is a schematic structural diagram of an abnormal account identification device provided by an embodiment of the present application. As shown in FIG. 6, the device includes a processor 201, a memory 202, an input apparatus 203 and an output apparatus 204. The device may contain one or more processors 201; FIG. 6 takes one processor 201 as an example. The processor 201, memory 202, input apparatus 203 and output apparatus 204 may be connected by a bus or by other means; FIG. 6 takes a bus connection as an example. As a computer-readable storage medium, the memory 202 can store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the abnormal account identification method in the embodiments of the present application. By running the software programs, instructions and modules stored in the memory 202, the processor 201 executes the various functional applications and data processing of the device, that is, implements the abnormal account identification method described above. The input apparatus 203 can receive input numeric or character information and generate key signal inputs related to the user settings and function control of the device. The output apparatus 204 may include a display device such as a display screen.
An embodiment of the present application further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform an abnormal account identification method, the method comprising:
acquiring multiple user accounts and the device attribute information associated with the user accounts, and determining, according to the device attribute information, the user association relationships among the user accounts;
acquiring the service data corresponding to each user account, and, taking each user account as a user node, the service data corresponding to each user account as the user node attribute features, and the user association relationships as edges, computing the node vector of each user node by a graph convolutional network algorithm; and
performing clustering based on the node vector of each user node, and determining abnormal accounts according to the clustering result.

Claims (11)

  1. An abnormal account identification method, characterized by comprising:
    acquiring multiple user accounts and device attribute information associated with the user accounts, and determining, according to the device attribute information, user association relationships among the user accounts;
    acquiring service data corresponding to each user account, and, taking each user account as a user node, the service data corresponding to each user account as user node attribute features, and the user association relationships as edges, computing a node vector of each user node by a graph convolutional network algorithm; and
    performing clustering based on the node vector of each user node, and determining abnormal accounts according to the clustering result.
  2. The abnormal account identification method according to claim 1, characterized in that determining, according to the device attribute information, the user association relationships among the user accounts comprises:
    determining a device attribute association relationship between each of the user accounts and the device attribute information; and
    determining the user association relationships among the user accounts according to the device attribute association relationships.
  3. The abnormal account identification method according to claim 1, characterized in that computing the node vector of each user node by the graph convolutional network algorithm comprises:
    training an unsupervised inductive learning model to obtain the node vector of each user node.
  4. The abnormal account identification method according to claim 3, characterized in that the parameter settings of the unsupervised inductive learning model comprise:
    aggregating neighbor node features within two hops, using a long short-term memory neural network as the aggregator; and
    when sampling user nodes, drawing a first preset number of one-hop neighbor nodes and a second preset number of two-hop neighbor nodes, the second preset number being greater than the first preset number.
  5. The abnormal account identification method according to claim 1, characterized in that performing clustering based on the node vector of each user node comprises:
    clustering the node vectors of the user nodes by a density-based spatial clustering algorithm to obtain multiple clusters.
  6. The abnormal account identification method according to claim 5, characterized in that determining abnormal accounts according to the clustering result comprises:
    computing the average of the service data of all user accounts in each cluster, and marking the clusters according to the computed results and a preset logical judgment condition; and
    determining the user accounts in clusters marked as abnormal to be abnormal accounts.
  7. The abnormal account identification method according to any one of claims 1 to 6, characterized in that, after the node vector of each user node is computed by the graph convolutional network algorithm, the method further comprises:
    outputting a trained graph model file;
    and, after the clustering is performed based on the node vector of each user node, the method further comprises:
    outputting a trained clustering model file.
  8. The abnormal account identification method according to claim 7, characterized by further comprising:
    acquiring a newly added user node in real time, and outputting a node vector through the trained model recorded in the graph model file; and
    computing, through the trained model recorded in the clustering model file, the cluster to which the node vector belongs, so as to determine whether the user account corresponding to the newly added user node is an abnormal account.
  9. An abnormal account identification apparatus, comprising:
    a data acquisition module, configured to acquire multiple user accounts, device attribute information associated with the user accounts, and service data corresponding to each user account;
    a user association relationship determination module, configured to determine, according to the device attribute information, user association relationships among the user accounts;
    a vector calculation module, configured to compute a node vector of each user node by a graph convolutional network algorithm, taking each user account as a user node, the service data corresponding to each user account as user node attribute features, and the user association relationships as edges;
    a cluster calculation module, configured to perform clustering based on the node vector of each user node; and
    a result analysis module, configured to determine abnormal accounts according to the clustering result.
  10. An abnormal account identification device, comprising: one or more processors; and a storage apparatus configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the abnormal account identification method according to any one of claims 1 to 8.
  11. A storage medium storing computer-executable instructions which, when executed by a computer processor, perform the abnormal account identification method according to any one of claims 1 to 8.
PCT/CN2022/096060 2021-05-28 2022-05-30 Abnormal account identification method, apparatus and device, and storage medium WO2022247955A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110593675.1A CN113378899B (en) 2021-05-28 2021-05-28 Abnormal account identification method, device, equipment and storage medium
CN202110593675.1 2021-05-28

Publications (1)

Publication Number Publication Date
WO2022247955A1 true WO2022247955A1 (en) 2022-12-01

Family

ID=77574848

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/096060 WO2022247955A1 (en) 2021-05-28 2022-05-30 Abnormal account identification method, apparatus and device, and storage medium

Country Status (2)

Country Link
CN (1) CN113378899B (en)
WO (1) WO2022247955A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116542673A (en) * 2023-07-05 2023-08-04 成都乐超人科技有限公司 Fraud identification method and system applied to machine learning

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378899B (en) * 2021-05-28 2024-05-28 百果园技术(新加坡)有限公司 Abnormal account identification method, device, equipment and storage medium
CN113806518A (en) * 2021-09-23 2021-12-17 湖北天天数链技术有限公司 Matching method and device, resume recommendation method and device
CN114581693B (en) * 2022-03-07 2023-11-03 支付宝(杭州)信息技术有限公司 User behavior mode distinguishing method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263227A (en) * 2019-05-15 2019-09-20 阿里巴巴集团控股有限公司 Clique based on figure neural network finds method and system
CN110784470A (en) * 2019-10-30 2020-02-11 上海观安信息技术股份有限公司 Method and device for determining abnormal login of user
US20200311844A1 (en) * 2019-03-27 2020-10-01 Uber Technologies, Inc. Identifying duplicate user accounts in an identification document processing system
CN111882446A (en) * 2020-07-28 2020-11-03 哈尔滨工业大学(威海) Abnormal account detection method based on graph convolution network
CN112116007A (en) * 2020-09-18 2020-12-22 四川长虹电器股份有限公司 Batch registration account detection method based on graph algorithm and clustering algorithm
CN113378899A (en) * 2021-05-28 2021-09-10 百果园技术(新加坡)有限公司 Abnormal account identification method, device, equipment and storage medium

Family Cites Families (20)

Publication number Priority date Publication date Assignee Title
CN108052543B (en) * 2017-11-23 2021-02-26 北京工业大学 Microblog similar account detection method based on graph analysis clustering
CN109936525B (en) * 2017-12-15 2020-07-31 阿里巴巴集团控股有限公司 Abnormal account number prevention and control method, device and equipment based on graph structure model
CN108418825B (en) * 2018-03-16 2021-03-19 创新先进技术有限公司 Risk model training and junk account detection methods, devices and equipment
CN109003089B (en) * 2018-06-28 2022-06-10 中国工商银行股份有限公司 Risk identification method and device
US10938853B1 (en) * 2018-08-29 2021-03-02 Amazon Technologies, Inc. Real-time detection and clustering of emerging fraud patterns
US11463472B2 (en) * 2018-10-24 2022-10-04 Nec Corporation Unknown malicious program behavior detection using a graph neural network
CN110032665B (en) * 2019-03-25 2023-11-17 创新先进技术有限公司 Method and device for determining graph node vector in relational network graph
CN110032606B (en) * 2019-03-29 2021-05-14 创新先进技术有限公司 Sample clustering method and device
CN110648195B (en) * 2019-08-28 2022-02-25 苏宁云计算有限公司 User identification method and device and computer equipment
CN112714093B (en) * 2019-10-25 2023-05-12 深信服科技股份有限公司 Account abnormity detection method, device, system and storage medium
CN111259985B (en) * 2020-02-19 2023-06-30 腾讯云计算(长沙)有限责任公司 Classification model training method and device based on business safety and storage medium
CN111309975A (en) * 2020-02-20 2020-06-19 支付宝(杭州)信息技术有限公司 Method and system for enhancing attack resistance of graph model
CN111582872A (en) * 2020-05-06 2020-08-25 支付宝(杭州)信息技术有限公司 Abnormal account detection model training method, abnormal account detection device and abnormal account detection equipment
CN111698247B (en) * 2020-06-11 2021-09-07 腾讯科技(深圳)有限公司 Abnormal account detection method, device, equipment and storage medium
CN112069398A (en) * 2020-08-24 2020-12-11 腾讯科技(深圳)有限公司 Information pushing method and device based on graph network
CN112215500B (en) * 2020-10-15 2022-06-28 支付宝(杭州)信息技术有限公司 Account relation identification method and device
CN112597439B (en) * 2020-12-07 2024-03-01 贵州财经大学 Method and system for detecting abnormal account number of online social network
CN112765373B (en) * 2021-01-29 2023-03-21 北京达佳互联信息技术有限公司 Resource recommendation method and device, electronic equipment and storage medium
CN112508691B (en) * 2021-02-04 2021-05-14 北京淇瑀信息科技有限公司 Risk prediction method and device based on relational network labeling and graph neural network
CN112818257B (en) * 2021-02-19 2022-09-02 北京邮电大学 Account detection method, device and equipment based on graph neural network

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN116542673A (en) * 2023-07-05 2023-08-04 成都乐超人科技有限公司 Fraud identification method and system applied to machine learning
CN116542673B (en) * 2023-07-05 2023-09-08 成都乐超人科技有限公司 Fraud identification method and system applied to machine learning

Also Published As

Publication number Publication date
CN113378899A (en) 2021-09-10
CN113378899B (en) 2024-05-28

Similar Documents

Publication Publication Date Title
WO2022247955A1 (en) Abnormal account identification method, apparatus and device, and storage medium
CN105808988B (en) Method and device for identifying abnormal account
US9195910B2 (en) System and method for classification with effective use of manual data input and crowdsourcing
US9390378B2 (en) System and method for high accuracy product classification with limited supervision
WO2022134794A1 (en) Method and apparatus for processing public opinions about news event, storage medium, and computer device
Hariharakrishnan et al. Survey of pre-processing techniques for mining big data
CN113992349B (en) Malicious traffic identification method, device, equipment and storage medium
CN108241867B (en) Classification method and device
CN107786388A (en) A kind of abnormality detection system based on large scale network flow data
CN110648172B (en) Identity recognition method and system integrating multiple mobile devices
WO2021104444A1 (en) Data flow classification method, apparatus and system
CN113452802A (en) Equipment model identification method, device and system
CN109783805A (en) A kind of network community user recognition methods and device
CN116881430B (en) Industrial chain identification method and device, electronic equipment and readable storage medium
CN111368060B (en) Self-learning method, device and system for conversation robot, electronic equipment and medium
US10666536B1 (en) Network asset discovery
US20220222752A1 (en) Methods for analyzing insurance data and devices thereof
CN105590232A (en) Client relation generation method and apparatus, and electronic device
CN117216736A (en) Abnormal account identification method, data scheduling platform and graph computing platform
CN114003803A (en) Method and system for discovering media account in specific region on social platform
CN112445939A (en) Social network group discovery system, method and storage medium
CN116244612B (en) HTTP traffic clustering method and device based on self-learning parameter measurement
CN114528946B (en) Autonomous domain system sibling relationship identification method
CN113987309B (en) Personal privacy data identification method and device, computer equipment and storage medium
CN116975300B (en) Information mining method and system based on big data set

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22810678; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 22810678; Country of ref document: EP; Kind code of ref document: A1)