CN116595467A - Abnormal user detection method based on dynamic weighted graph convolution and storage medium - Google Patents

Publication number
CN116595467A
Authority
CN
China
Prior art keywords
neighbor
users
information
graph convolution
abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310529213.2A
Other languages
Chinese (zh)
Inventor
易舒婷
张剑凯
周韵
黄琬庭
宋雨灿
黄可欣
张一弛
唐鸿锐
李岳洋
曹琳玲
张学虹
邓旭聪
陈浪
张芳
吴寿勇
杜德道
付饶
何军
杨伶俐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Sichuan Electric Power Co Ltd
Original Assignee
State Grid Sichuan Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Sichuan Electric Power Co Ltd
Priority to CN202310529213.2A
Publication of CN116595467A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/025 Extracting rules from data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an abnormal user detection method based on dynamic weighted graph convolution, and a storage medium. At the representation layer, a knowledge graph embedding technique captures the multiple kinds of interaction relationships among users. At the classification layer, graph convolution serves as the classification model that divides users into normal users and abnormal users: graph convolution propagates information among users, and when neighbor information is aggregated, a dynamic weighted graph convolution operator, designed in combination with trust propagation rules, distinguishes the importance of different neighbor nodes and filters the aggregated neighbor information. The method thus combines knowledge graph embedding with abnormal user detection in social networks and learns multi-relation information among users automatically. Because the operator takes the asymmetry of relationships into account when it weights and filters neighbor information, detection performance is effectively improved.

Description

Abnormal user detection method based on dynamic weighted graph convolution and storage medium
Technical Field
The invention belongs to the technical field of abnormal user detection, and particularly relates to an abnormal user detection method based on dynamic weighted graph convolution and a storage medium.
Background
The concept of the social network was first proposed by Barnes in 1954; here it mainly refers to a new form of human social interaction carried out over the Internet. In recent years, with advances in science and technology, online social networks (OSNs) have come to meet a wide range of needs. Social networking sites have become a broad platform on which people connect with one another and share information, emotions, photos, posts, status updates and the like, and they are deeply integrated into people's work and life. The 2021 financial report of Sina Weibo, a well-known online social network platform in China, shows that its monthly active users reached 573 million in September 2021. The emergence and rapid development of online social network applications has profoundly changed how people use the Internet: from simple information search and web browsing to building and maintaining social relationships online, and to creating, exchanging and sharing information on the basis of those relationships, which gives individuals around the world a way to share information. However, the explosive growth of online social networks has also attracted many illegitimate users who treat them as tools for profit. Users of online social applications such as Twitter and Facebook abroad and Sina Weibo and NetEase in China are often plagued by abnormal users of many kinds: malicious users who do not take part in normal online social activity but carry out malicious behavior and spread malicious information, zombie users whose main purpose is to follow and promote other users and topics, spam users who publish junk and harmful information, fake accounts generated by batch registration, and so on.
The 48th Statistical Report on China's Internet Development, issued by the China Internet Network Information Center, shows that phishing, which relies on publishing false information, ranks second among the network security problems encountered by netizens. Abnormal users disrupt the normal operation and development of social platforms, pose many potential threats to the public interest, and harm both the network environment and society. For these reasons, identifying abnormal users is an important topic in social network research and plays an important role in combating phishing, cleaning up the network environment and maintaining social stability.
Abnormal user detection is essentially a classification problem: a detection model is constructed to identify each user as normal or abnormal. Existing work often detects abnormal users from attribute features and content features. For example, Kudugunta et al. propose a deep neural network based on a long short-term memory (LSTM) architecture that uses both user-generated content (UGC) and user attribute information to detect social bots at the tweet level. Tan et al. use the links contained in UGC to build a user-link graph and propose an unsupervised detection method. Rout et al. propose the LA-MSBD model, which integrates a trust computation model with link-based MSBD features and updates the probability that a participant publishes malicious links in a tweet by performing a finite set of learning actions. Zhao et al. use the event words and domain words contained in UGC to measure the similarity between user-published content and news in terms of temporal, spatial and textual similarity, take this similarity as the initial credibility of user-generated information, and then apply a trust propagation method to detect abnormal users.
There are also many methods based on relational features. Some build on graph neural networks, in which the class of a node is generally assumed to be correlated with the classes of its neighbors; for example, the literature uses graph neural networks as classifiers. Others start from behavior patterns: Jiang Meng et al., for example, analyze properties such as the synchronicity and compactness of the suspicious behavior of abnormal users and propose CatchSync, a suspicious behavior detection method that automatically distinguishes suspicious nodes in large-scale graphs.
As detection technology develops, abnormal users also adopt camouflage to evade detection. For example, by maliciously following one another, adding friends, liking posts and so on, abnormal users can manipulate profile features such as hashtag ratio, follow ratio, URL ratio and retweet count. They can also manipulate features of published content, such as emotion words, emoticons and common words, to deceive the detector. The result is the spammer drift phenomenon described in the literature, a serious problem for methods based on attribute and content features: the features of abnormal users fluctuate over time while those of normal users remain stable, so a classifier trained on past data cannot be used to detect abnormal users in new data sets. Relationships between users are a highly robust feature, because abnormal users cannot easily manipulate the interaction behavior of other users. Past work, however, has often considered only a single social relationship among users, whereas a social network is a complex structure that contains multiple interaction relationships.
Disclosure of Invention
The invention aims to learn various interactive relation information among nodes by combining a knowledge graph embedding technology and detect abnormal users based on the multi-relation information among the nodes.
The invention is realized mainly by the following technical scheme:
an abnormal user detection method based on dynamic weighted graph convolution comprises the following steps:
step S100: aiming at the multi-relation characteristics of the online social network, capturing various interactive relation information among users by adopting a knowledge graph embedding technology at a representation layer;
step S200: adopting a graph convolution technology as a classification model at a classification layer to divide users into normal users and abnormal users; the graph convolution technology is adopted to realize the propagation of information among users, and when neighbor information is aggregated, importance of different neighbor nodes is distinguished through a dynamic weighted graph convolution operator by combining trust propagation rules, and the aggregated neighbor information is filtered;
step S300: sampling the imbalanced data set with an undersampling method, training and optimizing the classification model on the balanced sample subset obtained by sampling, and detecting abnormal users with the trained model.
In order to better implement the present invention, further, in step S100, a knowledge graph embedding technique is used to produce a vectorized representation of the nodes in the multi-relation social network. A TransE model embeds the nodes and relationships of the online social network OSN into $\mathbb{R}^{d_0}$, the real vector space of dimension $d_0$. The relation l in each triplet instance (h, l, t) of the OSN is regarded as a translation from entity h to entity t: h, l and t are adjusted continually so that h + l ≈ t as closely as possible when the triplet holds, whereas h + l should be far from t when it does not.
In order to better implement the present invention, further, in step S100, the loss function of the representation layer is as follows:
wherein S is the multi-relation directed social graph obtained by abstracting user interaction behavior,
S′ is the set of negative samples,
$[x]_+ = \max(x, 0)$,
$S'_{(h,l,t)} = \{(h', l, t) \mid h' \in E\} \cup \{(h, l, t') \mid t' \in E\}$, where E represents the set of edges in the graph,
$\gamma > 0$ is the margin;
the embedded representations of the nodes and relationships in the social network are obtained by iterative training that minimizes the loss function, and are then input to the classification layer for the subsequent detection work.
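The loss expression itself is not reproduced in this text. The definitions of S, S′, [x]_+ and γ match the standard TransE margin-based ranking loss, so the following is a minimal sketch under that assumption. The Euclidean distance, the function names and the exhaustive enumeration of corrupted triples are illustrative choices, not taken from the patent.

```python
import math


def translate(h, l):
    """Return h + l, the head embedding shifted by the relation embedding."""
    return [a + b for a, b in zip(h, l)]


def distance(u, v):
    """Euclidean distance (assumed; TransE also admits the L1 distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hinge_term(pos, neg, ent_emb, rel_emb, gamma):
    """[gamma + d(h + l, t) - d(h' + l, t')]_+ for one positive/negative pair."""
    h, l, t = pos
    h_neg, _, t_neg = neg
    d_pos = distance(translate(ent_emb[h], rel_emb[l]), ent_emb[t])
    d_neg = distance(translate(ent_emb[h_neg], rel_emb[l]), ent_emb[t_neg])
    return max(gamma + d_pos - d_neg, 0.0)


def corrupted(triple, entities):
    """All triples with the head or the tail replaced, i.e. S'_(h,l,t)."""
    h, l, t = triple
    heads = [(e, l, t) for e in entities if e != h]
    tails = [(h, l, e) for e in entities if e != t]
    return heads + tails


def transe_loss(triples, ent_emb, rel_emb, gamma=1.0):
    """Sum the hinge term over every positive triple and its corruptions."""
    entities = sorted(ent_emb)
    return sum(
        hinge_term(pos, neg, ent_emb, rel_emb, gamma)
        for pos in triples
        for neg in corrupted(pos, entities)
    )
```

In practice a training loop would sample one corrupted triple per positive triple rather than enumerate all of them; the exhaustive form is used here only because it mirrors the set definition of S′ directly.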
In order to better implement the present invention, further, in step S200, multi-relation graph convolution is first used to propagate information among users and aggregate each user's neighbor information, and the preliminary output of each layer is calculated as follows:
wherein f is the activation function,
each relation r ∈ R has a weight parameter matrix at the k-th layer,
$d_k$ is the dimension of the output vector of the k-th layer,
A is the adjacency matrix of the nodes,
I is the identity matrix,
a regularized adjacency matrix is used,
$H^k$ is the output of the k-th layer,
initially $H^0 = X$, where X represents the embedded representation of the users input by the representation learning layer;
wherein the matrix is defined as follows:
In order to better implement the present invention, further, before the neighbor information is aggregated, the embedded representations of each neighbor and of the connecting edge are fused through a circular correlation operation, which is defined as follows:
wherein $a, b \in \mathbb{R}^d$ are the two vectors to be fused;
d is the dimension of the output vector;
the representation output of the nodes at each layer is then calculated as follows:
wherein N(v) is the set of neighbor nodes of node v,
$h_v^k$ is the vector representation of node v at the k-th layer,
the regularized adjacency matrix is as described above,
$Z_r$ is the embedded representation of relation r,
φ is the vector fusion function.
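The defining expression of the circular correlation is not reproduced in this text. A minimal sketch follows, assuming the usual definition from holographic embeddings, in which component k of the result is the sum over i of a_i · b_((i+k) mod d). The function name is illustrative, and the direct double loop is chosen for readability over an FFT-based implementation.

```python
def circular_correlation(a, b):
    """Fuse two d-dimensional vectors: out[k] = sum_i a[i] * b[(i + k) % d]."""
    d = len(a)
    if len(b) != d:
        raise ValueError("both vectors must have the same dimension")
    return [sum(a[i] * b[(i + k) % d] for i in range(d)) for k in range(d)]


# A neighbor embedding fused with the embedding of the connecting relation.
neighbor = [1.0, 2.0, 3.0]
relation = [4.0, 5.0, 6.0]
fused = circular_correlation(neighbor, relation)  # [32.0, 29.0, 29.0]
```

Unlike element-wise addition or multiplication, the operation is not commutative in general and keeps the output in the same dimension d as its inputs, which is why it suits the fusion of an entity with a directed relation.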
In order to better implement the present invention, further, in step S200, the difference between the initiating party and the receiving party of an interaction, which arises from the asymmetry of relationships, is taken into account, and dynamic weights are introduced during convolution to distinguish the importance of neighbor nodes:
the neighbors of node u in the social network are divided into three types: incoming neighbors $v_1$, outgoing neighbors $v_2$ and bidirectional neighbors $v_3$. Assume that:
u and $v_3$ have the same label;
if an incoming neighbor $v_1$ of u is an abnormal user, $v_1$ provides no information about the label of u; if $v_1$ is a normal user, u is also normal;
if an outgoing neighbor $v_2$ of u is a normal user, $v_2$ provides no information about the label of u; if $v_2$ is an abnormal user, u is also abnormal;
the label output by each iteration is used as prior information to dynamically update the neighbor weights during the convolution operation, and the weights are assigned as follows:
when v is a bidirectional neighbor of u:
when v is an incoming neighbor of u:
when v is an outgoing neighbor of u:
wherein the assigned value is the weight given to neighbor v of u in the (t+1)-th training round,
w > 0.5 is a preset hyperparameter,
$\lambda_{t,v}$ is the probability, output by the t-th training round, that user v is an abnormal user.
In order to better implement the present invention, further, λ = 0.5 is initially assigned to each neighbor,
and the dynamically updated weights are added to obtain the final output of each layer as follows:
the classes of the nodes are predicted from the output of the last layer, and the neural network is optimized with a cross-entropy loss function, with the formula:
$f(h_v) = \mathrm{softmax}(a h_v + b)$
wherein a is the weight,
b is the bias.
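As an illustration of the classification head f(h_v) = softmax(a h_v + b) and its cross-entropy objective, the following sketch assumes a two-class output in which index 0 is "normal" and index 1 is "abnormal", with a stored as a 2 × d list of rows. The function names and this layout are assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(h_v, a, b):
    """Compute softmax(a h_v + b) for one node representation h_v."""
    logits = [sum(w * x for w, x in zip(row, h_v)) + bias for row, bias in zip(a, b)]
    return softmax(logits)


def cross_entropy(probabilities, labels):
    """Mean negative log-likelihood of the true labels (0 normal, 1 abnormal)."""
    return -sum(math.log(p[y]) for p, y in zip(probabilities, labels)) / len(labels)


# One node with a 2-dimensional representation and an uninformative classifier.
probs = classify([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])  # [0.5, 0.5]
loss = cross_entropy([probs], [1])  # ln 2, about 0.693
```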
A computer readable storage medium storing computer program instructions which, when executed by a processor, implement the method described above.
The beneficial effects of the invention are as follows:
in order to cope with the situation that an abnormal user disguises own behavior to evade detection, the method starts from social relationship features with high robustness, and builds an abnormal user detection scheme based on the multi-relationship features of the online social network. According to the method, a knowledge graph embedding technology is combined with social network abnormal user detection, and multi-relation information among users is automatically learned; adopting a graph convolution technology to realize information transmission among users; and in consideration of the asymmetry of the relationship, when the neighbor information is aggregated, the importance of different neighbor nodes is distinguished through a dynamic weighted graph convolution operator, and the aggregated neighbor information is filtered, so that the detection performance is effectively improved.
Drawings
FIG. 1 is a schematic block diagram of the present invention;
FIG. 2 is a schematic view of node neighbor types;
FIG. 3 is a graph comparing experimental results for different values of the hyperparameter w in the attention mechanism on the Twitter dataset in Example 1.
Detailed Description
Example 1:
The invention discloses an abnormal user detection method based on dynamic weighted graph convolution, referred to as the ADDSDG method. As shown in FIG. 1, a knowledge graph embedding technique is adopted at the representation layer to capture the multiple kinds of interaction relationships among users; inspired by propagation-based detection methods, graph convolution is adopted at the classification layer as the classification model that divides users into normal users and abnormal users; a dynamically weighted convolution process, designed in combination with trust propagation rules, distinguishes the importance of different neighbors; and in the classification stage the imbalanced data set is sampled with an undersampling method for model training.
Preferably, the social network node embedding is expressed as follows:
Most heterogeneous graph embedding methods rely on preset meta-paths. However, social network platforms are diverse and their interaction modes differ, so the meta-paths of different platforms are difficult to unify. In classification tasks a heterogeneous graph can also be embedded with deep learning methods, but such methods are trained on the label information of the nodes and do not embed the relationship information among the nodes. Therefore, to improve the scalability of the method and to learn embedded representations of the multi-relation information among users, ADDSDG uses a knowledge graph embedding technique to produce vectorized representations of the nodes in the multi-relation social network. Knowledge graph embedding requires no preparatory work such as presetting meta-paths; it vectorizes the nodes and relations in the graph by fitting the triples formed by the relationships among nodes, and can therefore fully learn the relationship information among nodes.
A knowledge graph can be regarded as a knowledge-rich heterogeneous graph consisting of a series of triples (h, l, t), where h, t ∈ E represent entities and l ∈ L represents a relation between entities. The online social network is represented by a graph structure OSN-G(V, E), where V represents the nodes in the social network, such as user nodes and UGC nodes, and E represents the edges formed by the interaction relationships among the nodes, such as forwarding, following and commenting; this is clearly a heterogeneous graph structure. The mapping is V → E, E → L, that is, the nodes of the social network serve as entities and its edges as relations.
ADDSDG uses the TransE model to embed the nodes and relations of the OSN into $\mathbb{R}^{d_0}$. The relation l in each triplet instance (h, l, t) of the OSN is regarded as a translation from entity h to entity t: h, l and t are adjusted continually so that h + l ≈ t as closely as possible when the triplet holds, whereas h + l should be far from t when it does not.
Based on the above idea, the embedded representations of entities and relations are learned by optimization, with the loss function defined as follows:
wherein
$[x]_+ = \max(x, 0)$,
$S'_{(h,l,t)} = \{(h', l, t) \mid h' \in E\} \cup \{(h, l, t') \mid t' \in E\}$,
$\gamma > 0$ is the margin.
The embedded representation of the nodes and relationships in the social network is obtained by iterative training to minimize the loss function and then input to the classification layer for subsequent detection work.
Preferably, the dynamically weighted multi-relation graph convolution is as follows:
Graph convolutional neural networks are mainly designed for simple undirected graphs, whereas the interaction relationships of real online social networks are asymmetric, and a graph convolutional neural network does not distinguish the importance of neighbor nodes. To solve these problems, the invention designs a dynamic weighted graph convolution model for multi-relation directed social networks:
The social graph of the classification layer is G = (V, R, E, X, Z), where V represents the set of user nodes in the graph, R represents the set of relations, E represents the set of edges in the graph, X is the embedded representation of the users input by the representation learning layer, Z is the embedded representation of the relations input by the representation learning layer, and $d_0$ is the embedding dimension of the representation layer. Each user has a corresponding binary label y ∈ {0, 1}, where 1 represents an abnormal user and 0 represents a normal user.
Multi-relation graph convolution is first used to propagate information among users and aggregate each user's neighbor information. K is defined as the depth of the neighborhood whose information is aggregated: if K = 1, only the information of first-order neighbors is aggregated; if K = 2, the information of second-order neighbors is included as well, and so on. The preliminary output of each layer is calculated as follows:
This is a multi-relation graph convolution, where f is the activation function, each relation r ∈ R has a weight parameter matrix at the k-th layer, $d_i$ is the dimension of the i-th layer output vector, A is the adjacency matrix of the nodes, $H^k$ is the output of the k-th layer with initially $H^0 = X$, and a regularized adjacency matrix is used, wherein the matrix is defined as follows:
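The layer formula and the definition of the regularized adjacency matrix are not reproduced in this text. The sketch below is therefore an assumption, not the patented expression: it uses the symmetric normalization common in graph convolutional networks, D^(-1/2) (A + I) D^(-1/2) with D the row sums of A + I, one adjacency matrix per relation, and ReLU as the activation f. All function names are illustrative.

```python
def matmul(p, q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*q)] for row in p]


def normalize_adjacency(a):
    """Return D^(-1/2) (A + I) D^(-1/2), where D holds the row sums of A + I."""
    n = len(a)
    a_tilde = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]  # at least 1 because of the self-loop
    return [[a_tilde[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)] for i in range(n)]


def relu(m):
    """Element-wise ReLU, standing in for the activation function f."""
    return [[max(x, 0.0) for x in row] for row in m]


def multi_relation_layer(adj_by_rel, h, w_by_rel):
    """One layer: f(sum over relations r of A_hat_r @ H @ W_r)."""
    n = len(h)
    d_out = len(next(iter(w_by_rel.values()))[0])
    total = [[0.0] * d_out for _ in range(n)]
    for rel, adjacency in adj_by_rel.items():
        message = matmul(matmul(normalize_adjacency(adjacency), h), w_by_rel[rel])
        total = [[t + m for t, m in zip(t_row, m_row)] for t_row, m_row in zip(total, message)]
    return relu(total)


# Two users who follow each other, 2-dimensional inputs, identity weights.
adj = {"follow": [[0, 1], [1, 0]]}
weights = {"follow": [[1.0, 0.0], [0.0, 1.0]]}
h1 = multi_relation_layer(adj, [[1.0, 0.0], [0.0, 1.0]], weights)
```

Stacking two such layers (K = 2) lets each user's representation absorb information from second-order neighbors as well.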
In addition, to introduce the relation type information of the neighbors, ADDSDG introduces an entity-relation fusion operation: before neighbor information is aggregated, the embedded representations of each neighbor and of the connecting edge are fused through a circular correlation operation, which is defined as follows:
wherein $a, b \in \mathbb{R}^d$ represent the two vectors to be fused.
The representation output of the nodes at each layer, after the entity-relation fusion operation is incorporated, is calculated as follows:
wherein N(v) is the set of neighbor nodes of node v, $h_v^k$ is the vector representation of node v at the k-th layer, with the initial representation of each node assigned according to X, the regularized adjacency matrix is as defined above, each relation r ∈ R has a weight parameter matrix at the k-th layer, $Z_r$ is the embedded representation of relation r, and φ is the vector fusion function described above.
However, the asymmetry of the interaction relationship is not considered in the above work, so the invention further considers the difference between the active party and the receiving party in the interaction relationship caused by the asymmetry relationship, and introduces dynamic weight in convolution to distinguish the importance degree of the neighbor nodes.
The invention divides the neighbors of a node in the social network into three types: incoming neighbors, outgoing neighbors and bidirectional neighbors. As shown in FIG. 2, for a node u: if $v_1$ interacts one-way toward u, then $v_1$ is an incoming neighbor of u; if u interacts one-way toward $v_2$, then $v_2$ is an outgoing neighbor of u; and if $v_3$ and u interact in both directions, then $v_3$ is a bidirectional neighbor of u.
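The three neighbor types can be derived directly from a list of directed interactions. The following is a minimal sketch, assuming each interaction is stored as a (source, target) pair; the function name and the dictionary layout of the result are illustrative.

```python
def neighbor_types(edges, u):
    """Split the neighbors of u into incoming, outgoing and bidirectional sets.

    edges is an iterable of (source, target) pairs, one per directed interaction.
    """
    edges = list(edges)
    sources = {s for s, t in edges if t == u and s != u}  # nodes that point at u
    targets = {t for s, t in edges if s == u and t != u}  # nodes that u points at
    both = sources & targets
    return {
        "incoming": sources - both,
        "outgoing": targets - both,
        "bidirectional": both,
    }


# v1 -> u, u -> v2, and v3 <-> u, as in the description of FIG. 2.
edges = [("v1", "u"), ("u", "v2"), ("v3", "u"), ("u", "v3")]
kinds = neighbor_types(edges, "u")
# {'incoming': {'v1'}, 'outgoing': {'v2'}, 'bidirectional': {'v3'}}
```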
The present invention proposes three intuitive hypotheses.
For a user node u and its neighbors v:
Users in a bidirectional neighbor relationship are likely to be of the same type. Thus, u tends to have the same label as $v_3$.
Abnormal users initiate interactions indiscriminately, whereas normal users tend to initiate interactions toward normal users. Thus, if an incoming neighbor $v_1$ of u is an abnormal user, $v_1$ provides no information about the label of u; if $v_1$ is a normal user, u is usually normal as well.
Normal users receive interactions from both types of users, whereas abnormal users tend to receive interactions only from abnormal users. Thus, if an outgoing neighbor $v_2$ of u is a normal user, $v_2$ provides no information about the label of u; if $v_2$ is an abnormal user, u is usually abnormal as well.
In summary, bidirectional neighbors $v_3$, incoming neighbors $v_1$ that are normal users, and outgoing neighbors $v_2$ that are abnormal users provide more information for classifying the user. When neighbor information is aggregated, neighbors whose information is more useful should receive larger weights. The method uses the labels output by each iteration as prior information to dynamically update the neighbor weights during the convolution operation. The weights are assigned as follows:
when v is a bidirectional neighbor of u:
when v is an incoming neighbor of u:
when v is an outgoing neighbor of u:
wherein the assigned value is the weight given to neighbor v of u in the (t+1)-th training round, w > 0.5 is a preset hyperparameter, and $\lambda_{t,v}$ is the probability, output by the t-th training round, that user v is an abnormal user. Initially, λ = 0.5 is assigned to each neighbor.
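The three weight expressions are not reproduced in this text, so they cannot be restated here. Purely as an illustration of how such a rule could behave, the sketch below uses a hypothetical interpolation that is consistent with the three hypotheses: a bidirectional neighbor always receives w, an incoming neighbor receives more weight the more likely it is to be normal, and an outgoing neighbor receives more weight the more likely it is to be abnormal. The formulas and the function name are assumptions, not the patented expressions.

```python
def neighbor_weight(kind, lam, w=0.75):
    """Hypothetical weight of a neighbor for the next training round.

    kind: "bidirectional", "incoming" or "outgoing".
    lam:  probability from the previous round that the neighbor is abnormal.
    w:    hyperparameter greater than 0.5 (0.75 in the reported experiments).
    """
    if not 0.5 < w <= 1.0:
        raise ValueError("w must lie in (0.5, 1]")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be a probability")
    if kind == "bidirectional":
        return w
    if kind == "incoming":  # informative when the neighbor is normal
        return w * (1.0 - lam) + (1.0 - w) * lam
    if kind == "outgoing":  # informative when the neighbor is abnormal
        return w * lam + (1.0 - w) * (1.0 - lam)
    raise ValueError(f"unknown neighbor kind: {kind}")
```

Under this assumed rule the initial value λ = 0.5 gives every one-way neighbor the neutral weight 0.5, and the weights move toward w or 1 − w as the classifier becomes more confident about the neighbor.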
The dynamically updated weights are added to obtain the final output of each layer as follows:
The classes of the nodes are predicted from the output of the last layer, and the neural network is optimized with a cross-entropy loss function, with the formula:
$f(h_v) = \mathrm{softmax}(a h_v + b)$
wherein a is the weight and b is the bias.
The imbalanced data set is sampled with the NearMiss undersampling method, and the model is trained and optimized on the balanced sample subset obtained by sampling.
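The text names NearMiss but not the variant. As a sketch, the following assumes NearMiss-1, which keeps the majority-class samples whose mean distance to their k nearest minority-class samples is smallest, until the two classes have equal size. The function names, the Euclidean distance and the default k = 3 are illustrative choices.

```python
def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def near_miss_1(features, labels, k=3, majority=0):
    """Return sorted indices of a balanced subset chosen by NearMiss-1.

    All minority samples are kept; the majority class is cut to the same size.
    """
    minority_idx = [i for i, y in enumerate(labels) if y != majority]
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    if not minority_idx:
        raise ValueError("no minority samples to balance against")

    def mean_dist_to_nearest_minority(i):
        dists = sorted(euclidean(features[i], features[j]) for j in minority_idx)
        nearest = dists[:k]
        return sum(nearest) / len(nearest)

    kept = sorted(majority_idx, key=mean_dist_to_nearest_minority)[: len(minority_idx)]
    return sorted(kept + minority_idx)


# Four normal users (label 0) and two abnormal users (label 1) on a line.
features = [[0.0], [1.0], [9.0], [10.0], [2.0], [3.0]]
labels = [0, 0, 0, 0, 1, 1]
subset = near_miss_1(features, labels, k=2)  # [0, 1, 4, 5]
```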
The test results were as follows:
1. Experimental data set
To verify the applicability and effectiveness of ADDSDG, two public datasets containing multiple social relationships were employed for testing.
The Twitter-Tas dataset is provided by Li et al. It was crawled from Twitter and includes user nodes and tweet nodes, together with follow relationships between users, forwarding and reply relationships between tweets, and publishing and mention relationships between users and tweets. After sample filtering, the statistics of the dataset are shown in Tables 1 and 2.
Table 1 Twitter-Tas dataset distribution
The second dataset comes from Tagged.com and is provided by Fakhraei et al. It is an imbalanced dataset that contains 5607447 interaction records, covering the anonymized basic identity information of users and 7 relations among the users within 10 days; its statistics are shown in Table 2.
Table 2 Tagged dataset distribution
2. Experiment setting and evaluation index
Given the limitations of the available hardware, the experiments first scale down the datasets. Seed users are randomly selected from the Twitter dataset and the selection is then expanded along the interaction relationships among users, until 2000 users and the valid tweets they published are selected; a valid tweet is one that has an interaction with another user in the data subset or with a tweet published by such a user. Similarly, seed users are randomly selected from the Tagged dataset and expanded along the interaction relationships among users, until 80000 users are selected.
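The text does not specify how the expansion from seed users is carried out. One plausible reading is a breadth-first expansion along interaction edges until the target number of users is reached, sketched below. The function name, the adjacency-dictionary input and the fixed random seed are assumptions made for the example.

```python
import random
from collections import deque


def expand_from_seeds(adjacency, num_seeds, target_size, seed=0):
    """Pick random seed users, then grow the subset breadth-first.

    adjacency maps each user to the users it interacts with (either direction).
    Returns a set of at most target_size users.
    """
    rng = random.Random(seed)
    users = sorted(adjacency)
    seeds = rng.sample(users, min(num_seeds, len(users)))[:target_size]
    selected = set(seeds)
    queue = deque(seeds)
    while queue and len(selected) < target_size:
        user = queue.popleft()
        for neighbor in sorted(adjacency.get(user, ())):
            if len(selected) >= target_size:
                break
            if neighbor not in selected:
                selected.add(neighbor)
                queue.append(neighbor)
    return selected


# A chain of four users; expand one seed to a connected subset of three.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
subset_users = expand_from_seeds(chain, num_seeds=1, target_size=3)
```

Because the subset grows along interaction edges, the sampled users remain connected to their seeds, which preserves the relational structure that the method depends on.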
The experimental hyperparameters were set as follows: embedding dimension $d_0 = 100$, number of convolution layers $k = 2$, and weight parameter $w = 0.75$. To evaluate the results comprehensively, Recall, F1, and AUC were chosen as evaluation metrics. Each experiment was run multiple times, and the average over the runs was taken as the final result.
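For reference, the three metrics can be computed as in the following sketch, which treats abnormal users as the positive class. The helper names, the 0.5 decision threshold and the toy labels and scores are assumptions for illustration and are not results from the experiments.

```python
def recall_f1(y_true, y_pred, positive=1):
    """Recall and F1 for the positive (abnormal) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1

def auc(y_true, scores, positive=1):
    """Probability that a random positive is scored above a random negative
    (ties count one half), which equals the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy values (illustrative only): predicted abnormality scores for five users.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

recall, f1 = recall_f1(y_true, y_pred)
auc_value = auc(y_true, scores)
```

Averaging over repeated runs then amounts to taking the mean of each metric across runs, for example with `statistics.mean`.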
3. Baseline methods
To verify the effectiveness of the ADDSDG method, it is compared with the following graph-based methods.
The GCN method aggregates the feature information of neighbor nodes and fuses the local structure information of each node, learning both the feature information and the structure information of the nodes, and then uses the learned representation for the node classification task.
The CARE-GNN method aims to make GNN-based fraud detectors more resistant to feature camouflage and relation camouflage by fraudsters. It proposes a label-aware similarity measure and a similarity-aware neighbor selector based on reinforcement learning. Together with these two neural modules, a relation-aware aggregator is further proposed to maximize computational utility.
The DCI method proposes a graph self-supervised learning scheme for anomaly detection, called Deep Cluster Infomax (DCI), for node representation learning. It captures intrinsic graph attributes in a more focused feature space by clustering the entire graph into multiple parts.
4. Verification of the effectiveness of the detection method
To verify the effectiveness of ADDSDG, it is compared experimentally with the baseline methods on real multi-relation social network datasets. Because the CARE-GNN method requires node attribute features, topological features extracted from the social graph are used as the initial node feature vectors in the experiment. The experimental results are shown in Table 3.
TABLE 3 comparison of test results for different test models
The experimental results show that, compared with the GCN method, the proposed method clearly improves the Recall, F1, and AUC values on both the Twitter-Tas and Tagged datasets. The reason is that GCN relies mainly on node label information when learning node embeddings. Although it can learn the structure information of nodes through the graph convolution operator, it implicitly treats all edges as indistinguishable and therefore ignores the interaction relationship information. The proposed method instead combines knowledge graph embedding to learn the multi-relation information among users, which effectively improves detection performance.
The CARE-GNN method focuses on filtering neighbor nodes by combining node features with reinforcement learning, and the DCI method focuses on improving representation learning through graph clustering. Both ignore multi-relation features: they do not consider the semantic information of relationships or the distinction between incoming and outgoing neighbors, and simply treat relationships in the social network as symmetric. The proposed method accounts for the asymmetry of relationships through dynamic weighting and filters neighbor information during aggregation, thereby distinguishing the importance of different neighbors' information. It therefore achieves better detection performance on the Twitter-Tas and Tagged datasets.
5. Model analysis
Model analysis experiments were performed on the Twitter-Tas dataset. To verify the effectiveness of the dynamic convolution operator designed in the invention, experiments were run with different values of the hyperparameter w in the attention mechanism. The experimental results are shown in FIG. 3.
When w > 0.5, the model assigns higher weights to neighbors with high information validity when aggregating neighbor information. When w = 0.5, all neighbors receive the same weight of 0.5, meaning that the model does not distinguish the importance of neighbors during aggregation; the detection results drop noticeably, which indicates that the dynamic weighting mechanism added by the invention effectively improves model performance. When w < 0.5, the model instead assigns higher weights to neighbors with low information validity, and its performance drops sharply. This confirms that the information validity of a user's neighbors differs, and that distinguishing the importance of neighbors through dynamic weighting effectively improves the performance of the detection method.
The foregoing is only a preferred embodiment of the present invention and does not limit the invention in any way; any simple modification or equivalent variation of the above embodiment made according to the technical substance of the invention falls within the scope of the invention.

Claims (8)

1. An abnormal user detection method based on dynamic weighted graph convolution, characterized by comprising the following steps:
step S100: in view of the multi-relation nature of online social networks, capturing the various interaction relationships among users at a representation layer by means of a knowledge graph embedding technique;
step S200: using a graph convolution technique as the classification model at a classification layer to divide users into normal users and abnormal users; graph convolution is used to propagate information among users, and when neighbor information is aggregated, a dynamic weighted graph convolution operator combined with trust propagation rules distinguishes the importance of different neighbor nodes and filters the aggregated neighbor information;
step S300: sampling the imbalanced dataset with an undersampling method, training the classification model on the balanced sample subset obtained by sampling, and detecting abnormal users with the trained model.
2. The abnormal user detection method based on dynamic weighted graph convolution according to claim 1, wherein in step S100, a knowledge graph embedding technique is used to represent the nodes of the multi-relation social network as vectors; a TransE model is used to embed the nodes and relationships of the online social network (OSN) into $\mathbb{R}^{d_0}$, where $\mathbb{R}^{d_0}$ is the vector space of dimension $d_0$; the relation l in each triplet instance (h, l, t) of the OSN is regarded as a translation from entity h to entity t, and h, l, and t are adjusted continually so that h + l ≈ t as closely as possible, whereas when the triplet does not hold, h + l should be far from t.
3. The abnormal user detection method based on dynamic weighted graph convolution according to claim 2, wherein in the step S100, the loss function of the representation layer is as follows:
wherein S is the multi-relation directed social graph obtained by abstracting user interaction behavior,
S′ is the set of negative samples,
$[x]_+ = \max(x, 0)$,
$S'_{(h,l,t)} = \{(h', l, t) \mid h' \in E\} \cup \{(h, l, t') \mid t' \in E\}$, where E represents the set of edges in the graph,
$\gamma > 0$ is the margin;
the embedded representation of the nodes and relationships in the social network is obtained by iterative training to minimize the loss function and then input to the classification layer for subsequent detection work.
4. The abnormal user detection method based on dynamic weighted graph convolution according to claim 1, wherein in step S200, multi-relation graph convolution is first used to pass information between users and aggregate each user's neighbor information, and the preliminary output of each layer is calculated as follows:
wherein f is the activation function,
for the weight parameter matrix of the k-th layer with respect to relation r ∈ R,
$d_k$ is the dimension of the output vector of the k-th layer,
A is the adjacency matrix of the nodes,
I is the identity matrix,
for the adjacency matrix after regularization,
$H^k$ is the output of the k-th layer,
initially $H^0 = X$, where X represents the embedded representation of the user input by the learning layer;
wherein the matrix is defined as follows:
5. The abnormal user detection method based on dynamic weighted graph convolution according to claim 4, wherein before neighbor information is aggregated, the embedded representations of the neighbors and of the connecting edges are fused through a circular correlation operation, which is defined as follows:
wherein $a, b \in \mathbb{R}^d$ are the two vectors to be fused;
d is the dimension of the output vector;
the representation output of the nodes at each layer is then calculated as follows:
wherein N(v) is the set of neighbor nodes of node v,
is a vector representation of the k-layer node v,
for the regularized adjacency matrix described above,
$Z_r$ is the embedded representation of the relation r,
$\phi$ is the vector fusion function.
6. The abnormal user detection method based on dynamic weighted graph convolution according to claim 5, wherein in step S200, considering that asymmetric relationships make the initiating party and the receiving party of an interaction differ, dynamic weights are introduced during convolution to distinguish the importance of neighbor nodes:
the neighbors of node u in the social network are divided into three types: incoming neighbors $v_1$, outgoing neighbors $v_2$, and bidirectional neighbors $v_3$; assume that:
u and $v_3$ have the same label;
if the incoming neighbor $v_1$ of u is an abnormal user, $v_1$ provides no information about the label of u; if $v_1$ is a normal user, u is also normal;
if the outgoing neighbor $v_2$ of u is a normal user, $v_2$ provides no information about the label of u; if $v_2$ is an abnormal user, u is also abnormal;
the label output by each iteration is used as prior information to dynamically update the weights of the neighbors during the convolution operation, the weights being assigned as follows:
when v is a bidirectional neighbor of u:
when v is an incoming neighbor of u:
when v is an outgoing neighbor of u:
wherein ,represents the weight assigned to neighbor v of u at the t+1st training,
$w > 0.5$ is a preset hyperparameter,
$\lambda_{t,v}$ is the probability, output by the t-th training iteration, that user v is an abnormal user.
7. The abnormal user detection method based on dynamic weighted graph convolution according to claim 6, wherein λ = 0.5 is initially assigned to each neighbor,
the dynamically updated weights are incorporated to obtain the final output of each layer as follows:
the class of each node is predicted from the output of the last layer, and the neural network is optimized with a cross-entropy loss function, the formula being as follows:
$f(h_v) = \mathrm{softmax}(a h_v + b)$
wherein a is the weight, and
b is the bias.
8. A computer readable storage medium storing computer program instructions, which when executed by a processor implement the method of any one of claims 1-7.
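As an aid to reading claims 3 and 5, the sketch below shows the usual textbook forms of two operations the claims name: the TransE margin-based ranking term for one positive and one negative triplet, and circular correlation of two vectors. It assumes Euclidean distance for TransE and index wrap-around modulo d for the correlation; the function names, the default margin and the toy vectors are hypothetical and are not taken from the claims.

```python
import math

def transe_margin_term(pos, neg, gamma=1.0):
    """[gamma + d(h + l, t) - d(h' + l, t')]_+ for one positive triplet
    pos = (h, l, t) and one corrupted (negative) triplet neg = (h', l, t')."""
    def dist(h, l, t):
        return math.sqrt(sum((hi + li - ti) ** 2 for hi, li, ti in zip(h, l, t)))
    return max(gamma + dist(*pos) - dist(*neg), 0.0)

def circular_correlation(a, b):
    """[a * b]_k = sum_i a_i * b_((i + k) mod d) for two vectors of length d."""
    d = len(a)
    return [sum(a[i] * b[(i + k) % d] for i in range(d)) for k in range(d)]

# Toy embeddings (illustrative only).
h, l, t = [0.0, 0.0], [1.0, 0.0], [1.0, 0.0]      # h + l equals t exactly
near_neg = ([0.0, 0.0], [1.0, 0.0], [1.0, 0.5])   # corrupted tail, still close
far_neg = ([0.0, 0.0], [1.0, 0.0], [4.0, 4.0])    # corrupted tail, far away

term_near = transe_margin_term((h, l, t), near_neg)
term_far = transe_margin_term((h, l, t), far_neg)
fused = circular_correlation([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

A negative triplet that is already farther from its tail than the margin contributes nothing to the loss, while a close one is penalized; circular correlation returns a vector of the same dimension d as its inputs, which is what allows it to serve as a fusion function for a neighbor embedding and an edge embedding.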
CN202310529213.2A 2023-05-11 2023-05-11 Abnormal user detection method based on dynamic weighted graph convolution and storage medium Pending CN116595467A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310529213.2A CN116595467A (en) 2023-05-11 2023-05-11 Abnormal user detection method based on dynamic weighted graph convolution and storage medium

Publications (1)

Publication Number Publication Date
CN116595467A true CN116595467A (en) 2023-08-15

Family

ID=87607422

Country Status (1)

Country Link
CN (1) CN116595467A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117520995A (en) * 2024-01-03 2024-02-06 中国海洋大学 Abnormal user detection method and system in network information platform
CN117520995B (en) * 2024-01-03 2024-04-02 中国海洋大学 Abnormal user detection method and system in network information platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination