CN116304311A - Online social network spam comment user detection method


Publication number: CN116304311A
Authority: CN (China)
Application number: CN202310148077.2A
Language: Chinese (zh)
Inventors: 杨泽, 戴维迪, 邵明来, 李天鹏
Current and original assignee: Tianjin University
Legal status: Pending

Classifications

    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G06F16/9538 Presentation of query results
    • G06N3/08 Learning methods (neural networks)


Abstract

The invention relates to a method for detecting spam-comment users in an online social network, comprising the following steps. Step one, graph construction and preprocessing: build a graph structure with online social platform users as nodes and the interaction relationships between users as edges, and construct the adjacency matrix; manually label part of the data, giving the IDs and labels of the labeled nodes, where 1 denotes a spammer and 0 denotes a normal user; and establish a confidence vector. Step two, constructing a graph neural network: the network has two layers and the output dimension of the last layer is 2, where the 1st dimension is the network's confidence that the node is a spammer and the 2nd dimension is its confidence that the node is a normal user; the network obtains a node's features by aggregating the features of its neighbors, takes the neighbors' class information into account during feature extraction, and applies different feature aggregation strategies to different types of neighbors. Step three, iterative optimization.

Description

Online social network spam comment user detection method
Technical Field
The invention belongs to the field of data mining and relates to an anomaly detection method based on a graph neural network. The method models an online social network as a graph, treats users who post spam comments as anomalous nodes, extracts features of the network's users with a graph neural network that incorporates edge information, and feeds these features to a classifier for semi-supervised anomaly detection.
Background
With the development of the internet, more and more online platforms such as Weibo, Dianping and Douban have emerged. As their user bases grow, these platforms carry an increasing number of meaningless and malicious reviews. In addition, some unscrupulous merchants hire dedicated review-farming accounts to post positive reviews under their products in order to inflate their ratings. Such malicious users severely undermine the trustworthiness of online platforms. Manual inspection alone consumes a great deal of labor, so the demand for intelligent detection of spam-comment users keeps growing.
By modeling users as nodes and the various interactions between users as edges, an online social network can be built for an online social platform, so that graph-based algorithms can be used to detect spammers. However, because of the complexity of graph data, spammer camouflage and other factors, spammer detection still faces many challenges.
Graph neural networks are widely used in graph learning tasks because of their strong performance in extracting graph features. FdGars [1] detects spammers with a graph neural network. However, to evade detection, spammers may camouflage themselves, for example by establishing normal interactions with a large number of normal users, or by making their user attributes and posted comments resemble those of normal users. The structure of the graph neural network therefore needs to be adapted so that spammers can still be detected in the presence of such camouflage.
[1] Wang J, Wen R, Wu C, et al. FdGars: Fraudster Detection via Graph Convolutional Networks in Online App Review System[C]. In Companion of The 2019 World Wide Web Conference, 2019: 310–316.
Disclosure of Invention
The main object of the invention is to provide a method for detecting spam-comment users in an online social network that detects spammers in the social network more accurately. The technical solution is as follows:
A method for detecting spam-comment users in an online social network, comprising the following steps:
Step one, graph construction and preprocessing
(1) Build a graph structure with online social platform users as nodes and the interaction relationships between users as edges, and construct the adjacency matrix;
(2) Digitize the user attributes and construct an attribute matrix, each row of which holds the attributes of the corresponding user;
(3) Manually label part of the data, giving the IDs and labels of the labeled nodes, where 1 denotes a spammer and 0 denotes a normal user, and divide the labeled nodes into a training set and a test set;
(4) Establish a confidence vector $B \in \{0,1\}^{N}$, where N is the number of nodes; $B_i = 0$ indicates that node i is more likely to be a normal user, and $B_i = 1$ indicates that node i is more likely to be a spammer. Initialize the confidence vector by setting the entries of B corresponding to training nodes labeled 0 to 0, those corresponding to training nodes labeled 1 to 1, and all remaining entries to 0;
Step two, constructing the graph neural network
The graph neural network has two layers, and the output dimension of the last layer is 2: the 1st dimension is the network's confidence that the node is a spammer, and the 2nd dimension is its confidence that the node is a normal user. The graph neural network obtains a node's features by aggregating the features of the node's neighbors; the neighbors' class information is taken into account during feature extraction, and different feature aggregation strategies are applied to different types of neighbors;
Each layer of the graph neural network performs the following process:
(1) Reduce the dimension of the feature $h_u$ of user u with a fully connected layer to obtain the reduced user feature $z_u$:
$$z_u = W_t h_u$$
where $W_t \in \mathbb{R}^{d_{out} \times d_{in}}$ is the weight matrix of the fully connected layer, $d_{in}$ is the input dimension of the layer and $d_{out}$ is its output dimension;
(2) Taking node v as the central node, compute for each neighbor u of v under relation r an importance coefficient $e_{vu}^{r}$ from its relationship with the central node:
$$e_{vu}^{r} = \mathbf{a}_r^{\top}\left[z_v \,\Vert\, z_u\right]$$
where $\mathbf{a}_r \in \mathbb{R}^{2d_{out}}$ is a trainable weight vector;
(3) Use the confidence vector B to judge whether each neighbor belongs to the same class as the node: put the homogeneous neighbors of node v under relation r (those u with $B_u = B_v$) into the set $\mathcal{N}_{v,\mathrm{hom}}^{r}$, and put the heterogeneous neighbors into the set $\mathcal{N}_{v,\mathrm{het}}^{r}$;
(4) Normalize the importance coefficients of the two types of neighbors separately to obtain the attention scores used for aggregation. For a neighbor u of node v under relation r, if u is homogeneous with v, its attention score $\alpha_{vu}^{r,\mathrm{hom}}$ is
$$\alpha_{vu}^{r,\mathrm{hom}} = \frac{\exp\left(\sigma\left(e_{vu}^{r}\right)\right)}{\sum_{k \in \mathcal{N}_{v,\mathrm{hom}}^{r}} \exp\left(\sigma\left(e_{vk}^{r}\right)\right)}$$
where $\mathcal{N}_{v,\mathrm{hom}}^{r}$ is the set of homogeneous neighbors of node v, exp is the natural exponential function and σ is a nonlinear activation function. Similarly, if u is heterogeneous with v, its attention score $\alpha_{vu}^{r,\mathrm{het}}$ is
$$\alpha_{vu}^{r,\mathrm{het}} = \frac{\exp\left(\sigma\left(e_{vu}^{r}\right)\right)}{\sum_{k \in \mathcal{N}_{v,\mathrm{het}}^{r}} \exp\left(\sigma\left(e_{vk}^{r}\right)\right)}$$
where $\mathcal{N}_{v,\mathrm{het}}^{r}$ is the set of heterogeneous neighbors of node v;
(5) Using the attention scores obtained in the previous step, compute the homogeneous-neighbor embedding $h_{v,\mathrm{hom}}^{r}$ and the heterogeneous-neighbor embedding $h_{v,\mathrm{het}}^{r}$ of the central node v under relation r:
$$h_{v,\mathrm{hom}}^{r} = \sum_{u \in \mathcal{N}_{v,\mathrm{hom}}^{r}} \alpha_{vu}^{r,\mathrm{hom}} z_u$$
$$h_{v,\mathrm{het}}^{r} = \sum_{u \in \mathcal{N}_{v,\mathrm{het}}^{r}} \alpha_{vu}^{r,\mathrm{het}} z_u$$
(6) For each node v, select the k nodes whose features are closest to its feature $z_v$ in Euclidean distance, which yields a k-nearest-neighbor graph $\mathcal{G}_{knn}$;
(7) Aggregate over $\mathcal{G}_{knn}$ to obtain the k-nearest-neighbor embedding $h_{knn,v}$ of each node v:
$$h_{knn,v} = W_{knn} \cdot \frac{1}{K} \sum_{u \in \mathcal{N}_{knn}(v)} z_u$$
where K is the number of nearest neighbors selected, $W_{knn} \in \mathbb{R}^{d_{out} \times d_{out}}$ is a weight matrix and $\mathcal{N}_{knn}(v)$ is the k-nearest-neighbor set of node v;
(8) For each node v, fuse its homogeneous-neighbor embedding $h_{v,\mathrm{hom}}^{r}$, heterogeneous-neighbor embedding $h_{v,\mathrm{het}}^{r}$ and k-nearest-neighbor embedding $h_{knn,v}$ to obtain its comprehensive embedding $h_v^{r}$ under relation r;
(9) Introduce a multi-head attention mechanism: repeat steps (1) to (8) H times and concatenate the resulting $h_v^{r}$ to obtain the multi-head attention feature $\tilde{h}_v^{r}$ of node v under relation r;
(10) Integrate the features $\tilde{h}_v^{r}$ of the multiple relations into $h'_v$ by concatenation followed by a linear transformation;
After two layers of the graph neural network are stacked, the output $h_{v,out}$ of the last layer is a two-dimensional vector: the 1st dimension is the confidence that the node is a spammer and the 2nd dimension is the confidence that it is a normal user. Applying softmax to $h_{v,out}$ yields the probabilities that the node is an abnormal or a normal node. When the 2nd-dimension value of $h_{v,out}$ is greater than its 1st-dimension value, the node is judged a normal node; when the 1st-dimension value is greater than the 2nd-dimension value, the node is judged an abnormal node;
Step three, iterative optimization
(1) Input the whole graph into the graph neural network to obtain the output $h_{out} \in \mathbb{R}^{N \times 2}$, which is the vertical concatenation of all $h_{v,out}$;
(2) Undersample the training labels to obtain the node set $\mathcal{V}_s$ that participates in the loss computation, so that it contains roughly equal numbers of normal and abnormal nodes and the imbalance between labels 0 and 1 does not distort training;
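(3) Compute the cross-entropy loss $\mathcal{L}$ over $\mathcal{V}_s$:
$$\mathcal{L} = -\frac{1}{|\mathcal{V}_s|} \sum_{v \in \mathcal{V}_s} \left[ y_v \log p_{v,1} + (1 - y_v) \log p_{v,2} \right], \qquad p_v = \mathrm{softmax}\left(h_{v,out}\right)$$
where $y_v$ is the label of node v and $p_{v,1}$, $p_{v,2}$ are the 1st and 2nd entries of $p_v$;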
(4) Update the confidence vector B from the model output $h_{out}$: set to 1 the entries of B corresponding to rows of $h_{out}$ whose 1st dimension is greater than their 2nd dimension, and set the rest to 0;
(5) Perform gradient descent on the loss $\mathcal{L}$;
(6) Stop training when the loss $\mathcal{L}$ converges;
Step four, outputting the classes of unlabeled users
(1) Take the model output $h_{out}$ and extract the rows corresponding to unlabeled nodes;
(2) If the 1st-dimension value of the row corresponding to node i in $h_{out}$ is greater than its 2nd-dimension value, node i is a spammer; otherwise it is a normal user.
First, users are modeled as nodes and the interaction relationships between users as edges to build a graph structure, and a small number of spammers are labeled manually. Then a graph neural network is built, consisting mainly of three parts: neighborhood feature extraction, global feature extraction and feature fusion. The network finally outputs a two-dimensional vector; the first dimension can be regarded as the probability that the user is a spammer and the second as the probability that the user is a normal user. Next, the network is optimized iteratively: in each iteration, the loss is computed from the label information with cross-entropy, and the network parameters are updated by gradient descent according to that loss. Finally, once the loss has converged, the output of the network is taken as the detection result. The invention has the following features: it requires little manually labeled data, and it detects camouflaged spammers well.
Drawings
FIG. 1 is a flow chart of the steps performed.
Detailed Description
Users interact in various ways on an online social platform. On Dianping, for example, users can interact by commenting on each other, or by commenting on the same post. The invention therefore mainly addresses spammer detection on multi-relation graphs. A multi-relation graph is written $\mathcal{G} = \{\mathcal{V}, X, \{\mathcal{E}_r\}_{r=1}^{R}\}$, where $\mathcal{V} = \{v_1, v_2, \dots, v_N\}$ is the node set and $X = \{x_1, x_2, \dots, x_N\}$ is the set of node attributes. For each relation $r \in \{1, 2, \dots, R\}$ there is an edge set $\mathcal{E}_r$, where $(v_i, v_j) \in \mathcal{E}_r$ means that nodes $v_i$ and $v_j$ are connected by an edge under relation r. The specific steps of the invention are as follows:
1) Graph construction and preprocessing
First, build the graph structure and the adjacency matrix with users as nodes and the interaction relationships between users as edges.
Second, digitize the user attributes to construct an attribute matrix, each row of which holds the attributes of the corresponding user.
Third, manually label 3% of the data, giving the IDs and labels of the labeled nodes: 1 denotes a spammer and 0 denotes a normal user.
Fourth, divide the manually labeled nodes into a training set and a test set at a ratio of 7:3.
Fifth, establish a confidence vector $B \in \{0,1\}^{N}$, where N is the number of nodes; $B_i = 0$ indicates that node i is more likely to be a normal user, and $B_i = 1$ indicates that node i is more likely to be a spammer. Initialize the confidence vector by setting the entries of B corresponding to training nodes labeled 0 to 0 and those corresponding to training nodes labeled 1 to 1. All remaining entries are 0.
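As an illustration of the preprocessing steps above, the following pure-Python sketch builds one adjacency matrix per relation, splits the labeled nodes 7:3 and initializes the confidence vector B. It assumes undirected edges, dense list-of-lists matrices and integer node IDs 0..N-1; the function names are hypothetical, and a seeded random shuffle is only one possible way to realize the 7:3 division.

```python
import random

def build_adjacency(num_nodes, edges_by_relation):
    """Return one dense 0/1 adjacency matrix per relation (edges treated as undirected)."""
    adj = {}
    for r, edges in edges_by_relation.items():
        a = [[0] * num_nodes for _ in range(num_nodes)]
        for i, j in edges:
            a[i][j] = 1
            a[j][i] = 1
        adj[r] = a
    return adj

def split_labeled(labeled, train_ratio=0.7, seed=0):
    """Split {node_id: label} into train/test dicts, 7:3 by default."""
    nodes = sorted(labeled)
    rng = random.Random(seed)
    rng.shuffle(nodes)
    cut = int(round(train_ratio * len(nodes)))
    train = {v: labeled[v] for v in nodes[:cut]}
    test = {v: labeled[v] for v in nodes[cut:]}
    return train, test

def init_confidence(num_nodes, train):
    """B[i] = training label of node i if i is a training node, otherwise 0."""
    b = [0] * num_nodes
    for v, y in train.items():
        b[v] = y
    return b
```

Only training labels enter B, so test labels stay hidden from the aggregation step.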
2) Graph neural network construction
The graph neural network has two layers, and the output dimension of the last layer is 2: the 1st dimension is the network's confidence that the node is a spammer, and the 2nd dimension is its confidence that the node is a normal user.
Conventional spammer detection often ignores spammer camouflage. A graph neural network obtains a node's features by aggregating the features of its neighbors, so a spammer that interacts with a large number of normal users may end up with features similar to those of normal users after passing through the network. The method therefore takes the class information of neighbors into account when extracting node features and applies different feature aggregation strategies to different types of neighbors.
Each layer of the graph neural network performs the following procedure:
Algorithm 1: graph neural network fusing edge types
First step, use a fully connected layer to reduce the dimension of the feature $h_u$ of user u, obtaining the reduced user feature $z_u$:
$$z_u = W_t h_u$$
where $W_t \in \mathbb{R}^{d_{out} \times d_{in}}$ is the weight matrix of the fully connected layer, $d_{in}$ is the input dimension of the layer and $d_{out}$ is its output dimension.
Second step, take node v as the central node and, for each neighbor u of v under relation r, compute an importance coefficient $e_{vu}^{r}$ from its relationship with the central node:
$$e_{vu}^{r} = \mathbf{a}_r^{\top}\left[z_v \,\Vert\, z_u\right]$$
where $\mathbf{a}_r \in \mathbb{R}^{2d_{out}}$ is a trainable weight vector.
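A minimal sketch of the first two steps, assuming the importance coefficient has the GAT-style form $e_{vu}^{r} = \mathbf{a}_r^{\top}[z_v \Vert z_u]$ (an assumption, not confirmed by the source text) and that the fully connected layer has no bias. Vectors are plain Python lists; the helper names are hypothetical.

```python
def matvec(w, x):
    """Multiply matrix w (d_out rows of length d_in) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def project(w_t, h):
    """First step: z_u = W_t h_u for every user feature h_u (no bias term)."""
    return [matvec(w_t, h_u) for h_u in h]

def importance(a_r, z_v, z_u):
    """Second step (assumed GAT-style form): e_vu^r = a_r . [z_v || z_u]."""
    concat = z_v + z_u  # list concatenation plays the role of vector splicing
    return sum(ai * ci for ai, ci in zip(a_r, concat))
```

Because $\mathbf{a}_r$ is indexed by r, each relation can learn its own scoring of neighbors.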
Third step, use the confidence vector B to judge whether each neighbor belongs to the same class as the node: put the homogeneous neighbors of node v under relation r (those u with $B_u = B_v$) into the set $\mathcal{N}_{v,\mathrm{hom}}^{r}$, and put the heterogeneous neighbors into the set $\mathcal{N}_{v,\mathrm{het}}^{r}$.
Fourth step, normalize the importance coefficients of the two types of neighbors separately to obtain the attention scores used for aggregation. For a neighbor u of node v under relation r, if u is homogeneous with v, its attention score $\alpha_{vu}^{r,\mathrm{hom}}$ is
$$\alpha_{vu}^{r,\mathrm{hom}} = \frac{\exp\left(\sigma\left(e_{vu}^{r}\right)\right)}{\sum_{k \in \mathcal{N}_{v,\mathrm{hom}}^{r}} \exp\left(\sigma\left(e_{vk}^{r}\right)\right)}$$
where $\mathcal{N}_{v,\mathrm{hom}}^{r}$ is the set of homogeneous neighbors of node v, exp is the natural exponential function and σ is any nonlinear activation function. Similarly, if u is heterogeneous with v, its attention score $\alpha_{vu}^{r,\mathrm{het}}$ is
$$\alpha_{vu}^{r,\mathrm{het}} = \frac{\exp\left(\sigma\left(e_{vu}^{r}\right)\right)}{\sum_{k \in \mathcal{N}_{v,\mathrm{het}}^{r}} \exp\left(\sigma\left(e_{vk}^{r}\right)\right)}$$
where $\mathcal{N}_{v,\mathrm{het}}^{r}$ is the set of heterogeneous neighbors of node v.
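The third and fourth steps can be sketched as follows: neighbors are split by comparing confidence-vector entries, and the softmax is computed separately inside each group, so homogeneous and heterogeneous neighbors never compete for the same attention mass. Choosing LeakyReLU (slope 0.2) for σ is an assumption, since the text allows any nonlinear activation; the function names are hypothetical.

```python
import math

def split_neighbors(v, neighbors, b):
    """Third step: a neighbor u is homogeneous if B[u] == B[v], otherwise heterogeneous."""
    hom = [u for u in neighbors if b[u] == b[v]]
    het = [u for u in neighbors if b[u] != b[v]]
    return hom, het

def leaky_relu(x, slope=0.2):
    """Assumed choice of the nonlinear activation sigma."""
    return x if x > 0 else slope * x

def attention(scores, group, act=leaky_relu):
    """Fourth step: softmax of act(e_vu) computed only within one neighbor group.

    scores maps neighbor id -> importance coefficient e_vu^r.
    """
    if not group:
        return {}
    logits = {u: act(scores[u]) for u in group}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {u: math.exp(x - m) for u, x in logits.items()}
    total = sum(exps.values())
    return {u: e / total for u, e in exps.items()}
```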
Fifth step, using the attention scores computed in the previous step, compute the homogeneous-neighbor embedding $h_{v,\mathrm{hom}}^{r}$ and the heterogeneous-neighbor embedding $h_{v,\mathrm{het}}^{r}$ of the central node v under relation r:
$$h_{v,\mathrm{hom}}^{r} = \sum_{u \in \mathcal{N}_{v,\mathrm{hom}}^{r}} \alpha_{vu}^{r,\mathrm{hom}} z_u$$
$$h_{v,\mathrm{het}}^{r} = \sum_{u \in \mathcal{N}_{v,\mathrm{het}}^{r}} \alpha_{vu}^{r,\mathrm{het}} z_u$$
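The fifth step is an attention-weighted sum of the reduced neighbor features. This sketch takes the attention dictionary produced for one neighbor group and returns a zero vector when the group is empty; that empty-group behavior is an assumption, since the text does not address it. The function name is hypothetical.

```python
def aggregate(alpha, z, dim):
    """Fifth step: h = sum over u of alpha_vu * z_u; zero vector if the group is empty.

    alpha maps neighbor id -> attention score, z maps node id -> reduced feature z_u.
    """
    out = [0.0] * dim
    for u, w in alpha.items():
        for k in range(dim):
            out[k] += w * z[u][k]
    return out
```

Calling it once with the homogeneous scores and once with the heterogeneous scores gives the two group embeddings of node v under one relation.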
Sixth step, for each node v, select the k nodes whose features are closest to its feature $z_v$ in Euclidean distance, which yields a k-nearest-neighbor graph $\mathcal{G}_{knn}$.
Seventh step, aggregate over $\mathcal{G}_{knn}$ to obtain the k-nearest-neighbor embedding $h_{knn,v}$ of each node v:
$$h_{knn,v} = W_{knn} \cdot \frac{1}{K} \sum_{u \in \mathcal{N}_{knn}(v)} z_u$$
where K is the number of nearest neighbors selected, generally 2; $W_{knn} \in \mathbb{R}^{d_{out} \times d_{out}}$ is a weight matrix; and $\mathcal{N}_{knn}(v)$ is the k-nearest-neighbor set of node v.
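A sketch of the sixth and seventh steps, assuming a brute-force distance scan per node and mean aggregation of the K neighbor features followed by the weight matrix (the exact aggregation formula is an assumption). The k-nearest-neighbor view lets a node draw on nodes with similar features even when they share no edge. `math.dist` requires Python 3.8 or later; the function names are hypothetical.

```python
import math

def knn(z, v, k=2):
    """Sixth step: ids of the k nodes closest to node v in Euclidean distance."""
    dists = [(math.dist(z[v], z[u]), u) for u in range(len(z)) if u != v]
    dists.sort()
    return [u for _, u in dists[:k]]

def knn_embedding(z, v, w_knn, k=2):
    """Seventh step (assumed mean aggregation): W_knn times the mean of the k neighbor features."""
    nbrs = knn(z, v, k)
    dim = len(z[v])
    mean = [sum(z[u][i] for u in nbrs) / len(nbrs) for i in range(dim)]
    return [sum(w * m for w, m in zip(row, mean)) for row in w_knn]
```

The brute-force scan costs O(N) per node; a spatial index would be the usual choice for large graphs.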
Eighth step, for each node v, fuse its homogeneous-neighbor embedding $h_{v,\mathrm{hom}}^{r}$, heterogeneous-neighbor embedding $h_{v,\mathrm{het}}^{r}$ and k-nearest-neighbor embedding $h_{knn,v}$ to obtain its comprehensive embedding $h_v^{r}$ under relation r:
$$h_v^{r} = W_f\left[z_v \,\Vert\, h_{v,\mathrm{hom}}^{r} \,\Vert\, h_{v,\mathrm{het}}^{r} \,\Vert\, h_{knn,v}\right]$$
where $W_f \in \mathbb{R}^{d_{out} \times 4d_{out}}$ is a linear transformation matrix that integrates the node's own embedding, the homogeneous-neighbor embedding, the heterogeneous-neighbor embedding and the k-nearest-neighbor embedding into a $d_{out}$-dimensional vector, and ‖ denotes concatenation.
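The eighth step reduces to concatenation followed by one matrix-vector product. This hypothetical helper also checks that the fusion matrix (called `w_f` here) has four times as many columns as each input vector's length, matching the four concatenated $d_{out}$-dimensional inputs.

```python
def fuse(w_f, z_v, h_hom, h_het, h_knn):
    """Eighth step: h_v^r = W_f [z_v || h_hom || h_het || h_knn]."""
    concat = z_v + h_hom + h_het + h_knn  # four d_out-dimensional pieces
    assert all(len(row) == len(concat) for row in w_f), "W_f must be d_out x 4*d_out"
    return [sum(w * x for w, x in zip(row, concat)) for row in w_f]
```

Keeping the node's own feature $z_v$ in the concatenation means a camouflaged node's own attributes are not fully overwritten by its neighbors.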
Ninth step, introduce a multi-head attention mechanism: repeat the first to eighth steps H times and concatenate the resulting $h_v^{r}$ to obtain the multi-head attention feature of node v under relation r:
$$\tilde{h}_v^{r} = \Big\Vert_{i=1}^{H} h_v^{r,(i)}$$
where $h_v^{r,(i)}$ is the output of the i-th head. The recommended value of H is 4.
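Multi-head attention here means running the single-head pipeline H times with separate parameters and concatenating the results. In this sketch `head_fn` is a hypothetical callable standing in for the first to eighth steps of one head; the default of 4 heads follows the recommended value above.

```python
def multi_head(head_fn, v, num_heads=4):
    """Ninth step: run the single-head pipeline H times and concatenate the outputs."""
    out = []
    for i in range(num_heads):
        out.extend(head_fn(v, i))  # head i uses its own parameters, selected by i
    return out
```

With d_out-dimensional heads, the concatenated feature has H * d_out entries.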
Tenth step, integrate the features $\tilde{h}_v^{r}$ of the multiple relations into $h'_v$, directly by concatenation followed by a linear transformation:
$$h'_v = W_o\left[\tilde{h}_v^{1} \,\Vert\, \tilde{h}_v^{2} \,\Vert\, \cdots \,\Vert\, \tilde{h}_v^{R}\right]$$
where $W_o$ is a weight matrix that reduces the $R\,d_{out}$-dimensional node feature to $d_{out}$ dimensions.
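The tenth step concatenates the per-relation features and applies one linear map. A minimal sketch with a hypothetical `w_o` weight matrix given as a list of rows:

```python
def combine_relations(w_o, per_relation):
    """Tenth step: h'_v = W_o [h_v^1 || ... || h_v^R]."""
    concat = [x for h in per_relation for x in h]  # splice relation features in order 1..R
    return [sum(w * x for w, x in zip(row, concat)) for row in w_o]
```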
The above is the operation flow of one layer of the graph neural network. After two such layers are stacked, the output $h_{v,out}$ of the last layer is a two-dimensional vector: the 1st dimension is the confidence that the node is a spammer and the 2nd dimension is the confidence that it is a normal user. Applying softmax to $h_{v,out}$ yields the probabilities that the node is an abnormal or a normal node. When the 2nd-dimension value of $h_{v,out}$ is greater than its 1st-dimension value, the node is judged a normal node; when the 1st-dimension value is greater than the 2nd-dimension value, the node is judged an abnormal node.
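A sketch of the decision rule, using the convention that the 1st output dimension (Python index 0) scores spammers. Because softmax is monotonic, comparing probabilities gives the same decision as comparing raw outputs; ties are treated as normal here, which is an assumption. The function names are hypothetical.

```python
import math

def softmax(x):
    """Numerically stable softmax of a list of scores."""
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]

def decide(h_v_out):
    """Return (label, probabilities); index 0 = spammer score, index 1 = normal score."""
    p = softmax(h_v_out)
    label = 1 if p[0] > p[1] else 0  # 1 = spammer, 0 = normal, matching the label convention
    return label, p
```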
3) Iterative optimization
First step, input the whole graph into the graph neural network to obtain the output $h_{out} \in \mathbb{R}^{N \times 2}$, which is the vertical concatenation of all $h_{v,out}$.
Second step, undersample the training labels to obtain the node set $\mathcal{V}_s$ that participates in the loss computation, so that it contains roughly equal numbers of normal and abnormal nodes and the imbalance between labels 0 and 1 does not distort training.
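One simple way to realize the undersampling step, assuming random undersampling of the majority class with a fixed seed; the text does not specify the sampling scheme, and `undersample` is a hypothetical name.

```python
import random

def undersample(train, seed=0):
    """Keep equal numbers of spammers (label 1) and normal users (label 0) from the training set."""
    spam = [v for v, y in train.items() if y == 1]
    normal = [v for v, y in train.items() if y == 0]
    rng = random.Random(seed)
    if len(normal) > len(spam):
        normal = rng.sample(normal, len(spam))  # drop surplus normal users
    elif len(spam) > len(normal):
        spam = rng.sample(spam, len(normal))
    return sorted(spam + normal)
```

Resampling with a different seed each epoch is a common variant that lets more normal users contribute over the course of training.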
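Third step, compute the cross-entropy loss $\mathcal{L}$ over $\mathcal{V}_s$:
$$\mathcal{L} = -\frac{1}{|\mathcal{V}_s|} \sum_{v \in \mathcal{V}_s} \left[ y_v \log p_{v,1} + (1 - y_v) \log p_{v,2} \right], \qquad p_v = \mathrm{softmax}\left(h_{v,out}\right)$$
where $y_v$ is the label of node v and $p_{v,1}$, $p_{v,2}$ are the 1st and 2nd entries of $p_v$.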
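A numerically stable version of the loss, assuming the cross-entropy is averaged over the undersampled set and computed on the softmax of the two output scores; log-sum-exp avoids overflow for large scores. The function name is hypothetical.

```python
import math

def cross_entropy(h_out, labels, nodes):
    """Mean cross-entropy over the undersampled node set.

    h_out[v] = [spammer score, normal score]; labels[v] = 1 (spammer) or 0 (normal).
    """
    total = 0.0
    for v in nodes:
        s, n = h_out[v]
        m = max(s, n)
        log_z = m + math.log(math.exp(s - m) + math.exp(n - m))  # log-sum-exp
        log_p = s - log_z if labels[v] == 1 else n - log_z  # log-probability of the true class
        total -= log_p
    return total / len(nodes)
```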
Fourth step, update the confidence vector B from the model output $h_{out}$: set to 1 the entries of B corresponding to rows of $h_{out}$ whose 1st dimension is greater than their 2nd dimension, and set the rest to 0.
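The confidence-vector update is a row-wise comparison of the two output dimensions. A one-function sketch, assuming `h_out` is a list of two-element rows ordered by node ID (the name is hypothetical):

```python
def update_confidence(h_out):
    """B[i] = 1 if row i's 1st (spammer) score exceeds its 2nd (normal) score, else 0."""
    return [1 if row[0] > row[1] else 0 for row in h_out]
```

Because B is recomputed every iteration, the homogeneous/heterogeneous split in the next forward pass reflects the model's latest predictions for unlabeled nodes.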
Fifth step, perform gradient descent on the loss $\mathcal{L}$.
Sixth step, stop training when the loss $\mathcal{L}$ converges.
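The text does not define convergence precisely. The sketch below assumes a simple criterion: stop once the loss changes by less than `tol` for `patience` consecutive epochs. `step` is a hypothetical callable that runs one forward pass, updates B and applies one gradient-descent update, returning the loss.

```python
def train_until_converged(step, tol=1e-4, patience=5, max_epochs=1000):
    """Call step() until the loss is stable for `patience` epochs; return the loss history."""
    history = []
    calm = 0  # consecutive epochs with a loss change below tol
    for _ in range(max_epochs):
        loss = step()
        if history and abs(history[-1] - loss) < tol:
            calm += 1
        else:
            calm = 0
        history.append(loss)
        if calm >= patience:
            break
    return history
```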
4) Output of the classes of unlabeled users
First, take the model output $h_{out}$ and extract the rows corresponding to unlabeled nodes.
Second, if the 1st-dimension value of the row corresponding to node i in $h_{out}$ is greater than its 2nd-dimension value, node i is a spammer; otherwise it is a normal user.
Third, if metrics such as the model's accuracy are needed, compute them from the test-set labels and the corresponding outputs.
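A sketch of the output stage and of one evaluation metric, recall on spammers (the metric reported for the Amazon experiment). The names are hypothetical; `labeled` holds all manually labeled node IDs and `test` maps test-node IDs to their labels.

```python
def predict_unlabeled(h_out, labeled):
    """Classify every node without a manual label: 1 = spammer, 0 = normal user."""
    return {i: (1 if row[0] > row[1] else 0)
            for i, row in enumerate(h_out) if i not in labeled}

def recall(h_out, test):
    """Fraction of test-set spammers (label 1) that the model flags as spammers."""
    spammers = [v for v, y in test.items() if y == 1]
    if not spammers:
        return 0.0
    hits = sum(1 for v in spammers if h_out[v][0] > h_out[v][1])
    return hits / len(spammers)
```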
The invention is applicable to spammer detection on a variety of online platforms and can effectively detect camouflaged spammers. On review data for Amazon instrument products, a graph was built with users as nodes, interactions between users as edges and user attributes as node features; after iterative training with the graph neural network, a candidate list of spammers was output. The recall can reach 90%.

Claims (1)

1. A method for detecting spam-comment users in an online social network, comprising the following steps:
Step one, graph construction and preprocessing
(1) Build a graph structure with online social platform users as nodes and the interaction relationships between users as edges, and construct the adjacency matrix;
(2) Digitize the user attributes and construct an attribute matrix, each row of which holds the attributes of the corresponding user;
(3) Manually label part of the data, giving the IDs and labels of the labeled nodes, where 1 denotes a spammer and 0 denotes a normal user, and divide the labeled nodes into a training set and a test set;
(4) Establish a confidence vector $B \in \{0,1\}^{N}$, where N is the number of nodes; $B_i = 0$ indicates that node i is more likely to be a normal user, and $B_i = 1$ indicates that node i is more likely to be a spammer. Initialize the confidence vector by setting the entries of B corresponding to training nodes labeled 0 to 0, those corresponding to training nodes labeled 1 to 1, and all remaining entries to 0;
Step two, constructing the graph neural network
The graph neural network has two layers, and the output dimension of the last layer is 2: the 1st dimension is the network's confidence that the node is a spammer, and the 2nd dimension is its confidence that the node is a normal user. The graph neural network obtains a node's features by aggregating the features of the node's neighbors; the neighbors' class information is taken into account during feature extraction, and different feature aggregation strategies are applied to different types of neighbors;
Each layer of the graph neural network performs the following process:
(1) Reduce the dimension of the feature $h_u$ of user u with a fully connected layer to obtain the reduced user feature $z_u$:
$$z_u = W_t h_u$$
where $W_t \in \mathbb{R}^{d_{out} \times d_{in}}$ is the weight matrix of the fully connected layer, $d_{in}$ is the input dimension of the layer and $d_{out}$ is its output dimension;
(2) Taking node v as the central node, compute for each neighbor u of v under relation r an importance coefficient $e_{vu}^{r}$ from its relationship with the central node:
$$e_{vu}^{r} = \mathbf{a}_r^{\top}\left[z_v \,\Vert\, z_u\right]$$
where $\mathbf{a}_r \in \mathbb{R}^{2d_{out}}$ is a trainable weight vector;
(3) Use the confidence vector B to judge whether each neighbor belongs to the same class as the node: put the homogeneous neighbors of node v under relation r (those u with $B_u = B_v$) into the set $\mathcal{N}_{v,\mathrm{hom}}^{r}$, and put the heterogeneous neighbors into the set $\mathcal{N}_{v,\mathrm{het}}^{r}$;
(4) Normalize the importance coefficients of the two types of neighbors separately to obtain the attention scores used for aggregation. For a neighbor u of node v under relation r, if u is homogeneous with v, its attention score $\alpha_{vu}^{r,\mathrm{hom}}$ is
$$\alpha_{vu}^{r,\mathrm{hom}} = \frac{\exp\left(\sigma\left(e_{vu}^{r}\right)\right)}{\sum_{k \in \mathcal{N}_{v,\mathrm{hom}}^{r}} \exp\left(\sigma\left(e_{vk}^{r}\right)\right)}$$
where $\mathcal{N}_{v,\mathrm{hom}}^{r}$ is the set of homogeneous neighbors of node v, exp is the natural exponential function and σ is a nonlinear activation function. Similarly, if u is heterogeneous with v, its attention score $\alpha_{vu}^{r,\mathrm{het}}$ is
$$\alpha_{vu}^{r,\mathrm{het}} = \frac{\exp\left(\sigma\left(e_{vu}^{r}\right)\right)}{\sum_{k \in \mathcal{N}_{v,\mathrm{het}}^{r}} \exp\left(\sigma\left(e_{vk}^{r}\right)\right)}$$
where $\mathcal{N}_{v,\mathrm{het}}^{r}$ is the set of heterogeneous neighbors of node v;
(5) Using the attention scores obtained in the previous step, compute the homogeneous-neighbor embedding $h_{v,\mathrm{hom}}^{r}$ and the heterogeneous-neighbor embedding $h_{v,\mathrm{het}}^{r}$ of the central node v under relation r:
$$h_{v,\mathrm{hom}}^{r} = \sum_{u \in \mathcal{N}_{v,\mathrm{hom}}^{r}} \alpha_{vu}^{r,\mathrm{hom}} z_u$$
$$h_{v,\mathrm{het}}^{r} = \sum_{u \in \mathcal{N}_{v,\mathrm{het}}^{r}} \alpha_{vu}^{r,\mathrm{het}} z_u$$
(6) For each node v, select the k nodes whose features are closest to its feature $z_v$ in Euclidean distance, which yields a k-nearest-neighbor graph $\mathcal{G}_{knn}$;
(7) Aggregate over
Figure FDA00040897809600000214
to obtain the k-nearest-neighbor embedding $h_{knn,v}$ of each node v, as follows:
Figure FDA00040897809600000215
where K is the number of nearest neighbors selected,
Figure FDA00040897809600000216
is a weight matrix, and
Figure FDA00040897809600000217
is the k-nearest-neighbor set of node v;
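The exact aggregation formula for step (7) is in an image. A plausible reading, assumed here, is a mean over the K nearest neighbors followed by the weight matrix, $h_{knn,v} = W \cdot \frac{1}{K}\sum_{u} z_u$ with u ranging over the k-nearest-neighbor set of v. The sketch omits any activation and uses the hypothetical name `knn_embedding`.

```python
from typing import List, Sequence


def knn_embedding(v: int, knn: Sequence[Sequence[int]],
                  z: Sequence[Sequence[float]],
                  W: Sequence[Sequence[float]]) -> List[float]:
    """Assumed form: W @ ((1/K) * sum of z[u] over u in knn[v])."""
    nbrs = knn[v]
    K = len(nbrs)
    if K == 0:
        raise ValueError("node has no k-nearest neighbors")
    d = len(z[0])
    mean = [sum(z[u][j] for u in nbrs) / K for j in range(d)]
    return [sum(W[i][j] * mean[j] for j in range(d)) for i in range(len(W))]
```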
(8) For each node v, fuse its homogeneous-neighbor embedding
Figure FDA00040897809600000218
its heterogeneous-neighbor embedding
Figure FDA00040897809600000219
and its k-nearest-neighbor embedding $h_{knn,v}$ to obtain the combined embedding of node v under relation r,
Figure FDA00040897809600000220
(9) Introduce a multi-head attention mechanism: repeat steps (1) to (8) H times and concatenate the H resulting embeddings
Figure FDA0004089780960000031
to obtain the multi-head attention feature of node v under relation r,
Figure FDA0004089780960000032
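The recoverable text of step (8) does not name the fusion operator. The sketch below assumes concatenation for step (8) and shows the head-wise concatenation of step (9); `head_fn` is a hypothetical callable that runs steps (1) to (8) for one head with that head's own parameters.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def fuse(h_homo: Sequence[float], h_hetero: Sequence[float],
         h_knn: Sequence[float]) -> Vector:
    """Step (8), assumed operator: concatenation [h_homo || h_hetero || h_knn]."""
    return [*h_homo, *h_hetero, *h_knn]


def multi_head(head_fn: Callable[[int], Vector], H: int) -> Vector:
    """Step (9): run head_fn once per head index and concatenate the H outputs."""
    out: Vector = []
    for head in range(H):
        out.extend(head_fn(head))
    return out
```

With concatenation, the output width grows linearly with H, which is why step (10) follows with a linear transformation.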
(10) Integrate the features of node v under the multiple relations,
Figure FDA0004089780960000033
into $h'_v$ by concatenation followed by a linear transformation.
After stacking two layers of this graph neural network, the output of the last layer, $h_{v,out}$, is a two-dimensional vector: the 1st dimension represents the confidence that node v is a spam-comment sender, and the 2nd dimension represents the confidence that it is a normal user. Applying softmax to $h_{v,out}$ gives the probabilities that the node is an abnormal or a normal node. When the 1st dimension of $h_{v,out}$ is larger than the 2nd dimension, the node is judged to be an abnormal node; when the 2nd dimension is larger than the 1st, the node is judged to be a normal node;
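A sketch of step (10) and the output rule. It assumes concatenation of the per-relation features followed by an affine map $Wx + b$, and follows the convention of step three (4) and step four (2) that index 0 (the 1st dimension) holds the spam-sender confidence. All names are hypothetical.

```python
import math
from typing import List, Sequence


def integrate_relations(per_relation: Sequence[Sequence[float]],
                        W: Sequence[Sequence[float]],
                        b: Sequence[float]) -> List[float]:
    """Step (10): concatenate the per-relation features, then apply W x + b."""
    x = [val for feat in per_relation for val in feat]
    if any(len(row) != len(x) for row in W):
        raise ValueError("each row of W must match the concatenated length")
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # shift for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_spam_sender(h_v_out: Sequence[float]) -> bool:
    # Index 0 = 1st dimension (spam-sender confidence), index 1 = normal user.
    return h_v_out[0] > h_v_out[1]
```

Because softmax is monotonic, comparing the raw outputs and comparing the softmax probabilities give the same decision; the probabilities are only needed when a calibrated score is wanted.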
step three, iterative optimization
(1) Input the whole graph into the graph neural network to obtain the output $h_{out}$,
Figure FDA0004089780960000034
which is the row-wise (vertical) concatenation of all $h_{v,out}$;
(2) Undersample the training labels to obtain the set of nodes that participate in the loss calculation,
Figure FDA0004089780960000035
so that the numbers of normal and abnormal nodes participating in the loss calculation are approximately equal, which avoids the effect of the imbalance between labels 0 and 1;
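Undersampling in step (2) can be sketched as keeping an equal number of training nodes from each class. The sketch assumes label 1 marks a spam-comment sender and label 0 a normal user, and draws exactly min(#normal, #abnormal) nodes from each class; the patent only requires the two counts to be similar. `undersample` is a hypothetical name.

```python
import random
from typing import Dict, List, Optional


def undersample(labels: Dict[int, int], seed: Optional[int] = None) -> List[int]:
    """labels maps training node id -> label (assumed: 1 abnormal, 0 normal).
    Randomly drops nodes of the larger class until both classes are equal in size."""
    rng = random.Random(seed)
    normal = [v for v, y in labels.items() if y == 0]
    abnormal = [v for v, y in labels.items() if y == 1]
    n = min(len(normal), len(abnormal))
    chosen = rng.sample(normal, n) + rng.sample(abnormal, n)
    return sorted(chosen)
```

Redrawing the sample every epoch (a new seed per call) lets all normal nodes contribute over the course of training while each individual loss stays balanced.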
(3) Over the node set
Figure FDA0004089780960000036
compute the loss
Figure FDA0004089780960000037
according to the following formula:
Figure FDA0004089780960000038
where $y_v$ denotes the label of node v;
(4) Update the confidence vector B from the model output $h_{out}$: for each row of $h_{out}$ whose 1st dimension is greater than its 2nd dimension, set the corresponding position of B to 1, and set all other positions to 0;
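Step (4) reduces to a row-wise comparison. A minimal sketch, with `update_confidence_vector` as a hypothetical name:

```python
from typing import List, Sequence


def update_confidence_vector(h_out: Sequence[Sequence[float]]) -> List[int]:
    """B[i] = 1 if row i's 1st dimension exceeds its 2nd dimension, else 0."""
    return [1 if row[0] > row[1] else 0 for row in h_out]
```

Ties map to 0, matching "the rest is 0" in the step.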
(5) Based on the loss
Figure FDA0004089780960000039
perform a gradient descent update;
(6) Stop training when the loss
Figure FDA00040897809600000310
converges;
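Steps (1) to (6) form the training loop. The sketch assumes a hypothetical `step` callable that performs one forward pass, the B update and one gradient-descent update, and returns the loss. "Convergence" is taken here to mean that the loss changes by less than a tolerance between consecutive epochs; that is one common criterion, not one stated in the patent.

```python
from typing import Callable, List


def train_until_converged(step: Callable[[], float], tol: float = 1e-4,
                          max_epochs: int = 1000) -> List[float]:
    """Call step() once per epoch until the loss changes by less than tol
    (or max_epochs is reached); return the loss history."""
    history: List[float] = []
    for _ in range(max_epochs):
        loss = step()
        converged = bool(history) and abs(history[-1] - loss) < tol
        history.append(loss)
        if converged:
            break
    return history
```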
step four, output the categories of unlabeled users
(1) Obtain the model output $h_{out}$ and take out the rows corresponding to unlabeled nodes;
(2) If the 1st dimension of node i's row in $h_{out}$ is larger than its 2nd dimension, node i is a spam-comment sender; otherwise it is a normal user.
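Step four can be sketched as follows, assuming `labeled` is the set of node indices that had training labels and using the same 1st-versus-2nd-dimension rule as step (2). The function name `classify_unlabeled` is hypothetical.

```python
from typing import Dict, Sequence, Set


def classify_unlabeled(h_out: Sequence[Sequence[float]],
                       labeled: Set[int]) -> Dict[int, str]:
    """Return 'spam' or 'normal' for every node index not in the labeled set."""
    return {
        i: ("spam" if row[0] > row[1] else "normal")
        for i, row in enumerate(h_out)
        if i not in labeled
    }
```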
CN202310148077.2A 2023-02-22 2023-02-22 Online social network spam comment user detection method Pending CN116304311A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310148077.2A CN116304311A (en) 2023-02-22 2023-02-22 Online social network spam comment user detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310148077.2A CN116304311A (en) 2023-02-22 2023-02-22 Online social network spam comment user detection method

Publications (1)

Publication Number Publication Date
CN116304311A true CN116304311A (en) 2023-06-23

Family

ID=86816079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310148077.2A Pending CN116304311A (en) 2023-02-22 2023-02-22 Online social network spam comment user detection method

Country Status (1)

Country Link
CN (1) CN116304311A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117828514A (en) * 2024-03-04 2024-04-05 清华大学深圳国际研究生院 User network behavior data anomaly detection method based on graph structure learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117828514A (en) * 2024-03-04 2024-04-05 清华大学深圳国际研究生院 User network behavior data anomaly detection method based on graph structure learning
CN117828514B (en) * 2024-03-04 2024-05-03 清华大学深圳国际研究生院 User network behavior data anomaly detection method based on graph structure learning

Similar Documents

Publication Publication Date Title
CN108805200B (en) Optical remote sensing scene classification method and device based on depth twin residual error network
CN111881350B (en) Recommendation method and system based on mixed graph structured modeling
CN111222332B (en) Commodity recommendation method combining attention network and user emotion
CN113961718A (en) Knowledge inference method based on industrial machine fault diagnosis knowledge graph
CN111292195A (en) Risk account identification method and device
CN114817663B (en) Service modeling and recommendation method based on class perception graph neural network
CN111753207B (en) Collaborative filtering method for neural map based on comments
CN110851491A (en) Network link prediction method based on multiple semantic influences of multiple neighbor nodes
CN110489661B (en) Social relationship prediction method based on generation of confrontation network and transfer learning
CN112417063B (en) Heterogeneous relation network-based compatible function item recommendation method
KR102284436B1 (en) Method and Device for Completing Social Network Using Artificial Neural Network
CN112381179A (en) Heterogeneous graph classification method based on double-layer attention mechanism
CN109447110A (en) The method of the multi-tag classification of comprehensive neighbours' label correlative character and sample characteristics
CN116304311A (en) Online social network spam comment user detection method
CN115687925A (en) Fault type identification method and device for unbalanced sample
CN106997373A (en) A kind of link prediction method based on depth confidence network
CN112819024B (en) Model processing method, user data processing method and device and computer equipment
CN114064627A (en) Knowledge graph link completion method and system for multiple relations
CN114898121A (en) Concrete dam defect image description automatic generation method based on graph attention network
CN113628059A (en) Associated user identification method and device based on multilayer graph attention network
CN112465226A (en) User behavior prediction method based on feature interaction and graph neural network
CN114942998B (en) Knowledge graph neighborhood structure sparse entity alignment method integrating multi-source data
CN113361928B (en) Crowd-sourced task recommendation method based on heterogram attention network
CN112668633B (en) Adaptive graph migration learning method based on fine granularity field
CN109978013A (en) A kind of depth clustering method for figure action identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination