CN112445957A

CN112445957A - Social network abnormal user detection method, system, medium, equipment and terminal

Info

Publication number: CN112445957A
Application number: CN202011226262.1A
Authority: CN
Inventors: 朱辉; 俞志鹏; 李鹤麟; 李晖; 兰玮; 文浩斌
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2020-11-05
Filing date: 2020-11-05
Publication date: 2021-03-05

Abstract

The invention belongs to the technical field of social network data mining and discloses a method, system, medium, equipment and terminal for detecting abnormal users in a social network. Crawled social network data is preprocessed to construct a social network adjacency matrix, a social network attribute matrix and a social network adjacency attribute matrix. Based on the attribute matrix and the adjacency attribute matrix, a deep neural network model with a self-coding (autoencoder) structure produces a low-dimensional representation matrix of the social network users and updates the abnormal value of each user. Finally, the degree of abnormality of each user is evaluated from its abnormal value, completing the detection and identification of abnormal users. By combining network representation learning with the abnormal user detection task, the method identifies abnormal users while reducing their influence on the representation learning of the social network, generating robust network embedding vectors that benefit downstream data mining tasks.

Description

Social network abnormal user detection method, system, medium, equipment and terminal

Technical Field

The invention belongs to the technical field of social network data mining, and particularly relates to a method, a system, a medium, equipment and a terminal for detecting abnormal users in a social network.

Background

With the rapid development and wide application of internet technology, social networks have become an essential part of people's digital lives thanks to their convenience, entertainment value and real-time nature. Social networks carry massive amounts of media and social information, and they also contain a large amount of private information and great commercial value, which attracts many malicious attackers. These attackers create fake accounts or steal normal accounts and carry out malicious behaviors in the social network such as publishing malicious information, committing financial transaction fraud and launching network attacks, seriously threatening people's lives and property as well as the normal order and trust relationships of the social network. Such malicious attackers are collectively referred to as abnormal users.

The following difficulties exist in the detection and identification of abnormal user nodes in the social network:

(1) Traditional social network anomaly detection methods incur large time overhead and labor cost. The user base of a social network is large and widely distributed and contains many kinds of abnormal users, whose behavior characteristics change dynamically over time; when abnormal users change their behavior patterns, traditional detection methods cannot handle them effectively.

(2) The complexity of social networks makes anomaly detection very difficult. Because of the edges in the topological structure, the data is high-dimensional and sparse, user nodes are highly coupled, and the relationships among user nodes are repeatedly iterated, so user characteristics are difficult to capture.

In view of the above problems, the following solutions have been proposed:

(1) A method and system for detecting abnormal social network accounts based on network representation learning: by constructing a joint optimization model of the topological structure and node attributes of the social network, the method solves for the abnormal factors of each social network account with respect to the topological structure, the node attributes and structure-attribute consistency, and jointly evaluates the three abnormal factors to detect and identify abnormal social network accounts.

(2) A method for detecting abnormal users in a social network based on graph embedding: a user node embedding model is constructed from the community attribution relation values of the user nodes in the social network, the embedding weighting vector and abnormal level of each user node are then solved, and user nodes whose abnormal level is greater than a maximum threshold or smaller than a minimum threshold are defined as abnormal user nodes.

However, the above solutions all have certain limitations:

(1) The method and system for detecting abnormal social network accounts based on network representation learning have the following defects:

1) matrix factorization techniques are not suitable for large-scale social networks;

2) real-world social networks exhibit highly complex non-linearities that are difficult to capture by matrix decomposition techniques.

(2) The method for detecting the abnormal users in the social network based on graph embedding has the defects that:

1) the community structure of a social network lacks a universal definition, community detection in large-scale social networks is difficult, and the accuracy of the community structure division directly determines the effect of the method;

2) the method places no restriction on abnormal user nodes during embedding to reduce their influence on the final embedding vectors, so it is difficult to build a robust graph embedding model.

Through the above analysis, the problems and defects of the prior art are as follows:

(1) The existing method and system for detecting abnormal social network accounts based on network representation learning rely on matrix factorization, which is not suitable for large-scale social networks; real-world social networks exhibit highly complex non-linearities that matrix factorization techniques have difficulty capturing.

(2) The existing graph-embedding-based method for detecting abnormal users in a social network depends on the community structure, which lacks a universal definition; community detection in large-scale social networks is difficult, and the accuracy of the community structure division directly determines the effect of the method. The method also places no restriction on abnormal user nodes during embedding to reduce their influence on the final embedding vectors, so it is difficult to build a robust graph embedding model.

The difficulty in solving the above problems and defects is: for a large-scale social network, abnormal users must be detected and identified effectively while a robust model that benefits downstream data mining tasks is generated.

The significance of solving the above problems and defects is: it is important for social network site security and user privacy protection, and it also has research value for group event monitoring and public opinion analysis in social networks.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a method, a system, a medium, equipment and a terminal for detecting abnormal users in a social network.

The invention is realized as follows. A method for detecting abnormal users in a social network comprises the following steps:

preprocessing the crawled social network data to construct a social network adjacency matrix, a social network attribute matrix and a social network adjacency attribute matrix, wherein the social network attribute matrix is the input of the model, the social network adjacency attribute matrix is the expected output of the model, and the model is trained by reducing the loss;

based on the social network attribute matrix and the social network adjacency attribute matrix, obtaining a social network user low-dimensional representation matrix with a deep neural network model of a self-coding structure and updating the abnormal value of each user in the social network, so that abnormal users can be detected and identified and, by introducing a coefficient factor inversely proportional to the abnormal value, a robust user low-dimensional representation matrix is generated;

evaluating the degree of abnormality of each user in the social network through the abnormal value, wherein the abnormal values of the users are ranked from high to low to complete the detection and identification of abnormal users in the social network.

Further, the social network abnormal user detection method comprises the following steps:

(1) constructing a social network (V, E, A) by using the social network data set, wherein V is a set of all nodes in the social network, E is a set of all edges in the social network, and A is a set of all node attributes in the social network;

(2) preprocessing the social network data (V, E, A) from step (1) to construct an N × N social network adjacency matrix G, an N × M social network attribute matrix A and an N × M social network adjacency attribute matrix Y;

(3) Constructing a social network abnormal user detection model based on network embedding;

(4) initializing the model parameters and iterating by gradient descent to reduce the value of the loss function until convergence; all nodes in the social network are then ranked from high to low by abnormal value, and the result is output and fed back to data mining personnel for detecting and identifying the abnormal nodes in the social network.

Further, the preprocessing step includes:

1) converting the unique identifier of each social network node to an integer: for each user node v in the social network data set, the row number of v is taken as its unique integer index, with row numbers starting from 1;

2) constructing the social network adjacency matrix: an N × N social network adjacency matrix G is constructed from the |E| × 2 matrix of pairwise follow relationships in the social network data;

3) constructing a social network attribute matrix: the attribute vectors of all nodes in the social network data set form an N multiplied by M-dimensional social network attribute matrix A;

4) constructing the social network adjacency attribute matrix: for each node v in the graph, obtain its neighbor node set Neigh(v). If v has neighbor nodes, its adjacency attribute vector is the average of the attribute vectors of all its neighbors, namely

$$Y_v = \frac{1}{|\mathrm{Neigh}(v)|} \sum_{u \in \mathrm{Neigh}(v)} A_u$$

If v has no neighbor node, its adjacency attribute vector is assigned its own attribute vector, namely

$$Y_v = A_v$$

Executing this operation on all nodes in the social network yields the N × M social network adjacency attribute matrix Y.
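The preprocessing steps above can be sketched in NumPy. This is a minimal illustration, not the patented implementation: the function name `build_matrices` is hypothetical, and it assumes that Neigh(v) is the set of nodes that v follows (the second column of each edge row), which the text does not specify.

```python
import numpy as np

def build_matrices(edges, attrs):
    """Build the adjacency matrix G and the adjacency attribute matrix Y.

    edges: |E| x 2 array of 1-based (source, target) node indices.
    attrs: N x M attribute matrix A.
    Assumption: Neigh(v) is the set of targets of edges leaving v.
    """
    edges = np.asarray(edges, dtype=int).reshape(-1, 2)
    attrs = np.asarray(attrs, dtype=float)
    n = attrs.shape[0]

    G = np.zeros((n, n))
    # Row indices in the data start at 1; NumPy indices start at 0.
    G[edges[:, 0] - 1, edges[:, 1] - 1] = 1.0

    degree = G.sum(axis=1, keepdims=True)          # |Neigh(v)| per node
    neighbor_mean = (G @ attrs) / np.maximum(degree, 1.0)
    # Nodes without neighbors keep their own attribute vector.
    Y = np.where(degree > 0, neighbor_mean, attrs)
    return G, Y
```

If the follow relation should be treated as undirected, symmetrizing `G` before computing the mean would be the corresponding change.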

Further, the model building step includes:

1) constructing a deep neural network model based on a self-coding structure: the encoder consists of K fully connected layers, where K is a positive integer greater than or equal to 1; the K encoder layers reduce the M-dimensional attribute vector to a D-dimensional hidden-layer output, and the fully connected layers are connected through a hyperbolic tangent activation function;

the decoder also consists of K fully connected layers; the K decoder layers expand the D-dimensional input vector into an M-dimensional output vector, and the fully connected layers are connected through a hyperbolic tangent activation function;

the D-dimensional hidden-layer outputs of all N nodes form the N × D social network user low-dimensional representation matrix E;

2) constructing the loss function of the deep neural network model based on the self-coding structure: so that the network-embedding-based social network abnormal user detection model detects and identifies abnormal nodes to the greatest extent while reducing, as far as possible, their influence on social network representation learning, a loss function is constructed in which each node carries a coefficient factor inversely proportional to its abnormal value;

3) updating the outliers of the deep neural network model based on the self-coding structure.

Further, in the deep neural network model based on the self-coding structure, the encoder consists of K fully connected layers connected through a hyperbolic tangent activation function, and the decoder likewise consists of K fully connected layers connected through a hyperbolic tangent activation function.

Further, in the loss function, N is the total number of nodes in the social network, M is the dimension of the node attribute vector, Y is the adjacency attribute matrix and $Y_{ij}$ is the j-th adjacency attribute value of the i-th node; $\hat{Y}$ is the adjacency attribute matrix output by the deep neural network based on the self-coding structure and $\hat{Y}_{ij}$ is the j-th adjacency attribute value of the i-th node in that output matrix; $\lambda_i$ is the abnormal value of the i-th node and identifies its degree of abnormality.

The abnormal values are updated by a formula in which M, Y, $Y_{ij}$ and $\hat{Y}_{ij}$ have the same meanings as above and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. The abnormal values must be updated after the deep neural network parameters are updated in each iteration.
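The loss and update formulas themselves are not reproduced in this text, so the following is only a sketch of one form that is consistent with the symbol definitions above. It assumes that each node's abnormal value is its share of the squared Frobenius norm of the reconstruction error, and that the loss weights each node's squared error by log(1/λ_i), a factor that decreases as λ_i grows. Both choices, the names `update_outliers` and `weighted_loss`, the `Y_hat` name for the network output, and the `eps` guard are assumptions for illustration, not the patent's stated formulas.

```python
import numpy as np

def update_outliers(Y, Y_hat, eps=1e-12):
    """Assumed update: node i's share of the total squared error."""
    Y, Y_hat = np.asarray(Y, float), np.asarray(Y_hat, float)
    per_node = np.sum((Y - Y_hat) ** 2, axis=1)          # sum over M attributes
    total = np.linalg.norm(Y - Y_hat, ord="fro") ** 2    # squared Frobenius norm
    return per_node / (total + eps)                      # eps avoids 0 / 0

def weighted_loss(Y, Y_hat, lam, eps=1e-12):
    """Assumed loss: squared error per node, weighted by log(1 / lambda_i)."""
    Y, Y_hat = np.asarray(Y, float), np.asarray(Y_hat, float)
    per_node = np.sum((Y - Y_hat) ** 2, axis=1)
    weights = np.log(1.0 / np.clip(lam, eps, None))      # smaller for outliers
    return float(np.mean(weights * per_node))
```

Under these assumptions the abnormal values sum to about 1 across nodes, so a node with a large reconstruction error both scores high and contributes less to the gradient that shapes the embedding.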

It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:

preprocessing the crawled social network data, and constructing a social network adjacency matrix, a social network attribute matrix and a social network adjacency attribute matrix;

based on the social network attribute matrix and the social network adjacent attribute matrix, obtaining a social network user low-dimensional representation matrix by using a deep neural network model of a self-coding structure, and updating an abnormal value of each user in the social network;

and evaluating the abnormal degree of each user in the social network through the abnormal value, and finishing the detection and identification of the abnormal user in the social network.

It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the same steps as those listed above for the computer device.

The invention also aims to provide an information data processing terminal, which is used for realizing the social network abnormal user detection method.

Another object of the present invention is to provide a system for detecting an abnormal user in a social network, which implements the method for detecting an abnormal user in a social network, the system comprising:

the data preprocessing module is used for preprocessing the crawled social network data and constructing a social network adjacency matrix, a social network attribute matrix and a social network adjacency attribute matrix;

the abnormal value updating module is used for obtaining a social network user low-dimensional representation matrix by utilizing a deep neural network model of a self-coding structure based on the social network attribute matrix and the social network adjacent attribute matrix, and updating the abnormal value of each user in the social network;

and the abnormal user detection and identification module is used for evaluating the abnormal degree of each user in the social network through the abnormal value so as to complete the detection and identification of the abnormal user in the social network.

Combining all the above technical solutions, the advantages and positive effects of the invention are shown by the experimental effect in fig. 6, where the abscissa is the top L% of user nodes after all user nodes in the social network are ranked from high to low by abnormal value, and the ordinate is the recall rate of abnormal user node detection and identification in the social network.

Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a social network abnormal user detection method according to an embodiment of the present invention.

Fig. 2 is a schematic structural diagram of a system for detecting abnormal users in a social network according to an embodiment of the present invention.

In fig. 2: 1. data preprocessing module; 2. abnormal value updating module; 3. abnormal user detection and identification module.

Fig. 3 is a flowchart of an implementation of a method for detecting an abnormal user in a social network according to an embodiment of the present invention.

Fig. 4 is an overall frame diagram of a social network abnormal user detection method according to an embodiment of the present invention.

Fig. 5 is a social network diagram of a social network abnormal user detection method according to an embodiment of the present invention.

Fig. 6 is an experimental effect diagram of an embodiment of the social network abnormal user detection method according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Aiming at the problems in the prior art, the invention provides a method, a system, a medium, a device and a terminal for detecting abnormal users in a social network, and the invention is described in detail with reference to the accompanying drawings.

As shown in fig. 1, the method for detecting abnormal users in social networks provided by the present invention includes the following steps:

S101: preprocessing the crawled social network data, and constructing a social network adjacency matrix, a social network attribute matrix and a social network adjacency attribute matrix;

S102: based on the social network attribute matrix and the social network adjacency attribute matrix, obtaining a social network user low-dimensional representation matrix by using a deep neural network model of a self-coding structure, and updating the abnormal value of each user in the social network;

S103: evaluating the degree of abnormality of each user in the social network through the abnormal value, and completing the detection and identification of abnormal users in the social network.

Those skilled in the art can also implement the method for detecting abnormal users in a social network with other steps; the method shown in fig. 1 is only one specific embodiment.

As shown in fig. 2, the system for detecting abnormal users in social networks provided by the present invention includes:

the data preprocessing module 1 is used for preprocessing the crawled social network data and constructing a social network adjacency matrix, a social network attribute matrix and a social network adjacency attribute matrix;

the abnormal value updating module 2 is used for obtaining a social network user low-dimensional representation matrix by using a deep neural network model of a self-coding structure based on the social network attribute matrix and the social network adjacent attribute matrix, and updating the abnormal value of each user in the social network;

and the abnormal user detection and identification module 3 is used for evaluating the abnormal degree of each user in the social network through the abnormal value so as to complete the detection and identification of the abnormal user in the social network.

The technical solution of the present invention is further described below with reference to the accompanying drawings.

As shown in fig. 3, the method for detecting abnormal users in social networks provided by the present invention includes the following steps:

(1) constructing a social network (V, E, A) from the social network data set shown in fig. 4, in which the total number of user nodes is |V| = 2708, the total number of directed edges is |E| = 5429, and the user attribute vector dimension is M = 1433;

(2) preprocessing the social network data (V, E, A) from (1) to construct a 2708 × 2708 social network adjacency matrix G, a 2708 × 1433 social network attribute matrix A and a 2708 × 1433 social network adjacency attribute matrix Y;

Specifically, the preprocessing steps are as follows:

2) constructing the social network adjacency matrix: a 2708 × 2708 social network adjacency matrix G is constructed from the 5429 × 2 matrix of pairwise follow relationships in the social network data;

3) constructing the social network attribute matrix: the attribute vectors of all nodes in the social network data set form the 2708 × 1433 social network attribute matrix A;

Executing the above operations on all nodes in the social network yields the 2708 × 1433 social network adjacency attribute matrix.

(3) Constructing a social network abnormal user detection model based on network embedding. Specifically, the model construction steps are as follows:

1) constructing a deep neural network model based on a self-coding structure: in the encoder of this embodiment, K = 5 and D = 32. As shown in fig. 5, the encoder consists of five fully connected layers: the first layer reduces the 1433-dimensional attribute vector to 512 dimensions, the second layer reduces the 512-dimensional output of the previous layer to 256 dimensions, the third to 128 dimensions, the fourth to 64 dimensions, and the fifth to the 32-dimensional hidden-layer output. The fully connected layers are connected by the hyperbolic tangent activation function.

In the decoder of this embodiment, K = 5 and D = 32. As shown in fig. 5, the decoder consists of five fully connected layers: the first layer expands the 32-dimensional input to 64 dimensions, the second layer expands the 64-dimensional output of the previous layer to 128 dimensions, the third to 256 dimensions, the fourth to 512 dimensions, and the fifth to 1433 dimensions. The fully connected layers are connected by the hyperbolic tangent activation function.

The 32-dimensional hidden-layer outputs of all 2708 nodes form the 2708 × 32 social network user low-dimensional representation matrix E;
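The layer sizes of this embodiment can be illustrated with a forward pass in NumPy. This is a sketch under assumptions rather than the embodiment's code: the helper names `init_layers` and `forward`, the random initialization, and the choice to apply the hyperbolic tangent after every layer (including the last) are not stated in the text, and a small random batch stands in for the 2708 × 1433 attribute matrix.

```python
import numpy as np

# Layer widths taken from the embodiment: K = 5 layers, D = 32.
ENCODER_DIMS = [1433, 512, 256, 128, 64, 32]
DECODER_DIMS = ENCODER_DIMS[::-1]

def init_layers(dims, rng):
    """Hypothetical initializer: small random weights and zero biases."""
    return [(rng.normal(0.0, 0.01, size=(d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def forward(x, layers):
    """Apply each fully connected layer followed by tanh (assumed for all layers)."""
    for weight, bias in layers:
        x = np.tanh(x @ weight + bias)
    return x

rng = np.random.default_rng(0)
encoder = init_layers(ENCODER_DIMS, rng)
decoder = init_layers(DECODER_DIMS, rng)

A = rng.random((8, 1433))     # stand-in rows of the attribute matrix
E = forward(A, encoder)       # rows of the low-dimensional representation
Y_hat = forward(E, decoder)   # reconstructed adjacency attribute vectors
```

Training by gradient descent is omitted here; in practice an automatic differentiation library would supply the gradients.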

2) constructing the loss function of the deep neural network model based on the self-coding structure: so that the network-embedding-based social network abnormal user detection model detects and identifies abnormal nodes to the greatest extent while reducing, as far as possible, their influence on social network representation learning, a loss function is constructed in which 2708 is the total number of nodes in the social network, 1433 is the dimension of the node attribute vector, Y is the adjacency attribute matrix and $Y_{ij}$ is the j-th adjacency attribute value of the i-th node; $\hat{Y}_{ij}$ is the j-th adjacency attribute value of the i-th node in the matrix output by the deep neural network, and $\lambda_i$ is the abnormal value of the i-th node, identifying its degree of abnormality.

3) updating the abnormal values of the deep neural network model based on the self-coding structure: the abnormal values are updated by a formula in which 1433 is the dimension of the node attribute vector, Y is the adjacency attribute matrix, $Y_{ij}$ is the j-th adjacency attribute value of the i-th node, and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.

The iteration continues until convergence. All nodes in the social network are then ranked from high to low by abnormal value, and the result is output and fed back to data mining personnel for detecting and identifying the abnormal nodes in the social network. The experimental effect is shown in fig. 6: the abscissa is the top L% of user nodes after all user nodes are ranked from high to low by abnormal value, and the ordinate is the recall rate of abnormal user node detection and identification in the social network.
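The ranking and the recall measure plotted in fig. 6 can be sketched as follows. The function name `recall_at_top_percent` is hypothetical, ground-truth labels are assumed to be available for evaluation, and rounding the cutoff up to a whole node is an assumption.

```python
import numpy as np

def recall_at_top_percent(scores, is_abnormal, percent):
    """Recall of known abnormal nodes among the top `percent`% by abnormal value.

    Requires at least one node labeled abnormal.
    """
    scores = np.asarray(scores, dtype=float)
    is_abnormal = np.asarray(is_abnormal, dtype=bool)
    order = np.argsort(-scores, kind="stable")        # rank from high to low
    k = int(np.ceil(len(scores) * percent / 100.0))   # size of the top L% slice
    hits = is_abnormal[order[:k]].sum()
    return hits / is_abnormal.sum()

# Four nodes, two of them labeled abnormal (nodes 0 and 3).
scores = [0.9, 0.1, 0.8, 0.2]
labels = [1, 0, 0, 1]
half = recall_at_top_percent(scores, labels, 50)
full = recall_at_top_percent(scores, labels, 100)
```

Sweeping `percent` over a range of values gives one recall curve of the kind described for the abscissa and ordinate above.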

Demonstration section (specific examples and experimental data)

The experimental data set is shown in the following table:

the experimental effect evaluation is shown in the following table:

It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer-executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a disk, CD- or DVD-ROM, on programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices; by software executed by various types of processors; or by a combination of hardware circuits and software, e.g., firmware.

The above description covers only specific embodiments of the present invention and is not to be construed as limiting its scope; any modification, equivalent substitution or improvement made within the spirit and principles of the invention falls within the scope of protection defined by the appended claims.

Claims

1. A method for detecting abnormal users in a social network is characterized by comprising the following steps:

2. The method for detecting abnormal users in a social network according to claim 1, wherein the method comprises the steps of:

(1) constructing a social network (V, E, A) by using the social network data set, wherein V is a set of all nodes in the social network, E is a set of all edges in the social network, and A is an attribute matrix of all nodes in the social network;

3. The social networking anomalous user detection method of claim 2, wherein said preprocessing step includes:

4. The method for detecting abnormal users in a social network according to claim 2, wherein the model building step comprises:

5. The method for detecting the abnormal users in the social network as claimed in claim 4, wherein the deep neural network model based on the self-coding structure is constructed, and the full connection layers are connected through a hyperbolic tangent activation function:

6. The method for detecting abnormal users in a social network according to claim 4, wherein in the loss function:

N is the total number of nodes in the social network, M is the dimension of the node attribute vector, Y represents the adjacency attribute matrix and, in particular, $Y_{ij}$ represents the j-th adjacency attribute value of the i-th node;

the abnormal values are updated by a formula in which $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.

7. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:

8. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:

9. An information data processing terminal, characterized in that the information data processing terminal is used for implementing the social network abnormal user detection method of any one of claims 1 to 6.

10. A social network abnormal user detection system for implementing the social network abnormal user detection method of any one of claims 1 to 6, wherein the social network abnormal user detection system comprises: