CN111275562A - Dynamic community discovery method based on recursive convolutional neural network and self-encoder - Google Patents

Dynamic community discovery method based on recursive convolutional neural network and self-encoder

Info

Publication number
CN111275562A
Authority
CN
China
Prior art keywords
network
node
space
neural network
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010056877.8A
Other languages
Chinese (zh)
Inventor
吴伶
陈志华
张岐山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202010056877.8A priority Critical patent/CN111275562A/en
Publication of CN111275562A publication Critical patent/CN111275562A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a dynamic community discovery method based on a recurrent convolutional neural network and an autoencoder. The method comprises the following steps: first, a network spatial feature learning model based on a convolutional neural network is constructed, and the spatial topological features of the network are learned to obtain network spatial feature vectors; second, the network spatial feature learning model based on the convolutional neural network is fused in, the network spatial feature vectors are taken as the input of the model, and a network spatio-temporal feature learning model based on a recurrent neural network, a convolutional neural network and an autoencoder is constructed to learn the spatio-temporal features of the network and obtain network spatio-temporal feature vectors; finally, community discovery is carried out on the basis of the network spatio-temporal feature vectors to detect the dynamic community structure of the social network. The method can be applied to social network analysis: it autonomously learns and extracts the spatio-temporal features of the social network and can further improve the modularity of the community structure, thereby revealing the topological structure and other properties of the real network and supporting effective prediction of network user behavior, information propagation and the like.

Description

Dynamic community discovery method based on recursive convolutional neural network and self-encoder
Technical Field
The invention relates to a dynamic community discovery method based on a recursive convolutional neural network and an autoencoder.
Background
With the development of the Internet, particularly the mobile Internet, social network platforms for making friends, sharing information and the like have grown rapidly. On social network platforms at home and abroad, represented by Sina Weibo, WeChat, Taobao, Twitter and Facebook, people can publish opinions, make friends and interact, spread information, promote goods and so on. According to Facebook's report for the first quarter of 2018, an average of 2.2 billion users used Facebook each month, daily active users reached 1.4 billion, and an average of 5 new accounts were created per second. In addition, the number of monthly active users of WeChat, the popular domestic social application, passed the 1 billion mark for the first time in 2018.
Online social networks have become a bridge connecting the network information space with the human physical world, profoundly changing people's behavior patterns and forms of social interaction. They free communication between people from the limits of time, space, distance and cost, greatly change lifestyles and improve quality of life, but they also bring hidden risks to people's economic life and even to national security and stability. Therefore, big data analysis of social networks has become an important research branch of data mining in recent years.
A community structure is a group of nodes in a network that are relatively closely connected or highly similar to one another; in general, connections between nodes inside a community are much denser than connections between communities. Typical real-world applications of community discovery include discovering interests or behavior patterns shared by people and finding circles of friends, or groups belonging to the same organization, in social networks. Community discovery aims to reveal the topological structure and function of a real network, together with the macroscopic phenomena and microscopic behaviors of information on the network, so as to help information managers understand the dynamics and evolution mechanisms of the network, and in turn effectively predict network user behavior and control information propagation on the network. Community discovery is currently a fast-developing research focus within social network data mining.
The complexity of real social networks, including the massive number of nodes, structural complexity and multidimensional evolution, makes community discovery in social networks challenging.
First, the massive number of nodes places very strict demands on the performance of community discovery algorithms for complex social networks: only algorithms with linear time complexity can run on a real social network and support practical analysis and applications, and at present only a few algorithms meet the linear or near-linear time complexity requirement.
Second, structural complexity also challenges algorithm performance: real network structures are simultaneously overlapping, hierarchical and multiplex, so the community structure discovered by an algorithm must capture overlapping and hierarchical structures while also expressing the multiple kinds of information carried by nodes.
Finally, multidimensional evolution poses a still greater challenge: a real network structure evolves over time and often tends to expand and grow, so an algorithm must not only detect the community structure but also capture and track how that structure evolves along the time dimension.
In essence, community discovery on complex social networks is a spatio-temporal feature mining problem on complex network graphs. Deep learning, a machine learning approach that can learn data features autonomously, has in recent years been successfully extended by researchers to spatio-temporal feature learning on network graph data.
Disclosure of Invention
The invention aims to provide a dynamic community discovery method based on a recurrent convolutional neural network and an autoencoder. The method can be applied to social network analysis: it autonomously learns and extracts the spatio-temporal features of the social network and can further improve the modularity of the community structure, thereby revealing the topological structure and other properties of the real network and supporting effective prediction of network user behavior, information propagation and the like.
To achieve this purpose, the technical scheme of the invention is as follows: a dynamic community discovery method based on a recurrent convolutional neural network and an autoencoder, in which, first, a network spatial feature learning model based on a convolutional neural network is constructed, and the spatial topological features of the network are learned to obtain network spatial feature vectors; second, the network spatial feature learning model based on the convolutional neural network is fused in, the network spatial feature vectors are taken as the input of the model, and a network spatio-temporal feature learning model based on the recurrent neural network, the convolutional neural network and an autoencoder is constructed to learn the spatio-temporal features of the network and obtain network spatio-temporal feature vectors; finally, community discovery is carried out on the basis of the network spatio-temporal feature vectors to detect the dynamic community structure of the social network.
In an embodiment of the present invention, constructing the network spatial feature learning model based on the convolutional neural network and learning the spatial topological features of the network to obtain network spatial feature vectors is implemented as follows:
1) preprocessing a data set:
1.1) selecting opinion leaders: transforming the network topological structure of the data set to obtain an adjacency matrix of the network, analyzing the network topological structure based on the adjacency matrix, and finding out the most influential opinion leader node in the group;
An adjacency matrix A is constructed according to whether a connection exists between nodes in the network spatial structure of the data set, where E is the set of edges connecting nodes in the network spatial structure. If a connection $e_{i,j}$ exists between node i and node j, the variable $a_{i,j}$ is 1; if no connection $e_{i,j}$ exists between node i and node j, the variable $a_{i,j}$ is 0. The adjacency matrix is given by formula (1):
$$A = [a_{i,j}]_{n \times n} \quad (1)$$
where
$$a_{i,j} = \begin{cases} 1, & e_{i,j} \in E \\ 0, & \text{otherwise} \end{cases}$$
A state transition probability matrix C between nodes is constructed from the adjacency matrix A. The more nodes a node is connected to, the lower the transition probability of each of its connections; the fewer nodes it is connected to, the higher the transition probability of each connection. That is, if node i is connected only to node j and to no other node, node j is an important node for node i, so the transition probability $c_{i,j}$ is larger. The transition probability matrix C is calculated as in formula (2):
Figure BDA0002371482930000031
wherein
Figure BDA0002371482930000032
In the initial stage, the influence score of each node is set to 1; the node influence score matrix S and its initialization are shown in formula (3). Then the limit matrix of the node influence score matrix, $S^*$, is calculated from the transition probability matrix C; $S^*$ is given by formula (4). Finally, the node i with the highest influence score in the limit matrix is identified as the opinion leader node i_leader, as calculated by formula (5):
$$S = [s_1 \; s_2 \; \cdots \; s_n], \text{ with initial value } s_i = 1 \quad (3)$$
Figure BDA0002371482930000033
Figure BDA0002371482930000034
1.2) Selecting adjacent nodes: after the opinion leader node is found, the nodes most closely adjacent to the opinion leader are found, so that the network spatial structure of the data set can be reconstructed according to the indirect-connection proximity between nodes;
Once the opinion leader node i has been found, the indirect-connection proximity between node i and node j is calculated using the Euclidean distance r(i, j), as in formula (6). After the distances between node i and all other nodes are calculated and compared, the node j with the shortest distance to node i is obtained, that is, the node j_neighbor nearest to the opinion leader node i_leader, as in formula (7):
$$r(i,j) = \sqrt{\sum_{k} d(i,j,k)^2} \quad (6)$$
where $d(i,j,k) = x_{i,k} - x_{j,k}$
$$j\_neighbor = \arg\min_{j} r(i,j), \text{ where } j \neq i \quad (7)$$
1.3) Matrix transformation: given the node j_neighbor found to be nearest to the opinion leader node i_leader, the opinion leader node i_leader is taken as the first row of the reconstructed adjacency matrix and the nearest neighbor node j_neighbor as the second row; if more than one nearest neighbor node exists, step 1.1) is repeated to select an opinion leader node from the nodes at the remaining untransformed column positions, and steps 1.2) to 1.3) are then repeated; repeating these steps yields the reconstructed adjacency matrix X', whose calculation is shown in formula (8):
Figure BDA0002371482930000041
wherein
Figure BDA0002371482930000042
2) Constructing a network spatial feature learning model based on a convolutional neural network and learning the spatial topological features of the network to obtain network spatial feature vectors: a combined model of a convolutional neural network and an autoencoder, namely the network spatial feature learning model, is constructed, in which the neural network comprises an input layer with n neurons, a convolutional layer with q neurons and an output layer with n neurons; the reconstructed adjacency matrix X' is then split into n vectors of size 1 × n, which are used as the input of the neural network of the network spatial feature learning model to obtain the network spatial feature vectors.
In an embodiment of the present invention, fusing the network spatial feature learning model based on the convolutional neural network, taking the network spatial feature vectors as the input of the model, constructing the network spatio-temporal feature learning model based on the recurrent neural network, the convolutional neural network and the autoencoder, and learning the spatio-temporal features of the network to obtain the network spatio-temporal feature vectors proceeds as follows:
A network spatio-temporal feature learning model based on a recurrent neural network, a convolutional neural network and an autoencoder is constructed, in which the neural network comprises an input layer with q neurons, a convolutional layer with p neurons and an output layer with q neurons; network spatial feature vectors at t time points are obtained through the network spatial feature learning model, and the q-dimensional network spatial features of each time point are used as the input of the neural network of the network spatio-temporal feature learning model to obtain the network spatio-temporal feature vectors.
In an embodiment of the invention, the algorithm used for community discovery on the basis of the network spatio-temporal feature vectors is the K-means algorithm; grouping the network spatio-temporal feature vectors with the K-means algorithm realizes community discovery and detects the community structure.
In one embodiment of the invention, the method is applied to analyzing social networks.
Compared with the prior art, the invention has the following beneficial effects: the method can be applied to analyzing the social network, autonomously learns and extracts the spatial-temporal characteristics of the social network, and can further improve the modularity of the community structure, so that the topological structure of the real network and the like are disclosed, and the network user behavior, information propagation and the like are effectively predicted.
Drawings
FIG. 1 is a diagram of a community discovery model based on recursive convolutional self-coding.
Fig. 2 is an expanded recurrent neural network.
Fig. 3 is a neural network structure of a recurrent neural network in combination with an auto-encoder.
Fig. 4 is a generalized recurrent neural network architecture incorporating an autoencoder.
FIG. 5 shows the variation of the edges of the Email-Enron data set at various times.
FIG. 6 shows the reconstruction results of the network space structure of 8 time slices of the Email-Enron data set.
Fig. 7 is a comparison of the modularity of the DCAER algorithm and the AE algorithm.
Detailed Description
The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.
The invention provides a dynamic community discovery method based on a recurrent convolutional neural network and an autoencoder, which comprises the steps of firstly, constructing a network space characteristic learning model based on the convolutional neural network, and learning the space topological characteristic of the network to obtain a network space characteristic vector; secondly, fusing a network space characteristic learning model based on a convolutional neural network, taking a network space characteristic vector as the input of the model, constructing a network space-time characteristic learning model based on the recurrent neural network, the convolutional neural network and an autoencoder, and learning the space-time characteristic of the network to obtain a network space-time characteristic vector; and finally, carrying out community discovery on the basis of the network space-time characteristic vector to detect the dynamic community structure of the social network.
The following is a specific implementation of the present invention.
Figure 1 shows the community discovery model of the dynamic community discovery method based on a recurrent convolutional neural network and an autoencoder. The invention uses the Email-Enron data set from the SNAP project website to verify the performance of the DCAER algorithm on a real network. Both the autoencoder-based K-means community discovery algorithm and the proposed DCAER method were implemented, and the performance of the proposed DCAER method was verified through experiments.
The invention relates to a dynamic community discovery method based on a recursive convolutional neural network and an autoencoder, which mainly comprises the following steps: (1) network space-time structure reconstruction strategy; (2) extracting a model of network space-time characteristics; (3) a dynamic community discovery algorithm.
1. Network space-time structure reconstruction strategy
Dynamic network data can be regarded as static network data at multiple time points. First, the adjacency matrix of the static network data at the first time point is reconstructed by the network spatial reconstruction strategy and passed through the network spatial feature extraction model to obtain the network spatial feature vectors of the static network data at the first time point. Second, based on the reconstructed structure matrix of the first time point, the network spatial feature vectors at each time point are obtained and used as the input of the recurrent neural network.
The network space reconstruction strategy is specifically as follows:
1) selecting opinion leaders: transforming the network topological structure of the data set to obtain an adjacency matrix of the network, analyzing the network topological structure based on the adjacency matrix, and finding out the most influential opinion leader node in the group;
An adjacency matrix A is constructed according to whether a connection exists between nodes in the network spatial structure of the data set, where E is the set of edges connecting nodes in the network spatial structure. If a connection $e_{i,j}$ exists between node i and node j, the variable $a_{i,j}$ is 1; if no connection $e_{i,j}$ exists between node i and node j, the variable $a_{i,j}$ is 0. The adjacency matrix is given by formula (1):
$$A = [a_{i,j}]_{n \times n} \quad (1)$$
where
$$a_{i,j} = \begin{cases} 1, & e_{i,j} \in E \\ 0, & \text{otherwise} \end{cases}$$
A state transition probability matrix C between nodes is constructed from the adjacency matrix A. The more nodes a node is connected to, the lower the transition probability of each of its connections; the fewer nodes it is connected to, the higher the transition probability of each connection. That is, if node i is connected only to node j and to no other node, node j is an important node for node i, so the transition probability $c_{i,j}$ is larger. The transition probability matrix C is calculated as in formula (2):
Figure BDA0002371482930000063
wherein
Figure BDA0002371482930000064
In the initial stage, the influence score of each node is set to 1; the node influence score matrix S and its initialization are shown in formula (3). Then the limit matrix of the node influence score matrix, $S^*$, is calculated from the transition probability matrix C; $S^*$ is given by formula (4). Finally, the node i with the highest influence score in the limit matrix is identified as the opinion leader node i_leader, as calculated by formula (5):
$$S = [s_1 \; s_2 \; \cdots \; s_n], \text{ with initial value } s_i = 1 \quad (3)$$
Figure BDA0002371482930000065
Figure BDA0002371482930000066
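As a rough illustration of opinion leader selection, the following Python sketch builds the adjacency matrix of formula (1), derives a transition matrix, iterates the influence scores and picks the highest-scoring node. Formulas (2) and (4) are not legible in this text, so two points are assumptions rather than the patent's definitions: the transition probability is taken as $c_{i,j} = a_{i,j} / \sum_k a_{i,k}$ (row normalization), and the limit matrix is approximated by repeatedly applying $S \leftarrow S C$. The five-node graph and all function names are hypothetical.

```python
def adjacency_matrix(n, edges):
    """Formula (1): a[i][j] = 1 if an edge joins i and j, else 0 (undirected)."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1
        a[j][i] = 1
    return a


def transition_matrix(a):
    """ASSUMED form of formula (2): each row of A divided by its row sum."""
    c = []
    for row in a:
        degree = sum(row)
        c.append([x / degree if degree else 0.0 for x in row])
    return c


def influence_scores(c, iterations=100):
    """ASSUMED form of formula (4): start from s_i = 1 and iterate S <- S C."""
    n = len(c)
    s = [1.0] * n  # initialization of formula (3)
    for _ in range(iterations):
        s = [sum(s[i] * c[i][j] for i in range(n)) for j in range(n)]
    return s


def opinion_leader(scores):
    """Formula (5): index of the node with the highest influence score."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical toy network: a triangle 0-1-2 plus the path 0-3-4.
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (3, 4)]
A = adjacency_matrix(5, edges)
C = transition_matrix(A)
scores = influence_scores(C)
leader = opinion_leader(scores)
print(leader, [round(s, 3) for s in scores])
```

Under these assumptions the scores of a connected, non-bipartite undirected graph settle in proportion to node degree, so node 0 (degree 3) is selected here. On a bipartite graph the plain iteration oscillates and would need some form of damping.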
2) Selecting adjacent nodes: after the opinion leader node is found, the nodes most closely adjacent to the opinion leader are found, so that the network spatial structure of the data set can be reconstructed according to the indirect-connection proximity between nodes;
Once the opinion leader node i has been found, the indirect-connection proximity between node i and node j is calculated using the Euclidean distance r(i, j), as in formula (6). After the distances between node i and all other nodes are calculated and compared, the node j with the shortest distance to node i is obtained, that is, the node j_neighbor nearest to the opinion leader node i_leader, as in formula (7):
$$r(i,j) = \sqrt{\sum_{k} d(i,j,k)^2} \quad (6)$$
where $d(i,j,k) = x_{i,k} - x_{j,k}$
$$j\_neighbor = \arg\min_{j} r(i,j), \text{ where } j \neq i \quad (7)$$
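The neighbor selection step can be sketched in a few lines of Python. The sketch assumes that $x_{i,k}$ denotes the entry in row i, column k of the adjacency matrix, which the text does not state explicitly; the matrix `X` and the function names are hypothetical.

```python
import math


def euclidean_distance(x, i, j):
    """Formula (6) with d(i, j, k) = x[i][k] - x[j][k]."""
    return math.sqrt(sum((x[i][k] - x[j][k]) ** 2 for k in range(len(x[i]))))


def nearest_neighbor(x, i, candidates=None):
    """Formula (7): the node j != i with the smallest distance r(i, j)."""
    if candidates is None:
        candidates = range(len(x))
    others = [j for j in candidates if j != i]
    return min(others, key=lambda j: euclidean_distance(x, i, j))


# Hypothetical adjacency matrix: triangle 0-1-2 plus the path 0-3-4.
X = [
    [0, 1, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
j_neighbor = nearest_neighbor(X, 0)
print(j_neighbor, euclidean_distance(X, 0, j_neighbor))
```

In this toy matrix the node nearest to node 0 is node 4, which has no direct edge to node 0 but shares a neighbor with it; this is the sense in which the distance measures proximity through indirect connections.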
3) Matrix transformation: given the node j_neighbor found to be nearest to the opinion leader node i_leader, the opinion leader node i_leader is taken as the first row of the reconstructed adjacency matrix and the nearest neighbor node j_neighbor as the second row; if more than one nearest neighbor node exists, step 1) is repeated to select an opinion leader node from the nodes at the remaining untransformed column positions, and steps 2) to 3) are then repeated; repeating these steps yields the reconstructed adjacency matrix X', whose calculation is shown in formula (8):
Figure BDA0002371482930000073
wherein
Figure BDA0002371482930000074
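One way to read the matrix transformation step is as a greedy reordering of nodes: pick the highest-scoring remaining node, place it next, place its nearest remaining node after it, and repeat. The Python sketch below follows that reading. Formula (8) is not legible in this text, so the details are assumptions: influence scores are passed in precomputed, ties are broken by lowest node index, and rows and columns are permuted together so that the result is still an adjacency matrix. All names and the example data are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance; gives the same ordering as the distance."""
    return sum((p - q) ** 2 for p, q in zip(u, v))


def reorder_nodes(a, scores):
    """Greedy order: leader, its nearest remaining node, next leader, ..."""
    remaining = list(range(len(a)))
    order = []
    while remaining:
        leader = max(remaining, key=lambda i: scores[i])
        remaining.remove(leader)
        order.append(leader)
        if remaining:
            neighbor = min(
                remaining, key=lambda j: squared_distance(a[leader], a[j])
            )
            remaining.remove(neighbor)
            order.append(neighbor)
    return order


def permute(a, order):
    """ASSUMPTION: apply the same node order to both rows and columns."""
    return [[a[i][j] for j in order] for i in order]


# Hypothetical adjacency matrix and influence scores for five nodes.
A = [
    [0, 1, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
scores = [1.5, 1.0, 1.0, 1.0, 0.5]
order = reorder_nodes(A, scores)
X_prime = permute(A, order)
print(order)
```

Because the same permutation is applied to rows and columns, `X_prime` stays symmetric and describes the same graph with relabelled nodes.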
Constructing a network spatial feature learning model based on a convolutional neural network and learning the spatial topological features of the network to obtain network spatial feature vectors: a combined model of a convolutional neural network and an autoencoder, namely the network spatial feature learning model, is constructed, in which the neural network comprises an input layer with n neurons, a convolutional layer with q neurons and an output layer with n neurons; the reconstructed adjacency matrix X' is then split into n vectors of size 1 × n, which are used as the input of the neural network of the network spatial feature learning model to obtain the network spatial feature vectors.
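To make the layer sizes concrete, the sketch below applies a single 1 × 3 filter to each 1 × n row of a reconstructed matrix. It assumes stride 1 and no padding, so that q = n − 3 + 1; this is consistent with the 4-input, 1 × 3 filter, 2-feature example in the description of principles, but the patent does not state the stride or padding. Only the convolutional (encoding) pass is shown; the filter weights are hand-picked rather than trained, and the decoding layer and activation function are omitted. All names are hypothetical.

```python
def conv1d_valid(row, kernel):
    """Slide the kernel over the row with stride 1 and no padding."""
    width = len(kernel)
    return [
        sum(row[start + k] * kernel[k] for k in range(width))
        for start in range(len(row) - width + 1)
    ]


def spatial_features(matrix, kernel):
    """Apply the same filter to each 1 x n row of the reconstructed matrix."""
    return [conv1d_valid(row, kernel) for row in matrix]


# Hypothetical 4 x 4 reconstructed matrix and untrained 1 x 3 filter.
X_prime = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
kernel = [0.5, 0.5, 0.5]
H = spatial_features(X_prime, kernel)
print(H)
```

Each of the n = 4 rows is reduced to q = 2 values, which play the role of the network spatial feature vector of the corresponding node.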
2. Network space-time feature extraction model
The recurrent neural network (shown in fig. 2) is an important tool for temporal analysis; here the recurrent neural network method is combined with the autoencoder method to extract network temporal features. The temporal feature extraction model builds a neural network whose input-layer and output-layer variables are set to be the same and which contains several hidden layers. In this neural network, a recurrent layer serves as the hidden layer after the input layer; temporal features are extracted by the recurrent operation and can be restored to the original variables by the autoencoder. That is, the temporal features produced by the operation can represent the original input variables, which are thereby re-encoded into vectors carrying temporal features.
1) Description of the principles
In this section, an input layer with 4 neurons and a convolutional layer filter (1×3) are used to preprocess the network data; the network spatial feature of the 1st time point, $H^{(1)} = \{h_1^{(1)}, h_2^{(1)}\}$, and the network spatial feature of the 2nd time point, $H^{(2)} = \{h_1^{(2)}, h_2^{(2)}\}$, are then obtained and used as the input of the recurrent neural network, i.e. the input layer has 2 neurons per time point. The recurrent layer (hidden layer) has 1 neuron, and the output layer has the same 2 neurons (variables) as the input layer. For two time points, the network structure is shown in fig. 3.
In this example, the weights between the input layer and the hidden layer are $\{\upsilon_1, \upsilon_2\}$, the bias (adjustment variable) of the hidden-layer neuron is $\{b_{1,1}\}$, the weight between recurrent-layer states is $\{\nu\}$, the weights between the hidden layer and the output layer are $\{\omega_1, \omega_2\}$, and the biases of the 2 output-layer neurons are $\{b_{2,1}, b_{2,2}\}$. The recurrent-layer neurons $\{g^{(0)}, g^{(1)}, g^{(2)}\}$ are given by formulas (9) to (11); the 4 output-layer neurons $\{h_1(1), h_2(1), h_1(2), h_2(2)\}$ are given by formulas (12) and (13); and the loss function is given by formula (14).
$$g^{(0)} = 0 \quad (9)$$
Figure BDA0002371482930000081
Figure BDA0002371482930000082
$$h_i(1) = \omega_i g^{(1)} + b_{2,i} \quad (12)$$
$$h_i(2) = \omega_i g^{(2)} + b_{2,i} \quad (13)$$
Figure BDA0002371482930000083
The invention uses gradient descent for optimization; the update rules for each weight and bias are shown in formulas (15) to (19). Once training is complete, the recurrent operation maps the network spatial features of the two input time points to $\{g^{(1)}, g^{(2)}\}$, and this vector is the network temporal feature.
Figure BDA0002371482930000084
Figure BDA0002371482930000091
Figure BDA0002371482930000092
Figure BDA0002371482930000093
Figure BDA0002371482930000094
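The two-time-point example can be sketched end to end in plain Python. Several details are assumptions because formulas (10), (11) and (14) to (19) are not legible in this text: the recurrent activation is taken to be tanh, the loss is taken to be half the sum of squared differences between outputs and inputs, and gradients are estimated numerically by central differences instead of using the analytic update rules. The input values, initial parameters, learning rate and step count are hypothetical.

```python
import math


def forward(params, inputs):
    """Recurrent pass: one hidden neuron g, two reconstructed outputs per step."""
    v1, v2, b11, nu, w1, w2, b21, b22 = params
    g = 0.0  # formula (9): g(0) = 0
    hidden, outputs = [], []
    for h1, h2 in inputs:
        # ASSUMED tanh activation for the recurrent update.
        g = math.tanh(v1 * h1 + v2 * h2 + nu * g + b11)
        hidden.append(g)
        outputs.append((w1 * g + b21, w2 * g + b22))  # formulas (12)-(13)
    return hidden, outputs


def loss(params, inputs):
    """ASSUMED loss: half the squared error between outputs and inputs."""
    _, outputs = forward(params, inputs)
    return 0.5 * sum(
        (o - h) ** 2
        for out, inp in zip(outputs, inputs)
        for o, h in zip(out, inp)
    )


def numerical_gradient(params, inputs, eps=1e-6):
    """Central-difference estimate of the gradient of the loss."""
    grad = []
    for k in range(len(params)):
        up, down = list(params), list(params)
        up[k] += eps
        down[k] -= eps
        grad.append((loss(up, inputs) - loss(down, inputs)) / (2 * eps))
    return grad


def train(params, inputs, learning_rate=0.05, steps=500):
    """Plain gradient descent on all eight parameters."""
    params = list(params)
    for _ in range(steps):
        grad = numerical_gradient(params, inputs)
        params = [p - learning_rate * g for p, g in zip(params, grad)]
    return params


# Hypothetical spatial features H(1) and H(2) for one node.
inputs = [(0.2, 0.7), (0.4, 0.9)]
initial = [0.1] * 8  # v1, v2, b11, nu, w1, w2, b21, b22
trained = train(initial, inputs)
temporal_feature, _ = forward(trained, inputs)
print(loss(initial, inputs), loss(trained, inputs), temporal_feature)
```

After training, `temporal_feature` holds the two hidden states that play the role of $\{g^{(1)}, g^{(2)}\}$. Numerical gradients keep the sketch short; a real implementation would use the analytic gradients or an automatic differentiation library.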
2) General description
In this study, the q-dimensional network spatial features at t time points are used as the input of the neural network. A neural network combining a recurrent neural network with an autoencoder is constructed, with q neurons in the input layer, p neurons in the recurrent layer (hidden layer) and q neurons in the output layer; the network structure is shown in fig. 4. During optimization, the loss function computes the least squares error between the output layer and the input layer, and gradient descent is applied to update the weights, as in the description of principles above. In the operation stage, the trained neural network combining the recurrent neural network and the autoencoder can be used to extract the network spatio-temporal feature $G^{(i)}$ of the i-th time point, as in formula (20).
$$G^{(i)} = [g_1^{(i)} \; g_2^{(i)} \; \cdots \; g_p^{(i)}] \quad (20)$$
3. Dynamic community discovery method
The invention uses the K-means algorithm to group the data, thereby realizing dynamic community discovery. After the spatio-temporal features of the network are extracted by the spatio-temporal feature extraction model based on the recurrent convolutional neural network and the autoencoder, n feature vectors of t × p dimensions are obtained, and the K-means algorithm is applied to cluster these n vectors.
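The clustering step can be sketched as follows: the p-dimensional features of each node at the t time points are concatenated into one t × p vector, and a basic K-means (Lloyd's algorithm) groups the n vectors. The initialization used here, evenly spaced data points as starting centroids, is a simplification chosen to keep the example deterministic and is not taken from the patent; the feature values and function names are hypothetical.

```python
def squared_distance(u, v):
    return sum((p - q) ** 2 for p, q in zip(u, v))


def concatenate_over_time(features_per_time):
    """features_per_time[time][node] is a length-p list; returns n vectors of t*p."""
    n = len(features_per_time[0])
    return [
        [x for step in features_per_time for x in step[node]]
        for node in range(n)
    ]


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm with evenly spaced points as initial centroids."""
    stride = len(points) // k
    centroids = [list(points[c * stride]) for c in range(k)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical features: t = 2 time points, n = 4 nodes, p = 1 dimension.
features_per_time = [
    [[0.10], [0.20], [0.90], [0.80]],
    [[0.15], [0.10], [0.95], [0.90]],
]
vectors = concatenate_over_time(features_per_time)
labels, centroids = kmeans(vectors, k=2)
print(labels)
```

Each label is the community assigned to the corresponding node; in practice the number of communities k has to be chosen or tuned, for example by comparing modularity across values of k.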
The invention is compared with the autoencoder-based K-means community discovery algorithm on the Email-Enron data set from the SNAP project website.
The Email-Enron data set was published by William Cohen of Carnegie Mellon University for download and use by researchers. Data from January 1, 2000 to December 31, 2001 are extracted from the data set, and every 3 months of data form one time slice, giving 8 time-slice networks. The statistical properties of the Email-Enron data set are shown in Table 1.
TABLE 1 statistical characteristics of the data set
Figure BDA0002371482930000101
An analysis of the edges added and removed in the selected network data gives the change in the number of edges at each moment, from the second moment onward, relative to the previous moment; see Figure 5. Overall, the number of edges in the network shows a linearly increasing trend from time 1 to time 8; the change from time 2 to time 5 is relatively smooth, and the change from time 6 onward is relatively sharp. In the real world, Enron was embroiled in its scandal from April 2001 to June 2001, which corresponds to time 6; the number of emails within the company rose sharply, so the network structure evolved dramatically at that time.
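The edge-change analysis can be sketched as a comparison of consecutive edge sets. The representation of each time-slice network as a set of node pairs and the function name `edge_changes` are assumptions, and the snapshots below are toy data rather than values from the Email-Enron data set.

```python
def edge_changes(snapshots):
    """Compare each snapshot with the previous one.

    snapshots: list of sets of (u, v) edges, one set per time slice.
    Returns one dict per moment from the second moment onward.
    """
    changes = []
    for previous, current in zip(snapshots, snapshots[1:]):
        changes.append({
            "added": len(current - previous),    # edges that newly appear
            "removed": len(previous - current),  # edges that disappear
            "net": len(current) - len(previous),
        })
    return changes


# Toy snapshots, for illustration only.
snapshots = [
    {(1, 2), (2, 3)},
    {(1, 2), (3, 4), (4, 5)},
    {(1, 2), (3, 4), (4, 5), (5, 6)},
]
changes = edge_changes(snapshots)
```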
The experiments address the two innovations of the method, namely the network space-time structure reconstruction strategy and the network space-time feature extraction model based on the recurrent convolutional neural network. Each is compared with the corresponding autoencoder-based K-means clustering algorithm to verify the effectiveness of the algorithm.
(1) Network spatio-temporal structure reconstruction results and analysis
The network space structure reconstruction strategy is applied in turn to the Email-Enron real network at each of the eight moments; the adjacency matrices of the dynamic network Email-Enron before and after reconstruction at the eight moments are shown in Figure 6.
The matrix diagrams before and after reconstruction for t1 in Figure 6 show that the Email-Enron network at t1 has 713 edges. In the adjacency matrix before reconstruction the nodes are very sparse and scattered, and the community structure cannot be identified.
Over time, the number of nodes and connections in the adjacency matrix increases and the matrix becomes less sparse, but the aggregation of the nodes becomes increasingly dispersed.
Across the eight moments of the Email-Enron data set, the adjacency matrix before reconstruction has more nodes and connections at time t+1 than at time t. The matrix remains very sparse and scattered and the community structure cannot be identified, but as nodes and edges accumulate, the number of colored points in the matrix grows, meaning that the matrix contains more connections and more feature information that can be learned. As Figure 6 shows, compared with the original adjacency matrix, the reconstructed adjacency matrix still aggregates the nodes within each community, but this aggregation becomes more dispersed over time and the community structure gradually blurs. The reconstruction strategy proposed herein is therefore effective, although its aggregation effect weakens over time.
Because the matrix is built from all nodes across the 8 time slices, it is sparser than the network matrix at any single moment. In addition, the opinion leader is selected from the network on the 1st time slice, whereas the influence of an opinion leader actually changes, and may even migrate, over time. As a result, the strategy of reconstructing around the nearest neighbor nodes of the opinion leader shows an aggregation effect that weakens over time.
To counter the weakening aggregation effect of the matrix reconstruction strategy, the matrices are reconstructed in groups: the 8 time-slice networks are divided into 2 groups, with one matrix built over the first 4 time slices and another over the last 4 time slices, and the network preprocessing steps of opinion leader selection and network space structure reconstruction are carried out for each group.
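The grouped preprocessing can be sketched as follows. Several points are assumptions, since the corresponding formulas are not reproduced here: the transition probability is taken as a_ij divided by the degree of node i, the limit of the influence score matrix is approximated by repeatedly multiplying the scores by the transition matrix, and the Euclidean distance is computed between rows of the adjacency matrix. All function names and the toy edges are hypothetical.

```python
import math


def group_slices(slices, n_groups=2):
    """Split the time slices into consecutive groups of equal size."""
    size = len(slices) // n_groups
    return [slices[g * size:(g + 1) * size] for g in range(n_groups)]


def union_adjacency(group, nodes):
    """Build one symmetric 0/1 adjacency matrix over all slices of a group."""
    index = {v: i for i, v in enumerate(nodes)}
    a = [[0] * len(nodes) for _ in nodes]
    for edges in group:
        for u, v in edges:
            a[index[u]][index[v]] = 1
            a[index[v]][index[u]] = 1
    return a


def influence_scores(a, iters=100):
    """Approximate the limiting influence scores by power iteration."""
    n = len(a)
    degree = [sum(row) for row in a]
    # Assumed transition probability: c_ij = a_ij / degree(i).
    c = [[a[i][j] / degree[i] if degree[i] else 0.0 for j in range(n)]
         for i in range(n)]
    s = [1.0] * n  # every node starts with an influence score of 1
    for _ in range(iters):
        s = [sum(s[i] * c[i][j] for i in range(n)) for j in range(n)]
    return s


def nearest_neighbor(a, leader):
    """Return the node whose adjacency row is closest to the leader's row."""
    best, best_dist = None, math.inf
    for j in range(len(a)):
        if j == leader:
            continue
        dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a[leader], a[j])))
        if dist < best_dist:
            best, best_dist = j, dist
    return best


# Toy edge sets for 8 time slices, for illustration only.
slices = [{(0, 1)}, {(0, 2)}, {(0, 3), (1, 2)}, {(0, 4)},
          {(0, 1)}, set(), {(2, 3)}, {(3, 4)}]
nodes = [0, 1, 2, 3, 4]
groups = group_slices(slices)
a = union_adjacency(groups[0], nodes)
s = influence_scores(a)
leader = max(range(len(s)), key=s.__getitem__)
neighbor = nearest_neighbor(a, leader)
```

Selecting the leader separately for each group means that a shift in influence between the first and last 4 time slices is reflected in the second group's reconstruction.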
(2) DRAER algorithm performance verification experiment
As can be seen from Figure 7, the modularity of the DRAER algorithm proposed in this chapter is lower than that of the reference algorithm AE at times t1 and t2. The main reason is that DRAER relies on an RNN to learn the time-dependent features of the network. At the first 2 moments the network has few nodes and few connections, so the information available at a single moment is relatively insufficient. More importantly, only 1 time-series network is available to depend on, so the number of time-series networks and the input information are too limited for DRAER to learn the temporal features of the time-series networks. The modularity on time slices t1 and t2 is therefore low, and the learning effect of DRAER is insufficient in the initial stage.
The modularity of the DRAER method is higher than that of the AE algorithm on the 6 time slices from t3 to t8. From time t3 onward, 2 groups of networks with sequential dependency are already available, and the RNN introduced by DRAER can learn the evolution law of the time-series dynamic network from them, so from this time the accuracy of the algorithm becomes stable and higher than that of the AE algorithm.
In summary, as can be seen from Figure 7, experiments on a real network indicate that when the number of time-series networks in the dynamic network is greater than 2, the dynamic community discovery algorithm DRAER based on the recurrent neural network can extract the temporal features of the network, effectively detect dynamic communities and improve the modularity of community discovery.
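The modularity used to compare DRAER and AE can be computed as sketched below. The source does not give the formula, so this assumes the standard Newman modularity for an undirected, unweighted network, Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the number of edges, L_c the number of edges inside community c and d_c the total degree of its nodes. The function name and the toy network are hypothetical.

```python
def modularity(edges, labels):
    """Newman modularity of a partition of an undirected, unweighted network.

    edges: list of (u, v) pairs, each edge listed once.
    labels: mapping from node to community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra = {}       # number of edges inside each community
    degree_sum = {}  # total degree of the nodes in each community
    for u, v in edges:
        degree_sum[labels[u]] = degree_sum.get(labels[u], 0) + 1
        degree_sum[labels[v]] = degree_sum.get(labels[v], 0) + 1
        if labels[u] == labels[v]:
            intra[labels[u]] = intra.get(labels[u], 0) + 1
    return sum(
        intra.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Toy network: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_communities = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_community = {node: 0 for node in range(6)}
q_two = modularity(edges, two_communities)
q_one = modularity(edges, one_community)
```

A higher value indicates that more edges fall inside communities than would be expected for the same node degrees under random connection.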
The above are preferred embodiments of the present invention; all changes made according to the technical solution of the present invention, whose functional effects do not exceed the scope of that technical solution, fall within the protection scope of the present invention.

Claims (5)

1. A dynamic community discovery method based on a recurrent convolutional neural network and an autoencoder is characterized in that firstly, a network space feature learning model based on the convolutional neural network is constructed, and space topological features of the network are learned to obtain network space feature vectors; secondly, fusing a network space characteristic learning model based on a convolutional neural network, taking a network space characteristic vector as the input of the model, constructing a network space-time characteristic learning model based on the recurrent neural network, the convolutional neural network and an autoencoder, and learning the space-time characteristic of the network to obtain a network space-time characteristic vector; and finally, carrying out community discovery on the basis of the network space-time characteristic vector to detect the dynamic community structure of the social network.
2. The dynamic community discovery method based on the recurrent convolutional neural network and the self-encoder as claimed in claim 1, wherein before the network space feature learning model based on the recurrent convolutional neural network is constructed and the spatial topological feature of the network is learned to obtain the network space feature vector, the specific implementation process is as follows:
1) preprocessing a data set:
1.1) selecting opinion leaders: transforming the network topological structure of the data set to obtain an adjacency matrix of the network, analyzing the network topological structure based on the adjacency matrix, and finding out the most influential opinion leader node in the group;
constructing an adjacency matrix A according to whether a connection exists between nodes in the network space structure of the data set, where E is the set of edges connecting the nodes in the network space structure; if there is a connection $e_{i,j}$ between node i and node j, the variable $a_{i,j}$ is 1; if there is no connection $e_{i,j}$ between node i and node j, the variable $a_{i,j}$ is 0; the adjacency matrix transformation is shown in formula (1):
Figure FDA0002371482920000011
wherein
Figure FDA0002371482920000012
Constructing a state transition probability matrix C between nodes according to the adjacency matrix A; the more nodes a node is connected to, the lower the transition probability of each connection; the fewer nodes a node is connected to, the higher the transition probability of each connection; that is, if node i is connected only to node j and to no other node, node j is an important node for node i, so the value of the transition probability $c_{i,j}$ is larger; the transition probability matrix C is calculated as shown in formula (2):
Figure FDA0002371482920000013
wherein
Figure FDA0002371482920000014
In the initial stage, the influence score of each node is set to 1; the node influence score matrix is defined as S, and S and its initialization are shown in formula (3); then the limit matrix of the node influence score matrix is calculated from the transition probability matrix C; the node influence score limit matrix is $S^*$, and $S^*$ is shown in formula (4); finally, according to the node influence score limit matrix, the node i with the highest influence score, namely the opinion leader node i_leader, is found as shown in formula (5):
$S = [\, s_1 \;\; s_2 \;\; \dots \;\; s_n \,]$, where the initial value $s_i = 1 \qquad (3)$
Figure FDA0002371482920000021
Figure FDA0002371482920000022
1.2) selecting adjacent nodes: after finding out the opinion leader nodes, finding out nodes which are highly adjacent to the opinion leader so as to reconstruct a network space structure of the data set according to the indirect connection proximity among the nodes;
once the opinion leader node i is found, the indirect connection proximity between node i and node j is calculated using the Euclidean distance r(i, j), as shown in formula (6); after the distances between node i and the other nodes are calculated and compared, the node j with the shortest distance to node i is obtained, namely the node j_neighbor nearest to the opinion leader node i_leader, as shown in formula (7);
Figure FDA0002371482920000023
where $d(i, j, k) = x_{i,k} - x_{j,k}$
Figure FDA0002371482920000024
where $j \neq i \qquad (7)$
1.3) matrix transformation: according to the found node j_neighbor nearest to the opinion leader node i_leader, the opinion leader node i_leader is taken as the first row of the reconstructed adjacency matrix and the nearest neighbor node j_neighbor as the second row; if there is more than one nearest neighbor node, step 1) is repeated to select an opinion leader node from the nodes in the remaining untransformed column positions, and then the processes of 2)-3) are repeated; repeating these steps yields the reconstructed adjacency matrix X', whose calculation is shown in formula (8):
Figure FDA0002371482920000025
wherein
Figure FDA0002371482920000026
2) Constructing a network space feature learning model based on a convolutional neural network, and learning the space topological features of the network to obtain a network space feature vector: constructing a combined model of a convolutional neural network and an autoencoder, namely the network space feature learning model, wherein the neural network comprises an input layer with n neurons, a convolutional layer with q neurons and an output layer with n neurons; then dividing the reconstructed adjacency matrix X' into n vectors of size 1 × n, and taking these n vectors as the input of the neural network of the network space feature learning model to obtain the network space feature vector.
3. The method for discovering dynamic communities based on the recurrent convolutional neural network and the self-encoder as claimed in claim 1, wherein the network space feature learning model based on the recurrent convolutional neural network is constructed by taking the network space feature vector as the input of the model, and the specific process of learning the network space-time feature to obtain the network space-time feature vector is as follows:
constructing a network space-time feature learning model based on a recurrent neural network, a convolutional neural network and an autoencoder, wherein the neural network comprises an input layer with q neurons, a convolutional layer with p neurons and an output layer with q neurons; network space feature vectors of t time points are obtained through the network space feature learning model, and the q-dimensional network space features of each time point are used as the input of the neural network of the network space-time feature learning model to obtain the network space-time feature vectors.
4. The dynamic community discovery method based on the recursive convolutional neural network and the self-encoder as claimed in claim 1, wherein the algorithm adopted for community discovery on the basis of the network space feature vector is a K-means algorithm, and the network space-time feature vector is clustered through the K-means algorithm, so that community discovery can be realized, and the community structure can be detected.
5. The method of claim 1, applied to analysis of social networks.
CN202010056877.8A 2020-01-17 2020-01-17 Dynamic community discovery method based on recursive convolutional neural network and self-encoder Pending CN111275562A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010056877.8A CN111275562A (en) 2020-01-17 2020-01-17 Dynamic community discovery method based on recursive convolutional neural network and self-encoder

Publications (1)

Publication Number Publication Date
CN111275562A true CN111275562A (en) 2020-06-12

Family

ID=71000758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010056877.8A Pending CN111275562A (en) 2020-01-17 2020-01-17 Dynamic community discovery method based on recursive convolutional neural network and self-encoder

Country Status (1)

Country Link
CN (1) CN111275562A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434437A (en) * 2020-12-02 2021-03-02 大连大学 Equipment guarantee hyper-network dynamic evolution model construction method considering node recombination
CN112434437B (en) * 2020-12-02 2023-08-25 大连大学 Method for constructing equipment support super-network dynamic evolution model by considering node recombination
CN112925953A (en) * 2021-03-09 2021-06-08 南京航空航天大学 Dynamic network representation method and system
CN112925953B (en) * 2021-03-09 2024-02-20 南京航空航天大学 Dynamic network representation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200612