CN113658012A - Community discovery method based on deep network representation learning - Google Patents

Community discovery method based on deep network representation learning Download PDF

Info

Publication number
CN113658012A
CN113658012A CN202110703377.3A CN202110703377A CN113658012A CN 113658012 A CN113658012 A CN 113658012A CN 202110703377 A CN202110703377 A CN 202110703377A CN 113658012 A CN113658012 A CN 113658012A
Authority
CN
China
Prior art keywords
network
community
node
matrix
community structure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110703377.3A
Other languages
Chinese (zh)
Inventor
潘雨
潘志松
胡谷雨
王帅辉
邹军华
刘鑫
黎维
陶蔚
周星宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Army Engineering University of PLA
Original Assignee
Army Engineering University of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Army Engineering University of PLA filed Critical Army Engineering University of PLA
Priority to CN202110703377.3A priority Critical patent/CN113658012A/en
Publication of CN113658012A publication Critical patent/CN113658012A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/11Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/13Differential equations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Business, Economics & Management (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)

Abstract

A community discovery method based on deep network representation learning relates to the technical field of mapping segmentation problems. Modeling a network into a graph; constructing a community structure matrix; obtaining a network node representation vector; and running a K-mean strategy on the obtained low-dimensional representation of the network to obtain a final network community structure. The method provided by the invention successfully captures the nonlinear structure of the network by using the deep neural network, learns more accurate and rich node representation, and lays a solid foundation for subsequent community discovery. The task of accurately excavating the community structure in a large-scale, sparse and high-dimensional network is realized.

Description

Community discovery method based on deep network representation learning
Technical Field
The invention relates to the technical field of mapping segmentation problems, in particular to a community discovery method based on deep network representation learning.
Background
The community structure is an important structural feature widely existing in a network, the connection among nodes in the communities is tight, and the connection among the nodes in the communities is sparse. Community discovery is the process of mining the structure of communities hidden in network data from a mesoscopic perspective by analyzing the interactions and potential information between nodes in the network. The community discovery provides an effective tool for exploring the potential characteristics of the complex network, and has important theoretical and practical significance for understanding the network organization structure, analyzing the potential characteristics of the network, discovering the network hiding rule, the interaction mode and the like.
In recent years, with the development of networks and the influx of social media, complex networks have gradually appeared to be large-scale, sparse, and high-dimensional. The conventional topology-based community discovery algorithm has the problems of high computational complexity, low parallelism, incapability of expanding to a large-scale network and incapability of processing sparse data. Therefore, it is imperative to design an extensible community discovery algorithm for large-scale, sparse and high-dimensional networks.
The traditional community discovery algorithm is carried out on an adjacency matrix based on topological representation, and the problems of high computational complexity, incapability of parallel computation, incapability of mining a network nonlinear structure and the like exist. It is not suitable for large-scale, sparse and high-dimensional networks.
Disclosure of Invention
The invention aims to provide a community discovery method based on deep network representation learning, which can realize accurate mining of community structures under large-scale, sparse and high-dimensional networks.
A community discovery method based on deep network representation learning comprises the following steps:
the first step is as follows: modeling a network into a graph;
the second step is that: constructing a community structure matrix;
the third step: obtaining a network node representation vector;
the fourth step: and running a K-mean strategy on the obtained low-dimensional representation of the network to obtain a final network community structure.
The method for constructing the community structure matrix specifically comprises the following steps:
firstly, designing a function R to measure the similarity between community members; then, based on similarity measurement, a Skip-gram model based on negative sampling is adopted to further explore a community structure of a network bottom layer; finally, obtaining a matrix X capable of capturing a network potential community structure;
firstly, designing a function R to measure the similarity between community members; introducing a community relation indication matrix H belonging to Rn×kEach row H of the matrix HiRepresenting the degree of membership of the corresponding node to each community,
Figure BDA0003130283840000021
representing a node viAnd vjHas a probability of an edge existing therebetween, and
Figure BDA0003130283840000022
designing the following node similarity function R to measure the similarity of two nodes belonging to the same community:
Figure BDA0003130283840000023
where σ (·) is a sigmoid function, such that R (i, j) is ∈ [0, 1);
adopting a Skip-gram model based on negative sampling to any two nodes viAnd vjThere is the formula:
Figure BDA0003130283840000031
wherein, k is the number of negative samples; selecting negative samples according to the degree of the node, randomly sampled node samples v in the networknObedience distribution
Figure BDA0003130283840000032
Wherein d isiIs node viThe degree of (a) is greater than (b),
Figure BDA0003130283840000033
D=∑idiis the sum of all the node degrees in the network, equation (6) is rewritten as:
Figure BDA0003130283840000034
then, by
Figure BDA0003130283840000035
Calculating a partial derivative to optimize equation (7):
Figure BDA0003130283840000036
thereby obtaining
Figure BDA0003130283840000037
Comprises the following steps:
Figure BDA0003130283840000038
obtaining a weighting matrix X belonging to R and storing the information of the potential community of the networkn×nElement X of matrix XijComprises the following steps:
Figure BDA0003130283840000039
the weights of the elements in the matrix X are the weights between the edges influenced by the community structure between the nodes, so that the structural proximity between the nodes is quantized, and the potential community structure of the network is reflected.
The specific process of obtaining the network node expression vector comprises the following steps:
the obtained community matrix X is used as the input of a depth self-encoder, the low-dimensional vector representation of the network is obtained, and the community structure of the network is captured, so that the nodes belonging to the same community are ensured to be close to each other in an embedding space;
each row of matrix X is the input to the depth autoencoder, and the loss function is as follows:
Figure BDA0003130283840000041
wherein
Figure BDA0003130283840000042
By training the auto-encoder to minimize reconstruction errors, the similarity between the input vectors in the embedding space can be preserved. Minimizing the loss function of input and output can maximally reserve the characteristics of input data, namely the potential community structure of the network, in a hidden layer. The node output from the last layer of the hidden layer represents the characteristics of the input community structure matrix X which are stored to the maximum extent, and the clear and accurate community structure can be obtained by applying the node to the subsequent community discovery algorithm.
The invention provides a community discovery method based on network representation learning, aiming at the community structure discovery problem of large-scale, sparse and high-dimensional networks. The method utilizes the deep neural network to realize accurate excavation of the community structure in the excavation network. The strong interaction and complex dependency relationship between nodes in a real network result in high nonlinearity of the network structure, and the interaction between different features is often nonlinear. For the nonlinear relation, the deep neural network has strong representation and generalization capability. Recently, deep learning has enjoyed great success in many applications, such as image classification, speech recognition and natural language processing. The self-encoder is an unsupervised deep feature learning model and has good performance in data reduction and feature extraction. Thus, the self-encoder is employed herein to capture complex non-linear relationships in the network.
The node representation of the network is learned by using a depth model, and then the communities are found in an embedding space, so that the method can greatly keep high-efficiency performance and calculation speed, has portability and strong learning characteristic capability, and is more elastic to the problem of network sparsity. Deep learning is applied to community discovery, and the community discovery problem in a large-scale sparse network is successfully solved. Compared with the traditional community discovery algorithm, the method provided by the invention successfully captures the nonlinear structure of the network by using the deep neural network, learns more accurate and rich node representation and lays a solid foundation for subsequent community discovery. The task of accurately excavating the community structure in a large-scale, sparse and high-dimensional network is realized.
Detailed Description
The goal of the self-encoder is to reconstruct the original input so that the output is as close as possible to the input. In this way, the output of the hidden layer can be viewed as a low-dimensional representation of the original data, thereby maximizing the extraction of the features contained in the original data. The self-encoder comprises two symmetrical components: an encoder and a decoder. A basic self-encoder can be seen as a three-layer neural network, consisting of an input layer, a hidden layer and an output layer.
Given an input data xiThe encoder converts the original data xiOutput coding h mapped as hidden layeri,hiCan be regarded as xiRepresents the low-dimensional embedding of:
hi=σ(W(1)xi+b(1)) (1)
the decoder then reconstructs the input data,
Figure BDA0003130283840000051
is the reconstructed output data:
Figure BDA0003130283840000052
the input data is encoded and decoded to obtain a reconstructed representation of the input data. Wherein θ ═ W(1),W(2),b(1),b(2)Is a parameter set, W(1),W(2)Weight matrices for the encoder and decoder, respectively, b(1),b(2)The bias vectors for the encoder and decoder, respectively. σ (-) is a non-linear activation function, e.g. sigmoid function
Figure BDA0003130283840000061
tan h function
Figure BDA0003130283840000062
And the like. The self-encoder derives a characterization of the input data by minimizing the error between the input data and the reconstructed data:
Figure BDA0003130283840000063
in order to capture the high nonlinearity of the topological structure and the node attribute, the section combines a plurality of nonlinear functions into an encoder and a decoder, and learns the characteristics of different abstraction levels by carrying out multi-layer abstraction learning on data.
Figure BDA0003130283840000064
Wherein K represents the number of hidden layers,
Figure BDA0003130283840000065
is a low-dimensional representation of the feature of node i.
The detailed process of the community discovery method based on network representation learning of the invention can be described as follows:
the first step is as follows: modeling a network into a graph;
the second step is that: constructing a community structure matrix;
according to the probabilistic generative model, the formation of edges between nodes is influenced by the structure of potential communities in the network. If two nodes are connected by an edge, they are likely to belong to the same community. Thus, potential community structures in a network can be mined by maximizing the probability that an edge exists between two nodes. Based on this, the node quantifies the structural proximity of the nodes according to the potential community member similarity of the nodes, thereby constructing a community structure matrix X.
First, we design a function R to measure the similarity between community members. And then, based on the similarity measurement, adopting a Skip-gram model based on negative sampling to further explore the community structure of the network bottom layer. And finally obtaining a matrix X capable of capturing the potential community structure of the network.
First, a function R is designed to measure the similarity between community members. Introducing a community relation indication matrix H belonging to Rn×kEach row H of the matrix HiRepresenting the degree of membership of the corresponding node to each community,
Figure BDA0003130283840000071
representing a node viAnd vjHas a probability of an edge existing therebetween, and
Figure BDA0003130283840000072
therefore, the following node similarity function R is designed to measure the similarity of two nodes belonging to the same community:
Figure BDA0003130283840000073
where σ (·) is a sigmoid function, such that R (i, j) E [0, 1). Because the functions R (i, j) and
Figure BDA0003130283840000074
linear relationship, so the main discussion
Figure BDA0003130283840000075
And (4) finishing.
According to the probability generation model, if the probability of edges existing between two nodes is larger, namely R (i, j) is larger, then the two nodes belong to a social societyThe probability of a clique is large. Thus, for two nodes v connected by an edge in the networkiAnd vjBy maximizing
Figure BDA0003130283840000076
To capture the potential community structure of the network. At the same time, we minimize for two nodes randomly selected in the network
Figure BDA0003130283840000077
This is because the network is generally sparse, and most nodes have no connection therebetween, so that for two nodes randomly selected in the network, the probability of having an edge therebetween is low, and the probability of belonging to a community is also low. Based on the above, the chapter adopts a Skip-gram model based on negative sampling, and any two nodes v are subjected toiAnd vjThere is the formula:
Figure BDA0003130283840000078
where κ is the number of negative samples. In this chapter, negative samples are selected according to the degree of nodes, and node samples v sampled randomly in the networknCompliance
Figure BDA0003130283840000079
Wherein d isiIs node viDegree of (d)i=∑jaij,D=∑idiIs the sum of all the node degrees in the network, equation (6) is rewritten as:
Figure BDA00031302838400000710
then, by
Figure BDA0003130283840000081
Calculating a partial derivative to optimize equation (7):
Figure BDA0003130283840000082
thereby obtaining
Figure BDA0003130283840000083
Comprises the following steps:
Figure BDA0003130283840000084
in summary, we obtain the weighting matrix X belonging to R and storing the information of the potential community of the networkn×nElement X of matrix XijComprises the following steps:
Figure BDA0003130283840000085
the weights of the elements in the matrix X are the weights between the edges influenced by the community structure between the nodes, so that the structural proximity between the nodes is quantized, and the potential community structure of the network is reflected.
The third step: obtaining a network node representation vector;
and taking the obtained community matrix X as the input of a depth self-encoder, obtaining the low-dimensional vector representation of the network, and capturing the community structure of the network, thereby ensuring that the nodes belonging to the same community are close to each other in the embedding space.
Each row of matrix X is the input to the depth autoencoder, and the loss function is as follows:
Figure BDA0003130283840000086
by training the auto-encoder to minimize reconstruction errors, the similarity between the input vectors in the embedding space can be preserved. Minimizing the loss function of input and output can maximally reserve the characteristics of input data, namely the potential community structure of the network, in a hidden layer. The node output from the last layer of the hidden layer represents the characteristics of the input community structure matrix X which are stored to the maximum extent, and the clear and accurate community structure can be obtained by applying the node to the subsequent community discovery algorithm.
The fourth step: and running a K-mean strategy on the obtained low-dimensional representation of the network to obtain a final network community structure.
The present invention will be described in further detail with reference to the following experiments. The effectiveness of the algorithm is evaluated by generating 5 power-law networks of different scales by using a Lancitinetti-Fortunato-Radichi (LFR) network reference proposed by Lancitinetti et al. As shown in table 1, the network size is on an increasing trend from Lnetwork1 to Lnetwork 5.
TABLE 1 statistical information of LFR Artificial data set
Figure BDA0003130283840000091
The DNCE of the invention achieves the best community discovery performance on 5 artificial data sets. Compared with the traditional community discovery method, the method has the advantages that on the large-scale data set, the nonlinear structure of the network can be better captured, and the method has stronger learning characteristic capacity, so that an excellent community structure division result is obtained.

Claims (3)

1. A community discovery method based on deep network representation learning is characterized by comprising the following steps:
the first step is as follows: modeling a network into a graph;
the second step is that: constructing a community structure matrix;
the third step: obtaining a network node representation vector;
the fourth step: and running a K-mean strategy on the obtained low-dimensional representation of the network to obtain a final network community structure.
2. The method of claim 1, wherein the constructing the community structure matrix specifically comprises the following steps:
firstly, designing a function R to measure the similarity between community members; then, based on similarity measurement, a Skip-gram model based on negative sampling is adopted to further explore a community structure of a network bottom layer; finally, obtaining a matrix X capable of capturing a network potential community structure;
firstly, designing a function R to measure the similarity between community members; introducing a community relation indication matrix H belonging to Rn×kEach row H of the matrix HiRepresenting the degree of membership of the corresponding node to each community,
Figure FDA0003130283830000011
representing a node viAnd vjHas a probability of an edge existing therebetween, and
Figure FDA0003130283830000012
designing the following node similarity function R to measure the similarity of two nodes belonging to the same community:
Figure FDA0003130283830000013
where σ (·) is a sigmoid function, such that R (i, j) is ∈ [0, 1);
adopting a Skip-gram model based on negative sampling to any two nodes viAnd vjThere is the formula:
Figure FDA0003130283830000021
wherein, k is the number of negative samples; selecting negative samples according to the degree of the node, randomly sampled node samples v in the networknObedience distribution
Figure FDA0003130283830000022
Wherein d isiIs node viDegree of (d)i=∑jaij,D=∑idiIs the sum of all the node degrees in the network, equation (6) is rewritten as:
Figure FDA0003130283830000023
then, by
Figure FDA0003130283830000024
Calculating a partial derivative to optimize equation (7):
Figure FDA0003130283830000025
thereby obtaining
Figure FDA0003130283830000026
Comprises the following steps:
Figure FDA0003130283830000027
obtaining a weighting matrix X belonging to R and storing the information of the potential community of the networkn×nElement X of matrix XijComprises the following steps:
Figure FDA0003130283830000028
the weights of the elements in the matrix X are the weights between the edges influenced by the community structure between the nodes, so that the structural proximity between the nodes is quantized, and the potential community structure of the network is reflected.
3. The method of claim 2, wherein the obtaining of the network node representation vector comprises:
the obtained community matrix X is used as the input of a depth self-encoder, the low-dimensional vector representation of the network is obtained, and the community structure of the network is captured, so that the nodes belonging to the same community are ensured to be close to each other in an embedding space;
each row of matrix X is the input to the depth autoencoder, and the loss function is as follows:
Figure FDA0003130283830000031
wherein
Figure FDA0003130283830000032
The reconstruction error is minimized by training the automatic encoder, the similarity between input vectors in an embedding space is kept, the loss function of input and output is minimized, and the characteristics of input data, namely the potential community structure of the network, can be kept to the maximum degree in a hidden layer. The node output from the last layer of the hidden layer represents the characteristics of the input community structure matrix X which are stored to the maximum extent, and the clear and accurate community structure can be obtained by applying the node to the subsequent community discovery algorithm.
CN202110703377.3A 2021-06-24 2021-06-24 Community discovery method based on deep network representation learning Pending CN113658012A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110703377.3A CN113658012A (en) 2021-06-24 2021-06-24 Community discovery method based on deep network representation learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110703377.3A CN113658012A (en) 2021-06-24 2021-06-24 Community discovery method based on deep network representation learning

Publications (1)

Publication Number Publication Date
CN113658012A true CN113658012A (en) 2021-11-16

Family

ID=78489013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110703377.3A Pending CN113658012A (en) 2021-06-24 2021-06-24 Community discovery method based on deep network representation learning

Country Status (1)

Country Link
CN (1) CN113658012A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115375502A (en) * 2022-08-16 2022-11-22 中国人民解放军海军指挥学院 Intelligent overlapped community mining method and system based on dual-scale graph wavelet neural network

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115375502A (en) * 2022-08-16 2022-11-22 中国人民解放军海军指挥学院 Intelligent overlapped community mining method and system based on dual-scale graph wavelet neural network

Similar Documents

Publication Publication Date Title
Popat et al. Cluster-based probability model and its application to image and texture processing
CN107240136B (en) Static image compression method based on deep learning model
CN112464004A (en) Multi-view depth generation image clustering method
Miok et al. Generating data using Monte Carlo dropout
CN115688982A (en) Building photovoltaic data completion method based on WGAN and whale optimization algorithm
Lin et al. A deep clustering algorithm based on gaussian mixture model
CN111461348A (en) Deep network embedded learning method based on graph core
CN113658012A (en) Community discovery method based on deep network representation learning
CN114841296A (en) Device clustering method, terminal device and storage medium
de Castro et al. BAIS: A Bayesian Artificial Immune System for the effective handling of building blocks
CN117056763A (en) Community discovery method based on variogram embedding
Sorwar et al. DCT based texture classification using soft computing approach
CN115587626A (en) Heterogeneous graph neural network attribute completion method
CN113409159A (en) Deep community discovery method fusing node attributes
CN111767825A (en) Face attribute invariant robustness face recognition method and system
CN114610950B (en) Graph network node representation method
Yue Deep learning based image semantic feature analysis and image classification techniques and models
de Ridder et al. The adaptive subspace map for texture segmentation
CN113887591B (en) Multi-view clustering method based on double-layer weighted joint decomposition
Duan et al. Sparsity Regularization Model Based on Network Structure
Ullah et al. Time and memory efficient 3d point cloud classification
Cheng et al. Image color reduction based on self-organizing maps and growing self-organizing neural networks
Liu et al. Partial Mixture-of-Experts Similarity Variational Autoencoder for Clustering on Single Cell Data
Shao et al. Study on the construction of gene regulatory network based on non-homogeneous dynamic Bayesian network
CN116977681A (en) Data clustering method and system based on data diversity enhancement

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211116

RJ01 Rejection of invention patent application after publication