CN113378075A - Community discovery method for adaptively fusing network topology and node content - Google Patents

Info

Publication number
CN113378075A
Authority
CN
China
Prior art keywords
node
network topology
matrix
content
model
Legal status
Pending
Application number
CN202110698553.9A
Other languages
Chinese (zh)
Inventor
曹金鑫
贾焱鑫
许伟忠
陈翔
丁卫平
张晓峰
Current Assignee
Nantong University
Original Assignee
Nantong University
Priority date
2021-06-23
Filing date
2021-06-23
Publication date
2021-09-10
Application filed by Nantong University
Priority to CN202110698553.9A
Publication of CN113378075A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a community discovery method for adaptively fusing network topology and node content, which belongs to the technical field of complex network analysis. To mine the community structure in social network data sets, the network topology and the node content are modeled separately using node community membership and a graph regularization term, and an adaptive factor is then introduced to fuse them, yielding an adaptive community detection model based on graph regularization. Finally, the effectiveness of the model is evaluated with normalized mutual information (NMI). The beneficial effects of the invention are as follows: on the one hand, network topology and node content are fused using an autoencoder and graph regularization; on the other hand, introducing the adaptive factor alleviates the mismatch between network topology and node content during community discovery.

Description

Community discovery method for adaptively fusing network topology and node content
Technical Field
The invention relates to the technical field of complex network analysis, in particular to a community discovery method for adaptively fusing network topology and node content.
Background
In real life there are a large number of networked data sets, such as social networks and communication networks, which can often be represented as complex networks. In the analysis of complex networks, it is very important to find communities, i.e., groups of densely connected nodes. When analyzing social networks, for example, community discovery can help find groups of users with similar interests and purposes and predict the behavior of users belonging to a community. A complex network contains both the network topology and rich content information, and the content information can complement the topology to improve community discovery. In real network scenarios, however, the community structure described by the network topology and the one described by the node content often do not match. As a result, fused node content and network topology can have contradictory effects during community discovery, and methods that fuse topology and content then perform poorly. A new community discovery method is therefore urgently needed to mine community structure effectively when network topology and node content do not match.
Disclosure of Invention
The invention provides a community discovery method for adaptively fusing network topology and node content, mainly to address two technical problems in community discovery: the mismatch between network topology and node content, and the insufficient capacity of community membership to represent communities. The model is built on an autoencoder framework and uses a neural network to learn nonlinear representations, which improves the ability of community membership to represent community structure. It has good application value both for extending the theory of community discovery that fuses network topology and node content and for discovering such communities in practice.
The idea of the invention is as follows. First, data are obtained, such as the links between nodes, which describe the network topology, and the text features on the nodes, which describe the node content. Next, the network topology and the node content are each represented as matrices and transformed into a modularity matrix and a similarity matrix, respectively. Then, drawing on the theoretical similarity between autoencoders and non-negative matrix factorization, and using graph regularization to encode the assumption that nodes in the same community have similar content, an adaptive factor is introduced to balance the fusion proportion of topology and content, and a community discovery model that adaptively fuses network topology and node content is constructed. Finally, the model parameters are obtained by optimizing the model and are clustered, and an evaluation metric measures how similar the clustering result is to the original social group structure, which evaluates the performance of the model.
The technical scheme adopted by the invention is as follows: a community discovery method for adaptively fusing network topology and node content comprises the following steps:
S1. Denote the complex network data with content information as $G = (V, E, Q)$, where $V = \{v_1, v_2, \dots, v_n\}$ is the set of nodes, $E = \{e_1, e_2, \dots, e_m\}$ is the set of edges, and Q is the set of node-content feature vectors;
S2. Formally construct the network topology and node content information;
S3. Model the network topology and the node content using node community membership and a graph regularization term, respectively, which correspond to the two sub-models of the method, and construct an adaptive factor based on the mismatch between network topology and node content;
S4. Combine the two sub-models and the adaptive factor from step S3 into a single model under a unified framework, validate the model on data sets, and evaluate the effectiveness of the unified model using normalized mutual information (NMI) as the evaluation metric.
The invention further provides a refinement of the community discovery method for adaptively fusing network topology and node content, in which step S2 specifically comprises the following steps:
S2.1. Formally construct the network topology as follows. Construct the adjacency matrix of G:
$A = [a_{ij}] \in \{0, 1\}^{n \times n}$
where $a_{ij} = 1$ denotes an edge between nodes $v_i$ and $v_j$ and $a_{ij} = 0$ denotes no edge. Then construct the modularity matrix based on A:
$B = [b_{ij}] \in \mathbb{R}^{n \times n}, \quad b_{ij} = a_{ij} - \frac{k_i k_j}{2m}$
where $b_{ij}$ represents the strength of the link between nodes $v_i$ and $v_j$, $k_i$ is the degree of node $v_i$, m is the total number of edges, and $k_i k_j / 2m$ is the expected number of edges between the two nodes;
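As a minimal sketch of step S2.1 (plain Python without external libraries; the function names and the toy edge list are hypothetical illustrations, not part of the patent), the adjacency matrix A and the modularity matrix B can be built from an undirected edge list as follows:

```python
def adjacency_matrix(n, edges):
    """Symmetric 0/1 adjacency matrix A of an undirected graph with n nodes."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1
        a[j][i] = 1
    return a


def modularity_matrix(a):
    """Modularity matrix B with b_ij = a_ij - k_i * k_j / (2m)."""
    n = len(a)
    k = [sum(row) for row in a]  # node degrees k_i
    two_m = sum(k)               # the sum of all degrees equals 2m
    return [[a[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]


# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5} joined by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = adjacency_matrix(6, edges)
B = modularity_matrix(A)
```

Every row of B sums to zero, because row i of A sums to $k_i$ and the expected-edge terms in that row also sum to $k_i$.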
S2.2. Formally construct the node content as follows. Construct the feature matrix of the node contents:
$Q \in \mathbb{R}^{n \times r}$
Each row $q_i$ of Q represents the content of node $v_i$ as an r-dimensional feature vector. Then construct the content similarity matrix between nodes based on Q:
$U = [u_{ij}] \in \mathbb{R}^{n \times n}, \quad u_{ij} = \frac{q_i q_j^{\top}}{\lVert q_i \rVert \, \lVert q_j \rVert}$
where $u_{ij}$ is the cosine similarity of the feature vectors of nodes $v_i$ and $v_j$.
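A minimal sketch of step S2.2 in plain Python follows. The function name and the toy feature vectors are hypothetical; the handling of zero-norm rows (similarity 0) is an assumption, as the patent does not say how nodes without content are treated, nor whether the diagonal $u_{ii} = 1$ is kept.

```python
import math


def cosine_similarity_matrix(q):
    """U with u_ij = cosine similarity of rows q_i and q_j of the feature matrix Q.

    Assumption: rows with zero norm (nodes without content) get similarity 0.
    """
    norms = [math.sqrt(sum(x * x for x in row)) for row in q]
    n = len(q)
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(x * y for x, y in zip(q[i], q[j]))
                u[i][j] = dot / (norms[i] * norms[j])
    return u


# Hypothetical binary word-attribute vectors (r = 4) for three nodes.
Q = [[1, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 1, 1]]
U = cosine_similarity_matrix(Q)
```

Nodes 0 and 1 share one word and get a similarity of $1/\sqrt{2}$; node 2 shares no word with either and gets 0.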
The invention further provides a refinement of the community discovery method for adaptively fusing network topology and node content, in which step S3 specifically comprises the following steps:
S3.1. Construct the first sub-model based on the topology information. Because the autoencoder and non-negative matrix factorization (NMF) are theoretically similar, an autoencoder module is used to realize modularity maximization, $\max \operatorname{tr}(H_l^{\top} B H_l)$. The parameter of this sub-model is the network topology characterization matrix $H_l$ in the hidden layer of the autoencoder, and $\hat{B}$ is the reconstruction of B produced by the autoencoder. The objective function $O(H_l)$ of the first sub-model is:
[Equation image not reproduced: objective function $O(H_l)$ of the first sub-model]
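In the special case where H is a hard 0/1 community indicator matrix, $\operatorname{tr}(H^{\top} B H)/(2m)$ equals Newman's modularity, which is why maximizing this trace favors dense communities. The sketch below (hypothetical helper names and toy graph) only checks that identity; it does not implement the patent's autoencoder sub-model, in which $H_l$ is a continuous hidden-layer representation.

```python
def modularity_matrix_from_edges(n, edges):
    """Return (A, B, 2m) for an undirected graph, with b_ij = a_ij - k_i k_j / (2m)."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    k = [sum(row) for row in a]
    two_m = sum(k)
    b = [[a[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]
    return a, b, two_m


def trace_quadratic(h, m):
    """tr(H^T M H) = sum_i sum_j m_ij * <h_i, h_j>, where h_i is row i of H."""
    n = len(h)
    return sum(m[i][j] * sum(x * y for x, y in zip(h[i], h[j]))
               for i in range(n) for j in range(n))


# Hypothetical toy graph: two triangles joined by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A, B, two_m = modularity_matrix_from_edges(6, edges)

# Hard membership: column 0 = community {0,1,2}, column 1 = community {3,4,5}.
H = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
Q_mod = trace_quadratic(H, B) / two_m
```

Here each triangle has 3 internal edges and total degree 7, so the modularity is $2\,(3/7 - (7/14)^2) = 5/14$.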
S3.2. Construct the second sub-model based on the node content, using the similarity matrix U and the graph regularization term $\operatorname{tr}(H_c^{\top} L H_c)$. The parameter of this sub-model is the node content characterization matrix $H_c$. The Laplacian matrix is $L = D - U$, where $D = \operatorname{diag}(d_1, d_2, \dots, d_n)$ is a diagonal matrix whose entry $d_i = \sum_j u_{ij}$ is the sum of the i-th row of U. The objective function $O(H_c)$ of the second sub-model is:
[Equation image not reproduced: objective function $O(H_c)$ of the second sub-model]
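The graph regularization term penalizes nodes with similar content that receive different representations: for a symmetric U, $\operatorname{tr}(H_c^{\top} L H_c) = \tfrac{1}{2}\sum_{i,j} u_{ij}\lVert h_i - h_j\rVert^2$, where $h_i$ is row i of $H_c$. The following sketch builds $L = D - U$ and checks this identity numerically; the helper names and toy values are hypothetical.

```python
def laplacian(u):
    """L = D - U, with D = diag(d_1, ..., d_n) and d_i the i-th row sum of U."""
    n = len(u)
    d = [sum(row) for row in u]
    return [[(d[i] if i == j else 0.0) - u[i][j] for j in range(n)] for i in range(n)]


def trace_quadratic(h, m):
    """tr(H^T M H) = sum_i sum_j m_ij * <h_i, h_j>."""
    n = len(h)
    return sum(m[i][j] * sum(x * y for x, y in zip(h[i], h[j]))
               for i in range(n) for j in range(n))


def pairwise_smoothness(h, u):
    """(1/2) * sum_ij u_ij * ||h_i - h_j||^2 for a symmetric U."""
    n = len(h)
    return 0.5 * sum(u[i][j] * sum((x - y) ** 2 for x, y in zip(h[i], h[j]))
                     for i in range(n) for j in range(n))


# Hypothetical content similarities and content representations for three nodes.
U = [[0.0, 0.9, 0.1],
     [0.9, 0.0, 0.2],
     [0.1, 0.2, 0.0]]
H_c = [[1.0, 0.0],
       [0.8, 0.2],
       [0.0, 1.0]]
L = laplacian(U)
reg = trace_quadratic(H_c, L)
smooth = pairwise_smoothness(H_c, U)
```

Nodes 0 and 1 are highly similar (0.9) and have close representations, so their pair contributes little; the largest contribution comes from the dissimilar representations of nodes 0 and 2, which is damped by their low similarity (0.1).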
S3.3. Construct the adaptive factor from the mismatch between network topology and node content. Given the network topology characterization matrix $H_l$ and the node content characterization matrix $H_c$ designed in steps S3.1 and S3.2, the mapping $H_c \approx H_l P$ between the two matrices yields a mapping matrix P that represents the relationship between the network topology and the node content. Within a generative framework, P is fitted from the characterization matrices $H_l$ and $H_c$ and normalized to describe the degree of matching between the network topology and the node content:
[Equation image not reproduced: matching-degree function]
Based on this matching-degree function, the adaptive factor that fuses network topology and content information is constructed as:
[Equation image not reproduced: adaptive factor]
where $\arctan(\cdot)$ is the arctangent function and κ is the number of communities in network G.
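The matching-degree and adaptive-factor formulas are available only as equation images, so they are not reproduced here. To illustrate the mapping $H_c \approx H_l P$ alone, the sketch below fits P by ordinary least squares through the normal equations $(H_l^{\top} H_l) P = H_l^{\top} H_c$. This closed-form fit is a swapped-in simplification (the patent fits P within its generative framework), and the relative residual is a hypothetical mismatch proxy, not the patent's normalized matching degree. All names and values are hypothetical.

```python
import math


def matmul(x, y):
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    return [list(col) for col in zip(*x)]


def solve(a, b):
    """Solve a X = b (a square, b with several columns) by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def fit_mapping(h_l, h_c):
    """Least-squares P minimising ||H_c - H_l P||_F^2 (assumed, not the patent's fit)."""
    ht = transpose(h_l)
    return solve(matmul(ht, h_l), matmul(ht, h_c))


def relative_residual(h_l, h_c, p):
    """||H_c - H_l P||_F / ||H_c||_F: hypothetical mismatch proxy."""
    approx = matmul(h_l, p)
    num = sum((x - y) ** 2 for ra, rc in zip(approx, h_c) for x, y in zip(ra, rc))
    den = sum(y * y for rc in h_c for y in rc)
    return math.sqrt(num / den)


# Hypothetical case: content communities equal topology communities, relabelled.
H_l = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]
H_c = [[row[1], row[0]] for row in H_l]
P = fit_mapping(H_l, H_c)
score = relative_residual(H_l, H_c, P)
```

When content and topology agree up to a relabelling of communities, P recovers the permutation matrix and the residual is essentially zero; larger residuals would indicate a mismatch that the adaptive factor is meant to account for.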
The invention further provides a refinement of the community discovery method for adaptively fusing network topology and node content, in which step S4 specifically comprises the following steps:
The first sub-model $O(H_l)$ based on network topology and the second sub-model $O(H_c)$ based on node content from step S3 are integrated, and the adaptive factor constructed from the mismatch between network topology and node content adjusts the relative weight of the two sub-models. A unified model is then built on $H_c \approx H_l P$. Denoting $H_l$ as H, the adaptive factor becomes:
[Equation image not reproduced: adaptive factor of the unified model]
where
[Equation image not reproduced: auxiliary definition]
The matrices H and P are the model parameters of the final unified model, whose objective function is:
[Equation image not reproduced: objective function of the unified model]
Finally, the resulting parameter matrix H is the community membership matrix of the nodes, and communities are detected by clustering based on H.
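The patent does not name the clustering algorithm applied to the membership matrix H. A common choice for membership matrices is to assign each node to the column with the largest membership (k-means on the rows of H is another option). The sketch below assumes this argmax rule; the function names and values are hypothetical.

```python
def communities_from_membership(h):
    """Assign each node to the community (column) with the largest membership value."""
    return [max(range(len(row)), key=row.__getitem__) for row in h]


def group_nodes(labels):
    """Map each community index to the list of its nodes."""
    groups = {}
    for node, c in enumerate(labels):
        groups.setdefault(c, []).append(node)
    return groups


# Hypothetical membership matrix for four nodes and kappa = 2 communities.
H = [[0.9, 0.1],
     [0.7, 0.3],
     [0.2, 0.8],
     [0.1, 0.9]]
labels = communities_from_membership(H)
groups = group_nodes(labels)
```

The argmax rule yields disjoint communities; overlapping communities would instead need a threshold on the membership values.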
Compared with the prior art, the invention has the following beneficial effects. On the one hand, the method fuses network topology and node content using an autoencoder and graph regularization; on the other hand, it reduces the sensitivity of community discovery to topology-content mismatch and fuses the two kinds of information adaptively. In addition, the neural network structure of the autoencoder strengthens the ability of community membership to represent communities, further improving the quality of community discovery that fuses topology and content. When the network topology matches the node content, community discovery performance is improved; when they do not match, the contradictory effects of node content and network topology are alleviated, and community discovery performance can still be improved. The community membership obtained by the new method therefore has a stronger capacity to represent communities.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
Fig. 1 is a schematic overall flow chart of the community discovery method for adaptively fusing network topology and node content according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. Of course, the specific embodiments described herein are merely illustrative of the invention and are not intended to be limiting.
Example 1
Referring to Fig. 1, Table 1 and Table 2, the present invention provides a social group discovery method for adaptively fusing network topology and node content, comprising the following steps.
(1) Obtain community information from the data sets. The community information $G = (V, E, Q)$ is acquired from public data sets, where $V = \{v_1, v_2, \dots, v_n\}$ is the set of nodes, $E = \{e_1, e_2, \dots, e_m\}$ is the set of edges, and Q is the set of node-content feature vectors. The experimental data sets for the community discovery method for adaptively fusing network topology and node content are shown in Table 1:
Table 1. Experimental data set information
[Table 1 image not reproduced]
Citeseer is a citation network of 3312 scientific publications from 6 subfields, with 4732 citation links. The WebKB network consists of 4 sub-networks, which are web page data sets (with 1703-dimensional binary word attributes) collected from Cornell, Texas, Washington and Wisconsin universities; each sub-network contains 5 communities.
(2) Acquire the topology information and node content from the data in step (1) as follows.
Construct the adjacency matrix of G from the data in Table 1:
$A = [a_{ij}] \in \{0, 1\}^{n \times n}$
where $a_{ij} = 1$ denotes an edge between nodes $v_i$ and $v_j$ and $a_{ij} = 0$ denotes no edge. Then construct the modularity matrix based on A:
$B = [b_{ij}] \in \mathbb{R}^{n \times n}, \quad b_{ij} = a_{ij} - \frac{k_i k_j}{2m}$
where $k_i$ is the degree of node $v_i$ and m is the total number of edges.
Next, for the node content, construct the feature matrix:
$Q \in \mathbb{R}^{n \times r}$
Each row of Q represents the content of one node as an r-dimensional feature vector. The content similarity matrix between nodes is then constructed based on Q:
$U = [u_{ij}] \in \mathbb{R}^{n \times n}, \quad u_{ij} = \frac{q_i q_j^{\top}}{\lVert q_i \rVert \, \lVert q_j \rVert}$
(3) Construct the first sub-model based on network topology and the second sub-model based on node content, and introduce an adaptive factor based on the degree of topology-content matching to reduce the sensitivity of the fused model to mismatch between the two kinds of information. The adaptive factor unifies the two sub-models into the final model, whose objective function is:
[Equation image not reproduced: objective function of the unified model]
The adaptive factor is formalized as:
[Equation image not reproduced: adaptive factor]
where
[Equation image not reproduced: auxiliary definition]
Here $\arctan(\cdot)$ is the arctangent function, κ is the number of communities in network G, H is the community membership matrix that represents the communities, and the mapping matrix P represents the relationship between the network topology and the node content (the definitions of H and P are given as equation images, not reproduced).
(4) Using gradient descent within the autoencoder framework, update H and P iteratively until they converge; the resulting characterization matrix H gives the community assignment of every node.
(5) Normalized mutual information (NMI) is used as the evaluation metric of the model. NMI normalizes mutual information to the range [0, 1] and is computed from a confusion matrix C, in which each column corresponds to a predicted community and each row to an actual (ground-truth) community; the confusion matrix also gives an intuitive view of algorithm performance. The specific expression is:
[Equation image not reproduced: NMI formula]
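Because the NMI expression appears only as an image, the sketch below implements, as an assumption, the confusion-matrix form commonly used in community detection (Danon et al., 2005), with $N$ nodes, row sums $C_{i\cdot}$ and column sums $C_{\cdot j}$:
$\mathrm{NMI} = \frac{-2\sum_{i}\sum_{j} C_{ij}\log\left(\frac{C_{ij} N}{C_{i\cdot} C_{\cdot j}}\right)}{\sum_i C_{i\cdot}\log\left(\frac{C_{i\cdot}}{N}\right) + \sum_j C_{\cdot j}\log\left(\frac{C_{\cdot j}}{N}\right)}$
The patent's exact expression may differ. Function names and label vectors are hypothetical.

```python
import math


def confusion_matrix(true_labels, pred_labels):
    """C[i][j] = number of nodes in actual community i and predicted community j."""
    rows = sorted(set(true_labels))
    cols = sorted(set(pred_labels))
    c = [[0] * len(cols) for _ in rows]
    for t, p in zip(true_labels, pred_labels):
        c[rows.index(t)][cols.index(p)] += 1
    return c


def nmi_from_confusion(c):
    """NMI from a confusion matrix (rows: actual, columns: predicted)."""
    n = sum(sum(row) for row in c)
    row_sums = [sum(row) for row in c]
    col_sums = [sum(col) for col in zip(*c)]
    num = 0.0
    for i, row in enumerate(c):
        for j, cij in enumerate(row):
            if cij > 0:  # empty cells contribute nothing (0 * log 0 := 0)
                num += cij * math.log(cij * n / (row_sums[i] * col_sums[j]))
    num *= -2.0
    den = (sum(r * math.log(r / n) for r in row_sums if r > 0)
           + sum(s * math.log(s / n) for s in col_sums if s > 0))
    if den == 0:  # both partitions are a single community: treat as identical
        return 1.0
    return num / den


truth = [0, 0, 0, 1, 1, 1]
perfect = [1, 1, 1, 0, 0, 0]  # same partition, community labels swapped
mixed = [0, 1, 0, 1, 0, 1]    # hypothetical poor prediction
nmi_perfect = nmi_from_confusion(confusion_matrix(truth, perfect))
nmi_mixed = nmi_from_confusion(confusion_matrix(truth, mixed))
```

NMI is invariant to how communities are labelled, so the swapped-label prediction still scores 1, while the poor prediction scores close to 0.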
(6) The proposed model was tested on five public data sets; the experimental results of the community discovery method for adaptively fusing network topology and node content are shown in Table 2:
Table 2. Test results on public data sets
[Table 2 image not reproduced]
As Table 2 shows, the proposed method was validated on 5 public data sets that contain both content information and topology information, and its mean normalized mutual information (NMI) was computed on each data set. On Citeseer, Texas and Wisconsin, the NMI between the community partition obtained by the method and the ground-truth partition of the data set is 35.11, 39.40 and 36.67 respectively, all above the average level of 31.39. On the Wisconsin data, the community structure obtained by the method comes closer to the ground-truth partition, approaching 40. Although the method scores only 18.93 on the Cornell data set, overall it adaptively fuses network topology and content information and shows good community detection performance on real data sets.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (4)

1. A community discovery method for adaptively fusing network topology and node content is characterized by comprising the following steps:
S1. Denote the complex network data with content information as $G = (V, E, Q)$, where $V = \{v_1, v_2, \dots, v_n\}$ is the set of nodes, $E = \{e_1, e_2, \dots, e_m\}$ is the set of edges, and Q is the set of node-content feature vectors;
S2. Formally construct the network topology and node content information;
S3. Model the network topology and the node content using node community membership and a graph regularization term, respectively, which correspond to the two sub-models of the method, and construct an adaptive factor based on the mismatch between network topology and node content;
S4. Combine the two sub-models and the adaptive factor from step S3 into a single model under a unified framework, validate the model on data sets, and evaluate the effectiveness of the unified model using normalized mutual information (NMI) as the evaluation metric.
2. The method of claim 1, wherein the step S2 specifically includes:
S2.1. Formally construct the network topology as follows. Construct the adjacency matrix of G:
$A = [a_{ij}] \in \{0, 1\}^{n \times n}$
where $a_{ij} = 1$ denotes an edge between nodes $v_i$ and $v_j$ and $a_{ij} = 0$ denotes no edge. Then construct the modularity matrix based on A:
$B = [b_{ij}] \in \mathbb{R}^{n \times n}, \quad b_{ij} = a_{ij} - \frac{k_i k_j}{2m}$
where $b_{ij}$ represents the strength of the link between nodes $v_i$ and $v_j$, $k_i$ is the degree of node $v_i$, m is the total number of edges, and $k_i k_j / 2m$ is the expected number of edges between the two nodes;
S2.2. Formally construct the node content as follows. Construct the feature matrix of the node contents:
$Q \in \mathbb{R}^{n \times r}$
Each row $q_i$ of Q represents the content of node $v_i$ as an r-dimensional feature vector. Then construct the content similarity matrix between nodes based on Q:
$U = [u_{ij}] \in \mathbb{R}^{n \times n}, \quad u_{ij} = \frac{q_i q_j^{\top}}{\lVert q_i \rVert \, \lVert q_j \rVert}$
where $u_{ij}$ is the cosine similarity of the feature vectors of nodes $v_i$ and $v_j$.
3. The community discovery method for adaptively fusing network topology and node content according to claim 1 or 2, wherein step S3 specifically comprises:
S3.1. Construct the first sub-model based on topology information: based on the theoretical similarity between the autoencoder and non-negative matrix factorization (NMF), an autoencoder module is used to realize modularity maximization, $\max \operatorname{tr}(H_l^{\top} B H_l)$; the sub-model parameter is the network topology characterization matrix $H_l$ in the hidden layer of the autoencoder, where
[Equation image not reproduced: reconstruction of B by the autoencoder]
is the data of B reconstructed by the autoencoder, so that the objective function of the first sub-model is:
[Equation image not reproduced: objective function $O(H_l)$ of the first sub-model]
S3.2. Construct the second sub-model based on the node content: using the similarity matrix U and the graph regularization term $\operatorname{tr}(H_c^{\top} L H_c)$, the sub-model parameter is the node content characterization matrix $H_c$, where the Laplacian matrix is $L = D - U$ and $D = \operatorname{diag}(d_1, d_2, \dots, d_n)$ is a diagonal matrix whose entry $d_i = \sum_j u_{ij}$ is the sum of the i-th row of U. The objective function of the second sub-model is:
[Equation image not reproduced: objective function $O(H_c)$ of the second sub-model]
S3.3. Construct the adaptive factor from the mismatch between network topology and node content: given the network topology characterization matrix $H_l$ and the node content characterization matrix $H_c$ designed in steps S3.1 and S3.2, the mapping $H_c \approx H_l P$ between the two matrices yields a mapping matrix P that represents the relationship between the network topology and the node content; within a generative framework, P is fitted from the characterization matrices $H_l$ and $H_c$ and normalized to describe the degree of matching between the network topology and the node content:
[Equation image not reproduced: matching-degree function]
Based on the matching-degree function, the adaptive factor that fuses network topology and content information is constructed as:
[Equation image not reproduced: adaptive factor]
where $\arctan(\cdot)$ is the arctangent function and κ is the number of communities in network G.
4. The community discovery method for adaptively fusing network topology and node content according to any one of claims 1-3, wherein step S4 specifically comprises:
integrating the first sub-model $O(H_l)$ based on network topology from step S3 with the second sub-model $O(H_c)$ based on node content, using the adaptive factor constructed from the mismatch between network topology and node content to adjust the relative weight of the two sub-models, and then building a unified model on $H_c \approx H_l P$; denoting $H_l$ as H, the adaptive factor becomes:
[Equation image not reproduced: adaptive factor of the unified model]
where
[Equation image not reproduced: auxiliary definition]
The matrices H and P are the model parameters of the final unified model, whose objective function is:
[Equation image not reproduced: objective function of the unified model]
The model parameter matrices H and P are updated iteratively by minimizing the objective function until its value converges; the resulting parameter matrix H is the community membership matrix of the nodes, and communities are detected by clustering based on H.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110698553.9A CN113378075A (en) 2021-06-23 2021-06-23 Community discovery method for adaptively fusing network topology and node content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110698553.9A CN113378075A (en) 2021-06-23 2021-06-23 Community discovery method for adaptively fusing network topology and node content

Publications (1)

Publication Number Publication Date
CN113378075A (en) 2021-09-10

Family

ID=77578637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110698553.9A Pending CN113378075A (en) 2021-06-23 2021-06-23 Community discovery method for adaptively fusing network topology and node content

Country Status (1)

Country Link
CN (1) CN113378075A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115048436A (en) * 2022-06-01 2022-09-13 优米互动(北京)科技有限公司 High-dimensional financial time sequence stage division method based on visual principle

Similar Documents

Publication Publication Date Title
Ipsen Evolutionary reconstruction of networks
CN108009710A (en) Node test importance appraisal procedure based on similarity and TrustRank algorithms
CN115983984A (en) Multi-model fusion client risk rating method
CN115311205A (en) Industrial equipment fault detection method based on pattern neural network federal learning
CN113792110A (en) Equipment trust value evaluation method based on social networking services
CN116843400A (en) Block chain carbon emission transaction anomaly detection method and device based on graph representation learning
CN113378075A (en) Community discovery method for adaptively fusing network topology and node content
CN116010813A (en) Community detection method based on influence degree of fusion label nodes of graph neural network
CN115734274A (en) Cellular network fault diagnosis method based on deep learning and knowledge graph
CN115577283A (en) Entity classification method and device, electronic equipment and storage medium
CN113744072A (en) Fusion topology and content community detection method based on deep neural network
CN114679372A (en) Node similarity-based attention network link prediction method
CN115174421B (en) Network fault prediction method and device based on self-supervision unwrapping hypergraph attention
CN116596574A (en) Power grid user portrait construction method and system
CN112465253B (en) Method and device for predicting links in urban road network
CN114265954B (en) Graph representation learning method based on position and structure information
Bruno et al. Community detection in the hyperbolic space
CN116541792A (en) Method for carrying out group partner identification based on graph neural network node classification
Qin et al. [Retracted] Enterprise Performance Management following Big Data Analysis Technology under Multisource Information Fusion
Nasirzadeh et al. Linear regression analysis for interval‐valued functional data
CN113962748A (en) Method for aligning users of heterogeneous e-commerce platform by using holomorphic information representation based on meta-path
CN113902091A (en) Community discovery method based on nonlinear non-negative matrix decomposition
CN113961821A (en) Community detection method based on graph regular fusion heterogeneous topology and node content
Cho et al. Multiresolution community analysis of international trade networks
Wang et al. A method of social network node preference evaluation based on the topology potential

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination