CN113378075A - Community discovery method for adaptively fusing network topology and node content - Google Patents
- Publication number
- CN113378075A (application number CN202110698553.9A)
- Authority
- CN
- China
- Prior art keywords
- node
- network topology
- matrix
- content
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a community discovery method that adaptively fuses network topology and node content, and belongs to the technical field of complex network analysis. The method mines the community structure in social network data sets: the network topology and the node content are modeled by node community membership and a graph regularization term, respectively, and an adaptive factor is then introduced to fuse the two, yielding an adaptive community detection model based on graph regularization. Finally, normalized mutual information (NMI) is used as the evaluation measure to assess the effectiveness of the model. The beneficial effects of the invention are as follows: on the one hand, network topology and node content are fused by means of an autoencoder and graph regularization; on the other hand, the mismatch between network topology and node content in the community discovery process is alleviated by introducing the adaptive factor.
Description
Technical Field
The invention relates to the technical field of complex network analysis, in particular to a community discovery method for adaptively fusing network topology and node content.
Background
Real-world social life produces a large number of networked data sets, such as social networks and communication networks, which can usually be modeled as complex networks. In the analysis of complex networks, it is very important to find communities, that is, groups of densely connected nodes. In social network analysis in particular, community discovery can help find groups of users with similar interests and purposes, and predict the behavior of the users belonging to a community. A complex network contains both a network topology and rich content information, and the content can assist the topology in improving community discovery. In real network scenarios, however, the community structure described by the network topology often does not match the one described by the node content information. Fusing node content with network topology can then have a contradictory effect, so that the fused method performs poorly. A new community discovery method is therefore urgently needed that can effectively mine the community structure when network topology and node content are mismatched.
Disclosure of Invention
The invention provides a community discovery method for adaptively fusing network topology and node content. It mainly addresses two technical problems in community discovery: the mismatch between network topology and node content, and the insufficient ability of community membership to represent communities. The model is built on an autoencoder framework and uses a neural network to learn nonlinear representations, which improves the ability of community membership to represent the community structure. The method is of value both for extending the theory of, and for applying, community discovery that fuses network topology and node content.
The idea of the invention is as follows. First, data are obtained: the links among nodes, which describe the network topology, and the text features on the nodes, which describe the node content. Second, the network topology and the node content are each converted into matrix form, and then transformed into a modularity matrix and a similarity matrix, respectively. Third, a community discovery model that adaptively fuses network topology and node content is constructed; it relies on the similarity between autoencoders and non-negative matrix factorization, uses graph regularization to express the assumption that nodes in the same community have similar content, and introduces an adaptive factor to balance the proportions of topology and content in the fusion. Finally, the model parameters are inferred through model optimization and clustered, and an evaluation measure computes how similar the clustering result is to the original community structure in order to assess the performance of the model.
The technical scheme adopted by the invention is as follows: a community discovery method for adaptively fusing network topology and node content comprises the following steps:
S1, the complex network data with content information is denoted as $G = (V, E, Q)$, where $V = \{v_1, v_2, \ldots, v_n\}$ denotes the set of nodes, $E = \{e_1, e_2, \ldots, e_m\}$ denotes the set of edges, and Q denotes the set of feature vectors of the node contents;
S2, formally constructing the network topology and the node content information;
S3, modeling the network topology and the node content with node community membership and a graph regularization term, respectively, which correspond to the two sub-models of the method, and constructing an adaptive factor based on the mismatch between network topology and node content;
S4, combining the two sub-models and the adaptive factor from step S3 into one model under a unified framework, verifying the model on data sets, and evaluating the effectiveness of the unified model with normalized mutual information (NMI) as the evaluation measure.
In a further refinement of the community discovery method for adaptively fusing network topology and node content, step S2 specifically comprises:
S2.1, formally constructing the network topology as follows: construct the adjacency matrix $A = [a_{ij}] \in \mathbb{R}^{n \times n}$ of G, where $a_{ij} = 1$ denotes that there is an edge between nodes $v_i$ and $v_j$ and $a_{ij} = 0$ denotes no edge; then construct the modularity matrix $B = [b_{ij}] \in \mathbb{R}^{n \times n}$ based on A, with $b_{ij} = a_{ij} - \frac{k_i k_j}{2m}$, where $b_{ij}$ represents the strength of the link between nodes $v_i$ and $v_j$, $k_i$ represents the degree of node $v_i$, m represents the total number of edges, and $\frac{k_i k_j}{2m}$ represents the expected number of edges between the two nodes;
S2.2, formally constructing the node content as follows: construct the feature matrix $Q \in \mathbb{R}^{n \times r}$, in which each row represents the content of one node as an r-dimensional feature vector; then construct the content similarity matrix $U = [u_{ij}] \in \mathbb{R}^{n \times n}$ between nodes based on Q, where $u_{ij}$ represents the cosine similarity of the feature vectors of nodes $v_i$ and $v_j$.
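The two constructions in step S2 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names, the toy edge list and the toy feature matrix are made up for the example and do not come from the patent, and the graph is assumed to be undirected and unweighted.

```python
import numpy as np

def modularity_matrix(A):
    """Return B with b_ij = a_ij - k_i * k_j / (2m) for an undirected 0/1 adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)            # node degrees k_i
    two_m = k.sum()              # the degrees sum to 2m
    return A - np.outer(k, k) / two_m

def cosine_similarity_matrix(Q, eps=1e-12):
    """Return U with u_ij = cosine similarity between rows i and j of Q."""
    Q = np.asarray(Q, dtype=float)
    norms = np.linalg.norm(Q, axis=1, keepdims=True)
    Qn = Q / np.maximum(norms, eps)   # all-zero rows stay all-zero
    return Qn @ Qn.T

# Hypothetical toy network: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Hypothetical binary word features (r = 4), one row per node.
Q = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
], dtype=float)

B = modularity_matrix(A)
U = cosine_similarity_matrix(Q)
```

Each row of B sums to zero because the expected-edge term removes exactly the degree of the node, which gives a quick sanity check on any implementation.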
In a further refinement of the community discovery method for adaptively fusing network topology and node content, step S3 specifically comprises:
S3.1, constructing a first sub-model based on the topology information. Based on the similarity between autoencoders and non-negative matrix factorization (NMF), an autoencoder block is used to achieve the modularity maximization $\mathrm{tr}(H_l^{T} B H_l)$. The sub-model parameter is the network topology characterization matrix $H_l$ in the hidden layer of the autoencoder, and the autoencoder reconstructs B, so that the objective function of the first sub-model is obtained as follows:
S3.2, constructing a second sub-model based on the node content. From the similarity matrix U, the graph regularization term $\mathrm{tr}(H_c^{T} L H_c)$ is formed. The sub-model parameter is the node content characterization matrix $H_c$; the Laplacian matrix is $L = D - U$, where $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$ is a diagonal matrix and $d_i$ is the sum of the elements in the i-th row of U. The objective function of the second sub-model is as follows:
S3.3, constructing an adaptive factor from the mismatch between network topology and node content. Based on the network topology characterization matrix $H_l$ and the node content characterization matrix $H_c$ designed in steps S3.1 and S3.2, the mapping $H_c \approx H_l P$ between the matrices yields a mapping matrix P that represents the relationship between the network topology and the node content. Within a generative framework, P is fitted from the characterization matrices $H_l$ and $H_c$ and normalized to describe the degree of matching between the network topology and the node content:
Based on the matching degree function, an adaptive factor fusing network topology and content information is constructed:
where $\arctan(\cdot)$ is the arctangent function and $\kappa$ represents the number of communities of the network G.
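To make the graph regularization term of step S3.2 concrete, the sketch below builds $L = D - U$ and checks the standard identity $\mathrm{tr}(H^{T} L H) = \frac{1}{2}\sum_{i,j} u_{ij}\,\lVert h_i - h_j \rVert^2$, which holds for symmetric U and explains why the term is small when nodes with similar content have similar rows in H. The matrices here are random toy data and the function names are illustrative assumptions; the sketch does not reproduce the patent's full objective function.

```python
import numpy as np

def graph_laplacian(U):
    """L = D - U, with D = diag(d_1, ..., d_n) and d_i the i-th row sum of U."""
    U = np.asarray(U, dtype=float)
    return np.diag(U.sum(axis=1)) - U

def graph_regularizer(H, U):
    """Graph regularization term tr(H^T L H)."""
    return float(np.trace(H.T @ graph_laplacian(U) @ H))

def pairwise_form(H, U):
    """Equivalent form (1/2) * sum_ij u_ij * ||h_i - h_j||^2 for symmetric U."""
    diff = H[:, None, :] - H[None, :, :]     # shape (n, n, k)
    sq_dist = (diff ** 2).sum(axis=2)        # squared row distances
    return 0.5 * float((U * sq_dist).sum())

rng = np.random.default_rng(0)
X = rng.random((5, 3))
U = X @ X.T                 # symmetric, non-negative toy similarity matrix
H = rng.random((5, 2))      # toy characterization matrix (n = 5, kappa = 2)

reg_trace = graph_regularizer(H, U)
reg_pairs = pairwise_form(H, U)
```

The two values agree up to floating-point error, and both are non-negative whenever the entries of U are non-negative.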
In a further refinement of the community discovery method for adaptively fusing network topology and node content, step S4 specifically comprises:
The first sub-model $O(H_l)$ based on the network topology and the second sub-model $O(H_c)$ based on the node content from step S3 are integrated, with the adaptive factor, constructed from the mismatch between network topology and node content, adjusting the proportions of the two sub-models. A unified model is then constructed based on $H_c \approx H_l P$; writing H for $H_l$, the adaptive factor becomes:
Taking the matrices H and P as the model parameters of the final unified model, the objective function of the unified model is as follows:
Finally, the parameter matrix H, that is, the community membership matrix of the nodes, is obtained, and communities are detected by clustering based on H.
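The patent does not name the clustering procedure applied to H. As one simple possibility, the sketch below assumes a hard assignment that places each node in the community with its largest membership value, and then scores the result with Newman modularity, $\mathrm{tr}(S^{T} B S)/(2m)$ for a one-hot indicator matrix S. The membership values and the toy graph are made up for illustration.

```python
import numpy as np

def assign_communities(H):
    """Hard assignment (an assumption): each node joins the community with its largest membership."""
    return np.argmax(np.asarray(H, dtype=float), axis=1)

def modularity(A, labels):
    """Newman modularity tr(S^T B S) / (2m), with S the one-hot indicator of labels."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m
    S = np.eye(int(labels.max()) + 1)[labels]   # shape (n, number of communities)
    return float(np.trace(S.T @ B @ S) / two_m)

# Hypothetical membership matrix for 6 nodes and kappa = 2 communities.
H = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.3],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
])
labels = assign_communities(H)

# Toy graph: two triangles joined by the edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

score = modularity(A, labels)
```

A k-means step on the rows of H would be an equally plausible reading of "clustering"; the argmax rule is used here only because it needs no extra parameters.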
Compared with the prior art, the invention has the following beneficial effects. On the one hand, the method fuses network topology and node content by means of an autoencoder and graph regularization; on the other hand, it reduces the sensitivity of community discovery to mismatches between topology and content, and fuses the two kinds of information adaptively. The neural network structure of the autoencoder also strengthens the ability of community membership to represent communities, which further improves the quality of community discovery that fuses topology and content. When the network topology matches the node content, community discovery performance is improved; when they do not match, the contradictory effect between node content and network topology is alleviated, and performance can still be improved. In addition, the community membership obtained by the method is expected to represent communities better.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
Fig. 1 is a schematic overall flow chart of the community discovery method for adaptively fusing network topology and node content according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. Of course, the specific embodiments described herein are merely illustrative of the invention and are not intended to be limiting.
Example 1
Referring to Fig. 1, Table 1 and Table 2, the present invention provides a community discovery method for adaptively fusing network topology and node content, which proceeds as follows.
(1) Community information is acquired from public data sets. The community information is denoted $G = (V, E, Q)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of nodes and $E = \{e_1, e_2, \ldots, e_m\}$ is the set of edges. The experimental data sets of the community discovery method for adaptively fusing network topology and node content are shown in Table 1:
Table 1. Experimental data information
Citeseer is a citation network consisting of 3312 scientific publications in 6 sub-domains with 4732 citation relationships. The WebKB network consists of 4 sub-networks, which are web page data sets (with 1703-dimensional binary word attributes) collected from Cornell, Texas, Washington and Wisconsin universities; each sub-network contains 5 communities.
(2) Following step (1), the topology information and the node content are obtained as follows:
Construct the adjacency matrix $A = [a_{ij}] \in \mathbb{R}^{n \times n}$ of G from the data in Table 1, where $a_{ij} = 1$ denotes that there is an edge between nodes $v_i$ and $v_j$ and $a_{ij} = 0$ denotes no edge; then construct the modularity matrix $B = [b_{ij}] \in \mathbb{R}^{n \times n}$ based on A. For the node content, construct the feature matrix $Q \in \mathbb{R}^{n \times r}$, each row of which represents the content of one node as an r-dimensional feature vector; then construct the content similarity matrix $U = [u_{ij}] \in \mathbb{R}^{n \times n}$ between nodes based on Q.
(3) A first sub-model based on the network topology and a second sub-model based on the node content are constructed, and an adaptive factor based on the degree of matching between topology and content is introduced to reduce the sensitivity of the fusion model to mismatches between the two kinds of information. The adaptive factor unifies the two sub-models into a final model, whose objective function is as follows:
The adaptive factor is formalized as:
where $\arctan(\cdot)$ is the arctangent function, $\kappa$ represents the number of communities of the network G, H is the community membership matrix that characterizes the communities, and the mapping matrix P represents the relationship between the network topology and the node content.
(4) H and P are updated iteratively by gradient descent within the autoencoder framework until they converge, which yields the characterization matrix H and, from it, the community membership of all nodes.
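The full objective function is not reproduced in this text, so the update rules for H and P cannot be shown exactly. As a deliberately simplified stand-in, the sketch below isolates only the mapping relation $H_c \approx H_l P$ and runs plain gradient descent on $\lVert H_c - H_l P \rVert_F^2$ with respect to P, holding both characterization matrices fixed. The function name, step-size rule, iteration count and random data are all assumptions made for illustration; the patent's method updates H and P jointly under its own objective.

```python
import numpy as np

def fit_mapping_gd(H_l, H_c, n_iter=2000):
    """Gradient descent on ||H_c - H_l P||_F^2 with respect to P only (illustrative stand-in)."""
    P = np.zeros((H_l.shape[1], H_c.shape[1]))
    # The gradient 2 H_l^T (H_l P - H_c) is Lipschitz with constant 2 * lambda_max(H_l^T H_l),
    # so the inverse of that constant is a safe fixed step size.
    step = 1.0 / (2.0 * np.linalg.eigvalsh(H_l.T @ H_l).max())
    for _ in range(n_iter):
        grad = 2.0 * H_l.T @ (H_l @ P - H_c)
        P -= step * grad
    return P

rng = np.random.default_rng(1)
H_l = rng.random((50, 3))                  # toy topology characterization matrix
P_true = rng.random((3, 3))                # toy ground-truth mapping
H_c = H_l @ P_true + 0.01 * rng.standard_normal((50, 3))   # noisy content characterization

P_gd = fit_mapping_gd(H_l, H_c)
P_ls, *_ = np.linalg.lstsq(H_l, H_c, rcond=None)   # closed-form least-squares reference
```

On this convex sub-problem the iterates approach the least-squares solution, which gives a reference point for checking a gradient implementation before moving to a joint update of H and P.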
(5) Normalized mutual information (NMI) is used as the evaluation index of the model. NMI normalizes the mutual information to the interval [0, 1]. It is computed from a confusion matrix C, in which each column represents a predicted class and each row represents an actual class, and it gives an overall picture of the algorithm's performance. The specific expression is as follows:
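For reference, a widely used confusion-matrix form of NMI is $\mathrm{NMI} = \dfrac{-2\sum_{i,j} C_{ij}\log\frac{C_{ij}N}{C_{i\cdot}C_{\cdot j}}}{\sum_i C_{i\cdot}\log\frac{C_{i\cdot}}{N} + \sum_j C_{\cdot j}\log\frac{C_{\cdot j}}{N}}$, where $C_{i\cdot}$ and $C_{\cdot j}$ are row and column sums and N is the number of nodes. The sketch below implements this form; it is assumed, not confirmed, to match the expression used in the patent, and the helper names are illustrative.

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels):
    """C[i, j] = number of nodes with actual class i and predicted class j."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    C = np.zeros((true_labels.max() + 1, pred_labels.max() + 1))
    np.add.at(C, (true_labels, pred_labels), 1)
    return C

def nmi_from_confusion(C):
    """Normalized mutual information computed from a confusion matrix C."""
    C = np.asarray(C, dtype=float)
    N = C.sum()
    row = C.sum(axis=1)                 # actual-class sizes
    col = C.sum(axis=0)                 # predicted-class sizes
    expected = np.outer(row, col)
    nz = C > 0                          # skip empty cells, since 0 * log(0) is taken as 0
    mutual = (C[nz] * np.log(C[nz] * N / expected[nz])).sum()
    h_row = (row[row > 0] * np.log(row[row > 0] / N)).sum()
    h_col = (col[col > 0] * np.log(col[col > 0] / N)).sum()
    denom = h_row + h_col
    if denom == 0:                      # both partitions are a single group; 1.0 by convention here
        return 1.0
    return float(-2.0 * mutual / denom)

truth = [0, 0, 0, 1, 1, 1]
pred = [1, 1, 1, 0, 0, 0]               # same partition with the labels swapped
nmi_perfect = nmi_from_confusion(confusion_matrix(truth, pred))
nmi_independent = nmi_from_confusion([[2, 2], [2, 2]])
```

Because NMI depends only on the co-occurrence counts, relabeling the predicted communities leaves the score unchanged, which is why it suits clustering output where label names are arbitrary.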
(6) The proposed model is tested on five public data sets. The experimental results of the community discovery method for adaptively fusing network topology and node content are shown in Table 2:
Table 2. Test results on the public data sets
As Table 2 shows, the validity of the proposed method was verified on 5 public data sets with both content information and topology information, and the mean normalized mutual information (NMI) of the method across the data sets was calculated. On the data sets Citeseer, Texas and Wisconsin, the NMI between the community partition obtained by the method and the ground-truth partition of the data set is 35.11, 39.40 and 36.67, respectively, all above the average of 31.39. On the Wisconsin data set, the agreement between the obtained community structure and the ground-truth partition approaches 40. Although the method scores only 18.93 on the Cornell data set, overall the method adaptively fuses network topology and content information and shows good community detection performance on real data sets.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (4)
1. A community discovery method for adaptively fusing network topology and node content is characterized by comprising the following steps:
S1, the complex network data with content information is denoted as $G = (V, E, Q)$, where $V = \{v_1, v_2, \ldots, v_n\}$ denotes the set of nodes, $E = \{e_1, e_2, \ldots, e_m\}$ denotes the set of edges, and Q denotes the set of feature vectors of the node contents;
S2, formally constructing the network topology and the node content information;
S3, modeling the network topology and the node content with node community membership and a graph regularization term, respectively, which correspond to the two sub-models of the method, and constructing an adaptive factor based on the mismatch between network topology and node content;
S4, combining the two sub-models and the adaptive factor from step S3 into one model under a unified framework, verifying the model on data sets, and evaluating the effectiveness of the unified model with normalized mutual information (NMI) as the evaluation measure.
2. The method of claim 1, wherein the step S2 specifically includes:
S2.1, formally constructing the network topology as follows: construct the adjacency matrix $A = [a_{ij}] \in \mathbb{R}^{n \times n}$ of G, where $a_{ij} = 1$ denotes that there is an edge between nodes $v_i$ and $v_j$ and $a_{ij} = 0$ denotes no edge; then construct the modularity matrix $B = [b_{ij}] \in \mathbb{R}^{n \times n}$ based on A, with $b_{ij} = a_{ij} - \frac{k_i k_j}{2m}$, where $b_{ij}$ represents the strength of the link between nodes $v_i$ and $v_j$, $k_i$ represents the degree of node $v_i$, m represents the total number of edges, and $\frac{k_i k_j}{2m}$ represents the expected number of edges between the two nodes;
S2.2, formally constructing the node content as follows: construct the feature matrix $Q \in \mathbb{R}^{n \times r}$, in which each row represents the content of one node as an r-dimensional feature vector; then construct the content similarity matrix $U = [u_{ij}] \in \mathbb{R}^{n \times n}$ between nodes based on Q, where $u_{ij}$ represents the cosine similarity of the feature vectors of nodes $v_i$ and $v_j$.
3. The community discovery method for adaptively fusing network topology and node content according to claim 1 or 2, wherein the step S3 specifically comprises:
S3.1, constructing a first sub-model based on the topology information: based on the similarity between autoencoders and non-negative matrix factorization (NMF), an autoencoder block is used to achieve the modularity maximization $\mathrm{tr}(H_l^{T} B H_l)$; the sub-model parameter is the network topology characterization matrix $H_l$ in the hidden layer of the autoencoder, and the autoencoder reconstructs B, so that the objective function of the first sub-model is:
S3.2, constructing a second sub-model based on the node content: from the similarity matrix U, the graph regularization term $\mathrm{tr}(H_c^{T} L H_c)$ is formed; the sub-model parameter is the node content characterization matrix $H_c$, the Laplacian matrix is $L = D - U$, where $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$ is a diagonal matrix and $d_i$ is the sum of the elements in the i-th row of U, and the objective function of the second sub-model is as follows:
S3.3, constructing an adaptive factor from the mismatch between network topology and node content: based on the network topology characterization matrix $H_l$ and the node content characterization matrix $H_c$ designed in steps S3.1 and S3.2, the mapping $H_c \approx H_l P$ between the matrices yields a mapping matrix P that represents the relationship between the network topology and the node content; within a generative framework, P is fitted from the characterization matrices $H_l$ and $H_c$ and normalized to describe the degree of matching between the network topology and the node content:
based on the matching degree function, an adaptive factor fusing network topology and content information is constructed:
where $\arctan(\cdot)$ is the arctangent function and $\kappa$ represents the number of communities of the network G.
4. The community discovery method for adaptively fusing network topology and node content according to any one of claims 1-3, wherein the step S4 specifically comprises:
the first sub-model $O(H_l)$ based on the network topology and the second sub-model $O(H_c)$ based on the node content from step S3 are integrated, with the adaptive factor, constructed from the mismatch between network topology and node content, adjusting the proportions of the two sub-models; a unified model is then constructed based on $H_c \approx H_l P$, and, writing H for $H_l$, the adaptive factor becomes:
taking the matrices H and P as the model parameters of the final unified model, the objective function of the unified model is as follows:
and the model parameter matrices H and P are updated iteratively by minimizing the objective function until its value converges, which yields the parameter matrix H, that is, the community membership matrix of the nodes; communities are detected by clustering based on the community membership matrix H.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110698553.9A CN113378075A (en) | 2021-06-23 | 2021-06-23 | Community discovery method for adaptively fusing network topology and node content |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110698553.9A CN113378075A (en) | 2021-06-23 | 2021-06-23 | Community discovery method for adaptively fusing network topology and node content |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113378075A true CN113378075A (en) | 2021-09-10 |
Family
ID=77578637
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110698553.9A Pending CN113378075A (en) | 2021-06-23 | 2021-06-23 | Community discovery method for adaptively fusing network topology and node content |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113378075A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115048436A (en) * | 2022-06-01 | 2022-09-13 | 优米互动(北京)科技有限公司 | High-dimensional financial time sequence stage division method based on visual principle |
-
2021
- 2021-06-23 CN CN202110698553.9A patent/CN113378075A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ipsen | Evolutionary reconstruction of networks | |
CN108009710A (en) | Node test importance appraisal procedure based on similarity and TrustRank algorithms | |
CN115983984A (en) | Multi-model fusion client risk rating method | |
CN115311205A (en) | Industrial equipment fault detection method based on pattern neural network federal learning | |
CN113792110A (en) | Equipment trust value evaluation method based on social networking services | |
CN116843400A (en) | Block chain carbon emission transaction anomaly detection method and device based on graph representation learning | |
CN113378075A (en) | Community discovery method for adaptively fusing network topology and node content | |
CN116010813A (en) | Community detection method based on influence degree of fusion label nodes of graph neural network | |
CN115734274A (en) | Cellular network fault diagnosis method based on deep learning and knowledge graph | |
CN115577283A (en) | Entity classification method and device, electronic equipment and storage medium | |
CN113744072A (en) | Fusion topology and content community detection method based on deep neural network | |
CN114679372A (en) | Node similarity-based attention network link prediction method | |
CN115174421B (en) | Network fault prediction method and device based on self-supervision unwrapping hypergraph attention | |
CN116596574A (en) | Power grid user portrait construction method and system | |
CN112465253B (en) | Method and device for predicting links in urban road network | |
CN114265954B (en) | Graph representation learning method based on position and structure information | |
Bruno et al. | Community detection in the hyperbolic space | |
CN116541792A (en) | Method for carrying out group partner identification based on graph neural network node classification | |
Qin et al. | [Retracted] Enterprise Performance Management following Big Data Analysis Technology under Multisource Information Fusion | |
Nasirzadeh et al. | Linear regression analysis for interval‐valued functional data | |
CN113962748A (en) | Method for aligning users of heterogeneous e-commerce platform by using holomorphic information representation based on meta-path | |
CN113902091A (en) | Community discovery method based on nonlinear non-negative matrix decomposition | |
CN113961821A (en) | Community detection method based on graph regular fusion heterogeneous topology and node content | |
Cho et al. | Multiresolution community analysis of international trade networks | |
Wang et al. | A method of social network node preference evaluation based on the topology potential |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||