CN110851732A - Attribute network semi-supervised community discovery method based on non-negative matrix three-factor decomposition - Google Patents
- Publication number: CN110851732A
- Application number: CN201911033689.7A
- Authority: CN (China)
- Prior art keywords: model, matrix, supervised, semi, community
- Prior art date
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The invention discloses an attribute network semi-supervised community discovery method based on non-negative matrix tri-factorization. First, a matrix decomposition model combining links and content is constructed from three parts: topology, prior information, and a content matrix. Then the model is optimized: the partial derivative with respect to each unknown variable is computed and set to 0 to obtain an update rule for that variable. Next, data are collected and processed, and the required adjacency matrix, prior information, and content matrix are extracted from the attribute network. Finally, the parameters and unknown variables are randomly initialized, and the processed data set is fed into the model for training by gradient descent with the derived update rules, iterating until the parameter updates converge.
Description
Technical Field
The invention belongs to the fields of machine learning, complex networks, and natural language processing. It mainly concerns the fusion of network information, provides a method that uses non-negative matrix factorization to reduce dimensionality and fuse information, and in particular relates to an attribute network semi-supervised community discovery method based on non-negative matrix tri-factorization.
Background
With the development of the internet, online social networks generate more and more data that carry both links and semantic content, such as user blogs and research papers. Such data are typically modeled as an attribute network, in which the links form the topology of the graph and the content is modeled as attributes of the nodes. Discovering the semantic communities of these networks is of great significance. For example, in a paper citation network, each node represents a paper, papers cite one another, and each paper has its own content; determining the community to which each paper belongs from both links and content helps researchers follow the frontier of a research field. How to integrate links and content in a network to determine a more accurate semantic community structure is therefore a challenging and meaningful problem.
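The attribute-network representation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the four-paper citation network, its keywords, and the names `edges`, `keywords`, `A`, and `C` are hypothetical and do not come from the patent.

```python
import numpy as np

# Hypothetical toy citation network: 4 papers joined by undirected citation links.
edges = [(0, 1), (1, 2), (2, 3)]
# Hypothetical content: one set of keywords per paper.
keywords = [
    {"matrix", "factorization"},
    {"matrix", "community"},
    {"community", "network"},
    {"network", "topic"},
]

n = len(keywords)

# Topology: symmetric adjacency matrix built from the links.
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Content: binary node-by-word attribute matrix over the sorted vocabulary.
vocab = sorted(set().union(*keywords))
C = np.array([[1.0 if w in kw else 0.0 for w in vocab] for kw in keywords])

print(vocab)
print(A)
print(C)
```

Each row of `A` describes whom a paper is linked to, and the matching row of `C` describes what the paper is about, so both views refer to the same node index.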
Many community discovery methods for attribute networks have been proposed. Depending on the type of data used for clustering, they can be classified into four categories: 1) topology-based methods, 2) attribute-based methods, 3) integration methods, and 4) model-based methods. The first type translates community discovery on an attribute network into graph clustering on a newly reconstructed network, in which the attributes of the nodes are also modeled as topological information. The second type converts community discovery on an attribute network into traditional vector data clustering, in which links and content are merged to compute the similarity between pairs of nodes. Integration methods combine the results of different clusterings. Model-based methods model links and content jointly, using an NMF-based model or a probabilistic model. In particular, model-based methods can make full use of links and content by casting the clustering problem as a probabilistic inference process; compared with the other methods they have a solid theoretical basis and are therefore generally considered to perform well. However, when the network is very sparse and the community structure is fuzzy, these methods often cannot accurately identify the community structure and its semantic information, because the information they draw on is too limited. In addition, prior information and the heterogeneity of node degrees are two key factors that influence network clustering results. In view of these limitations, the present invention aims to provide a new method, based on non-negative matrix tri-factorization, for fusing the features of an attribute network and overcoming the defects of the above methods.
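To see why degree heterogeneity matters, consider the following minimal sketch. It assumes, purely for illustration, that a node's latent vector is its community profile scaled by its degree; the variables `profile`, `deg_i`, and `deg_j` are hypothetical, and dividing by the degree is only a stand-in for a popularity correction, not the patent's definition of it.

```python
import numpy as np

# Hypothetical community profile shared by two nodes of the same community.
profile = np.array([0.9, 0.1])

# Assumption: each node's latent vector scales with its degree (popularity).
deg_i, deg_j = 50.0, 2.0
u_i = deg_i * profile
u_j = deg_j * profile

# Raw Euclidean distance is dominated by the difference in degree.
raw_dist = np.linalg.norm(u_i - u_j)

# After dividing out the popularity, the two nodes coincide.
norm_dist = np.linalg.norm(u_i / deg_i - u_j / deg_j)

print(raw_dist, norm_dist)
```

Although both nodes have exactly the same community profile, the raw distance is large, which is the effect a popularity term is meant to remove.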
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a model that effectively combines the links and content of an attribute network, so as to obtain more accurate semantic communities and the interrelations between communities.
To achieve this purpose, the technical scheme adopted by the invention uses non-negative matrix tri-factorization to fuse the links and node content of the attribute network, assisted by prior information, so as to improve the performance of community discovery. The method comprises the following steps:
1) Constructing a matrix decomposition model combining links and content, which comprises three parts (topology, prior information, and a content matrix), and describing the meaning of each variable in the model in detail. The specific steps are as follows:
(1) Constructing a non-negative matrix factorization model
A non-negative matrix tri-factorization model is constructed from the link information to describe the membership of nodes in communities and the interrelations between communities. The model is expressed as follows:
The meanings of the symbols in the formula are given in Table 1.
Table 1: explanation of the symbols used in the matrix decomposition model
(2) Semi-supervised model with embedded prior information
The invention adopts the must-link constraint as prior information to strengthen the representation of community structure. Two nodes that belong to the same community in the high-dimensional space should also be close to each other in the low-dimensional space. The Euclidean distance is used to measure the distance between the two nodes, and the semi-supervised community discovery model is constructed by combining the link information:
(3) Semi-supervised model introducing node popularity
Since heterogeneity of node degrees tends to increase the Euclidean distance between two nodes belonging to the same community, a node popularity matrix W is introduced to eliminate this influence. The semi-supervised model with node popularity is defined as:
(4) Semi-supervised model combining links with content
To better combine links and content, the invention approximates the content in the same latent space that is used to approximate the links between nodes, and defines a content matrix C by the bag-of-words method. The content matrix decomposition is defined as:
The invention thus finally constructs a semi-supervised semantic community discovery model that combines links and content while taking both prior information and degree heterogeneity into account:
2) Optimizing the model: taking the partial derivative with respect to each unknown variable in the model and setting it to 0 to obtain the update rule of each variable;
3) Collecting and processing data, and extracting the required adjacency matrix, prior information, and content matrix from the attribute network;
4) Randomly initializing the parameters and unknown variables, training by gradient descent with the update rules obtained in step 2), feeding the processed data set into the model of step 1), and iterating until the parameter updates converge;
5) Recording the resulting parameters and variables in the corresponding files, namely the membership between nodes and communities, the relationships between communities, and the semantic interpretation of the communities, and visualizing the experimental results.
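The steps above can be sketched end to end as follows. This is a minimal sketch under several assumptions, not the patented formulation: the objective `||A - U S U^T||_F^2 + alpha * tr(U^T L U) + beta * ||C - U V^T||_F^2` is an assumed form, the node popularity matrix W is omitted because its exact role is not given in this text, and plain projected gradient descent with step halving is swapped in for the patent's update rules. The names `objective`, `gradients`, `fit`, `alpha`, `beta`, and `lr`, as well as the toy data, are hypothetical.

```python
import numpy as np

def objective(A, L, C, U, S, V, alpha, beta):
    link = np.linalg.norm(A - U @ S @ U.T) ** 2        # topology term
    prior = alpha * np.trace(U.T @ L @ U)              # must-link term
    content = beta * np.linalg.norm(C - U @ V.T) ** 2  # content term
    return link + prior + content

def gradients(A, L, C, U, S, V, alpha, beta):
    R = A - U @ S @ U.T   # link residual
    Q = C - U @ V.T       # content residual
    gU = -2 * (R @ U @ S.T + R.T @ U @ S) + 2 * alpha * (L @ U) - 2 * beta * (Q @ V)
    gS = -2 * (U.T @ R @ U)
    gV = -2 * beta * (Q.T @ U)
    return gU, gS, gV

def fit(A, M, C, k, alpha=1.0, beta=1.0, lr=0.01, iters=500, seed=0):
    rng = np.random.default_rng(seed)
    n, m = C.shape
    # Random non-negative initialization of all unknown factors.
    U, S, V = rng.random((n, k)), rng.random((k, k)), rng.random((m, k))
    L = np.diag(M.sum(axis=1)) - M  # graph Laplacian of the must-link pairs
    history = [objective(A, L, C, U, S, V, alpha, beta)]
    for _ in range(iters):
        gU, gS, gV = gradients(A, L, C, U, S, V, alpha, beta)
        # Gradient step followed by projection onto the non-negative orthant.
        U_new = np.maximum(U - lr * gU, 0.0)
        S_new = np.maximum(S - lr * gS, 0.0)
        V_new = np.maximum(V - lr * gV, 0.0)
        value = objective(A, L, C, U_new, S_new, V_new, alpha, beta)
        if value > history[-1]:
            lr *= 0.5  # step was too large: shrink it and try again
            continue
        U, S, V = U_new, S_new, V_new
        history.append(value)
    return U, S, V, history

# Hypothetical toy data: two triangles joined by one edge, two must-link pairs,
# and a 4-word vocabulary split between the two groups.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
M = np.zeros((6, 6))
for i, j in [(0, 1), (3, 4)]:
    M[i, j] = M[j, i] = 1.0
C = np.array([[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 3, dtype=float)

U, S, V, history = fit(A, M, C, k=2)
print(history[0], history[-1])
print(U.argmax(axis=1))
```

Because a step is accepted only when it does not increase the objective, the recorded `history` never goes up; a production implementation would instead stop once the change between iterations falls below a tolerance.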
The invention has the following advantages:
1. By introducing prior information into the attribute network and accounting for the heterogeneity of node degrees, the community structure is strengthened and the community discovery capability is improved.
2. The method combines link information and content information in the same low-dimensional latent space, which effectively mitigates the poor community discovery results and poor semantic interpretation that conventional community models suffer under fuzzy semantics (such as word ambiguity), and makes the method more interpretable. With gradient descent, the variables and parameters are updated simply and quickly with a short convergence time, so the method can be applied to large-scale networks.
3. By adopting non-negative matrix tri-factorization, the invention obtains not only the membership between nodes and communities in the network but also the relationships between communities, and at the same time explains the semantic information of each community.
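The three kinds of output named in advantage 3 can be read off the factor matrices roughly as follows. The matrices `U` (node by community), `S` (community by community), and `V` (word by community), their values, and the vocabulary are all hypothetical; the patent's own symbols are defined in its Table 1, which is not reproduced in this text.

```python
import numpy as np

# Hypothetical learned factors for 4 nodes, 2 communities, and 3 words.
U = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.7], [0.2, 0.9]])
S = np.array([[1.0, 0.1], [0.1, 1.0]])
V = np.array([[0.9, 0.0], [0.5, 0.4], [0.1, 0.8]])
vocab = ["matrix", "network", "topic"]

# Node membership: the community with the largest weight in each row of U.
membership = U.argmax(axis=1)

# Community-to-community relationships: off-diagonal entries of S.
cross_strength = S[0, 1]

# Semantic interpretation: the highest-weighted words in each column of V.
top_words = [
    [vocab[i] for i in np.argsort(-V[:, c])[:2]]
    for c in range(V.shape[1])
]

print(membership, cross_strength, top_words)
```

Reading the factors this way is what makes a tri-factorization interpretable: each matrix answers one question about the communities.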
The characteristics are as follows:
a. besides traditional link information, prior information and content information in the network are effectively and fully combined;
b. establishing a model with stronger interpretability by a non-negative matrix factorization method;
c. the updating rule is simple and quick;
d. the method is highly scalable.
Description of the drawings:
FIG. 1 is a diagram of a model framework established by the method of the present invention.
Detailed Description
The invention is further illustrated below by means of a specific example. The example is provided to help those skilled in the art better understand the invention and does not limit the invention in any way.
To better address network sparsity and semantic ambiguity, and to make full use of the information in the network to discover more about its communities, the invention uses matrix decomposition to build a highly interpretable model that effectively combines network links and content, and adopts gradient descent so that the method is easy to understand and fast to run. By training the model of the invention, the user obtains more information about the communities in the network: node membership, relationships between communities, and community semantic information. The method can be widely applied in fields such as text classification and clustering, commodity recommendation, and information retrieval.
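As a worked illustration of how setting a partial derivative to zero leads to an update rule, consider only a link term of the assumed form below. The symbols A (adjacency matrix), U (node-community factor), and S (community-community factor) and the form of the objective are assumptions, since the patent's formulas are not reproduced in this text; the multiplicative rule shown is the standard NMF-style rule and is offered as a sketch, not necessarily the rule used by the invention.

```latex
\begin{aligned}
J(S) &= \lVert A - U S U^{\top} \rVert_F^{2}, \qquad A \ge 0,\; U \ge 0,\; S \ge 0,\\
\frac{\partial J}{\partial S} &= -2\,U^{\top} A U + 2\,U^{\top} U S U^{\top} U,\\
\frac{\partial J}{\partial S} = 0 \;&\Longrightarrow\; U^{\top} A U = U^{\top} U S U^{\top} U,\\
S_{kl} &\leftarrow S_{kl}\,\frac{\left(U^{\top} A U\right)_{kl}}{\left(U^{\top} U S U^{\top} U\right)_{kl}}.
\end{aligned}
```

The update leaves S unchanged exactly when the stationarity condition holds, and it keeps S non-negative because every factor in the ratio is non-negative.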
The modeling process established by the invention (the required information and corresponding matrices, the combination process, and the resulting outputs) is shown in FIG. 1.
The technical scheme adopted by the invention integrates the links and node content of the attribute network and adds auxiliary prior information to improve the performance of community discovery. It comprises the following steps:
1. Constructing a matrix decomposition model combining links and content, which comprises three parts (topology, prior information, and a content matrix), and describing the meaning of each variable in the model in detail;
2. Optimizing the model: taking the partial derivative with respect to each unknown variable in the model and setting it to 0 to obtain the update rule of each variable;
3. Collecting and processing data, taking the Cora data in Table 2 as an example, and extracting the required adjacency matrix, prior information, and content matrix from the attribute network;
Table 2: detailed description of the test data
4. Randomly initializing the parameters and unknown variables, training by gradient descent with the update rules obtained in step 2, feeding the processed data set into the model of step 1, and iterating until the parameter updates converge;
5. Recording the resulting parameters and variables in the corresponding files, namely the membership between nodes and communities, the relationships between communities, and the semantic interpretation of the communities, and visualizing the experimental results.
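For the prior information extracted in step 3, one plausible encoding of must-link constraints is a symmetric 0/1 matrix over the few nodes whose community is known. The sketch below assumes this encoding; the function `must_link_matrix`, the labels, and the penalty form `tr(U^T L U)` are illustrative assumptions rather than the patent's definitions.

```python
import numpy as np

def must_link_matrix(n, known_labels):
    """Build a symmetric must-link matrix from a dict {node: label}
    covering only the nodes whose community is known (hypothetical input)."""
    M = np.zeros((n, n))
    for a, label_a in known_labels.items():
        for b, label_b in known_labels.items():
            if a != b and label_a == label_b:
                M[a, b] = 1.0
    return M

# Hypothetical prior: nodes 0 and 1 share a label, nodes 3 and 4 share another.
M = must_link_matrix(5, {0: "ml", 1: "ml", 3: "db", 4: "db"})
L = np.diag(M.sum(axis=1)) - M  # graph Laplacian of the must-link graph

# For any latent matrix U, the Laplacian penalty equals half the sum of
# squared Euclidean distances over must-link pairs.
rng = np.random.default_rng(0)
U = rng.random((5, 2))
penalty = np.trace(U.T @ L @ U)
pairwise = 0.5 * sum(
    M[i, j] * np.sum((U[i] - U[j]) ** 2) for i in range(5) for j in range(5)
)
print(penalty, pairwise)
```

The equality of the two quantities is why a trace term is a compact way to ask that must-link nodes stay close in the low-dimensional space.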
Table 3: accuracy (AC) and normalized mutual information (NMI) of the invention compared with other community discovery methods
Method | SSNMF | FSSNMF | PSSNMF | WSCDSM | The invention |
---|---|---|---|---|---|
AC | 0.3027 | 0.6012 | 0.8224 | 0.7216 | 0.8470 |
NMI | 0.4270 | 0.7615 | 0.9062 | 0.7531 | 0.9401 |
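The two metrics reported in Table 3 can be computed as in the following sketch. It is an illustration under stated assumptions: accuracy is taken as the best agreement over all relabelings of the predicted clusters (brute force, suitable only for a handful of clusters), and NMI is normalized by the geometric mean of the two entropies, which is one of several common conventions and may differ from the one used for Table 3. The function names and example labels are hypothetical.

```python
import itertools
import math
from collections import Counter

def clustering_accuracy(true, pred):
    """Best accuracy over all one-to-one mappings of predicted to true labels."""
    p_labels = sorted(set(pred))
    t_labels = sorted(set(true))
    # Pad with None so that extra predicted clusters can stay unmatched.
    targets = t_labels + [None] * max(0, len(p_labels) - len(t_labels))
    best = 0
    for perm in itertools.permutations(targets, len(p_labels)):
        mapping = dict(zip(p_labels, perm))
        hits = sum(mapping[p] == t for t, p in zip(true, pred))
        best = max(best, hits)
    return best / len(true)

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def nmi(true, pred):
    """Mutual information normalized by the geometric mean of the entropies."""
    n = len(true)
    joint = Counter(zip(true, pred))
    ct, cp = Counter(true), Counter(pred)
    mi = sum(
        c / n * math.log(c * n / (ct[t] * cp[p])) for (t, p), c in joint.items()
    )
    denom = math.sqrt(entropy(true) * entropy(pred))
    # Degenerate case (a single cluster): return 0.0 by convention chosen here.
    return mi / denom if denom > 0 else 0.0

true = [0, 0, 0, 1, 1, 1]
perfect = [1, 1, 1, 0, 0, 0]   # same partition, labels swapped
noisy = [1, 1, 0, 0, 0, 0]     # one node assigned to the wrong cluster

print(clustering_accuracy(true, perfect), nmi(true, perfect))
print(clustering_accuracy(true, noisy), nmi(true, noisy))
```

Both metrics ignore the arbitrary numbering of clusters, which is why the swapped labeling still counts as a perfect result.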
The invention provides an information fusion method based on non-negative matrix tri-factorization. Its main contributions cover three aspects: 1. the method combines link information, prior information, and content information, making full and effective use of the information in the network; 2. the method is based on non-negative matrix tri-factorization, which makes the model more interpretable and yields more information about the communities; 3. the method addresses network sparsity and semantic ambiguity more simply and quickly.
The experimental results on the sample data are shown in Table 3. It should be understood that the embodiments and examples discussed herein are illustrative only, and that various modifications or changes made in light thereof by persons skilled in the art are to be included within the spirit and purview of this application and the scope of the appended claims.
On the test data, the method achieves the highest AC (0.8470) and NMI (0.9401) among the compared methods in Table 3, indicating better community discovery capability and the ability to reveal more community information.
Claims (3)
1. An attribute network semi-supervised community discovery method based on non-negative matrix tri-factorization, characterized by comprising the following steps:
1) constructing a matrix decomposition model combining links and content, which comprises three parts (a topology matrix, a prior matrix, and a content matrix); describing the meaning of each variable in the model in detail; and optimizing the model by taking the partial derivative with respect to each unknown variable and setting it to 0 to obtain the update rule of each variable;
2) collecting and processing data, and extracting the required adjacency matrix, prior information, and content matrix from the attribute network;
3) randomly initializing the parameters and unknown variables, training by gradient descent with the update rules obtained in step 1), feeding the processed data set into the model of step 1), and iterating until the parameter updates converge.
2. The attribute network semi-supervised community discovery method based on non-negative matrix tri-factorization as recited in claim 1, wherein step 1) is specifically as follows:
(1) Constructing a non-negative matrix factorization model
A non-negative matrix tri-factorization model is constructed from the link information to describe the membership of nodes in communities and the interrelations between communities, and the model is expressed as follows:
(2) Semi-supervised model with embedded prior information
A must-link constraint is used as prior information to strengthen the representation of community structure;
two nodes that belong to the same community in the high-dimensional space should also be close to each other in the low-dimensional space; the Euclidean distance is used to measure the distance between the two nodes, and the semi-supervised community discovery model is constructed by combining the link information:
(3) Semi-supervised model introducing node popularity
Since heterogeneity of node degrees tends to increase the Euclidean distance between two nodes belonging to the same community, a node popularity matrix W is introduced to eliminate this influence, and the semi-supervised model with node popularity is defined as:
(4) Semi-supervised model combining links with content
To better combine links and content, the content is approximated in the same latent space that is used to approximate the links between nodes, a content matrix C is defined by the bag-of-words method, and the content matrix decomposition is defined as:
finally, a semi-supervised semantic community discovery model that combines links and content while taking both prior information and degree heterogeneity into account is constructed:
3. The attribute network semi-supervised community discovery method based on non-negative matrix tri-factorization as recited in claim 1, wherein the resulting parameters and variables, namely the membership between nodes and communities, the relationships between communities, and the semantic interpretation of the communities, are recorded in the corresponding files, and the experimental results are visualized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911033689.7A CN110851732A (en) | 2019-10-28 | 2019-10-28 | Attribute network semi-supervised community discovery method based on non-negative matrix three-factor decomposition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110851732A true CN110851732A (en) | 2020-02-28 |
Family
ID=69598731
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911033689.7A Pending CN110851732A (en) | 2019-10-28 | 2019-10-28 | Attribute network semi-supervised community discovery method based on non-negative matrix three-factor decomposition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110851732A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114118094A (en) * | 2021-11-12 | 2022-03-01 | 国网天津市电力公司 | Semantic community discovery method based on non-negative matrix factorization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200228 |