CN110851732A - Attribute network semi-supervised community discovery method based on non-negative matrix three-factor decomposition

Info

Publication number
CN110851732A
Authority
CN
China
Prior art keywords
model
matrix
supervised
semi
community
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911033689.7A
Other languages
Chinese (zh)
Inventor
金弟
何静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201911033689.7A
Publication of CN110851732A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The invention discloses a semi-supervised community discovery method for attribute networks based on non-negative matrix three-factor decomposition. First, a matrix decomposition model combining links and content is constructed. Then the model is optimized: the partial derivative with respect to each unknown variable in the model is computed and set to 0 to obtain the update rule for that variable. Next, data are collected and processed, and the required adjacency matrix, prior information and content matrix are extracted from the attribute network. Finally, the parameters and unknown variables are randomly initialized, the processed data set is fed into the model, and training proceeds by the gradient descent method with the update rules obtained above, iterating until the parameter updates converge.

Description

Attribute network semi-supervised community discovery method based on non-negative matrix three-factor decomposition
Technical Field
The invention belongs to the fields of machine learning, complex networks and natural language processing. It mainly concerns the fusion of network information, provides a method that reduces dimensionality and fuses information by means of non-negative matrix factorization, and particularly relates to a semi-supervised community discovery method for attribute networks based on non-negative matrix three-factor decomposition.
Background
With the development of the internet, online social networks generate more and more data, and these data carry both links and semantic content, for example user blogs and research papers. Such data are typically modeled as an attribute network, in which the links form the topology of the graph and the content is modeled as attributes of the nodes. Discovering the semantic communities of these networks is of great significance. For example, in a paper citation network, each node represents a paper, papers cite one another, and each paper has its own content. Determining the community to which each paper belongs from the links and the content helps researchers understand the frontier of the current research field. Therefore, how to integrate links and content in a network to determine a more accurate semantic community structure is a challenging and meaningful problem.
Many community discovery methods for attribute networks have been proposed. Depending on the type of data used for clustering, they can be classified into four categories: 1) topology-based methods, 2) attribute-based methods, 3) integration methods, and 4) model-based methods. The first type converts community discovery on an attribute network into graph clustering on a reconstructed network, in which the node attributes are also modeled as topological information. The second type converts community discovery on an attribute network into traditional vector data clustering, in which links and content are merged to compute the similarity between pairs of nodes. Integration methods combine the results of different clusterings. Model-based methods model links and content jointly through an NMF-based model or a probabilistic model. In particular, model-based methods can make full use of links and content and cast the clustering problem as a probabilistic inference process. Compared with the other methods they have a solid theoretical basis, so they are generally considered to perform well. However, when the network is very sparse and the community structure is too fuzzy, these methods often cannot accurately identify the community structure and its semantic information, because they rely on a single type of information. In addition, prior information and degree heterogeneity are two key factors that influence the network clustering result. In view of the limitations of current methods, the present invention studies a new method for fusing attribute network features based on non-negative matrix three-factor decomposition to overcome the defects of the above methods.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a model for effectively combining attribute network links and contents, so that more accurate semantic communities and the interrelation between the communities are obtained.
To achieve this purpose, the technical scheme adopted by the invention uses non-negative matrix three-factor decomposition to fuse the links and node content of the attribute network, assisted by prior information, so as to improve the performance of community discovery. The method comprises the following steps:
1) constructing a matrix decomposition model combining links and content, the model comprising three parts (topology, prior and content matrix), and describing the meaning of each variable in the model in detail. The specific steps are as follows:
(1) Construction of a non-negative matrix factorization model
A non-negative matrix three-factor decomposition model is constructed from the link information. It describes the membership of nodes in communities and the interrelation between communities, and is expressed as follows:
The meaning of each symbol in the formula is given in Table 1.
Table 1: Explanation of the symbols used in the matrix decomposition model
Figure BDA0002250851940000022
(2) Semi-supervised model with embedded prior information
The invention adopts must-link constraints as prior information to strengthen the representation of community structure. Two nodes that belong to the same community in the high-dimensional space should also be close to each other in the low-dimensional space. The Euclidean distance is used to measure the distance between two nodes, and combining it with the link information gives the semi-supervised community discovery model:
Figure BDA0002250851940000023
(3) Semi-supervised model introducing node popularity
Because degree heterogeneity tends to increase the Euclidean distance between two nodes that belong to the same community, a node popularity matrix W is introduced to eliminate this influence. The semi-supervised model with node popularity is defined as:
Figure BDA0002250851940000031
(4) Semi-supervised model combining links with content
To better combine links and content, the invention approximates the content with the same latent space that approximates the connections between nodes, and defines a content matrix C using the bag-of-words method. The content matrix decomposition is defined as:
Figure BDA0002250851940000032
Therefore, the invention finally constructs a semi-supervised semantic community discovery model that combines links and content while taking both prior information and degree heterogeneity into account:
Figure BDA0002250851940000033
2) optimizing the model: taking the partial derivative with respect to each unknown variable in the model and setting it to 0 to obtain the update rule for each variable;
3) collecting and processing data, and extracting the required adjacency matrix, prior information and content matrix from the attribute network;
4) randomly initializing the parameters and unknown variables, feeding the processed data set into the model of step 1, training by the gradient descent method with the update rules obtained in step 2, and iterating until the parameter updates converge;
5) recording the resulting parameters and variables, namely the membership between nodes and communities, the relationships between communities and the semantic interpretation of communities, into the relevant documents, and visualizing the experimental results.
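The formula images of step 1 are not reproduced in this text, so the exact objective is not available here. The following is a minimal sketch of steps 2 and 4 under an assumed, commonly used form of such a model: the adjacency matrix A is approximated by U S Uᵀ, must-link pairs M are penalized through the graph Laplacian term Tr(Uᵀ L U), and the content matrix C is approximated by U Vᵀ with the same U. The node popularity matrix W is left out. All names (`fit`, `objective`, `lam`, `alpha`, `lr`) are hypothetical, and non-negativity is kept by clipping after each gradient step, which is one simple choice rather than the update rule of the invention.

```python
import numpy as np

def objective(A, C, L, U, S, V, lam, alpha):
    """Assumed loss: link fit + must-link penalty + content fit."""
    link = np.linalg.norm(A - U @ S @ U.T) ** 2      # squared Frobenius norm
    prior = np.trace(U.T @ L @ U)                    # small when must-link rows of U are close
    content = np.linalg.norm(C - U @ V.T) ** 2
    return link + lam * prior + alpha * content

def fit(A, C, M, k, lam=0.5, alpha=0.5, lr=1e-3, iters=2000, tol=1e-8, seed=0):
    """Projected gradient descent on the assumed objective (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, m = C.shape
    L = np.diag(M.sum(axis=1)) - M                   # Laplacian of the must-link graph
    U = rng.random((n, k))                           # node-community membership
    S = rng.random((k, k))                           # community-community relation
    V = rng.random((m, k))                           # word-community weights
    history = [objective(A, C, L, U, S, V, lam, alpha)]
    for _ in range(iters):
        R = U @ S @ U.T - A                          # link residual
        Q = U @ V.T - C                              # content residual
        grad_U = 2 * (R @ U @ S.T + R.T @ U @ S) + 2 * lam * (L @ U) + 2 * alpha * (Q @ V)
        grad_S = 2 * (U.T @ R @ U)
        grad_V = 2 * alpha * (Q.T @ U)
        # Gradient step followed by clipping to keep every factor non-negative.
        U = np.maximum(U - lr * grad_U, 0.0)
        S = np.maximum(S - lr * grad_S, 0.0)
        V = np.maximum(V - lr * grad_V, 0.0)
        history.append(objective(A, C, L, U, S, V, lam, alpha))
        if abs(history[-2] - history[-1]) < tol:     # stop once the loss has converged
            break
    return U, S, V, history

# Toy attribute network: two triangles joined by the edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
# Bag-of-words counts over a 4-word vocabulary.
C = np.array([[2, 1, 0, 0], [1, 1, 0, 0], [1, 2, 0, 0],
              [0, 0, 1, 2], [0, 0, 2, 1], [0, 0, 1, 1]], dtype=float)
# Must-link prior: nodes 0 and 1 share a community, as do nodes 4 and 5.
M = np.zeros((6, 6))
for i, j in [(0, 1), (4, 5)]:
    M[i, j] = M[j, i] = 1.0

U, S, V, history = fit(A, C, M, k=2)
```

In this sketch the rows of `U` play the role of node memberships, `S` the relationships between communities, and the columns of `V` the word weights used for semantic interpretation.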
Advantageous effects:
1. By introducing prior information into the attribute network and taking the heterogeneity of node degrees into account, the community structure is strengthened and the community discovery capability is improved.
2. The method combines the link information and the content information in the same low-dimensional latent space. This effectively addresses the poor community discovery results and poor semantic interpretation that conventional community models suffer from because of fuzzy semantics (such as word ambiguity), and makes the method more interpretable. Gradient descent makes the updates of variables and parameters simple and fast with a short convergence time, so the method can be applied to large-scale networks.
3. The invention adopts non-negative matrix three-factor decomposition, which yields not only the membership between nodes and communities in the network but also the relationships between communities, and at the same time explains the semantic information of each community.
Characteristics:
a. besides traditional link information, the prior information and content information in the network are combined effectively and fully;
b. a more interpretable model is established through non-negative matrix factorization;
c. the update rules are simple and fast;
d. the method scales well.
Description of the drawings:
FIG. 1 is a diagram of a model framework established by the method of the present invention.
Detailed Description
The invention is further illustrated below by means of a specific example. The example is intended to help those skilled in the art understand the invention better and does not limit the invention in any way.
To better address network sparsity and semantic ambiguity, and to make full use of the information in the network to discover more about its communities, the invention uses matrix decomposition to establish a highly interpretable model that effectively combines network links and content, and adopts the gradient descent method so that the method is easy to understand and runs fast. With the trained model, the user can obtain more information about the communities in the network (node membership, relationships between communities and community semantic information). The method can be widely applied in fields such as text classification and clustering, commodity recommendation and information retrieval.
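As an illustration of the three kinds of community information mentioned above, the sketch below reads them out of already trained factor matrices. The matrices `U` (nodes by communities), `S` (communities by communities) and `V` (words by communities), the vocabulary, and the helper names are all hypothetical values chosen for the example; they are not results from the invention.

```python
# Hypothetical trained factors for 4 nodes, 2 communities and 4 words.
U = [[0.9, 0.1], [0.8, 0.0], [0.2, 0.7], [0.1, 0.9]]   # node-community membership
S = [[1.0, 0.2], [0.2, 1.1]]                           # community-community relation
V = [[0.9, 0.0], [0.7, 0.1], [0.0, 0.8], [0.1, 0.6]]   # word-community weights
vocab = ["neural", "network", "genetic", "algorithm"]

def hard_membership(U):
    """Assign each node to the community with the largest membership weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in U]

def top_words(V, vocab, community, n_top=2):
    """Return the words with the largest weight for one community."""
    ranked = sorted(range(len(vocab)), key=lambda w: V[w][community], reverse=True)
    return [vocab[w] for w in ranked[:n_top]]

membership = hard_membership(U)          # node membership
inter_community = S[0][1]                # strength of the relation between communities 0 and 1
words_0 = top_words(V, vocab, 0)         # semantic description of community 0
words_1 = top_words(V, vocab, 1)         # semantic description of community 1
```

Taking the largest entry per row gives non-overlapping communities; keeping the full rows of `U` instead would give soft memberships.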
The modeling process established by the invention (that is, the required information and corresponding matrices, the combination process, and the effects that can be produced) is shown in FIG. 1.
The technical scheme adopted by the invention integrates the links and node content of the attribute network and adds auxiliary prior information to improve the performance of community discovery. It comprises the following steps:
1. constructing a matrix decomposition model combining links and content, the model comprising three parts (topology, prior and content matrix), and describing the meaning of each variable in the model in detail;
2. optimizing the model: taking the partial derivative with respect to each unknown variable in the model and setting it to 0 to obtain the update rule for each variable;
3. collecting and processing data, taking the Cora data in Table 2 as an example, and extracting the required adjacency matrix, prior information and content matrix from the attribute network;
Table 2: Detailed description of the test data
Figure BDA0002250851940000041
Figure BDA0002250851940000051
4. randomly initializing the parameters and unknown variables, feeding the processed data set into the model of step 1, training by the gradient descent method with the update rules obtained in step 2, and iterating until the parameter updates converge;
5. recording the resulting parameters and variables, namely the membership between nodes and communities, the relationships between communities and the semantic interpretation of communities, into the relevant documents, and visualizing the experimental results.
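The data preparation of step 3 can be sketched as follows. This is a minimal illustration on a toy network, not the Cora preprocessing itself: the edge list, the partially known labels and the documents are made up, links are assumed to be undirected, and the must-link matrix is assumed to connect every pair of nodes whose known labels agree. The function names are hypothetical.

```python
from itertools import combinations

def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix from an undirected edge list."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    return A

def must_link(n, known_labels):
    """Must-link matrix: 1 for every pair of nodes with the same known label."""
    M = [[0] * n for _ in range(n)]
    for i, j in combinations(sorted(known_labels), 2):
        if known_labels[i] == known_labels[j]:
            M[i][j] = M[j][i] = 1
    return M

def bag_of_words(docs):
    """Node-by-word count matrix and the sorted vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: c for c, w in enumerate(vocab)}
    C = [[0] * len(vocab) for _ in docs]
    for r, d in enumerate(docs):
        for w in d.lower().split():
            C[r][index[w]] += 1
    return C, vocab

# Toy attribute network with 4 nodes; only nodes 0, 1 and 3 have a known label.
edges = [(0, 1), (1, 2), (2, 3)]
known_labels = {0: "a", 1: "a", 3: "b"}
docs = ["neural network", "neural network training",
        "genetic algorithm", "genetic algorithm search"]

A = adjacency(4, edges)
M = must_link(4, known_labels)
C, vocab = bag_of_words(docs)
```

Only a small fraction of labels would normally be used to build `M`, since the method is semi-supervised.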
Table 3: Classification accuracy (AC) and normalized mutual information (NMI) of the invention compared with other community discovery methods
Method    SSNMF     FSSNMF    PSSNMF    WSCDSM    The invention
AC        0.3027    0.6012    0.8224    0.7216    0.8470
NMI       0.4270    0.7615    0.9062    0.7531    0.9401
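For readers unfamiliar with the two measures in Table 3, the sketch below shows one common way to compute them. It is an assumption that the reported values follow these definitions: accuracy is taken under the best one-to-one mapping between predicted and true labels (found here by brute force, which is only practical for a few communities), and NMI is normalized by the mean of the two entropies. The toy labels are made up and are not the experimental data.

```python
from collections import Counter
from itertools import permutations
from math import log

def accuracy(true, pred):
    """Fraction of nodes correctly labeled under the best label mapping."""
    true_labels = sorted(set(true))
    pred_labels = sorted(set(pred))
    # Pad with None so extra predicted communities can stay unmatched.
    candidates = true_labels + [None] * max(0, len(pred_labels) - len(true_labels))
    best = 0
    for perm in permutations(candidates, len(pred_labels)):
        mapping = dict(zip(pred_labels, perm))
        hits = sum(1 for t, p in zip(true, pred) if mapping[p] == t)
        best = max(best, hits)
    return best / len(true)

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def nmi(true, pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(true)
    joint = Counter(zip(true, pred))
    ct, cp = Counter(true), Counter(pred)
    mi = sum(c / n * log(c * n / (ct[t] * cp[p])) for (t, p), c in joint.items())
    denom = entropy(true) + entropy(pred)
    return 2 * mi / denom if denom > 0 else 1.0

true = [0, 0, 1, 1, 2, 2]
perfect = [1, 1, 0, 0, 2, 2]      # same partition, different label names
noisy = [1, 1, 0, 0, 2, 0]        # one node placed in the wrong community

ac_perfect, nmi_perfect = accuracy(true, perfect), nmi(true, perfect)
ac_noisy, nmi_noisy = accuracy(true, noisy), nmi(true, noisy)
```

Both measures are invariant to how the communities are numbered, which is why the relabeled partition `perfect` scores the same as the ground truth.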
The invention provides an information fusion method based on non-negative matrix three-factor decomposition. Its main contributions cover three aspects: 1. the method combines several kinds of information (link information, prior information and content information) and thereby uses the information in the network effectively and fully; 2. the method is based on non-negative matrix three-factor decomposition, which makes the model more interpretable and yields more information about the communities; 3. the method addresses network sparsity and semantic ambiguity more simply and quickly.
The experimental results on the sample data are shown in Table 3. It should be understood that the embodiments and examples discussed herein are illustrative only, and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and the scope of the appended claims.
The method performs well both in improving the community discovery capability and in uncovering more community information.

Claims (3)

1. An attribute network semi-supervised community discovery method based on non-negative matrix three-factor decomposition, characterized by comprising the following steps:
1) constructing a matrix decomposition model combining links and content, the model comprising three parts, namely a topology matrix, a prior matrix and a content matrix; describing the meaning of each variable in the model in detail; and optimizing the model by taking the partial derivative with respect to each unknown variable in the model and setting it to 0 to obtain an update rule for each variable;
2) collecting and processing data, and extracting the required adjacency matrix, prior information and content matrix from the attribute network;
3) randomly initializing the parameters and unknown variables, training by the gradient descent method with the update rules for the unknown variables obtained in step 2), feeding the processed data set into the model of step 1) for training, and iterating until the parameter updates converge.
2. The attribute network semi-supervised community discovery method based on non-negative matrix three-factor decomposition according to claim 1, wherein step 1) is specifically as follows:
(1) Construction of a non-negative matrix factorization model
A non-negative matrix three-factor decomposition model is constructed from the link information; it describes the membership of nodes in communities and the interrelation between communities, and is expressed as follows:
Figure FDA0002250851930000011
(2) Semi-supervised model with embedded prior information
The invention adopts must-link constraints as prior information to strengthen the representation of community structure;
two nodes that belong to the same community in the high-dimensional space should also be close to each other in the low-dimensional space; the Euclidean distance is used to measure the distance between two nodes, and combining it with the link information gives the semi-supervised community discovery model:
Figure FDA0002250851930000012
(3) Semi-supervised model introducing node popularity
Because degree heterogeneity tends to increase the Euclidean distance between two nodes that belong to the same community, a node popularity matrix W is introduced to eliminate this influence, and the semi-supervised model with node popularity is defined as:
Figure FDA0002250851930000013
(4) Semi-supervised model combining links with content
To better combine links and content, the invention approximates the content with the same latent space that approximates the connections between nodes, and defines a content matrix C using the bag-of-words method; the content matrix decomposition is defined as:
Figure FDA0002250851930000021
Finally, a semi-supervised semantic community discovery model is constructed that combines links and content while taking both prior information and degree heterogeneity into account:
Figure FDA0002250851930000022
3. The attribute network semi-supervised community discovery method based on non-negative matrix three-factor decomposition according to claim 1, wherein the resulting parameters and variables, namely the membership between nodes and communities, the relationships between communities and the semantic interpretation of communities, are recorded into the relevant documents, and the experimental results are visualized.
CN201911033689.7A 2019-10-28 2019-10-28 Attribute network semi-supervised community discovery method based on non-negative matrix three-factor decomposition Pending CN110851732A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911033689.7A CN110851732A (en) 2019-10-28 2019-10-28 Attribute network semi-supervised community discovery method based on non-negative matrix three-factor decomposition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911033689.7A CN110851732A (en) 2019-10-28 2019-10-28 Attribute network semi-supervised community discovery method based on non-negative matrix three-factor decomposition

Publications (1)

Publication Number Publication Date
CN110851732A true CN110851732A (en) 2020-02-28

Family

ID=69598731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911033689.7A Pending CN110851732A (en) 2019-10-28 2019-10-28 Attribute network semi-supervised community discovery method based on non-negative matrix three-factor decomposition

Country Status (1)

Country Link
CN (1) CN110851732A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118094A (en) * 2021-11-12 2022-03-01 国网天津市电力公司 Semantic community discovery method based on non-negative matrix factorization

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104035978A (en) * 2014-05-26 2014-09-10 南京泰锐斯通信科技有限公司 Association discovering method and system
US20180165554A1 (en) * 2016-12-09 2018-06-14 The Research Foundation For The State University Of New York Semisupervised autoencoder for sentiment analysis
CN108334580A (en) * 2018-01-25 2018-07-27 重庆邮电大学 A kind of community discovery method of combination link and attribute information
CN109165743A (en) * 2018-07-17 2019-01-08 东南大学 A kind of semi-supervised network representation learning algorithm based on depth-compression self-encoding encoder
CN109299464A (en) * 2018-10-12 2019-02-01 天津大学 Based on the insertion of the theme of network linking and document content, document representing method
CN109583562A (en) * 2017-09-28 2019-04-05 西门子股份公司 SGCNN: the convolutional neural networks based on figure of structure
CN110264372A (en) * 2019-05-16 2019-09-20 西安交通大学 A kind of theme Combo discovering method indicated based on node

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
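Luo Guohua, Jin Di, et al.: "A large-scale community discovery method based on a Gaussian mixture model combining topology and content", Journal of Chinese Computer Systems *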

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200228