CN113869404B - Adaptive graph convolution clustering method for paper network data - Google Patents
Adaptive graph convolution clustering method for paper network data
- Publication number: CN113869404B (application CN202111136030.1A)
- Authority
- CN
- China
- Prior art keywords: data, graph, self, representation, module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F18/23 — Pattern recognition; analysing; clustering techniques
- G06F18/22 — Pattern recognition; analysing; matching criteria, e.g. proximity measures
- G06N3/045 — Neural networks; architecture; combinations of networks
- G06N3/088 — Neural network learning methods; non-supervised learning, e.g. competitive learning
Abstract
The adaptive graph convolution clustering method for paper network data belongs to the field of data mining. The method first applies an adaptive graph convolution network to the deep graph convolution clustering task, adaptively updating the graph structure while learning an optimal data representation. Second, the method introduces a fusion module based on an attention mechanism, which fuses the data representations of two parallel networks layer by layer in a weighted manner while effectively relieving the over-smoothing problem of graph convolution networks. The method mainly solves the technical problems of mining the internal structure among samples, ensuring that the model captures more complete structural information, avoiding the negative influence of an inaccurate graph structure on clustering performance, and effectively fusing heterogeneous information.
Description
Technical Field
The method is applicable to fields such as data mining, machine learning and pattern recognition, and is particularly suited to clustering tasks on paper networks containing noise and outliers.
Background
With the development of social media, large numbers of images, videos and microblog posts circulate widely on the Internet, but most of these data are unlabeled, which makes data-driven classification tasks difficult. The underlying structural properties among the data can provide more discriminative information, which motivates the development of deep graph convolution clustering.
Wang Chun et al. proposed an end-to-end graph attention auto-encoder clustering model that effectively fuses the attribute and structure information of the data while using a self-supervision mechanism to guide the optimization of the network. Pan Shirui et al. proposed an adversarially regularized graph convolution auto-encoder that reconstructs the original data and graph structure, with adversarial training enhancing the robustness of the data representation. However, these graph embedding networks suffer from the over-smoothing problem, which compromises clustering performance. Bo Deyu et al. designed a transfer operator that delivers the data representation learned by the auto-encoder module to the corresponding graph convolution layer, while using a self-supervision mechanism to unify the two different deep neural architectures.
Existing graph-convolution-based clustering methods depend heavily on the quality of the initial graph structure, which is kept fixed during model optimization. In practice, however, the graph structure contains noise and outliers and can hardly describe the connections between data accurately, which degrades clustering performance. Moreover, these methods do not effectively fuse the attribute and structure information of the data.
To solve these problems, a paper clustering method based on a graph convolution network is proposed: an adaptive graph replaces the fixed graph to capture more complete structural information during model optimization, and a fusion module based on an attention mechanism is designed to extract more of the key discriminative information while effectively avoiding the over-smoothing problem of graph convolution networks.
To address the difficulty existing deep graph convolution clustering methods have with paper network data containing noise, the invention provides a paper clustering method based on a graph convolution network. The method first applies an adaptive graph convolution network to the deep graph convolution clustering task, adaptively updating the graph structure while learning an optimal data representation. Second, the method introduces a fusion module based on an attention mechanism, which fuses the data representations of two parallel networks layer by layer in a weighted manner while effectively relieving the over-smoothing problem of graph convolution networks. The method mainly solves the technical problems of mining the internal structure among samples, ensuring that the model captures more complete structural information, and effectively fusing heterogeneous information.
Disclosure of Invention
The adaptive graph convolution clustering method for paper network data overcomes the shortcomings of existing deep clustering methods. An adaptive graph convolution network is proposed in which an adaptive graph structure replaces the fixed graph structure during graph convolution, helping the model mine more complete internal structural information and avoiding the negative influence of an inaccurate graph structure on clustering performance. A fusion module based on an attention mechanism is proposed, which selectively weights heterogeneous information to extract key information, effectively relieving the over-smoothing problem of graph convolution networks. Fig. 1 shows the overall framework of the proposed method.
The invention is realized by the following technical scheme:
(1) Attribute information is first extracted from the input data using a self-encoder,
H^(l) = σ(W^(l)H^(l-1) + b^(l)), l = 1, 2, …, L
where H^(l) denotes the data representation learned by layer l of the self-encoder, W^(l) and b^(l) denote the learnable weight matrix and bias of layer l, L denotes the number of network layers of the model, and σ(·) denotes the nonlinear activation function; here ReLU is selected as the activation function.
At the same time, to preserve the characteristics of the original data as much as possible, the reconstruction error between the reconstructed data X̂ and the original input data X, measured by the squared Frobenius norm and averaged over the samples, is minimized, where X represents the bag-of-words features of the keywords of the samples in the dataset, N is the number of samples, and the Frobenius norm is defined as ‖M‖_F = (Σ_i Σ_j m_ij²)^(1/2).
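As an illustration, step (1) can be sketched in NumPy. This is a minimal sketch, not the patent's implementation: the layer sizes, weight initialization, and the row-major layout (H W rather than W H) are assumptions made for the toy example.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def encoder_forward(X, weights, biases):
    """Stack of fully connected layers: H^(l) = ReLU(H^(l-1) W^(l) + b^(l))."""
    H = X
    for W, b in zip(weights, biases):
        H = relu(H @ W + b)
    return H

def reconstruction_loss(X, X_hat):
    """Squared Frobenius norm of (X - X_hat), averaged over the N samples."""
    N = X.shape[0]
    return np.sum((X - X_hat) ** 2) / (2 * N)

# Toy data: 6 samples with 4-dimensional bag-of-words features.
X = rng.random((6, 4))
W1 = rng.standard_normal((4, 3)) * 0.1   # encoder layer
b1 = np.zeros(3)
W2 = rng.standard_normal((3, 4)) * 0.1   # decoder layer mirroring the encoder
b2 = np.zeros(4)

H = encoder_forward(X, [W1], [b1])       # latent attribute representation
X_hat = relu(H @ W2 + b2)                # reconstructed data
loss = reconstruction_loss(X, X_hat)
```

In a trained model the weights would of course be learned by minimizing this loss; here they are random, so only the shapes and the non-negativity of the loss are meaningful.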
(2) High-level structural information of the data is captured by an adaptive graph convolution module.
Z^(l+1) = σ(A^(l+1)F^(l)U^(l+1)), l = 1, 2, …, L
where U^(l+1) denotes the learnable weight matrix of layer l+1 of the adaptive graph convolution module, Z^(l+1) is the node representation updated by layer l+1 of the module, A^(l+1) is the learned adaptive graph structure, which more accurately reflects the intrinsic structure between samples, and F^(l) is the fused representation obtained from the attention-based fusion module.
Specifically, an adjacency matrix Ã^(l+1) is constructed by computing the inner product of the fused representation F^(l), mining the potential similarities between samples. The learned adaptive graph Ã^(l+1) is then added to the original graph Â with balance coefficient ε, enhancing the quality of the initial graph structure; in the invention ε is set to 0.5.
Finally, so that the learned intermediate-layer data representation Z^(L/2) better reflects the dependencies between the data, the reconstruction error ‖A − Ā‖_F² between the reconstructed structure Ā and the original input graph structure A is minimized, where Ā is the adjacency matrix constructed from the inner product of the data representation Z^(L) of the last layer of the adaptive graph convolution module.
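The adaptive graph update of step (2) can be sketched as follows. The patent does not state how the inner-product similarity matrix is normalized, so the row-wise softmax used here is an assumption, as is the identity matrix standing in for the original graph Â in the toy example.

```python
import numpy as np

rng = np.random.default_rng(1)

def row_softmax(M):
    e = np.exp(M - M.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def adaptive_graph_update(F, A_orig, eps=0.5):
    """Build a similarity graph from inner products of the fused
    representation F, then blend it with the original graph A_orig
    using the balance coefficient eps (blend form is an assumption)."""
    A_learned = row_softmax(F @ F.T)       # normalized inner-product similarities
    return eps * A_orig + (1.0 - eps) * A_learned

def graph_conv_layer(A, F, U):
    """One adaptive graph convolution layer: Z^(l+1) = ReLU(A^(l+1) F^(l) U^(l+1))."""
    return np.maximum(A @ F @ U, 0.0)

N, d, d_out = 5, 3, 2
F = rng.random((N, d))                     # fused representation from the fusion module
A_orig = np.eye(N)                         # stand-in for the input paper graph
U = rng.standard_normal((d, d_out)) * 0.1  # learnable weight matrix

A_new = adaptive_graph_update(F, A_orig, eps=0.5)
Z = graph_conv_layer(A_new, F, U)
```

Because both the original (identity) graph and the softmax-normalized similarity graph are row-stochastic here, the blended graph's rows also sum to one, which keeps the convolution an averaging operation over neighbors.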
(3) A fusion module based on an attention mechanism is proposed to efficiently fuse the data representations extracted by the self-encoder module and the adaptive graph convolution module. Specifically, for the l-th layer of the network, the data representations H^(l) and Z^(l), learned by the self-encoding module and the adaptive graph convolution module respectively, are concatenated,
Y^(l) = [H^(l), Z^(l)] (5)
where [·,·] denotes the concatenation operation.
Based on the concatenated features Y^(l), different weights are assigned to H^(l) and Z^(l) according to their relative importance, finally yielding the fused representation F^(l),
a = f(Y^(l))
e = softmax(sigmoid(a)/τ)
W = mean(e)
F^(l) = W_1·Z^(l) + W_2·H^(l)
where W_1 is the weight coefficient assigned to Z^(l), W_2 is the weight coefficient assigned to H^(l), f(·) is a network consisting of three fully connected layers, and τ is a calibration coefficient, set to 10 in the invention; the sigmoid(·) function acts together with the calibration coefficient to avoid assigning a score close to 1 to the most relevant data representation.
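The fusion equations above can be sketched as a short NumPy function. A single linear map stands in for the three-fully-connected-layer network f(·), and taking the softmax over the two per-sample scores before averaging is a reading of the equations, not a confirmed detail of the patent.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def row_softmax(M):
    e = np.exp(M - M.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def attention_fusion(H, Z, f, tau=10.0):
    """Fuse H^(l) and Z^(l) with attention weights W = mean(e)."""
    Y = np.concatenate([H, Z], axis=1)     # Y^(l) = [H^(l), Z^(l)]
    a = f(Y)                               # two importance scores per sample
    e = row_softmax(sigmoid(a) / tau)      # calibrated attention scores
    w = e.mean(axis=0)                     # global weights (W_1, W_2)
    return w[0] * Z + w[1] * H, w

N, d = 4, 3
H = rng.random((N, d))                     # self-encoder representation
Z = rng.random((N, d))                     # graph convolution representation
M = rng.standard_normal((2 * d, 2)) * 0.1  # stand-in for f(.), assumed linear here
f = lambda Y: Y @ M

F, w = attention_fusion(H, Z, f)
```

Dividing by τ = 10 flattens the sigmoid outputs before the softmax, so neither branch receives a weight near 1 — exactly the calibration effect the text describes.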
(4) The self-supervised clustering module is referenced to train the end-to-end model.
where q_ij denotes the probability of assigning the i-th sample to the j-th cluster in the feature representation H^(L/2) learned by the self-encoder; the target distribution p_ij is obtained by amplifying q_ij and normalizing it; and t_ij denotes the probability of assigning the i-th sample to the j-th cluster in the intermediate-layer feature representation Z^(L/2) learned by the adaptive graph convolution module.
Finally, the proposed overall objective function is a weighted sum of the above losses,
where λ_1, λ_2 and λ_3 are hyper-parameters balancing the importance of the different losses, set to 1.0, 0.01 and 0.1, respectively.
The weights and biases in the model, including W^(l), b^(l) and U^(l+1), are randomly initialized, and the model is solved by minimizing the loss function to learn the weight and bias parameters. When the number of training iterations reaches 700, or the value of the loss function fluctuates within ±1%, the optimal data representation Z^(L/2) is obtained and fed into the softmax function to obtain the final clustering result C*,
C* = softmax(Z^(L/2)).
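The self-supervised clustering machinery of step (4) can be sketched as follows. The Student's t-kernel soft assignment and the square-and-normalize target distribution are the usual choices in deep clustering (as in DEC/SDCN, on which this line of work builds); the patent's exact formulas are not shown in this extraction, so they are assumptions here.

```python
import numpy as np

rng = np.random.default_rng(3)

def soft_assignment(Z, centers):
    """q_ij: Student's t-kernel similarity between embeddings and
    cluster centers, normalized per sample (assumed kernel)."""
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = 1.0 / (1.0 + d2)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """p_ij: amplify q_ij by squaring and normalizing by cluster frequency."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q):
    """KL(P || Q), the self-supervised clustering loss."""
    return np.sum(p * np.log(p / q))

Z = rng.random((8, 2))                 # toy intermediate-layer embeddings Z^(L/2)
centers = rng.random((3, 2))           # toy cluster centers
q = soft_assignment(Z, centers)
p = target_distribution(q)
loss = kl_divergence(p, q)
labels = q.argmax(axis=1)              # hard cluster readout, analogous to C*
```

Minimizing KL(P || Q) pulls the soft assignments toward the sharpened target distribution, which is what lets the module supervise training without labels.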
ACC, NMI, ARI and F1 are chosen as standard evaluation metrics; higher values indicate better performance.
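Of these metrics, clustering accuracy (ACC) needs an extra step that the others do not: predicted cluster IDs are arbitrary, so they must first be matched one-to-one to ground-truth labels. A standard sketch using the Hungarian algorithm (this matching procedure is the conventional definition of ACC, not something stated in the patent):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """ACC: fraction of correctly assigned samples under the best
    one-to-one mapping between predicted clusters and true labels,
    found with the Hungarian algorithm."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    # Contingency table: count[p, t] = samples in predicted cluster p with true label t.
    count = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1
    # Maximize matched counts by minimizing (max - count).
    row, col = linear_sum_assignment(count.max() - count)
    return count[row, col].sum() / y_true.size

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]   # same partition as y_true, labels permuted
acc = clustering_accuracy(y_true, y_pred)   # perfect partition -> 1.0
```

NMI and ARI are permutation-invariant by construction, so they can be computed directly (e.g. with scikit-learn's `normalized_mutual_info_score` and `adjusted_rand_score`) without this matching step.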
Drawings
Fig. 1 is a frame diagram of the present invention.
Detailed Description
Experiments demonstrate that the above method achieves a significant effect.
The method evaluates on six public datasets, including USPS, HHAR, REUT, DBLP, ACM and CITE datasets.
To verify the superiority of its clustering performance, the proposed graph-convolution-based paper clustering method (AGCC) is compared with several existing state-of-the-art clustering methods: K-means, AE, IDEC, GAE, DAEGC and SDCN.
The clustering results shown in Table 1 indicate that, in most cases, the clustering performance of the proposed adaptive graph convolution clustering method for paper network data is significantly better than that of the other comparison methods.
On the paper datasets ACM and CITE, which directly provide graph structures, the clustering performance of DAEGC is superior to IDEC. In contrast, on the datasets USPS, HHAR and REUT, where the initial graph structure is built by the K-nearest-neighbor method, GAE and DAEGC perform worse than AE and IDEC. It is believed that a graph constructed by K-nearest neighbors does not accurately describe the relationships between the data, leading to the poor clustering performance of GAE and DAEGC. A superior adaptive graph learning method is therefore necessary.
On the CITE and ACM datasets, the method improves greatly over the most important baseline, SDCN. SDCN uses a fixed graph structure during graph convolution, but the structural information between samples contains noise and outliers, which negatively impacts clustering performance. The continuously updated graph structure in the proposed method reflects the similarity between samples more accurately, thereby enhancing the performance of the graph convolution network. Furthermore, AGCC introduces a fusion module based on an attention mechanism that fully fuses the attribute and structure information of the data; these heterogeneous sources of information complement each other in learning an effective feature representation, resulting in a significant improvement in clustering performance, while also effectively relieving the over-smoothing problem of the graph convolution network.
Table 1: clustering performance contrast on six datasets
Claims (1)
1. An adaptive graph convolution clustering method for paper network data, characterized by comprising the following steps:
(1) Attribute information is first extracted from the input data using a self-encoder,
H^(l) = σ(W^(l)H^(l-1) + b^(l)), l = 1, 2, …, L
wherein H^(l) denotes the data representation learned by layer l of the self-encoder, W^(l) and b^(l) denote the learnable weight matrix and bias of layer l, L denotes the number of network layers of the model, σ(·) denotes the nonlinear activation function, and ReLU is selected as the activation function;
at the same time, to preserve the characteristics of the original data as much as possible, the reconstruction error between the reconstructed data X̂ and the original input data X, measured by the squared Frobenius norm and averaged over the samples, is minimized, wherein X represents the bag-of-words features of the keywords of the samples in the dataset, N is the number of samples, and the Frobenius norm is defined as ‖M‖_F = (Σ_i Σ_j m_ij²)^(1/2);
(2) Capturing high-order structural information of data through an adaptive graph convolution module;
Z^(l+1) = σ(A^(l+1)F^(l)U^(l+1)), l = 1, 2, …, L
wherein U^(l+1) denotes the learnable weight matrix of layer l+1 of the adaptive graph convolution module, Z^(l+1) is the node representation updated by layer l+1 of the module, A^(l+1) is the learned adaptive graph structure, and F^(l) is the fused representation obtained from the attention-based fusion module;
specifically, an adjacency matrix Ã^(l+1) is constructed by computing the inner product of the fused representation F^(l), mining the potential similarities between samples; the learned adaptive graph Ã^(l+1) is then added to the normalized original graph structure Â with balance coefficient ε, enhancing the quality of the original graph structure, wherein ε is set to 0.5;
finally, so that the learned intermediate-layer data representation Z^(L/2) better reflects the dependencies between the data, the reconstruction error ‖A − Ā‖_F² between the reconstructed structure Ā and the original graph structure A is minimized, wherein Ā is the adjacency matrix constructed from the inner product of the data representation Z^(L) of the last layer of the adaptive graph convolution module;
(3) A fusion module based on an attention mechanism is provided to efficiently fuse the data representations extracted by the self-encoder module and the adaptive graph convolution module; specifically, for the l-th layer of the network, the data representations H^(l) and Z^(l), learned by the self-encoding module and the adaptive graph convolution module respectively, are concatenated,
Y^(l) = [H^(l), Z^(l)],
wherein [·,·] denotes the concatenation operation;
based on the concatenated features Y^(l), different weights are assigned to H^(l) and Z^(l) according to their relative importance, finally yielding the fused representation F^(l),
a = f(Y^(l))
e = softmax(sigmoid(a)/τ)
W = mean(e)
F^(l) = W_1·Z^(l) + W_2·H^(l),
wherein W_1 is the weight coefficient assigned to Z^(l), W_2 is the weight coefficient assigned to H^(l), f(·) is a network consisting of three fully connected layers, τ is a calibration coefficient, and τ is set to 10;
(4) Training an end-to-end model by referring to a self-supervision clustering module;
wherein q_ij denotes the probability of assigning the i-th sample to the j-th cluster in the intermediate-layer feature representation H^(L/2) learned by the self-encoder; the target distribution p_ij is obtained by amplifying q_ij and normalizing it; and t_ij denotes the probability of assigning the i-th sample to the j-th cluster in the intermediate-layer feature representation Z^(L/2) learned by the adaptive graph convolution module;
finally, the proposed overall objective function is a weighted sum of the above losses,
wherein λ_1, λ_2 and λ_3 are hyper-parameters balancing the importance of the different losses, set to 1.0, 0.01 and 0.1, respectively;
the weights and biases in the model, including W^(l), b^(l) and U^(l+1), are randomly initialized, and the model is solved by minimizing the loss function to learn the weight and bias parameters; when the number of training iterations reaches 700, or the value of the loss function fluctuates within ±1%, the optimal data representation Z^(L/2) is obtained and fed into the softmax function to obtain the final clustering result C*,
C* = softmax(Z^(L/2)).
Priority Applications (1)
- CN202111136030.1A — CN113869404B (en), priority/filing date 2021-09-27 — Adaptive graph convolution clustering method for paper network data
Publications (2)
- CN113869404A — published 2021-12-31
- CN113869404B — granted 2024-05-28
Family ID: 78991234 (CN, status: Active)
Families Citing this family (3)
- CN114781553B (2022-06-20) — Binjiang Research Institute, Zhejiang University — Unsupervised patent clustering method based on parallel multi-graph convolution neural network
- CN114861072B (2022-07-05) — China Zheshang Bank Co., Ltd. — Graph convolution network recommendation method and device based on interlayer combination mechanism
- CN115114411B (2022-08-30) — Institute of Automation, Chinese Academy of Sciences — Prediction method and device based on knowledge graph, and electronic equipment
Citations (2)
- CN113128600A (2021-04-23) — Hubei Luojia Huanchuang Technology Co., Ltd. — Structured deep incomplete multi-view clustering method
- CN113157957A (2021-03-05) — Beijing University of Technology — Attribute graph document clustering method based on graph convolution neural network
Family Cites Families (1)
- CN111047182B (2019-12-10) — Beihang University — Airspace complexity evaluation method based on deep unsupervised learning
Non-Patent Citations (1)
- Jiang Zongli et al., "Heterogeneous network representation learning based on fused meta-path graph convolution," Computer Science, no. 7, 2020-04-08, pp. 231-235.
Legal Events
- PB01 — Publication
- SE01 — Entry into force of request for substantive examination
- GR01 — Patent grant