CN113870041A - Microblog topic detection method based on message passing and graph prior distribution

Info

Publication number: CN113870041A
Application number: CN202111052898.3A
Authority: CN
Inventors: 贺瑞芳; 王浩成; 刘焕宇
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2021-09-07
Filing date: 2021-09-07
Publication date: 2021-12-31
Anticipated expiration: 2041-09-07

Abstract

The invention discloses a microblog topic detection method based on message passing and graph prior distribution, comprising the following steps: (1) constructing a user-level social network from the interaction relationships among users in a microblog corpus; (2) message-passing-based user node embedded representation: a graph convolutional network integrates the content information and structural information of posts in social media and embeds the interaction relationships among users into the user node embedded representations; (3) topic generation based on a graph-prior variational auto-encoder: the user node embedded representations, which already integrate the user interaction relationships, are used as input, the standard Gaussian prior distribution of the variational auto-encoder is replaced by a graph prior distribution containing the user interactions, and the correlation among users is taken into account during topic inference. Overall, user interactions are integrated in two stages: user node embedded representation and topic inference. Topics detected by the method pay better attention to the correlation among users and achieve higher coherence.

Description

Microblog topic detection method based on message passing and graph prior distribution

Technical Field

The invention relates to the technical field of natural language processing and social media data mining, in particular to a microblog topic detection method based on message passing and graph prior distribution.

Background

The rapid development of the Internet has brought great progress to our lives. The popularity of social media gives everyone a platform for posting opinions and views. As a result, a large amount of short text is generated every day. Analyzing the topics in these short texts is an important task, but doing so manually is time-consuming and labor-intensive. A topic model can automatically extract the document-topic distribution and the topic-word distribution, helping people analyze texts and grasp their information quickly.

Traditional topic models, such as LDA, are widely used to discover latent topics in a text corpus. Essentially, these methods reveal underlying topics by implicitly capturing word co-occurrence patterns. However, they face a severe data sparsity problem (i.e., sparse post-level word co-occurrence patterns) when applied to short posts.

To address this problem, several lines of work have been successful: (1) Aggregation-based methods: some studies aggregate multiple posts using heuristic strategies, including aggregation based on author relationships and aggregation based on conversation relationships; other methods, such as BTM, directly model the generation process of biterms (i.e., word pairs). (2) Representation-learning-based methods: some methods reveal topics by modeling co-occurrence patterns of latent concepts, and others effectively fuse the context information of words. (3) Social-context-based methods: these methods jointly model textual information and social network structure information, for example by modeling the social network structure and dividing messages into leader messages and follower messages. However, standard methods for learning probabilistic generative models, such as variational techniques and Gibbs sampling, have high computational complexity in posterior inference, which prevents them from being applied to complex social media scenarios.

A variational auto-encoder (VAE) is a common parameter inference framework for topic detection; it can identify the structure of data and learn its latent distribution. NVDM is a typical VAE-based topic model. It feeds each document independently into the inference network and computes the mean and variance of the topic posterior distribution, then samples a latent topic vector from the variational posterior distribution, and finally reconstructs the input document through the generative network. NVDM is designed for long documents. IATM is a classical VAE-based neural topic model for social media topic detection. It takes multiple short posts as input and learns edge embedded representations in the social network by mining dynamic user interactions. The edge embedded representations are likewise fed independently into the VAE to infer the corpus-level topic-word distribution. In essence, IATM integrates representation learning and social context on top of a VAE.

Although the previous approach embeds user interactions into the edge representations, the VAE assumes that each data point is independent. As a result, the correlation between users or posts is weakened when the latent semantic vectors are computed. In a social network, interactions may indicate related relationships or interests, and the latent semantic vector is crucial for topic inference. It is therefore more reasonable to integrate the interaction features among users into the latent semantic vectors.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a microblog topic detection method based on message passing and graph prior distribution. User interaction information in social media is considered in both the user node embedded representation stage and the topic inference stage. In the user node embedded representation stage, a graph convolutional network is used to learn user node embedded representations that integrate the social network structure information and the post content information, so that the interaction relationships of users are embedded into the user node embedded representations. In the topic inference stage, a graph prior distribution is introduced and the interaction relationships are incorporated into the prior distribution of the VAE, so that the latent topic vectors of users contain the interaction relationships. Finally, VAE inference yields topic distributions that account for user correlation, and topics with higher coherence are obtained.

The purpose of the invention is realized by the following technical scheme:

a microblog topic detection method based on message passing and graph prior distribution is characterized by comprising the following steps:

(1) constructing a user-level social network: taking a user as a network node and an interactive relation as an edge in a network;

(2) encoding user interactions through a message passing mechanism: introducing a graph neural network, integrating the content information and structural information of posts in social media by means of the message passing mechanism, and embedding the interaction relationships among users into the user node embedded representations;

(3) topic generation based on a graph-prior variational auto-encoder: the user node embedded representations, which integrate the user interaction relationships, are used as input; the standard Gaussian prior distribution of the variational auto-encoder (VAE) is replaced by a graph prior distribution containing the user interaction relationships; and the correlation among users is taken into account during topic inference.

Further, the step (1) specifically comprises:

constructing a user-level social network G = (V, E, T) according to the forwarding and comment relations among users; wherein V = {v_i | 1 ≤ i ≤ n} is the node set, v_i represents the i-th user in the social network, and n represents the number of users; E = {e_ij | 1 ≤ i, j ≤ n} is the edge set: if user i (represented by v_i) and user j (represented by v_j) have interacted, then e_ij = 1; if they have never interacted, then e_ij = 0; the posts published by a user are used as the attribute information of the user node; T = {t₁, t₂, ..., t_n} is the post set, where each t_i represents the content of the posts of the i-th user; to alleviate data sparsity, a user-based aggregation strategy is adopted to aggregate all posts of a user, including source posts, forwarded posts and reply messages; the adjacency matrix A of the user-level social network is obtained from the interaction relationships among users; according to the post set T, each word in a post is replaced with its corresponding word embedding representation to obtain the attribute vector of each user, and thus the attribute matrix X of the social network; the word embedding representation of each word is obtained by random initialization.
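The construction of the post set T and the adjacency matrix A can be illustrated with a minimal sketch. The function name `build_user_network`, the whitespace tokenization, the toy data and the choice of a symmetric (undirected) adjacency matrix are assumptions made for illustration; the attribute matrix X is not built here because the text does not say how the word embeddings of a user are combined into one vector.

```python
from collections import defaultdict


def build_user_network(posts, interactions):
    """posts: list of (user_id, text); interactions: list of (user_a, user_b)."""
    # User-based aggregation: collect the words of every post of a user
    # (source posts, forwarded posts, replies) into one document t_i.
    aggregated = defaultdict(list)
    for user, text in posts:
        aggregated[user].extend(text.split())

    users = sorted(aggregated)
    index = {user: k for k, user in enumerate(users)}
    n = len(users)

    # e_ij = 1 if users i and j have interacted, otherwise 0.
    # Assumption: interactions are treated as undirected.
    adjacency = [[0] * n for _ in range(n)]
    for a, b in interactions:
        if a in index and b in index and a != b:
            adjacency[index[a]][index[b]] = 1
            adjacency[index[b]][index[a]] = 1

    documents = [aggregated[user] for user in users]
    return users, adjacency, documents


# Toy data, purely illustrative.
posts = [
    ("u1", "rain in tianjin"),
    ("u2", "heavy rain today"),
    ("u1", "flood warning"),
    ("u3", "football final"),
]
interactions = [("u2", "u1")]
users, A, T = build_user_network(posts, interactions)
```

In this toy example `u1` and `u2` are connected by an edge, while `u3` remains isolated.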

Further, the step (2) specifically comprises:

learning user node embedded representations using network embedding techniques; each post in a social network is short and informally expressed, so learning a good representation of posts is important. Using only Bag of Words (BoW) vectors as the representation of user nodes runs into the data sparsity problem and degrades the performance of topic inference. According to social correlation theory, friends tend to pay attention to similar content. Data sparsity is therefore mitigated by modeling user interactions in the social network; considering the ability of the graph convolutional network (GCN) to aggregate information from surrounding nodes, the GCN is used to model the interaction behavior among friends and to learn the user node embedded representations; specifically, the microblog topic detection method adopts a two-layer GCN, as shown in the following formula:

wherein I represents the identity matrix (a diagonal matrix whose diagonal elements are all 1); the formula also uses the degree matrix of the adjacency matrix; X represents the attribute matrix; W¹ and W² are the parameters of the graph convolutional network; ReLU is used as the activation function; and H² is the matrix formed by the embedded representations of all user nodes in the social network;
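The propagation formula itself is not legible in this text. A minimal sketch of a two-layer GCN forward pass that is consistent with the symbols defined above might look as follows. It assumes the commonly used symmetric normalization of A + I by its degree matrix and a ReLU after each layer; both choices, the helper names and the toy matrices are assumptions rather than the patent's exact formula.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(matrix):
    return [[max(0.0, v) for v in row] for row in matrix]


def normalized_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2), where D is the degree matrix of A + I."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


def gcn_two_layer(adj, x, w1, w2):
    """H1 = ReLU(A_hat X W1), H2 = ReLU(A_hat H1 W2)."""
    a_hat = normalized_adjacency(adj)
    h1 = relu(matmul(matmul(a_hat, x), w1))
    h2 = relu(matmul(matmul(a_hat, h1), w2))
    return h2


# Toy example: users 0 and 1 interact, user 2 is isolated.
adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
H2 = gcn_two_layer(adj, X, identity, identity)
```

With identity weights, the two connected users end up with the same embedding (the average of their attributes), which illustrates how message passing spreads neighbor content.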

topic detection is an unsupervised task, so no labels are available for training the graph convolutional network. The microblog topic detection method therefore uses an unsupervised loss function, as shown in the following formula:

given a user v_i, the aim is to maximize the similarity between v_i and its related user nodes v_j ∈ N_i; the related user nodes are the user nodes in the first-order neighbor set N_i, i.e., those directly connected to v_i by an edge in the social network; in formula (5), h_i is the embedded representation of user node i in H², h_j is the embedded representation of user node j in H², h_u is the embedded representation of user node u in H², and v_u ∈ V ranges over all user nodes in the social network.

Based on the message passing mechanism of the GCN, the related content of first-order neighbor users is propagated into the attributes of the connected users, which compensates for the data sparsity of a single user; meanwhile, the embedded representations of connected user nodes become more similar, which preserves the correlation between friends in the social network.
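Formula (5) is not legible in this text. One loss that fits the description (maximize the similarity of each user with its first-order neighbors, normalized over all user nodes u) is a softmax over dot-product similarities. The sketch below assumes exactly that form; the function name and the toy embeddings are hypothetical, and the actual formula in the patent may differ.

```python
import math


def neighbor_softmax_loss(h, adj):
    """Assumed form: -sum_i sum_{j in N_i} log( exp(h_i.h_j) / sum_u exp(h_i.h_u) )."""
    n = len(h)
    loss = 0.0
    for i in range(n):
        scores = [sum(a * b for a, b in zip(h[i], h[u])) for u in range(n)]
        # Numerically stable log-sum-exp over all user nodes u.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        for j in range(n):
            if adj[i][j]:  # j is a first-order neighbor of i
                loss -= scores[j] - log_norm
    return loss


adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
aligned = [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0]]      # neighbors 0 and 1 are similar
misaligned = [[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]]   # neighbors 0 and 1 are dissimilar
loss_aligned = neighbor_softmax_loss(aligned, adj)
loss_misaligned = neighbor_softmax_loss(misaligned, adj)
```

The loss is lower when connected users have similar embeddings, which matches the stated goal of keeping friends close in the embedding space.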

Further, the step (3) specifically comprises:

the interaction relationships among users are encoded into the user node embedded representations in step (2), which serve as the input of the graph-prior variational auto-encoder in step (3); a variational auto-encoder that uses the standard Gaussian distribution as its prior distribution comprises an encoder and a decoder, wherein the encoder computes the mean and variance of the topic posterior distribution, a topic vector is sampled from the topic posterior distribution via the reparameterization trick, and the topic distribution is obtained through softmax; each user node embedded representation is then reconstructed by the decoder;

a variational auto-encoder that uses the standard Gaussian distribution as its prior distribution can infer latent topics from independent long documents; for multi-user input, it assumes that the users are independent, which weakens the correlation between users during topic inference. This independence of data points results from the standard Gaussian prior used in the VAE. The microblog topic detection method therefore constructs a graph prior distribution to replace the standard Gaussian distribution; the graph prior distribution contains the user interaction relationships, so that the topic vectors of the users obey the corresponding interaction relationships among users in the social network. The graph prior distribution is shown in the following formula:

wherein z_i and z_j are the latent topic vectors of users v_i and v_j, and p_s(z_i) uses a standard Gaussian distribution;

the pairwise distribution of z_i and z_j adopts the following form:

α is a hyper-parameter and I represents the identity matrix; based on the graph prior distribution, a new variational lower bound for the graph-prior variational auto-encoder is obtained, as follows:

wherein the variational distribution q(z_i, z_j | h_i, h_j) adopts the following form:

wherein μ_i, μ_j and σ_i², σ_j² are the means and variances of the variational distribution; c_ij is the correlation coefficient of z_i and z_j; the final loss function is formulated as follows:

the graph-prior variational auto-encoder derived from the loss function consists of the following three parts: 1) a variational network, which takes the user node embedded representation [h_i] as input and computes the mean μ_i and the variance σ_i²; 2) a correlation encoding network, which takes a pair of user nodes [h_i, h_j] as input and computes the correlation coefficient c_ij of the latent topic vectors of the two users; 3) a generative network, which, as in a variational auto-encoder with a standard Gaussian prior, takes the latent variable z_i as input and reconstructs the original user node embedded representation to obtain h'_i. Overall, the method preserves the interactions among users in both the user node embedded representation stage and the topic inference stage, and takes the correlation between friends into account, thereby obtaining more coherent topics.
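The formulas for the pairwise distributions and the lower bound are not legible in this text, so the following is only a sketch under stated assumptions. It assumes that, for every latent dimension, the pairwise prior is a zero-mean bivariate Gaussian with unit variances and correlation α, and that the variational distribution is a bivariate Gaussian with means (μ_i, μ_j), standard deviations (σ_i, σ_j) and correlation c_ij. Under those assumptions the KL divergence between the two has a closed form, which is the kind of edge term a graph-prior lower bound would contain. The function name is hypothetical.

```python
import math


def pairwise_gaussian_kl(mu_i, mu_j, sigma_i, sigma_j, c_ij, alpha):
    """KL( q(z_i, z_j) || p(z_i, z_j) ), summed over latent dimensions.

    q: per dimension, mean (mu_i, mu_j), std (sigma_i, sigma_j), correlation c_ij.
    p: per dimension, zero mean, unit variance, correlation alpha (assumed form).
    """
    if not (abs(alpha) < 1.0 and abs(c_ij) < 1.0):
        raise ValueError("correlations must lie strictly between -1 and 1")
    total = 0.0
    for mi, mj, si, sj in zip(mu_i, mu_j, sigma_i, sigma_j):
        # trace(Sigma_p^-1 Sigma_q)
        trace = (si * si + sj * sj - 2.0 * alpha * c_ij * si * sj) / (1.0 - alpha ** 2)
        # mu^T Sigma_p^-1 mu
        maha = (mi * mi + mj * mj - 2.0 * alpha * mi * mj) / (1.0 - alpha ** 2)
        # log(det Sigma_p / det Sigma_q)
        log_det = math.log(1.0 - alpha ** 2) - math.log(si * si * sj * sj * (1.0 - c_ij ** 2))
        total += 0.5 * (trace + maha - 2.0 + log_det)
    return total


zeros, ones = [0.0, 0.0], [1.0, 1.0]
# q equals the prior when means are 0, stds are 1 and c_ij == alpha.
kl_same = pairwise_gaussian_kl(zeros, zeros, ones, ones, c_ij=0.5, alpha=0.5)
# A correlation of the wrong sign is penalized.
kl_diff = pairwise_gaussian_kl(zeros, zeros, ones, ones, c_ij=-0.5, alpha=0.5)
# With alpha = 0 and c_ij = 0 the term reduces to two independent univariate KLs.
kl_indep = pairwise_gaussian_kl([1.0], [-0.5], [0.5], [2.0], c_ij=0.0, alpha=0.0)
```

The term vanishes when the inferred correlation of two connected users matches the prior correlation α, which is how such a prior would push the latent topic vectors of interacting users toward each other.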

Compared with the prior art, the technical scheme of the invention has the following beneficial effects:

1. To alleviate the data sparsity problem in social media topic detection, the method of the invention considers the post text content and the social network structure simultaneously and integrates the user interaction relationships. The structural information serves as a supplement that enriches the context information in the social network;

2. To introduce user correlation, the user interaction relationships are integrated in two stages, user node embedded representation and topic inference, so that user correlation is considered throughout the whole process of social media topic detection;

3. In the user node embedded representation stage, the message passing mechanism of the graph convolutional network makes it possible, on the one hand, to aggregate the information of the friend users around each user and thus alleviate sparsity, and on the other hand, to integrate the social network structure into the user node embedded representations so that the user interaction relationships are preserved in them;

4. In the topic inference stage, topic inference is carried out by a variational auto-encoder based on a graph prior. Unlike the conventional VAE, the method of the invention replaces the standard Gaussian distribution with a graph prior distribution. The graph prior distribution takes user interactions into account, so the latent topic vectors of the users obey the interaction structure among users, and the finally inferred topics have better coherence;

5. Experimental results on Sina microblog data sets covering three months demonstrate the effectiveness of the method and of the introduced graph prior distribution for microblog topic mining.

Drawings

FIG. 1 is a schematic flow diagram of the method of the present invention.

FIG. 2a is a visualization of the topic vectors inferred by the standard variational auto-encoder with a standard Gaussian prior; FIG. 2b is a visualization of the topic vectors inferred by the graph-prior variational auto-encoder.

FIG. 3 shows how the evaluation metric, topic coherence, varies with the parameter α of the graph prior distribution in the specific embodiment.

Detailed Description

The invention is described in further detail below with reference to the figures and specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

A specific implementation of the invention is given using a real microblog data set covering three months as an example. The overall algorithm flow comprises three steps: constructing a user-level social network, message-passing-based user node embedded representation, and topic generation based on a graph-prior variational auto-encoder, as shown in FIG. 1. The specific steps are as follows:

S1, constructing a user-level social network:

Previous work on the Sina microblog platform collected microblogs covering 50 hot topics in the three months of May, June and July 2014. In this embodiment, a user-level social network is constructed from this microblog corpus. The specific steps are: 1) filtering out users that have no forwarding or commenting relations; 2) splicing all posts of a user together as the attribute information of that user; 3) adding an edge between two user nodes if the two users have an interaction relationship, and no edge otherwise. The post text of a user is used as the attribute information of the user node in the social network.
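The three preprocessing steps can be sketched as follows. The function name `filter_and_aggregate`, the input layout (lists of tuples) and the toy data are assumptions for illustration; the actual corpus format is not described in the text.

```python
def filter_and_aggregate(posts, interactions):
    """posts: list of (user, text); interactions: list of (user_a, user_b)."""
    # Step 1: keep only users involved in at least one forwarding/comment relation.
    active = {user for pair in interactions for user in pair}

    # Step 2: splice all posts of each remaining user into one attribute text.
    texts = {}
    for user, text in posts:
        if user in active:
            texts[user] = texts[user] + " " + text if user in texts else text

    # Step 3: one undirected edge per interacting pair of remaining users.
    edges = {
        tuple(sorted(pair))
        for pair in interactions
        if pair[0] in texts and pair[1] in texts and pair[0] != pair[1]
    }
    return texts, sorted(edges)


posts = [("a", "hello world"), ("b", "good morning"), ("a", "second post"), ("c", "alone")]
interactions = [("b", "a"), ("a", "b")]
texts, edges = filter_and_aggregate(posts, interactions)
```

User `c` has no interaction and is dropped; the two interactions between `a` and `b` collapse into a single edge.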

Table 1 shows the statistics of the three monthly data sets: the May data set comprises 8907 users and 10435 interactions in total, with a vocabulary size of 5914; the June data set comprises 19293 users and 35962 interactions, with a vocabulary size of 9368; the July data set comprises 16990 users and 20971 interactions, with a vocabulary size of 9663.

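TABLE 1 Microblog data set statistics

Data set    Users    Interactions    Vocabulary size
May         8907     10435           5914
June        19293    35962           9368
July        16990    20971           9663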

S2, message-passing-based user node embedded representation:

Using only Bag of Words (BoW) vectors as the representation of user nodes runs into the data sparsity problem and degrades the performance of topic inference. Because each post is short and informally expressed, representation learning for posts in social networks is important. Considering the ability of the graph convolutional network to aggregate information from surrounding nodes, a two-layer GCN is used to model the interaction behavior between friends and to learn the user node embedded representations. Based on the message passing mechanism of the GCN, the related content of neighbor users is propagated into the attributes of the connected users, which compensates for the data sparsity of a single user. Meanwhile, the embedded representations of connected user nodes become more similar, which preserves the correlation between friends in the social network. The loss function for this step is shown in the following formula:

S3, topic generation based on a graph-prior variational auto-encoder:

With the user node embedded representations as input, topics are inferred using the variational auto-encoder. The variational auto-encoder comprises an encoder and a decoder, wherein the encoder computes the mean and variance of the topic posterior distribution as follows:

μ_i = MLP(h_i)

wherein h_i represents the embedded representation of the i-th user node, and μ_i and σ_i² denote the mean and variance, respectively. MLP stands for multi-layer perceptron. Through the reparameterization trick z_i = μ_i + ε * σ_i, the latent topic vector z_i can be sampled from the posterior distribution. The topic distribution θ = (p(t₁|h), p(t₂|h), ..., p(t_K|h)) can be obtained by the softmax function:

θ_i = softmax(z_i)

where h represents the input user representation, t₁ represents the first topic, p(t₁|h) represents the probability of the first topic, and K represents the total number of topics. Each user node embedded representation is reconstructed by the decoder network, for which an MLP is also chosen. The decoder parameter W is the corpus-level topic-word distribution φ_word = (p(w|t₁), p(w|t₂), ..., p(w|t_K)). The specific formulas are as follows:

d_i = softmax(θ_i W)

h′_i = f(W_d d_i + b_d)

wherein p(w|t₁) represents the probability of each word under the first topic, d_i represents the probability of each word appearing in the attribute information of user node i, W_d represents the parameters of the neural network, b_d represents the bias of the neural network, and h′_i represents the user node embedded representation reconstructed by the decoder.
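The encoder, reparameterization and decoder formulas above can be traced in a small forward-pass sketch. Several details are assumptions made only to keep the example runnable: a single affine layer stands in for each MLP, the encoder is assumed to output a log-variance, tanh is used for the unspecified function f, and the parameter names in `params` and the toy dimensions are hypothetical.

```python
import math
import random


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def affine(weights, vector, bias):
    """weights has one row per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]


def row_times_matrix(vector, matrix):
    return [sum(v * row[c] for v, row in zip(vector, matrix)) for c in range(len(matrix[0]))]


def vae_forward(h_i, params, rng):
    # Encoder: mean and (assumed) log-variance of the topic posterior.
    mu = affine(params["w_mu"], h_i, params["b_mu"])
    log_var = affine(params["w_logvar"], h_i, params["b_logvar"])
    sigma = [math.exp(0.5 * lv) for lv in log_var]
    # Reparameterization trick: z_i = mu_i + eps * sigma_i.
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [m + e * s for m, e, s in zip(mu, eps, sigma)]
    theta = softmax(z)                                   # theta_i = softmax(z_i)
    d = softmax(row_times_matrix(theta, params["W"]))    # d_i = softmax(theta_i W)
    # h'_i = f(W_d d_i + b_d); f = tanh is an assumption.
    h_rec = [math.tanh(v) for v in affine(params["W_d"], d, params["b_d"])]
    return theta, d, h_rec


rng = random.Random(0)
dim, topics, vocab = 3, 2, 4  # toy sizes


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


params = {
    "w_mu": rand_matrix(topics, dim), "b_mu": [0.0] * topics,
    "w_logvar": rand_matrix(topics, dim), "b_logvar": [0.0] * topics,
    "W": rand_matrix(topics, vocab),          # K x vocabulary, topic-word parameters
    "W_d": rand_matrix(dim, vocab), "b_d": [0.0] * dim,
}
theta, d, h_rec = vae_forward([0.2, -0.1, 0.4], params, rng)
```

Both `theta` (over K topics) and `d` (over the vocabulary) are probability vectors, and `h_rec` has the same dimension as the input embedding.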

A variational auto-encoder with a standard Gaussian prior distribution can infer latent topics from independent long documents. For multi-user input, it assumes that the users are independent, which weakens the correlation between users during topic inference. The invention therefore first constructs a graph prior distribution to replace the standard Gaussian. The graph prior distribution contains the user interaction relationships, so that the topic vector of each user obeys the corresponding relationships among users in the social network. The loss function is then computed from the new variational lower bound, as follows:

the meaning of the symbols in the formula is as described above.

In the specific implementation, the post text of each user node is first preprocessed. After aggregation, the post text of each user contains 50 words. Word embeddings are randomly initialized with dimension 200. In the GCN, the dimension of the hidden layer is set to 200. In the variational auto-encoder, the dimension of the first encoder layer is set to 200. The learning rate is set to 0.001. Dropout is employed in both the GCN and the VAE to avoid overfitting. Adam is used to optimize the loss function of each module.
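The settings listed above can be collected in one configuration object. The class name `MGTMConfig` and the field names are hypothetical; the dropout rate is left out because the text does not state it, and the default topic number of 50 is one of the two values (50 and 100) used in the experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MGTMConfig:
    words_per_user: int = 50        # words kept per user after aggregation
    embedding_dim: int = 200        # randomly initialized word embeddings
    gcn_hidden_dim: int = 200       # hidden layer of the GCN
    encoder_hidden_dim: int = 200   # first encoder layer of the VAE
    learning_rate: float = 0.001
    optimizer: str = "adam"
    num_topics: int = 50            # the experiments use K = 50 and K = 100


config = MGTMConfig()
config_100 = MGTMConfig(num_topics=100)
```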

To verify the effectiveness of the method of the invention, the method (MGTM) is compared with current advanced and representative methods (BAT [1], BTM [2], LCTM [3], LeadLDA [4], AdjEnc [5], IATM [6]) and with a variant of the method of the invention, MGTM (Standard Gaussian).

BAT explores the application of bidirectional adversarial training in neural topic models. It is designed for long documents and faces a severe data sparsity problem when applied to short documents.

BTM learns topics by directly modeling the generation of word pairs throughout a corpus.

LCTM reveals topics by modeling co-occurrence patterns of latent concepts, which are used to capture the conceptual similarity of words.

LeadLDA classifies posts as leader posts or follower posts, which contain key topic words to different degrees.

AdjEnc introduces network structure into topic inference for structured long documents such as academic papers and web pages.

IATM models the dynamic interactions of users to learn interaction-aware edge embeddings, and generates topics using neural variational inference.

MGTM (Standard Gaussian) is the variant that falls back to a standard Gaussian distribution as the prior; it is used to verify the effect of the graph prior distribution.

Model performance is evaluated by topic coherence, computed as follows:

Tables 2, 3 and 4 show the topic coherence results of the method of the invention and all comparison methods on the three monthly microblog data sets, respectively. For each data set, the coherence scores of the top 10 (N = 10), top 15 (N = 15) and top 20 (N = 20) words of the inferred topics are recorded for topic numbers K = 50 and K = 100. Higher topic coherence indicates better model performance.
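Coherence is scored on the top N words of each inferred topic. Given a topic-word matrix with one row per topic, the top N words can be read off as in the sketch below; the function name, the toy vocabulary and the toy probabilities are illustrative assumptions, and the coherence score itself is not computed here because its formula is not legible in this text.

```python
def top_words(topic_word, vocabulary, n):
    """Return the n highest-weighted words for every topic (row) of topic_word."""
    tops = []
    for row in topic_word:
        ranked = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
        tops.append([vocabulary[k] for k in ranked[:n]])
    return tops


vocabulary = ["rain", "flood", "goal", "match"]
topic_word = [
    [0.5, 0.3, 0.1, 0.1],     # a weather-like topic
    [0.05, 0.05, 0.5, 0.4],   # a sports-like topic
]
top2 = top_words(topic_word, vocabulary, 2)
```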

TABLE 2 Performance comparison of the method of the invention and the comparison methods on the May data set

TABLE 3 Performance comparison of the method of the invention and the comparison methods on the June data set

TABLE 4 Performance comparison of the method of the invention and the comparison methods on the July data set

As can be seen from the topic coherence results in Tables 2, 3 and 4, modeling the interaction relationships of users in the two stages of user node embedded representation and topic inference lets the topics carry a degree of user correlation, which further improves topic coherence. To study whether the graph prior distribution helps preserve user interactions in the latent topic vectors of users, FIG. 2a and FIG. 2b visualize the latent topic vectors. FIG. 2a shows the topic vectors inferred by the variational auto-encoder with a standard Gaussian prior; FIG. 2b shows the topic vectors inferred by the graph-prior variational auto-encoder. As the circled regions show, the method of the invention yields user topic vectors that cluster better. To further study the influence of the parameter α of the graph prior distribution on topic coherence, FIG. 3 shows how the topic coherence score of the method of the invention changes with α on the three-month microblog data sets.

The above contents are intended to schematically illustrate the technical solution of the present invention, and the present invention is not limited to the above described embodiments. Those skilled in the art can make many changes and modifications to the invention without departing from the spirit and scope of the invention as defined in the appended claims.

References:

[1] Rui Wang, Xuemeng Hu, Deyu Zhou, Yulan He, Yuxuan Xiong, Chenchen Ye, and Haiyang Xu. 2020. Neural Topic Modeling with Bidirectional Adversarial Training. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 340-350.

[2] Xiaohui Yan, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng. 2013. A biterm topic model for short texts. In Proceedings of the 22nd International Conference on World Wide Web. ACM. 1445-1456.

[3] Weihua Hu and Jun'ichi Tsujii. 2016. A Latent Concept Topic Model for Robust Topic Inference Using Word Embeddings. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 380-386.

[4] Jing Li, Ming Liao, Wei Gao, Yulan He, and Kam-Fai Wong. 2016. Topic Extraction from Microblog Posts Using Conversation Structures. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2114-2123.

[5] Ce Zhang and Hady W. Lauw. 2020. Topic Modeling on Document Networks with Adjacent-Encoder. Proceedings of the AAAI Conference on Artificial Intelligence 34, 04 (2020), 6737-6745.

[6] Ruifang He, Xuefei Zhang, Di Jin, Longbiao Wang, Jianwu Dang, and Xiangang Li. 2018. Interaction-Aware Topic Model for Microblog Conversations through Network Embedding and User Attention. In Proceedings of the 27th International Conference on Computational Linguistics. 1398-1409.

The present invention is not limited to the above-described embodiments. The foregoing description of the specific embodiments is intended to describe and illustrate the technical solutions of the present invention, and the above specific embodiments are merely illustrative and not restrictive. Those skilled in the art can make many changes and modifications to the invention without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A microblog topic detection method based on message passing and graph prior distribution is characterized by comprising the following steps:

2. The microblog topic detection method based on message passing and graph prior distribution as claimed in claim 1, wherein the step (1) specifically comprises:

constructing a user-level social network G = (V, E, T) according to the forwarding and comment relations among users; wherein V = {v_i | 1 ≤ i ≤ n} is the node set, v_i represents the i-th user in the social network, and n represents the number of users; E = {e_ij | 1 ≤ i, j ≤ n} is the edge set: if user i (represented by v_i) and user j (represented by v_j) have interacted, then e_ij = 1; if they have never interacted, then e_ij = 0; the posts published by a user are used as the attribute information of the user node; T = {t₁, t₂, ..., t_n} is the post set, where each t_i represents the content of the posts of the i-th user; to alleviate data sparsity, a user-based aggregation strategy is adopted to aggregate all posts of a user, including source posts, forwarded posts and reply messages; the adjacency matrix A of the user-level social network is obtained from the interaction relationships among users; according to the post set T, each word in a post is replaced with its corresponding word embedding representation to obtain the attribute vector of each user, and thus the attribute matrix X of the social network; the word embedding representation of each word is obtained by random initialization.

3. The microblog topic detection method based on message passing and graph prior distribution as claimed in claim 1, wherein the step (2) specifically comprises:

learning user node embedded representations using network embedding techniques; mitigating data sparsity by modeling user interactions in the social network; considering the ability of the graph convolutional network (GCN) to aggregate information from surrounding nodes, the GCN is used to model the interaction behavior among friends and to learn the user node embedded representations; specifically, the microblog topic detection method adopts a two-layer GCN, as shown in the following formula:

wherein I represents the identity matrix (a diagonal matrix whose diagonal elements are all 1); the formula also uses the degree matrix of the adjacency matrix; X represents the attribute matrix; W¹ and W² are the parameters of the graph convolutional network; ReLU is used as the activation function; and H² is the matrix formed by the embedded representations of all user nodes in the social network;

the microblog topic detection method uses an unsupervised loss function, as shown in the following formula:

given a user v_i, the aim is to maximize the similarity between v_i and its related user nodes v_j ∈ N_i; the related user nodes are the user nodes in the first-order neighbor set N_i, i.e., those directly connected to v_i by an edge in the social network; in formula (5), h_i is the embedded representation of user node i in H², h_j is the embedded representation of user node j in H², h_u is the embedded representation of user node u in H², and v_u ∈ V ranges over all user nodes in the social network;

4. The microblog topic detection method based on message passing and graph prior distribution as claimed in claim 1, wherein the step (3) specifically comprises:

the microblog topic detection method constructs a graph prior distribution to replace the standard Gaussian distribution; the graph prior distribution contains the user interaction relationships, so that the topic vectors of the users obey the corresponding interaction relationships among users in the social network; the graph prior distribution is shown in the following formula:

wherein z_i and z_j are the latent topic vectors of users v_i and v_j; p_s(z_i) is the singleton marginal distribution, for which a standard Gaussian distribution is used;

the pairwise marginal distribution of z_i and z_j adopts the following form:

wherein the variational distribution q(z_i, z_j | h_i, h_j) adopts the following form:

wherein μ_i, μ_j and σ_i², σ_j² are the means and variances of the variational distribution; c_ij is the correlation coefficient of z_i and z_j; the final loss function is formulated as follows:

2) a correlation encoding network, which takes a pair of user nodes [h_i, h_j] as input and computes the correlation coefficient c_ij of the latent topic vectors of the two users; 3) a generative network, which, as in a variational auto-encoder with a standard Gaussian prior, takes the latent variable z_i as input and reconstructs the original user node embedded representation to obtain h′_i.