CN113688600B - Information propagation prediction method based on topic perception attention network - Google Patents


Publication number: CN113688600B (application CN202111049168.8A; application publication CN113688600A)
Authority: CN (China)
Prior art keywords: topic, user, propagation, context, embedding
Legal status: Active (granted)
Application number: CN202111049168.8A
Original language: Chinese (zh)
Other versions: CN113688600A (application publication)
Inventors: 杨成 (Cheng Yang), 石川 (Chuan Shi), 王浩 (Hao Wang)
Assignee (current and original): Beijing University of Posts and Telecommunications
Application filed by Beijing University of Posts and Telecommunications, with priority to application CN202111049168.8A. Published as CN113688600A (application publication) and CN113688600B (grant). Legal status: active.

Classifications

    • G06F40/126 — Handling natural language data; text processing; use of codes for handling textual entities; character encoding
    • G06F40/194 — Handling natural language data; text processing; calculation of difference between files
    • G06F40/30 — Handling natural language data; semantic analysis
    • G06N3/04 — Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology
    • G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
    • Y02A90/10 — Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention discloses an information propagation prediction method based on a topic-aware attention network, which integrates topic context and propagation history context into user representations for prediction. The topic context supports modeling of propagation patterns for specific topics, while the propagation history context is further decomposed into user-dependence modeling and position-dependence modeling. The encoded user contexts are then used to construct user representations under multiple topics, which a time-decay aggregation module further integrates into a cascade representation. All of these modules are driven by the characteristics of information propagation, so the method fits real-world diffusion data better and predicts more accurately. Moreover, whereas traditional topic-aware models require a predefined topic distribution, the proposed method learns the topics automatically.

Description

Information propagation prediction method based on a topic-aware attention network
Technical Field
The invention relates to the technical field of networks, and in particular to an information propagation prediction method based on a topic-aware attention network.
Background
Internet social platforms such as Twitter and Sina Weibo attract millions of users, and a large amount of information is transmitted among them every day. The process of information dissemination is also known as a cascade, and modeling of dissemination patterns and user behavior is widely used in many fields, such as popularity prediction, epidemiology, and personalized recommendation. Next-user prediction, a popular microscopic cascade prediction task, has been widely studied in recent years. The problem is defined as follows: given the time-ordered sequence of infected users for an information item, predict the next infected user (by convention, researchers use "infection", "activation", or "influence" to describe a user's interaction with an information item).
Conventional microscopic cascade prediction methods include independent cascade (IC) based methods and embedding-based methods. The independent cascade model assigns an independent diffusion probability to each user pair; many cascade diffusion models are built on this model's basic assumption and extend it by considering additional information, such as continuous timestamps and user attributes. Some studies also discuss the impact of topic information on cascade modeling. TIC first studied the information propagation prediction task from a topic-aware perspective; its main idea is to set a topic-specific probability between each user pair.
As research advanced, researchers proposed embedding-based methods for cascade prediction, which improve the expressive power of models using representation learning techniques: users are embedded into a continuous latent space, and the propagation probability between each user pair is computed from the user embeddings instead of directly estimating a real-valued parameter. However, neither the IC-based nor the embedding-based approaches model the cascade's historical sequence information. Recent work has shown that these models underperform deep learning models that take the cascade sequence into account.
With the success of deep learning, recurrent neural networks (RNNs) have exhibited a strong ability to model information propagation. TopoLSTM extends the standard LSTM model and builds hidden states from directed acyclic graphs (DAGs) extracted from the social graph. CYAN-RNN and DeepDiffuse combine recurrent neural networks with an attention mechanism to take the propagation structure into account. RecCTIC proposes a Bayesian topological RNN model for capturing tree dependencies. Diffuse-LSTM uses image information to assist prediction and builds a Tree-LSTM model to infer propagation paths. FOREST extends the GRU model and designs an additional structural-context extraction strategy to exploit the underlying social graph.
Recently, attention networks have been proposed to better capture propagation dependencies in cascade sequences. HiDAN builds a hierarchical attention network: it adopts an attention mechanism to capture the non-sequential structure in a cascade and mine the real dependency relationships, and designs a time-decay module based on users' timestamp information. By jointly modeling user dependence and time decay, it greatly improves the expressive power and interpretability of the model.
With the development of deep learning techniques, some works model an information propagation cascade as an infection sequence and achieve good results with recurrent neural networks. Although a cascade is typically represented as a sequence of users ordered by infection timestamp, the actual propagation process is usually not strictly sequential, since it depends on an unobserved user connection graph. Other studies therefore employ attention mechanisms to capture non-sequential, long-range propagation dependencies.
However, existing neural-network-based approaches assume that the propagation behavior and patterns of all information items are homogeneous. This assumption may not hold in the real world. In a real information dissemination scenario, users may behave differently toward information items on different topics. Intuitively, users' interests are often diverse, and their propagation behavior may vary with the topic of the information item. For example, a user may follow different people under different topics and forward different information accordingly, and thus exhibit topic-specific dependencies. Existing neural-network-based methods rarely utilize the information text: they do not model topic-aware propagation patterns and user behavior, cannot model the propagation patterns and dependencies under specific topics, and thereby limit the expressive power of the model, even though traditional non-neural methods have already demonstrated the influence of topics on users.
Next-user prediction has been widely studied in recent years as a popular microscopic cascade prediction task. Traditional modeling typically ignores the textual content of the propagated information item, and therefore learns dependencies mixed across different topics. In contrast, topic-aware modeling aims to explicitly decouple the propagation dependencies under each topic, enabling more accurate prediction. Indeed, traditional non-neural approaches based on the independent cascade model have demonstrated the advantage of topic-aware modeling, which models behavior separately for information items under different topics. These early approaches, however, rest on strong independence assumptions that limit the generalization performance of the model, and recent deep-learning-based methods have shown them to be suboptimal. To our knowledge, no previous study has proposed a neural-network-based topic-aware model to mine propagation dependencies under different topics.
Disclosure of Invention
In view of the above, the present invention proposes an information propagation prediction method based on a topic-aware attention network. Starting from the formalized propagation prediction problem, it introduces an embedding strategy that encodes user/position/text information into vectors. A topic-aware attention layer is then proposed to capture the historical propagation dependencies and time-decay effects under different topics. Finally, the model obtains a multi-topic cascade representation through the topic-aware attention layer and predicts the next infected user.
In order to achieve the above object, the present invention provides the following technical solutions:
The invention provides an information propagation prediction method based on a topic-aware attention network, comprising the following steps: S1, integrating topic context and propagation history context into user representations for prediction;
S2, using the topic context to support propagation pattern modeling for specific topics, and further decomposing the propagation history context into user-dependence modeling and position-dependence modeling;
S3, constructing user representations under multiple topics from the encoded user contexts;
S4, further integrating the user representations through a time-decay aggregation module to obtain a multi-topic cascade representation, and then predicting the next infected user;
wherein each module is driven by the characteristics of information propagation.
Further, the specific method in step S1 is as follows:
given a set of users U, a set of cascades V and a set of propagated information M, the propagated sequence of the ith information item in M is defined as a cascadeWherein tuple->Representing user +.>At->The moments are forwarded and the sequences are ordered by infection time, the propagation prediction task is defined as a given cascade c i Is spread text and previous infected user sequence +.>Predicting the next infected user as +.>Where n=1, 2, …, |c i |-1。
Further, the propagation pattern modeling in step S2 encodes the semantic information of the propagated text using the pre-trained language model BERT.
Further, the propagation pattern modeling in step S2 converts the BERT-encoded text embedding $x_i$ into the propagation text embedding $y_i$ through a fully connected layer:

$y_i = W_x x_i + b_x \quad (1)$

where $W_x$ and $b_x$ are a weight matrix and a bias vector, respectively.
Further, the user-dependence modeling in step S2 encodes users with an embedding matrix $V \in \mathbb{R}^{|U| \times K \times d}$, where $|U|$ is the number of users and $K$ and $d$ are the number of topics and the embedding dimension, respectively.
Further, each user $u_j^i$ in a cascade sequence is embedded as $v_{u_j^i} = \{v_{u_j^i}^1, \ldots, v_{u_j^i}^K\}$, where $v_{u_j^i}^k \in \mathbb{R}^d$ is the user's embedding under the k-th topic.
Further, the position-dependence modeling in step S2 sets a learnable position embedding $pos_j$ for each position $j$, where $pos_j$ is shared among all cascades.
Further, the encoding method in step S3 is:

Topic context:

For each topic $k$, compute the cosine similarity between the user embedding $v_{u_j^i}^k$ and the propagation text embedding $y_i$, and normalize it with a softmax function:

$s_{u_j^i}^k = \frac{\exp(\cos(v_{u_j^i}^k, y_i))}{\sum_{k'=1}^{K} \exp(\cos(v_{u_j^i}^{k'}, y_i))} \quad (2)$

where $k = 1, 2, \ldots, K$ and $s_{u_j^i}^k$ represents user $u_j^i$'s weight under the k-th topic; the topic-context-aggregated user embedding is denoted $\tilde{v}_{u_j^i}^k = s_{u_j^i}^k v_{u_j^i}^k$.

Propagation history context:

The user-dependence attention score of user $u_j^i$ on a previously infected user $u_m^i$ in the cascade sequence is calculated as

$a_{jm}^k = \frac{(W_Q^k \tilde{v}_{u_j^i}^k)^\top (W_K^k \tilde{v}_{u_m^i}^k)}{\sqrt{d}} \quad (3)$

where $W_Q^k$ and $W_K^k$ are topic-specific linear mappings for the target user and the previous user, respectively.

The complete attention score $\hat{a}_{jm}^k$ between users $u_j^i$ and $u_m^i$ and the weight $\alpha_{jm}^k$ describing the propagation history context are calculated as

$\hat{a}_{jm}^k = a_{jm}^k + p_{jm}^k, \qquad \alpha_{jm}^k = \frac{\exp(\hat{a}_{jm}^k)}{\sum_{m'=1}^{j} \exp(\hat{a}_{jm'}^k)} \quad (4)$

where $p_{jm}^k$ is the position-dependent score from position $m$ to position $j$.

Complete context-aware multi-topic user representation:

User $u_j^i$'s representation under the k-th topic is expressed as a weighted sum of previously infected users:

$h_j^k = \sum_{m=1}^{j} \alpha_{jm}^k \tilde{v}_{u_m^i}^k \quad (5)$

The topic context weights $s_{u_j^i}^k$ and the position-dependent scores $p_{jm}^k$ are shared between the different layers.
Further, the modeling method of the time-decay aggregation module in step S4 is:

The continuous time decay is converted into discrete time intervals:

$I(t) = l \ \text{ if } \ t \in [t_{l-1}, t_l) \quad (7)$

where the boundaries $t_l$ are obtained by dividing the time range $[0, T_{max})$ into $L$ subintervals $\{[0, t_1), \ldots, [t_{L-1}, T_{max})\}$, and $T_{max}$ is the largest timestamp in the dataset. For each topic $k$, each time interval $l$ has a corresponding learnable weight $\lambda_l^k$.
Further, the method for obtaining the multi-topic cascade representation in step S4 is:

The complete aggregation weight adds a time-decay term to the attention score of equation (4):

$\hat{\beta}_j^k = \hat{a}_{nj}^k + \lambda_{I(t_n^i - t_j^i)}^k \quad (8)$

Then $\hat{\beta}_j^k$, $j = 1, 2, \ldots, n$, is normalized by the softmax function into $\beta_j^k$. For each topic $k$, the weighted sum $\sum_{j=1}^{n} \beta_j^k h_j^k$ is computed, and a feedforward neural network with a ReLU activation function imparts nonlinearity to the model. The output of the topic-aware attention layer is the cascade representation, expressed as $\{h_c^1, \ldots, h_c^K\}$.
Further, the method for predicting the next infected user in step S4 is:

Given a cascade sequence, the probability of the next infected user $u_{n+1}^i$ is parameterized by measuring the similarity between the user embedding $v_u$ and the cascade embedding; the interaction probability between the cascade and user $u$ is expressed as

$p(u \mid c_i; \Theta) = \frac{\exp(\mathrm{sim}(v_u, h_{c_i}))}{\sum_{u' \in U} \exp(\mathrm{sim}(v_{u'}, h_{c_i}))} \quad (9)$

where $\Theta$ represents all parameters to be learned.

The training objective for predicting infected users is defined as equation (10):

$\mathcal{L}_{pred}(\Theta) = -\sum_{c_i} \sum_{n=1}^{|c_i|-1} \log p(u_{n+1}^i \mid c_i; \Theta) \quad (10)$

K topic prototype embeddings $m_1, \ldots, m_K$ are set, and the user embedding $v_u^k$ under the k-th topic is encouraged to be similar to the corresponding topic prototype $m_k$; the goal is to maximize

$J(u) = \sum_{k=1}^{K} \log \frac{\exp(\cos(v_u^k, m_k))}{\sum_{k'=1}^{K} \exp(\cos(v_u^k, m_{k'}))} \quad (11)$

This term is taken as an additional training objective and summed over all users:

$\mathcal{L}_{topic} = \sum_{u \in U} J(u) \quad (12)$

The complete training objective function is $\mathcal{L} = \mathcal{L}_{pred} - \eta \mathcal{L}_{topic}$, where $\eta$ is a balance coefficient.
Compared with the prior art, the invention has the following beneficial effects:
the information propagation prediction method based on the topic-aware attention network combines the advantages of topic-specific propagation modeling and deep learning technology, designs a novel and effective topic-aware attention mechanism, and integrates topic context and propagation history context into user representation for prediction. The topic context supports propagation mode modeling for a particular topic, while the propagation history context can be further decomposed into user-dependent modeling and location-dependent modeling. Subsequently, we can use the encoded user context to construct a user representation under multiple topics. We then further integrate the user representation by a time decay aggregation module, thereby obtaining a cascade representation. Wherein all of these modules are driven by the characteristics of the information propagation. Therefore, the information propagation prediction method based on the topic-aware attention network can be better fit with the real-world diffusion data and can be predicted more accurately. Furthermore, a predefined topic distribution is required to be used in the traditional topic sensing model, and the topic can be automatically learned by adopting the method.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them.
Fig. 1 is a schematic diagram of a topic-aware attention network according to an embodiment of the present invention.
Detailed Description
The invention provides an information propagation prediction method based on a topic-aware attention network, comprising the following steps: S1, integrating topic context and propagation history context into user representations for prediction;
S2, using the topic context to support propagation pattern modeling for specific topics, and further decomposing the propagation history context into user-dependence modeling and position-dependence modeling;
S3, constructing user representations under multiple topics from the encoded user contexts;
S4, further integrating the user representations through a time-decay aggregation module to obtain a multi-topic cascade representation, and then predicting the next infected user;
wherein each module is driven by the characteristics of information propagation.
For a better understanding of the present technical solution, the method of the present invention is described in detail below with reference to the accompanying drawings.
In this section, we start from the formalized propagation prediction problem and introduce our embedding strategy for encoding user/position/text information into vectors. We then propose a topic-aware attention layer aimed at capturing the historical propagation dependencies and time-decay effects under different topics. Finally, our model obtains a multi-topic cascade representation through the topic-aware attention layer and then predicts the next infected user. The complete structure of our proposed TAN is shown in Fig. 1.
1. Problem definition
Given a set of users U, a set of cascades V and a set of propagated information items M, the propagation sequence of the i-th information item in M can be defined as a cascade $c_i = \{(u_1^i, t_1^i), (u_2^i, t_2^i), \ldots\}$, where each tuple $(u_j^i, t_j^i)$ indicates that user $u_j^i$ forwarded the item at time $t_j^i$, and the sequence is ordered by infection time. Following previous task settings, the propagation prediction task is defined as: given cascade $c_i$'s propagated text and the previously infected user sequence $\{(u_1^i, t_1^i), \ldots, (u_n^i, t_n^i)\}$, predict the next infected user $u_{n+1}^i$, where $n = 1, 2, \ldots, |c_i| - 1$.
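The problem definition above can be sketched as a small data-structure example. This is an illustrative sketch, not part of the patented method: the function name and cascade values are hypothetical, and it only shows how each prefix of a time-ordered cascade yields one next-user prediction target.

```python
# Hypothetical sketch of the cascade data structure: a cascade is a
# time-ordered list of (user, timestamp) tuples, and each prefix of
# length n yields one training example whose target is the (n+1)-th
# infected user.

def prefix_targets(cascade):
    """Yield (observed_prefix, next_user) pairs for one cascade."""
    # cascade: list of (user_id, timestamp), sorted by infection time
    assert all(t1 <= t2 for (_, t1), (_, t2) in zip(cascade, cascade[1:]))
    for n in range(1, len(cascade)):
        yield cascade[:n], cascade[n][0]

cascade = [("u1", 0.0), ("u5", 2.5), ("u3", 7.0)]
pairs = list(prefix_targets(cascade))
# each pair: (previously infected prefix, next infected user)
```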
2. Embedding layer
1) User embedding:

To capture user interests and dependencies under different topics, we encode users with an embedding matrix $V \in \mathbb{R}^{|U| \times K \times d}$, where $|U|$ is the number of users and $K$ and $d$ are the number of topics and the embedding dimension, respectively. Each user $u_j^i$ in a cascade sequence is embedded as $v_{u_j^i} = \{v_{u_j^i}^1, \ldots, v_{u_j^i}^K\}$, where $v_{u_j^i}^k \in \mathbb{R}^d$ is the user's embedding under the k-th topic.
2) Position embedding:

To use the cascade's infection-sequence information, we set a learnable position embedding $pos_j$ for each position $j$, where $pos_j$ is shared among all cascades.
3) Text embedding:

We encode the semantic information of the propagated text using the pre-trained language model BERT. To measure the topic similarity between a topic-specific user embedding and the text embedding, we convert the BERT-encoded text embedding $x_i$ into $y_i$ through a fully connected layer:

$y_i = W_x x_i + b_x \quad (1)$

where $W_x$ and $b_x$ are a weight matrix and a bias vector, respectively.
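Equation (1) is a single fully connected layer mapping the BERT output into the model's embedding space. A minimal numpy sketch, assuming a 768-dimensional BERT sentence embedding and an illustrative model dimension d = 64 (both sizes are assumptions, not fixed by the patent):

```python
# Sketch of Eq. (1): y_i = W_x x_i + b_x, projecting a (stand-in) BERT
# sentence embedding x_i into the d-dimensional embedding space.
import numpy as np

rng = np.random.default_rng(0)
d_bert, d = 768, 64                               # BERT hidden size, model dim
W_x = rng.standard_normal((d, d_bert)) * 0.02     # weight matrix W_x
b_x = np.zeros(d)                                 # bias vector b_x
x_i = rng.standard_normal(d_bert)                 # stand-in for a BERT output

y_i = W_x @ x_i + b_x                             # propagation text embedding
```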
3. Topic aware attention layer
In this section, we further encode the various context information into user representations, and then aggregate the user representations in conjunction with time-decay weights, generating a cascade representation for each topic.
3.1 User representation enhancement
We incorporate the topic context and the propagation history context into the multi-topic user representation, respectively. The propagation history context can be further decomposed into user dependence and position dependence. Inspired by the multi-head attention mechanism, we treat each topic as one specific attention head and execute the attention mechanism within each topic separately to extract the user and position dependencies.
1) Topic context

Based on the propagation text embedding $y_i$, we propose that under the k-th topic, a user's embedding is enhanced if it has higher similarity to the text embedding. Specifically, for each topic $k$ we calculate the cosine similarity between $v_{u_j^i}^k$ and $y_i$ and normalize it with a softmax function:

$s_{u_j^i}^k = \frac{\exp(\cos(v_{u_j^i}^k, y_i))}{\sum_{k'=1}^{K} \exp(\cos(v_{u_j^i}^{k'}, y_i))} \quad (2)$

where $k = 1, 2, \ldots, K$ and $s_{u_j^i}^k$ represents user $u_j^i$'s weight under the k-th topic. The topic-context-aggregated user embedding can then be expressed as $\tilde{v}_{u_j^i}^k = s_{u_j^i}^k v_{u_j^i}^k$. The greater the cosine similarity between the user embedding $v_{u_j^i}^k$ under the k-th topic and $y_i$, the greater the assigned weight, and the more the user embedding under that topic is enhanced.
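The topic-context step above can be sketched in a few lines of numpy. This is a hedged illustration of Eq. (2): per-topic cosine similarities between the user's topic embeddings and the text embedding, softmax-normalized over topics, then used to rescale the per-topic embeddings; all shapes are illustrative assumptions.

```python
# Sketch of Eq. (2): softmax over per-topic cosine similarities between
# the user's topic embeddings v_u (K, d) and the text embedding y_i (d,),
# producing topic weights s and enhanced embeddings s_k * v_u^k.
import numpy as np

def topic_context_weights(v_u, y_i):
    """v_u: (K, d) per-topic user embeddings; y_i: (d,) text embedding."""
    cos = (v_u @ y_i) / (np.linalg.norm(v_u, axis=1) * np.linalg.norm(y_i))
    e = np.exp(cos - cos.max())            # numerically stable softmax
    s = e / e.sum()                        # weights over the K topics
    return s, s[:, None] * v_u             # weights, enhanced embeddings

rng = np.random.default_rng(1)
K, d = 4, 8
s, v_tilde = topic_context_weights(rng.standard_normal((K, d)),
                                   rng.standard_normal(d))
```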
2) Propagation history context

Intuitively, besides the propagated text, a user's infection is typically caused by only a few previously infected users in the propagation sequence. The goal of the propagation history context is therefore to extract and characterize the users relevant to user $u_j^i$'s infection. Specifically, we employ an attention mechanism to model user dependence and assign larger attention weights to the users likely to affect the infection. Formally, the user-dependence attention score of user $u_j^i$ on a previously infected user $u_m^i$ can be calculated as

$a_{jm}^k = \frac{(W_Q^k \tilde{v}_{u_j^i}^k)^\top (W_K^k \tilde{v}_{u_m^i}^k)}{\sqrt{d}} \quad (3)$

where $W_Q^k$ and $W_K^k$ are topic-specific linear mappings for the target user and the previous user, respectively.

Intuitively, we should also pay attention to the source user and the most recently infected users. Such dependencies are independent of the particular user, so we propose to model position dependence under each topic. Unlike prior work that directly adds predefined position embeddings to user embeddings, we calculate a position-dependent score with a method similar to user-dependence modeling. In this way, our approach better captures user-independent position dependencies and obtains better predictive performance.

The complete attention score $\hat{a}_{jm}^k$ between users $u_j^i$ and $u_m^i$ and the weight $\alpha_{jm}^k$ describing the propagation history context are calculated as

$\hat{a}_{jm}^k = a_{jm}^k + p_{jm}^k, \qquad \alpha_{jm}^k = \frac{\exp(\hat{a}_{jm}^k)}{\sum_{m'=1}^{j} \exp(\hat{a}_{jm'}^k)} \quad (4)$

where $p_{jm}^k$ is the position-dependent score from position $m$ to position $j$.
3) Complete context-aware multi-topic user representations

To fully exploit the topic and propagation history contexts, we represent user $u_j^i$ under the k-th topic as a weighted sum of previously infected users:

$h_j^k = \sum_{m=1}^{j} \alpha_{jm}^k \tilde{v}_{u_m^i}^k \quad (5)$

Note that we can also stack multiple layers of the above operations to obtain a more accurate representation. In this case, the topic context weights $s_{u_j^i}^k$ and the position-dependent scores $p_{jm}^k$ are shared between the different layers.
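The per-topic attention of this subsection can be sketched under standard scaled dot-product assumptions. The exact score form, the value transform, and all shapes below are illustrative assumptions (the patent text specifies only topic-specific linear maps, an additive position score, softmax normalization, and a weighted sum); a causal mask restricting each user to earlier positions is omitted for brevity.

```python
# Sketch of the propagation-history attention for one topic k:
# topic-specific query/key maps score each pair of users (user
# dependence), a position-dependent score is added, and the
# softmax-weighted sum over users gives each user's representation.
import numpy as np

rng = np.random.default_rng(3)
n, d = 5, 8                          # infected users so far, embedding dim
V = rng.standard_normal((n, d))      # topic-k enhanced user embeddings
Wq = rng.standard_normal((d, d)) * 0.1   # assumed query map W_Q^k
Wk = rng.standard_normal((d, d)) * 0.1   # assumed key map W_K^k
pos_score = rng.standard_normal((n, n)) * 0.1  # position-dependent scores

q, k = V @ Wq.T, V @ Wk.T
scores = (q @ k.T) / np.sqrt(d) + pos_score    # user + position scores
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
H = weights @ V                      # weighted sum over the other users
```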
3.2 Obtaining a cascade representation based on time-decay aggregation
After extracting user representations under multiple topics, we need to aggregate them to obtain a cascade representation under each topic. We assume that a user's influence decays over time, and jointly consider the time decay and the propagation-dependence weights of equation (4).
1) Modeling of time-decay effects

Specifically, inspired by DeepHawkes, we adopt a non-parametric time-decay modeling strategy for each topic. Formally, given a cascade sequence of historical infections, we first convert the continuous time decay into discrete time intervals:

$I(t) = l \ \text{ if } \ t \in [t_{l-1}, t_l) \quad (7)$

where the boundaries $t_l$ are obtained by dividing the time range $[0, T_{max})$ into $L$ subintervals $\{[0, t_1), \ldots, [t_{L-1}, T_{max})\}$, and $T_{max}$ is the largest timestamp in the dataset. For each topic $k$, each time interval $l$ has a corresponding learnable weight $\lambda_l^k$.
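The discretization step can be sketched directly. One assumption for illustration: the L subintervals are equal-width (the patent only requires some partition of [0, T_max)); the per-topic weight table is shown uninitialized, as it would be learned during training.

```python
# Sketch of the time-decay discretization: split [0, T_max) into L
# subintervals, map each continuous time gap to its interval index, and
# keep one learnable weight per (topic, interval) pair.
import numpy as np

T_max, L, K = 100.0, 5, 3
edges = np.linspace(0.0, T_max, L + 1)   # [0, t_1), ..., [t_{L-1}, T_max)
lam = np.zeros((K, L))                   # learnable decay weights per topic

def interval_index(delta_t):
    """Map a continuous time gap in [0, T_max) to its interval index."""
    return min(int(np.searchsorted(edges, delta_t, side="right")) - 1, L - 1)

idx = interval_index(37.0)               # falls in [20, 40) -> index 1
```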
2) Computing cascade representations under multiple topics

The complete aggregation weight adds a time-decay term to the attention score of equation (4):

$\hat{\beta}_j^k = \hat{a}_{nj}^k + \lambda_{I(t_n^i - t_j^i)}^k \quad (8)$

Then $\hat{\beta}_j^k$, $j = 1, 2, \ldots, n$, is normalized by the softmax function into $\beta_j^k$. Finally, for each topic $k$, we compute the weighted sum $\sum_{j=1}^{n} \beta_j^k h_j^k$ and use a feedforward neural network with a ReLU activation function to impart nonlinearity to the model. The output of the topic-aware attention layer is the cascade representation, which can be expressed as $\{h_c^1, \ldots, h_c^K\}$.
3.3 Training objectives and model details

Given a cascade sequence, we parameterize the probability of the next infected user $u_{n+1}^i$ by measuring the similarity between user embeddings and the cascade embedding. The interaction probability between cascade $c_i$ and user $u$ can be expressed as

$p(u \mid c_i; \Theta) = \frac{\exp(\mathrm{sim}(v_u, h_{c_i}))}{\sum_{u' \in U} \exp(\mathrm{sim}(v_{u'}, h_{c_i}))} \quad (9)$

where $\Theta$ represents all parameters to be learned.

The training objective for predicting infected users can then be defined as

$\mathcal{L}_{pred}(\Theta) = -\sum_{c_i} \sum_{n=1}^{|c_i|-1} \log p(u_{n+1}^i \mid c_i; \Theta) \quad (10)$

In addition, we want each topic subspace to reflect different semantics, and the embeddings of different users under the same topic should be as similar as possible. We therefore set K topic prototype embeddings $m_1, \ldots, m_K$ and encourage the user embedding $v_u^k$ under the k-th topic to be similar to the corresponding prototype $m_k$. Formally, we aim to maximize

$J(u) = \sum_{k=1}^{K} \log \frac{\exp(\cos(v_u^k, m_k))}{\sum_{k'=1}^{K} \exp(\cos(v_u^k, m_{k'}))} \quad (11)$

We take this term as an additional training objective and sum it over all users:

$\mathcal{L}_{topic} = \sum_{u \in U} J(u) \quad (12)$

The complete training objective function is $\mathcal{L} = \mathcal{L}_{pred} - \eta \mathcal{L}_{topic}$, where $\eta$ is a balance coefficient. The parameters are optimized by gradient descent with the Adam optimizer. To avoid instability in the training process, we also apply layer normalization and dropout to the user embeddings.
In the present invention, we propose the model TAN to model topic-specific propagation dependencies. Specifically, we jointly model the textual content of the propagated item and the user propagation history sequence, and propose a topic-aware attention mechanism to capture the historical propagation dependencies and time-decay effects under different topics. Compared with traditional topic-aware models, TAN learns topics automatically and benefits from deep learning. Compared with current neural-network-based models, TAN not only models topic-specific propagation patterns effectively but also better captures user dependence and position dependence. Meanwhile, extracting topic information improves the interpretability of the model: alongside its predictions, the model can indicate who is more likely to forward an information item on a specific topic and, from the attention weights, which users influenced a user's forwarding operation.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may be modified or some technical features may be replaced with others, which may not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. An information propagation prediction method based on a topic-aware attention network is characterized by comprising the following steps:
s1, integrating a theme context and a propagation history context into a user representation for prediction;
s2, supporting propagation mode modeling aiming at a specific topic by the topic context, and further decomposing the propagation history context into user-dependent modeling and position-dependent modeling;
s3, constructing user representations under multiple topics by using the user context obtained by encoding; the encoding method in step S3 is:
theme context:
computing user embeddings for each topic kAnd propagate text embedding y i Cosine similarity between them, and normalize it with a softmax function:
where k=1, 2,..k, andrepresenting user +.>Weights under the kth topic; user embedding of the aggregate topic context is denoted +.>
Propagation history context:
in cascade sequenceIs +.>Is calculated from the following formula:
wherein the method comprises the steps ofFor subject-specific linear mapping of the target user and the previous user, respectively;
user' sIs->Complete attention fraction between->And weight->To describe the propagation history context, and is calculated by the following formula:
wherein the method comprises the steps ofA position dependent score from position m to position j;
complete context-aware multi-topic user representation:
users in the kth topicExpressed as a weighted sum of previously infected users:
weighting of topic contextsAnd position dependent score +.>Shared between different layers;
s4, further integrating user representations through a time attenuation aggregation module so as to obtain multi-theme cascade representations, and then predicting the next infected user;
wherein each module is driven by the characteristics of the information propagation.
2. The method for predicting information dissemination based on a topic-aware attention network according to claim 1, wherein the specific method of step S1 is:
given a set of users U, a set of cascades V and a set of propagated information M, the propagation sequence of the ith information item in M is defined as a cascade c_i = {(u_1^i, t_1^i), (u_2^i, t_2^i), ...}, wherein each tuple (u_j^i, t_j^i) represents that user u_j^i forwarded the information at time t_j^i, and the sequence is ordered by infection time; the propagation prediction task is defined as: given the cascade c_i's propagated text and the previously infected user sequence {(u_1^i, t_1^i), ..., (u_n^i, t_n^i)}, predict the next infected user u_{n+1}^i, where n = 1, 2, ..., |c_i| - 1.
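As a concrete illustration of the cascade structure defined in claim 2, the following Python sketch builds the time-ordered (user, timestamp) sequence and enumerates the prediction instances; the user IDs and timestamps are hypothetical.

```python
# Hypothetical cascade per claim 2: a time-ordered list of (user, timestamp)
# tuples; the user IDs and timestamps below are made up for illustration.
cascade = [("u1", 0.0), ("u2", 1.5), ("u3", 2.0), ("u4", 5.0)]

def prediction_instances(cascade):
    """Yield (observed prefix, next infected user) pairs for n = 1 .. |c_i|-1."""
    for n in range(1, len(cascade)):
        yield cascade[:n], cascade[n][0]

instances = list(prediction_instances(cascade))
# one instance per prefix length: a cascade of 4 users yields 3 instances
```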
3. The method for predicting information dissemination based on a topic-aware attention network according to claim 2, wherein the propagation pattern modeling in step S2 encodes the semantic information of the propagated text using the pre-trained language model BERT; specifically, the text embedding x_i produced by BERT is converted into the propagated text embedding y_i through a fully connected layer:
y_i = W_x x_i + b_x (1)
wherein W_x and b_x are a weight matrix and a bias vector, respectively.
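The fully connected projection of equation (1) can be sketched as follows; a minimal NumPy illustration in which the dimensions (768 for the BERT-base embedding, 64 for the target space) are assumptions, not fixed by the claim.

```python
import numpy as np

def project_text_embedding(x, W, b):
    """Equation (1): y_i = W_x x_i + b_x, mapping a BERT text embedding
    to the propagated text embedding used by the topic context."""
    return W @ x + b

rng = np.random.default_rng(0)
d_bert, d = 768, 64                    # assumed dimensions (768 = BERT-base)
x_i = rng.normal(size=d_bert)          # stand-in for a BERT sentence embedding
W_x = rng.normal(size=(d, d_bert)) * 0.01
b_x = np.zeros(d)
y_i = project_text_embedding(x_i, W_x, b_x)
```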
4. The method for predicting information dissemination based on a topic-aware attention network according to claim 3, wherein the user-dependence modeling in step S2 encodes the users with an embedding matrix U ∈ R^(|U|×K×d), wherein |U| represents the number of users, and K and d represent the number of topics and the embedding dimension, respectively.
5. The method for predicting information propagation based on a topic-aware attention network according to claim 4, wherein for each user u_j^i in a cascade sequence c_i, the user embedding is U_j = [u_j^1, u_j^2, ..., u_j^K], wherein u_j^k is the user's embedding under the kth topic.
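Combining the per-topic user embeddings of claim 5 with the topic-context weighting of claim 1 gives the following minimal NumPy sketch; all shapes and names are illustrative assumptions.

```python
import numpy as np

def topic_context(user_emb, text_emb):
    """Per-topic weights softmax_k(cos(u^k, y_i)) and the weighted
    aggregate of the per-topic user embeddings.

    user_emb: (K, d) array, u^1 ... u^K for one user.
    text_emb: (d,) array, propagated text embedding y_i.
    """
    # cosine similarity between each topic-specific embedding and the text
    cos = user_emb @ text_emb / (
        np.linalg.norm(user_emb, axis=1) * np.linalg.norm(text_emb) + 1e-12)
    e = np.exp(cos - cos.max())        # softmax over the K topics
    lam = e / e.sum()
    return lam, lam @ user_emb         # weights and aggregated embedding

rng = np.random.default_rng(1)
lam, agg = topic_context(rng.normal(size=(4, 8)), rng.normal(size=8))
```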
6. The method of claim 5, wherein the position-dependence modeling in step S2 sets a learnable position embedding pos_j for each position j, wherein pos_j is shared among all cascades.
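The embedding tables of claims 4 to 6 can be sketched as plain NumPy arrays; all sizes below (100 users, K = 4 topics, d = 8 dimensions, maximum cascade length 50) are illustrative assumptions.

```python
import numpy as np

# One d-dimensional vector per (user, topic) pair, plus one learnable
# position embedding per position, shared across all cascades.
num_users, K, d, max_len = 100, 4, 8, 50
rng = np.random.default_rng(2)
user_table = rng.normal(size=(num_users, K, d))   # U in R^(|U| x K x d)
pos_table = rng.normal(size=(max_len, d))         # pos_j, shared by cascades

def lookup(user_ids):
    """Per-topic embeddings U_j = [u_j^1, ..., u_j^K] for each cascade user."""
    return user_table[np.asarray(user_ids)]       # shape (n, K, d)

emb = lookup([3, 17, 42])
```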
7. The method for predicting information dissemination based on a topic-aware attention network according to claim 1, wherein the modeling method of the time-decay aggregation module in step S4 is:
converting the continuous time decay into discrete time intervals: each infection timestamp t is mapped to the index l of the subinterval [t_{l-1}, t_l) containing it,
wherein the boundaries t_l are obtained by dividing the time range [0, T_max) into L subintervals {[0, t_1), ..., [t_{L-1}, T_max)}, where T_max is the largest timestamp in the dataset; each time interval has a corresponding learnable weight θ_l^k for each topic k.
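A minimal sketch of the time discretization in claim 7; equal-width subintervals are an assumption here, since the claim only requires some fixed partition of [0, T_max).

```python
import numpy as np

def time_interval_index(t, t_max, L):
    """Map an elapsed time t in [0, T_max) to one of L subintervals
    [0, t_1), ..., [t_{L-1}, T_max); the model keeps a learnable per-topic
    weight for each interval."""
    edges = np.linspace(0.0, t_max, L + 1)
    # index of the subinterval containing t, clipped into [0, L-1]
    return int(np.clip(np.searchsorted(edges, t, side="right") - 1, 0, L - 1))

idx = time_interval_index(2.5, t_max=10.0, L=5)   # falls in [2, 4) -> index 1
```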
8. The method for predicting information dissemination based on a topic-aware attention network according to claim 7, wherein the method for obtaining the multi-topic cascade representation in step S4 is:
the complete aggregation weights are calculated according to equation (8);
the weights are then normalized over j = 1, 2, ..., n by a softmax function;
for each topic k, the weighted sum h^k of the user representations is calculated, and a feedforward neural network with a ReLU activation function is used to give the model nonlinearity; the output of the topic-aware attention layer is the cascade representation, expressed as H = [h^1, h^2, ..., h^K].
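The per-topic aggregation of claim 8 can be sketched as follows in NumPy; the exact aggregation scores of equation (8) are taken here as a given input vector, and the layer sizes are assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def aggregate_topic(H_k, scores, W1, b1, W2, b2):
    """For one topic k: normalize the aggregation scores over the n infected
    users with softmax, take the weighted sum of their representations, then
    apply a two-layer feed-forward network with a ReLU activation."""
    w = softmax(scores)                          # weights over users j = 1..n
    h = w @ H_k                                  # weighted sum, shape (d,)
    return W2 @ np.maximum(W1 @ h + b1, 0.0) + b2

rng = np.random.default_rng(3)
n, d, hidden = 5, 8, 16
h_k = aggregate_topic(rng.normal(size=(n, d)), rng.normal(size=n),
                      rng.normal(size=(hidden, d)), rng.normal(size=hidden),
                      rng.normal(size=(d, hidden)), rng.normal(size=d))
```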
9. The method for predicting information dissemination based on a topic-aware attention network according to claim 1, wherein the method for predicting the next infected user in step S4 is:
given a cascade sequence c_i, the probability of the next infected user u_{n+1}^i is parameterized by measuring the similarity between the user embedding and the cascade embedding; the interaction probability of cascade c_i with a candidate user u is expressed as:
p(u | c_i; Θ) = softmax_u(sim(H_{c_i}, u))
wherein Θ represents all parameters to be learned;
the training goal for predicting an infected user is defined as equation (10), which maximizes the log-likelihood of the observed infected users over all cascades;
K topic prototype embeddings m_1, ..., m_K are set, and the user embedding u_j^k under the kth topic is encouraged to be similar to the corresponding prototype m_k; the goal is to maximize:
exp(cos(u_j^k, m_k)) / Σ_{k'=1}^{K} exp(cos(u_j^k, m_{k'}))
this term is taken as an additional training target L_proto and summed over all users;
the complete training objective function is L = L_pred + η L_proto, where η is the balance coefficient.
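A minimal NumPy sketch of the prediction step of claim 9; the inner-product similarity is an assumption, since the claim only fixes that a similarity between user and cascade embeddings is normalized into a probability over candidate users.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def next_user_probs(cascade_emb, all_user_embs):
    """Score every candidate user against the cascade embedding and normalize
    the scores into a probability distribution over the next infected user."""
    return softmax(all_user_embs @ cascade_emb)

def nll_loss(probs, target):
    """Negative log-likelihood of the true next infected user."""
    return -np.log(probs[target] + 1e-12)

rng = np.random.default_rng(4)
p = next_user_probs(rng.normal(size=8), rng.normal(size=(20, 8)))
loss = nll_loss(p, target=7)
```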
CN202111049168.8A 2021-09-08 2021-09-08 Information propagation prediction method based on topic perception attention network Active CN113688600B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111049168.8A CN113688600B (en) 2021-09-08 2021-09-08 Information propagation prediction method based on topic perception attention network


Publications (2)

Publication Number Publication Date
CN113688600A CN113688600A (en) 2021-11-23
CN113688600B true CN113688600B (en) 2023-07-28

Family

ID=78585637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111049168.8A Active CN113688600B (en) 2021-09-08 2021-09-08 Information propagation prediction method based on topic perception attention network

Country Status (1)

Country Link
CN (1) CN113688600B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114519606A (en) * 2022-01-29 2022-05-20 北京京东尚科信息技术有限公司 Information propagation effect prediction method and device
CN114334159B (en) * 2022-03-16 2022-06-17 四川大学华西医院 Postoperative risk prediction natural language data enhancement model and method
CN116955846B (en) * 2023-07-20 2024-04-16 重庆理工大学 Cascade information propagation prediction method integrating theme characteristics and cross attention

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2019100854A4 (en) * 2019-08-02 2019-09-05 Xi’an University of Technology Long-term trend prediction method based on network hotspot single-peak topic propagation model
CN112182423A (en) * 2020-10-14 2021-01-05 重庆邮电大学 Information propagation evolution trend prediction method based on attention mechanism
CN112380427A (en) * 2020-10-27 2021-02-19 中国科学院信息工程研究所 User interest prediction method based on iterative graph attention network and electronic device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180329884A1 (en) * 2017-05-12 2018-11-15 Rsvp Technologies Inc. Neural contextual conversation learning


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Hierarchical attention-based information cascade prediction model; Zhang Zhiyang; Zhang Fengli; Chen Xueqin; Wang Ruijin; Computer Science (06); full text *

Also Published As

Publication number Publication date
CN113688600A (en) 2021-11-23

Similar Documents

Publication Publication Date Title
CN113688600B (en) Information propagation prediction method based on topic perception attention network
Yi et al. Deep matrix factorization with implicit feedback embedding for recommendation system
Ji et al. Learning private neural language modeling with attentive aggregation
Sengupta et al. Going deeper in spiking neural networks: VGG and residual architectures
Ren et al. A sentiment-aware deep learning approach for personality detection from text
Kim et al. Revisiting batch normalization for training low-latency deep spiking neural networks from scratch
CN110929164B (en) Point-of-interest recommendation method based on user dynamic preference and attention mechanism
Liang et al. Exploring adversarial attack in spiking neural networks with spike-compatible gradient
Zhang et al. Multi-layer attention based CNN for target-dependent sentiment classification
Lv et al. Intrusion prediction with system-call sequence-to-sequence model
CN113344184B (en) User portrait prediction method, device, terminal and computer readable storage medium
Chen et al. Tracking dynamics of opinion behaviors with a content-based sequential opinion influence model
CN115168443A (en) Anomaly detection method and system based on GCN-LSTM and attention mechanism
Hu et al. Self-attention-based temporary curiosity in reinforcement learning exploration
Xie et al. Micro-video popularity prediction via multimodal variational information bottleneck
Jing et al. Relational graph neural network for situation recognition
Song et al. Augmenting recurrent neural networks with high-order user-contextual preference for session-based recommendation
Qi et al. A convolutional neural network face recognition method based on BILSTM and attention mechanism
US9875443B2 (en) Unified attractiveness prediction framework based on content impact factor
Sun et al. Tcsa-net: a temporal-context-based self-attention network for next location prediction
CN113190733B (en) Network event popularity prediction method and system based on multiple platforms
CN114580738B (en) Social media crisis event prediction method and system
An et al. Multiuser behavior recognition module based on DC-DMN
Chen et al. Poster: Membership Inference Attacks via Contrastive Learning
Chu et al. Joint Variational Inference Network for domain generalization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant