CN115409155A - Information cascade prediction system and method based on Transformer-enhanced Hawkes process - Google Patents

Information cascade prediction system and method based on Transformer-enhanced Hawkes process

Info

Publication number
CN115409155A
Authority
CN
China
Prior art keywords
node
embedding
user
global
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210983826.9A
Other languages
Chinese (zh)
Inventor
周帆
余柳
钟婷
徐增
匡平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202210983826.9A
Publication of CN115409155A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q 30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q 50/01 Social networking

Abstract

The invention discloses an information cascade prediction system and method based on a Transformer-enhanced Hawkes process, comprising a user embedding module, a global dependency module, a local dependency module, an intensity function acquisition module and a popularity prediction module. The position-wise embedding and temporal encoding of each user are first acquired and used as the user embedding; a path-aware assumption is then introduced from the viewpoint of the topological structure, and two attention layers are designed from the two perspectives of global embedding and local patterns to parameterize the intensity function of the Hawkes process; combining the Hawkes process, the global embedding and the local patterns, the coupled temporal and topological stochastic characteristics of the information cascade diffusion process are learned so as to predict popularity. The method extends the traditional Hawkes process, effectively acquires knowledge from the continuous time domain, and improves popularity prediction accuracy.

Description

Information cascade prediction system and method based on Transformer-enhanced Hawkes process
Technical Field
The invention belongs to the technical field of information processing, relates to Information Diffusion, Information Cascades and Popularity Prediction based on Deep Learning, and particularly relates to a method and system (Hawkesformer) that introduce the Transformer into the Hawkes Process to jointly model the temporal characteristics and the topological structure of a cascade and finally realize popularity (cascade) prediction.
Background
Content sharing through social media platforms such as Twitter, Facebook and Weibo has become a main channel for individuals to express and react to opinions on various topics. This trend has produced a large amount of data describing the manner in which information is disseminated, and has brought unprecedented convenience to generating, communicating, and spreading information items. The initial posting of an item of information (e.g., a tweet) and its subsequent sharing and forwarding (e.g., retweets) form an information cascade, representing what is commonly referred to as the information dissemination process. Similar phenomena are found in other (non-social-media) settings, such as paper citations, the blogosphere, and email forwarding. The large number of user sharing behaviors promotes the rapid and massive spread of information cascades. A typical task on information cascades is to predict the scale of potentially affected users of an item (tweet, microblog, etc.) after a certain period of time, i.e., popularity prediction (the size of the information cascade), which brings enormous economic and social value in many downstream applications. For example, knowing which type of advertisement has the maximum influence can make advertisement placement more accurate; predicting the users potentially affected by a rumor allows administrators to intervene early and avoid serious consequences, and so on.
Existing methods mainly fall into three categories: (i) feature-based models, which mainly explore feature sets extracted from structures, time series, user content and profiles; however, they rely on extensive hand-crafted feature engineering and cannot be generalized from one domain to another; (ii) probabilistic generative models, which aim at modeling events with classical temporal point processes (TPPs), including Poisson and Hawkes processes; however, existing methods make strong assumptions about the cascade diffusion mechanism and have limited learning capacity on large-scale cascade data; (iii) deep learning models, which focus more on static cascade diffusion at specific snapshots in the discrete time domain, pay little attention to the whole evolution process and to fully exploiting all cascade dynamics, and mainly use simple event-sequence models (such as RNNs) to model cascade events or, aided by GNNs, model temporal and topological characteristics separately; but the two are coupled in nature, and separate modeling loses cross-domain information and reduces the expressive and predictive power of the model.
Disclosure of Invention
Aiming at the problems of limited learning capacity, cross-domain information loss and the like of current information cascade prediction methods, the invention aims to provide a novel modeling mode for the cascade diffusion process: the cascade diffusion process is generalized into a Directed Acyclic Graph (DAG) that diffuses continuously in the continuous time domain, and the Transformer (attention mechanism) is introduced into the Hawkes Process to model the temporal and topological characteristics in the continuous time domain in a coupled manner, realizing effective modeling of the cascade diffusion process and improving the accuracy of popularity prediction.
The general idea of the invention is as follows: two levels of attention modules are connected together to parameterize the intensity function of the Hawkes process. The intensity function ensures that the coupled temporal and topological dependencies of the cascade are modeled in the continuous time domain. Specifically, at the first level, a global dependency module is designed to dynamically capture the long-term diffusion process of events that have occurred in the past in the cascade, and a main-path/non-main-path assumption is proposed to adaptively integrate the diffusion process onto the underlying DAG. This allows each node participating in the cascade, wherever it is located, to update its current hidden state. At the second level, a local module is designed to capture the short-term evolution rate of the information cascade by encoding local patterns within a fixed time-slice window. Through the design of these two levels, global and local representations of the nodes are obtained and finally fed into a fully connected layer for popularity prediction.
Before the detailed flow is given, some basic terms and definitions are introduced. Given an information cascade C_k, its forwarding history within the observation time t_o is denoted as the sequence of triplets {(t_j, v_j, u_j)}, j = 1, ..., L, where t_j ∈ (0, t_o], L is the number of forwards within the observation window (t_0, t_0 + t_o), and t_0 denotes the start time. Each triplet (t_j, v_j, u_j) means that at time t_j, user u_j forwards user v_j. For example, the forwarding history of FIG. 1 includes, among others, (t_4, u_1, u_4) and (t_5, u_3, u_5). For simplicity, the observation-time superscript is omitted in the following. In addition, H_{t_j} is used to denote the set of timestamps before the current time t_j.
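As an illustration only, the following minimal Python sketch shows one possible in-memory representation of such a forwarding history; the class, field names and timestamps are hypothetical and not taken from the patent.

    from typing import List, NamedTuple

    class Forward(NamedTuple):
        t: float  # forwarding timestamp t_j, with t_j in (0, t_o]
        v: int    # forwarded (source) user v_j
        u: int    # forwarding user u_j

    def timestamps_before(cascade: List[Forward], t: float) -> List[float]:
        """The set of timestamps strictly before the current time t."""
        return [fwd.t for fwd in cascade if fwd.t < t]

    # Hypothetical toy cascade in the spirit of FIG. 1 (values are illustrative only).
    cascade = [Forward(1.0, 0, 1), Forward(2.5, 0, 2), Forward(3.0, 2, 3),
               Forward(4.2, 1, 4), Forward(5.1, 3, 5)]
    print(timestamps_before(cascade, 4.2))  # [1.0, 2.5, 3.0]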
Based on the above inventive concept, the information cascade prediction system based on the Transformer-enhanced Hawkes process provided by the invention comprises:
a user embedding module (Initial User Embeddings), used for acquiring the position-wise embedding and temporal encoding of the users and taking them together as the user embedding;
a global dependency module (Global Dependency Module), used for distinguishing, for each user node in the user embedding, all predecessor nodes into main-path nodes and non-main-path nodes, obtaining the attention probability matrix of the main-path and non-main-path nodes, further obtaining the global dependency of each node, and obtaining the global embedding of all nodes through a fully connected layer;
a local dependency module (Local Dependency Module), used for obtaining the local pattern of each node according to a set time-slice window, then performing a pooling operation on the local pattern of each node to obtain a pooled representation of each node, and finally generating the hidden vector of each node's local pattern based on a masked attention mechanism;
an intensity function acquisition module, used for parameterizing the intensity function of the Hawkes process according to the global embedding vectors and the local pattern hidden vectors;
and a popularity prediction module, used for generating the popularity prediction result by combining the integral of the Hawkes process with the result of fully connected processing of the global embedding and local pattern hidden vectors.
In the user embedding module, the user embedding X is composed of the position-wise embedding and the temporal encoding, and serves as the input of the subsequent (global dependency) module.
The user embedding module comprises a position-wise embedding acquisition submodule and a temporal encoding acquisition submodule.
1. The position-wise embedding acquisition submodule contains a shared matrix U ∈ R^{D×L}, where D is the user embedding dimension and L is the maximum number of hops. The j-th column of U represents the j-th forwarding user, and the nodes of every cascade that occupy the same order on the time axis share an initial embedding. The shared matrix U is first randomly initialized, and the subsequent context embeddings related to the underlying cascade are then dynamically updated by the following global dependency module. The shared matrix U greatly reduces the computational overhead and makes the system more efficient. In a cascade, for the j-th forwarding user u_j, let v_j denote its one-hot index vector on the shared matrix U; the position-wise embedding of user u_j is then obtained by the lookup U v_j, where V = {v_1, ..., v_L} denotes the set of one-hot indices of all user embeddings. Therefore, based on the shared matrix U, the position-wise embedding of the users produced by the position-wise embedding acquisition submodule is UV.
2. The temporal encoding acquisition submodule (Temporal Encoding). A key component of the present invention is a two-level attention module; since the attention mechanism does not by itself consider sequential order, the temporal encoding acquisition submodule characterizes the sequence position with a temporal encoding z, defined as follows:

[z(t_j)]_i = cos(t_j / 10000^((i-1)/D)) if i is odd,  [z(t_j)]_i = sin(t_j / 10000^(i/D)) if i is even

The temporal encoding is a variant of the original Transformer absolute position encoding in which the discrete position index is replaced by the timestamp. Here z(t_j) ∈ R^D is the temporal encoding, D is the dimension of the temporal encoding, [z(t_j)]_i denotes the value of the i-th dimension of the encoding, i ∈ [1, D], and Z = {z(t_1), ..., z(t_L)} is the set of temporal encodings.
Thus, the initial user embedding of the forwarding sequence can finally be written as X = UV + Z. The initial user embedding X consists of the learnable shared position-wise embedding UV together with the fixed temporal encoding Z.
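A minimal sketch of how the initial user embedding X = UV + Z could be assembled, assuming numpy, a randomly initialized shared matrix U, and the standard sinusoidal form of the timestamp encoding (the exact encoding variant used by the invention may differ):

    import numpy as np

    D, L = 64, 100                       # embedding dimension, maximum cascade length
    rng = np.random.default_rng(0)
    U = rng.normal(size=(D, L))          # shared position-wise embedding matrix (learnable)

    def temporal_encoding(timestamps: np.ndarray, D: int) -> np.ndarray:
        """Sinusoidal temporal encoding z(t_j) for each timestamp; returns a D x n matrix."""
        Z = np.zeros((D, len(timestamps)))
        for i in range(D):
            denom = 10000 ** (((i // 2) * 2) / D)
            Z[i] = np.sin(timestamps / denom) if i % 2 == 0 else np.cos(timestamps / denom)
        return Z

    timestamps = np.array([1.0, 2.5, 3.0, 4.2, 5.1])   # t_1 ... t_n of one cascade
    V = np.eye(L)[:, :len(timestamps)]                 # one-hot index vectors v_j as columns
    X = U @ V + temporal_encoding(timestamps, D)       # initial user embedding X = UV + Z
    print(X.shape)                                     # (64, 5)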
The global dependency module is referred to as the global module for short. The global dependency of an information cascade can be seen as an indication that a certain user may promote or suppress another user during long-term diffusion.
The global dependency module includes:
a path-aware attention layer, used for distinguishing, for each user node in the user embedding, all predecessor nodes into main-path nodes and non-main-path nodes, obtaining the long-term-dependency attention scores of the main-path and non-main-path nodes, constructing the attention probability matrix of the main and non-main paths, and obtaining the global dependency of each node from the attention probability matrix of its main and non-main paths;
and a first output network, used for applying fully connected processing to the global dependency of each node to obtain the global embedding of all nodes.
After the user embedding is obtained, the path-aware attention layer designed in the global module parameterizes the intensity function of the Hawkesformer, captures the long-term diffusion dependencies between the input cascade nodes, and extracts the coupled temporal-topological dependencies of the main-path and non-main-path nodes.
The theory of the Hawkes process states that the final popularity of an information cascade is related to the past participation of users, which means that the current node affects not only its immediate forwarders but also, by transitivity, non-immediate forwarders. In order to characterize, from a topological point of view, the influence of each node on the future popularity of the cascade, a path-aware assumption, i.e. a main-path/non-main-path assumption, is proposed in the global module: a node on the main path is considered to contribute more to the current node u_j, while a node on a non-main path contributes less to the arrival of u_j but still influences the final popularity, which also follows the assumptions of the Hawkes process. For the current node u_j, all predecessor nodes are divided into two categories: (1) nodes on the main path, denoted P_j, which comprises all previous nodes on the forwarding path traced back from the current node u_j to the root node u_0; (2) nodes on non-main paths, i.e. all predecessor nodes except the main-path nodes, denoted N_j. An example is given in FIG. 2, where for the current node u_4 the nodes on the main path are marked (1) and (2), and the node on the non-main path is marked (3).
Based on the above explanation, the implementation steps of the global dependency module are as follows:
given a current node u j To obtain a node between it and a main path
Figure BDA0003801224340000045
Long term dependent attention score of (c):
Figure BDA0003801224340000046
wherein a, b and
Figure BDA0003801224340000051
respectively representing user embedded matrix X main path Q (Query) matrix
Figure BDA0003801224340000052
Non-main path Q (Query) matrix
Figure BDA0003801224340000053
And K (Key) matrix
Figure BDA0003801224340000054
The transformed users embed vectors, and the three transformation matrices are used for distinguishing roles of the users in long-term dependence. Wherein the content of the first and second substances,<·,·>representing the inner product function. Similarly, current node u j A node in the non-main path
Figure BDA0003801224340000055
Long term dependence attention scores of (a) are:
Figure BDA0003801224340000056
then, attention weighting and calculation are performed through the following formula to obtain the user u j Complete diffusion dependence e j
Figure BDA0003801224340000057
Finally, the context embedding of each node is obtained according to the attention scores of the primary path and the non-primary path of each node: e = { E = { E) 0 ,e 1 ,…,e j ,…,e L Each line in E represents the global dependency of a certain node in the cascade;
Figure BDA0003801224340000058
embedding matrix X via V (Value) matrix on behalf of user
Figure BDA0003801224340000059
The converted user embeds the vector.
The above describes the computation for a single current user u_j. In practice, the above dependency attention is computed in parallel in matrix form to speed up the operation. The matrix computation flow of the global dependency is shown in FIG. 3.
First, based on the path-aware attention layer, the attention scores of the current nodes with respect to the main path and the non-main paths are calculated in matrix form as:

Â^P = (W^K X)^T (W^{Q1} X),  Â^N = (W^K X)^T (W^{Q2} X)

where M_P and M_N are one-hot mask matrices that keep the attention scores of the nodes on the main path and on the non-main paths, respectively: if u_i ∈ P_j then (M_P)_{i,j} = 1, and correspondingly, if u_i ∈ N_j then (M_N)_{i,j} = 1. Finally, the attention probability matrix of the main and non-main paths can be derived as:

A = softmax((M_P ⊙ Â^P + M_N ⊙ Â^N) / √D + M)

where D is the dimension of the input embedding and ⊙ denotes element-wise multiplication. The mask matrix M ∈ R^{L×L} prevents the system from peeking into future data: if i < j, M_{i,j} = 0, otherwise M_{i,j} = -∞. The matrix M forces the softmax function to distribute valid long-term-dependency attention only over the forwarders preceding u_j, and assigns 0 to subsequent users (i.e., users active later than t_j). Each column vector of matrix A represents the attention of u_j to its predecessor nodes, including the nodes on the main path P_j and the nodes on the non-main paths N_j.
Then, the matrix form of the global dependencies of all nodes is obtained according to the following formula:

E = (W^V X) A
In order to learn richer semantic information, the path-aware attention layer can be designed as a module based on the multi-head attention mechanism: the context embeddings are mapped into different subspaces, each subspace uses the matrix-parallel computation given above, several heads E_1, E_2, ..., E_H are computed, and they are then concatenated as E = W^O Concat(E_1, E_2, ..., E_H), where W^O is an aggregation matrix. Finally, E is fed into the first output network (here a fully connected neural network, FCNN) to obtain the global embedding of all cascade nodes:

H = FCNN(E),  h(t_j) = H(:, j)

where the weights and biases of the FCNN are parameters of the neural network, and h(t_j), the j-th column of H, represents the long-term-dependency representation of user u_j.
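A self-contained numpy sketch of the path-aware masked attention described above, under several assumptions: a single attention head, random weights, a toy cascade DAG given by a parent array, and the softmax combination of main-path and non-main-path scores written out as in the formula above. It is meant to illustrate the mechanics, not to reproduce the patented implementation.

    import numpy as np

    def softmax(x, axis=0):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def main_path_mask(parents):
        """M_P[i, j] = 1 if node i lies on the path from the root to node j (including j)."""
        n = len(parents)
        M_P = np.zeros((n, n))
        for j in range(n):
            i = j
            while i != -1:            # walk back to the root
                M_P[i, j] = 1.0
                i = parents[i]
        return M_P

    D = 8
    parents = [-1, 0, 0, 2, 1, 3]     # toy cascade DAG: parent of each node (-1 marks the root)
    n = len(parents)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(D, n))       # user embeddings, one column per node (time-ordered)
    WQ1, WQ2, WK, WV = (rng.normal(size=(D, D)) for _ in range(4))

    M_P = main_path_mask(parents)
    causal = np.tril(np.ones((n, n)))               # predecessors (and the node itself)
    M_N = causal - M_P                              # non-main-path predecessors
    M = np.where(causal > 0, 0.0, -np.inf)          # forbid attention to future nodes

    A_P = (WK @ X).T @ (WQ1 @ X)                    # main-path attention scores
    A_N = (WK @ X).T @ (WQ2 @ X)                    # non-main-path attention scores
    A = softmax((M_P * A_P + M_N * A_N) / np.sqrt(D) + M, axis=0)
    E = (WV @ X) @ A                                # global dependencies, one column per node
    print(E.shape)                                  # (8, 6)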
The local dependency module may be referred to as the local module for short. The growth rate of the forwarding count in an information cascade is not stable, and the final popularity is strongly influenced by short-term bursts during propagation. To better embed the evolution rate of the information cascade, pooled attention is used here to capture local patterns. A pooling operation is performed on the local pattern of each current node based on the user global embedding representations learned in the global dependency module.
The local dependency module includes:
a local pattern acquisition submodule, used for extracting, from the global dependencies and according to a set time-slice window, the part preceding each node, to obtain the local pattern of each node;
a pooling submodule, used for performing a pooling operation on the local pattern of each node to obtain the pooled representation of each node;
and a masked attention layer, used for dynamically aggregating the local pattern pooled representations of each node and of the nodes before it, and generating the hidden vector of each node's local pattern.
Each current node has a local pattern. A fixed time-slice window t_s is set, and the local pattern acquisition submodule extracts, from the global dependencies, the part preceding each node within this window to obtain the local pattern of that node; thus the local pattern of the current node consists of the global dependencies of the current node u_j and of some of its predecessor nodes, and together these local patterns form the set of all local patterns of an information cascade.
For each current node u_j, the pooling submodule performs a pooling operation on its local pattern, aggregating the features within the time slice t_s and generating a new representation p(t_j). Therefore, the overall pooled representation of all local patterns can be formalized as the matrix P = [p(t_0), p(t_1), ..., p(t_L)], where each column corresponds to the pooled representation of one local pattern.
To capture the evolution rate associated with short-term bursts/falls of the cascade, the invention fuses the influence of the local patterns through a designed masked attention layer. Based on the masked attention mechanism, the masked attention layer dynamically aggregates all local pattern embeddings (i.e., pooled representations) before the current time t_j and generates a hidden vector that summarizes the evolution of all previous patterns:

v(t_j) = Σ_{t_i ≤ t_j} f(p(t_j), p(t_i)) · g(p(t_i))

where g(·) is a linear transformation function and γ = {v(t_0), v(t_1), ..., v(t_j), ..., v(t_L)} is the set of local dependencies of the evolution rate. The similarity function f(·,·) is specified as a softmax-normalized similarity between the pooled representation of the current node and those of its preceding nodes.
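The following sketch illustrates the local dependency module under explicit assumptions: mean pooling inside the time slice, a dot-product softmax as the similarity function f, and the identity map as the linear transformation g. None of these concrete choices is prescribed by the text above; they are stand-ins for illustration.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def local_dependencies(H, timestamps, t_s):
        """H: global embeddings (D x n), one column per node; returns hidden vectors v(t_j) as columns."""
        D, n = H.shape
        P = np.zeros_like(H)                              # pooled local-pattern representations p(t_j)
        for j in range(n):
            window = [i for i in range(j + 1) if timestamps[j] - timestamps[i] <= t_s]
            P[:, j] = H[:, window].mean(axis=1)           # mean pooling over the time slice t_s
        V = np.zeros_like(H)                              # hidden vectors v(t_j)
        for j in range(n):
            scores = np.array([P[:, j] @ P[:, i] for i in range(j + 1)])
            weights = softmax(scores / np.sqrt(D))        # masked attention over previous pooled patterns
            V[:, j] = sum(w * P[:, i] for i, w in enumerate(weights))
        return V

    rng = np.random.default_rng(0)
    H = rng.normal(size=(8, 6))
    v = local_dependencies(H, timestamps=np.array([0.0, 1.0, 1.2, 3.0, 3.1, 3.2]), t_s=1.0)
    print(v.shape)  # (8, 6)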
the intensity function obtaining module parameterizes the intensity function of the hokes process through the global module and the local module to obtain the intensity function lambda (t) of the Hawkesformer. Given a concatenation of information C k And it is at observation time t o Forwarding history within
Figure BDA0003801224340000074
Representing a time stamp t j Before but not including time t j The sequence of (a). Here, the intensity function obtaining module models the continuous dynamics of the cascade diffusion using the following conditional intensity functions:
Figure BDA0003801224340000075
wherein, theta represents the parameter of the model, and the forwarding time t is defined in the interval [ t ] j ,t j+1 ) Where f (x) = β lg (1 + exp (x/β)) is the softplus function, and the parameter β is used to constrain the strength function to be positive; the first current represents the evolution process of the continuous time domain, and alpha controls the importance degree of interpolation nodes in two timestamps; w is a 1 、w 2 The learning weights of the global dependency and the local mode at the time t are respectively, and the strength function lambda (t) represents the inferred text arrival rate at the time t. Both global and local dependencies of the Hawkesformer have expressive power and flexibility, and the design of the intensity function enables the Hawkesformer to capture coupled spatio-temporal dynamics from the continuous time domain.
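A small sketch of the conditional intensity function above, assuming numpy, scalar α and β, and randomly chosen weight vectors w1 and w2; in a trained system these would be learned parameters.

    import numpy as np

    def softplus(x, beta=1.0):
        return beta * np.log1p(np.exp(x / beta))   # f(x) = beta * log(1 + exp(x / beta))

    def intensity(t, t_j, h_tj, v_tj, alpha, w1, w2, beta=1.0):
        """lambda(t) for t in [t_j, t_{j+1}): inferred retweet arrival rate at time t."""
        raw = alpha * (t - t_j) / t_j + w1 @ h_tj + w2 @ v_tj
        return softplus(raw, beta)

    rng = np.random.default_rng(0)
    D = 8
    h_tj, v_tj = rng.normal(size=D), rng.normal(size=D)   # global / local representations at t_j
    w1, w2 = rng.normal(size=D), rng.normal(size=D)
    print(intensity(t=2.3, t_j=2.0, h_tj=h_tj, v_tj=v_tj, alpha=0.5, w1=w1, w2=w2))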
The popularity prediction module is mainly used for predicting the popularity of the information. According to Hawkes theory, the survival function at time t is:

S(t | H_{t_n}) = exp( - ∫_{t_n}^{t} λ(τ) dτ )

This function represents the conditional probability that no event occurs in (t_n, t]. Then, the probability of observing an event at any time t' can be deduced:

p(t') = λ(t') S(t' | H_{t_n})

Finally, the final popularity P̂ is predicted by the popularity prediction module, combining the temporal point process with the outputs of the two levels of attention modules (the global module and the local module), according to the following formula:

P̂ = Λ + FCNN(Concat(h(t_o), v(t_o))),  with  Λ = ∫_{t_o}^{t_p} λ(t) dt

The first term Λ represents the integral of the Hawkes process, i.e., the expected magnitude of the popularity; t_o denotes the observation time and t_p the prediction time; the second term is the output of the concatenation of the global embedding and the local pattern hidden vector after passing through the second output network (here a fully connected neural network, FCNN).
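A sketch of the popularity prediction step: numerically integrate the intensity over the prediction horizon and add the output of a fully connected mapping applied to the concatenated global and local representations. The trapezoidal integration, the toy stand-in intensity, and the single linear layer are illustrative assumptions, not the prescribed implementation.

    import numpy as np

    def predicted_popularity(intensity_fn, t_o, t_p, h_to, v_to, W_fc, b_fc=0.0, steps=200):
        ts = np.linspace(t_o, t_p, steps)
        vals = np.array([intensity_fn(t) for t in ts])
        Lambda = np.sum((vals[:-1] + vals[1:]) / 2 * np.diff(ts))     # trapezoidal integral of the intensity
        fc_out = float(W_fc @ np.concatenate([h_to, v_to]) + b_fc)    # second output network (one linear layer here)
        return Lambda + fc_out

    rng = np.random.default_rng(0)
    D = 8
    h_to, v_to = rng.normal(size=D), rng.normal(size=D)    # global / local representations at observation time t_o
    W_fc = rng.normal(size=2 * D)
    toy_intensity = lambda t: 2.0 * np.exp(-0.1 * t)       # stand-in for lambda(t) from the previous sketch
    print(predicted_popularity(toy_intensity, t_o=1.0, t_p=24.0, h_to=h_to, v_to=v_to, W_fc=W_fc))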
The information cascade prediction system based on the Transformer-enhanced Hawkes process also comprises an optimization module, used for optimizing the parameters during system training.
The joint likelihood of the forwarding times {t_1, ..., t_L} of a cascade C_k can be expressed as:

p({t_j}) = Π_{j=1}^{L} p(t_j | H_{t_j}),  i.e.,  log p({t_j}) = Σ_{j=1}^{L} log λ(t_j | H_{t_j}) - ∫_{0}^{t_o} λ(τ | H_τ) dτ

where H_{t_j} denotes the set of timestamps up to time t_j.
Assume that there are N labelled information cascade samples {C_1, C_2, ..., C_N} for training. The optimization module jointly minimizes the Mean Squared Logarithmic Error (MSLE) of the popularity and maximizes the log-likelihood of all information cascades:

L = (1/N) Σ_{k=1}^{N} ( log ŷ_k - log y_k )^2 - Σ_{k=1}^{N} log p(C_k)

where y_k denotes the true popularity of the k-th labelled cascade sample and ŷ_k the corresponding predicted popularity.
The Adam gradient descent algorithm is then used to optimize the system network parameters.
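A sketch of the training objective, assuming the log1p form of the MSLE and an unweighted combination of the two terms; the exact weighting between the MSLE term and the log-likelihood term is an assumption made for illustration.

    import numpy as np

    def msle(y_true, y_pred):
        y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
        return np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)

    def total_loss(y_true, y_pred, log_likelihoods):
        # minimize the popularity MSLE while maximizing the cascade log-likelihood
        return msle(y_true, y_pred) - np.mean(log_likelihoods)

    print(total_loss(y_true=[120, 45, 300], y_pred=[100, 50, 280],
                     log_likelihoods=[-3.2, -1.7, -4.9]))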
The invention also provides an information cascade prediction method based on the Transformer-enhanced Hawkes process, comprising the following steps:
s1, acquiring user position-wise embedding and time coding, and taking the user position-wise embedding and the time coding as user embedding;
s2, for each user node in user embedding, distinguishing all precursor nodes as a main path node and a non-main path node, respectively obtaining attention probability matrixes of the main path node and the non-main path node, further obtaining the global dependence of each node, and then obtaining the global embedding of all the nodes through full connection processing;
s3, acquiring a local mode of each node according to a set time slice window, and then performing pooling operation on the local mode of each node to obtain pooling representation of each node; then based on a mask attention mechanism, generating a local mode hidden vector of each node;
s4, parameterizing a strength function of the Hox process according to the global embedding vector and the local mode hiding vector;
and S5, integrating the Hox process and the full-connection processing result of the global embedding vector and the local mode hiding vector to generate a popularity prediction result.
The invention thus achieves a system and method for coupled modeling and prediction of temporal and topological features in the continuous time domain by introducing a two-level attention mechanism into the Hawkes process. At the first level, the DAG is distinguished according to main paths and non-main paths, and this layer is used to capture the global dependencies between nodes, i.e., the coupled temporal and topological dependencies; at the second level, a local pooled-attention module is proposed to embed the evolution rate of the cascade and simulate bursts and drops in the short-term forwarding count.
Compared with the prior art, the invention has the following beneficial effects:
1. The method first obtains the position-wise embedding and temporal encoding of the users as the user embedding, then introduces a path-aware assumption from the viewpoint of the topological structure, designs two attention layers from the two perspectives of global embedding and local patterns to parameterize the intensity function of the Hawkes process, and learns the coupled temporal and topological stochastic characteristics of the information cascade diffusion process by combining the Hawkes process, the global embedding and the local patterns so as to predict popularity; the invention extends the traditional Hawkes process, effectively acquires knowledge from the continuous time domain, and improves popularity prediction accuracy.
2. The invention proposes a two-level attention architecture that parameterizes the intensity function of the Hawkes process; a path-aware global attention layer is first designed to capture the long-term diffusion dependencies between nodes in the dynamic diffusion topology, and a local module is then provided to capture the short-term evolution rate within the information cascade.
3. The invention customizes a learnable position-wise user embedding method for the information cascade directed acyclic graph, thereby improving parameter efficiency and reducing the computational burden.
4. The method is extremely important for understanding the evolution process of social network platforms and explaining why a cascade becomes popular; for example, the method and system can predict the forwarding volume of a certain microblog within a future period of time, explain the factors behind its popularity, and serve downstream tasks such as marketing design, rumor prediction and influence maximization.
Drawings
FIG. 1 shows the information cascade diffusion process as a Directed Acyclic Graph (DAG). Existing models model the temporal features of node occurrences and the topological features in the graph separately, e.g., temporal features are fed into an RNN and topological features into a GNN. But the two are coupled in nature: when the forwarding time of a certain node is determined, the topological structure in the DAG is determined at the same time. For example, u_2 depends on u_0 rather than on its temporally previous node u_1; thus the DAG forms coupled temporal-topological dependencies, whereas separate modeling cannot capture coupled dependencies.
FIG. 2 illustrates the main-path and non-main-path assumption in the global dependency module, as distinguished from the chain structure of RNN-based sequence models; this assumption can directly capture the long-term dependencies on the main path (and non-main paths).
FIG. 3 is a detailed diagram of the implementation of the main-path and non-main-path assumption in the global dependency module, i.e., the implementation of FIG. 2.
FIG. 4 is a model diagram of the information cascade prediction system based on the Transformer-enhanced Hawkes process, which includes an initial user embedding module, a global dependency module, a local dependency module and a prediction module.
FIG. 5 is a visualization of the attention probability matrix A, showing the heat maps (Heatmaps) and cascade structures corresponding to three cascades randomly selected from the Weibo dataset; the diagonal of each heat map represents the attention of the current node to itself, and each row represents the attention of the current node to all its predecessor nodes; each entry ranges over [0, 1], darker entries indicate higher attention, and each row sums to 1.
Interpretation of terms
Information Cascade: FIG. 1 illustrates this process with an example: after a root node posts a piece of public content, its friends and followers can see the content and then forward it one by one. In this way, public content propagates along the edges of the social network and creates information cascades; a typical task on information cascades is to predict the size of a cascade (tweet, microblog, etc.), i.e., the number of potentially affected users after an observation period. The theoretical basis can be found in the literature [J. Cheng, L. Adamic, P. A. Dow, J. M. Kleinberg, and J. Leskovec. Can cascades be predicted? In Proc. of WWW, 2014].
Hawkes Process (HP): the Hawkes process is a mainstream temporal point process (TPP). HP has three key factors: (1) user influence: influential users contribute more to the arrival rate of new forwards, meaning that tweets forwarded by influential users tend to be forwarded more; (2) a self-excitation mechanism: each forward influences the arrival rate of future new forwards; and (3) a time-decay effect: the influence of a forward decays as time increases. However, traditional HPs simplify the arrival rate of events, model the arrival of events with, for example, exponential or power-law functions, and make strong assumptions about the diffusion mechanism of cascade events, limiting their learning capacity on large-scale cascade data.
Self-Attention Mechanism: the self-attention mechanism is a special case of the attention mechanism recently proposed in the Transformer [Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017)], which has been widely applied to various NLP tasks such as machine translation and language modeling, bringing considerable improvements. It encodes the input sequence by correlating the input tokens, which consist of the matrices Q (Queries), K (Keys) and V (Values). The matrix form of the attention function is:

Attention(Q, K, V) = softmax(Q K^T / √d_k) V
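For reference, a minimal numpy illustration of the quoted attention function (shapes are arbitrary examples):

    import numpy as np

    def attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V

    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(5, 16)) for _ in range(3))   # 5 tokens, d_k = 16
    print(attention(Q, K, V).shape)  # (5, 16)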
Detailed Description
The invention is further described with reference to the accompanying drawings.
Example 1
For convenience, Twitter cascades are used as the example setting in the rest of the invention, but the invention is also applicable to other types of information cascades, such as microblogs (Weibo) and academic citations; examples are given in the experimental part.
The information cascade prediction system based on the Transformer-enhanced Hawkes process provided by this embodiment, as shown in FIG. 4, comprises a user embedding module, a global dependency module, a local dependency module, an intensity function acquisition module and a popularity prediction module.
(1) User embedding module
The user embedding module is used for acquiring the position-wise embedding and temporal encoding of the users and taking them together as the user embedding.
The user embedding module comprises a position-wise embedding acquisition submodule and a temporal encoding acquisition submodule.
The position-wise embedding acquisition submodule contains a shared matrix U ∈ R^{D×L}. In a cascade, for the j-th forwarding user u_j, let v_j denote its one-hot index vector on the shared matrix U; the position-wise embedding of user u_j is then obtained by the lookup U v_j, where V = {v_1, ..., v_L} denotes the set of one-hot indices of all user embeddings. Therefore, based on the shared matrix U, the position-wise embedding acquisition submodule produces the position-wise embedding UV of the users.
The temporal encoding acquisition submodule characterizes the sequence position with a temporal encoding z, defined as follows:

[z(t_j)]_i = cos(t_j / 10000^((i-1)/D)) if i is odd,  [z(t_j)]_i = sin(t_j / 10000^(i/D)) if i is even

The temporal encoding is a variant of the original Transformer absolute position encoding. Here z(t_j) ∈ R^D is the temporal encoding, D is the dimension of the temporal encoding, [z(t_j)]_i denotes the value of the i-th dimension of the encoding, i ∈ [1, D], and Z = {z(t_1), ..., z(t_L)} is the set of temporal encodings.
Thus, the user embedding can finally be written as X = UV + Z. The initial user embedding X consists of the learnable shared position-wise embedding UV together with the fixed temporal encoding Z.
The shared matrix U is randomly initialized, and its parameters are dynamically updated by the designed optimization module through back-propagation to obtain the subsequent context embeddings related to the underlying cascade.
(2) Global dependency module
The global dependency module is used for distinguishing, for each user node in the user embedding, all predecessor nodes into main-path nodes and non-main-path nodes, obtaining the attention probability matrix of the main-path and non-main-path nodes, further obtaining the global dependency of each node, and then performing fully connected processing to obtain the global embedding of all nodes.
The global dependency module includes a path-aware attention layer and a first output network.
The path-aware attention layer is used for distinguishing, for each user node in the user embedding, all predecessor nodes into main-path nodes and non-main-path nodes, obtaining the long-term-dependency attention scores of the main-path and non-main-path nodes, and constructing the attention probability matrix of the main and non-main paths; the global dependencies are then obtained from the attention probability matrix of the main and non-main paths of each node.
As shown in FIG. 2 and FIG. 3, a path-aware assumption, i.e., the main-path/non-main-path assumption, is proposed in the global module. Given a current node u_j, the long-term-dependency attention score between it and a node u_m on the main path (u_m ∈ P_j) is obtained as:

α^P_{j,m} = <a_j, c_m>

where a = W^{Q1} X, b = W^{Q2} X and c = W^K X denote the user embedding matrix X transformed by the main-path Q (Query) matrix W^{Q1}, the non-main-path Q (Query) matrix W^{Q2} and the K (Key) matrix W^K, respectively; the three transformation matrices are used to distinguish the roles the users play in the long-term dependency, and <·,·> denotes the inner-product function. Similarly, the long-term-dependency attention score between the current node u_j and a node u_n on a non-main path (u_n ∈ N_j) is:

α^N_{j,n} = <b_j, c_n>

Then, attention weighting and summation are performed to obtain the complete diffusion dependency e_j of user u_j:

e_j = Σ_{u_i ∈ P_j ∪ N_j ∪ {u_j}} A_{i,j} (W^V x_i)

where A_{i,j} are the normalized (softmax) attention scores over the main-path and non-main-path predecessors, and W^V ∈ R^{D×D} is the V (Value) matrix that transforms the user embedding matrix X. Finally, the global dependencies of all nodes are constructed from the complete diffusion dependency of each node. The context embedding of each node is obtained from the main-path and non-main-path attention scores of every node: E = {e_0, e_1, ..., e_j, ..., e_L}, where each entry of E represents the global dependency of one node in the cascade.
The above describes the computation for a single current user u_j. In practice, the above dependency attention is computed in parallel in matrix form to speed up the operation.
Based on the path-aware attention layer, the attention scores of the current nodes with respect to the main path and the non-main paths are calculated in matrix form as:

Â^P = (W^K X)^T (W^{Q1} X),  Â^N = (W^K X)^T (W^{Q2} X)

where M_P and M_N are one-hot mask matrices that keep the attention scores of the nodes on the main path and on the non-main paths, respectively: if u_i ∈ P_j then (M_P)_{i,j} = 1, and correspondingly, if u_i ∈ N_j then (M_N)_{i,j} = 1. Finally, the attention probability matrix of the main and non-main paths can be derived as:

A = softmax((M_P ⊙ Â^P + M_N ⊙ Â^N) / √D + M)

where D is the dimension of the input embedding and ⊙ denotes element-wise multiplication. The mask matrix M ∈ R^{L×L} prevents the system from peeking into future data: if i < j, M_{i,j} = 0, otherwise M_{i,j} = -∞. The matrix M forces the softmax function to distribute valid long-term-dependency attention only over the forwarders preceding u_j, and assigns 0 to subsequent users (i.e., users active later than t_j). Each column vector of matrix A represents the attention of u_j to its predecessor nodes, including the nodes on the main path P_j and the nodes on the non-main paths N_j.
Then, the matrix form of the global dependencies of all nodes is obtained according to the following formula:

E = (W^V X) A
In order to learn richer semantic information, the path-aware attention layer is designed as a module based on the multi-head attention mechanism: the context embeddings are mapped into different subspaces, each subspace uses the matrix-parallel computation given above, several heads E_1, E_2, ..., E_H are computed, and they are then concatenated as E = W^O Concat(E_1, E_2, ..., E_H), where W^O is an aggregation matrix. The first output network is used for applying fully connected processing to the global dependencies of all nodes to obtain the global embedding of all nodes. In this embodiment, a fully connected neural network (FCNN) is used as the first output network, and the global dependencies E of all nodes are fed into the first output network to obtain the global embedding of all nodes:

H = FCNN(E),  h(t_j) = H(:, j)

where the weights and biases of the FCNN are parameters of the neural network, and h(t_j), the j-th column of H, represents the long-term-dependency representation of user u_j.
(3) Local dependency module
The local dependency module is used for acquiring the local pattern of each node according to the set time-slice window, then performing a pooling operation on the local pattern of each node to obtain the pooled representation of each node, and then generating the hidden vector of each node's local pattern based on the masked attention mechanism.
The local dependency module includes a local pattern acquisition submodule, a pooling submodule and a masked attention layer.
The local pattern acquisition submodule is used for extracting, from the global dependencies and according to the set time-slice window, the part preceding each node, to obtain the local pattern of each node. A fixed time-slice window t_s is set, and the local pattern acquisition submodule extracts, from the global dependencies, the part preceding each node within this window to obtain its local pattern; thus the local pattern of the current node consists of the global dependencies of the current node u_j and of some of its predecessor nodes, and together these local patterns form the set of all local patterns of an information cascade.
The pooling submodule is used for performing a pooling operation on the local pattern of each node to obtain the pooled representation of each node. For each current node u_j, the pooling submodule performs a pooling operation on its local pattern, aggregating the features within the time slice t_s and generating a new representation p(t_j); therefore, the overall pooled representation of all local patterns can be formalized as the matrix P = [p(t_0), p(t_1), ..., p(t_L)], where each column corresponds to the pooled representation of one local pattern.
The masked attention layer is used for dynamically aggregating the local pattern pooled representations of each node and of the nodes before it, and generating the hidden vector of each node's local pattern. Based on the masked attention mechanism, the masked attention layer dynamically aggregates all local pattern embeddings (i.e., pooled representations) before the current time t_j and generates a hidden vector that summarizes the evolution of all previous patterns:

v(t_j) = Σ_{t_i ≤ t_j} f(p(t_j), p(t_i)) · g(p(t_i))

where g(·) is a linear transformation function and γ = {v(t_0), v(t_1), ..., v(t_j), ..., v(t_L)} is the set of local dependencies of the evolution rate. The similarity function f(·,·) is specified as a softmax-normalized similarity between the pooled representation of the current node and those of its preceding nodes.
(4) Intensity function acquisition module
The intensity function acquisition module is used for parameterizing the intensity function of the traditional Hawkes process according to the global embedding vectors and the local pattern hidden vectors. Here, the intensity function acquisition module models the continuous dynamics of cascade diffusion with the following conditional intensity function:

λ(t | H_{t_j}; θ) = f( α (t - t_j)/t_j + w_1^T h(t_j) + w_2^T v(t_j) )

where θ denotes the parameters of the model and the forwarding time t is defined on the interval [t_j, t_{j+1}); f(x) = β log(1 + exp(x/β)) is the softplus function, and the parameter β is used to constrain the intensity function to be positive; the first term represents the evolution over the continuous time domain, and α controls the importance of interpolation between two timestamps; w_1 and w_2 are the learned weights of the global dependency and the local pattern at time t, respectively, and λ(t) represents the inferred tweet arrival rate at time t.
(5) Popularity prediction module
The popularity prediction module is used for generating the popularity prediction result by combining the integral of the Hawkes process with the result of fully connected processing of the global embedding and local pattern hidden vectors. Here, the final popularity P̂ is predicted according to the following formula:

P̂ = Λ + FCNN(Concat(h(t_o), v(t_o))),  with  Λ = ∫_{t_o}^{t_p} λ(t) dt

The first term Λ represents the integral of the Hawkes process, i.e., the expected magnitude of the popularity; t_o denotes the observation time and t_p the prediction time; the second term is the output of the concatenation of the global embedding and the local pattern hidden vector after passing through the second output network (here a fully connected neural network, FCNN).
Example 2
This embodiment is a further improvement on Embodiment 1.
The information cascade prediction system based on the Transformer-enhanced Hawkes process provided by this embodiment further includes an optimization module, used for parameter optimization during system training.
Assume that there are N labelled information cascade samples {C_1, C_2, ..., C_N} for training. The optimization module jointly minimizes the Mean Squared Logarithmic Error (MSLE) of the popularity and maximizes the log-likelihood of all information cascades:

L = (1/N) Σ_{k=1}^{N} ( log ŷ_k - log y_k )^2 - Σ_{k=1}^{N} log p(C_k)

where y_k denotes the true popularity of the k-th labelled cascade sample and ŷ_k the corresponding predicted popularity.
The Adam gradient descent algorithm is then used to optimize the system network parameters.
Example 3
This embodiment provides an information cascade prediction method based on the Transformer-enhanced Hawkes process, which uses the information cascade prediction system of claim 1 and comprises the following steps:
s1, acquiring user position-wise embedding and time coding, and taking the user position-wise embedding and the time coding as user embedding.
In this step, the user embedding module is used to obtain the shared position-wise embedding UV and the fixed temporal encoding Z, giving the final user embedding X = UV + Z.
S2, for each user node in user embedding, distinguishing all the precursor nodes as a main path node and a non-main path node, respectively obtaining long-term dependence attention scores of the main path node and the non-main path node, further obtaining the global dependence of each node, and then obtaining the global embedding of all the nodes through full connection processing.
Firstly, distinguishing all predecessor nodes as a main path node and a non-main path node for each user node in user embedding by utilizing a path perception attention layer, and acquiring attention probability matrixes of the main path node and the non-main path node; then constructing global dependence of all nodes according to the attention probability matrix of the main path and the non-main path of each node; and then the global embedding of all the nodes is obtained through full connection processing of the first output network.
S3, acquiring a local mode of each node according to a set time slice window, and then performing pooling operation on the local mode of each node to obtain pooling representation of each node; then, based on the masking attention mechanism, a local mode hidden vector for each node is generated.
Firstly, utilizing a local mode acquisition submodule to intercept a part positioned in front of each node from global dependence according to a set time slice window to obtain a local mode of each node; then, performing pooling operation on the local mode of each node through a pooling submodule to obtain pooling representation of each node; and dynamically aggregating each node and the local mode pooling representation of the nodes before the node by using the mask attention layer to generate the hidden vector of the local mode of each node.
S4 parameterizes the intensity function of the Hawkes process according to the global embedding and local pattern hidden vectors.
Here, the intensity function of the traditional Hawkes process is parameterized by the intensity function acquisition module according to the global embedding and local pattern hidden vectors, obtaining the intensity function of the Hawkesformer model.
S5, combining the integral of the Hawkes process with the result of fully connected processing of the global embedding vectors and local pattern hidden vectors to generate the popularity prediction result.
Here, the popularity prediction module combines the integral of the Hawkes process with the result of fully connected processing of the global embedding and local pattern hidden vectors to generate the popularity prediction result.
Application example
The present application example uses three different real-world datasets (Twitter, Weibo, and APS). The first dataset is from [Lilian Weng, Filippo Menczer, and Yong-Yeol Ahn. 2013. Virality prediction and community structure in social networks. Scientific Reports 3 (2013)], the second from [Qi Cao, Huawei Shen, Keting Cen, Wentao Ouyang, and Xueqi Cheng. 2017. DeepHawkes: Bridging the gap between prediction and understanding of information cascades. In CIKM. 1149-1158], and the third is the APS (American Physical Society) paper citation dataset.
The three data sets are divided into a training set, a verification set and a test set according to the proportion of 70%, 15% and 15% respectively.
First, the training set is used to train and optimize the information cascade prediction system (Hawkesformer) based on the Transformer-enhanced Hawkes process provided in Embodiment 2, and after each round of training the trained system is verified with the validation set, until the loss function of the system tends to be stable.
Then, the trained Hawkesformer is tested according to the information cascade prediction method based on the Transformer-enhanced Hawkes process provided in Embodiment 3 and compared with different baseline models (Feature-Deep, DeepHawkes, CasCN, FOREST, VaCas, TempCas), using the common MSLE metric (the smaller the value, the better the prediction).
Table 1: performance comparison of Hawkesformer with baseline model on three datasets
Figure BDA0003801224340000172
Note: the bold values indicate better performance than other settings. All experimental results were best in 10 trials with statistical significance ρ <0.05.
The baseline methods in Table 1 are introduced below:
Feature-Deep: structural features, temporal features and user features are extracted and input into a two-layer MLP. [Xu, Xovee, et al. "CasFlow: Exploring hierarchical structures and propagation uncertainty for cascade prediction." IEEE Transactions on Knowledge and Data Engineering (2021)]
DeepHawkes: combines deep learning with the Hawkes self-exciting point process and closes the gap between prediction performance and interpretability. [Qi Cao, Huawei Shen, Keting Cen, Wentao Ouyang, and Xueqi Cheng. 2017. DeepHawkes: Bridging the gap between prediction and understanding of information cascades. In CIKM. 1149-1158]
CasCN: a hybrid model that stacks GNNs to obtain node embeddings and uses an RNN to simulate the evolution of diffusion. [Chen, Xueqin, et al. "Information diffusion prediction via recurrent cascades convolution." 2019 IEEE 35th International Conference on Data Engineering (ICDE). IEEE, 2019]
FOREST: handles the multi-scale cascade prediction problem by combining reinforcement learning and RNNs. [Yang, Cheng, et al. "Multi-scale Information Diffusion Prediction with Reinforced Recurrent Networks." IJCAI, 2019]
VaCas: integrates a hierarchical diffusion model with the spatio-temporal structural characteristics of information cascades and captures diffusion uncertainty. [Fan Zhou, Xovee Xu, Kunpeng Zhang, Goce Trajcevski, and Ting Zhong. 2020. Variational information diffusion for probabilistic cascades prediction. In INFOCOM. 1618-1627]
TempCas: a heuristic approach that uses RNNs to embed the cascade graphs and an LSTM on top of a specially designed attention-based CNN to learn historical short-term variation trends. [Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, No. 1, 2021]
From the experimental results in Table 1, it can be seen that the information cascade prediction system based on the Transformer-enhanced Hawkes process provided by the invention greatly improves the accuracy of popularity prediction compared with the other baseline models.
In addition, in order to study the effectiveness of the global dependency module provided by the invention in capturing the long-term diffusion dependencies between nodes in the dynamic diffusion topology, this application example further visualizes the attention probability matrix A, as shown in FIG. 5. The step-by-step analysis is as follows:
(1) For smaller (13 users) information concatenation C 1 The information cascade prediction system provided by the invention focuses more on the main path and early users participating in diffusion. For each row of matrix A, we can see that the users on the main path usually have the most prominent color, which verifies our dominant/non-dominant hypothesis in the information propagation (a similar phenomenon is also shown in cascade C) 2 And C 3 ). For cascade C 1 Node 9, hawkesformer detects that 0 → 2 → 9 is the primary path, indicating that node 0 and node 2 have a greater impact on node 9, while other nodes on the non-primary path have a lesser impact on the forwarding behavior of node 9. For node 8, the hawkesformer detects that 0 → 4 → 8 is the primary path, indicating that node 0 and node 4 have a greater impact on node 8, while other nodes on the non-primary node have a lesser impact on the forwarding behavior of node 8.
(2) For the medium-sized information cascade C_2 (58 users), long-term dependencies between the root user and subsequent nodes are observed. This indicates that long-term dependencies between nodes in a dynamic diffusion topology need to be modeled, because sequence models such as RNNs are difficult to train on long sequences and face the gradient vanishing/exploding problem when the sequence is too long. In addition, several short-term bursts are observed, which again validates the invention's assumption of learning local patterns for information cascades.
(3) A viral propagation cascade C_3 is also shown, with 106 users and a maximum depth of 8. Most retweets in this cascade (84%) come from users not directly connected to the root. We find that even "tail" users at the end of the diffusion tree may trigger outbreaks (e.g., users 19 and 43). Some users act as hubs, connecting other influential users and bringing them into the information dissemination process.
Therefore, the information cascade prediction system based on the Transformer-enhanced Hawkes process provided by the invention not only improves the accuracy of popularity prediction but also offers strong interpretability. Experiments on three real-world datasets demonstrate the superior performance of the invention over state-of-the-art baseline models. The performance improvement of the proposed scheme shows that combining the advantages of deep learning (Transformer) and a probabilistic model (Hawkes process) is a promising direction for effectively modeling the information cascade diffusion process and improving prediction accuracy.
In summary, the way the information propagation process is modeled is important for understanding the propagation mechanism and for the popularity prediction task. The evolution of information diffusion forms a directed acyclic graph (DAG), in which topological features and time are inherently coupled; however, existing research models the two most important features of a cascade (topological features and temporal features) separately, losing cross-domain information and reducing the expressive and predictive capability of the model. The proposed method links a multi-level attention mechanism with the Hawkes self-exciting point process to model information cascades more effectively and reasonably, thereby achieving more accurate popularity prediction. In particular, a two-layer attention structure is used to parameterize the intensity function of the Hawkes process, efficiently acquiring representation knowledge from the continuous time domain.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention, and that the scope of protection is not limited to the specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from its spirit, and such changes and combinations remain within the scope of the invention.

Claims (10)

1. An information cascade prediction system based on a Transformer-enhanced Hawkes process, comprising:
a user embedding module, used for acquiring the position-wise embedding and the time coding of users and taking them together as the user embedding;
a global dependency module, used for distinguishing, for each user node in the user embedding, all predecessor nodes into primary path nodes and non-primary path nodes, acquiring the attention probability matrix of the primary path nodes and the non-primary path nodes, thereby acquiring the global dependency of each node, and then acquiring the global embedding of all nodes through full connection;
a local dependency module, used for acquiring the local mode of each node according to a set time slice window, then performing a pooling operation on the local mode of each node to obtain the pooled representation of each node, and finally generating the hidden vector of the local mode of each node based on a masked attention mechanism;
an intensity function acquisition module, used for parameterizing the intensity function of the Hawkes process according to the global embedding vectors and the local mode hidden vectors;
a popularity prediction module, used for generating a popularity prediction result by combining the integral of the Hawkes process with the full-connection processing result of the global embedding and local mode hidden vectors.
2. The information cascade prediction system based on the Transformer-enhanced Hawkes process as claimed in claim 1, wherein the user embedding module comprises a Position-wise embedding acquisition submodule and a time coding acquisition submodule;
the Position-wise embedded acquisition submodule comprisesSharing matrix U, in a cascade, for j-th forwarded user U j Let v be j Represents its one-hot index vector on the shared matrix U, then user U j Position-wise embedding of (A) is determined by (U) v j Obtaining indexes, wherein V represents a one-hot index set embedded by all users; the Position-wise embedding of the user obtained by the Position-wise embedding acquisition submodule is UV;
the time coding acquisition submodule characterizes the sequence position by means of the time coding z(t_j), where D is the dimension of the time coding, [z(t_j)]_i represents the value of the i-th dimension of the code, and Z denotes the set of time codings of all users (the exact encoding formula appears as a formula image in the original claim);
the user embedding is finally noted as: X = UV + Z.
3. The Transformer-enhanced Hawkes process-based information cascade prediction system as recited in claim 1, wherein the global dependency module comprises:
a path-aware attention layer, used for distinguishing, for each user node in the user embedding, all predecessor nodes into primary path nodes and non-primary path nodes, acquiring the long-term dependence attention scores of the primary path nodes and the non-primary path nodes, constructing the attention probability matrix of the primary and non-primary paths, and acquiring the global dependency of each node according to this attention probability matrix;
a first output network, used for performing full-connection processing on the global dependency of each node to obtain the global embedding of all nodes.
4. The information cascade prediction system based on the Transformer-enhanced Hawkes process as claimed in claim 3, wherein, first, the path-aware attention layer computes the attention scores of the current node on the primary path and the non-primary path by the following formulas:
P_score = (W_P X)^T (W_K X) ⊙ M_P
N_score = (W_N X)^T (W_K X) ⊙ M_N
wherein M_P and M_N are one-hot mask matrices that respectively preserve the attention scores of the nodes on the primary path and on the non-primary paths: if predecessor u_i belongs to the set of primary-path nodes of u_j, the corresponding element of M_P is 1; correspondingly, if u_i belongs to the set of non-primary-path nodes of u_j, the corresponding element of M_N is 1;
finally, the attention probability matrix over the primary and non-primary paths is obtained as:
A = softmax( (P_score + N_score) / √D + M )
where D is the dimension of the input embedding; the mask matrix M prevents the system from peeping into future data: if i < j then M_{i,j} = 0, otherwise M_{i,j} = −∞; each column vector of matrix A represents the dependence of u_j on its predecessor nodes, including the nodes on the primary path and the nodes on the non-primary paths;
then, the matrix form of the global dependency of each node is obtained through the user global dependency extraction submodule: E = (W_V X) A;
E is processed by full connection through the first output network to obtain the global embedding H of all nodes in the cascade, with
h(t_j) = H(:, j)
wherein the weights of the first output network are parameters of the neural network; h(t_j), the j-th column of H, represents the long-term dependence representation of user u_j.
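As an illustration of the path-aware attention layer and the first output network of claims 3 and 4, the following single-head sketch shows one possible implementation; the column-wise softmax normalisation, the ReLU activation in the output network, and keeping the diagonal so that the root column is not empty are assumptions made for this sketch:

```python
import torch
import torch.nn as nn


class PathAwareAttention(nn.Module):
    """Single-head sketch of the path-aware attention layer and first output network."""

    def __init__(self, dim: int):
        super().__init__()
        self.w_p = nn.Linear(dim, dim, bias=False)   # W_P: primary-path query
        self.w_n = nn.Linear(dim, dim, bias=False)   # W_N: non-primary-path query
        self.w_k = nn.Linear(dim, dim, bias=False)   # W_K: key projection
        self.w_v = nn.Linear(dim, dim, bias=False)   # W_V: value projection
        self.out = nn.Linear(dim, dim)               # first output network
        self.dim = dim

    def forward(self, x, m_p, m_n):
        # x: (L, D) user embeddings, row j = u_j.
        # m_p, m_n: (L, L) path masks with entry (i, j) = 1 when predecessor u_i
        # lies on the primary / non-primary path of u_j, 0 otherwise.
        L = x.size(0)
        p_score = (self.w_p(x) @ self.w_k(x).T) * m_p           # P_score
        n_score = (self.w_n(x) @ self.w_k(x).T) * m_n           # N_score
        # Causal mask: u_j may only attend to its predecessors (i <= j kept here
        # so the root column is never empty; the claim states i < j).
        idx = torch.arange(L, device=x.device)
        allowed = idx.unsqueeze(1) <= idx.unsqueeze(0)
        mask = torch.zeros(L, L, device=x.device).masked_fill(~allowed, float("-inf"))
        a = torch.softmax((p_score + n_score) / self.dim ** 0.5 + mask, dim=0)
        # E = (W_V X) A: column j of A weights the predecessors of u_j.
        e = a.T @ self.w_v(x)                                   # (L, D)
        h = torch.relu(self.out(e))                             # global embedding H (activation assumed)
        return h, a
```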
5. The Transformer-enhanced Hawkes process-based information cascade prediction system according to any one of claims 1 to 4, wherein the local dependency module comprises:
a local mode acquisition submodule, used for intercepting, from the global dependencies and according to a set time slice window, the part located before each node to obtain the local mode of that node;
a pooling submodule, used for performing a pooling operation on the local mode of each node to obtain the pooled representation of each node;
a masked attention layer, used for dynamically aggregating the pooled local mode representations of each node and of the nodes before it, and generating the hidden vector of the local mode of each node.
6. The information cascade prediction system based on the Transformer-enhanced Hawkes process as claimed in claim 5, wherein the masked attention layer dynamically aggregates, at the current time t_j, all the previous local mode embeddings and generates a hidden vector that summarizes the evolution process of all previous modes (the aggregation formula is given as a formula image in the original claim), where g(·) is a linear transformation function and γ = {v(t_0), v(t_1), …, v(t_j), …, v(t_L)} is the set of local dependencies of the evolution rate; the similarity function f(·,·) is specified by a formula likewise given as an image in the original claim.
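The local dependency module of claims 5 and 6 can be pictured with the following sketch; the window length, the use of mean pooling, and the scaled dot-product similarity f(·,·) are assumptions, since the claims only fix the overall pipeline (time-slice window → pooling → masked attention):

```python
import torch
import torch.nn as nn


class LocalDependency(nn.Module):
    def __init__(self, dim: int, window: float = 1.0):
        super().__init__()
        self.window = window          # length of the time-slice window (assumed)
        self.g = nn.Linear(dim, dim)  # linear transformation g(.)
        self.dim = dim

    def forward(self, h: torch.Tensor, timestamps: torch.Tensor) -> torch.Tensor:
        # h: (L, D) global dependencies; timestamps: (L,) forwarding times t_j.
        L, _ = h.shape
        # (1) local mode of node j = the part of the sequence inside the time-slice
        #     window ending at t_j, (2) pooled into one vector (mean pooling assumed).
        pooled = []
        for j in range(L):
            in_window = (timestamps <= timestamps[j]) & (
                timestamps >= timestamps[j] - self.window)
            pooled.append(h[in_window].mean(dim=0))
        p = torch.stack(pooled)                              # (L, D)
        # (3) masked attention: node j only attends to pooled modes 0..j.
        sim = (p @ p.T) / self.dim ** 0.5                    # similarity f(., .)
        idx = torch.arange(L, device=h.device)
        future = idx.unsqueeze(0) > idx.unsqueeze(1)         # future[j, i] = (i > j)
        attn = torch.softmax(sim.masked_fill(future, float("-inf")), dim=-1)
        return attn @ self.g(p)                              # hidden vectors v(t_j)
```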
7. The Transformer-enhanced Hawkes process-based information cascade prediction system of claim 6, wherein the intensity function acquisition module models the continuous dynamics of cascade diffusion using the following conditional intensity function:
λ(t | θ) = f( α (t − t_j) / t_j + w_1 h(t_j) + w_2 v(t_j) )
wherein θ represents the parameters of the model and the forwarding time t is defined on the interval [t_j, t_{j+1}); f(x) = β log(1 + exp(x / β)) is the softplus function, and the parameter β is used to constrain the intensity function to be positive; the first term represents the evolution process over the continuous time domain, and α controls the importance of the interpolation between the two timestamps; w_1 and w_2 are the learnable weights of the global dependency and of the local mode at time t, respectively, and λ(t) represents the tweet arrival rate at time t.
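A hedged sketch of this conditional intensity function is given below; using learnable linear maps for w_1 and w_2 and treating α as a scalar parameter are assumptions consistent with, but not dictated by, the wording of the claim:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HawkesIntensity(nn.Module):
    def __init__(self, dim: int, beta: float = 1.0):
        super().__init__()
        self.w1 = nn.Linear(dim, 1)                    # weight of the global dependency h(t_j)
        self.w2 = nn.Linear(dim, 1)                    # weight of the local mode v(t_j)
        self.alpha = nn.Parameter(torch.tensor(0.1))   # interpolation importance
        self.beta = beta                               # softplus scale, keeps lambda > 0

    def forward(self, t, t_j, h_j, v_j):
        # t: query time(s) in [t_j, t_{j+1}); h_j, v_j: (D,) vectors of the most
        # recent observed event (t_j > 0 is assumed for the interpolation term).
        raw = self.alpha * (t - t_j) / t_j + self.w1(h_j) + self.w2(v_j)
        # softplus with scale beta: f(x) = beta * log(1 + exp(x / beta))
        return F.softplus(raw, beta=1.0 / self.beta)
```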
8. The information cascade prediction system based on the Transformer-enhanced Hawkes process as claimed in claim 7, wherein the popularity prediction module predicts the final popularity ŷ according to
ŷ = Λ + FC_2([H; V]), with Λ = ∫_{t_o}^{t_p} λ(t) dt
wherein the first term Λ, the integral of the Hawkes process, represents the expected magnitude of popularity; t_o represents the observation time and t_p represents the prediction time; the second term represents the output obtained by passing the concatenation of the global embedding and the local mode hidden vectors through the second output network FC_2.
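The popularity prediction of claim 8 can be sketched as follows; the trapezoidal approximation of the integral Λ and the two-layer form of the second output network are assumptions of this sketch:

```python
import torch
import torch.nn as nn


class PopularityPredictor(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Second output network over the concatenated global and local vectors.
        self.fc2 = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, intensity_fn, t_o: float, t_p: float, h_last, v_last, steps: int = 100):
        # First term: Lambda = integral of lambda(t) over [t_o, t_p],
        # approximated numerically with the trapezoidal rule.
        ts = torch.linspace(t_o, t_p, steps)
        lam = torch.stack([intensity_fn(t) for t in ts]).squeeze()
        big_lambda = torch.trapz(lam, ts)
        # Second term: fully connected output over [H; V].
        correction = self.fc2(torch.cat([h_last, v_last], dim=-1)).squeeze()
        return big_lambda + correction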
9. The information cascade prediction system based on the Transformer-enhanced Hawkes process as claimed in claim 8, further comprising an optimization module for parameter optimization during system training; the optimization module optimizes the popularity loss by minimizing the mean squared log error and maximizing the log-likelihood of all information cascades:
L = (1/N) Σ_{k=1}^{N} (log ŷ_k − log y_k)^2 − Σ_{k=1}^{N} l(C_k)
wherein y_k represents the true popularity of the k-th information cascade label sample, ŷ_k represents the corresponding predicted popularity, l(C_k) represents the log-likelihood of cascade C_k, and N represents the number of information cascade label samples.
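A minimal sketch of this joint objective follows, assuming the log-likelihood term is simply subtracted from the mean squared log error (the claim fixes only that the former is maximized and the latter minimized); log1p is used here as a practical guard against log(0):

```python
import torch


def training_loss(y_pred: torch.Tensor, y_true: torch.Tensor,
                  log_likelihood: torch.Tensor) -> torch.Tensor:
    # y_pred, y_true, log_likelihood: tensors of shape (N,), one entry per
    # information cascade label sample C_k.
    msle = (torch.log1p(y_pred) - torch.log1p(y_true)).pow(2).mean()
    return msle - log_likelihood.mean()
```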
10. An information cascade prediction method based on a Transformer-enhanced Hawkes process, comprising the following steps:
S1, acquiring the position-wise embedding and the time coding of users, and taking them together as the user embedding;
S2, for each user node in the user embedding, distinguishing all predecessor nodes into primary path nodes and non-primary path nodes, respectively obtaining the long-term dependence attention scores of the primary path nodes and the non-primary path nodes, thereby obtaining the global dependency of each node, and then obtaining the global embedding of all nodes through full-connection processing;
S3, acquiring the local mode of each node according to a set time slice window, performing a pooling operation on the local mode of each node to obtain the pooled representation of each node, and then generating the local mode hidden vector of each node based on a masked attention mechanism;
S4, parameterizing the intensity function of the Hawkes process according to the global embedding vectors and the local mode hidden vectors;
S5, generating the popularity prediction result by combining the integral of the Hawkes process with the full-connection processing result of the global embedding vectors and the local mode hidden vectors.
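To show how steps S1–S5 fit together, the following sketch wires up the hypothetical module sketches given after claims 2, 4, 6, 7 and 8 above; all names, shapes, and hyper-parameters remain assumptions made purely for illustration:

```python
import torch


def predict_popularity(user_ids, timestamps, m_p, m_n, t_o, t_p, modules):
    """Hypothetical wiring of steps S1-S5 using the module sketches above."""
    emb, global_dep, local_dep, intensity, predictor = modules
    # S1: user embedding = position-wise embedding + time coding.
    x = emb(user_ids.unsqueeze(0), timestamps.unsqueeze(0)).squeeze(0)  # (L, D)
    # S2: global dependencies via path-aware attention + full connection.
    h, _ = global_dep(x, m_p, m_n)                                      # (L, D)
    # S3: local modes via time-slice window, pooling and masked attention.
    v = local_dep(h, timestamps)                                        # (L, D)
    # S4: parameterize the Hawkes intensity with the last observed event.
    lam = lambda t: intensity(t, timestamps[-1], h[-1], v[-1])
    # S5: popularity = integral of the intensity + fully connected correction.
    return predictor(lam, t_o, t_p, h[-1], v[-1])
```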
CN202210983826.9A 2022-08-17 2022-08-17 Information cascade prediction system and method based on Transformer enhanced Hooke process Pending CN115409155A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210983826.9A CN115409155A (en) 2022-08-17 2022-08-17 Information cascade prediction system and method based on Transformer enhanced Hooke process

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210983826.9A CN115409155A (en) 2022-08-17 2022-08-17 Information cascade prediction system and method based on Transformer enhanced Hooke process

Publications (1)

Publication Number Publication Date
CN115409155A true CN115409155A (en) 2022-11-29

Family

ID=84159909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210983826.9A Pending CN115409155A (en) 2022-08-17 2022-08-17 Information cascade prediction system and method based on Transformer enhanced Hooke process

Country Status (1)

Country Link
CN (1) CN115409155A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115759482A (en) * 2022-12-29 2023-03-07 人民网股份有限公司 Social media content propagation prediction method and device
CN116842398A (en) * 2023-06-27 2023-10-03 哈尔滨工业大学 Topic perception information forwarding prediction method and system based on shielding self-attention network

Similar Documents

Publication Publication Date Title
CN112529168B (en) GCN-based attribute multilayer network representation learning method
CN115409155A (en) Information cascade prediction system and method based on Transformer enhanced Hooke process
Short et al. Using blockchain technologies to improve security in federated learning systems
CN113919441A (en) Classification method based on hypergraph transformation network
Derr et al. Epidemic graph convolutional network
Le et al. Federated continuous learning with broad network architecture
CN113868474A (en) Information cascade prediction method based on self-attention mechanism and dynamic graph
Xiao et al. User behavior prediction of social hotspots based on multimessage interaction and neural network
Huang et al. Collaboration in participant-centric federated learning: A game-theoretical perspective
CN112884045A (en) Classification method of random edge deletion embedded model based on multiple visual angles
Eswaran et al. Higher-order label homogeneity and spreading in graphs
Wang et al. Res-TranBiLSTM: An intelligent approach for intrusion detection in the Internet of Things
Pan et al. Privacy-Preserving Multi-Granular Federated Neural Architecture Search–A General Framework
Lia et al. Privacy-preserving machine learning using federated learning and secure aggregation
Akter et al. Edge intelligence-based privacy protection framework for iot-based smart healthcare systems
Zhang et al. Ordering for reduced transmission energy detection in sensor networks testing a shift in the mean of a gaussian graphical model
CN112435034A (en) Marketing arbitrage black product identification method based on multi-network graph aggregation
Yang et al. Horizontal federated learning
CN111292062A (en) Crowdsourcing garbage worker detection method and system based on network embedding and storage medium
CN116187469A (en) Client member reasoning attack method based on federal distillation learning framework
CN115660147A (en) Information propagation prediction method and system based on influence modeling between propagation paths and in propagation paths
Xie et al. Independent asymmetric embedding for information diffusion prediction on social networks
Cheng et al. BHONEM: Binary high-order network embedding methods for networked-guarantee loans
Xu et al. Adaptive Sparse Federated Learning in Large Output Spaces via Hashing
Ma et al. Efl: Elastic federated learning on non-iid data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination