CN116975778A - Social network information propagation influence prediction method based on information cascading - Google Patents
- Publication number
- CN116975778A (application CN202310940863.6A)
- Authority
- CN
- China
- Prior art keywords
- information
- vector
- cascade
- layer
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/253—Fusion techniques of extracted features
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
- G06N3/045—Combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/048—Activation functions
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06N3/0985—Hyperparameter optimisation; Meta-learning; Learning-to-learn
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes
- G06Q50/01—Social networking
- G06F2123/02—Data types in the time domain, e.g. time-series data
Abstract
The invention discloses a social network information propagation influence prediction method based on information cascading, which comprises the following steps: step 1: storing user node information using a directed graph; step 2: calculating the structural feature information of each user node; step 3: extracting an observation sequence and processing the data; step 4: modeling cascade features using diffusion random sampling; step 5: encoding time information using sine and cosine position vectors; step 6: converting the time matrix into an input vector for the encoder; step 7: performing a self-attention transformation at the encoder layer; step 8: alternately stacking GAT layers and encoder layers while fusing temporal and structural features for feature transformation; step 9: performing a hierarchical Drop operation; step 10: adding the cascade graph, the aggregation vector and the time matrix; step 11: repeating steps 7-9; step 12: obtaining the final aggregation vector from the encoder; step 13: performing cascade growth prediction; step 14: processing the data to reduce the prediction error.
Description
Technical Field
The invention relates to the field of cascade growth prediction, in particular to a social network information propagation influence prediction method based on information cascade.
Background
Online social network platforms provide broad and rapid channels for the dissemination of information, and multimedia technology enables information to carry ever richer content. These platforms have also become hotbeds for the large-scale spread of malicious information, whose propagation speed and reach are greatly amplified by the platforms. Research shows that false information such as rumors and fake news is more likely to attract attention and spread on social media. If hot-spot information with large influence that is likely to spread widely can be identified in advance by predicting its growth scale, and monitored in real time, countermeasures can be taken promptly to prevent and resolve crises, minimizing the adverse effects of rapidly propagating harmful information while consuming fewer resources.
Information in a social network propagates in a cascade, and the influence of the information is generally represented by the length of the cascade, and the longer the cascade is, the greater the influence of the information is. In recent years, the deep learning technology has obvious superiority in the aspect of end-to-end cascade popularity prediction, and can automatically extract useful information from cascade data. Some research approaches represent the information cascade as a sequence of multiple user nodes and input it into a recurrent neural network (Recurrent Neural Network, RNN) model in order to better mine potential diffusion patterns. In addition, some researchers represent the information cascade as a cascade graph or social network, and apply a graph neural network (Graph Neural Network, GNN) model on the basis of the information cascade graph to extract features of the structure of the information cascade during early propagation.
However, the current research methods have several problems:
(1) RNNs learn temporal features only from the input order of the cascade data sequence, ignoring finer temporal details present in the intermediate time sequence, even though time intervals during propagation, propagation speed per unit time, and similar quantities have all been proven critical to information propagation prediction;
(2) Cascade lengths observed over the same time period follow a power-law distribution, so they differ greatly. When a recurrent neural network faces an overly long input sequence, long-term dependency problems arise, and vanishing and exploding gradients prevent the network from effectively updating its parameters;
(3) In early cascade prediction tasks, the observed cascade sequences are dominated by short sequences, because the observation time is short and the affected population follows a power-law distribution. When a cascade graph is built from a short cascade sequence, the small number of graph nodes makes it difficult to extract from the graph structure features sufficient to distinguish the predicted cascade length, so the graph contains too little information to separate cascades of different lengths.
These problems all lead to lower accuracy of predicting social network information propagation influence by the current method.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a social network information propagation influence prediction method based on information cascading, which improves existing methods of extracting temporal and structural features: it first adopts the position coding function and encoder structure of the Transformer, in place of the traditional RNN structure, for temporal feature extraction, and simultaneously adopts a data enhancement method based on diffusion random sampling to extract cascade graph structural features.
In order to achieve the above object, the technical scheme adopted for solving the technical problems is as follows:
a social network information propagation influence prediction method based on information cascading comprises the following steps:
step 1: storing the extracted relationship between the user node and the user with a directed graph;
step 2: calculating structural feature information of each user node in the user global social relation network;
step 3: extracting an observation sequence according to the definition of the information cascade prediction problem, and processing data;
step 4: introducing a part of global features to model cascade features by using diffusion random sampling;
step 5: encoding time information by using sine and cosine position vectors to obtain a time matrix;
step 6: converting the time matrix into an input vector of an encoder;
step 7: adding the second-order diffusion random sampling cascade diagram and the time vector, and inputting the added result and the aggregate vector into an encoder layer together for self-attention transformation;
step 8: alternately stacking the GAT layer and the encoder layer, and simultaneously fusing the time characteristics and the structural characteristics to perform characteristic transformation;
step 9: taking each multi-head self-attention vector and the aggregate vector as input and taking a new aggregate vector as output;
step 10: outputting an aggregate vector and a diffusion cascade diagram;
step 11: extracting structural features by using GAT to obtain a new diffusion cascade diagram;
step 12: performing layering Drop operation;
step 13: adding the cascade graph, the aggregate vector and the time matrix;
step 14: repeating the steps 7 to 12, and taking the obtained cascade diagram, the aggregate vector and the initial time matrix as the input of the next encoder;
step 15: obtaining a final aggregate vector from the encoder;
step 16: inputting the aggregate vector obtained in the step 15 into a prediction module for prediction of cascade growth, and obtaining a final output;
step 17: processing the data and updating the learnable parameters in the network to reduce the prediction error.
Further, the step 1 includes the following:
extracting the relationship between the user node and the user from the online social platform, and storing information by using a directed graph, wherein the graph is called a user global social relationship network; wherein nodes in the directed graph represent users in the social network, edges in the directed graph are used for representing attention relationships among the users, and directions of the edges represent information transfer directions.
Further, the step 2 includes the following:
the information to be calculated includes the core number, PageRank score, hub coefficient, authority coefficient, eigenvector centrality and clustering coefficient; each of these features captures part of the structural characteristics of a node in the graph, and the six attributes together represent a node. The vector formed by the six attributes is called the user global attribute vector. Because each attribute has a different numerical range, each attribute is normalized across the users' attribute vectors, and subsequent numerical calculations use the normalized attribute vectors.
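As a concrete illustration of step 2, the six node attributes can be computed with networkx. This is a sketch under the assumption that min-max normalization is used (the patent does not specify the normalization scheme), and `global_attribute_vectors` is an illustrative helper name, not part of the patent:

```python
import networkx as nx
import numpy as np

def global_attribute_vectors(g: nx.DiGraph) -> dict:
    """Compute the six per-node structural attributes of step 2, then
    min-max normalize each attribute across all nodes (assumed scheme)."""
    attrs = {
        "core": nx.core_number(g),                               # core (k-core) number
        "pagerank": nx.pagerank(g),                              # PageRank score
        "eigenvector": nx.eigenvector_centrality(g, max_iter=1000),
        "clustering": nx.clustering(g),                          # clustering coefficient
    }
    hubs, authorities = nx.hits(g, max_iter=1000)                # hub / authority coefficients
    attrs["hub"] = hubs
    attrs["authority"] = authorities

    nodes = list(g.nodes())
    order = ("core", "pagerank", "hub", "authority", "eigenvector", "clustering")
    mat = np.array([[attrs[k][n] for k in order] for n in nodes])
    # min-max normalize each attribute column to [0, 1]
    span = mat.max(axis=0) - mat.min(axis=0)
    span[span == 0] = 1.0
    mat = (mat - mat.min(axis=0)) / span
    return dict(zip(nodes, mat))
```

Each user node then maps to a 6-dimensional global attribute vector ready for the later numerical steps.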
Further, the step 3 includes the following:
Information in the social network is stored in the form of information cascades; each piece of information corresponds to a cascade sequence, and each element of the cascade sequence is a two-tuple of user number and information sending time. The number of elements in the cascade sequence is the number of users affected by the information, and the elements are ordered by time, representing the information propagation trajectory. The task is to predict, from the first several hours of the information's propagation, the final number of additionally affected users, i.e. to predict the number of subsequent nodes from the earlier nodes of the cascade sequence.
Further, the step 4 includes the following:
for one piece of information cascade data, a data enhancement method of diffusion random sampling is adopted to introduce part of the global features, which facilitates modeling the cascade features. First, the order of diffusion random sampling is determined by the number of stacked layers of the graph neural network; all neighbor nodes of the order-K nodes are found in the social relation network, and nodes that have already been sampled are eliminated; at most 128 nodes are randomly sampled from the remaining nodes as the order-(K+1) nodes, until the specified order has been sampled. Sampling twice yields the second-order diffusion random sampling cascade graph of the cascade sequence, i.e. a subgraph containing the cascade nodes and part of their neighbor nodes.
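A minimal sketch of the layered sampling in step 4, assuming the social relation network is given as an adjacency dict and using a seeded RNG for reproducibility (function and parameter names are illustrative):

```python
import random

def diffusion_random_sample(social_adj, seed_nodes, orders=2,
                            max_per_order=128, rng=None):
    """Layered neighbor sampling: starting from the cascade's user nodes,
    repeatedly gather the neighbors of the current frontier, drop nodes
    already sampled, and keep at most `max_per_order` of them as the next
    order's nodes. Returns one node list per order (order 0 = the cascade)."""
    rng = rng or random.Random(0)
    sampled = set(seed_nodes)
    layers = [list(seed_nodes)]
    frontier = list(seed_nodes)
    for _ in range(orders):
        candidates = set()
        for u in frontier:
            candidates.update(social_adj.get(u, ()))
        candidates -= sampled                      # eliminate already-sampled nodes
        frontier = rng.sample(sorted(candidates),
                              min(max_per_order, len(candidates)))
        sampled.update(frontier)
        layers.append(frontier)
    return layers
```

With `orders=2` this produces the second-order diffusion random sampling cascade graph's node layers; the edges of the subgraph can then be read off the social network.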
Further, the step 5 includes the following:
for the time information in a cascade sequence, each time value is converted into a vector through a special position coding function, so that time can be operated on together with the other node attributes while the converted vector still retains the most basic characteristics of time; the conversion method is as follows:
determining the dimension n of the converted vector TE required to be converted at each time point;
determining the value of the k-th element in the vector TE for time t: if k is odd, then TE(t)_k = cos(t / 10000^{(k-1)/n}); if k is even, then TE(t)_k = sin(t / 10000^{k/n});
The above transformation is performed for all times in the cascade sequence, resulting in a time matrix.
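The time-to-vector conversion of step 5 can be sketched with NumPy; this assumes the standard Transformer convention of sine on even indices and cosine on odd indices, with each odd index reusing its even neighbor's frequency:

```python
import numpy as np

def time_encoding(times, n):
    """Sine/cosine time encoding: even components k use
    sin(t / 10000^{k/n}), odd components use cos(t / 10000^{(k-1)/n})."""
    t = np.asarray(times, dtype=float)[:, None]           # shape (T, 1)
    k = np.arange(n)[None, :]                             # shape (1, n)
    angle = t / np.power(10000.0, (k - (k % 2)) / n)      # pair 2i, 2i+1 share a rate
    te = np.where(k % 2 == 0, np.sin(angle), np.cos(angle))
    return te                                             # time matrix, shape (T, n)
```

Applying this to every timestamp of the observation sequence yields the time matrix used by the encoder.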
Further, the step 6 includes the following:
for the time matrix, an all-zero vector is prepended at its head as the input vector of the encoder of the space-time aggregation embedding part.
Further, the step 7 includes the following:
defining an N-dimensional all-zero aggregation vector for aggregating the attributes of all information cascade nodes during the neural network's transformations; the second-order diffusion random sampling cascade graph obtained in step 4 is added to the time vector obtained in step 6, and the sum is input, together with the aggregation vector, into the encoder layer of the space-time aggregation embedding module; after self-attention transformation, the aggregation vector and the diffusion cascade graph are output;
In the CasDiffGNN model, the encoder of each layer generates an aggregate vector representing the characteristics of the whole graph, and the vector does not have corresponding nodes in the graph and does not participate in the characteristic transformation of GAT;
the process of self-attention transformation is as follows:
(1) Calculating the self-attention conversion result of each user node: define three matrices W_Q, W_K and W_V whose parameters can be updated by back propagation, and multiply each user input vector by each of these matrices to obtain the Q, K and V vectors of each user node; multiply the Q vector of one user with the K vectors of all other users, divide the results by the square root of the vector dimension, and apply the Softmax function; the Softmax function "compresses" a K-dimensional vector z of arbitrary real numbers into another K-dimensional real vector σ(z) such that each element lies in (0, 1) and all elements sum to 1; the calculation result gives the degree of correlation between each other user and this user, which is used as a weight for a weighted sum over the V vectors of all user nodes, yielding one self-attention head of the user node;
(2) Defining multiple groups of parameter matrices W_Q, W_K and W_V, calculating multiple self-attention heads for each user node, and concatenating the self-attention heads of all users to obtain a multi-head attention matrix; defining a linear neural network layer and inputting the multiple attention heads of each user into it to obtain the output of the user's self-attention conversion.
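The two-part self-attention transformation above can be sketched in NumPy as follows; in practice the weight matrices would be learned by back propagation, and the function name is illustrative:

```python
import numpy as np

def multi_head_self_attention(X, Wq, Wk, Wv, Wo):
    """Scaled dot-product self-attention over user vectors.
    Wq/Wk/Wv are lists with one (d, d_k) matrix per head; Wo maps the
    concatenated heads back to the model dimension (the linear layer)."""
    heads = []
    for wq, wk, wv in zip(Wq, Wk, Wv):
        Q, K, V = X @ wq, X @ wk, X @ wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])        # divide by sqrt(d_k)
        scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
        A = np.exp(scores)
        A /= A.sum(axis=-1, keepdims=True)             # Softmax: rows sum to 1
        heads.append(A @ V)                            # weighted sum of V vectors
    return np.concatenate(heads, axis=-1) @ Wo         # linear layer on concat heads
```

Each row of the output is one user node's self-attention conversion result.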
Further, the step 8 includes the following:
taking the diffusion cascade graph obtained in step 7 as the input of a GAT layer; the GAT model is selected as the structural feature extraction method of the CasDiffGNN model. In the GAT model, neighbor nodes are weighted by attention coefficients, and each node considers only the information of its local neighbor nodes. The CasDiffGNN model alternately stacks encoder layers and GAT layers to deeply fuse temporal and structural features: before data is input into a GAT layer, a single-layer temporal aggregation transformation is performed so that the temporal features are integrated into the node features for calculation, and the global node features and temporal features of the graph are introduced through the self-attention mechanism of the encoder layer. Because the GAT layer adopts an attention mechanism, it can adaptively calculate the contribution weight of each neighbor node to the current node; the attention weight of node u to node v in a GAT layer and the layer output are defined as follows:
α_uv = softmax_{u∈N_v}( LeakyReLU( (a^{(l)})^T [ W^{(l)} h_u^{(l)} ∥ W^{(l)} h_v^{(l)} ] ) )
h_v^{(l+1)} = σ( Σ_{u∈N_v} α_uv · W^{(l)} h_u^{(l)} )
wherein h_v^{(l+1)} is the output of the (l+1)-th GAT layer for node v, W^{(l)} is the weight matrix of the l-th layer for node feature transformation, N_v represents the set of neighbor nodes of v, α_uv represents the attention weight of u to the v node, a^{(l)} is the attention parameter vector of the l-th layer, ∥ denotes vector concatenation, and σ is a nonlinear activation function;
finally, a new diffusion cascade diagram is obtained.
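A single GAT layer of this kind can be sketched in NumPy (a dense, single-head illustration; real implementations use sparse operations and multiple heads, and all names here are illustrative):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(H, adj, W, a):
    """One GAT layer: attention logits LeakyReLU(a^T [W h_u || W h_v]),
    softmax over each node's neighbors, then a weighted sum of transformed
    neighbor features. adj[v] is the neighbor list N_v (v may include itself)."""
    Z = H @ W                                          # transformed node features
    out = np.zeros_like(Z)
    for v, neigh in adj.items():
        logits = np.array([leaky_relu(a @ np.concatenate([Z[u], Z[v]]))
                           for u in neigh])
        logits -= logits.max()                         # numerical stability
        alpha = np.exp(logits) / np.exp(logits).sum()  # attention weights α_uv
        out[v] = alpha @ Z[list(neigh)]                # weighted neighbor sum
    return out
```

Stacking two such layers, as CasDiffGNN does, propagates features across the two sampled neighbor orders of the diffusion cascade graph.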
Further, the step 9 includes the following:
performing a hierarchical Drop operation on the nodes in the diffusion cascade graph obtained in step 8, i.e. discarding the outermost nodes in the network layer by layer in the reverse order of sampling, to obtain the cascade graph.
Further, the step 10 includes the following:
adding the cascade graph obtained in step 9 to the initial time matrix; the sum, together with the aggregation vector obtained in step 7, is taken as the input of the next encoder layer of the space-time aggregation embedding part.
Further, the step 12 includes the following:
inputting the cascade graph, the aggregation vector and the initial time matrix obtained in step 11 into the encoder, and obtaining the final aggregation vector from the output of the encoder layer; the final aggregation vector aggregates the full-graph features of GAT layers of different depths, with the cascade graph features primary and the neighbor node features secondary; during aggregation, nodes farther from the cascade graph contribute less to the final aggregation vector; this vector is the final embedding of the information cascade output by the space-time aggregation embedding module.
Further, the step 14 includes the following:
the squared difference between the predicted result and the real result is calculated, and the mean over the squared differences of all data is taken; this mean serves as the average error of the neural network model and is used to evaluate the performance of the network model; the learnable parameters in the network are updated by back-propagated gradient descent to reduce the prediction error.
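A sketch of this error computation; the log2 scaling is an assumption carried over from the data processing described in step 3, since cascade growth sizes follow a power-law distribution:

```python
import numpy as np

def msle_loss(pred, actual):
    """Mean squared error on a log2 scale (assumed from step 3's
    base-2 log processing of the power-law-distributed growth counts)."""
    pred = np.log2(np.asarray(pred, dtype=float) + 1)
    actual = np.log2(np.asarray(actual, dtype=float) + 1)
    return float(np.mean((pred - actual) ** 2))
```

This scalar would be minimized by gradient descent through back propagation during training.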
Compared with the prior art, the invention has the following advantages and positive effects due to the adoption of the technical scheme:
the invention discloses an information cascade-based social network information propagation influence prediction method, which uses a data enhancement method of diffusion random sampling to introduce a part of global features so as to better model cascade structural features, adopts sine and cosine position vector coding time features, uses GAT to extract features of a diffusion random sampling cascade graph, and alternately stacks a coding layer used for extracting the time features and represented by aggregation cascade nodes with the GAT layer to form a space-time aggregation embedding module so as to realize the depth fusion of the time features and the structural features. The effect of enhancing the influence of the time characteristic and the structural characteristic on the prediction result is achieved. Aiming at the scene of insufficient features of the information cascade in the early stage of propagation, the method has higher prediction accuracy.
Drawings
In order to more clearly illustrate the technical solution of the embodiments of the present invention, the drawings that are required to be used in the description of the embodiments will be briefly described below. It is evident that the drawings in the following description are only some embodiments of the invention and that other drawings may be obtained from these drawings by those skilled in the art without inventive effort. In the accompanying drawings:
FIG. 1 is a schematic flow chart of a social network information propagation influence prediction method based on information cascading;
FIG. 2 is a block diagram of a social network information propagation influence prediction method based on information concatenation;
FIG. 3a is a statistical graph (microblog) of the length distribution of the observation cascade and the prediction cascade according to the invention;
FIG. 3b is a statistical graph (twitter) of the length distribution of the observation cascade and the prediction cascade according to the invention;
FIG. 3c is a statistical plot of the length distribution of the observation cascade and prediction cascade of the present invention (Douban);
FIG. 3d is a statistical plot (Synthetic) of the observed and predicted cascade length distributions of the present invention;
FIG. 4a is a graph of predicted loss versus observed time (microblog) of the present invention;
FIG. 4b is a graph of predicted loss versus observed time (twitter) according to the present invention;
FIG. 4c is a graph of predicted loss versus observed time for the present invention (Douban);
FIG. 5a is a graph of predictive loss versus number of sample layers (microblog) of the present invention;
FIG. 5b is a graph of the predicted loss versus the number of sample layers (twitter) of the present invention;
Fig. 5c is a graph of the predicted loss versus the number of sampling layers (Douban) according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made apparent and fully in view of the accompanying drawings, in which some, but not all embodiments of the invention are shown. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment provides a social network information propagation influence prediction method based on information cascading. The method uses the social relation network and a random diffusion method to obtain the neighborhood of the nodes in an information cascade sequence, randomly samples a fixed number of nodes from the neighborhood layer by layer, and constructs a second-order diffusion random sampling cascade graph as the input of the GNN. The outermost neighborhood nodes are then discarded layer by layer while the GNN hierarchically extracts structural features, reducing the influence of the number of graph nodes on running speed and results. An end-to-end cascade growth prediction model, CasDiffGNN, is constructed to evaluate the structural feature extraction method presented herein. In CasDiffGNN, a two-layer graph attention network (Graph Attention Networks, GAT) performs feature extraction on the diffusion random sampling cascade graph, and the encoder layers, which extract temporal features and aggregate the cascade node representations, are alternately stacked with the GAT layers to form a space-time aggregation embedding module, realizing deep fusion of temporal and structural features. Finally, the aggregation vector output by the encoder layer is input into a fully connected layer for prediction of the cascade growth.
As shown in fig. 1 and 2, the embodiment discloses a social network information propagation influence prediction method based on information cascading, which mainly comprises the following steps:
step 1: storing the extracted relationship between the user node and the user with a directed graph;
further, the step 1 includes the following:
extracting the relationships between user nodes from the online social platform and storing the information as a directed graph, called the user global social relation network; nodes in the directed graph represent users in the social network, edges represent follow relationships among users, and the direction of an edge represents the direction of information transfer. For example, an edge from node A to node B indicates that user B is a follower of user A, so user A sends information to user B.
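The storage scheme of step 1 can be sketched as follows (an illustrative pure-Python sketch with hypothetical user names, not part of the claimed implementation): an edge A -> B means B follows A, so information sent by A reaches B.

```python
# Minimal directed-graph store for the user global social relation network.
# succ[u] holds the followers u sends information to; pred[v] holds the
# users that v follows (i.e., v's information sources).
class DiGraph:
    def __init__(self):
        self.succ = {}
        self.pred = {}

    def add_edge(self, u, v):
        self.succ.setdefault(u, set()).add(v)
        self.succ.setdefault(v, set())
        self.pred.setdefault(v, set()).add(u)
        self.pred.setdefault(u, set())

g = DiGraph()
for u, v in [("A", "B"), ("A", "C"), ("B", "C")]:
    g.add_edge(u, v)
```

In practice a graph library would be used instead; the sketch only fixes the edge-direction convention.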
Step 2: calculating structural feature information of each user node in the user global social relation network;
further, the step 2 includes the following:
the information to be calculated includes the core number (k-core), PageRank score, hub coefficient, authority coefficient, eigenvector centrality, and clustering coefficient; each of these features captures part of the structural characteristics of a node in the graph, and the six attributes together represent one node. The vector formed by the six attributes is called the user global attribute vector. Since the numerical ranges of the attributes differ, each attribute is normalized over the users' attribute vectors, and subsequent numerical calculation uses the normalized attribute vectors.
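One of the six attributes, the PageRank score, can be sketched by power iteration, followed by the per-attribute normalization described above (a hedged sketch: the damping factor, iteration count, and min-max normalization are conventional assumptions, not values stated in the patent).

```python
# PageRank by power iteration over a successor-dict graph, with dangling
# nodes spreading their score uniformly, then min-max normalization.
def pagerank(succ, damping=0.85, iters=50):
    nodes = list(succ)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1 - damping) / n for v in nodes}
        for u in nodes:
            out = succ[u]
            if out:
                share = damping * score[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:  # dangling node: distribute its mass uniformly
                for v in nodes:
                    nxt[v] += damping * score[u] / n
        score = nxt
    return score

def min_max_normalize(values):
    lo, hi = min(values), max(values)
    span = hi - lo or 1.0
    return [(x - lo) / span for x in values]

succ = {"A": {"B", "C"}, "B": {"C"}, "C": set()}
pr = pagerank(succ)
norm = min_max_normalize([pr[v] for v in ("A", "B", "C")])
```

The other five attributes would be computed analogously and stacked into the six-dimensional user global attribute vector.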
Step 3: extracting an observation sequence according to the definition of the information cascade prediction problem, and processing data;
further, the step 3 includes the following:
information in the social network is stored in the form of information cascades: each piece of information corresponds to a cascade sequence, and each element of the sequence is a pair of a user number and an information sending time. The number of elements in the cascade sequence represents the number of users affected by the information, and the elements, arranged in chronological order, represent the information propagation trajectory. The final increase in the number of affected users is predicted from the information observed during the first few hours of propagation, i.e., the number of subsequent nodes is predicted from the preceding nodes in the cascade sequence. Assuming the observation time of the cascade sequence is 3 hours, the elements in the first 3 hours of the information cascade sequence are extracted to form the observation sequence, and the number of remaining elements is predicted from the temporal and structural features of the observation sequence. Since the number of people affected by information follows a power-law distribution, the actual prediction target is log-transformed with base 2.
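The extraction of the observation sequence and the log2-scaled growth label can be sketched as follows (an illustrative sketch; the `+ 1` inside the logarithm is an assumption to keep zero growth defined, and the cascade data are hypothetical):

```python
import math

def make_sample(cascade, observe_hours=3.0):
    """Split a cascade of (user, timestamp-in-hours) pairs at the
    observation window and form the log2-scaled growth label."""
    t0 = cascade[0][1]
    observed = [(u, t) for u, t in cascade if t - t0 < observe_hours]
    growth = len(cascade) - len(observed)     # users affected after observation
    label = math.log2(growth + 1)             # +1 keeps zero growth defined
    return observed, label

cascade = [("u1", 0.0), ("u2", 1.2), ("u3", 2.5), ("u4", 4.0), ("u5", 7.5)]
observed, label = make_sample(cascade)
```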
Step 4: introducing a part of global features to model cascade features by using diffusion random sampling;
Further, the step 4 includes the following:
for one piece of information cascade data, a data enhancement method of diffusion random sampling is adopted to introduce part of the global features, which facilitates modeling of the cascade features. First, the order of diffusion random sampling is determined by the number of stacked graph neural network layers. All neighbor nodes of the K-th-order nodes are found in the social relation network, and nodes that have already been sampled are eliminated; a maximum of 128 nodes are then randomly sampled from the remaining nodes as the (K+1)-th-order nodes, until the specified order has been sampled. Sampling twice yields a second-order diffusion random sampling cascade graph of the cascade sequence, i.e., a subgraph containing the cascade nodes and part of their neighbor nodes.
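The layered sampling procedure can be sketched as follows (a hedged sketch: the graph, seed choice and sorting for determinism are illustrative; the patent specifies order 2 and at most 128 nodes per layer):

```python
import random

def diffusion_sample(succ, cascade_nodes, order=2, max_per_layer=128, seed=0):
    """Layered diffusion random sampling: from the current frontier,
    collect all not-yet-sampled neighbours, keep at most max_per_layer
    of them at random, and repeat for `order` layers."""
    rng = random.Random(seed)
    sampled = set(cascade_nodes)
    layers = [list(cascade_nodes)]
    frontier = set(cascade_nodes)
    for _ in range(order):
        candidates = set()
        for u in frontier:
            candidates |= succ.get(u, set())
        candidates -= sampled                 # eliminate already-sampled nodes
        chosen = sorted(candidates)
        if len(chosen) > max_per_layer:
            chosen = rng.sample(chosen, max_per_layer)
        sampled |= set(chosen)
        layers.append(chosen)
        frontier = set(chosen)
    return layers

succ = {1: {2, 3}, 2: {4}, 3: {4, 5}, 4: {6}, 5: set(), 6: set()}
layers = diffusion_sample(succ, [1], order=2)
```

Keeping the sampled nodes grouped by layer is what later makes the layer-by-layer hierarchical Drop of step 9 possible.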
Step 5: encoding time information by using sine and cosine position vectors to obtain a time matrix;
further, the step 5 includes the following:
for the time information in a cascade sequence, each time value is converted into a vector through a positional encoding function, so that time can be computed together with the other node attributes while preserving the basic properties of time, such as the distance relation between different times. The conversion method is as follows:
determining the dimension n of the vector TE into which each time point is converted;
determining the value of the k-th element of the vector TE for time t: if k is odd, TE(t)_k = cos(t / 10000^((k-1)/n)); if k is even, TE(t)_k = sin(t / 10000^(k/n)).
The above transformation is performed for all times in the concatenated sequence, resulting in a time matrix.
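The conversion above can be sketched as follows (a hedged sketch: the frequency base 10000 follows the conventional sinusoidal positional encoding and is an assumption; the dimension and time values are illustrative):

```python
import math

def time_encoding(t, n):
    """Sine/cosine encoding of a single time value t into an n-dim vector:
    even indices use sin, odd indices use cos, with geometrically
    decreasing frequencies."""
    te = []
    for k in range(n):
        freq = t / (10000 ** (2 * (k // 2) / n))
        te.append(math.sin(freq) if k % 2 == 0 else math.cos(freq))
    return te

# Apply the transformation to every time in the cascade sequence
# to obtain the time matrix.
time_matrix = [time_encoding(t, 8) for t in (0.0, 1.0, 2.0)]
```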
Step 6: converting the time matrix into an input vector of an encoder;
further, the step 6 includes the following:
for the time matrix, an all-zero vector is prepended at the head as the input vector of the encoder of the spatio-temporal aggregation embedding module.
Step 7: adding the second-order diffusion random sampling cascade diagram and the time vector, and inputting the added result and the aggregate vector into an encoder layer together for self-attention transformation;
further, the step 7 includes the following:
defining an N-dimensional all-0 aggregation vector for aggregating the attributes of all information cascade nodes in the transformation process of the neural network, adding the second-order diffusion random sampling cascade graph obtained in the step 4 and the time vector obtained in the step 6, inputting the added result and the aggregation vector into an encoder layer of a space-time aggregation embedding module together, and outputting the aggregation vector and the diffusion cascade graph after self-attention transformation;
In the CasDiffGNN model, the encoder of each layer generates an aggregate vector representing the characteristics of the whole graph, and the vector does not have corresponding nodes in the graph and does not participate in the characteristic transformation of GAT;
the process of self-attention transformation is as follows:
(1) calculating the self-attention transformation result of a user node: three matrices W_Q, W_K, W_V, whose parameters can be updated by back-propagation, are defined and matrix-multiplied with each user input vector to obtain the Q, K and V vectors of each user node. The Q vector of one user is multiplied with the K vectors of all other users, each result is divided by the square root of the vector dimension, and the Softmax function (normalized exponential function) is applied; the Softmax function "compresses" a K-dimensional vector z of arbitrary real numbers into another K-dimensional real vector σ(z) such that every element lies in (0, 1) and all elements sum to 1. The result gives the degree of correlation between the other users and this user; using these correlations as weights, the V vectors of all user nodes are summed in a weighted manner, yielding one self-attention head of the user node;
(2) defining multiple parameter matrices W_Q, W_K, W_V, computing multiple self-attention heads for each user node, and concatenating the attention heads of all users to obtain a multi-head attention matrix; a linear neural network layer is defined, and the multiple attention heads of each user are fed into it to obtain the output of that user's self-attention transformation.
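Sub-step (1) can be sketched in a single attention head as follows (a hedged pure-Python sketch: random matrices stand in for the learned W_Q, W_K, W_V parameters, and the dimensions are illustrative):

```python
import math
import random

def matvec(m, x):
    return [sum(r * v for r, v in zip(row, x)) for row in m]

def softmax(z):
    mx = max(z)
    e = [math.exp(v - mx) for v in z]
    s = sum(e)
    return [v / s for v in e]

def self_attention(xs, wq, wk, wv):
    """One self-attention head: Q/K/V projections, dot products scaled by
    sqrt(dim), softmax weights, weighted sum of V vectors."""
    d = len(wq)
    qs = [matvec(wq, x) for x in xs]
    ks = [matvec(wk, x) for x in xs]
    vs = [matvec(wv, x) for x in xs]
    out = []
    for q in qs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in ks]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, vs))
                    for j in range(len(vs[0]))])
    return out

rng = random.Random(0)
dim = 4

def rand_matrix():
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

xs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
out = self_attention(xs, rand_matrix(), rand_matrix(), rand_matrix())
```

Sub-step (2) would repeat this with several independent parameter triples and concatenate the heads before the final linear layer.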
Step 8: alternately stacking the GAT layer and the encoder layer, and simultaneously fusing the time characteristics and the structural characteristics to perform characteristic transformation;
further, the step 8 includes the following:
the diffusion cascade graph obtained in step 7 is taken as the input of a GAT layer; the GAT model is chosen as the structural feature extraction method in the CasDiffGNN model. Compared with the more common GCN model, the GAT model can consider the importance of each node individually: in GAT, each neighbor node has its own unique weight, which is learned rather than fixed, so GAT can capture the information transfer between nodes more accurately. The neighbor nodes are weighted through attention coefficients, and each node only considers the information of its local neighbor nodes; if global node information exchange can be performed, the performance of the model can be improved. The Transformer encoder layer can transfer information among global nodes, so the CasDiffGNN model alternately stacks encoder layers and GAT layers to deeply fuse the temporal and structural features: before being input into a GAT layer, the information undergoes a single-layer temporal aggregation transformation, fusing the temporal features into the node features for joint calculation, and global node features and temporal features of the graph are introduced through the self-attention mechanism of the encoder layer. The GAT layer adopts an attention mechanism and can adaptively calculate the contribution weight of each neighbor node to the current node, thereby better handling sparse large-scale graphs and reducing storage usage. The attention weight of node u on node v and the output of the GAT layer are defined as follows:
α_uv = softmax_{u∈N_v}( LeakyReLU( a^{(l)T} [ W^{(l)} h_u^{(l-1)} ‖ W^{(l)} h_v^{(l-1)} ] ) ),  h_v^{(l)} = σ( Σ_{u∈N_v} α_uv W^{(l)} h_u^{(l-1)} )
where h_v^{(l)} is the output of the l-th GAT layer for node v, W^{(l)} is the weight matrix of the l-th layer used for node feature transformation, N_v is the set of neighbor nodes of v, α_uv is the attention weight of u on node v, and a^{(l)} is the attention parameter vector of the l-th layer;
finally, a new diffusion cascade diagram is obtained.
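A single GAT attention step for one node, as described in step 8, can be sketched as follows (a hedged sketch: random values stand in for the learned W and a parameters, and the LeakyReLU slope 0.2 follows the common GAT convention, not a value stated in the patent):

```python
import math
import random

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_node(h, v, neighbors, W, a):
    """Aggregate node v's neighbours: attention logits from the shared
    vector `a` over concatenated transformed features, softmax over the
    neighbourhood, then a weighted sum of transformed neighbour features."""
    def transform(x):
        return [sum(wr * xi for wr, xi in zip(row, x)) for row in W]
    hv = transform(h[v])
    logits = []
    for u in neighbors:
        hu = transform(h[u])
        logits.append(leaky_relu(sum(ai * zi for ai, zi in zip(a, hv + hu))))
    mx = max(logits)
    e = [math.exp(l - mx) for l in logits]
    alpha = [x / sum(e) for x in e]           # attention weights, sum to 1
    d = len(hv)
    return [sum(al * transform(h[u])[j] for al, u in zip(alpha, neighbors))
            for j in range(d)]

rng = random.Random(1)
h = {v: [rng.uniform(-1, 1) for _ in range(4)] for v in range(4)}
W = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]  # 4 -> 2 dims
a = [rng.uniform(-1, 1) for _ in range(4)]  # acts on 2 + 2 concatenated dims
out = gat_node(h, 0, [1, 2, 3], W, a)
```

A production implementation would instead use a library GAT layer with multiple attention heads, as configured later in the experiments.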
Step 9: performing layering Drop operation;
further, the step 9 includes the following:
a hierarchical Drop operation is performed on the nodes in the diffusion cascade graph obtained in step 8: the outermost nodes are discarded layer by layer in the reverse order of sampling, yielding the cascade graph. The hierarchical Drop operation reduces the influence of noisy data in the extra information on model performance and speeds up computation, allowing the model to discard redundant feature information in time during feature extraction.
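Given the per-layer node lists produced by diffusion random sampling, the Drop of one layer can be sketched as follows (an illustrative sketch with hypothetical layer contents):

```python
def hierarchical_drop(layers, node_features):
    """Discard the outermost sampled neighbourhood (the last layer added
    during diffusion random sampling) and its node features."""
    dropped = layers.pop()                 # reverse order of sampling
    for v in dropped:
        node_features.pop(v, None)
    return layers, node_features

layers = [[1], [2, 3], [4, 5]]
feats = {v: [float(v)] for v in (1, 2, 3, 4, 5)}
layers, feats = hierarchical_drop(layers, feats)
```

Calling this once after each GAT layer realizes the layer-by-layer discarding described above.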
Step 10: adding the cascade graph, the aggregate vector and the time matrix;
further, the step 10 includes the following:
the cascade graph obtained in step 9 is added to the initial time matrix, and the sum, together with the aggregation vector obtained in step 7, is taken as the input of the next encoder layer of the spatio-temporal aggregation embedding module.
Step 11: repeating the steps 7 to 9, and taking the obtained cascade diagram, the aggregate vector and the initial time matrix as the input of the next encoder;
step 12: obtaining a final aggregate vector from the encoder;
further, the step 12 includes the following:
the cascade graph, aggregation vector and initial time matrix obtained in step 11 are input into the encoder, and the final aggregation vector is obtained from the output of the encoder layer. The final aggregation vector aggregates the whole-graph features of GAT layers at different depths, with the cascade-graph features dominant and the neighbor-node features auxiliary. During aggregation, nodes farther from the cascade graph contribute less to the final aggregation vector, which prevents redundant information in the neighborhood from affecting the prediction and improves accuracy. This vector is the final embedding vector of the information cascade output by the spatio-temporal aggregation embedding module.
Step 13: inputting the aggregate vector obtained in the step 12 into a three-layer fully-connected neural network for cascade growth prediction, and obtaining a final output;
step 14: and processing the data and updating the learnable parameters in the network to reduce the prediction error.
Further, the step 14 includes the following:
the squared difference between the predicted result and the true result is calculated and averaged over all data; this average error serves as the mean error of the neural network model and is used to evaluate its performance. The learnable parameters in the network are updated by gradient descent with back-propagation to reduce the prediction error.
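Since the prediction target is log2-transformed (step 3), the error of step 14 amounts to a mean squared log-transformed error, which can be sketched as follows (an illustrative sketch; the `+ 1` offset is an assumption to keep zero counts defined):

```python
import math

def msle(pred_growth, true_growth):
    """Mean of squared differences between log2-scaled predicted and
    true growth counts, averaged over all cascades."""
    assert len(pred_growth) == len(true_growth)
    total = sum((math.log2(p + 1) - math.log2(t + 1)) ** 2
                for p, t in zip(pred_growth, true_growth))
    return total / len(pred_growth)

loss = msle([3, 7, 1], [3, 15, 0])
```

In training, this scalar would be minimized by back-propagation through the whole CasDiffGNN network.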
Examples:
in the implementation stage, experiments are carried out on a synthetic dataset and on real datasets from Sina Weibo, Twitter and Douban; the statistics of the datasets are shown in table 1:
table 1 dataset description
The network generation method for the synthetic dataset adopts an extended Barabási–Albert model. The Barabási–Albert model (BA model for short) is a social network generation method commonly used in network analysis and can generate a network structure largely consistent with a real social network; the extended BA model adds some randomness to the basic BA model so that the generated network better matches reality. Following the synthetic network generation method of previous studies, this embodiment sets the number of initial nodes of the BA model to 1, the number of iterations to 3000, and p = 0.4, q = 0.4, finally removing the isolated nodes in the generated graph and converting it into a directed graph, yielding a generated user network of 2951 users and 8967 edges.
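The core growth rule of the plain BA model can be sketched as follows (a hedged sketch: the patent uses an *extended* BA variant with additional randomness governed by p and q, which is not reproduced here, and the sizes are illustrative):

```python
import random

def ba_graph(n, m, seed=0):
    """Plain Barabási–Albert preferential attachment: each new node
    attaches m edges to existing nodes, chosen with probability
    proportional to their current degree."""
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))      # initial attachable nodes
    repeated = []                 # node list weighted by degree
    for new in range(m, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for t in chosen:
            edges.append((new, t))
        repeated.extend(chosen)
        repeated.extend([new] * m)
        targets = repeated
    return edges

edges = ba_graph(20, 2)
```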
The information cascade generation method follows the settings in CoupledGNN: the size of the seed subset is sampled from a power-law distribution with parameter 2.5, i.e. p(n) ∝ n^{-2.5}, and the nodes in each seed set are sampled uniformly. The independent cascade model is adopted as the information diffusion model, with an activation probability from node u to node v of 1/d_v, where d_v is the in-degree of node v. The number of iterations is set to 15000, cascades with a propagation step length of 3 or less are eliminated, and finally 3057 valid cascades are obtained, which are divided into training, test and validation sets in the ratio 0.8:0.1:0.1.
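The independent cascade simulation with activation probability 1/d_v can be sketched as follows (an illustrative sketch with a hypothetical toy graph; each newly activated node gets one activation attempt per inactive follower):

```python
import random

def simulate_ic(succ, in_degree, seeds, seed=0):
    """Independent cascade model: node u activates inactive follower v
    with probability 1/d_v, where d_v is v's in-degree."""
    rng = random.Random(seed)
    active = set(seeds)
    frontier = list(seeds)
    cascade = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in succ.get(u, ()):
                if v not in active and rng.random() < 1.0 / in_degree[v]:
                    active.add(v)
                    nxt.append(v)
                    cascade.append(v)
        frontier = nxt
    return cascade

succ = {1: [2, 3], 2: [4], 3: [4]}
in_degree = {2: 1, 3: 1, 4: 2}
cascade = simulate_ic(succ, in_degree, [1])
```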
For the real datasets, information cascades on Sina Weibo are selected. To obtain the most realistic user relation network and user cascade information, the public dataset used in this embodiment tracks and records the details of approximately 1 billion microblogs over one month in a large network of 1.7 million users and 400 million relationships, including 300,000 information cascades. This dataset reflects cascade propagation well under real conditions, so according to the subset extraction method, a small user network and its cascade information are extracted from this large-scale dataset.
The extracted network contains 841356 edges and 26720 user nodes, and contains 2234 information cascades. These cascades are also divided into training sets, validation sets and test sets in a ratio of 0.8:0.1:0.1.
The Twitter dataset contains the tweets during October 2010 and their propagation paths among users. After eliminating excessively short invalid cascades, a total of 3453 information cascades remain, involving 12627 users. The friend relations among all users participating in cascade propagation are taken as social relations to construct a social relation network, which contains 309631 edges.
The Douban dataset is collected from the Douban website, where users can share their book or movie reading status and follow the status of others. The users participating in the same information propagation form an information cascade; the dataset includes 3484 information cascades and 24926 involved users. The co-occurrence relationships of users (e.g., reading the same book) are regarded as their social relations, with a total of 379155 co-occurrence relationships among all users in the dataset. Information propagates more slowly on Douban than on Weibo and Twitter, so cascade observations are made in units of months.
For the Douban dataset, the observation times are set to 3, 6 and 9 months; for the Weibo dataset, 1, 2 and 3 hours; for the Twitter dataset, 0.5, 1 and 1.5 hours. The synthetic dataset has no notion of propagation time: since observation is made according to the number of propagation steps, the observation length is taken as 2 propagation steps. The statistics of the datasets at these observation times are shown in table 2:
table 2 statistics of data sets at different observation times
In addition, the relationship between the observed length and the predicted length of the information cascade and the corresponding number is analyzed, as shown in Figure 3.
From fig. 3 a-d we can see that both the observed length and the predicted length of the cascades follow a power-law distribution in count, that the observed-length and predicted-length distributions within the same dataset are largely consistent, and that the total cascade length distribution also tends to be consistent. This shows that it is reasonable to predict the growth of a cascade according to the set observation time: within this observation time, the propagation patterns of different cascades already differ, and their future growth can be predicted from these factors.
Following the evaluation settings of previous work, MSLE is also selected as the evaluation metric of the experiment:
MSLE = (1/N) Σ_{i=1}^{N} ( log2 Δy'_i − log2 Δy_i )^2
where N represents the total number of cascades, Δy'_i represents the predicted increase of cascade i, and Δy_i represents the actual increase of cascade i.
Regarding parameter settings of the comparison methods: for the feature-based method, the hidden layer dimension is set to 128; the hyperparameters of the other models are taken from the optimal settings given in their original papers and code; meanwhile, the limit on the number of iteration rounds is removed so that a converged loss value can be obtained.
Since the DeepHawkes model requires the propagation paths of information, which are missing from the dataset used in this embodiment, an algorithm is adopted to reconstruct the propagation paths. First, the user nodes in an information cascade are arranged in time order; the first user node in the sequence is taken as the information source user and added to the influenced user set, and the remaining nodes are added to the to-be-influenced user set. User nodes are then taken from the to-be-influenced set in time order as information receivers, and it is judged whether the receiver has a neighbor relation in the social relation network with any user in the influenced set. If a neighbor relation exists with a unique user, that user is selected as the propagator of the information; if neighbor relations exist with multiple users, their out-degree, in-degree and node number in the social relation network are compared in turn and the propagator is selected from them; if no user with a neighbor relation exists, the out-degree, in-degree and node number of all users in the influenced set are compared and the propagator is selected from them. Finally, a propagation trajectory from the propagator to the receiver is constructed, and the receiver is moved from the to-be-influenced set into the influenced set. These steps are repeated until no user remains in the to-be-influenced set, and all propagation trajectories constructed in the process are taken as the information cascade propagation trajectories for DeepHawkes.
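A simplified version of this path-reconstruction procedure can be sketched as follows (a hedged sketch: ties are broken here only by out-degree and user id, a simplification of the out-degree / in-degree / node-number cascade of rules; the graph and users are hypothetical):

```python
def reconstruct_paths(cascade_users, neighbors, out_degree):
    """Attach each receiver, in time order, to an already-influenced user
    it has a neighbour relation with; fall back to all influenced users
    when no neighbour exists."""
    influenced = [cascade_users[0]]        # the information source user
    edges = []
    for receiver in cascade_users[1:]:
        candidates = [u for u in influenced
                      if receiver in neighbors.get(u, set())]
        pool = candidates or influenced    # no-neighbour fallback
        spreader = max(pool, key=lambda u: (out_degree.get(u, 0), u))
        edges.append((spreader, receiver))
        influenced.append(receiver)
    return edges

neighbors = {"a": {"b", "c"}, "b": {"d"}}
out_degree = {"a": 2, "b": 1, "c": 0}
edges = reconstruct_paths(["a", "b", "c", "d"], neighbors, out_degree)
```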
The CasDiffGNN algorithm proposed in this embodiment uses 2 layers of random diffusion, 2 GAT layers, and 3 enhanced time aggregation layers; the hidden layer dimension in the GAT is 32, the number of multi-head attention heads is 2, and the dropout rate is 0.1; the hidden layer dimension in the enhanced time aggregation layer is 32, the number of multi-head attention heads is 2, and the dropout rate is 0.2; the fully connected part has three hidden layers of dimensions 32, 32 and 1, with a dropout rate of 0.1; the Adam optimizer is used with an initial learning rate of 0.005 and a weight decay factor of 0.001.
According to the same data processing and dividing method, in this embodiment all the comparison models are compared with the proposed CasDiffGNN model on the Douban, Weibo, Twitter and synthetic datasets respectively; the comparison results are shown in table 3:
TABLE 3 Overall prediction Performance
As shown in table 3, the CasDiffGNN model is superior to the other models on all datasets, with a maximum reduction in prediction error of 24.86% and an average reduction of 9.75% relative to the previous models. DeepHawkes handles the relation between time and prediction through a time decay function, CCasGNN encodes the order of user participation in the cascade sequence through a positional encoding function, and this embodiment encodes the timestamps of information propagation through a positional encoding function. In the synthetic dataset, both timestamps and order information are expressed statistically as the number of propagation steps, so neither the order nor the propagation rate can be expressed accurately. CasSeqGCN, by dividing time periods, counts user nodes in the same period into the same cascade graph; this is consistent with the idea that information in the synthetic dataset propagates by iteration count, so its performance is unaffected and it obtains smaller prediction errors there. Except for the model of this embodiment and the Feature-Linear and DeepCas models, all other models suffer from prediction error increasing as the observation time increases.
To better explore the contribution of each component in CasDiffGNN to the final prediction result, and to explore the real factors affecting the cascade prediction problem, the present embodiment devised various variant models of CasDiffGNN. In CasDiffGNN, the main contribution is the diffuse random sampling and spatio-temporal aggregation embedding part, in order to explore the roles of these two parts, the following model variants were designed:
CasDiffGNN-noDrop: the original CasDiffGNN discards the nodes and corresponding edges in the outermost neighborhood of the diffusion random sampling cascade graph after each graph neural network layer ends. In this variant model, no node in the diffusion random sampling cascade graph is discarded.
CasDiffGNN-seqStack: in CasDiffGNN, the encoder layers and GAT layers are alternately stacked to realize deep fusion of temporal and structural features, and the node embedding vectors extracted by GATs of different depths are aggregated. To explore the rationality and effectiveness of this design, in this variant model the GAT layers are stacked sequentially, followed by the encoder layers.
CasDiffGNN-diff0: the number of sampling iterations in the diffusion random sampling is set to 0, and the diffusion random sampling is not performed.
CasDiffGNN-diff1: setting the sampling iteration number in the diffusion random sampling to be 1, performing diffusion random sampling on the 1-order neighborhood node only, and then discarding the neighborhood node after the first GAT layer is finished.
CasDiffGNN-diff2: the number of sampling iterations in the diffuse random sampling is set to 2, consistent with the original CasDiffGNN.
CasDiffGNN-diff3: the number of sampling iterations in the diffusion random sampling is set to 3, sampling nodes within the 3rd-order neighborhood of the cascade users; the outermost neighborhood nodes are discarded after the first GAT layer ends, and the remaining two neighborhood layers are discarded after the second GAT layer ends.
Table 4 CasDiffGNN ablation model results comparison
As can be seen from the data in table 4, the CasDiffGNN model proposed in this embodiment, i.e. variant CasDiffGNN-diff2, achieves the best prediction results on most datasets. It can also be observed that the CasDiffGNN-noDrop model, without the hierarchical Drop operation, is slightly better than CasDiffGNN-diff2 at the shortest observation time on the Weibo and Twitter datasets, reducing the prediction error by 3.25% and 7.72% respectively. However, after removing the hierarchical Drop operation, the stability of the algorithm drops greatly: from the trend of a linear fit of the error over observation time, the slopes on the three real datasets for the model without hierarchical Drop are 0.143, 0.561 and 0.030 respectively, while those of the CasDiffGNN model are 0.016, 0.032 and 0.007. Evidently the diffusion random sampling graph contains much noisy data irrelevant to prediction; using all data for prediction without screening not only increases the time and storage cost of computation but also negatively affects the prediction results. Discarding part of the nodes at the proper time after diffusion random sampling is therefore necessary to improve prediction accuracy and algorithm stability.
By comparing the experimental data of CasDiffGNN-diffX and CasDiffGNN-seqStack in figs. 4 a-c, we can see that the predictive power of the model drops significantly after the stacking order of the components is changed: the prediction loss of CasDiffGNN-seqStack is generally higher than that of the alternately stacked CasDiffGNN-diffX series. This indicates that information useful for prediction exists in the neighborhood nodes after diffusion sampling, but if it is extracted only by GAT, without relying on the enhanced time aggregation module, it cannot be used effectively. Therefore, the neighborhood node features should be aggregated with the enhanced time aggregation module before those nodes are discarded.
To explore the influence of the number of diffusion layers on the prediction results, this embodiment compares models under 0th-, 1st-, 2nd- (CasDiffGNN) and 3rd-order diffusion without changing the network structure. As the table shows, the error between the predicted and true results is smallest when the diffusion order is 2. When the diffusion order is 1, the prediction error of the model is smaller than that of most baseline methods, and under the long observation time on Twitter it is even better than the CasDiffGNN model proposed in this embodiment. Figs. 5 a-c show the relation between the diffusion random sampling order and the prediction error on different datasets and observation times: on most datasets, when the diffusion random sampling order is at most 2, the prediction error decreases as the order increases. This shows that diffusion sampling of the cascade graph neighborhood is an effective way to improve model performance.
However, once the 3rd-order neighborhood of the cascade user nodes is sampled, the error of the algorithm increases rapidly. This phenomenon is related to the relation between the order and the number of nodes in an r-order reachable graph: previous studies have shown that the 2nd-order reachable graph of a cascade with only 10 user nodes can already contain hundreds of thousands of nodes, so diffusion random sampling to the 3rd order is equivalent to randomly sampling user nodes from the entire social relation graph. The neighbor nodes in the 3rd-order neighborhood therefore introduce not only information beneficial to prediction accuracy but also more noisy data, hindering the model's ability to accurately predict the future growth of the cascade. Hence, as long as interference from irrelevant information in the graph cannot be eliminated, only nodes within the 2nd-order neighborhood should be used to predict the future growth of the cascade sequence.
The combined processing of temporal and structural features avoids the precision loss caused by insufficient structural features, and is more advantageous for early cascade prediction with shorter observation times.
In terms of efficiency, performing graph convolution on the whole graph has the largest computation cost and the lowest efficiency; one great advantage of the encoder over an RNN is that both forward and backward propagation are computed in parallel and are faster. The running efficiency of our model is superior to that of CasSeqGCN, which uses an RNN-type network, and is basically consistent with CCasGNN.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.
Claims (13)
1. The social network information propagation influence prediction method based on information cascading is characterized by comprising the following steps of:
step 1: storing the extracted relationship between the user node and the user with a directed graph;
step 2: calculating structural feature information of each user node in the user global social relation network;
step 3: extracting an observation sequence according to the definition of the information cascade prediction problem, and processing data;
step 4: introducing a part of global features to model cascade features by using diffusion random sampling;
step 5: encoding time information by using sine and cosine position vectors to obtain a time matrix;
step 6: converting the time matrix into an input vector of an encoder;
step 7: adding the second-order diffusion random sampling cascade diagram and the time vector, and inputting the added result and the aggregate vector into an encoder layer together for self-attention transformation;
Step 8: alternately stacking the GAT layer and the encoder layer, and simultaneously fusing the time characteristics and the structural characteristics to perform characteristic transformation;
step 9: performing a hierarchical Drop operation;
step 10: adding the cascade graph, the aggregate vector and the time matrix;
step 11: repeating the steps 7 to 9, and taking the obtained cascade diagram, the aggregate vector and the initial time matrix as the input of the next encoder;
step 12: obtaining a final aggregate vector from the encoder;
step 13: inputting the aggregate vector obtained in the step 12 into a three-layer fully-connected neural network for cascade growth prediction, and obtaining a final output;
step 14: and processing the data and updating the learnable parameters in the network to reduce the prediction error.
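The three-layer fully connected prediction head of step 13 can be illustrated with a minimal sketch; the layer widths, ReLU activations, and the function name `mlp_predict` are assumptions for illustration, not the patent's exact implementation:

```python
import numpy as np

def mlp_predict(agg, w1, b1, w2, b2, w3, b3):
    """Map the final aggregate vector to a scalar cascade-growth
    prediction through three fully connected layers."""
    h = np.maximum(0.0, agg @ w1 + b1)   # hidden layer 1 (ReLU)
    h = np.maximum(0.0, h @ w2 + b2)     # hidden layer 2 (ReLU)
    return h @ w3 + b3                   # linear output layer
```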
2. The social network information propagation influence prediction method based on information cascading according to claim 1, wherein step 1 includes the following:
extracting user nodes and the relationships among users from the online social platform, and storing this information in a directed graph, called the user global social relationship network; nodes in the directed graph represent users in the social network, edges represent follow relationships among users, and the direction of an edge represents the direction of information transfer.
3. The social network information propagation influence prediction method based on information cascading according to claim 2, wherein step 2 includes the following:
the information to be calculated includes the k-core number, PageRank score, hub score, authority score, eigenvector centrality, and clustering coefficient; each of these features represents part of the structural characteristics of a node in the graph, and the six attributes together represent a node; the vector formed by the six attributes is called the user global attribute vector; since the numerical range of each attribute differs, each attribute is normalized over the users' attribute vectors, and subsequent numerical calculations are performed on the normalized attribute vectors.
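As a concrete illustration of one of these structural features, PageRank can be computed by power iteration over the directed graph; the following is a minimal numpy sketch under assumed damping and tolerance values, not the patent's implementation:

```python
import numpy as np

def pagerank(adj: np.ndarray, d: float = 0.85, tol: float = 1e-9) -> np.ndarray:
    """Power-iteration PageRank on a dense adjacency matrix
    (adj[i, j] = 1 for an edge i -> j)."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    # Column-stochastic transition matrix; dangling nodes spread uniformly.
    m = np.where(out_deg[:, None] > 0,
                 adj / np.maximum(out_deg, 1)[:, None],
                 1.0 / n).T
    r = np.full(n, 1.0 / n)
    while True:
        r_next = (1 - d) / n + d * (m @ r)
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
```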
4. The social network information propagation influence prediction method based on information cascading according to claim 3, wherein step 3 includes the following:
information in the social network is stored in the form of information cascades; each piece of information corresponds to a cascade sequence, and each element of the cascade sequence is a two-tuple of a user number and an information sending time; the number of elements in the cascade sequence represents the number of users affected by the information, and the elements are arranged in time order, representing the information propagation trajectory; the final increase in the number of affected users is predicted from the first several hours of the information's propagation, i.e., the number of subsequent nodes is predicted from the preceding nodes in the cascade sequence.
5. The social network information propagation influence prediction method based on information cascading according to claim 4, wherein step 4 includes the following:
for one piece of information cascade data, a data enhancement method of diffusion random sampling is adopted to introduce part of the global features, which facilitates the modeling of cascade features; first, the order of diffusion random sampling is determined according to the number of stacked layers of the graph neural network; all neighbor nodes of the K-th-order nodes are searched in the social relationship network, and nodes that have already been sampled are excluded; at most 128 nodes are randomly sampled from the remaining nodes as the (K+1)-th-order nodes, until the specified order has been sampled; sampling twice yields a second-order diffusion random sampling cascade graph of the cascade sequence, i.e., a subgraph containing the cascade nodes and part of their neighbor nodes.
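The diffusion random sampling described above can be sketched as follows; the dict-based adjacency representation and the `seed` parameter are assumptions for illustration:

```python
import random

def diffusion_sample(cascade_nodes, neighbors, order=2, max_per_hop=128, seed=0):
    """K-order diffusion random sampling: starting from the observed cascade
    nodes, repeatedly draw up to max_per_hop unseen neighbors per hop.
    `neighbors` maps a node to its neighbor list in the global social graph."""
    rng = random.Random(seed)
    sampled = set(cascade_nodes)
    frontier = list(cascade_nodes)
    for _ in range(order):
        # Collect unseen neighbors of the current frontier.
        candidates = {v for u in frontier for v in neighbors.get(u, [])} - sampled
        frontier = rng.sample(sorted(candidates),
                              min(max_per_hop, len(candidates)))
        sampled.update(frontier)
    return sampled
```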
6. The social network information propagation influence prediction method based on information cascading according to claim 5, wherein step 5 includes the following:
for the time information in a cascade sequence, each time value is converted into a vector through a special position encoding function, so that time can be operated on together with the other node attributes while the converted vector still preserves the basic properties of time; the conversion method is as follows:
Determining the dimension n of the converted vector TE required to be converted at each time point;
determining the value of the k-th element of the vector TE for time t: if k is odd, TE(t, k) = cos(t / 10000^((k-1)/n)); if k is even, TE(t, k) = sin(t / 10000^(k/n));
The above transformation is performed for all times in the concatenated sequence, resulting in a time matrix.
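The sine/cosine time-encoding procedure can be sketched as follows, assuming the transformer-style frequency convention 1/10000^(k/n) (the patent's exact exponent convention is not fully recoverable from the text):

```python
import numpy as np

def time_encoding(times, n):
    """Convert each time value into an n-dimensional vector: sine on even
    indices, cosine on odd indices, with geometrically spaced frequencies."""
    te = np.zeros((len(times), n))
    k = np.arange(0, n, 2)
    freq = 1.0 / (10000.0 ** (k / n))          # one frequency per sin/cos pair
    t = np.asarray(times, dtype=float)[:, None]
    te[:, 0::2] = np.sin(t * freq)
    te[:, 1::2] = np.cos(t * freq[: te[:, 1::2].shape[1]])
    return te
```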
7. The social network information propagation influence prediction method based on information cascading according to claim 6, wherein step 6 includes the following:
for the time matrix, a vector of all zeros is prepended at its head as the input vector to the encoder of the space-time aggregation embedding module.
8. The social network information propagation influence prediction method based on information cascading according to claim 7, wherein step 7 includes the following:
defining an N-dimensional all-0 aggregation vector for aggregating the attributes of all information cascade nodes in the transformation process of the neural network, adding the second-order diffusion random sampling cascade graph obtained in the step 4 and the time vector obtained in the step 6, inputting the added result and the aggregation vector into an encoder layer of a space-time aggregation embedding module together, and outputting the aggregation vector and the diffusion cascade graph after self-attention transformation;
in the CasDiffGNN model, the encoder of each layer generates an aggregate vector representing the features of the whole graph; this vector has no corresponding node in the graph and does not participate in the GAT feature transformation;
the process of self-attention transformation is as follows:
(1) calculating the self-attention transformation result of a user node: three parameter matrices W_Q, W_K, W_V, updatable through back propagation, are defined and each is multiplied with each user input vector to obtain the Q, K, and V vectors of each user node; the Q vector of one user is multiplied with the K vectors of all other users, each result is divided by the square root of the vector dimension, and the Softmax function is applied; the Softmax function "compresses" a K-dimensional vector z of arbitrary real numbers into another K-dimensional real vector σ(z) in which each element lies in (0, 1) and all elements sum to 1; the results measure the degree of correlation between the other users and this user and are used as weights in a weighted sum of the V vectors of all user nodes, yielding one self-attention head of the user node;
(2) defining multiple parameter matrices W_Q, W_K, W_V to calculate multiple self-attention heads for each user node, and concatenating the self-attention heads of all users to obtain a multi-head attention matrix; a linear neural network layer is defined, and the attention heads of each user are input into it to obtain the output of that user's self-attention transformation.
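The single-head scaled dot-product self-attention described in (1) can be sketched as follows; matrix shapes and function names are assumptions for illustration:

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention: Q, K, V are obtained
    by multiplying inputs with parameter matrices; attention weights are
    softmax(Q K^T / sqrt(d))."""
    q, k, v = x @ wq, x @ wk, x @ wv
    d = q.shape[-1]
    att = softmax(q @ k.T / np.sqrt(d))   # pairwise correlation weights
    return att @ v                        # weighted sum of V vectors
```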
9. The social network information propagation influence prediction method based on information cascading according to claim 8, wherein step 8 includes the following:
taking the diffusion cascade graph obtained in step 7 as the input of a GAT layer; the GAT model is selected as the method for extracting structural features in the CasDiffGNN model; in the GAT model, neighbor nodes are weighted by attention coefficients, and each node only considers the information of its local neighbor nodes; the CasDiffGNN model alternately stacks encoder layers and GAT layers to deeply fuse temporal and structural features: before data are input into a GAT layer, a single-layer temporal aggregation transformation is applied so that temporal features are integrated into the node features for calculation, and the global node features and temporal features of the graph are introduced through the self-attention mechanism of the encoder layer; because the GAT layer uses an attention mechanism, it can adaptively compute the contribution weight of each neighbor node to the current node; the attention weight of node u on node v in the GAT layer and the layer output are defined as follows:
α_uv = exp(LeakyReLU(a^(l)^T [W^(l) h_u^(l-1) || W^(l) h_v^(l-1)])) / Σ_{w∈N_v} exp(LeakyReLU(a^(l)^T [W^(l) h_w^(l-1) || W^(l) h_v^(l-1)]))
h_v^(l) = σ( Σ_{u∈N_v} α_uv W^(l) h_u^(l-1) )
where h_v^(l) is the output of the l-th GAT layer, W^(l) is the weight matrix of the l-th layer for node feature transformation, N_v denotes the set of neighbor nodes of v, α_uv denotes the attention weight of u on node v, and a^(l) is the attention weight vector parameter of the l-th layer;
finally, a new diffusion cascade diagram is obtained.
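A single GAT layer performing the attention-weighted neighbor aggregation described in this claim can be sketched as follows; the dense adjacency representation, single attention head, and ReLU as the nonlinearity σ are illustrative assumptions:

```python
import numpy as np

def leaky_relu(z, alpha=0.2):
    return np.where(z > 0, z, alpha * z)

def gat_layer(h, adj, w, a):
    """One GAT layer: attention logits are LeakyReLU(a^T [W h_u || W h_v]),
    normalized with softmax over the in-neighbors of v, then used to
    weight-sum the transformed neighbor features."""
    n = h.shape[0]
    wh = h @ w                                   # transformed node features
    out = np.zeros_like(wh)
    for v in range(n):
        nbrs = [u for u in range(n) if adj[u, v]]
        logits = np.array([leaky_relu(a @ np.concatenate([wh[u], wh[v]]))
                           for u in nbrs])
        alpha_uv = np.exp(logits - logits.max())
        alpha_uv /= alpha_uv.sum()               # softmax over neighbors
        out[v] = np.maximum(0.0, (alpha_uv[:, None] * wh[nbrs]).sum(axis=0))
    return out
```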
10. The social network information propagation influence prediction method based on information cascading according to claim 9, wherein step 9 includes the following:
performing a hierarchical Drop operation on the nodes in the diffusion cascade graph obtained in step 8, namely discarding the outermost nodes of the network layer by layer in the reverse of the sampling order, to obtain the cascade graph.
11. The social network information propagation influence prediction method based on information cascading according to claim 10, wherein step 10 includes the following:
adding the cascade graph obtained in step 9 to the initial time matrix; the added result, together with the aggregate vector obtained in step 7, is taken as the input of the next encoder layer of the space-time aggregation embedding module.
12. The social network information propagation influence prediction method based on information cascading according to claim 11, wherein step 12 includes:
inputting the cascade graph, the aggregate vector and the initial time matrix obtained in step 11 into an encoder, and obtaining the final aggregate vector from the output of the encoder layer; the final aggregate vector aggregates the whole-graph features of GAT layers of different depths, with the cascade graph features as the primary component and neighbor-node features as the auxiliary component; during aggregation, nodes farther from the cascade graph contribute less to the final aggregate vector; this vector is the final embedded vector of the information cascade output by the space-time aggregation embedding module.
13. The social network information propagation influence prediction method based on information cascading according to claim 12, wherein step 14 includes:
calculating the squared difference between the predicted result and the real result and averaging it over all data; this mean squared error serves as the average error of the neural network model and is used to evaluate its performance; the learnable parameters in the network are updated through back-propagation gradient descent to reduce the prediction error.
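The error computation and parameter update of step 14 can be sketched as follows; the learning rate is an assumed value and `sgd_step` stands in for the full back-propagation update of each learnable parameter:

```python
import numpy as np

def mse(pred, target):
    """Mean of squared differences between predicted and true growth sizes."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    return float(np.mean((pred - target) ** 2))

def sgd_step(w, grad, lr=0.01):
    """One gradient-descent update of a learnable parameter."""
    return w - lr * grad
```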
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310940863.6A CN116975778A (en) | 2023-07-28 | 2023-07-28 | Social network information propagation influence prediction method based on information cascading |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116975778A true CN116975778A (en) | 2023-10-31 |
Family
ID=88476276
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310940863.6A Pending CN116975778A (en) | 2023-07-28 | 2023-07-28 | Social network information propagation influence prediction method based on information cascading |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116975778A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117610717A (en) * | 2023-11-13 | 2024-02-27 | 重庆大学 | Information popularity prediction method based on double-variation cascade self-encoder |
CN117670572A (en) * | 2024-02-02 | 2024-03-08 | 南京财经大学 | Social behavior prediction method, system and product based on graph comparison learning |
CN117670572B (en) * | 2024-02-02 | 2024-05-03 | 南京财经大学 | Social behavior prediction method, system and product based on graph comparison learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116975778A (en) | Social network information propagation influence prediction method based on information cascading | |
CN111538848B (en) | Knowledge representation learning method integrating multi-source information | |
Huang et al. | Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm | |
CN111861013B (en) | Power load prediction method and device | |
CN109960737B (en) | Remote sensing image content retrieval method for semi-supervised depth confrontation self-coding Hash learning | |
CN112256981B (en) | Rumor detection method based on linear and nonlinear propagation | |
CN113868474A (en) | Information cascade prediction method based on self-attention mechanism and dynamic graph | |
CN111898689A (en) | Image classification method based on neural network architecture search | |
CN112884149B (en) | Random sensitivity ST-SM-based deep neural network pruning method and system | |
CN114928548A (en) | Social network information propagation scale prediction method and device | |
CN111639230A (en) | Similar video screening method, device, equipment and storage medium | |
Zhang et al. | Gmh: A general multi-hop reasoning model for kg completion | |
CN116308854A (en) | Information cascading popularity prediction method and system based on probability diffusion | |
CN114792126A (en) | Convolutional neural network design method based on genetic algorithm | |
CN114821218A (en) | Target detection model searching method based on improved channel attention mechanism | |
CN117787411A (en) | Local-global time sequence knowledge graph reasoning method based on relation graph | |
CN117093830A (en) | User load data restoration method considering local and global | |
CN107133321A (en) | The analysis method and analytical equipment of the search attribute of the page | |
CN115794880A (en) | Approximate query processing-oriented sum-product network and residual error neural network hybrid model | |
CN114238504A (en) | E-government-oriented cross-chain data query and consensus optimization method | |
CN108668265B (en) | Method for predicting meeting probability among mobile users based on cyclic neural network | |
Zhang et al. | A Novel Negative Sample Generating Method for Knowledge Graph Embedding. | |
CN115878908B (en) | Social network influence maximization method and system of graph annotation meaning force mechanism | |
CN117474106B (en) | Bayesian network structure learning method based on full-flow parallel genetic algorithm | |
CN118296250A (en) | Public opinion propagation prediction method, system and storage medium based on multi-feature sequence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||