CN115168678A - Time-sequence-aware heterogeneous graph neural rumor detection model - Google Patents

Time-sequence-aware heterogeneous graph neural rumor detection model Download PDF

Info

Publication number
CN115168678A
Authority
CN
China
Prior art keywords
event
time sequence
response
representation
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210721077.2A
Other languages
Chinese (zh)
Inventor
宋玉蓉
陈林威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202210721077.2A priority Critical patent/CN115168678A/en
Publication of CN115168678A publication Critical patent/CN115168678A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a time-sequence-aware heterogeneous graph neural rumor detection model. In recent years, the development of online social media has greatly accelerated the breeding and propagation of rumors, and their harmfulness has drawn wide attention from researchers to automatic rumor detection technology. The invention simultaneously considers the global structural relations between events and the temporal relations of message propagation within an event, jointly and explicitly models the two relations with a heterogeneous graph as the carrier, and provides a novel time-sequence-aware heterogeneous graph neural rumor detection model. The model captures the temporal relations between the forwarding (or comment) posts within an event using a timing-aware self-attention mechanism and fuses the timing-aware forwarding (or comment) posts with the source post to obtain the local temporal representation of the event; it captures the global structural relations between events using an element-level attention mechanism and learns the global structural representation of the events; finally, the two are fused for detecting rumors.

Description

Time-sequence-aware heterogeneous graph neural rumor detection model
Technical Field
The invention provides a time-sequence-aware heterogeneous graph neural rumor detection model, belongs to the technical field of rumor detection, and particularly relates to graph neural network technology.
Background
In recent years, the development of online social media has greatly accelerated the breeding and spread of rumors, and their harmfulness has drawn much attention from researchers to automatic rumor detection technology. Most early rumor detection methods use feature engineering to mine effective features from text content, user profile information, propagation structure and the like. Such methods depend on heavy feature engineering, which is time-consuming and requires substantial human resources, and the artificially constructed features are highly subjective and lack high-order feature representations. With the development of deep learning, deep neural networks have achieved good results in many natural language processing tasks, such as sentiment analysis, machine translation, and text classification. Accordingly, researchers have begun to model text content, propagation structure, etc. with deep learning models and have proposed many effective rumor detection methods. Recently, graph-model-based methods have used graph neural networks to model the structural characteristics of message propagation, converting the rumor detection task into a graph classification task, and have achieved good results. However, these methods consider only the local propagation structure of the posts inside an event and ignore the global structural relations of events on social media. Yuan et al. (Yuan Chunyuan, Ma Qianwen, Zhou Wei, et al. Jointly embedding the local and global relations of heterogeneous graph for rumor detection [C]//2019 IEEE International Conference on Data Mining (ICDM). IEEE, 2019: 796-805.) observe that each event is not an independent individual, that relations may arise between events through the participation of the same users, and that considering only the characteristics of each event itself while ignoring the relations between events necessarily limits the detection performance of a model. They therefore studied the associations between events from the perspective of heterogeneous networks and proposed a heterogeneous graph combining global and local relations to capture the local semantic relations and global structural information of message propagation. Although that model achieves a good effect, it ignores the timing information of message propagation inside an event.
Disclosure of Invention
The present invention is directed to overcoming the above problems by providing a time-sequence-aware heterogeneous graph neural rumor detection model, which comprises a heterogeneous graph construction module, a local timing information encoding module, a global structure information encoding module, and a rumor classification module. The method comprises the following steps:
S01, constructing a heterogeneous graph; this module comprises two parts: constructing a heterogeneous graph based on the interactions between the forwarding (comment) posts and the source post within an event and the interactions between users and events; and initializing the representation of each type of node in the heterogeneous graph;
S02, extracting the local temporal features of the event; this module comprises two parts: mining the local timing information within an event with a timing-aware self-attention mechanism to obtain response post representations carrying timing information, and then fusing the timing-aware response posts into the source post representation to obtain the local temporal representation $\tilde{m}_i$ of the event;
S03, extracting global structural features of the event, wherein the module comprises two parts: computing participation event c i A specific user u in j Attention vector y in different aspects j (ii) a By attention vector gamma j And participate in event c i All users in the system are aggregated in an element product mode to capture the global structure relation between events;
S04, after obtaining the local temporal representation and the global structural representation of each event node, concatenating the two features as the final representation of the event node, and computing the prediction for the event, i.e., the probability $\hat{y}_i$ of the event being each label, through a fully connected layer and a softmax function; finally, defining a loss function and continuously updating the model parameters to obtain their optimal values.
The step S01 specifically includes:
S11, first abstracting events and the related users into two different types of nodes in a network, and establishing connecting edges between user nodes and event nodes according to the users' participation in the events (a user has forwarded or commented on posts in the event). Each event contains a source post and a series of response posts. The response posts are arranged into a time sequence according to their time delay after the source post is published, so that each source post corresponds to one response sequence. Finally, an event-user heterogeneous graph with timing information is constructed.
S12, initializing the representation of each node in the heterogeneous graph, specifically comprising the following steps:
S12-1, initializing the event nodes. The substance of an event is the text content of its source post and response posts, which are initialized as word vectors and encoded with a CNN. Specifically, the number of words per post is fixed at L: posts with fewer than L words are padded with 0, and posts with more than L words are truncated. Vector representations of the words are then obtained by training the Word2Vec algorithm on a domain-specific corpus; words that do not appear in the pre-trained word-vector vocabulary are initialized from a uniform distribution, and the word vectors are kept fine-tunable during training. Denoting the initial vector of the j-th word in a post as $x_j \in \mathbb{R}^{d_w}$, a post of L words can be represented as

$x_{1:L} = [x_1; x_2; \dots; x_L]$,

where ";" is the concatenation operation.
Further, the sentence sequence is encoded with a CNN. Given the sentence sequence $x_{1:L}$ composed of word vectors, a one-dimensional convolution is performed on every possible window through the convolutional layer of the CNN:

$e_i = \sigma(W * x_{i:i+h-1})$,

yielding the feature map $e = [e_1, e_2, \dots, e_{L-h+1}]$, where $W \in \mathbb{R}^{h \times d_w}$ is a convolution kernel of size h. A max-pooling operation $\hat{e} = \max(e)$ then selects the maximum value of each feature map, and the initialization vector representation of each post is obtained by concatenation. For the i-th event, the source post is represented as $s_i \in \mathbb{R}^{d}$ and each response post as $m_j \in \mathbb{R}^{d}$; the matrix formed by the response posts in the event is denoted $M_i = [m_1; m_2; \dots; m_n] \in \mathbb{R}^{n \times d}$.
And S12-2, performing initialization representation on the user node. And encoding attribute information (including gender, age, fan number, attention number and the like) of the user to obtain an initialization vector representation of the user node. And initializing the user information which cannot be obtained through normal distribution.
The step S02 specifically includes:
S21, mining the local timing information within an event with a timing-aware self-attention mechanism, capturing the differences between the response posts generated by rumor and non-rumor events at different time stages and the latent timing dependencies between response posts.
S21-1, to encode the time-delay information of each response post, a position embedding is generated for each response post using the Position Encoding (PE) formula of the Transformer model:

$PE_{(pos,\,2k)} = \sin\big(pos / 10000^{2k/d}\big)$,
$PE_{(pos,\,2k+1)} = \cos\big(pos / 10000^{2k/d}\big)$,

where pos denotes the position of the response post in the sequence, d denotes the dimension of the PE, and 2k and 2k+1 index the even and odd dimensions respectively (i.e., 2k ≤ d, 2k+1 ≤ d).
S21-2, the embedding of each response post is combined with the corresponding position embedding to capture the timing information between response posts:

$m_j' = m_j + PE_j$.
S21-3, a multi-head attention mechanism is used to focus attention on the important response posts. The self-attention mechanism explicitly assigns larger weights to information with greater influence on a node and weights that information into the node's own representation, greatly enriching node representations, while multiple heads account for the influence of external information from as many aspects as possible:

$\text{head}_t = \text{Attention}(M_i' W_t^Q,\, M_i' W_t^K,\, M_i' W_t^V)$,
$\hat{M}_i = [\text{head}_1; \text{head}_2; \dots; \text{head}_T]\, W^O$,

where $M_i'$ is the matrix of position-aware response post embeddings, $\text{Attention}(Q,K,V) = \text{softmax}(QK^\top/\sqrt{d_k})\,V$, and $\hat{M}_i$ contains the timing-aware response post representations.
S22, the timing-aware response posts are fused into the representation of the source post to obtain the local temporal representation $\tilde{m}_i$ of the event. The invention regards the series of response posts as first-order neighbor nodes of the corresponding source post and fuses them with the aggregation function of the graph attention network, computed as:

$\alpha_{ii} = \text{softmax}(\text{LeakyReLU}(a^T [m_i ; m_i]))$,
$\alpha_{ij} = \text{softmax}(\text{LeakyReLU}(a^T [m_i ; m_j]))$,
$\tilde{m}_i = \sigma\big(\alpha_{ii} W m_i + \textstyle\sum_{j \in N(m_i)} \alpha_{ij} W m_j\big)$,

where $\sigma$ is the sigmoid activation function, $\alpha_{ii}$ and $\alpha_{ij}$ denote the attention scores between node i and itself and between node i and node j respectively, $N(m_i)$ is the set of response posts corresponding to the current source post, and $W$ is the weight parameter of this layer's node feature transformation.
The step S03 specifically includes:
S31, a global structural relation between events is established based on common users, and the invention considers how to learn the global structural features $\hat{c}_i$ of the event nodes. Inspired by the element-level attention mechanisms widely used in recommendation system tasks, the invention proposes an attention mechanism oriented to the elements of user embeddings. It assumes that each dimension of a user embedding reflects a different aspect of the user's information, and that these different attributes of the user have different effects on message propagation. The specific process is as follows:

For a specific user $u_j \in U_i$ participating in event $c_i$, the attention vector $\gamma_j$ of user $u_j$ over different aspects is computed:
$\gamma_j = \tanh(W_c \cdot u_j + b)$,

where $W_c \in \mathbb{R}^{d \times d}$ is the feature transformation matrix and $\gamma_j \in \mathbb{R}^{d}$ is the attention vector over the different aspects. A larger $\gamma_j^{(k)}$ indicates that the k-th aspect of the user embedding $u_j$ has a greater impact on message propagation.
S32, by means of the attention vector $\gamma_j$, all users participating in event $c_i$ are aggregated via element products to capture the global structural relations between events:

$\hat{c}_i = \sum_{u_j \in U_i} \gamma_j \odot u_j$.
the step S04 specifically includes:
S41, the local temporal representation and the global structural representation of the event are concatenated as the final representation of the event node, and the prediction for the event, i.e., the probability of the event being each label, is computed through a fully connected layer and a softmax function:

$\hat{y}_i = \text{softmax}(\text{Fc}([\tilde{m}_i ; \hat{c}_i]))$,

where Fc(·) is a fully connected layer whose output dimension matches the number of classification categories.
S42, finally, the loss function of the model is defined as the cross entropy between the prediction and the true label:

$\mathcal{L}(\theta) = -\sum_i \sum_{k=1}^{r} y_i^{(k)} \log \hat{y}_i^{(k)}$,

where r is the number of classes, θ is the set of parameters of the entire model, and $y_i \in \{0,1,2,3\}$ (Twitter) or $y_i \in \{0,1\}$ (Weibo) is the true label value.
Compared with the prior art, the time-sequence-aware heterogeneous graph neural rumor detection model implemented by the invention has the following beneficial effects:
the invention fully considers the local time sequence relation between the forwarding (or comment) posts in the event and the source posts and the global structure relation between the event and the event, and jointly and explicitly models the local time sequence information and the global structure information to complete the rumor detection task.
Based on the interactions between forwarding (or comment) posts and the source post, the method models the local temporal relations between response posts through position encoding, focuses on important response posts with a multi-head attention mechanism, and then fuses the source post and the response posts to obtain the local temporal representation of each event node.
Based on the interactions between users and events, the method uses an element-level attention mechanism to learn the global structural representation of each event node so as to capture complex and diverse propagation structure characteristics.
Experimental results show that the proposed model achieves better results than existing models on both the rumor classification and the early rumor detection task.
Drawings
Fig. 1 is a diagram of the overall framework of the time-sequence-aware heterogeneous graph neural rumor detection model (the SHGN model).
Detailed Description
For a better understanding of the objects, aspects and advantages of the present invention, reference is made to the following detailed description taken in conjunction with the accompanying drawing, which is included to provide a further understanding of the invention and is not intended to limit its scope.
FIG. 1 is a block diagram of the overall framework of the time-sequence-aware heterogeneous graph neural rumor detection model SHGN (Sequence-aware Heterogeneous Graph Neural rumor detection). As shown in FIG. 1, it comprises 4 modules: a heterogeneous graph construction module, a local timing information encoding module, a global structure information encoding module, and a rumor classification module. Specifically:
S01, constructing a heterogeneous graph; this module comprises two parts: constructing a heterogeneous graph based on the interactions between the forwarding (comment) posts and the source post within an event and the interactions between users and events; and initializing the representation of each type of node in the heterogeneous graph.
Specifically, the S01 heterogeneous graph construction comprises the following steps:
S11, first abstracting events and the related users into two different types of nodes in a network, and establishing connecting edges between user nodes and event nodes according to the users' participation in the events (a user has forwarded or commented on posts in the event). Each event contains a source post and a series of response posts. The response posts are arranged into a time sequence according to their time delay after the source post is published, so that each source post corresponds to one response sequence. Finally, an event-user heterogeneous graph with timing information is constructed.
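As a concrete illustration of S11, the following minimal Python sketch builds the event-user edges and the time-ordered response sequences; the input layout, field names, and helper structures are assumptions made for illustration and are not part of the claimed method.

```python
from collections import defaultdict

def build_heterogeneous_graph(events):
    """Build an event-user heterogeneous graph with timing information.

    `events` is assumed to be a list of dicts of the form
    {"id": ..., "source": {...}, "responses": [{"user": ..., "delay": ...}, ...]},
    where `delay` is the time elapsed since the source post was published.
    """
    event_user_edges = defaultdict(set)   # event id -> set of participating user ids
    response_sequences = {}               # event id -> responses ordered by time delay

    for event in events:
        # A user is linked to an event if they forwarded or commented on a post in it.
        for response in event["responses"]:
            event_user_edges[event["id"]].add(response["user"])
        # Order the response posts by their delay after the source post, so that
        # each source post corresponds to exactly one response sequence.
        response_sequences[event["id"]] = sorted(
            event["responses"], key=lambda r: r["delay"]
        )
    return event_user_edges, response_sequences
```

Each event's delay-sorted response list is the response sequence that the timing-aware self-attention of S02 later consumes.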
S12, initializing the representation of each node in the heterogeneous graph, specifically comprising the following steps:
S12-1, initializing the event nodes. The substance of an event is the text content of its source post and response posts, which are initialized as word vectors and encoded with a CNN. Specifically, the number of words per post is fixed at L: posts with fewer than L words are padded with 0, and posts with more than L words are truncated. Vector representations of the words are then obtained by training the Word2Vec algorithm on a domain-specific corpus; words that do not appear in the pre-trained word-vector vocabulary are initialized from a uniform distribution, and the word vectors are kept fine-tunable during training. Denoting the initial vector of the j-th word in a post as $x_j \in \mathbb{R}^{d_w}$, a post of L words can be represented as

$x_{1:L} = [x_1; x_2; \dots; x_L]$,

where ";" is the concatenation operation.
Further, the sentence sequence is encoded with the CNN. Given the sentence sequence $x_{1:L}$ composed of word vectors, a one-dimensional convolution is performed on every possible window through the convolutional layer of the CNN:

$e_i = \sigma(W * x_{i:i+h-1})$,

yielding the feature map $e = [e_1, e_2, \dots, e_{L-h+1}]$, where $W \in \mathbb{R}^{h \times d_w}$ is a convolution kernel of size h. A max-pooling operation $\hat{e} = \max(e)$ then selects the maximum value of each feature map, and the initialization vector representation of each post is obtained by concatenation. For the i-th event, the source post is represented as $s_i \in \mathbb{R}^{d}$, each response post as $m_j \in \mathbb{R}^{d}$, and the matrix formed by the response posts in the event is denoted $M_i = [m_1; m_2; \dots; m_n] \in \mathbb{R}^{n \times d}$.
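For illustration only, a minimal PyTorch sketch of the S12-1 encoder follows, assuming each post has already been padded or truncated to L word vectors; the dimension defaults and the use of several kernel sizes are assumptions, not part of the claims.

```python
import torch
import torch.nn as nn

class PostEncoder(nn.Module):
    """CNN encoder: word-vector sequence -> fixed-size post representation."""

    def __init__(self, word_dim=300, num_filters=100, kernel_sizes=(3, 4, 5)):
        super().__init__()
        # One Conv1d per window size h, realizing e_i = sigma(W * x_{i:i+h-1}).
        self.convs = nn.ModuleList(
            [nn.Conv1d(word_dim, num_filters, h) for h in kernel_sizes]
        )

    def forward(self, x):            # x: (batch, L, word_dim)
        x = x.transpose(1, 2)        # Conv1d expects (batch, word_dim, L)
        feats = []
        for conv in self.convs:
            e = torch.sigmoid(conv(x))           # feature map over all windows
            feats.append(e.max(dim=2).values)    # max pooling: e_hat = max(e)
        return torch.cat(feats, dim=1)           # concatenation -> post vector
```

A post tensor of shape (batch, L, word_dim) thus yields one fixed-size vector per post, serving as the initialization representation of the source and response posts.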
And S12-2, performing initialization representation on the user node. And encoding attribute information (including gender, age, fan number, attention number and the like) of the user to obtain an initialization vector representation of the user node. And initializing the user information which cannot be acquired through normal distribution.
S02, extracting local time sequence characteristics of the event, wherein the module comprises two parts: mining local time sequence information in an event by adopting a time sequence perception self-attention mechanism to obtain a response paste representation with time sequence information
Figure BDA0003710386120000049
Then response paste with time sequence information
Figure BDA00037103861200000410
Fusion into source-pasted representations to obtain local temporal characterization of events
Figure BDA00037103861200000411
Specifically, the S02 local timing information encoding comprises the following steps:
S21, mining the local timing information within an event with a timing-aware self-attention mechanism, capturing the differences between the response posts generated by rumor and non-rumor events at different time stages and the latent timing dependencies between response posts.
S21-1, to encode the time-delay information of each response post, a position embedding is generated for each response post using the Position Encoding (PE) formula of the Transformer model:

$PE_{(pos,\,2k)} = \sin\big(pos / 10000^{2k/d}\big)$,
$PE_{(pos,\,2k+1)} = \cos\big(pos / 10000^{2k/d}\big)$,

where pos denotes the position of the response post in the sequence, d denotes the dimension of the PE, and 2k and 2k+1 index the even and odd dimensions respectively (i.e., 2k ≤ d, 2k+1 ≤ d).
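A minimal sketch of this sinusoidal position encoding, following the standard Transformer formula; the function name and tensor layout are illustrative assumptions.

```python
import torch

def position_encoding(seq_len, d):
    """PE(pos, 2k) = sin(pos / 10000^(2k/d)); PE(pos, 2k+1) = cos(pos / 10000^(2k/d))."""
    pos = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)   # (seq_len, 1)
    k2 = torch.arange(0, d, 2, dtype=torch.float)                 # even dimensions 2k
    div = torch.pow(10000.0, k2 / d)
    pe = torch.zeros(seq_len, d)
    pe[:, 0::2] = torch.sin(pos / div)             # even dimensions
    pe[:, 1::2] = torch.cos(pos / div[: d // 2])   # odd dimensions
    return pe                                      # (seq_len, d)
```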
S21-2, the embedding of each response post is combined with the corresponding position embedding to capture the timing information between response posts:

$m_j' = m_j + PE_j$.
S21-3, a multi-head attention mechanism is used to focus attention on the important response posts. The self-attention mechanism explicitly assigns larger weights to information with greater influence on a node and weights that information into the node's own representation, greatly enriching node representations, while multiple heads account for the influence of external information from as many aspects as possible:

$\text{head}_t = \text{Attention}(M_i' W_t^Q,\, M_i' W_t^K,\, M_i' W_t^V)$,
$\hat{M}_i = [\text{head}_1; \text{head}_2; \dots; \text{head}_T]\, W^O$,

where $M_i'$ is the matrix of position-aware response post embeddings, $\text{Attention}(Q,K,V) = \text{softmax}(QK^\top/\sqrt{d_k})\,V$, and $\hat{M}_i$ contains the timing-aware response post representations.
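To make S21-2 and S21-3 concrete, the following sketch adds the position embeddings to the time-ordered response post embeddings and applies multi-head self-attention; it reuses the `position_encoding` helper sketched above and relies on PyTorch's built-in `nn.MultiheadAttention`, both of which are implementation assumptions rather than part of the claims.

```python
import torch.nn as nn

class TimingAwareSelfAttention(nn.Module):
    """Timing-aware self-attention over a sequence of response posts."""

    def __init__(self, d_model=100, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, responses):     # responses: (batch, n, d_model), time-ordered
        # S21-2: inject time-delay information via position embeddings.
        pe = position_encoding(responses.size(1), responses.size(2)).to(responses.device)
        m = responses + pe
        # S21-3: multi-head self-attention weights the influential response posts.
        out, _ = self.attn(m, m, m)
        return out                    # timing-aware response representations
```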
S22, the timing-aware response posts are fused into the representation of the source post to obtain the local temporal representation $\tilde{m}_i$ of the event. The invention regards the series of response posts as first-order neighbor nodes of the corresponding source post and fuses them with the aggregation function of the graph attention network, computed as:

$\alpha_{ii} = \text{softmax}(\text{LeakyReLU}(a^T [m_i ; m_i]))$,
$\alpha_{ij} = \text{softmax}(\text{LeakyReLU}(a^T [m_i ; m_j]))$,
$\tilde{m}_i = \sigma\big(\alpha_{ii} W m_i + \textstyle\sum_{j \in N(m_i)} \alpha_{ij} W m_j\big)$,

where $\sigma$ is the sigmoid activation function, $\alpha_{ii}$ and $\alpha_{ij}$ denote the attention scores between node i and itself and between node i and node j respectively, $N(m_i)$ is the set of response posts corresponding to the current source post, and $W$ is the weight parameter of this layer's node feature transformation.
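A minimal sketch of the S22 fusion under the formulas above, treating the response posts as first-order neighbors of the source post in a single graph-attention-style layer; the module name, dimensions, and the default LeakyReLU slope are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SourceResponseFusion(nn.Module):
    """Fuse timing-aware response posts into the source post representation."""

    def __init__(self, d_in, d_out):
        super().__init__()
        self.W = nn.Linear(d_in, d_out, bias=False)   # node feature transformation W
        self.a = nn.Linear(2 * d_out, 1, bias=False)  # attention vector a

    def forward(self, source, responses):
        # source: (d_in,); responses: (n, d_in), the neighbors of the source post.
        nodes = torch.cat([source.unsqueeze(0), responses], dim=0)  # self + neighbors
        h = self.W(nodes)                                           # W m_j
        # alpha_ij = softmax_j(LeakyReLU(a^T [W m_i ; W m_j])), with i = source post.
        pairs = torch.cat([h[0].expand_as(h), h], dim=1)
        scores = F.leaky_relu(self.a(pairs)).squeeze(-1)
        alpha = torch.softmax(scores, dim=0)
        # Attention-weighted aggregation gives the local temporal representation.
        return torch.sigmoid((alpha.unsqueeze(1) * h).sum(dim=0))
```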
S03, extracting global structural features of the event, wherein the module comprises two parts: computing participation event c i Specific user u in j Attention vector y in different aspects j (ii) a By attention vector gamma j And participate in event c i All users in (2) are aggregated in an element product manner to capture events and global structural relationships between events.
Specifically, the S03 global structure information encoding comprises the following steps:
S31, based on the global structural relations between events established through common users, the method considers how to learn the global structural features $\hat{c}_i$ of the event nodes. Inspired by the element-level attention mechanisms widely used in recommendation system tasks, the invention proposes an attention mechanism oriented to the elements of user embeddings. It assumes that each dimension of a user embedding reflects a different aspect of the user's information, and that these different attributes of the user have different effects on message propagation. The specific process is as follows:

For a specific user $u_j \in U_i$ participating in event $c_i$, the attention vector $\gamma_j$ of user $u_j$ over different aspects is computed:
$\gamma_j = \tanh(W_c \cdot u_j + b)$,

where $W_c \in \mathbb{R}^{d \times d}$ is the feature transformation matrix and $\gamma_j \in \mathbb{R}^{d}$ is the attention vector over the different aspects. A larger $\gamma_j^{(k)}$ indicates that the k-th aspect of the user embedding $u_j$ has a greater impact on message propagation.
S32, by means of the attention vector $\gamma_j$, all users participating in event $c_i$ are aggregated via element products to capture the global structural relations between events:

$\hat{c}_i = \sum_{u_j \in U_i} \gamma_j \odot u_j$.
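A minimal sketch of the element-level attention of S31 and S32, assuming the users of one event are stacked into a single matrix; names and shapes are illustrative.

```python
import torch
import torch.nn as nn

class ElementLevelAttention(nn.Module):
    """Aggregate the users of an event with element-wise (aspect-level) attention."""

    def __init__(self, d_user):
        super().__init__()
        self.W_c = nn.Linear(d_user, d_user)   # feature transformation W_c (bias plays the role of b)

    def forward(self, users):                  # users: (num_users, d_user)
        gamma = torch.tanh(self.W_c(users))    # gamma_j = tanh(W_c u_j + b)
        # The element product weights each aspect (dimension) of each user embedding;
        # summing over users yields the event's global structural representation.
        return (gamma * users).sum(dim=0)      # (d_user,)
```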
S04, after the local temporal representation and the global structural representation of each event node are obtained, the two features are concatenated as the final representation of the event node, and the prediction for the event, i.e., the probability $\hat{y}_i$ of the event being each label, is computed through a fully connected layer and a softmax function. Finally, a loss function is defined, and the model parameters are continuously updated to obtain their optimal values.
Specifically, the S04 rumor classification comprises the following steps:
S41, the local temporal representation and the global structural representation of the event are concatenated as the final representation of the event node, and the prediction for the event, i.e., the probability of the event being each label, is computed through a fully connected layer and a softmax function:

$\hat{y}_i = \text{softmax}(\text{Fc}([\tilde{m}_i ; \hat{c}_i]))$,

where Fc(·) is a fully connected layer whose output dimension matches the number of classification categories.
S42, finally, the loss function of the model is defined as the cross entropy between the prediction and the true label:

$\mathcal{L}(\theta) = -\sum_i \sum_{k=1}^{r} y_i^{(k)} \log \hat{y}_i^{(k)}$,

where r is the number of classes, θ is the set of parameters of the entire model, and $y_i \in \{0,1,2,3\}$ (Twitter) or $y_i \in \{0,1\}$ (Weibo) is the true label value.
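To illustrate S41 and S42, a sketch of the classification head and loss; note that PyTorch's `nn.CrossEntropyLoss` fuses the softmax of S41 with the cross entropy of S42, an implementation convenience assumed here.

```python
import torch
import torch.nn as nn

class RumorClassifier(nn.Module):
    """Concatenate local temporal and global structural features, then classify."""

    def __init__(self, d_local, d_global, num_classes):
        super().__init__()
        # Output dimension of the fully connected layer matches the class count.
        self.fc = nn.Linear(d_local + d_global, num_classes)

    def forward(self, local_repr, global_repr):
        logits = self.fc(torch.cat([local_repr, global_repr], dim=-1))
        return logits   # softmax is applied inside the loss below

# Cross entropy between prediction and true label
# (4 classes on Twitter, 2 classes on Weibo).
criterion = nn.CrossEntropyLoss()
# loss = criterion(logits, true_labels)
```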
The above description covers only the preferred embodiments of the present invention. It should be noted that various modifications and adaptations apparent to those skilled in the art may be made without departing from the principles of the invention, and these are intended to fall within the scope of the invention.

Claims (5)

1. A time-sequence-aware heterogeneous graph neural rumor detection model, comprising 4 modules: a heterogeneous graph construction module, a local timing information encoding module, a global structure information encoding module, and a rumor classification module, wherein the heterogeneous graph construction module constructs an event-user heterogeneous graph based on a rumor detection data set and initializes the representation of each node in the graph using embedding techniques; the local timing information encoding module, based on the temporal relations between forwarding or comment posts within an event, learns response post representations carrying timing information with a timing-aware self-attention mechanism and then fuses the content information of the source post to obtain the local temporal representation $\tilde{m}_i$ of each event node; the global structure information encoding module, based on the interactions between users and events, learns the global structural representation $\hat{c}_i$ of each event node with an element-level attention mechanism;
the rumor classification module fuses the local temporal representation and the global structural representation of an event and predicts the probability that the current event is a rumor; characterized in that the model specifically comprises the following steps:
S01, constructing a heterogeneous graph: constructing a heterogeneous graph based on the temporal relations between the forwarding or comment posts and the source post within an event and the interactions between users and events; initializing the representation of each type of node in the heterogeneous graph;
S02, extracting the local temporal features of the event: mining the local timing information within an event with a timing-aware self-attention mechanism to obtain response post representations carrying timing information, and then fusing the timing-aware response posts into the source post representation to obtain the local temporal representation $\tilde{m}_i$ of the event;
S03, extracting the global structural features of the event: computing, for a specific user $u_j$ participating in event $c_i$, the attention vector $\gamma_j$ over different aspects; aggregating, by means of the attention vector $\gamma_j$, all users participating in event $c_i$ via element products to capture the global structural relations between events;
S04, after the local temporal representation and the global structural representation of each event node are obtained, concatenating the two features as the final representation of the event node, computing the prediction for the event, i.e., the probability $\hat{y}_i$ of the event being each label, through a fully connected layer and a softmax function, and finally defining a loss function and continuously updating the model parameters to obtain their optimal values.
2. The time-sequence-aware heterogeneous graph neural rumor detection model of claim 1, wherein the step S01 comprises:
S11, first abstracting events and the related users into two different types of nodes in a network, and establishing connecting edges between user nodes and event nodes according to the users' participation in the events, namely the users' behavior of forwarding or commenting on posts in an event; each event contains a source post and a series of response posts; the response posts are arranged into a time sequence according to their time delay after the source post is published, so that each source post corresponds to one response sequence; finally, an event-user heterogeneous graph with timing information is constructed;
S12, initializing the representation of each node in the heterogeneous graph, specifically comprising the following steps:
S12-1, initializing the event nodes: initializing the text as word vectors and encoding the word vectors with a CNN; specifically, the number of words per post is fixed at L, posts with fewer than L words are padded with 0, and posts with more than L words are truncated; vector representations of the words are then obtained by training the Word2Vec algorithm on the corpus, words that do not appear in the pre-trained word-vector vocabulary are initialized from a uniform distribution, and the word vectors are kept fine-tunable during training; denoting the initial vector of the j-th word in a post as $x_j \in \mathbb{R}^{d_w}$, a post of L words is represented as

$x_{1:L} = [x_1; x_2; \dots; x_L]$,

where ";" is the concatenation operation;
further, the sentence sequence is encoded with the CNN: given the sentence sequence $x_{1:L}$ composed of word vectors, a one-dimensional convolution is performed on every possible window through the convolutional layer of the CNN:

$e_i = \sigma(W * x_{i:i+h-1})$,

yielding the feature map $e = [e_1, e_2, \dots, e_{L-h+1}]$, where $W \in \mathbb{R}^{h \times d_w}$ is a convolution kernel of size h; a max-pooling operation $\hat{e} = \max(e)$ then selects the maximum value of each feature map, and the initialization vector representation of each post is obtained by concatenation; for the i-th event, the source post is denoted $s_i \in \mathbb{R}^{d}$, each response post is denoted $m_j \in \mathbb{R}^{d}$, and the matrix formed by the response posts in the event is denoted $M_i = [m_1; m_2; \dots; m_n] \in \mathbb{R}^{n \times d}$;
S12-2, initializing the user node: and coding the attribute information of the user, including gender, age, fan number and attention number, to obtain an initialization vector representation of the user node, and initializing the unavailable user information through normal distribution.
3. The time-sequence-aware heterogeneous graph neural rumor detection model of claim 1, wherein the step S02 comprises:
S21, mining the local timing information within an event with a timing-aware self-attention mechanism, capturing the differences between the response posts generated by rumor and non-rumor events at different time stages and the latent timing dependencies between response posts;
S21-1, to encode the time-delay information of each response post, generating a position embedding for each response post using the position encoding formula of the Transformer model:

$PE_{(pos,\,2k)} = \sin\big(pos / 10000^{2k/d}\big)$,
$PE_{(pos,\,2k+1)} = \cos\big(pos / 10000^{2k/d}\big)$,

where pos denotes the position of the response post in the sequence, d denotes the dimension of the PE, and 2k and 2k+1 index the even and odd dimensions respectively, i.e., 2k ≤ d, 2k+1 ≤ d;
S21-2, combining the embedding of each response post with the corresponding position embedding to capture the timing information between response posts:

$m_j' = m_j + PE_j$;
S21-3, focusing attention on the important response posts with a multi-head attention mechanism:

$\text{head}_t = \text{Attention}(M_i' W_t^Q,\, M_i' W_t^K,\, M_i' W_t^V)$,
$\hat{M}_i = [\text{head}_1; \text{head}_2; \dots; \text{head}_T]\, W^O$;
S22, fusing the timing-aware response posts into the representation of the source post to obtain the local temporal representation $\tilde{m}_i$ of the event; the series of response posts is regarded as first-order neighbor nodes of the corresponding source post and fused with the aggregation function of the graph attention network, computed as:

$\alpha_{ii} = \text{softmax}(\text{LeakyReLU}(a^T [m_i ; m_i]))$,
$\alpha_{ij} = \text{softmax}(\text{LeakyReLU}(a^T [m_i ; m_j]))$,
$\tilde{m}_i = \sigma\big(\alpha_{ii} W m_i + \textstyle\sum_{j \in N(m_i)} \alpha_{ij} W m_j\big)$,

where $\sigma$ is the sigmoid activation function, $\alpha_{ii}$ and $\alpha_{ij}$ denote the attention scores between node i and itself and between node i and node j respectively, $N(m_i)$ is the set of response posts corresponding to the current source post, and $W$ is the weight parameter of this layer's node feature transformation.
4. The time-sequence-aware heterogeneous graph neural rumor detection model of claim 1, wherein the step S03 further comprises:
S31, establishing the global structural relations between events based on common users, the specific process being as follows:

for a specific user $u_j \in U_i$ participating in event $c_i$, computing the attention vector $\gamma_j$ of user $u_j$ over different aspects:
$\gamma_j = \tanh(W_c \cdot u_j + b)$,

where $W_c \in \mathbb{R}^{d \times d}$ is the feature transformation matrix and $\gamma_j \in \mathbb{R}^{d}$ is the attention vector over the different aspects; a larger $\gamma_j^{(k)}$ indicates that the k-th aspect of the user embedding $u_j$ has a greater impact on message propagation;
S32, by means of the attention vector $\gamma_j$, aggregating all users participating in event $c_i$ via element products to capture the global structural relations between events:

$\hat{c}_i = \sum_{u_j \in U_i} \gamma_j \odot u_j$.
5. The time-sequence-aware heterogeneous graph neural rumor detection model of claim 1, wherein the step S04 specifically comprises:
S41, concatenating the local temporal representation and the global structural representation of the event as the final representation of the event node, and computing the prediction for the event, i.e., the probability of the event being each label, through a fully connected layer and a softmax function:

$\hat{y}_i = \text{softmax}(\text{Fc}([\tilde{m}_i ; \hat{c}_i]))$,

where Fc(·) is a fully connected layer whose output dimension matches the number of classification categories;
S42, finally, defining the loss function of the model as the cross entropy between the prediction and the true label:

$\mathcal{L}(\theta) = -\sum_i \sum_{k=1}^{r} y_i^{(k)} \log \hat{y}_i^{(k)}$,

where r is the number of classes, θ is the set of parameters of the entire model, and $y_i \in \{0,1,2,3\}$ (Twitter) or $y_i \in \{0,1\}$ (Weibo) is the true label value.
CN202210721077.2A 2022-06-23 2022-06-23 Time-sequence-aware heterogeneous graph neural rumor detection model Pending CN115168678A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210721077.2A CN115168678A (en) 2022-06-23 Time-sequence-aware heterogeneous graph neural rumor detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210721077.2A CN115168678A (en) 2022-06-23 Time-sequence-aware heterogeneous graph neural rumor detection model

Publications (1)

Publication Number Publication Date
CN115168678A true CN115168678A (en) 2022-10-11

Family

ID=83487561

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210721077.2A Pending CN115168678A (en) 2022-06-23 2022-06-23 Time-sequence-aware heterogeneous graph neural rumor detection model

Country Status (1)

Country Link
CN (1) CN115168678A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115809327A (en) * 2023-02-08 2023-03-17 Sichuan University Real-time social network rumor detection method for multi-modal fusion and topics
CN115809327B (en) * 2023-02-08 2023-05-05 Sichuan University Real-time social network rumor detection method based on multi-modal fusion and topics


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination