CN116108286A

CN116108286A - False information detection method, device and equipment based on propagation reconstruction

Info

Publication number: CN116108286A
Application number: CN202211723012.8A
Authority: CN
Inventors: 周薇; 卫玲蔚; 胡斗; 虎嵩林
Original assignee: Institute of Information Engineering of CAS
Current assignee: Institute of Information Engineering of CAS
Priority date: 2022-09-27
Filing date: 2022-12-30
Publication date: 2023-05-12

Abstract

The present disclosure relates to a method, an apparatus and a device for detecting false information based on propagation reconstruction. The method comprises: acquiring propagation data of a blog post in a social network; constructing an information propagation graph based on the propagation data; aggregating node neighborhood features in the information propagation graph with a deep graph convolutional network to obtain the node representation V of the information propagation graph; estimating potential propagation interactions between nodes in the information propagation graph from the node representation V to generate a plurality of potential propagation graphs; aggregating the node neighborhood features of each potential propagation graph with a deep graph convolutional network and integrating the updated node representations of all potential propagation graphs to obtain a reconstructed node representation Z; and performing classification based on the node representation V and the reconstructed node representation Z to obtain a false information detection result for the blog post. The present disclosure may improve the performance of false information detection tasks.

Description

False information detection method, device and equipment based on propagation reconstruction

Technical Field

The invention relates to the technical field of data mining, in particular to a false information detection method, device and equipment based on propagation reconstruction.

Background

In the new media era, social media websites provide great convenience for users to acquire information, express comments and communicate with each other. More and more users are enthusiastically involved in the discussion of hot topics in social media, and user-generated content can quickly reach a wide audience due to the convenience of the platform and the like. However, a great amount of false information is generated and spread in social media, which brings harm to social stability and seriously affects the daily life of people and the healthy development of society. Therefore, how to automatically detect false information is an urgent research subject in the field of social network analysis, and has important research significance and practical application significance.

Over time, a source blog post forms its own propagation structure through a series of forwarding and reply relations. Existing studies have found that false information exhibits a faster, wider and deeper propagation structure than true information. These structural features give researchers the opportunity to detect false information from propagation data. With the development of deep learning, data mining and graph learning, many methods model the propagation data of a source blog post as a tree or graph structure, learn its feature representation with deep learning tools, and build a classifier for detection.

However, existing detection methods have two shortcomings. On the one hand, they aggregate only local propagation dependencies with shallow graph convolution and model deeper and wider propagation structures insufficiently. On the other hand, in the big data era, protecting the privacy of network users makes complete information propagation data difficult and costly for researchers to obtain, and abnormal data such as false or irrelevant propagation cannot be filtered out, so the network structure is highly uncertain. Such incomplete and unreliable propagation structures limit the learning ability of a model, leaving it unable to focus on the key propagation structure features that benefit false information detection.

Disclosure of Invention

The invention addresses two main technical problems: how to fully model long-distance dependencies in deeper and wider propagation structures, and how to extract high-order structural features from incomplete and unreliable propagation structures. To this end, the invention provides a false information detection method, device and equipment based on propagation reconstruction, which improve the performance of false information detection tasks.

The specific technical scheme of the invention is as follows:

According to a first aspect of the embodiments of the present disclosure, a method for detecting false information based on propagation reconstruction is provided, the method including the following steps:

acquiring propagation data of a blog post in a social network, wherein the propagation data includes: the text content of the source blog post and of the subsequent propagation blog posts, the set of propagation relations between the source blog post node and the subsequent propagation blog post nodes, and the set of propagation relations among the propagation blog post nodes;

constructing an information propagation graph based on the propagation data;

aggregating node neighborhood features in the information propagation graph with a deep graph convolutional network to obtain the node representation V of the information propagation graph;

estimating potential propagation interactions between nodes in the information propagation graph from the node representation V to generate a plurality of potential propagation graphs;

aggregating the node neighborhood features of each potential propagation graph with a deep graph convolutional network, and then integrating the updated node representations of all potential propagation graphs to obtain a reconstructed node representation Z;

and performing classification based on the node representation V and the reconstructed node representation Z to obtain a false information detection result for the blog post.

Further, constructing the information propagation graph based on the propagation data includes:

taking the text features of the text content as the initial features of the nodes in the information propagation graph; and

obtaining the adjacency matrix of the information propagation graph based on the set of propagation relations between the source blog post node and the subsequent propagation blog post nodes and the set of propagation relations among the propagation blog post nodes.

Further, aggregating the node neighborhood features in the information propagation graph with the deep graph convolutional network to obtain the node representation V includes:

stacking K graph convolution layers, wherein each graph convolution layer introduces an initial residual connection and an identity mapping;

aggregating the node neighborhood features in the information propagation graph with the K graph convolution layers to obtain the node representation $V = [v_r; v_1; \dots; v_{n-1}]$;

wherein, at the k-th graph convolution layer, the node representation is updated as

$$H^{(k)} = \sigma\Big(\big((1-\alpha_k)\,\tilde{P}\,H^{(k-1)} + \alpha_k H^{(0)}\big)\big((1-\beta_k)\,I_n + \beta_k W^{(k)}\big)\Big), \qquad \tilde{P} = (D+I)^{-1/2}(A+I)(D+I)^{-1/2},$$

where $\sigma(\cdot)$ represents the activation function, $\alpha_k$ represents the first hyperparameter, $\beta_k$ represents the second hyperparameter, $W^{(k)}$ represents a weight matrix, $H^{(0)}$ represents the initial node representation, $A$ represents the adjacency matrix of the information propagation graph, $D$ represents a diagonal matrix, $I$ represents the identity matrix, $v_r$ represents the node representation of the source blog post node obtained after K rounds of aggregation, and $v_i$ represents the node representation of the i-th propagation node obtained after K rounds of aggregation.

Further, estimating potential propagation interactions between nodes in the information propagation graph from the node representation V to generate a plurality of potential propagation graphs includes:

computing, from the node representation $v_i$, the Gaussian distribution $\mathcal{N}(\mu_i^m, (\sigma_i^m)^2)$ under each modeling view, where $\mu_i^m$ represents the mean under the m-th modeling view and $(\sigma_i^m)^2$ represents the variance under the m-th modeling view, with $1 \le m \le M$ and $M$ the total number of modeling views;

sampling from the distribution $\mathcal{N}(\mu_i^m, (\sigma_i^m)^2)$ to obtain the distributed representation of node $i$ under the m-th modeling view, $q_i^m = \mu_i^m + \epsilon \odot \sigma_i^m$, where $\epsilon$ follows a standard normal distribution with mean 0 and variance 1, $\epsilon \sim \mathcal{N}(0, I)$;

integrating the distributed representations $q_i^m$ to obtain the node initialization feature $Q^m$ under the m-th modeling view;

for any modeling view m, calculating the potential propagation interaction $s_{ij}^m$ between every two nodes;

generating the adjacency matrix $S^m$ under the m-th modeling view based on the potential propagation interactions $s_{ij}^m$;

generating the potential propagation graph under the m-th modeling view from the node initialization feature $Q^m$ and the adjacency matrix $S^m$.

Further, aggregating the node neighborhood features of each potential propagation graph with the deep graph convolutional network and then integrating the node representations of all updated potential propagation graphs to obtain the reconstructed node representation Z includes:

aggregating the node neighborhood features in each potential propagation graph with a two-layer graph convolutional network to obtain the updated node representation of that potential propagation graph, $U^m = \hat{S}^m \sigma(\hat{S}^m Q^m W_0) W_1$, where $\hat{S}^m$ represents the regularized form of the adjacency matrix $S^m$, $W_0$ represents the weights of the first graph convolution layer, and $W_1$ represents the weights of the second graph convolution layer;

calculating the reconstructed node representation $Z = W_z[U^1; \dots; U^M] + b_z$, where $W_z$ is a first trainable parameter and $b_z$ is a second trainable parameter.

Further, performing classification based on the node representation V and the reconstructed node representation Z to obtain the false information detection result for the blog post includes:

calculating, for the given node representation V and reconstructed node representation Z, the graph feature representation $O = \mathrm{meanpooling}([V; Z])$ with an average pooling layer;

and executing a classification task on the graph feature representation O to obtain the false information detection result for the blog post.

According to a second aspect of the embodiments of the present disclosure, a false information detection device based on propagation reconstruction is provided, the device comprising:

a data acquisition module, used for acquiring propagation data of a blog post in a social network, wherein the propagation data includes: the text content of the source blog post and of the subsequent propagation blog posts, the set of propagation relations between the source blog post node and the subsequent propagation blog post nodes, and the set of propagation relations among the propagation blog post nodes;

a propagation graph construction module, used for constructing an information propagation graph based on the propagation data;

a first graph convolutional network module, used for aggregating node neighborhood features in the information propagation graph with a deep graph convolutional network to obtain the node representation V of the information propagation graph;

a Gaussian propagation reconstruction module, used for estimating potential propagation interactions among nodes in the information propagation graph from the node representation V so as to generate a plurality of potential propagation graphs;

a second graph convolutional network module, used for aggregating the node neighborhood features of each potential propagation graph with the deep graph convolutional network and integrating the updated node representations of all potential propagation graphs to obtain a reconstructed node representation Z;

and a false information classification module, used for performing classification based on the node representation V and the reconstructed node representation Z to obtain a false information detection result for the blog post.

According to a third aspect of the embodiments of the present disclosure, the present disclosure provides an electronic device, including: a processor; a memory for storing the processor-executable instructions; the processor is configured to read the executable instruction from the memory and execute the instruction to implement any of the above false information detection methods based on propagation reconstruction.

According to a fourth aspect of the disclosed embodiments, the present invention provides a computer readable storage medium having stored thereon computer program instructions, wherein the program instructions, when executed by a processor, implement any of the above-mentioned false information detection methods based on propagation reconstruction.

Compared with the prior art, the technical scheme provided by the invention at least comprises the following beneficial effects:

The method provided by the invention learns incomplete propagation trees better, explores more complex and potential propagation dependencies in the propagation process, and mines more valuable propagation structure features for detection, thereby effectively improving the accuracy of false information detection. The invention can also effectively model long-distance propagation dependencies in propagation trees, and detects more accurately on deeper and wider propagation tree structures in particular.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

Fig. 1 is a flowchart of a false information detection method based on propagation reconstruction.

Fig. 2 is a block diagram of a false information detection system based on propagation reconstruction.

Detailed Description

The invention is described in detail below with reference to the drawings and examples, it being noted that the examples described are only intended to facilitate an understanding of the invention and do not limit it in any way.

In the false information detection method based on propagation reconstruction, an information propagation graph is constructed from the text and partial propagation data of a source blog post in a social network; long-distance propagation dependencies in the information propagation graph are captured with a deep graph convolutional network; a plurality of potential propagation graphs are generated by estimating potential propagation interactions among nodes from the updated node representations; and the false information detection result is finally obtained from the node representations updated on the information propagation graph and on the potential propagation graphs.

Specifically, as shown in the implementation flowchart of Fig. 1, the false information detection method based on propagation reconstruction includes the following steps:

Step A: obtain the text and partial propagation data of source blog posts in the Twitter social network. Specifically, the propagation data of a sample is defined as $G = \langle V, E \rangle$, where $V = \{r, x_1, \dots, x_{n-1}\}$ denotes the set of blog posts in the propagation process, $r$ is the source blog post, and $x_1, \dots, x_{n-1}$ are the subsequent propagation blog posts; $E = \{e_{st} \mid s, t = 0, \dots, n-1\}$ denotes the set of propagation relations.

Step B: extract text features from the text data and construct the propagation graph structure from the propagation data. Specifically:

Step B1: for each sample, extract the TF-IDF text features of the source blog post $r$ and the propagation blog posts $x_1, \dots, x_{n-1}$, defined as $X = [\mathbf{x}_0, \mathbf{x}_1, \dots, \mathbf{x}_{n-1}]^\top \in \mathbb{R}^{n \times d_0}$, where $\mathbf{x}_0$ is the text feature of the source blog post $r$, $\mathbf{x}_i$ ($i \ge 1$) is the text feature of the propagation blog post $x_i$, and $d_0$ is the text feature dimension.
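As an illustration of step B1, the following is a minimal sketch of TF-IDF feature extraction in plain Python. The smoothed IDF variant, the whitespace tokenisation and the helper name `tfidf_features` are assumptions made for the example; the source only states that TF-IDF text features are used.

```python
import math
from collections import Counter

def tfidf_features(posts):
    """Return (vocabulary, feature rows) for a list of tokenised posts."""
    vocab = sorted({tok for post in posts for tok in post})
    n_docs = len(posts)
    # Document frequency: number of posts that contain each term.
    df = Counter(tok for post in posts for tok in set(post))
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    rows = []
    for post in posts:
        counts = Counter(post)
        total = max(len(post), 1)
        # Term frequency times IDF, one column per vocabulary term.
        rows.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, rows

posts = [
    "breaking news about the election".split(),  # source post r
    "is this news real".split(),                 # propagation post x_1
    "fake news again".split(),                   # propagation post x_2
]
vocab, X = tfidf_features(posts)  # X has n rows and d_0 = len(vocab) columns
```

Row 0 of `X` plays the role of the source post feature and the remaining rows are the features of the propagation posts.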

Step B2: based on the information itself in social media and the propagation behavior under it (such as comments and forwarding), an undirected information propagation graph is constructed. The adjacency matrix corresponding to the graph structure is denoted $A \in \mathbb{R}^{n \times n}$, with initial values $a_{st} = 1$ if a propagation relation $e_{st} \in E$ exists between nodes $s$ and $t$, and $a_{st} = 0$ otherwise.

The initial feature representations of all blog post nodes in the information propagation graph are constructed from the text features and denoted $X^{TD} = X^{BU} = X$.
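The graph construction in step B2 can be sketched as follows. This is a minimal example assuming that nodes are indexed 0 to n-1 with node 0 as the source post, and that each propagation relation is given as an index pair; the helper name `build_adjacency` is hypothetical.

```python
def build_adjacency(n_nodes, edges):
    """Symmetric 0/1 adjacency matrix for an undirected propagation graph."""
    adj = [[0] * n_nodes for _ in range(n_nodes)]
    for s, t in edges:
        adj[s][t] = 1
        adj[t][s] = 1  # undirected: a reply or forward links both endpoints
    return adj

# Node 0 is the source post r; nodes 1..3 are subsequent propagation posts.
edges = [(0, 1), (0, 2), (2, 3)]
A = build_adjacency(4, edges)
```

A dense list of lists is used only to keep the sketch short; a sparse representation would be the natural choice for large propagation graphs.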

Step C: based on the propagation graph constructed in step B, fully model the long-distance dependencies in the original propagation graph with deep graph convolution and extract the high-order structural features of the original propagation. To alleviate the over-smoothing problem of graph modeling, an initial residual connection and an identity mapping are introduced; the node neighborhood features in the original propagation graph are aggregated by stacking multiple graph convolution layers, the feature representations of the nodes are learned, and the long-distance dependencies in information propagation are fully modeled. The nodes are updated as follows:

$$H^{(k)} = \sigma\Big(\big((1-\alpha_k)\,\tilde{P}\,H^{(k-1)} + \alpha_k H^{(0)}\big)\big((1-\beta_k)\,I_n + \beta_k W^{(k)}\big)\Big),$$

where $\tilde{P} = (D+I)^{-1/2}(A+I)(D+I)^{-1/2}$ is the regularized graph Laplacian matrix, $D$ is a diagonal matrix, $I$ is the identity matrix, $\alpha_k$ and $\beta_k$ are two hyperparameters, $W^{(k)}$ is a weight matrix, $\sigma(\cdot)$ is an activation function, $H^{(0)}$ is the initial node representation, and $I_n$ denotes the identity matrix.

By stacking K graph convolution layers, the node representation of the original propagation graph is obtained as $V = H^{(K)}$.
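A minimal sketch of step C follows, assuming the layer takes the GCNII-style form $H^{(k)} = \sigma(((1-\alpha_k)\tilde{P}H^{(k-1)} + \alpha_k H^{(0)})((1-\beta_k)I + \beta_k W^{(k)}))$ with $\tilde{P} = (D+I)^{-1/2}(A+I)(D+I)^{-1/2}$. The constant $\alpha_k = \alpha$, the schedule $\beta_k = \log(\lambda/k + 1)$, ReLU as the activation, and the function names are assumptions for illustration and are not stated in the source. The identity in the second factor is taken over the feature dimension so that the shapes match.

```python
import math
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Return (D + I)^(-1/2) (A + I) (D + I)^(-1/2) for a 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degrees including the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def deep_gcn(h0, adj, weights, alpha=0.1, lam=0.5):
    """Stack len(weights) layers with initial residual and identity mapping."""
    p = normalize_adjacency(adj)
    n, d = len(h0), len(h0[0])
    h = h0
    for k, w in enumerate(weights, start=1):
        beta = math.log(lam / k + 1.0)  # assumed decay schedule for beta_k
        ph = matmul(p, h)
        # Initial residual: mix propagated features with the layer-0 features.
        support = [[(1 - alpha) * ph[i][j] + alpha * h0[i][j] for j in range(d)]
                   for i in range(n)]
        # Identity mapping: shrink the weight matrix towards the identity.
        mix = [[(1 - beta) * (1.0 if i == j else 0.0) + beta * w[i][j] for j in range(d)]
               for i in range(d)]
        h = [[max(0.0, v) for v in row] for row in matmul(support, mix)]  # ReLU
    return h

rng = random.Random(0)
A = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]  # node 0 is the source post
H0 = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
K = 8
weights = [[[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)] for _ in range(K)]
V = deep_gcn(H0, A, weights)  # one row per node after K rounds of aggregation
```

Because every layer re-injects `h0`, deep stacks keep some of each node's own input signal, which is the motivation for using the initial residual against over-smoothing.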

Step D: after the original propagation has been modeled in step C, a Gaussian propagation reconstruction module is designed on the node representations obtained in step C. It models the implicit propagation in the actual information propagation process from multiple views and estimates the potential propagation interaction between every two nodes. Specifically:

Step D1: based on the node features in the original propagation graph, the Gaussian propagation reconstruction module models each node as a distributed representation, which reflects potential propagation dependencies more accurately and comprehensively. The distributed representation of node $i$ is computed as follows:

$$\mu_i^m = g_\theta(v_i), \qquad (\sigma_i^m)^2 = \phi\big(g'_\theta(v_i)\big), \qquad m = 1, \dots, M,$$

where $M$ is the number of modeling views, $g_\theta$ and $g'_\theta$ are two trainable neural networks, $\phi$ is a non-linear activation function, $\mu_i^m$ is the mean, and $(\sigma_i^m)^2$ is the variance, which describes the multi-view uncertainty of the propagation node. After sampling from the distribution $\mathcal{N}(\mu_i^m, (\sigma_i^m)^2)$, the distributed representation of the node is obtained:

$$q_i^m = \mu_i^m + \epsilon \odot \sigma_i^m, \qquad \epsilon \sim \mathcal{N}(0, I),$$

where $\epsilon$ follows a standard normal distribution with mean 0 and variance 1, and $I$ is the identity matrix.
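The sampling in step D1 can be illustrated with the reparameterisation trick, in which a sample from a Gaussian is written as the mean plus standard normal noise scaled by the standard deviation. In this sketch the two trainable networks are stood in for by single linear maps with separate parameters per view, and softplus is used as the non-linear activation that keeps the variance positive; these choices and all names are assumptions made for the example.

```python
import math
import random

def linear(v, w, b):
    """Affine map v @ w + b for a vector v and a (d_in x d_out) matrix w."""
    return [sum(vi * wi for vi, wi in zip(v, col)) + bj for col, bj in zip(zip(*w), b)]

def softplus(x):
    """Smooth positive activation, used here to keep variances above zero."""
    return math.log1p(math.exp(x))

def sample_views(V, view_params, rng):
    """Return Q[m][i]: sampled representation of node i under view m."""
    Q = []
    for w_mu, b_mu, w_var, b_var in view_params:
        q_m = []
        for v in V:
            mu = linear(v, w_mu, b_mu)                            # mean
            var = [softplus(t) for t in linear(v, w_var, b_var)]  # variance
            eps = [rng.gauss(0.0, 1.0) for _ in mu]               # eps ~ N(0, I)
            # Reparameterised sample: mean + eps * standard deviation.
            q_m.append([m + e * math.sqrt(s) for m, e, s in zip(mu, eps, var)])
        Q.append(q_m)
    return Q

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

V = [[0.3, 0.7], [0.1, 0.9], [0.8, 0.2]]  # three nodes, two features each
M = 2                                      # number of modeling views
view_params = [(rand_matrix(2, 2), [0.0, 0.0], rand_matrix(2, 2), [0.0, 0.0])
               for _ in range(M)]
Q = sample_views(V, view_params, rng)
```

Writing the sample as a deterministic function of the mean, the variance and external noise is what lets gradients flow back into the networks that produce the mean and variance during training.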

Step D2: the node representations $q_i^m$ obtained in step D1 are integrated into a new node representation; that is, for any modeling view m, the corresponding node representation is denoted $Q^m$.

Based on the distributions $\mathcal{N}(\mu_i^m, (\sigma_i^m)^2)$ used in step D1, the potential propagation interaction $s_{ij}^m$ between every two nodes $i$ and $j$ is calculated for any modeling view m, giving the adjacency matrix $S^m$.

From this calculation, M reconstructed potential propagation graphs are obtained, whose node representations and adjacency matrices are $\{Q^1, \dots, Q^M\}$ and $\{S^1, \dots, S^M\}$ respectively.
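One way to turn per-node Gaussian distributions into pairwise interactions is sketched below. The specific score, the exponential of the negative KL divergence between two diagonal Gaussians, is an assumption chosen for illustration; the source states only that the interaction is calculated from the distributions of step D1. The function names are hypothetical.

```python
import math

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(N(mu_p, var_p) || N(mu_q, var_q)) for diagonal covariances."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )

def potential_adjacency(mu, var):
    """Assumed interaction score s_ij = exp(-KL(N_i || N_j)), in (0, 1]."""
    n = len(mu)
    return [[math.exp(-kl_diag_gauss(mu[i], var[i], mu[j], var[j])) for j in range(n)]
            for i in range(n)]

# Means and variances of three nodes under a single modeling view.
mu = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]]
var = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
S = potential_adjacency(mu, var)
```

With this score, nodes whose distributions nearly coincide receive a weight close to 1 and distant nodes a weight close to 0, so the resulting matrix is dense and weighted rather than binary. The KL divergence is not symmetric, so a symmetrised variant may be preferable when an undirected graph is required.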

Step E: based on the reconstructed potential propagation graphs, a graph convolutional network with root node enhancement is designed to learn the node representations of the reconstructed graphs. Specifically:

for each reconstructed potential propagation graph, given its node representation and adjacency matrix, the node neighborhood features in the potential propagation are further aggregated with a two-layer graph convolutional network. The message propagation formula is defined as follows:

$$U^m = \hat{S}^m \sigma\big(\hat{S}^m Q^m W_0\big) W_1,$$

where $\hat{S}^m$ is the regularized form of the adjacency matrix $S^m$, and $W_0$ and $W_1$ are trainable parameter matrices.

Finally, the M reconstructed potential propagation graphs are considered together, and the node representation is calculated as:

$$Z = W_z[U^1; \dots; U^M] + b_z,$$

where $W_z$ and $b_z$ are trainable parameters.
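A minimal sketch of step E follows, assuming the standard two-layer form $U^m = \hat{S}^m\,\mathrm{ReLU}(\hat{S}^m Q^m W_0) W_1$, row normalisation as the regularized form of $S^m$, and concatenation of the views along the feature axis before the linear fusion. The root node enhancement mentioned in the text is not modeled here, and all function names are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def row_normalize(s):
    """Divide each row by its sum (assumed normalisation; rows must be non-zero)."""
    out = []
    for row in s:
        total = sum(row)
        out.append([v / total for v in row])
    return out

def two_layer_gcn(q, s, w0, w1):
    """U = S_hat * ReLU(S_hat * Q * W0) * W1 for one potential propagation graph."""
    s_hat = row_normalize(s)
    hidden = [[max(0.0, v) for v in row] for row in matmul(matmul(s_hat, q), w0)]
    return matmul(matmul(s_hat, hidden), w1)

def fuse_views(us, w_z, b_z):
    """Concatenate the M views per node, then apply the linear map W_z, b_z."""
    n = len(us[0])
    concat = [[v for u in us for v in u[i]] for i in range(n)]
    return [[v + b for v, b in zip(row, b_z)] for row in matmul(concat, w_z)]

I2 = [[1.0, 0.0], [0.0, 1.0]]
Q1 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Q2 = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
S1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
S2 = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
U = [two_layer_gcn(Q1, S1, I2, I2), two_layer_gcn(Q2, S2, I2, I2)]
W_z = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # shape (M * d) x d_out
b_z = [0.0, 0.0]
Z = fuse_views(U, W_z, b_z)
```

In the example, identity weights make the result easy to follow: the first view returns `Q1` unchanged, while in the second view nodes 0 and 1 are connected and therefore end up with the average of their features.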

Step F: from the groups of node representations obtained in step C and step E, calculate the final feature representation of the source blog post with an average pooling layer and input it into a classifier to complete false information detection. Specifically:

Step F1: given the node representations learned on the original propagation graph and on the potential propagation graphs, the graph feature representation is computed with the average pooling layer as follows:

$$O = \mathrm{meanpooling}([V; Z]),$$

where $\mathrm{meanpooling}(\cdot)$ is the average pooling function.

Step F2: the false information detection task is essentially a classification task. Based on the final feature representation of the sample obtained in step F1, the probability of each false information category label is calculated for the sample, namely:

$$\hat{y} = \mathrm{softmax}(W_c O + b_c),$$

where $W_c$ and $b_c$ are trainable parameters.
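Step F can be sketched as follows, assuming that $[V; Z]$ joins the two representations along the feature axis, that pooling averages over the nodes, and that the class probabilities come from a softmax over a single linear layer. These are assumptions for illustration, and the function names are hypothetical.

```python
import math

def mean_pool(V, Z):
    """Join V and Z per node, then average each feature over all nodes."""
    concat = [v + z for v, z in zip(V, Z)]
    n = len(concat)
    return [sum(col) / n for col in zip(*concat)]

def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(o, w_c, b_c):
    """Class probabilities softmax(W_c o + b_c); w_c has one row per class."""
    logits = [sum(w * x for w, x in zip(row, o)) + b for row, b in zip(w_c, b_c)]
    return softmax(logits)

V = [[1.0, 2.0], [3.0, 4.0]]   # node representations from the original graph
Z = [[0.0, 0.0], [2.0, 2.0]]   # reconstructed node representations
O = mean_pool(V, Z)            # graph-level feature vector
W_c = [[0.0] * 4, [0.0] * 4]   # untrained two-class classifier
b_c = [0.0, 0.0]
probs = classify(O, W_c, b_c)
```

With all-zero classifier weights the two classes are equally likely, which is the expected starting point before training with a loss such as cross-entropy.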

In summary, the present invention has the following technical effects:

1. the method provided by the invention explores more complex and potential propagation dependencies in the propagation process and mines more valuable propagation structure features for detection;

2. the invention effectively improves the accuracy of false information detection: compared with existing methods, accuracy improves by 5.3% and 1.9% on the public political and entertainment fake news datasets respectively;

3. the invention effectively models long-distance propagation dependencies in propagation trees and detects more accurately on deeper and wider propagation tree structures in particular; for tree structures with a propagation depth greater than 6 in the public datasets, detection accuracy improves by about 50%.

As shown in Fig. 2, the present invention further provides a false information detection device based on propagation reconstruction, the device including: a data acquisition module 100, an information propagation graph construction module 200, a first graph convolutional network module 300, a Gaussian propagation reconstruction module 400, a second graph convolutional network module 500 and a false information classification module 600.

The data acquisition module 100 is configured to acquire propagation data of a blog post in a social network, wherein the propagation data includes: the text content of the source blog post and of the subsequent propagation blog posts, the set of propagation relations between the source blog post node and the subsequent propagation blog post nodes, and the set of propagation relations among the propagation blog post nodes;

the information propagation graph construction module 200 is configured to construct an information propagation graph based on the propagation data;

the first graph convolutional network module 300 is configured to aggregate node neighborhood features in the information propagation graph with a deep graph convolutional network to obtain the node representation V of the information propagation graph;

the Gaussian propagation reconstruction module 400 is configured to estimate potential propagation interactions between nodes in the information propagation graph from the node representation V so as to generate a plurality of potential propagation graphs;

the second graph convolutional network module 500 is configured to aggregate the node neighborhood features of each potential propagation graph with the deep graph convolutional network and to integrate the updated node representations of all potential propagation graphs to obtain a reconstructed node representation Z;

and the false information classification module 600 is configured to perform classification based on the node representation V and the reconstructed node representation Z to obtain a false information detection result for the blog post.

The false information detection device based on propagation reconstruction provided by the embodiment of the present disclosure can implement each process implemented by the embodiment of the false information detection method based on propagation reconstruction, and in order to avoid repetition, a description is omitted here.

The invention further provides electronic equipment. The electronic device may be a computer device, a notebook computer, a server, or other type of electronic device.

An electronic device may include at least one processor and memory. The processor may execute instructions stored in the memory. The processor is communicatively coupled to the memory via a data bus. In addition to the memory, the processor may also be communicatively coupled to input devices, output devices, and communication devices via a data bus.

The processor may be any conventional processor. The processor may include, for example, a central processing unit (CPU), a graphics processing unit (GPU), a field programmable gate array (FPGA), a system on chip (SoC), an application-specific integrated circuit (ASIC), or a combination thereof.

The memory may be implemented by any type of volatile or nonvolatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.

In the embodiment of the disclosure, the memory stores executable instructions, and the processor can read the executable instructions from the memory and execute the instructions to implement all or part of the steps of the false information detection method based on propagation reconstruction.

In addition to the methods and apparatus described above, exemplary embodiments of the present disclosure include a computer program product or a computer-readable storage medium storing the computer program product. The computer program product includes computer program instructions that are executable by a processor to implement all or part of the steps described in the above exemplary embodiments.

The computer program product may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++, conventional procedural programming languages such as C, and scripting languages (e.g., Python). The program code may execute entirely on the user's computing device, partly on the user's computing device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.

A computer readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the readable storage medium include: an electrical connection having one or more wires, a static random access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic or optical disk, or any suitable combination of the foregoing.

The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are also intended to be covered by the present invention.

Claims

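1. A method for detecting false information based on propagation reconstruction, the method comprising:

acquiring propagation data of a blog post in a social network, wherein the propagation data includes: the text content of the source blog post and of the subsequent propagation blog posts, the set of propagation relations between the source blog post node and the subsequent propagation blog post nodes, and the set of propagation relations among the propagation blog post nodes;

constructing an information propagation graph based on the propagation data;

aggregating node neighborhood features in the information propagation graph with a deep graph convolutional network to obtain the node representation V of the information propagation graph;

estimating potential propagation interactions between nodes in the information propagation graph from the node representation V to generate a plurality of potential propagation graphs;

aggregating the node neighborhood features of each potential propagation graph with a deep graph convolutional network, and then integrating the updated node representations of all potential propagation graphs to obtain a reconstructed node representation Z;

and performing classification based on the node representation V and the reconstructed node representation Z to obtain a false information detection result for the blog post.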

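2. The false information detection method based on propagation reconstruction according to claim 1, wherein constructing the information propagation graph based on the propagation data includes:

taking the text features of the text content as the initial features of the nodes in the information propagation graph; and

obtaining the adjacency matrix of the information propagation graph based on the set of propagation relations between the source blog post node and the subsequent propagation blog post nodes and the set of propagation relations among the propagation blog post nodes.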

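3. The method for detecting false information based on propagation reconstruction according to claim 1, wherein aggregating the node neighborhood features in the information propagation graph with the deep graph convolutional network to obtain the node representation V comprises:

stacking K graph convolution layers, wherein each graph convolution layer introduces an initial residual connection and an identity mapping;

aggregating the node neighborhood features in the information propagation graph with the K graph convolution layers to obtain the node representation $V = [v_r; v_1; \dots; v_{n-1}]$;

wherein, at the k-th graph convolution layer, the node representation is updated as

$$H^{(k)} = \sigma\Big(\big((1-\alpha_k)\,\tilde{P}\,H^{(k-1)} + \alpha_k H^{(0)}\big)\big((1-\beta_k)\,I_n + \beta_k W^{(k)}\big)\Big), \qquad \tilde{P} = (D+I)^{-1/2}(A+I)(D+I)^{-1/2},$$

where $\sigma(\cdot)$ represents the activation function, $\alpha_k$ represents the first hyperparameter, $\beta_k$ represents the second hyperparameter, $W^{(k)}$ represents a weight matrix, $H^{(0)}$ represents the initial node representation, $A$ represents the adjacency matrix of the information propagation graph, $D$ represents a diagonal matrix, $I$ represents the identity matrix, $v_r$ represents the node representation of the source blog post node obtained after K rounds of aggregation, and $v_i$ represents the node representation of the i-th propagation node obtained after K rounds of aggregation.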

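4. The method for detecting false information based on propagation reconstruction according to claim 3, wherein estimating potential propagation interactions between nodes in the information propagation graph from the node representation V to generate a plurality of potential propagation graphs comprises:

computing, from the node representation $v_i$, the Gaussian distribution $\mathcal{N}(\mu_i^m, (\sigma_i^m)^2)$ under each modeling view, where $\mu_i^m$ represents the mean under the m-th modeling view and $(\sigma_i^m)^2$ represents the variance under the m-th modeling view, with $1 \le m \le M$ and $M$ the total number of modeling views;

sampling from the distribution $\mathcal{N}(\mu_i^m, (\sigma_i^m)^2)$ to obtain the distributed representation of node $i$ under the m-th modeling view, $q_i^m = \mu_i^m + \epsilon \odot \sigma_i^m$, where $\epsilon \sim \mathcal{N}(0, I)$ follows a standard normal distribution with mean 0 and variance 1;

integrating the distributed representations $q_i^m$ to obtain the node initialization feature $Q^m$ under the m-th modeling view;

for any modeling view m, calculating the potential propagation interaction $s_{ij}^m$ between every two nodes;

generating the adjacency matrix $S^m$ under the m-th modeling view based on the potential propagation interactions $s_{ij}^m$;

generating the potential propagation graph under the m-th modeling view from the node initialization feature $Q^m$ and the adjacency matrix $S^m$.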

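5. The method for detecting false information based on propagation reconstruction according to claim 4, wherein aggregating the node neighborhood features of each potential propagation graph with the deep graph convolutional network and then integrating the node representations of all updated potential propagation graphs to obtain the reconstructed node representation Z comprises:

aggregating the node neighborhood features in each potential propagation graph with a two-layer graph convolutional network to obtain the updated node representation of that potential propagation graph, $U^m = \hat{S}^m \sigma(\hat{S}^m Q^m W_0) W_1$, where $\hat{S}^m$ represents the regularized form of the adjacency matrix $S^m$, $W_0$ represents the weights of the first graph convolution layer, and $W_1$ represents the weights of the second graph convolution layer;

calculating the reconstructed node representation $Z = W_z[U^1; \dots; U^M] + b_z$, where $W_z$ is a first trainable parameter and $b_z$ is a second trainable parameter.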

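6. The method for detecting false information based on propagation reconstruction according to claim 1, wherein performing classification based on the node representation V and the reconstructed node representation Z to obtain the false information detection result for the blog post comprises:

calculating, for the given node representation V and reconstructed node representation Z, the graph feature representation $O = \mathrm{meanpooling}([V; Z])$ with an average pooling layer;

and executing a classification task on the graph feature representation O to obtain the false information detection result for the blog post.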

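7. A false information detection apparatus based on propagation reconstruction, the apparatus comprising:

a data acquisition module, used for acquiring propagation data of a blog post in a social network, wherein the propagation data includes: the text content of the source blog post and of the subsequent propagation blog posts, the set of propagation relations between the source blog post node and the subsequent propagation blog post nodes, and the set of propagation relations among the propagation blog post nodes;

a propagation graph construction module, used for constructing an information propagation graph based on the propagation data;

a first graph convolutional network module, used for aggregating node neighborhood features in the information propagation graph with a deep graph convolutional network to obtain the node representation V of the information propagation graph;

a Gaussian propagation reconstruction module, used for estimating potential propagation interactions among nodes in the information propagation graph from the node representation V so as to generate a plurality of potential propagation graphs;

a second graph convolutional network module, used for aggregating the node neighborhood features of each potential propagation graph with the deep graph convolutional network and integrating the updated node representations of all potential propagation graphs to obtain a reconstructed node representation Z;

and a false information classification module, used for performing classification based on the node representation V and the reconstructed node representation Z to obtain a false information detection result for the blog post.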

8. An electronic device, the electronic device comprising:

a processor;

a memory for storing the processor-executable instructions;

the processor is configured to read the executable instructions from the memory and execute the instructions to implement the false information detection method based on propagation reconstruction of any one of claims 1-6.

9. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, are adapted to carry out the method for false information detection based on propagation reconstruction as claimed in any one of the claims 1-6.