CN115964482A - Multi-mode false news detection method based on user cognitive consistency reasoning - Google Patents

Multi-modal false news detection method based on user cognitive consistency reasoning

Info

Publication number
CN115964482A
Authority
CN
China
Prior art keywords
news
text
features
information
image
Prior art date
Legal status
Pending
Application number
CN202210574816.XA
Other languages
Chinese (zh)
Inventor
吴连伟 (Wu Lianwei)
齐召帅 (Qi Zhaoshuai)
张艳宁 (Zhang Yanning)
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority: CN202210574816.XA
Publication: CN115964482A
Legal status: Pending

Classifications

    • Y — General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02 — Technologies or applications for mitigation or adaptation against climate change
    • Y02D — Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention relates to a multi-modal false news detection method based on user cognitive consistency reasoning, comprising the following steps: embedding and encoding the text information and image information contained in multi-modal news; extracting features from the encoded text and image information with a self-attention network, feeding them into a designed cross-attention block for semantic interaction, and integrating the consistency features captured from the two perspectives of news text features and news image features; designing a context interaction layer that lets each piece of comment information interact fully with the global comment semantics, so that the semantic features users care about most in the comments are mined and reinforced; and designing a collaborative reasoning layer composed of a collaborative guide block, a cross-attention block, and an aggregation fusion block, which performs collaborative inference over the multi-modal news semantics and the most-attended comment features, so as to discover inconsistent information between news and comments and improve the detection performance of the model.

Description

Multi-modal false news detection method based on user cognitive consistency reasoning
Technical Field
The invention belongs to the technical field of electronic information and relates to a multi-modal false news detection method based on user cognitive consistency reasoning.
Background
Social media greatly facilitates the democratization of information sharing by virtue of its rapid publishing and equal consumption of information. However, the same openness also allows false news to spread widely and quickly. How to efficiently and accurately identify false news propagated on social media has therefore become one of the major key issues in the fields of social media analysis and information content security.
Recently, news spread on social media has gradually shifted from a single, text-based modality to multi-modal forms built on images, videos, and the like. Multi-modal news attracts readers to browse, absorb, share, and spread it, and can provide an immersive reading experience. However, false news exploits this same advantage to attract and mislead readers, leading to a constant stream of multi-modal false news. To alleviate its negative effects, automatic detection of multi-modal false news on social media has become an important and urgent problem for academia, industry, and even government agencies.
Compared with single-modal false news detection, multi-modal false news detection faces the challenge of learning valuable features from multi-modal information with unbalanced modalities, large dimensionality differences, and non-uniform semantic spaces. Existing research mainly focuses on extracting and learning correlation features among different modalities for detection, and falls into two categories. The first is multi-modal interaction methods, which measure similarity relationships between modalities through cross-interaction mechanisms so as to match similar semantic features. For example, Zhou et al. first extract text and visual features from news, and then construct an aligned attention interaction mechanism to extract matching relationships between cross-modal features. The second is to introduce auxiliary tasks related to the multi-modal information and learn the features shared across tasks. Jaiswal et al. combine deep multi-modal representation learning with outlier detection to capture the consistency relationship between multi-modal features. Meanwhile, research based on pre-trained models is becoming common: models such as BERT and VGG-19 are used to learn the deep associated semantics of the text and images in news, markedly improving detection performance. However, although these methods achieve a certain level of performance, they share a serious drawback: it is difficult for them to capture inconsistent information between different modalities.
Traditional detection methods usually build a series of aligned interaction models to capture the similar semantics shared between modalities, but they struggle to capture the inconsistent information among the modalities and to exploit such high-confidence indicative features for detection. To address this, the invention observes that the cognitive behaviors users exhibit when questioning rumors readily expose the inconsistent information in multi-modal fake news. How to model this user cognition process so as to extract the discrepancy features in multi-modal information is therefore the key to multi-modal false news detection research.
Disclosure of Invention
Technical problem to be solved
Aiming at the shortcomings of current multi-modal false information detection methods, and inspired by the process through which users cognitively assess rumors, the invention provides a multi-modal false news detection method based on user cognitive consistency reasoning (UCCIN). It comprehensively mines the inconsistent information of false news at three levels: between the news's visual information and its text content, among comments, and between news and comments, thereby improving multi-modal false news detection. Specifically, UCCIN first designs a cross-modal alignment layer to semantically align the text and visual information in news so as to check the consistency of the news semantics. To obtain the valuable information widely discussed by readers, a context interaction layer is developed so that each piece of comment information interacts with the global comment semantics, and a designed dual-channel gating block filters out comment semantics irrelevant to the news, reinforcing the most-attended comment semantics. Finally, a collaborative inference layer drives interactive reasoning between the most-attended comment semantics and the consistent news semantics, and aggregates and fuses the inferred inconsistent features, enhancing detection based on the relation between news and comments.
Technical scheme
A multi-modal false news detection method based on user cognitive consistency reasoning, characterized by comprising the following steps:
s1: embedded coding module
Use a BERT encoder and ResNet-152, respectively, to produce embedded encoded representations of the text information and image information contained in the multi-modal news;
s2: cross-modality alignment module
Extract features from the embedded text and image encodings with a self-attention network, feed them into the designed cross-attention block for semantic interaction, and integrate the consistency features captured from the two perspectives of news text features and news image features;
s3: context interaction module
Design a context interaction layer so that every piece of comment information interacts fully with the global comment semantics, mining and reinforcing the semantic features that users care about most in the comments;
s4: collaborative inference module
Design a collaborative inference layer composed of a collaborative guide block, a cross-attention block, and an aggregation fusion block, and perform collaborative inference over the multi-modal news semantics and the most-attended comment features, so as to discover inconsistent information between news and comments and improve the detection performance of the model.
In a further technical scheme of the invention, S2 comprises the following steps:
s21: self-attention network: a multi-head self-attention mechanism is adopted to learn the global relevance of all positions in the text sequence and in the image, respectively. Given a query Q, key K, and value V, the scaled dot-product attention is expressed as:

Attention(Q, K, V) = softmax(QK^T / √d)V  (3)

where, for the text content, Q = K = V = T ∈ R^{N×d}; for the image information, Q = K = V = G; d is the embedding dimension of the words and N is the length of the text sequence;
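As an illustrative sketch (not part of the claimed method), the scaled dot-product attention of equation (3) can be written in plain NumPy; the sizes N = 5 and d = 8 are arbitrary example values:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V  -- equation (3)
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d), axis=-1) @ V

# Self-attention over a text sequence T (N x d): set Q = K = V = T
rng = np.random.default_rng(0)
N, d = 5, 8
T = rng.standard_normal((N, d))
O = scaled_dot_product_attention(T, T, T)   # N x d output
```

For the image branch the same call is made with G (the encoded image features) in place of T.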
s22: the query, key, and value are projected h times through different linear projections, and scaled dot-product attention is then performed on these projections in parallel. Formally, the multi-head attention network is expressed as:

head_i = Attention(QW_i^q, KW_i^k, VW_i^v)  (4)
O = MultiHead(Q, K, V) = [head_1; …; head_h]W^o  (5)

where W_i^q, W_i^k, W_i^v ∈ R^{d×H} and W^o ∈ R^{d×d} are trainable parameters and H = d/h; O = O_T and O = O_G denote the embedded-coded text features and the embedded-coded image features, respectively;
s23: cross-attention block: set the encoded text features O_T as the query and the image features O_G as the keys and values; conversely, set the encoded image features O_G as the query and the text features O_T as the keys and values. This process is described as:

head_i^{TG} = Attention(O_T W_i^q, O_G W_i^k, O_G W_i^v)  (6)
O^{TG} = [head_1^{TG}; …; head_h^{TG}]W^{TG}  (7)
head_i^{GT} = Attention(O_G W_i^q, O_T W_i^k, O_T W_i^v)  (8)
O^{GT} = [head_1^{GT}; …; head_h^{GT}]W^{GT}  (9)

where all W are trainable parameters;
s24: integrate the consistency features captured from the two perspectives of news text features and news image features:

O^C = [O^{TG}; O^{GT}]  (10)

where ';' denotes the concatenation operation and O^C is the overall consistency feature.
In a further technical scheme of the invention, S3 comprises the following steps:
s31: process all comments with an average-pooling strategy to obtain the global feature C_avgp, and with a max-pooling strategy to obtain the global feature C_maxp;
s32: the cross-attention block of S23 receives C_maxp as the query, C_i as the key, and C_avgp as the value, so as to obtain the potentially useful features of the i-th comment, denoted Ĉ_i;
S33: a double-channel gate control block: one channel is used for directly transmitting all information downstream, and the other channel is used for screening potential useful features through a gating mechanism, namely the potential useful features are firstly converted in a space dimension, and then the features are filtered by using linear gating, and the method is represented as follows:
Figure RE-GDA0003843600220000049
Figure RE-GDA00038436002200000410
wherein W and b are both trainable parameters, a product operation between elements;
s34: the significant useful information mined from the k reviews is integrated to obtain the overall valuable sound O of all reviews CC
Figure RE-GDA0003843600220000051
In a further technical scheme of the invention, S4 comprises the following steps:
s41: collaborative guide block: two collaborative guide blocks are developed to guide the learning of the original valuable comment features from the two perspectives of text content and image content, respectively, thereby obtaining valuable comment features closely related to the news text and to the news images. The text-content guidance process is defined by equations (14)-(19) (formula images in the original), where all W and b are trainable parameters;
s42: in the same manner as S41, the valuable comment features guided by the news images are learned, with the text features in equations (14)-(19) replaced by the image features;
S43: cross attention block: the semantics of the news text and the news image and the semantics of the comments are interacted by using two cross attention blocks respectively, and the module is used in the interaction of the news text and the comments
Figure RE-GDA0003843600220000059
As a query, is asserted>
Figure RE-GDA00038436002200000510
As a key, is selected>
Figure RE-GDA00038436002200000511
As a value; in interacting with comments for a news image, the present module uses @>
Figure RE-GDA00038436002200000512
As a query>
Figure RE-GDA00038436002200000513
As a key, is selected>
Figure RE-GDA00038436002200000514
As a value, the learned inconsistent semantics for text and images are labeled O, respectively TC And O GC
S44: a convergence fusion block: firstly, the salient feature parts of the inconsistent semantics of two angle information are respectively concerned through the product operation between elements, namely
Figure RE-GDA00038436002200000515
And &>
Figure RE-GDA00038436002200000516
Secondly, fusing the salient features with the text and image features by using a residual error network, namely T and g; and finally, fully fusing the characteristics in an absolute value difference mode:
Figure RE-GDA00038436002200000517
Figure RE-GDA00038436002200000518
Figure RE-GDA00038436002200000519
Figure RE-GDA0003843600220000061
TG=[T;|T-G|;T⊙G;G] (24)
s45: predict the learned probability distribution with a Softmax function and train with a global cross-entropy loss:

p = softmax(W_p TG + b_p)  (25)
Loss = −Σ y log p  (26)

where W_p and b_p are trainable parameters and y is the ground-truth label.
A computer system, comprising: one or more processors and a computer-readable storage medium storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the above method.
A computer-readable storage medium having stored thereon computer-executable instructions which, when executed, implement the method described above.
A computer program comprising computer-executable instructions which, when executed, implement the method described above.
Advantageous effects
The invention provides a multi-modal false news detection method based on user cognitive consistency reasoning (UCCIN), which comprehensively mines the inconsistent information of false news at three levels: between the news's visual information and its text content, among comments, and between news and comments, thereby improving multi-modal false news detection. The invention offers a new approach to multi-modal information fusion and improves the detection performance for multi-modal false news. Compared with the prior art, the invention has the following innovations:
1: the consistency reasoning network based on the user cognitive rumor process is used for multi-mode false news detection, inconsistent information between news and comments can be found, and the detection capability of the multi-mode false news is improved. To our knowledge, this is the first application of user-cognitive mechanisms to multi-modal false news detection tasks;
2: the invention is inspired by the user to observe the cognitive process of the multi-mode news, designs the cross-mode alignment layer, and can focus on the consistency information of the multi-mode news from the two aspects of texts and images.
3: the context interaction layer developed by the invention can discover valuable semantics among comments through the combination of cross interaction and a two-channel gating mechanism.
Extensive experiments on three competitive false news detection datasets demonstrate the superiority of the method and the complementary effectiveness of each module.
Drawings
The drawings, in which like reference numerals refer to like parts throughout, are for the purpose of illustrating particular embodiments only and are not to be considered limiting of the invention.
FIG. 1 is an architectural diagram of the present invention;
FIG. 2 shows the experimental performance of the invention on the Weibo, Twitter, and PHEME datasets;
FIG. 3 compares the ablation performance of the invention's individual modules on the Weibo, Twitter, and PHEME datasets.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the respective embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Aiming at the shortcomings of current multi-modal false information detection methods, and inspired by the process through which users cognitively assess rumors, the invention provides a multi-modal false news detection method based on user cognitive consistency reasoning (UCCIN). It comprehensively mines the inconsistent information of false news at three levels: between the news's visual information and its text content, among comments, and between news and comments, thereby improving multi-modal false news detection. Specifically, UCCIN first designs a cross-modal alignment layer to semantically align the text and visual information in news so as to check the consistency of the news semantics. To obtain the valuable information widely discussed by readers, a context interaction layer is developed so that each piece of comment information interacts with the global comment semantics, and a designed dual-channel gating block filters out comment semantics irrelevant to the news, reinforcing the most-attended comment semantics. Finally, a collaborative inference layer drives interactive reasoning between the most-attended comment semantics and the consistent news semantics, and aggregates and fuses the inferred inconsistent features, enhancing detection based on the relation between news and comments.
The architecture diagram of the invention is shown in fig. 1, and comprises the following four modules:
module 1: and embedding the coding module.
For the text information and visual information contained in multi-modal news, a BERT encoder and ResNet-152 are adopted, respectively, to produce embedded encoded representations of the text and the visuals;
and a module 2: and aligning the modules across the modes.
The module simulates the attention process of a person viewing a piece of multi-modal news: the visual information is observed first, and the text content is then read in detail against the picture content. It builds self-attention networks to encode the visual and text features, and designs a cross-attention network for deep semantic interaction between them, thereby checking the consistency of the news semantics;
and a module 3: a context interaction module.
Since the comments under a news item usually carry many semantics with differing viewpoints, this module lets every piece of comment information interact fully with the global comment semantics, thereby mining and reinforcing the semantic features users care about most in the comments.
Module 4: collaborative inference module.
The module constructs a collaborative guide layer and a cross-attention layer so that the multi-modal news semantics and the most-attended comment features perform collaborative inference, thereby exploring the inconsistent information between news and comments to improve the detection performance of the model.
The method comprises the following specific steps:
module 1: embedded coding module
Step 1: multi-modal news typically includes text content and image information. For encoding the text content, the invention encodes the text sequence with a pre-trained BERT model; the encoded text sequence may be denoted T = {t_1, t_2, …, t_N}, where t_i ∈ R^d is the encoding of the i-th word.
Step 2: for encoding the news image, the invention resizes the image to 224 × 224 pixels and then learns a representation of the image information with ResNet-152. So that the visual features match the dimensionality of the text features, the encoded image representation ResNet(I) is projected through a linear transformation:

ResNet(I) = {r_1, r_2, …, r_m}  (1)
G = W_r ResNet(I)  (2)

where r_i is the i-th image region, represented by a 2048-dimensional vector, and W_r is a trainable parameter. G is the encoded vector of the image content.
Step 3: for encoding the i-th comment under the news, the invention uses a pre-trained BERT model to obtain the comment sequence representation C_i = {c_1, c_2, …, c_M}, where M is the length of the comment sequence. The embedded encodings of the k comments may be denoted C_1, C_2, …, C_k.
Module 2: cross-modal alignment module.
Step 4: intuitively, when people view news they usually look at the vivid image first and then read the text in detail against the image information. This process can be abstracted as a multi-modal interaction whose main goal is to learn the information that is consistent between the text content and the image content. To this end, the invention designs a cross-modal alignment layer to simulate this process: it first encodes the text and image content with self-attention networks, and then feeds the encoded features into the designed cross-attention block for semantic interaction. The details are as follows:
and 5: self-attention networks. The invention adopts a multi-head self-attention mechanism to learn the global correlation of all positions in a text sequence and an image respectively. Given query Q, key K, and value V, the scaled dot product attention is expressed as:
Figure RE-GDA0003843600220000092
wherein, in the text content, the module is arranged
Figure RE-GDA0003843600220000093
In the image information, the module sets &>
Figure RE-GDA0003843600220000094
d is the embedding dimension of the word, N is the length of the text sequence;
step 6: to obtain more valuable information from news in parallel, the module projects the query, keys and values h times through different linear projections, and then these results perform scaled dot product attention in parallel. Formally, a multi-head attention network can be represented as:
head i =Attention(QW i q ,KW i k ,VW i v ) (4)
Figure RE-GDA0003843600220000101
/>
wherein, W i q ,W i k ,W i v And
Figure RE-GDA0003843600220000102
are trainable parameters and H is d/H. O = O T And O = O G For embedded coded text sequences and embedded coded image features.
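A minimal NumPy sketch of the multi-head self-attention of equations (4)-(5); the random projection weights and the sizes (6 tokens, d = 16, h = 4) are example stand-ins for the parameters the real model learns:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention, equation (3)
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d), axis=-1) @ V

def multi_head_self_attention(X, h, rng):
    # Equations (4)-(5): project h times, attend per head, concatenate, project.
    N, d = X.shape
    H = d // h                                     # per-head dimension, H = d/h
    heads = []
    for _ in range(h):
        Wq, Wk, Wv = (rng.standard_normal((d, H)) / np.sqrt(d) for _ in range(3))
        heads.append(attention(X @ Wq, X @ Wk, X @ Wv))   # head_i, N x H
    Wo = rng.standard_normal((h * H, d)) / np.sqrt(d)
    return np.concatenate(heads, axis=-1) @ Wo     # [head_1; ...; head_h] W^o

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 16))                   # a sequence of 6 tokens, d = 16
O_T = multi_head_self_attention(X, h=4, rng=rng)   # encoded features, 6 x 16
```

The same function applied to the image regions yields O_G.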
Step 7: cross-attention blocks. The cross-attention block is a variant of the standard multi-head attention block that can capture the global dependencies between text and images. The invention develops two cross-attention blocks, from the text-image and image-text perspectives. For the text-to-image cross-attention block (equations (6) and (7)), the module sets the encoded text features O_T as the query and the image features O_G as the keys and values; in this way, the text features guide the model to focus on consistent image regions. Conversely, for the image-to-text cross-attention block (equations (8) and (9)), the module sets the image features O_G as the query and the text features O_T as the keys and values; in this way, the module captures consistent semantic features in the text content through the focus of the image features. This process is described as:

head_i^{TG} = Attention(O_T W_i^q, O_G W_i^k, O_G W_i^v)  (6)
O^{TG} = [head_1^{TG}; …; head_h^{TG}]W^{TG}  (7)
head_i^{GT} = Attention(O_G W_i^q, O_T W_i^k, O_T W_i^v)  (8)
O^{GT} = [head_1^{GT}; …; head_h^{GT}]W^{GT}  (9)

where all W are trainable parameters.
Step 8: to obtain broader consistency features from the news content itself, the invention integrates the consistency features captured from the two perspectives of news text semantics and news visual features:

O^C = [O^{TG}; O^{GT}]  (10)

where ';' is the concatenation operation and O^C is the overall consistency feature.
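The two cross-attention directions of step 7 and the integration of step 8 can be sketched as follows (a single-head NumPy version; the random matrices stand in for the encoded O_T and O_G, and the token/region counts are arbitrary):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, kv_feats):
    # One modality supplies the query; the other supplies keys and values.
    d = query_feats.shape[-1]
    scores = query_feats @ kv_feats.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ kv_feats

rng = np.random.default_rng(2)
d = 8
O_T = rng.standard_normal((5, d))    # encoded text features, 5 tokens
O_G = rng.standard_normal((3, d))    # encoded image features, 3 regions

O_TG = cross_attention(O_T, O_G)     # text queries attend to image regions
O_GT = cross_attention(O_G, O_T)     # image queries attend to text tokens

# Step 8 / equation (10): splice (';') both views into the consistency feature.
O_C = np.concatenate([O_TG, O_GT], axis=0)   # (5 + 3) x d
```

Note how each direction yields one row per query token or region, so the concatenation covers both modalities' views of the shared semantics.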
Module 3: context interaction module.
Step 9: for a specific news item there are often differing viewpoints ('voices') in the comments, and this is especially true for false news. To learn the valuable information widely discussed by the readers (the audience) from the comments, the invention designs a context interaction layer that lets all the comments under a news item interact with one another. The module consists of two blocks: 1) a cross-attention block, which lets different comments interact deeply so as to capture the potentially useful features in the comments; and 2) a dual-channel gating block, which reinforces the more salient, valuable semantics from the potentially useful features.
Step 10: cross-attention blocks. Capturing key valuable features from all the comments requires that all comments interact with one another. If every pair of comments interacted directly, a news item containing k comments would require k × (k−1) interaction networks, which would greatly enlarge the model and seriously hurt its efficiency. To overcome this, the invention exploits the global nature of the average-pooling and max-pooling operations to obtain global features of all comments, namely C_avgp and C_maxp.
Step 11: the invention then drives each comment to interact with these two global features, thereby extracting the potentially useful fragments of each comment. Specifically, our cross-attention block (with the same structure as the cross-attention of step 7) receives C_maxp as the query, C_i as the key, and C_avgp as the value, so as to obtain the potentially useful features of the i-th comment, denoted Ĉ_i.
Step 12: dual-channel gating block. To filter out features irrelevant to the news and extract the important, valuable voices from the potentially useful features, the invention proposes a dual-channel gating block that purifies the features so as to capture the important and valuable features of each comment. Specifically, one channel transmits all information directly downstream, and the other screens the potentially useful features through a gating mechanism: the features are first transformed in the spatial dimension and then filtered with a linear gate, which can be expressed as:

H_i = W_1 Ĉ_i + b_1  (11)
O_i^C = Ĉ_i ⊙ σ(W_2 H_i + b_2)  (12)

where W and b are trainable parameters and ⊙ is the element-wise product.
Step 13: the invention then integrates the salient useful information mined from the k comments to obtain the overall valuable voice of all comments, O_CC:

O_CC = Σ_{i=1}^{k} O_i^C  (13)
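Under one plausible reading of steps 10-13 (pooling across the k comments at each token position; the gate of equations (11)-(12) is a reconstruction, so the sigmoid choice and all weights here are assumptions), the context interaction layer can be sketched as:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d), axis=-1) @ V

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
k, M, d = 4, 6, 8                            # k comments, M tokens each, dim d
C = rng.standard_normal((k, M, d))           # encoded comments C_1 ... C_k

# Step 10: global comment features via average and max pooling over comments.
C_avgp = C.mean(axis=0)                      # M x d
C_maxp = C.max(axis=0)                       # M x d

# Randomly initialised stand-ins for the trainable W_1, W_2 (biases zero).
W1, W2 = rng.standard_normal((2, d, d)) / np.sqrt(d)
b1 = b2 = np.zeros(d)

O_i = []
for i in range(k):
    # Step 11: C_maxp as query, C_i as key, C_avgp as value.
    C_hat = attention(C_maxp, C[i], C_avgp)
    # Step 12: spatial transform, then linear gating of the pass-through channel.
    H = C_hat @ W1 + b1
    O_i.append(C_hat * sigmoid(H @ W2 + b2))

# Step 13: integrate the k purified comment features.
O_CC = np.sum(O_i, axis=0)                   # overall valuable voice, M x d
```
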
Module 4: collaborative inference module.
Step 14: to fully discover the inconsistent features between news and comments, the module does not simply concatenate the features learned from the text content, image content, and comments. Instead, it interacts the valuable features in the comments with the image features and text features of the news, respectively, so as to mine the inconsistent features between the news and the comment semantics. Specifically, the module designs a collaborative inference layer composed of a collaborative guide block, a cross-attention block, and an aggregation fusion block. The details are as follows:
step 15: and coordinating with the guide block. Considering the diversity of valuable information in the comments and the possibility of existence of a plurality of characteristics irrelevant to news, the module develops two collaborative guiding blocks to guide the learning of the original valuable comment characteristics from the two aspects of text content and image content respectively, so as to obtain the valuable comment characteristics closely related to news text and news images respectively. Taking text content guide comments as an example, the process can be expressed as:
Figure RE-GDA0003843600220000122
Figure RE-GDA0003843600220000123
Figure RE-GDA0003843600220000124
Figure RE-GDA0003843600220000125
Figure RE-GDA0003843600220000126
Figure RE-GDA0003843600220000127
wherein all W and b are trainable parameters.
Step 16: in the same manner as step 15, the module learns the valuable comment features guided by the news images, with the text features in equations (14)-(19) replaced by the image features.
Step 17: Cross attention blocks. To learn, from the comment features, the valuable semantics that question the credibility of the news content, the invention uses two cross attention blocks to interact the semantics of the news text and of the news images with the semantics of the comments, respectively. For the interaction between the news text and the comments, the module uses the aligned news text features as the query, and the text-guided valuable comment features as the key and the value. For the interaction between the news images and the comments, the module uses the aligned news image features as the query, and the image-guided valuable comment features as the key and the value (the exact symbols appear only as images in the original publication). In this manner, the learned inconsistent semantics for text and images are denoted O^TC and O^GC, respectively.
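The cross attention of step 17 can be sketched as scaled dot-product attention in which the news features query the guided comment features. The variable names and shapes below are hypothetical, since the patent's symbols are rendered only as images:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key, value):
    # Scaled dot-product cross attention: query from one source,
    # key/value from another (here: the guided comment features).
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ value

rng = np.random.default_rng(1)
d = 8
news_text = rng.standard_normal((5, d))              # aligned news-text features
text_guided_comments = rng.standard_normal((7, d))   # from the guidance block
# O_TC: inconsistent semantics learned from the text/comment interaction
O_TC = cross_attention(news_text, text_guided_comments, text_guided_comments)
```

The image-side output O^GC follows the same pattern with image features as the query.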
Step 18: Aggregation fusion block. The invention constructs an aggregation fusion block to fully fuse the inconsistent semantics acquired from the text and the image. Specifically, the module first attends to the salient feature parts of the inconsistent semantics of the two angles through an element-wise product operation. Second, a residual network fuses these salient features with the text and image features, yielding T and G. Finally, the features are fully fused by way of absolute-value difference (equations (20)–(23), which appear only as images in the original publication):

TG = [T; |T-G|; T⊙G; G] (24)
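Equation (24) is explicit, so the final fusion can be shown concretely; only the toy feature values below are invented:

```python
import numpy as np

def aggregate_fuse(T, G):
    # Equation (24): TG = [T; |T-G|; T*G; G], a concatenation of four parts.
    return np.concatenate([T, np.abs(T - G), T * G, G], axis=-1)

T = np.array([1.0, 2.0])   # residual-fused text-side features (toy values)
G = np.array([3.0, 1.0])   # residual-fused image-side features (toy values)
TG = aggregate_fuse(T, G)
# → [1. 2. 2. 1. 3. 2. 3. 1.]
```

The fused vector is four times the feature dimension, combining raw features, their absolute difference, and their element-wise product.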
Step 19: Finally, the learned probability distribution of the task is predicted with a Softmax function, and a global loss forces the model to minimize the cross-entropy error over training samples with true labels y:

p = softmax(W_p TG + b_p) (25)

Loss = -∑ y log p (26)
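Equations (25)–(26) amount to a linear classifier followed by softmax and cross-entropy. The dimensions of W_p and b_p below are illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict_and_loss(TG, W_p, b_p, y_onehot):
    p = softmax(TG @ W_p + b_p)           # equation (25)
    loss = -np.sum(y_onehot * np.log(p))  # equation (26), cross-entropy
    return p, loss

rng = np.random.default_rng(2)
TG = rng.standard_normal(8)            # fused feature from equation (24)
W_p = rng.standard_normal((8, 2))      # 2 classes: real / fake news
b_p = np.zeros(2)
y = np.array([0.0, 1.0])               # one-hot true label
p, loss = predict_and_loss(TG, W_p, b_p, y)
```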
The method is suitable for the social-network environment, where social media platforms provide a large amount of multi-modal information.
While the invention has been described with reference to specific embodiments, it is not limited thereto, and those skilled in the art can easily make various equivalent modifications or substitutions within the technical scope of the present disclosure.

Claims (7)

1. A multi-mode false news detection method based on user cognitive consistency reasoning is characterized by comprising the following steps:
s1: embedded coding module
Respectively adopting a BERT encoder and ResNet-152 to produce embedded encoded representations of the text information and image information contained in the multi-modal news;
s2: cross-modality alignment module
Extracting features from the embedded-encoded text information and image information with a self-attention network, feeding them into a designed cross attention block for semantic interaction, and integrating the consistent features captured from the two angles of news text features and news image features;
s3: context interaction module
Designing a context interaction layer so that all comment information comprehensively interacts with the global comment semantics, thereby mining and reinforcing the semantic features users attend to most in the comments;
s4: collaborative inference module
Designing a collaborative reasoning layer consisting of a collaborative guidance block, a cross attention block and an aggregation fusion block, and collaboratively inferring over the multi-modal news semantics and the most-attended comment features, so as to discover inconsistent information between news and comments and improve the detection performance of the model.
2. The multi-modal false news detection method based on user cognitive consistency reasoning according to claim 1, wherein S2 comprises the following steps:
S21: self-attention network: a multi-head self-attention mechanism respectively learns the global relevance of all positions in the text sequence and in the image; given a query Q, a key K and a value V, the scaled dot-product attention is expressed as:

Attention(Q, K, V) = softmax(QK^T/√d)V

wherein, in the text content, Q, K and V are all set to the embedded text features, and in the image information, to the embedded image features (the exact symbol definitions appear only as images in the original publication); d is the embedding dimension of the words, and N is the length of the text sequence;
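The self-attention of S21, with Q = K = V set to the same modality's embedded features as the claim states, can be sketched as follows (shapes are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (N, N) relevance of all positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax rows
    return weights, weights @ V

rng = np.random.default_rng(3)
N, d = 6, 8                        # sequence length, embedding dimension
O_T = rng.standard_normal((N, d))  # embedded text features
w, out = scaled_dot_product_attention(O_T, O_T, O_T)  # self-attention: Q=K=V
```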
S22: projecting the query, key and value h times through different linear projections, and then performing scaled dot-product attention on these results in parallel; formally, the multi-head attention network can be represented as:

head_i = Attention(QW_i^q, KW_i^k, VW_i^v) (4)

O = [head_1; …; head_h]W^o (5)

wherein W_i^q, W_i^k, W_i^v and W^o are trainable parameters, and the per-head dimension is d/h; the output O is denoted O^T for the embedded-encoded text features and O^G for the embedded-encoded image features;
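Equations (4)–(5) describe standard multi-head attention; a minimal sketch, assuming per-head projections of dimension d/h (the projection shapes are assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def multi_head(Q, K, V, Wq, Wk, Wv, Wo):
    # h parallel heads, each with its own projections (equation (4)),
    # concatenated and re-projected (equation (5)).
    heads = [attention(Q @ wq, K @ wk, V @ wv)
             for wq, wk, wv in zip(Wq, Wk, Wv)]
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(4)
N, d, h = 6, 8, 2
dh = d // h                        # per-head dimension d/h
X = rng.standard_normal((N, d))
Wq = [rng.standard_normal((d, dh)) for _ in range(h)]
Wk = [rng.standard_normal((d, dh)) for _ in range(h)]
Wv = [rng.standard_normal((d, dh)) for _ in range(h)]
Wo = rng.standard_normal((d, d))
O = multi_head(X, X, X, Wq, Wk, Wv, Wo)
```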
S23: cross attention block: the encoded text features O^T serve as the query, with the image features O^G as the keys and values; symmetrically, the encoded image features O^G serve as the query, with the text features O^T as the keys and values; this process is described by equations (6)–(9), which appear only as images in the original publication, wherein all W are trainable parameters;
S24: integrating the consistent features captured from the news text features and the news image features by concatenation (equation (10), which appears only as an image in the original publication), wherein [;] is the splicing operation and the result is the overall consistency feature.
3. The multi-modal false news detection method based on user cognitive consistency reasoning according to claim 2, wherein the step S3 comprises the following steps:
S31: processing all comments with an average-pooling strategy to obtain a global feature C_avgp, and with a max-pooling strategy to obtain a global feature C_maxp;

S32: the cross attention block of S23 receives C_maxp as the query, C_i as the key, and C_avgp as the value, so as to obtain the potentially useful features of the i-th comment (the symbol appears only as an image in the original publication);
S33: a double-channel gate control block: one channel is used for directly transmitting all information downstream, the other channel is used for screening potential useful features through a gating mechanism, namely the potential useful features are firstly converted in a space dimension, and then the features are filtered through linear gating, and the method is represented as follows:
Figure FDA0003660213820000031
Figure FDA0003660213820000032
wherein W and b are trainable parameters, a product operation between elements;
S34: integrating the salient useful information mined from the k comments to obtain the overall valuable voice O^CC of all comments (equation (13), which appears only as an image in the original publication).
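The pooling and dual-channel gating of S31–S34 can be sketched as below. Summing the pass-through channel and the gated channel is an assumption, since equations (11)–(13) are images in the original:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dual_channel_gate(c, W, b):
    # Channel 1 passes features through unchanged; channel 2 filters them
    # with a linear gate applied by element-wise product.
    gate = sigmoid(c @ W + b)
    return c + c * gate

rng = np.random.default_rng(5)
k, d = 4, 8                          # number of comments, feature dimension
comments = rng.standard_normal((k, d))
C_avgp = comments.mean(axis=0)       # average-pooled global comment feature
C_maxp = comments.max(axis=0)        # max-pooled global comment feature
W = rng.standard_normal((d, d))
b = np.zeros(d)
# Integrate the k gated comment features into one overall vector O_CC.
O_CC = dual_channel_gate(comments, W, b).mean(axis=0)
```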
4. The multi-modal false news detection method based on user cognitive consistency reasoning according to claim 3, wherein S4 comprises the following steps:
s41: a collaborative guiding block: developing two collaborative guiding blocks, guiding the learning of original valuable comment features from two angles of text content and image content respectively, and accordingly obtaining valuable comment features closely related to news text and news images respectively; the text content guidance process is expressed as follows:
(Equations (14)–(19), which appear only as images in the original publication.)
wherein all W and b are trainable parameters;
S42: in the same manner as S41, the valuable comment features guided by the news images are learned (the symbol appears only as an image in the original publication);
S43: cross attention block: the semantics of the news text and the news image and the semantics of the comments are interacted respectively by using two cross attention blocks, and the module is used in the interaction of the news text and the comments
Figure FDA00036602138200000311
As a query, is asserted>
Figure FDA00036602138200000312
As a key, is selected>
Figure FDA00036602138200000313
As a value; in interacting with comments for a news image, the present module uses @>
Figure FDA00036602138200000314
As a query->
Figure FDA00036602138200000315
As a key, is selected>
Figure FDA00036602138200000316
As a value, learned inconsistent semantic score for text and imagesIs marked as O TC And O GC
S44: aggregation fusion block: first, the salient feature parts of the inconsistent semantics of the two angles are attended to through an element-wise product operation; second, a residual network fuses the salient features with the text and image features, yielding T and G; finally, the features are fully fused by way of absolute-value difference (equations (20)–(23), which appear only as images in the original publication):

TG = [T; |T-G|; T⊙G; G] (24)
S45: predicting the learned probability distribution with a Softmax function and training with a global cross-entropy loss:

p = softmax(W_p TG + b_p) (25)

Loss = -∑ y log p (26)

wherein W_p and b_p are trainable parameters and y is the true label.
5. A computer system, comprising: one or more processors; and a computer-readable storage medium storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of claim 1.
6. A computer-readable storage medium having stored thereon computer-executable instructions, which when executed, perform the method of claim 1.
7. A computer program comprising computer executable instructions which when executed perform the method of claim 1.
CN202210574816.XA 2022-05-24 2022-05-24 Multi-mode false news detection method based on user cognitive consistency reasoning Pending CN115964482A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210574816.XA CN115964482A (en) 2022-05-24 2022-05-24 Multi-mode false news detection method based on user cognitive consistency reasoning


Publications (1)

Publication Number Publication Date
CN115964482A true CN115964482A (en) 2023-04-14

Family

ID=87362240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210574816.XA Pending CN115964482A (en) 2022-05-24 2022-05-24 Multi-mode false news detection method based on user cognitive consistency reasoning

Country Status (1)

Country Link
CN (1) CN115964482A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116340887A (en) * 2023-05-29 2023-06-27 山东省人工智能研究院 Multi-mode false news detection method and system
CN116340887B (en) * 2023-05-29 2023-09-01 山东省人工智能研究院 Multi-mode false news detection method and system
CN117077085A (en) * 2023-10-17 2023-11-17 中国科学技术大学 Multi-mode harmful social media content identification method combining large model with two-way memory
CN117077085B (en) * 2023-10-17 2024-02-09 中国科学技术大学 Multi-mode harmful social media content identification method combining large model with two-way memory
CN117370679A (en) * 2023-12-06 2024-01-09 之江实验室 Method and device for verifying false messages of multi-mode bidirectional implication social network
CN117370679B (en) * 2023-12-06 2024-03-26 之江实验室 Method and device for verifying false messages of multi-mode bidirectional implication social network
CN118171235A (en) * 2024-05-08 2024-06-11 深圳市金大智能创新科技有限公司 Bimodal counterfeit information detection method based on large language model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination