CN115964482A - Multi-mode false news detection method based on user cognitive consistency reasoning - Google Patents
Multi-mode false news detection method based on user cognitive consistency reasoning Download PDFInfo
- Publication number: CN115964482A
- Application number: CN202210574816.XA
- Authority
- CN
- China
- Prior art keywords
- news
- text
- features
- information
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Machine Translation (AREA)
Abstract
The invention relates to a multi-modal false news detection method based on user cognitive consistency reasoning, which comprises the following steps: producing embedded encoded representations of the text information and image information contained in multi-modal news; extracting features from the embedded encoded text and image information with a self-attention network, feeding them into a designed cross-attention block for semantic interaction, and integrating the consistent features captured from the two perspectives of news text features and news image features; designing a context interaction layer that lets all comment information interact comprehensively with the global comment semantics, so that the semantic features users care most about in the comments are mined and strengthened; and designing a collaborative reasoning layer consisting of a collaborative guidance block, a cross-attention block and an aggregation fusion block, which performs collaborative inference over the multi-modal news semantics and the most-attended comment features so as to discover inconsistent information between news and comments and improve the detection performance of the model.
Description
Technical Field
The invention belongs to the technical field of electronic information, and relates to a multi-mode false news detection method based on user cognitive consistency reasoning.
Background
Social media greatly facilitates the democratization of information sharing through the speed of information publishing and the equality of consumption. However, the same openness also allows false news to be created and spread at unprecedented speed and scale. Therefore, how to efficiently and accurately identify false news propagated on social media has become one of the major key issues in the fields of social media analysis and information content security.
Recently, news spread on social media has gradually shifted from a single, text-based modality to multi-modal forms combining images, video, and more. Multi-modal news invites readers to browse, absorb, share and spread it, and can provide an immersive reading experience. However, false news exploits the same advantages to attract and mislead readers, so that multi-modal false news keeps emerging. In order to mitigate the negative effects of false news, automatic detection of multi-modal false news on social media has become an important and urgent problem for academia, industry, and even government agencies.
Compared with the single-modality false news detection task, multi-modal false news detection faces the challenge of improving detection performance by learning valuable features from multi-modal information with unbalanced modalities, large dimensionality differences and non-uniform semantic spaces. Existing research mainly focuses on extracting and learning the correlated features among the different modalities for detection, and falls into two types. The first is the multi-modal interaction method, which builds cross-interaction mechanisms to measure the similarity between modalities and match similar semantic features. For example, Zhou et al. first extract text and visual features from news, and then construct an aligned attention interaction mechanism to extract the matching relationships between cross-modal features. The second type introduces auxiliary tasks related to the multi-modal information and learns the features shared across the tasks. Jaiswal et al. combine deep multi-modal representation learning with outlier detection methods to capture the consistency relationship between multi-modal features. Meanwhile, research based on pre-trained models is also becoming more common: pre-trained models such as BERT and VGG-19 are used to learn the deep associated semantics of the text and images in news, remarkably improving detection performance. However, although these methods achieve a certain performance, they still share a serious drawback: it is difficult for them to capture the inconsistent information between different modalities.
Traditional detection methods usually build a series of aligned interaction models to capture the similar semantics shared between modalities so as to improve detection performance, but they struggle to capture the inconsistent information among multiple modalities and to apply such high-confidence indicative features to detection. To address these problems, the present invention observes that the series of cognitive behaviors users exhibit when questioning rumors can easily uncover the inconsistent information in multi-modal fake news. Based on this, how to model the process by which users cognitively question rumors, in order to extract the differential features in multi-modal information, is the key to multi-modal false news detection research.
Disclosure of Invention
Technical problem to be solved
Aiming at the defects of current multi-modal false information detection methods, and inspired by the cognitive process of users questioning rumors, the invention provides a multi-modal false news detection method based on user cognitive consistency reasoning (UCCIN), which works from three perspectives: between news visual information and text content, between comments, and between news and comments. It comprehensively mines the inconsistent information of false news at these levels and improves the multi-modal false news detection capability. Specifically, in UCCIN, the invention first designs a cross-modal alignment layer that semantically aligns the text information and visual information in the news so as to check the consistency of the news semantics. In order to obtain from the comments the valuable information widely discussed by readers, a context interaction layer is developed that lets each piece of comment information interact with the global comment semantics, and the comment semantics irrelevant to the news are filtered out through a designed dual-channel gating block, so that the most-attended comment semantics are strengthened. Finally, a collaborative inference layer is designed to drive the most-attended comment semantics and the consistent semantics of the news through interactive inference, and the inferred inconsistent features are aggregated and fused, thereby enhancing the ability to detect inconsistencies between news and comments.
Technical scheme
A multi-mode false news detection method based on user cognitive consistency reasoning is characterized by comprising the following steps:
s1: embedded coding module
Respectively adopting a BERT encoder and a ResNet-152 to carry out embedded encoding representation on text information and image information contained in the multi-mode news;
s2: cross-modality alignment module
Extracting the characteristics of the embedded coded text information and image information by using a self-attention network, inputting the characteristics into a designed cross attention block for semantic interaction, and integrating the consistent characteristics captured from two aspects of news text characteristics and news image characteristics;
s3: context interaction module
Designing a context interaction layer to enable all comment information to be comprehensively interacted with global comment semantics, so that semantic features most concerned by users in comments are mined and strengthened;
s4: collaborative inference module
Designing a collaborative inference layer consisting of a collaborative guidance block, a cross-attention block and an aggregation fusion block, and carrying out collaborative inference on the multi-modal news semantics and the most-attended comment features, so as to discover inconsistent information between news and comments and improve the detection performance of the model.
The further technical scheme of the invention is as follows: s2 comprises the following steps:
s21: self-attention network: a multi-head self-attention mechanism is adopted to learn the global relevance of all positions in the text sequence and in the image respectively; given a query Q, a key K and a value V, the scaled dot-product attention is expressed as:

Attention(Q, K, V) = softmax(QK^T/√d)V (3)

wherein, in the text content, Q = K = V = T is set; in the image information, Q = K = V = G is set; d is the embedding dimension of a word, and N is the length of the text sequence;
s22: the query, key and value are projected h times through different linear projections, and these results then perform scaled dot-product attention in parallel; formally, the multi-head attention network can be represented as:

head_i = Attention(QW_i^q, KW_i^k, VW_i^v) (4)

O = MultiHead(Q, K, V) = Concat(head_1, …, head_h)W^o (5)

wherein W_i^q, W_i^k, W_i^v and W^o are trainable parameters and H = d/h; O = O_T and O = O_G denote the embedded encoded text features and the embedded encoded image features;
s23: cross-attention block: the encoded text feature O_T is set as the query and the image feature O_G as the key and value; conversely, the encoded image feature O_G is set as the query and the text feature O_T as the key and value; this process is described by equations (6)-(9), wherein all W are trainable parameters;
s24: integrating the consistent features captured from the two perspectives of news text features and news image features, wherein ';' denotes the splicing (concatenation) operation and the result is the overall consistency feature.
The invention further adopts the technical scheme that: s3 comprises the following steps:
s31: all comments are processed with an average-pooling strategy to obtain a global feature C_avgp, and with a max-pooling strategy to obtain a global feature C_maxp;

s32: the cross-attention block of S23 receives C_maxp as the query, C_i as the key and C_avgp as the value, to obtain the potentially useful features of the i-th comment;
S33: a double-channel gate control block: one channel is used for directly transmitting all information downstream, and the other channel is used for screening potential useful features through a gating mechanism, namely the potential useful features are firstly converted in a space dimension, and then the features are filtered by using linear gating, and the method is represented as follows:
wherein W and b are both trainable parameters, a product operation between elements;
s34: the significant useful information mined from the k reviews is integrated to obtain the overall valuable sound O of all reviews CC :
The further technical scheme of the invention is as follows: s4, the method comprises the following steps:
s41: collaborative guidance block: two collaborative guidance blocks are developed to guide the learning of the original valuable comment features from the two perspectives of text content and image content, thereby obtaining valuable comment features closely related to the news text and to the news images respectively; the text-content guidance process is expressed as follows:
wherein all W and b are trainable parameters;
s42: in the same manner as S41, the valuable comment features guided by the news images are learned;

s43: cross-attention block: two cross-attention blocks are used to interact the semantics of the news text and of the news image with the semantics of the comments respectively; in the interaction between the news text and the comments, the text-guided valuable comment features and the text-side consistency features serve as the query, key and value of the block, and likewise the image-guided comment features and the image-side consistency features in the interaction between the news image and the comments; the learned inconsistent semantics for text and images are denoted O_TC and O_GC respectively;

s44: aggregation fusion block: first, the salient feature parts of the inconsistent semantics of the two perspectives are attended to through the element-wise product operation; second, a residual network fuses the salient features with the text and image features, yielding T and G; finally, the features are fully fused through the absolute-difference form:
TG=[T;|T-G|;T⊙G;G] (24)
s45: a Softmax function predicts the probability distribution, and a global cross-entropy loss is used for training:

p = softmax(W_p TG + b_p) (25)

Loss = −∑ y log p (26)

wherein W_p and b_p are trainable parameters and y is the true label.
A computer system, comprising one or more processors and a computer-readable storage medium storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the above-described method.
A computer-readable storage medium having stored thereon computer-executable instructions for, when executed, implementing the method described above.
A computer program comprising computer executable instructions which when executed perform the method described above.
Advantageous effects
The invention provides a multi-modal false news detection method based on user cognitive consistency reasoning (UCCIN), which works from three perspectives: between news visual information and text content, between comments, and between news and comments. It comprehensively mines the inconsistent information of false news at these levels and improves the multi-modal false news detection capability. The invention provides a new idea for multi-modal information fusion and improves the detection performance on multi-modal false news. Compared with the prior art, the invention has the following innovations:
1: the consistency reasoning network based on the user cognitive rumor process is used for multi-mode false news detection, inconsistent information between news and comments can be found, and the detection capability of the multi-mode false news is improved. To our knowledge, this is the first application of user-cognitive mechanisms to multi-modal false news detection tasks;
2: the invention is inspired by the user to observe the cognitive process of the multi-mode news, designs the cross-mode alignment layer, and can focus on the consistency information of the multi-mode news from the two aspects of texts and images.
3: the context interaction layer developed by the invention can discover valuable semantics among comments through the combination of cross interaction and a two-channel gating mechanism.
Extensive experiments on three competitive false news detection data sets prove the superiority of the method and the synergistic effectiveness of each module.
Drawings
The drawings, in which like reference numerals refer to like parts throughout, are for the purpose of illustrating particular embodiments only and are not to be considered limiting of the invention.
FIG. 1 is an architectural diagram of the present invention;
FIG. 2 is a graph of experimental performance of the present invention under three data sets of Weibo, twitter and PHEME;
FIG. 3 is a graph comparing the separation performance of different modules of the present invention under three data sets of Weibo, twitter and PHEME.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the respective embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Aiming at the defects of current multi-modal false information detection methods, and inspired by the cognitive process of users questioning rumors, the invention provides a multi-modal false news detection method based on user cognitive consistency reasoning (UCCIN), which works from three perspectives: between news visual information and text content, between comments, and between news and comments. It comprehensively mines the inconsistent information of false news at these levels and improves the multi-modal false news detection capability. Specifically, in UCCIN, the invention first designs a cross-modal alignment layer that semantically aligns the text information and visual information in the news so as to check the consistency of the news semantics. In order to obtain from the comments the valuable information widely discussed by readers, a context interaction layer is developed that lets each piece of comment information interact with the global comment semantics, and the comment semantics irrelevant to the news are filtered out through a designed dual-channel gating block, so that the most-attended comment semantics are strengthened. Finally, a collaborative inference layer is designed to drive the most-attended comment semantics and the consistent semantics of the news through interactive inference, and the inferred inconsistent features are aggregated and fused, thereby enhancing the ability to detect inconsistencies between news and comments.
The architecture diagram of the invention is shown in fig. 1, and comprises the following four modules:
module 1: and embedding the coding module.
For the text information and visual information contained in multi-modal news, a BERT encoder and ResNet-152 are respectively adopted to produce embedded encoded representations of the text and the visual content;
and a module 2: and aligning the modules across the modes.
The module simulates the attention process of a person viewing a piece of multi-modal news, namely observing the visual information first and then reading the text content in detail against the picture information. It constructs self-attention networks to encode the visual and text features, and designs a cross-attention network that lets the visual and text features carry out deep semantic interaction, thereby checking the consistency of the news semantics;
and a module 3: a context interaction module.
Since the comments under a news item usually contain a large number of semantics with different viewpoints, this module lets all comment information interact comprehensively with the global comment semantics, so that the semantic features users care most about in the comments are mined and strengthened.
And a module 4: a collaborative inference module.
The module constructs a collaborative guiding layer and a cross attention layer to enable multi-modal news semantics and the most concerned comment features to carry out collaborative inference, and therefore inconsistent information between news and comments is explored to improve the detection performance of the model.
The method comprises the following specific steps:
module 1: embedded coding module
Step 1: Multimodal news typically includes textual content and image information. For the encoding of the text content, the invention encodes the text sequence by means of a pre-trained BERT model; the encoded text sequence may be denoted as T = {t_1, t_2, …, t_N} (1), wherein t_i is the encoding of the i-th word.
Step 2: for encoding of news images, the present invention converts the image to 224 × 224 pixels and then learns a representation of the image information using ResNet-152. In order for visual features to achieve the same dimensionality as text features, we project the encoded image representation ResNet (I) through a linear transformation:
G = W_r ResNet(I) (2)

wherein r_i denotes the i-th image region, described by a 2048-dimensional vector, and W_r is a trainable parameter; G is the encoded vector of the image content.
Step 3: For the encoding of the i-th comment under the news, the invention uses a pre-trained BERT model to obtain the comment sequence representation C_i = {c_1, c_2, …, c_M}, where M is the length of the comment sequence. The embedded encodings of the k comments may be represented as C_1, C_2, …, C_k.
And a module 2: and aligning the modules across modes.
Step 4: Intuitively, when people view news, they usually look at the vivid image first and then read the text in detail against the image information. This process can be abstracted as multi-modal interaction, primarily learning the information that is consistent between the text content and the image content. To this end, the invention designs a cross-modal alignment layer to simulate this process, by first encoding the text and image content using self-attention networks and then feeding the encoded features into the designed cross-attention block for semantic interaction. The specific steps are as follows:
and 5: self-attention networks. The invention adopts a multi-head self-attention mechanism to learn the global correlation of all positions in a text sequence and an image respectively. Given query Q, key K, and value V, the scaled dot product attention is expressed as:
wherein, in the text content, the module is arrangedIn the image information, the module sets &>d is the embedding dimension of the word, N is the length of the text sequence;
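The scaled dot-product attention of equation (3) can be sketched minimally in NumPy (toy sizes; in the invention Q = K = V would be the BERT text embeddings or the projected image features):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # pairwise position similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability before exp
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax rows sum to 1
    return weights @ V                               # context-mixed features

# Self-attention over a toy "text sequence" T (N = 4 tokens, d = 8):
T = np.random.default_rng(0).standard_normal((4, 8))
out = scaled_dot_product_attention(T, T, T)          # Q = K = V = T, as in the text branch
```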
Step 6: To obtain more valuable information from the news in parallel, the module projects the query, keys and values h times through different linear projections, and these results then perform scaled dot-product attention in parallel. Formally, the multi-head attention network can be represented as:

head_i = Attention(QW_i^q, KW_i^k, VW_i^v) (4)

O = MultiHead(Q, K, V) = Concat(head_1, …, head_h)W^o (5)

wherein W_i^q, W_i^k, W_i^v and W^o are trainable parameters, and H = d/h. O = O_T and O = O_G denote the embedded encoded text features and the embedded encoded image features.
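A minimal sketch of the multi-head mechanism of equations (4)-(5), with toy dimensions and randomly initialized (untrained) projection matrices standing in for the trainable parameters:

```python
import numpy as np

def softmax_rows(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X_q, X_kv, h, rng):
    """Eqs. (4)-(5): h parallel projected attentions, concatenated and projected by W^o."""
    d = X_q.shape[-1]
    H = d // h                                       # per-head dimension, H = d/h
    heads = []
    for _ in range(h):
        Wq, Wk, Wv = (rng.standard_normal((d, H)) * 0.1 for _ in range(3))
        q, k, v = X_q @ Wq, X_kv @ Wk, X_kv @ Wv
        heads.append(softmax_rows(q @ k.T / np.sqrt(H)) @ v)
    Wo = rng.standard_normal((h * H, d)) * 0.1       # output projection W^o
    return np.concatenate(heads, axis=-1) @ Wo       # back to (N, d)

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 16))                     # 5 positions, toy d = 16
O = multi_head_attention(X, X, h=4, rng=rng)         # self-attention: O plays the role of O_T or O_G
```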
Step 7: Cross-attention blocks. The cross-attention block is a variant of the standard multi-head attention block and can capture the global dependency between text and images. The invention develops two cross-attention blocks, from a text-image perspective and an image-text perspective. For the text-to-image cross-attention block (equations (6) and (7)), this module sets the encoded text feature O_T as the query and the image feature O_G as the key and value; in this way, the text features guide the model to focus on consistent image regions. Conversely, for the image-to-text cross-attention block (equations (8) and (9)), the module sets the image feature O_G as the query and the text feature O_T as the key and value; in this way, the module captures consistent semantic features in the text content through the focus of the image features. This process is described by equations (6)-(9), wherein all W are trainable parameters.
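One direction of the cross-attention block can be sketched as attention in which the query comes from one modality and the keys/values from the other (toy dimensions, single head for brevity; the invention uses the multi-head variant):

```python
import numpy as np

def softmax_rows(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(query_feats, kv_feats):
    """One direction of the cross-attention block: queries from one modality, keys/values from the other."""
    d = query_feats.shape[-1]
    w = softmax_rows(query_feats @ kv_feats.T / np.sqrt(d))  # attention of each query position over keys
    return w @ kv_feats

rng = np.random.default_rng(2)
O_T = rng.standard_normal((6, 8))   # encoded text features (6 tokens, toy d = 8)
O_G = rng.standard_normal((4, 8))   # encoded image-region features (4 regions)
O_T2G = cross_attention(O_T, O_G)   # text-guided view of the image (text-to-image direction)
O_G2T = cross_attention(O_G, O_T)   # image-guided view of the text (image-to-text direction)
```

The output always keeps the query's sequence length while mixing in the other modality's content, which is what lets each modality "look at" the other.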
And 8: to obtain more extensive consistent features from the news content itself, the present invention integrates the consistent features captured from both the perspective of news text semantics and news visual features:
And a module 3: and a context interaction module.
Step 9: For a specific news item there are often different voices and viewpoints in the comments, especially for false news, where dissenting viewpoints are even more numerous. In order to learn from the comments the valuable information widely discussed by readers (the audience), the invention designs a context interaction layer that lets all comments under a news item interact with each other. The module consists of two blocks: 1) a cross-attention block, which lets different comments interact deeply so that the potentially useful features in the comments are captured; 2) a dual-channel gating block, which reinforces the more salient valuable semantics from the potentially useful features.
Step 10: Cross-attention blocks. In order to capture key valuable features from all comments, all comments need to interact with one another. If every pair of comments interacted directly, a piece of news containing k comments would require k x (k-1) interaction networks, which would greatly enlarge the model and seriously hurt its efficiency. To overcome this problem, the invention takes advantage of the global nature of the average-pooling and max-pooling operations to obtain global features of all comments, namely C_avgp and C_maxp.
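The two pooling strategies can be sketched as follows; the comment count, comment length and feature dimension are toy values, and the random tensor stands in for the BERT-encoded comments C_1..C_k:

```python
import numpy as np

rng = np.random.default_rng(3)
k, M, d = 5, 10, 8                          # 5 comments, 10 tokens each, toy dimension 8 (all assumed)
comments = rng.standard_normal((k, M, d))   # stand-in for the encoded comments

tokens = comments.reshape(-1, d)            # pool over every token of every comment
C_avgp = tokens.mean(axis=0, keepdims=True) # average-pooled global comment feature
C_maxp = tokens.max(axis=0, keepdims=True)  # max-pooled global comment feature
```

Both summaries are fixed-size regardless of k, which is what avoids the k x (k-1) pairwise interaction networks.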
Step 11: The invention then drives each comment to interact with these two global features, thereby obtaining the potentially useful fragments of each comment. Specifically, our cross-attention block (with the same structure as the cross-attention of Step 7) receives C_maxp as the query, C_i as the key, and C_avgp as the value, to obtain the potentially useful features of the i-th comment.
Step 12: Dual-channel gating block. In order to filter out the features unrelated to the news and extract the important, valuable voices from the potentially useful features, the invention proposes a dual-channel gating block to purify the features, so as to capture the important and valuable features in each comment. Specifically, one channel transmits all information directly downstream, while the other screens the potentially useful features through a gating mechanism: the potentially useful features are first transformed in the spatial dimension, and the features are then filtered with a linear gate, which can be expressed as follows:

wherein W and b are trainable parameters, and ⊙ denotes the element-wise product operation.
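Since the equations of this block did not survive extraction, the following is only a plausible sketch of linear gating: a sigmoid gate filters the features, and the ungated pass-through channel is combined with the gated channel by summation (the exact combination used by the invention is an assumption):

```python
import numpy as np

def dual_channel_gate(x, W, b):
    """Two channels: one passes x through untouched, the other filters it with a linear sigmoid gate.
    Combining the channels by addition is an assumption for illustration."""
    gate = 1.0 / (1.0 + np.exp(-(x @ W + b)))   # linear gating squashed into (0, 1)
    return x + x * gate                          # pass-through channel + gated channel

rng = np.random.default_rng(4)
x = rng.standard_normal((10, 8))                 # potentially useful features of one comment
W = rng.standard_normal((8, 8)) * 0.1            # trainable transform in the spatial dimension
b = np.zeros(8)
refined = dual_channel_gate(x, W, b)
```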
Step 13: The invention then integrates the salient useful information mined from the k comments to obtain the overall valuable features O_CC of all comments.
And (4) module: a collaborative inference module.
Step 14: In order to fully discover the inconsistent features between news and comments, the module does not simply splice the features learned from the angles of text content, image content and comments. Instead, it interacts the valuable features of the comments with the image features and the text features of the news respectively, so that the inconsistent features between the news and comment semantics are mined out. Specifically, the module designs a collaborative inference layer consisting of a collaborative guidance block, a cross-attention block and an aggregation fusion block. The specific steps are as follows:
step 15: and coordinating with the guide block. Considering the diversity of valuable information in the comments and the possibility of existence of a plurality of characteristics irrelevant to news, the module develops two collaborative guiding blocks to guide the learning of the original valuable comment characteristics from the two aspects of text content and image content respectively, so as to obtain the valuable comment characteristics closely related to news text and news images respectively. Taking text content guide comments as an example, the process can be expressed as:
wherein all W and b are trainable parameters.
Step 16: In the same manner as Step 15, the module learns the valuable comment features guided by the news images, replacing the text features of Step 15 with the image features.
Step 17: Cross-attention blocks. In order to reveal, from the comment features, the valuable semantics that question the credibility of the news content, the invention uses two cross-attention blocks to interact the semantics of the news text and of the news image with the semantics of the comments respectively. For the interaction between the news text and the comments, the text-guided valuable comment features and the text-side consistency features serve as the query, key and value of the block; for the interaction between the news image and the comments, the image-guided comment features and the image-side consistency features play the corresponding roles. In this manner, the learned inconsistent semantics for text and images are denoted O_TC and O_GC respectively.
Step 18: Aggregation fusion block. The invention constructs the aggregation fusion block to fully fuse the inconsistent semantics acquired from the text and the image. Specifically, the module first attends to the salient feature parts of the inconsistent semantics of the two perspectives through the element-wise product operation; second, the salient features are fused with the text and image features using a residual network, yielding T and G; finally, the features are fully fused through the absolute-difference form:
TG=[T;|T-G|;T⊙G;G] (24)
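Equation (24) itself is straightforward to sketch: the fused representation concatenates the two views, their absolute difference and their element-wise product (toy dimensions):

```python
import numpy as np

def aggregate_fuse(T, G):
    """Eq. (24): TG = [T ; |T - G| ; T ⊙ G ; G]."""
    return np.concatenate([T, np.abs(T - G), T * G, G], axis=-1)

rng = np.random.default_rng(5)
T = rng.standard_normal((1, 8))   # text-side fused features (toy d = 8)
G = rng.standard_normal((1, 8))   # image-side fused features
TG = aggregate_fuse(T, G)         # four 8-dim parts side by side
```

The absolute-difference and product terms explicitly expose where the two views disagree and agree, which is the point of the aggregation.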
Step 19: Finally, a Softmax function predicts the probability distribution of the task, and the global loss forces the model to minimize the cross-entropy error of the training samples with the true labels y:
p = softmax(W_p TG + b_p) (25)

Loss = −∑ y log p (26)
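Equations (25)-(26) amount to a linear softmax classifier trained with cross-entropy; a NumPy sketch with toy dimensions and random (untrained) parameters:

```python
import numpy as np

def predict_and_loss(TG, W_p, b_p, y):
    """Eqs. (25)-(26): p = softmax(W_p TG + b_p); Loss = -sum(y log p)."""
    logits = TG @ W_p + b_p
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    p = e / e.sum(axis=-1, keepdims=True)
    loss = -np.sum(y * np.log(p))
    return p, loss

rng = np.random.default_rng(6)
TG = rng.standard_normal((1, 32))            # fused representation from eq. (24), toy size
W_p = rng.standard_normal((32, 2)) * 0.1     # 2 classes assumed: real vs. fake
b_p = np.zeros(2)
y = np.array([[0.0, 1.0]])                   # one-hot true label
p, loss = predict_and_loss(TG, W_p, b_p, y)
```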
The method is suited to the social network environment, which provides a large amount of multi-modal information.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present disclosure.
Claims (7)
1. A multi-mode false news detection method based on user cognitive consistency reasoning is characterized by comprising the following steps:
s1: embedded coding module
Respectively adopting a BERT encoder and a ResNet-152 to carry out embedded encoding representation on text information and image information contained in the multi-mode news;
s2: cross-modality alignment module
Extracting the characteristics of the embedded coded text information and image information by using a self-attention network, inputting the characteristics into a designed cross attention block for semantic interaction, and integrating the consistent characteristics captured from two angles of news text characteristics and news image characteristics;
s3: context interaction module
Designing a context interaction layer to enable all comment information to be comprehensively interacted with global comment semantics, so that semantic features most concerned by users in comments are mined and strengthened;
s4: collaborative inference module
Designing a collaborative inference layer consisting of a collaborative guidance block, a cross-attention block and an aggregation fusion block, and carrying out collaborative inference on the multi-modal news semantics and the most-attended comment features, so as to discover inconsistent information between news and comments and improve the detection performance of the model.
2. The multi-modal false news detection method based on user cognitive consistency reasoning according to claim 1, wherein S2 comprises the following steps:
s21: self-attention network: a multi-head self-attention mechanism is adopted to respectively learn the global relevance of all positions in the text sequence and in the image; given a query Q, a key K and a value V, the scaled dot-product attention is expressed as:
Attention(Q, K, V) = softmax(QK^T / √d) V
wherein, in the text content, Q = K = V is set to the embedded text features T ∈ R^(N×d), and in the image information, Q = K = V is set to the embedded image features G; d is the embedding dimension of a word, and N is the length of the text sequence;
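By way of illustration only (not part of the claimed method), the scaled dot-product self-attention of S21 can be sketched in NumPy; the sequence length, embedding dimension, and random seed below are assumptions chosen for the example:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # (N, N): relevance of every position to every other
    return softmax(scores, axis=-1) @ V    # (N, d): context-weighted values

# toy text sequence: N = 4 tokens, embedding dimension d = 8
rng = np.random.default_rng(0)
T = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(T, T, T)  # self-attention: Q = K = V = T
print(out.shape)
```

For the image branch the same function would be applied with Q = K = V set to the region features G.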
s22: the query, the key and the value are projected h times through different linear projections, and scaled dot-product attention is then performed on these projections in parallel; formally, the multi-head attention network can be represented as:
head_i = Attention(QW_i^q, KW_i^k, VW_i^v) (4)
MultiHead(Q, K, V) = [head_1; …; head_h] W^o
wherein W_i^q, W_i^k, W_i^v and W^o are trainable parameters, and the dimension of each head is d/h; the outputs for the text and for the image are the embedded-coded text features O^T and the embedded-coded image features O^G, respectively;
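A minimal NumPy sketch of the multi-head mechanism of S22, with h heads of dimension d/h concatenated and mixed by an output projection; the random initialization of the projection matrices stands in for the trainable parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(Q, K, V, h, rng):
    """h parallel heads with per-head dimension d/h, concatenated and mixed by W_o."""
    d = Q.shape[-1]
    dh = d // h                                       # per-head dimension d/h
    heads = []
    for _ in range(h):
        Wq, Wk, Wv = (rng.standard_normal((d, dh)) for _ in range(3))
        q, k, v = Q @ Wq, K @ Wk, V @ Wv              # head_i = Attention(Q W_i^q, K W_i^k, V W_i^v)
        heads.append(softmax(q @ k.T / np.sqrt(dh), axis=-1) @ v)
    W_o = rng.standard_normal((d, d))                 # output projection
    return np.concatenate(heads, axis=-1) @ W_o       # (N, d)

rng = np.random.default_rng(1)
T = rng.standard_normal((4, 8))                       # N = 4 tokens, d = 8
O_T = multi_head_attention(T, T, T, h=2, rng=rng)     # encoded text features O^T
print(O_T.shape)
```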
s23: cross attention block: the encoded text features O^T are taken as the query, with the image features O^G as the key and value; symmetrically, the encoded image features O^G are taken as the query, with the text features O^T as the key and value; this process is described as:
wherein all W are trainable parameters;
s24: integrating the consistent features captured from two aspects of news text features and news image features:
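The bidirectional cross attention of S23–S24 can be sketched as follows; note that the patent's integration equation is not reproduced in this text, so the mean-pool-and-concatenate step below is only an assumed stand-in for it:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key, value):
    # one modality queries the other: softmax(Q K^T / sqrt(d)) V
    d = query.shape[-1]
    return softmax(query @ key.T / np.sqrt(d), axis=-1) @ value

rng = np.random.default_rng(2)
O_T = rng.standard_normal((4, 8))   # encoded text features (4 tokens, d = 8)
O_G = rng.standard_normal((6, 8))   # encoded image features (6 regions, d = 8)

# text perspective: text features query the image features
text_aligned = cross_attention(O_T, O_G, O_G)    # (4, 8)
# image perspective: image features query the text features
image_aligned = cross_attention(O_G, O_T, O_T)   # (6, 8)

# integrate the consistent features from the two perspectives
# (mean-pooling + concatenation is an assumption for this sketch)
consistent = np.concatenate([text_aligned.mean(axis=0), image_aligned.mean(axis=0)])
print(consistent.shape)
```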
3. The multi-modal false news detection method based on user cognitive consistency reasoning according to claim 2, wherein the step S3 comprises the following steps:
s31: processing all comments by an average pooling strategy to obtain a global feature C^avgp; processing all comments by a maximum pooling strategy to obtain a global feature C^maxp;
S32: the cross attention block of S23 receives C^maxp as the query, C_i as the key, and C^avgp as the value, so as to obtain the potentially useful features of the i-th comment, i.e.
S33: a double-channel gate control block: one channel is used for directly transmitting all information downstream, the other channel is used for screening potential useful features through a gating mechanism, namely the potential useful features are firstly converted in a space dimension, and then the features are filtered through linear gating, and the method is represented as follows:
wherein W and b are trainable parameters, a product operation between elements;
s34: the salient useful information mined from the k comments is integrated to obtain the overall valuable opinion O^CC of all comments:
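An illustrative NumPy sketch of the context interaction of S31–S34, under one loose reading of the claim: the attention score for each comment is taken over the k comments, and the gating equation is the generic sigmoid linear gate (both are assumptions, since the patent's own equations are not reproduced here):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
k, d = 5, 8
C = rng.standard_normal((k, d))          # features of k comments C_1..C_k

C_avgp = C.mean(axis=0)                  # global feature from average pooling
C_maxp = C.max(axis=0)                   # global feature from max pooling

# cross attention per comment: C_maxp as query, C_i as key, C_avgp as value;
# softmax over the k comments is an assumption of this sketch
scores = np.array([C_maxp @ C[i] / np.sqrt(d) for i in range(k)])
weights = softmax(scores)
useful = weights[:, None] * C_avgp[None, :]   # potentially useful features per comment, (k, d)

# dual-channel gating: one channel passes everything through,
# the other filters via a linear gate sigmoid(W x + b) ⊙ x
W = rng.standard_normal((d, d))
b = rng.standard_normal(d)
gate = sigmoid(useful @ W + b)
O_CC = (useful + gate * useful).sum(axis=0)   # integrate over the k comments
print(O_CC.shape)
```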
4. The multi-modal false news detection method based on user cognitive consistency reasoning according to claim 3, wherein S4 comprises the following steps:
s41: collaborative guiding block: two collaborative guiding blocks are developed to guide the learning of the original valuable comment features from the two perspectives of text content and image content respectively, so as to obtain valuable comment features closely related to the news text and to the news image, respectively; the text-content guiding process is expressed as follows:
wherein all W and b are trainable parameters;
s42: in the same manner as S41, the valuable comment characteristics guided by the news pictures are learned as
S43: cross attention block: two cross attention blocks are used to interact the semantics of the news text and of the news image with the semantics of the comments, respectively; in the interaction between the news text and the comments, the module uses the text features as the query and the text-guided valuable comment features as the key and value; in the interaction between the news image and the comments, the module uses the image features as the query and the image-guided valuable comment features as the key and value; the learned inconsistent semantics of the text and of the image are denoted O^TC and O^GC, respectively;
S44: aggregation fusion block: firstly, the salient parts of the inconsistent semantics from the two perspectives are respectively attended to through element-wise product operations; secondly, a residual network is used to fuse the salient features with the text and image features, yielding T and G; finally, the features are fully fused by means of an absolute-value difference:
TG=[T;|T-G|;T⊙G;G] (24)
s45: the learned probability distribution is predicted by using a Softmax function, and the model is trained with a global cross-entropy loss function:
p = softmax(W_p · TG + b_p) (25)
Loss = −∑ y log p (26)
wherein W_p and b_p are trainable parameters, and y is the ground-truth label.
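The fusion of equation (24) and the prediction of equations (25)-(26) can be sketched directly; the two-class (real/fake) output dimension and the random initialization are assumptions standing in for the trained parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(4)
d = 8
T = rng.standard_normal(d)   # fused text-side feature
G = rng.standard_normal(d)   # fused image-side feature

# equation (24): TG = [T; |T - G|; T ⊙ G; G]
TG = np.concatenate([T, np.abs(T - G), T * G, G])    # (4d,)

# equations (25)-(26): softmax prediction and cross-entropy loss
W_p = rng.standard_normal((2, 4 * d))                # 2 classes assumed: real / fake
b_p = rng.standard_normal(2)
p = softmax(W_p @ TG + b_p)

y = np.array([1.0, 0.0])                             # one-hot ground-truth label
loss = -np.sum(y * np.log(p))
print(TG.shape, round(float(p.sum()), 6))
```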
5. A computer system, comprising: one or more processors; and a computer-readable storage medium storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of claim 1.
6. A computer-readable storage medium having stored thereon computer-executable instructions, which when executed, perform the method of claim 1.
7. A computer program comprising computer executable instructions which when executed perform the method of claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210574816.XA CN115964482A (en) | 2022-05-24 | 2022-05-24 | Multi-mode false news detection method based on user cognitive consistency reasoning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210574816.XA CN115964482A (en) | 2022-05-24 | 2022-05-24 | Multi-mode false news detection method based on user cognitive consistency reasoning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115964482A true CN115964482A (en) | 2023-04-14 |
Family
ID=87362240
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210574816.XA Pending CN115964482A (en) | 2022-05-24 | 2022-05-24 | Multi-mode false news detection method based on user cognitive consistency reasoning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115964482A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116340887A (en) * | 2023-05-29 | 2023-06-27 | 山东省人工智能研究院 | Multi-mode false news detection method and system |
CN116340887B (en) * | 2023-05-29 | 2023-09-01 | 山东省人工智能研究院 | Multi-mode false news detection method and system |
CN117077085A (en) * | 2023-10-17 | 2023-11-17 | 中国科学技术大学 | Multi-mode harmful social media content identification method combining large model with two-way memory |
CN117077085B (en) * | 2023-10-17 | 2024-02-09 | 中国科学技术大学 | Multi-mode harmful social media content identification method combining large model with two-way memory |
CN117370679A (en) * | 2023-12-06 | 2024-01-09 | 之江实验室 | Method and device for verifying false messages of multi-mode bidirectional implication social network |
CN117370679B (en) * | 2023-12-06 | 2024-03-26 | 之江实验室 | Method and device for verifying false messages of multi-mode bidirectional implication social network |
CN118171235A (en) * | 2024-05-08 | 2024-06-11 | 深圳市金大智能创新科技有限公司 | Bimodal counterfeit information detection method based on large language model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115964482A (en) | Multi-mode false news detection method based on user cognitive consistency reasoning | |
Dong et al. | Reading-strategy inspired visual representation learning for text-to-video retrieval | |
Tang et al. | Graph-based multimodal sequential embedding for sign language translation | |
CN110597963B (en) | Expression question-answering library construction method, expression search device and storage medium | |
Zeng et al. | Tag-assisted multimodal sentiment analysis under uncertain missing modalities | |
Li et al. | Cross-modal semantic communications | |
CN115455970A (en) | Image-text combined named entity recognition method for multi-modal semantic collaborative interaction | |
Sun et al. | Graph prompt learning: A comprehensive survey and beyond | |
Zang et al. | Multimodal icon annotation for mobile applications | |
CN113158798A (en) | Short video classification method based on multi-mode feature complete representation | |
Tajrobehkar et al. | Align R-CNN: A pairwise head network for visual relationship detection | |
Sudhakaran et al. | Gate-shift-fuse for video action recognition | |
Gao et al. | Generalized pyramid co-attention with learnable aggregation net for video question answering | |
CN117893948A (en) | Multi-mode emotion analysis method based on multi-granularity feature comparison and fusion framework | |
Jia et al. | Semantic association enhancement transformer with relative position for image captioning | |
CN112988959A (en) | False news interpretability detection system and method based on evidence inference network | |
Kou et al. | What and Why? Towards Duo Explainable Fauxtography Detection Under Constrained Supervision | |
Zhou et al. | Multi-modal multi-hop interaction network for dialogue response generation | |
Sharma et al. | Visual question answering model based on the fusion of multimodal features by a two-way co-attention mechanism | |
CN113010662B (en) | Hierarchical conversational machine reading understanding system and method | |
Xue et al. | A multi-modal fusion framework for continuous sign language recognition based on multi-layer self-attention mechanism | |
Zhou et al. | The State of the Art for Cross-Modal Retrieval: A Survey | |
CN113869518A (en) | Visual common sense reasoning method and device, electronic equipment and storage medium | |
Li et al. | Using artificial intelligence assisted learning technology on augmented reality-based manufacture workflow | |
Hu et al. | Overall-Distinctive GCN for Social Relation Recognition on Videos |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||