CN117591866B - Multi-modal false information detection method guided by empathy theory - Google Patents

Multi-modal false information detection method guided by empathy theory

Info

Publication number
CN117591866B
CN117591866B (application CN202410057274.8A)
Authority
CN
China
Prior art keywords
emotion
news
content
image
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410057274.8A
Other languages
Chinese (zh)
Other versions
CN117591866A (en)
Inventor
袁璐
梁钰滢
程南昌
沈浩
石磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Communication University of China
Original Assignee
Communication University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Communication University of China filed Critical Communication University of China
Priority to CN202410057274.8A priority Critical patent/CN117591866B/en
Publication of CN117591866A publication Critical patent/CN117591866A/en
Application granted granted Critical
Publication of CN117591866B publication Critical patent/CN117591866B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a multi-modal false information detection method guided by empathy theory, implemented by an empathy-theory-guided multi-modal false information detection model. The model is built on the two components of empathy, cognitive empathy and emotional empathy, and improves the accuracy of multi-modal false information detection by incorporating empathy theory. The cognitive empathy inference layer simulates the degree of cognitive empathy by comparing the semantic consistency of comments and news; the emotional empathy inference layer simulates the degree of emotional empathy through similarity analysis of news text emotion and comment emotion; finally, a fusion inference layer fuses cognitive empathy and emotional empathy to judge whether empathy is generated for the news content, thereby improving the accuracy of false information detection.

Description

Multi-modal false information detection method guided by empathy theory
Technical Field
The invention relates to the field of information detection within the technical field of artificial intelligence, and in particular to a multi-modal false information detection method guided by empathy theory.
Background
The spread of false news through social media poses a destructive threat to the sustainable development of society: false information damages public trust, disturbs social order, and can hinder the normal operation and progress of society. Most existing research treats false news detection as a classification task. From a methodological viewpoint, false news detection can be roughly divided into three types according to the main features used by the classification model: content-based detection methods, social-context-based detection methods, and knowledge-based detection methods.
Among them, the content-based false news detection method extracts various semantic features from the news content and judges the authenticity of news through these features. There are linguistic differences between false and real news, so false news can be detected by distinguishing the language styles of true and false news text. For example, false news is more subjective than real news: it uses the first and second person more frequently and contains more subjective, exaggerated, and modal words used for overstatement, while real news tends to use concrete (e.g., numerical), objective (e.g., third-person), and active wording; the writing style of false news authors may also be more extreme than that of real news. Moreover, experiments that extract a complete set of content features from news — including the total number of words, content length, number of capitalized words, special symbols, and number of opening sentences — and rank their importance find that the total number of words, content length, and number of capitalized words strongly affect the discrimination of real news; abbreviations and total word count therefore have a large influence on false news detection results.
Content-based false news detection can capture the linguistic characteristics of true and false news, but false news sometimes deliberately imitates the writing style of real news to mislead readers, and content features alone then cannot separate the two. To address this, social-context-based detection methods make full use of hidden auxiliary data, such as social background information and propagation paths in a social network. For example, Shu et al. discuss the relationship between user data and false news on social media and use users' social engagement as an auxiliary signal for false information detection; they also propose a framework that models the ternary relationships among news publishers, news articles, and users, extracts effective features from publishers and reader participation, and then captures the interactions between them. Research shows that social background information improves false news detection and enables effective early prediction. Another research direction detects false news by modeling its propagation path through the network. For example, Monti et al. found that the mode of propagation is an important feature of false news, beyond news content, user data, and other aspects of social behavior. Raza et al. propose a false news detection framework based on a Transformer architecture comprising an encoder part that learns representations of false news data and a decoder part that predicts future behavior from past observations; the model exploits features of both news content and social context to improve classification accuracy.
Knowledge-based (KB) false news detection verifies news against facts and is therefore also called fact checking. Fact checking falls into two categories: manual verification and automatic verification. Manual methods rely on domain experts or crowdsourcing; they are accurate but inefficient and cannot meet the demands of the big-data era, so automatic verification using natural language processing and machine learning has become a popular research field. Fact checking first constructs a knowledge base or knowledge graph from the web through knowledge extraction, then compares the news against it to judge authenticity. For example, Pan et al. use a knowledge graph to detect false news from news content, addressing the shortage of computational fact checking. Hu et al. develop a heterogeneous graph attention network to learn contextual news representations and encode the semantics of the news content, using an entity comparison network to compare contextual entity representations with representations derived from a knowledge base (KB) and thereby capture the consistency between the news content and the knowledge base.
In the above false information detection problem, researchers usually work from news content, knowledge, social networks, and similar directions; these methods, which rely on surface information, remain easy for counterfeiters to evade and achieve low false information detection accuracy.
Disclosure of Invention
In view of problems such as poor detection accuracy in the existing false information detection field, the invention aims to provide a multi-modal false information detection method guided by empathy theory, which uses empathy theory to explain the characteristics and patterns of false information so as to improve the accuracy of false information detection.
In one aspect, the invention provides a multi-modal false information detection method guided by empathy theory, implemented on a preset empathy-theory-guided multi-modal false information detection model and comprising the following steps:
S110: extracting features of the news content and the comment content to obtain the image semantic features and text semantic features in the news content and the comment content;
S120: based on the image semantic features and the text semantic features, simulating the degree of cognitive empathy and the degree of emotional empathy generated between the news content and the comment content through cognitive empathy reasoning and emotional empathy reasoning, respectively;
S130: fusing the degree of cognitive empathy and the degree of emotional empathy, and judging, based on the fused result, whether empathy is generated for the news content, so as to detect false news;
The fusing of the degree of cognitive empathy and the degree of emotional empathy comprises:

simulating the interaction between cognitive empathy and emotional empathy through a co-attention block, whose computation is:

$Z = \mathrm{LN}\!\left(C + \mathrm{softmax}\!\left(\frac{CE^{T}}{\sqrt{d}}\right)E\right)$, $F = \mathrm{LN}\!\left(Z + \mathrm{FFN}(Z)\right)$

wherein $C$ denotes the degree of cognitive empathy and $E$ the degree of emotional empathy, $\mathrm{LN}(\cdot)$ and $\mathrm{FFN}(\cdot)$ are the normalization method and feed-forward network respectively, $\sqrt{d}$ is the scaling (variance) term, $Z$ is an intermediate value, and $F$ is the fused semantics produced by the interaction block for cognitive empathy and emotional empathy, integrating the comprehensive fused representation formed by the two;

and classifying the fused representation through a softmax function to output the result, improving the accuracy of false news detection.
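As a minimal numerical sketch of a co-attention block of this post-norm form — assuming the standard Transformer layout (layer normalization around a scaled dot-product interaction, then a feed-forward sublayer); the feed-forward weights here are identity placeholders, not trained parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def co_attention_fuse(C, E):
    """Fuse cognitive-empathy features C and emotional-empathy features E
    (both [seq_len, d]) with a single co-attention block."""
    d = C.shape[-1]
    # C attends to E: scaled dot-product attention, C as query, E as key/value
    Z = layer_norm(C + softmax(C @ E.T / np.sqrt(d)) @ E)   # intermediate value
    # position-wise feed-forward (identity weights as placeholders)
    W1, W2 = np.eye(d), np.eye(d)
    F = layer_norm(Z + np.maximum(Z @ W1, 0) @ W2)          # fused representation
    return F

rng = np.random.default_rng(0)
C, E = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
F = co_attention_fuse(C, E)
print(F.shape)  # (4, 8)
```

The fused matrix `F` is what the softmax classifier would then consume.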
The extracting of features of the news content and the comment content comprises:
extracting image semantic features of the image information in the news content and the comment content with a preset ResNet152 residual network; and
extracting text semantic features of the text information in the news content and the comment content with a preset BERT.
The extracting of image semantic features of the image information with the preset ResNet152 residual network comprises:
resizing the images in the news content and the comment content to a preset size to obtain canonical image information; and
obtaining the image semantic features in the canonical image information using the ResNet152 residual network.
Optionally, the obtaining of the image semantic features in the canonical image information using the ResNet152 residual network comprises:

removing the last fully connected layer and taking the output of the last convolutional layer as the encoded image representation $R = \{r_1, r_2, \dots, r_m\}$, wherein each $r_i$ is a 2048-dimensional vector representing a region of the image, so that image $I$ is represented as $R \in \mathbb{R}^{m \times 2048}$;

projecting the encoded image representation $R$ by a linear transformation: $E_I = \sigma(W_I R)$,

wherein $W_I$ is a trainable parameter and $E_I$ is the encoded representation of the image content.
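As an illustration of this projection step, the sketch below assumes the last convolutional layer of ResNet152 yields a 7×7 grid of 2048-dimensional region vectors for a 224×224 input (so $m = 49$) and that $d = 768$, the BERT-base hidden size; `W_I` is a random stand-in for the trainable parameter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the last-conv-layer output of ResNet152 on a 224x224 image:
# a 7x7 grid of 2048-dimensional region descriptors (49 regions).
conv_out = rng.normal(size=(7, 7, 2048))

# Flatten the grid into R = {r_1, ..., r_49}, each r_i in R^2048
R = conv_out.reshape(-1, 2048)

# Project to the BERT hidden size d so visual and textual features
# share one dimension; W_I is a hypothetical trainable weight.
d = 768
W_I = rng.normal(scale=0.02, size=(2048, d))
E_I = R @ W_I   # encoded image representation, shape (49, 768)
print(E_I.shape)
```

In a trained model `W_I` would be learned jointly with the rest of the network rather than sampled.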
Optionally, before step S120 the method further comprises a comment screening step, in which the difference between each comment and the other comments is calculated so as to select a preset number of the most representative comments as the basis for cognitive empathy reasoning and emotional empathy reasoning.
Optionally, the cognitive empathy reasoning comprises: encoding the text information and image information in the news content and comment content with a preset self-attention network; and feeding the encoded features into a preset cross-attention block for semantic interaction.
The self-attention network applies a multi-head self-attention mechanism to learn, at all positions, the global dependencies of the text semantic features and the image semantic features extracted by the feature extraction layer; wherein,

given a query $Q$, key $K$, and value $V$, the scaled dot-product attention is:

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d}}\right)V$

wherein $\sqrt{d}$ is the scaling (variance) term and $K^{T}$ is the transpose of the key $K$; for text content $Q = K = V = E_X$ is set, and for image content $Q = K = V = E_I$. The query $Q$, key $K$, and value $V$ are projected $h$ times through different linear projections, and the scaled dot-product attention is then executed in parallel on the $h$ projected results; multi-head attention is formally described as:

$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)W^{O}$, $\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$

wherein $d_h = d/h$; $W^{O}$ and $W_i^{Q}, W_i^{K}, W_i^{V}$ are learnable parameters; $\mathrm{Concat}(\cdot)$ is the concatenation operation; $O_X = \mathrm{MultiHead}(E_X, E_X, E_X)$ and $O_I = \mathrm{MultiHead}(E_I, E_I, E_I)$ are the encoded text semantic features and image semantic features, respectively.
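The two formulas above can be sketched as follows; the projection matrices are random stand-ins for the learnable parameters, and the head split follows $d_h = d/h$:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_h)) V."""
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def multi_head_self_attention(X, weights, h):
    """Project X, split d into h heads, attend per head, concat, project."""
    Wq, Wk, Wv, Wo = weights
    d = X.shape[-1]
    dh = d // h                                  # d_h = d / h
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = [attention(Q[:, i*dh:(i+1)*dh],
                       K[:, i*dh:(i+1)*dh],
                       V[:, i*dh:(i+1)*dh]) for i in range(h)]
    return np.concatenate(heads, axis=-1) @ Wo   # Concat(...) W^O

rng = np.random.default_rng(0)
N, d, h = 5, 16, 4
weights = [rng.normal(scale=0.1, size=(d, d)) for _ in range(4)]
E_X = rng.normal(size=(N, d))                    # encoded token features
O_X = multi_head_self_attention(E_X, weights, h)
print(O_X.shape)  # (5, 16)
```

The same call with $E_I$ in place of $E_X$ yields the encoded image features $O_I$.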
Optionally, a global dependency between text and image is captured by a cross-attention mechanism, wherein

the encoded text semantic features $O_X$ are set as the query and the encoded image semantic features $O_I$ as the key and value, so that the empathy-theory-guided multi-modal false information detection model is guided by the text semantics to attend to consistent image regions:

$C_{X \to I} = \mathrm{softmax}\!\left(\frac{(O_X W_Q)(O_I W_K)^{T}}{\sqrt{d}}\right)(O_I W_V)$

wherein all $W$ are trainable parameters and $C_{X \to I}$ is the semantic consistency from the news text to the image.
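A sketch of this text-to-image cross-attention under the same notation; `W_Q`, `W_K`, `W_V` are random stand-ins for the trainable parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(O_X, O_I, W_Q, W_K, W_V):
    """Text features O_X query image features O_I: each text token
    re-weights the image regions it is most consistent with."""
    Q, K, V = O_X @ W_Q, O_I @ W_K, O_I @ W_V
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # (num_tokens, num_regions)
    return A @ V                                  # text-to-image consistency

rng = np.random.default_rng(0)
d = 16
O_X = rng.normal(size=(6, d))    # 6 encoded text tokens
O_I = rng.normal(size=(9, d))    # 9 encoded image regions
W_Q, W_K, W_V = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
C_xi = cross_attention(O_X, O_I, W_Q, W_K, W_V)
print(C_xi.shape)  # (6, 16)
```

Each row of the attention matrix `A` is a distribution over image regions for one text token, which is what "attending to consistent image regions" amounts to computationally.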
Optionally, a softmax function is used to output the predicted probability distribution of the task, and a global loss forces the preset empathy-theory-guided multi-modal false information detection model to minimize the cross-entropy error of training samples with ground-truth labels $y$:

$p = \mathrm{softmax}(W_p F)$, $\mathcal{L} = -\sum y \log p$

wherein $p$ is the computed softmax value, $W_p$ is a learnable parameter, and $y$ is the ground-truth label.
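A numerical sketch of the classification head and loss, assuming a binary real/fake task; `F` stands in for the fused representation and `W_p` for the learnable classifier weight:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cross_entropy(p, y):
    """Cross-entropy between predicted distribution p and one-hot label y."""
    return -float(np.sum(y * np.log(p + 1e-12)))

rng = np.random.default_rng(0)
F = rng.normal(size=16)                    # fused representation from the model
W_p = rng.normal(scale=0.1, size=(16, 2))  # learnable classifier weights
logits = F @ W_p
p = softmax(logits)                        # [P(real), P(fake)]
y = np.array([0.0, 1.0])                   # ground-truth label: fake
loss = cross_entropy(p, y)
print(p, loss)
```

Training would average this loss over a batch and backpropagate through $W_p$ and the layers below.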
According to the above technical scheme, the multi-modal false information detection method guided by empathy theory uses empathy theory to explain the characteristics and patterns of false information and, on the basis of a deep learning algorithm, incorporates empathy theory to improve the accuracy of multi-modal false information detection and maintain a healthy network environment. The invention builds on the two components of empathy, cognitive empathy and emotional empathy, to design and implement an empathy-theory-guided multi-modal false information detection method: the cognitive empathy inference layer simulates the degree of cognitive empathy by comparing the semantic consistency of comments and news; the emotional empathy inference layer simulates the degree of emotional empathy through similarity analysis of news text emotion and comment emotion; finally, a fusion inference layer fuses cognitive empathy and emotional empathy to judge whether empathy is generated for the news content, thereby improving the accuracy of false information detection.
Drawings
Other objects and attainments together with a more complete understanding of the invention will become apparent and appreciated by referring to the following description taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 is a schematic flow chart of a multi-modal false information detection method guided by empathy theory according to an embodiment of the invention;
FIG. 2 is a schematic diagram of the logical structure of an empathy-theory-guided multi-modal false information detection model according to an embodiment of the invention;
FIG. 3 is a schematic diagram of the logical framework of the feature extraction layer according to an embodiment of the invention;
FIG. 4 is a schematic diagram of the logical framework of the cognitive empathy inference layer according to an embodiment of the invention;
FIG. 5 is a similarity calculation example according to an embodiment of the invention;
FIG. 6 is a schematic diagram of the logical framework of the emotional empathy inference layer according to an embodiment of the invention;
FIG. 7 is a schematic diagram of feeding the vector corresponding to <cls> into the fully connected classification layer according to an embodiment of the invention;
FIG. 8 is a schematic diagram of the logical framework of the fusion inference layer according to an embodiment of the invention;
FIG. 9 is a schematic diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be evident, however, that such embodiment(s) may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing one or more embodiments.
For the various problems of existing false information detection methods, the invention proposes a solution that integrates psychology and human cognition, using psychological principles to comprehensively evaluate the authenticity and credibility of information and thereby improve the accuracy of false information detection.
In order to better explain the technical scheme of the invention, the following will briefly explain some technical terms related to the invention.
BERT: bidirectional Encoder Representation from Transformers, one of the most popular NLP models at present, is assembled from a plurality of Transformer Encoder stacked layer by layer, with a transducer as the core.
ResNet152 (Residual Neural Network): a residual network of depth 152 layers.
Scaled dot-product attention (Scaled Dot-Product Attention): an attention mechanism commonly used in Transformer models, where it is used to implement multi-head attention (Multi-Head Attention). Specifically, multi-head attention applies a separate linear transformation to the input matrix for each head, computes scaled dot-product attention on each head's transformed result, and finally concatenates the per-head attention results and outputs them through one more linear transformation.
To address the low detection accuracy of existing schemes that detect false information from surface information such as news content, knowledge, and social networks, the invention provides an empathy-theory-guided multi-modal false information detection method, which designs and implements a preset empathy-theory-guided multi-modal false information detection model (hereinafter also simply "the model") for false news detection.
Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
It should be noted that the following description of the exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. Techniques and equipment known to those of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate.
To illustrate the empathy-theory-guided multi-modal false information detection method provided by the invention, FIG. 1 and FIG. 2 respectively show an exemplary flow of the method and the model structure it adopts. It should be noted that the drawings and the embodiments described below are only some implementations of the invention and are not limiting; other figures and implementations may be derived from them by those of ordinary skill in the art without undue burden.
Referring to FIG. 1, the empathy-theory-guided multi-modal false information detection method provided in this embodiment is implemented on a preset empathy-theory-guided multi-modal false information detection model and mainly comprises the following steps:
S110: extracting features of the news content and the comment content to obtain the image semantic features and text semantic features in the news content and the comment content;
S120: based on the image semantic features and the text semantic features, simulating the degree of cognitive empathy and the degree of emotional empathy generated between the news content and the comment content through cognitive empathy reasoning and emotional empathy reasoning, respectively;
S130: fusing the degree of cognitive empathy and the degree of emotional empathy, and judging, based on the fused result, whether empathy is generated for the news content, so as to detect false news.
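A toy end-to-end walk through steps S110–S130; all vectors are random stand-ins for the BERT/ResNet152 features, and reducing each empathy degree to a mean cosine similarity is an illustrative simplification, not the patent's actual inference layers:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32

# S110: stand-ins for the extracted semantic features
# (in the patent these come from BERT and ResNet152)
news_text        = rng.normal(size=d)         # news text semantics
news_emotion     = rng.normal(size=d)         # news text emotion
comments         = rng.normal(size=(5, d))    # comment semantics
comment_emotions = rng.normal(size=(5, d))    # comment emotions

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# S120: cognitive empathy ~ semantic consistency of comments with the news;
# emotional empathy ~ similarity between news emotion and comment emotions
cognitive = np.mean([cos(c, news_text) for c in comments])
emotional = np.mean([cos(c, news_emotion) for c in comment_emotions])

# S130: fuse both empathy degrees and classify with a softmax
logits = np.array([cognitive + emotional, -(cognitive + emotional)])
p = np.exp(logits) / np.exp(logits).sum()
verdict = "empathy" if p[0] > 0.5 else "no empathy"
print(verdict, p.round(3))
```

The real model replaces the cosine scores with the cognitive/emotional empathy inference layers and the sum with the co-attention fusion layer.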
As the above steps show, the invention uses empathy theory to explain the characteristics and patterns of false information and realizes empathy-theory-guided multi-modal false information detection, thereby solving the technical problems of the prior art.
The above steps will be described in detail in connection with the network structure employed in the present invention.
To realize the above empathy-theory-guided multi-modal false information detection method, the invention designs the empathy-theory-guided multi-modal false information detection model shown in FIG. 2. The method provided by the invention is described in more detail below with reference to specific model structure embodiments.
Specifically, as an example, FIG. 2 shows the overall framework of the empathy-theory-guided multi-modal false information detection model according to an embodiment of the invention. As shown in FIG. 2, the model 200 provided in this embodiment mainly comprises four parts: a feature extraction layer 210, a cognitive empathy inference layer 220, an emotional empathy inference layer 230, and a fusion inference layer 240. The feature extraction layer 210 extracts features from the news content and the comment content; the cognitive empathy inference layer 220 computes the degree of cognitive empathy generated in readers; the emotional empathy inference layer 230 computes the degree of emotional empathy generated in readers; and the fusion inference layer 240 fuses cognitive and emotional empathy to judge whether empathy is generated for false news, thereby improving the accuracy of false information detection.
FIGS. 3, 4, 6, and 8 show the logical frameworks of the feature extraction layer 210, the cognitive empathy inference layer 220, the emotional empathy inference layer 230, and the fusion inference layer 240, respectively, according to an embodiment of the invention.
The logical framework of the feature extraction layer 210 is shown in FIG. 3.
Because information comes in many forms, current false information detection covers both image information and text information. The feature types common to this class of problems are therefore mainly two: image semantic features and text semantic features. Thus, in the embodiment shown in FIG. 3, the feature extraction layer 210 uses a ResNet residual network to extract the image semantic features in the news content and the comment content, and uses BERT to extract the text semantic features, where the text includes the news text and the news comments (excluding images).
When extracting the image semantic features, because multimedia news images on social media come in different sizes, and in order to reduce the model's dependence on particular image attributes, in one specific embodiment of the invention the images in the news content and the comment content are first resized to 224×224 pixels, and the image semantic features are then obtained using a ResNet152 residual network.
Specifically, as an example, the last fully connected (FC) layer is removed and the output of the last convolutional layer is taken:

R_I = {r_1, r_2, ..., r_{N_r}}, r_i ∈ ℝ^{2048} (1)

where each r_i is a 2048-dimensional vector representing one region of the image, so that image I can be represented as R_I ∈ ℝ^{N_r×2048}, with ℝ denoting the vector dimension space (for ResNet152 with 224×224 inputs, the last convolutional layer yields a 7×7 grid, i.e. N_r = 49 regions). In order for the visual features to have the same dimension as the text semantic features, in one embodiment of the invention the encoded image representation R_I is projected by a linear transformation:

E_V = R_I W_v (2)

where W_v ∈ ℝ^{2048×d} is a trainable parameter, E_V is the encoded representation of the image content, i.e. the extracted image semantic features, and d is the hidden size of BERT.
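This projection step can be sketched in numpy as follows. The ResNet152 region features are simulated with random numbers, `W_v` is untrained, and the names `regions`, `W_v`, `E_V` are illustrative; only the shapes (49 regions of dimension 2048, projected into the BERT hidden size d = 768) follow the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the ResNet152 backbone: after removing the final
# fully connected layer, the last convolutional layer yields a 7x7 grid of
# 2048-dimensional region vectors, i.e. 49 regions per 224x224 image.
NUM_REGIONS, REGION_DIM, BERT_HIDDEN = 49, 2048, 768

regions = rng.standard_normal((NUM_REGIONS, REGION_DIM))  # R_I = {r_1..r_49}

# Trainable projection W_v (random here, for illustration) maps each region
# into the BERT hidden space so image and text features share one dimension d.
W_v = rng.standard_normal((REGION_DIM, BERT_HIDDEN)) * 0.01
E_V = regions @ W_v  # encoded image representation, shape (49, d)

print(E_V.shape)  # (49, 768)
```

In a trained model `W_v` would be learned jointly with the rest of the network; the point here is only the shape alignment between the two modalities.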
In text semantic feature extraction, a text content sequence is encoded as X = {x_1, x_2, ..., x_N}, where x_i is the representation of the i-th token produced by the pre-trained BERT encoder and N is the length of the text content sequence. The output of the last BERT encoder layer for this sequence, E_X ∈ ℝ^{N×d}, is the encoded representation of the text content, i.e. the extracted text semantic features, where d is the hidden size of BERT.
Since the text semantic features cover both the news text and the comment content, note that each news item may have many comments, and each comment is itself a text sequence. The representation of each comment is therefore obtained in the same way as that of the news text, by BERT, i.e. E_{C_i} ∈ ℝ^{N_i×d}, where N_i is the length of the i-th comment. In order to select the most representative comments, in one embodiment of the present invention a comment selection mechanism (selected mechanism) is designed to extract the top K comments (the value of K may be set according to the specific detection requirement) as the most representative ones, for use by the later cognitive co-emotion inference layer 220 and emotion co-emotion inference layer 230 in co-emotion reasoning; the mechanism automatically computes the difference between each comment and the other comments. To this end, the comment selection mechanism of this embodiment optimizes an inter-sequence attention matrix U ∈ ℝ^{N×N}, where N is the number of comments under a piece of news and ℝ denotes the vector dimension space.
The entry (m, n) of the inter-sequence attention matrix U holds the difference between comment m and comment n (1 ≤ m, n ≤ N, m ≠ n), and can be formalized as:

ĉ_i = Mean-Pooling(E_{C_i}) (3)

U_{m,n} = v^⊤ σ(W_1 ĉ_m + W_2 ĉ_n + b) (4)

C_m = Σ_{n≠m} U_{m,n} (5)

where σ(·) is an activation function, and all W, v and b are trainable parameters. Therefore, in the present invention, the top-K representative comments with a high difference score C_K are finally selected as the comments used for cognitive co-emotion reasoning and emotion co-emotion reasoning.
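The selection idea can be sketched as follows. This is a minimal stand-in, not the patent's trained mechanism: the pairwise-difference matrix U is approximated with squared Euclidean distance between pooled comment vectors, so the scoring rule and the helper name `select_top_k_comments` are assumptions for illustration only:

```python
import numpy as np

def select_top_k_comments(comment_vecs, k):
    """Score each comment by how much it differs from the others (a pairwise
    difference matrix U), then keep the top-K most distinctive comments.
    Squared Euclidean distance replaces the trained attention matrix here."""
    diff = comment_vecs[:, None, :] - comment_vecs[None, :, :]
    U = (diff ** 2).sum(-1)          # U[m, n] = difference between m and n
    scores = U.sum(axis=1)           # C_m: total difference of comment m
    top_k = np.argsort(-scores)[:k]  # indices of the K most distinctive
    return np.sort(top_k)

# Toy pooled comment vectors: comment 2 is an outlier and should be kept.
vecs = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.1]])
print(select_top_k_comments(vecs, 2))
```

In the described model the scores would come from the optimized matrix U of equations (3)-(5); the top-K indices are then used to gather the corresponding BERT comment representations.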
Fig. 4 illustrates a logical structure of the cognitive co-emotion inference layer 220 according to an embodiment of the present invention. As shown in conjunction with fig. 2 and fig. 4, the image semantic features and the text semantic features extracted by the feature extraction layer 210 are input into the cognitive co-condition reasoning layer 220 to perform cognitive co-condition reasoning.
Cognitive co-emotion (cognitive empathy) refers to the process by which individuals understand and share the thoughts of others. This ability enables a person to build deeper emotional links and to better understand others' feelings and standpoints. The online disinhibition effect indicates that online users express themselves more openly in a less restrictive network environment (Suler, 2004). Cognitive co-emotion focuses on understanding: on social media, when users encounter a piece of news that they can or cannot understand, some of them will honestly express their own views through comments and other behaviors. From the machine perspective, the degree to which readers understand a text can be quantified as the semantic similarity between the comments and the news content.
When people see a news item, they usually notice an attractive image first and then read the text content in detail together with the image information; this process can be abstracted as multi-modal interaction, through which consistent information between the text content and the image content can be learned. To this end, cross-modal alignment is designed in the present invention to simulate this process in news. Specifically, as an example, the text semantic features and the image semantic features extracted by the feature extraction layer are first encoded by a self-attention network, and the encoded features are then input into a designed cross-attention module for semantic interaction.
In a specific embodiment of the invention, a multi-head self-attention mechanism is applied to learn the global dependencies, at all positions, of the text semantic features and the image semantic features extracted by the feature extraction layer, respectively. Given a query Q (representing the content the current position needs to focus on), a key K (representing the content of all positions, compared with the query to compute the attention weights), and a value V (used to take a weighted average of the content at different positions according to the attention weights), the scaled dot-product attention is:

Attention(Q, K, V) = softmax(QK^⊤ / √d_k) V (6)

where d_k is the dimension of the keys and K^⊤ is the transpose of the key K; in the text content, Q = K = V = E_X, and in the image content, Q = K = V = E_V. To capture more valuable information from the news in parallel, the query Q, key K and value V are projected h times through different linear projections in this embodiment, and scaled dot-product attention is then performed on these projections in parallel. Formally, multi-head attention MH is described as:
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V) (7)

MH(Q, K, V) = Concat(head_1, ..., head_h) W_O (8)

where W_i^Q, W_i^K, W_i^V ∈ ℝ^{d×d_h} (with d_h = d/h) are the trainable projection parameters of the query Q, key K and value V respectively; W_O is a learnable parameter, and Concat(·) is a concatenation operation. O_T = MH(E_X, E_X, E_X) and O_V = MH(E_V, E_V, E_V) are the encoded text semantic features and image semantic features output by the self-attention network.
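Equations (6)-(8) can be sketched in numpy as follows. This simplified version splits the feature dimension into heads directly instead of applying learned per-head projections (which a trained model would use), so the splitting scheme is an assumption for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Eq. (6): Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V

def multi_head_attention(Q, K, V, num_heads):
    # Eq. (7)-(8), simplified: split d into h heads, attend per head, concat.
    # Learned projections W_i^Q, W_i^K, W_i^V and W_O are omitted here.
    d = Q.shape[-1]
    assert d % num_heads == 0
    heads = []
    for h in range(num_heads):
        s = slice(h * d // num_heads, (h + 1) * d // num_heads)
        heads.append(scaled_dot_product_attention(Q[:, s], K[:, s], V[:, s]))
    return np.concatenate(heads, axis=-1)

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 64))                  # e.g. 10 tokens, d = 64
out = multi_head_attention(X, X, X, num_heads=8)   # self-attention: Q = K = V
print(out.shape)  # (10, 64)
```

Setting Q = K = V to the token (or region) features, as in the text and image branches above, makes this self-attention.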
Inputting the coded text semantic features and the image semantic features into a preset Cross attention module (Cross-Attn) for semantic interaction so as to capture the global dependency relationship between the coded text semantic features and the image semantic features.
The cross-attention block is a variant of the standard multi-head attention block that can capture global dependencies between text and images. In the co-emotion-theory-guided multi-modal false information detection model of the present invention, the cross-attention module is designed from the text-to-image perspective. Specifically, the text-image cross-attention block in this embodiment (as shown in equations (9) and (10)) sets the encoded text semantic features O_T as the query Q, and the encoded image semantic features O_V as the key K and the value V. In this way, the text semantic features guide the co-emotion-theory-guided multi-modal false information detection model to focus on consistent image regions.
head_i = Attention(O_T W_i^Q, O_V W_i^K, O_V W_i^V) (9)

E_TV = Concat(head_1, ..., head_h) W_O (10)

where W_i^Q, W_i^K, W_i^V ∈ ℝ^{d×d_h} (with d_h = d/h) are the trainable parameters of the query Q, key K and value V respectively; W_O is a learnable parameter, and E_TV is the semantic consistency from the news text to the image.
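A single-head sketch of this text-to-image cross-attention follows; learned projections are again omitted, and the token/region counts are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(E_T, E_V):
    """Single-head sketch of the text->image cross-attention block
    (Eq. (9)-(10)): text features act as queries, image features as keys
    and values, so the text guides which image regions receive weight."""
    d = E_T.shape[-1]
    weights = softmax(E_T @ E_V.T / np.sqrt(d))  # (n_tokens, n_regions)
    return weights @ E_V                         # text-to-image consistency

rng = np.random.default_rng(1)
E_T = rng.standard_normal((20, 64))  # 20 encoded text tokens
E_V = rng.standard_normal((49, 64))  # 49 encoded image regions
fused = cross_attention(E_T, E_V)
print(fused.shape)  # (20, 64)
```

The only difference from self-attention is that the queries and the keys/values come from different modalities; the output keeps the text-side length but aggregates image-side content.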
To discover the audience's opinion on the authenticity of the different modalities of unverified news, the semantic consistency between news and comments (namely cognitive co-emotion) is explored. After the global dependency between the encoded text semantic features and the image semantic features is captured by the cross-attention module, a similarity analysis (similarity-analysis) is further performed between the comment semantics and the fused multi-modal news semantics to simulate the degree of cognitive co-emotion. The fused multi-modal news semantics embody the global dependency between the encoded text and image semantic features, and comparing the comment semantics with the fused multi-modal semantics for consistency allows the cognitive co-emotion to be judged. Specifically, as an example, the degree of cognitive co-emotion between the news content and the comment content is obtained by computing the cosine similarity between them. Fig. 5 is a similarity calculation example according to an embodiment of the present invention.
As shown in fig. 5, a denotes a vector with coordinates (x_1, y_1), b denotes a vector with coordinates (x_2, y_2), and θ is the angle between a and b. The similarity is computed as:

cos θ = (a · b) / (|a||b|) = (x_1 x_2 + y_1 y_2) / (√(x_1² + y_1²) · √(x_2² + y_2²)) (11)

This similarity analysis accounts for the similarity between the fused multi-modal news semantics and the comment semantics.
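Equation (11) generalizes directly to d-dimensional feature vectors; a minimal sketch with toy vectors standing in for the fused news and comment representations:

```python
import numpy as np

def cosine_similarity(a, b):
    # Eq. (11): cos(theta) = a . b / (|a| |b|)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for the fused multi-modal news semantics and comment semantics.
news = np.array([1.0, 2.0, 3.0])
comment = np.array([1.0, 2.0, 2.9])
print(cosine_similarity(news, comment))
```

Values near 1 indicate high semantic consistency (strong cognitive co-emotion); values near 0 or below indicate low consistency.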
Fig. 6 illustrates the logical structure of the emotion co-emotion inference layer 230 according to an embodiment of the present invention. As shown in conjunction with fig. 6 and fig. 4, the text semantic features extracted by the feature extraction layer 210 and the top-K representative comments with a high difference score C_K selected by the comment selection mechanism are input into the emotion co-emotion inference layer 230 for emotion co-emotion reasoning.
In emotion analysis and emotion recognition tasks, text semantic features are often preferred over non-text features, in part because text semantic features originate from advanced language models or word embeddings trained on massive data sources, while image emotion features are hand-crafted and relatively underdeveloped. Therefore, in the present invention, the degree of emotion co-emotion generation is simulated only through the similarity between the news text emotion and the comment emotion.
BERT, as one of the pre-trained natural language processing models, can also be used for emotion analysis tasks. The strength of the BERT model is that it is pre-trained on a large-scale text corpus and thus learns rich text representations, making it excellent in various natural language processing tasks, including emotion analysis. BERT turns task-specific modeling into a task-agnostic pre-training step: a user can perform a fine-tuning (fine-tune) operation on the BERT pre-trained model to handle various NLP tasks, such as single-text classification (e.g., emotion analysis), text-pair classification (e.g., natural language inference), question answering, and token tagging (e.g., named entity recognition). Thus, in one embodiment of the present invention, the BERT model is used for emotion analysis.
In this emotion analysis method, text classification is performed with the BERT model, which is the task of the emotion co-emotion inference layer 230: the classification task can be realized simply by taking the context-summarizing representation ('<cls>') of the whole comment sentence and connecting it to a fully connected layer.
In this embodiment, the base model bert_base, uncased is used by default, with a 12-layer Transformer encoder block, 768 hidden units and 12 self-attention heads. FIG. 7 illustrates inputting the vector corresponding to <cls> into a fully connected layer for classification according to an embodiment of the invention. As shown in fig. 7, emotion analysis (text classification) can be completed by extracting the comprehensive context vector '<cls>' from the output of BERT and connecting a fully connected layer.
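The <cls>-plus-fully-connected step can be sketched as follows. The BERT encoder output is replaced by a random 768-dimensional vector, and the three emotion classes and the helper name `emotion_head` are assumptions for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def emotion_head(cls_vec, W, b):
    """Sketch of the emotion-classification step: the <cls> context vector
    produced by BERT (a random stand-in here) is fed through one fully
    connected layer and softmax to obtain an emotion distribution."""
    return softmax(W @ cls_vec + b)

rng = np.random.default_rng(2)
cls_vec = rng.standard_normal(768)        # bert_base hidden size d = 768
W = rng.standard_normal((3, 768)) * 0.01  # 3 emotion classes, illustrative
b = np.zeros(3)
probs = emotion_head(cls_vec, W, b)
print(probs)
```

Applying this head to the news text and to each comment yields the news text emotion and the comment emotions used below.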
After the news text emotion and the comment emotion are respectively calculated in the above manner, the similarity between the two emotions is calculated by formula (11), which simulates the degree of emotion co-emotion generation.
After the cognition co-emotion degree and the emotion co-emotion degree are obtained, fusion analysis can be carried out on the cognition co-emotion degree and the emotion co-emotion degree so as to judge whether co-emotion is generated for false news or not, and further judge whether information containing news content and comment content is true or false.
Fig. 8 illustrates a logical structure of the fused inference layer 240 according to an embodiment of the present invention. As shown in fig. 2 and 8 together, the cognitive co-emotion degree obtained through the cognitive co-emotion inference layer 220 and the emotion co-emotion degree obtained through the emotion co-emotion inference layer 230 are input to the fusion inference layer 240 for co-emotion fusion.
To explore the relationship between the two components of co-emotion (cognitive co-emotion and emotional co-emotion), in one embodiment of the invention a collaborative attention block is designed to simulate the interaction between the two. The input of the collaborative attention block (Co-Attn) is the pair <E_T, V>:

Z = softmax(E_T V^⊤ / √d) V (12)

Ṽ = Norm(E_T + Z) (13)

F̃ = Norm(Ṽ + FFN(Ṽ)) (14)

F = Concat(F̃, V) (15)

where E_T represents the degree of cognitive co-emotion and V represents the degree of emotional co-emotion; Norm and FFN are a normalization method and a feed-forward network respectively, √d is the scaling factor, Ṽ is an intermediate value, and F̃ is the fused semantics produced by the interaction block for cognitive and emotional co-emotion. The concat function integrates the two co-emotions to form the comprehensive fused characterization F.
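A minimal sketch of such a collaborative attention block follows. The FFN weights are random stand-ins, single-head attention replaces the full block, and all names are illustrative, so this shows only the structure (attention, residual + normalization, feed-forward, concatenation), not the trained model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-6):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def co_attention_fuse(E_T, V):
    """Cognitive co-emotion features E_T attend over emotional co-emotion
    features V, followed by residual + Norm, an FFN, and concatenation."""
    d = E_T.shape[-1]
    attn = softmax(E_T @ V.T / np.sqrt(d)) @ V        # attention step
    z = layer_norm(E_T + attn)                        # residual + Norm
    rng = np.random.default_rng(0)                    # untrained FFN weights
    W1 = rng.standard_normal((d, d)) * 0.01
    W2 = rng.standard_normal((d, d)) * 0.01
    f = layer_norm(z + np.tanh(z @ W1) @ W2)          # FFN + residual + Norm
    return np.concatenate([f, V], axis=-1)            # fused representation

rng = np.random.default_rng(3)
E_T = rng.standard_normal((1, 64))  # cognitive co-emotion features
V = rng.standard_normal((1, 64))    # emotional co-emotion features
fused = co_attention_fuse(E_T, V)
print(fused.shape)  # (1, 128)
```

The concatenation doubles the feature dimension, giving the comprehensive fused characterization passed to the classifier.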
Finally, a classification output is produced through a softmax function to improve the accuracy of false news detection. The softmax function issues the predicted probability distribution of the task, and a global loss forces the co-emotion-theory-guided multi-modal false information detection model to minimize the cross-entropy error between the training samples and the real labels y:

p = softmax(W_p F + b_p) (16)

L = −Σ y log(p) (17)

where p is the computed softmax value, W_p and b_p are learnable parameters, and y is the ground-truth label.
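The prediction and loss step can be sketched with toy numbers; the two-class layout {fake, real} and the logit values are illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cross_entropy(p, y):
    # Eq. (17): L = -sum_i y_i * log(p_i), with y a one-hot real label
    return float(-(y * np.log(p + 1e-12)).sum())

logits = np.array([2.0, 0.5])  # e.g. scores for {fake, real}
p = softmax(logits)            # Eq. (16): predicted distribution
y = np.array([1.0, 0.0])       # ground-truth label: fake
loss = cross_entropy(p, y)
print(loss)
```

Minimizing this loss over the training samples drives the predicted distribution p toward the true label y.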
As shown in fig. 9, the present invention also provides an electronic device, including:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores a computer program executable by at least one processor to enable the at least one processor to perform the steps of the co-morbid-theory-based multi-modal spurious information detection method described above.
It will be appreciated by those skilled in the art that the structure shown in fig. 9 is not limiting of the electronic device 1 and may include fewer or more components than shown, or may combine certain components, or may be arranged in different components.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 1 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described herein.
Further, the electronic device 1 may also comprise a network interface, optionally the network interface may comprise a wired interface and/or a wireless interface (e.g. WI-FI interface, bluetooth interface, etc.), typically used for establishing a communication connection between the electronic device 1 and other electronic devices.
The electronic device 1 may optionally further comprise a user interface, which may be a Display, an input unit, such as a Keyboard (Keyboard), or a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying information processed in the electronic device 1 and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only and are not limited to this configuration in the scope of the patent application.
The co-emotion-theory-guided multi-modal false information detection program 12 stored by the memory 11 in the electronic device 1 is a combination of instructions that, when executed by the processor 10, may implement:
S110: extracting characteristics of news content and comment content to obtain image semantic characteristics and text semantic characteristics in the news content and the comment content;
S120: based on the image semantic features and the text semantic features, simulating the cognitive co-emotion generation degree and the emotion co-emotion generation degree of the news content and the comment content through cognitive co-emotion reasoning and emotion co-emotion reasoning respectively;
s130: and fusing the cognitive co-emotion generation degree and the emotion co-emotion generation degree, and judging whether the news content and the comment content generate co-emotion for false news or not based on the fused result.
Specifically, the specific implementation method of the above instructions by the processor 10 may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein.
Further, the modules/units integrated in the electronic device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM).
The method and system for multi-modal false information detection based on co-emotion theory guidance according to the present invention are described above by way of example with reference to the accompanying drawings. It will be appreciated by those skilled in the art that various modifications may be made to the method and system set forth above without departing from the teachings of the present invention. Accordingly, the scope of the invention should be determined by the following claims.

Claims (9)

1. A multi-modal false information detection method based on co-emotion theory guidance, characterized in that it is implemented based on a preset co-emotion-theory-guided multi-modal false information detection model and comprises the following steps:
S110: extracting characteristics of news content and comment content to obtain image semantic characteristics and text semantic characteristics in the news content and the comment content;
S120: based on the image semantic features and the text semantic features, simulating the degree of cognitive co-emotion generation and the degree of emotion co-emotion generation between the news content and the comment content through cognitive co-emotion reasoning and emotion co-emotion reasoning respectively; wherein the degree of cognitive co-emotion generation is obtained as follows: after the global dependency between the encoded text semantic features and the image semantic features is captured by the cross-attention module, a similarity analysis is further performed between the comment semantics and the fused multi-modal news semantics to simulate the degree of cognitive co-emotion; and the degree of emotion co-emotion generation is obtained as follows: text classification is performed on the news text and the comments with a BERT model to obtain the news text emotion and the comment emotion, and the degree of emotion co-emotion generation is simulated through the similarity between the news text emotion and the comment emotion;
S130: fusing the cognition co-emotion generation degree and the emotion co-emotion generation degree, and judging whether the news content and the comment content generate co-emotion for false news or not based on the fused result;
wherein the method for fusing the degree of cognitive co-emotion generation and the degree of emotion co-emotion generation comprises:

simulating the interaction between the cognitive co-emotion and the emotion co-emotion through a collaborative attention block, the input of the collaborative attention block being <E_T, V>, where E_T indicates the degree of cognitive co-emotion, V represents the degree of emotion co-emotion, Norm and FFN are a normalization method and a feed-forward network respectively, √d is the scaling factor, Ṽ is an intermediate value, and F̃ is the fused semantics produced by the interaction block for the cognitive co-emotion and the emotion co-emotion; the concat function is used to integrate the comprehensive fused characterization formed from the cognitive co-emotion and the emotion co-emotion;

and classifying and outputting through a softmax function to improve the accuracy of false news detection.
2. The multi-modal false information detection method based on co-emotion theory guidance according to claim 1, wherein the extracting features of news content and comment content comprises:

extracting image semantic features of the image information in the news content and the comment content by adopting a preset ResNet152 residual network; and

extracting text semantic features of the text information in the news content and the comment content by adopting a preset BERT.
3. The multi-modal false information detection method based on co-emotion theory guidance according to claim 2, wherein the extracting image semantic features of the image information in the news content and the comment content using a preset ResNet152 residual network comprises:

adjusting the images in the news content and the comment content to a preset size to obtain canonical image information;

obtaining the image semantic features in the canonical image information using the ResNet152 residual network.
4. The multi-modal false information detection method based on co-emotion theory guidance according to claim 3, wherein the obtaining the image semantic features in the canonical image information using ResNet152 comprises:

separating the last fully connected layer and obtaining the output of the last convolutional layer as the coded image representation R_I = {r_1, r_2, ..., r_{N_r}}, wherein each r_i is a 2048-dimensional vector representing a region on the image, and image I is represented as R_I ∈ ℝ^{N_r×2048};

representing the encoded image R_I by the linear transformation E_V = R_I W_v, wherein W_v is a trainable parameter and E_V is the encoded representation of the image content.
5. The multi-modal false information detection method based on co-emotion theory guidance according to claim 4, further comprising, prior to step S120, a comment screening step of selecting a preset number of most representative comments, by calculating the difference between each comment and the other comments, as references for performing the cognitive co-emotion reasoning and the emotion co-emotion reasoning.
6. The multi-modal false information detection method based on co-emotion theory guidance according to claim 5, wherein the cognitive co-emotion reasoning comprises:

encoding the text information and the image information in the news content and the comment content based on a preset self-attention network;

inputting the encoded features into a preset cross-attention module for semantic interaction.
7. The multi-modal false information detection method based on co-emotion theory guidance according to claim 6, wherein the self-attention network applies a multi-head self-attention mechanism to learn the global dependencies, at all positions, of the text semantic features and the image semantic features extracted by the feature extraction layer, respectively; wherein,

given a query Q, a key K and a value V, the scaled dot-product attention is Attention(Q, K, V) = softmax(QK^⊤/√d_k)V, where d_k is the key dimension and K^⊤ is the transpose of the key K; in the text content Q = K = V = E_X, and in the image content Q = K = V = E_V; the query Q, key K and value V are projected h times through different linear projections, and scaled dot-product attention is performed in parallel on the h projections; multi-head attention is formally described as MH(Q, K, V) = Concat(head_1, ..., head_h) W_O, with head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V),

where W_i^Q, W_i^K, W_i^V ∈ ℝ^{d×d_h} with d_h = d/h; W_O is a learnable parameter, Concat(·) is a concatenation operation, and O_T and O_V are respectively the encoded text semantic features and image semantic features.
8. The multi-modal false information detection method based on co-emotion theory guidance according to claim 7, wherein the global dependency between text and images is captured based on a cross-attention mechanism, wherein

the encoded text semantic features O_T are set as the query, and the encoded image semantic features O_V are set as the key and the value, so that the text semantic features guide the co-emotion-theory-guided multi-modal false information detection model to focus on consistent image regions,

wherein all W are trainable parameters, and E_TV is the semantic consistency from the news text to the image.
9. The multi-modal false information detection method based on co-emotion theory guidance according to claim 8, wherein the softmax function is used to issue the prediction of the task's learned probability distribution, wherein a global loss forces the preset co-emotion-theory-guided multi-modal false information detection model to minimize the cross-entropy error between the training samples and the real labels y: p = softmax(W_p F + b_p), L = −Σ y log(p),

where p is the computed softmax value, W_p and b_p are learnable parameters, and y is the ground-truth label.
CN202410057274.8A 2024-01-16 2024-01-16 Multi-modal false information detection method based on co-emotion theory Active CN117591866B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410057274.8A CN117591866B (en) 2024-01-16 2024-01-16 Multi-modal false information detection method based on co-emotion theory


Publications (2)

Publication Number Publication Date
CN117591866A CN117591866A (en) 2024-02-23
CN117591866B true CN117591866B (en) 2024-05-07

Family

ID=89920394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410057274.8A Active CN117591866B (en) 2024-01-16 2024-01-16 Multi-mode false information detection method based on coercion theory

Country Status (1)

Country Link
CN (1) CN117591866B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116150334A (en) * 2022-12-12 2023-05-23 江汉大学 Chinese co-emotion sentence training method and system based on UniLM model and Copy mechanism
CN116910238A (en) * 2023-02-21 2023-10-20 南开大学 Knowledge perception false news detection method based on twin network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11960844B2 (en) * 2017-05-10 2024-04-16 Oracle International Corporation Discourse parsing using semantic and syntactic relations


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"Multimodal fusion with recurrent neural networks for rumor detection on microblogs";Zhiwei Jin 等;《Multimedia Conference2017》;20171019;全文 *
"基于语义融合和多重相似性学习的跨模态检索";曾奕斌 等;《计算机与现代化》;20220831;全文 *
"教育舆情研判的影响因素及对策分析";申金霞 等;《高教探索》;20200229;全文 *
"用户语音数据情感分析研究";耿佳宁;《中国优秀硕士学位论文全文数据库 信息科技辑》;20210815;全文 *
"网络恐怖主义新动向及其治理分析";佘硕 等;《情报杂志》;20180228;第37卷(第2期);全文 *

Also Published As

Publication number Publication date
CN117591866A (en) 2024-02-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant