CN114528912A - False news detection method and system based on progressive multimodal fusion network - Google Patents


Info

Publication number
CN114528912A
CN114528912A (application CN202210021501.2A)
Authority
CN
China
Prior art keywords
feature
level
fusion
encoder
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210021501.2A
Other languages
Chinese (zh)
Inventor
敬静
吴泓辰
孙杰
房晓畅
张化祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University filed Critical Shandong Normal University
Priority to CN202210021501.2A priority Critical patent/CN114528912A/en
Publication of CN114528912A publication Critical patent/CN114528912A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks


Abstract

The invention discloses a false news detection method and system based on a progressive multimodal fusion network. The method comprises: acquiring news data to be detected, the news data comprising image information and text information; and detecting the news data to be detected with a pre-trained false news detection model. The false news detection model comprises a visual feature encoder with n sequentially connected visual feature extraction blocks, a feature fusion device with n sequentially connected feature fusion blocks, and a text feature encoder whose output is connected to the level-1 feature fusion block. The output of the i-th visual feature extraction block is connected to the i-th feature fusion block, where i < n, and the outputs of the n-th visual feature extraction block and the (n-1)-th feature fusion block are connected to the n-th feature fusion block. Through this progressive fusion method, the invention achieves fine-grained multimodal information fusion and improves detection accuracy.

Description

False news detection method and system based on progressive multimodal fusion network
Technical Field
The invention belongs to the technical field of machine learning, and particularly relates to a false news detection method and system based on a progressive multimodal fusion network.
Background
With the rapid development of mobile internet technology, social media platforms such as Twitter and Weibo have become important channels through which people obtain massive amounts of information, and they also make it easy to publish and spread false news. Moreover, articles with pictures are increasingly popular on social media; compared with text-only articles, pictures carry richer information and more readily attract readers' attention. False news often combines misleading or tampered pictures with text. Visual content has therefore become a non-negligible component of false news detection, and there is a need for a method that automatically detects the authenticity of articles with pictures, so as to alleviate the serious negative effects caused by false news.
In recent years, methods for detecting false information have diversified. One approach is manual fact checking, which includes expert fact checking and crowd-sourced fact checking. Expert fact checking is accurate but time-consuming and labor-intensive; crowd-sourced fact checking scales well but is less accurate. Owing to the limitations of manual fact checking, some researchers manually extract features from news text using expert knowledge and then train a false news classifier with traditional machine learning algorithms, but this approach lacks comprehensiveness and flexibility. Deep learning models have stronger feature extraction capability, can automatically extract features from news content, and achieve better performance.
As false news becomes more diverse, verifying the authenticity of articles with pictures places higher demands on false information detection technology, and several deep-learning-based methods have been successfully applied to multimodal false news detection. First, models such as that of Khattar et al. simply extract and fuse text and picture features with a multimodal variational autoencoder, but their feature extraction and fusion are not fine-grained enough. Second, Jin et al. created an end-to-end RNN-based false news detection model that uses a local attention mechanism to combine text, images and social context features, and Wang et al. proposed the Event Adversarial Neural Network (EANN), which uses an event discriminator to learn feature representations of text and images in articles; however, the additional auxiliary features increase the cost of detection. Moreover, these methods only consider the spatial domain of the picture and ignore its frequency domain, so they do not capture picture information sufficiently. Third, Wu et al. proposed Multimodal Co-Attention Networks (MCAN) for false information detection; MCAN learns the interdependence among multimodal features and performs well, but it only focuses on the fusion of deep-level features.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a false news detection method and system based on a progressive multimodal fusion network. Through a progressive fusion method, fine-grained multimodal information fusion is achieved and detection accuracy is improved.
In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
A false news detection method based on a progressive multimodal fusion network comprises the following steps:
acquiring news data to be detected, wherein the news data comprises image information and text information;
detecting the news data to be detected based on a pre-trained false news detection model; the false news detection model comprises a text feature encoder, a visual feature encoder, a feature fusion device and a classifier;
the visual feature encoder comprises n sequentially connected visual feature extraction blocks, the feature fusion device comprises n sequentially connected feature fusion blocks, and the output of the text feature encoder is connected to the level-1 feature fusion block; the output of the i-th visual feature extraction block is connected to the i-th feature fusion block, where i < n; and the outputs of the n-th visual feature extraction block and the (n-1)-th feature fusion block are connected to the n-th feature fusion block.
Further, the visual feature encoder includes a spatial domain feature encoder and a frequency domain feature encoder.
Further, after the news data to be detected is acquired, image segmentation is performed on the image information therein to obtain a plurality of non-overlapping patches of size k×k; each patch is unfolded and its R, G, B components extracted to obtain a feature vector of size k×k×3, which is input to the spatial domain feature encoder through a linear embedding layer;
in the spatial domain feature encoder, each next-level visual feature extraction block downsamples and channel-expands the feature map obtained by the previous-level visual feature extraction block.
Further, after the news data to be detected is acquired, discrete Fourier transform is performed on the image information therein to obtain frequency domain information; the imaginary and real parts of the frequency domain information are separated and concatenated as the input of the frequency domain feature encoder.
Further, the text feature encoder adopts a bidirectional Transformer pre-training model to extract features.
Further, the level-1 feature fusion block fuses the obtained spatial domain visual features, frequency domain visual features and text features using a multilayer perceptron; the fused features are then combined with the text features T as the input of the next-level feature fusion block.
Further, the classifier comprises a fully connected layer, whose output is passed through a softmax function to generate the distribution over classification labels.
One or more embodiments provide a false news detection system based on a progressive multimodal fusion network, comprising:
the data acquisition module is used for acquiring news data to be detected, and the news data comprises image information and text information;
the false detection module is used for detecting the news data to be detected based on a pre-trained false news detection model; the false news detection model comprises a text feature encoder, a visual feature encoder, a feature fusion device and a classifier;
the visual feature encoder comprises n sequentially connected visual feature extraction blocks, the feature fusion device comprises n sequentially connected feature fusion blocks, and the output of the text feature encoder is connected to the level-1 feature fusion block; the output of the i-th visual feature extraction block is connected to the i-th feature fusion block, where i < n; and the outputs of the n-th visual feature extraction block and the (n-1)-th feature fusion block are connected to the n-th feature fusion block.
One or more embodiments provide an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the progressive multimodal fusion network based false news detection method when executing the program.
One or more embodiments provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the progressive multimodal fusion network based false news detection method.
One or more of the technical schemes have the following beneficial effects:
In the feature extraction stage, a progressive fusion strategy captures representation information at different levels of the image and the text, so that the features of each modality are fused at a finer granularity; the information contained in images and text is thus fully mined and the model's detection accuracy is improved.
For image features, considering that images in false news are often tampered with, image features are extracted at two levels, the spatial domain and the frequency domain, improving the model's sensitivity to false news.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a flow diagram of a false news detection method based on a progressive multimodal fusion network according to one or more embodiments of the present invention;
FIG. 2 is a block diagram of progressive multimodal feature extraction and fusion in one or more embodiments of the invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The embodiments and features of the embodiments in the present application may be combined with each other without conflict.
Example one
The embodiment discloses a false news detection method based on a progressive multi-modal fusion network, which comprises the following steps as shown in fig. 1:
step 1: acquiring news data to be detected, wherein the news data comprises image information and text information;
step 2: performing discrete Fourier transform on the image information to obtain frequency domain information;
and step 3: detecting the news data to be detected based on a pre-trained false news detection model; the false news detection model comprises a text feature encoder, a visual feature encoder, a feature fusion device and a classifier.
The visual feature encoder comprises a spatial domain feature encoder and a frequency domain feature encoder; each comprises n levels of visual feature extraction blocks, where n is a natural number greater than 2.
The feature fusion device comprises n levels of feature fusion blocks, where the output of the i-th feature fusion block is connected to the input of the (i+1)-th feature fusion block, i < n.
The text feature encoder comprises a text feature extraction block whose output is connected to the level-1 feature fusion block.
The output of the i-th visual feature extraction block of the spatial domain feature encoder is split into two paths, one connected to the (i+1)-th visual feature extraction block and the other to the i-th feature fusion block.
The output of the i-th visual feature extraction block of the frequency domain feature encoder is likewise split into two paths, one connected to the (i+1)-th visual feature extraction block and the other to the i-th feature fusion block.
The output of the text feature encoder is connected to the level-1 feature fusion block, which fuses the two paths of visual features with the text features.
The outputs of the n-th visual feature extraction blocks of the spatial domain and frequency domain feature encoders, together with the output of the (n-1)-th feature fusion block, are connected to the n-th feature fusion block.
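As a toy illustration of this wiring, the sketch below shows how each level's visual features and the carried fusion result feed the next fusion block. The function names and string "features" are hypothetical stand-ins (the real blocks are Swin Transformer/VGG19 stages and MLP-based fusion blocks), and the two visual paths are collapsed into one for brevity:

```python
def extract_block(level, x):
    # stand-in for the level-th visual feature extraction block
    return f"visual{level}({x})"

def fuse_block(level, visual_feat, carry):
    # stand-in for the level-th feature fusion block
    return f"fuse{level}({visual_feat},{carry})"

def progressive_fusion(image, text_feat, n=4):
    """Level 1 fuses visual features with the text features; every later
    level fuses the current visual features with the previous fusion output."""
    carry = text_feat
    x = image
    for i in range(1, n + 1):
        x = extract_block(i, x)          # i-th visual extraction block
        carry = fuse_block(i, x, carry)  # i-th fusion block
    return carry

out = progressive_fusion("img", "T", n=2)
# → "fuse2(visual2(visual1(img)),fuse1(visual1(img),T))"
```

The carried value makes the progressive structure explicit: the text features enter only at level 1, after which each fusion block consumes its predecessor's output.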
In this embodiment, the process of feature extraction and fusion is specifically described with n = 4 as an example.
(I) Text feature encoder
The multi-mode false news detection mainly comprises information of two modes, namely text and image. The text is a main expression mode of news events, and provides an important clue for judging the credibility of news. Most of the existing methods use a recurrent neural network to model the context information of the input text and capture the surface features of the text, but the fact knowledge extracted by the methods is very limited, and the semantic features of false news are difficult to capture. In order to better extract context information and semantic information of text information, a pre-trained BERT model is adopted for text feature extraction. BERT is trained on a large-scale data set, has strong modeling capability, and has a large amount of common knowledge and semantic knowledge learned therein. Moreover, BERT consists of stacked self-attention layers, which can better capture the connection between contexts.
Specifically, the input to the text feature encoder is the word sequence of the sentences in the text, with each sentence embedded as a vector. We denote the k-dimensional vector of the i-th word of the f-th sentence as T_i^f.
Denoting the bidirectional Transformer pre-training model containing a 12-layer encoder as BERT, the sentence T is input into BERT to obtain the feature vectors of the sentence:
V_f = BERT(T_f)
where V_f represents the feature vector of the f-th sentence encoded by the BERT pre-training model, and V_f^n is the k-dimensional feature vector of the word at the n-th position in the f-th sentence. From the per-word feature vectors, the feature F_t of the entire text is obtained by mean pooling over all words, capturing the contextual and semantic information contained in the text.
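The mean-pooling step can be sketched as follows; the word vectors here are random stand-ins for BERT outputs, and only the pooling itself is shown:

```python
import numpy as np

# Toy word-level feature vectors for one sentence: n_words words, k dims each.
# In the method these would come from BERT; random values stand in here.
rng = np.random.default_rng(0)
n_words, k = 5, 8
V_f = rng.standard_normal((n_words, k))

# Mean pooling over word positions yields the text feature F_t.
F_t = V_f.mean(axis=0)
assert F_t.shape == (k,)
```

Averaging over the word axis produces a single fixed-size vector regardless of sentence length, which is what lets F_t feed the level-1 fusion block.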
(II) Visual feature encoder
The images contained in news are important for judging the authenticity of an article; articles whose images are inconsistent with the text or have been maliciously tampered with are not credible. We therefore extract features from both the spatial domain information and the frequency domain information of the image: the spatial domain emphasizes semantic extraction, while the frequency domain emphasizes whether the image has been modified, since modified images are easier to detect in frequency space.
Spatial domain of the image: in recent work, Transformers have been widely used and successful in many image understanding tasks. Here we use a Swin Transformer, pre-trained on the ImageNet dataset, to extract visual spatial semantic features. We use four Swin Transformer blocks to extract visual features at different depths.
Specifically, an image is first divided into non-overlapping patches by a patch partition module. Each patch is treated as a token; we set the patch size to 4×4, so unrolling each RGB patch yields a 4×4×3 = 48-dimensional feature vector, which is mapped into a feature space of dimension 96 by a linear embedding layer. The hierarchical representation then passes through 4 stages; after each stage the feature map is downsampled by a factor of 2 and the number of channels is doubled before being input to the next stage. This is expressed as:
Stage_i = SwinB(σ(W × Stage_{i-1}))
where Stage_i and Stage_{i-1} are the output and input of the i-th stage, and SwinB is a Swin Transformer block composed of stacked self-attention layers; the numbers of layers and attention heads in the 4 stages are [2, 2, 6, 2] and [3, 6, 12, 24] respectively, and W is the learnable downsampling parameter. The feature vector output by stage 4 is mapped to a linear vector by a linear layer.
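A minimal numpy sketch of the patch partition, linear embedding, and stage-wise shape progression follows. The 32×32 toy image and random embedding weights are stand-ins (a real input would be larger and the embedding learned); only the tensor bookkeeping is illustrated:

```python
import numpy as np

H = W = 32
patch = 4
img = np.arange(H * W * 3, dtype=float).reshape(H, W, 3)

# split into non-overlapping 4x4 patches, each flattened to 4*4*3 = 48 dims
patches = (img.reshape(H // patch, patch, W // patch, patch, 3)
              .transpose(0, 2, 1, 3, 4)
              .reshape(-1, patch * patch * 3))
assert patches.shape == (64, 48)          # (32/4) * (32/4) patches

rng = np.random.default_rng(0)
W_embed = rng.standard_normal((48, 96))   # stand-in for the linear embedding
tokens = patches @ W_embed                # embed each patch to dim 96
assert tokens.shape == (64, 96)

# each later stage halves spatial resolution and doubles channels
shape = (H // patch, W // patch, 96)      # stage-1 feature map: (8, 8, 96)
for _ in range(3):                        # stages 2-4
    shape = (shape[0] // 2, shape[1] // 2, shape[2] * 2)
assert shape == (1, 1, 768)
```

With a 4×4 patch and 96 starting channels, the four stages produce channel widths 96, 192, 384, 768, matching the hierarchical design described above.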
Frequency domain of the image: research has shown that tampered images are more easily detected in frequency space. Considering that false news often contains tampered images, we extract features from the image frequency domain information to guide false news detection. The image is first converted from the spatial domain to the frequency domain using the discrete Fourier transform (DFT); to obtain deeper features, VGG19 is adopted as the feature extractor, and the imaginary and real parts of the frequency domain information are separated, concatenated, and input to VGG19 to obtain a deeper semantic vector:
f_F = VGG19(concat(IF_imag, IF_real))
where IF_imag represents the imaginary part of the image frequency domain information and IF_real represents the real part. The features after the discrete Fourier transform contain more information than the discrete cosine transform used in previous work.
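The frequency-domain preprocessing can be sketched with numpy's FFT. A toy single-channel image stands in for the news image, and the VGG19 extractor itself is omitted:

```python
import numpy as np

# 2-D discrete Fourier transform of a toy grayscale image
img = np.arange(16.0).reshape(4, 4)
freq = np.fft.fft2(img)

# separate the imaginary and real parts and concatenate them channel-wise,
# as in the method's input to the frequency domain feature extractor
IF_imag = np.imag(freq)
IF_real = np.real(freq)
freq_input = np.stack([IF_imag, IF_real], axis=0)
assert freq_input.shape == (2, 4, 4)

# sanity check: the inverse transform recovers the original image
assert np.allclose(np.fft.ifft2(freq).real, img)
```

Keeping both real and imaginary parts preserves the full complex spectrum (magnitude and phase), which is why they are concatenated rather than discarded.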
(III) Feature fusion device
Image information and text information in news are complementary, and readers often compare images against the text, so fusing textual and visual information is a crucial part of false news detection. We design a progressive fusion scheme that processes the shallow information of the image and the text information in stages, making full use of both. An MLP-Mixer block is used as the fusion module, fusing feature information between different modalities at a finer granularity.
On the image side, the spatial domain feature extractor obtains features of different depths at its different stages; in extractor order we label these 4 features stage1, stage2, stage3, stage4. On the frequency domain side, we label the outputs of the 2nd, 4th, 8th and 16th convolutional layers of VGG19 as v1, v2, v3, v4. The text features extracted by the text feature encoder are denoted T. Taking the fusion of the shallow feature stage1 as an example, the level-1 feature fusion block performs the following operations:
(1) The channel number c of stage1 and v1 is expanded to 512 by a convolution layer with kernel size 3; the expanded feature maps are average-pooled to size (B, 512, 1, 1), then flattened and linearly mapped to 1000-dimensional feature vectors;
(2) The three vectors stage1, v1 and T are each expanded on dim 1 to (B, 3, 1000) to balance the distributions of the different modalities, and the 3 feature vectors are then concatenated on dimension 1 into a feature F of shape (B, 9, 1000);
(3) Two MLP layers perform feature fusion on F along dimension 2; the fused features are transposed and fused again by an MLP, realizing fusion of the original features along dimension 1; finally the inverse transposition and feature compression recover a feature vector of the same size as T;
(4) The fused features are added to T as a residual connection, reducing model risk and improving feature extraction capability.
F_i = MlpMixer(cat(stage_i, v_i, T)) + T
where MlpMixer denotes the linear-layer-based feature fusion module; we use ReLU and LayerNorm to improve fusion capability. The image and text features are fused progressively from shallow to deep, improving the degree of association between features of different modalities.
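A hedged numpy sketch of one fusion step in the spirit of an MLP-Mixer: the three modality features are stacked into a token matrix, mixed across tokens and then across channels by small MLPs, compressed back to the size of T, and added to T as a residual. All weights are random stand-ins and the shapes are toy-sized (the real block also uses ReLU/LayerNorm and larger dimensions):

```python
import numpy as np

def mlp(x, W1, W2):
    # two-layer perceptron with ReLU, applied along the last axis
    return np.maximum(x @ W1, 0.0) @ W2

rng = np.random.default_rng(0)
d = 16
stage_i, v_i, T = (rng.standard_normal(d) for _ in range(3))

F = np.stack([stage_i, v_i, T])                 # token matrix, shape (3, d)

# token mixing: transpose so the 3 tokens lie on the last axis
W1t, W2t = rng.standard_normal((3, 8)), rng.standard_normal((8, 3))
F = mlp(F.T, W1t, W2t).T                        # back to (3, d)

# channel mixing along the feature dimension
W1c, W2c = rng.standard_normal((d, 32)), rng.standard_normal((32, d))
F = mlp(F, W1c, W2c)

# compress to the size of T and add T as the residual connection
fused = F.mean(axis=0) + T
assert fused.shape == T.shape
```

The residual add of T mirrors step (4) above: even if the mixer output is uninformative, the carried feature survives, which stabilizes the progressive chain.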
The level-2 feature fusion block performs fusion on the output features of the previous fusion block together with stage2 and v2; the implementation follows level 1, except that the text feature T is replaced by the output of the previous fusion block. The level-3 feature fusion block likewise fuses the previous output with stage3 and v3.
The level-4 feature fusion block performs the final fusion, combining the output of the level-3 block with stage4 and v4 to obtain the final fused feature.
(IV) News classifier
We input the fused multimodal feature representation into a fully connected layer, whose output passes through a softmax function to generate the distribution over classification labels:
p = softmax(W_C x + b_C)
where W_C and b_C are the parameters of the fully connected layer. We use the cross-entropy loss function:
L = -Σ [ y_f log p_f + (1 - y_f) log(1 - p_f) ]
where y_f is the true label of the sample (0 indicates false news, 1 indicates real news) and p_f is the probability predicted for the sample.
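The classifier head and loss can be illustrated numerically as follows; W_C, b_C and the input feature x are random stand-ins for learned values:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.standard_normal(8)            # fused multimodal feature
W_C = rng.standard_normal((2, 8))     # fully connected layer weights
b_C = rng.standard_normal(2)

p = softmax(W_C @ x + b_C)            # distribution over {false, real}
assert np.isclose(p.sum(), 1.0)

# binary cross-entropy for one sample, as in the loss above
y_f = 1                               # ground-truth label (1 = real news)
p_f = p[1]                            # predicted probability of "real"
loss = -(y_f * np.log(p_f) + (1 - y_f) * np.log(1 - p_f))
assert loss >= 0.0
```

The loss reduces to -log p_f for real-news samples and -log(1 - p_f) for false-news samples, so minimizing it pushes the predicted probability toward the true label.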
The method is evaluated on the Weibo and Twitter datasets, and its detection accuracy is superior to existing models.
Example two
This embodiment provides a false news detection system based on a progressive multimodal fusion network, comprising:
the data acquisition module is used for acquiring news data to be detected, and the news data comprises image information and text information;
the false detection module is used for detecting the news data to be detected based on a pre-trained false news detection model; the false news detection model comprises a text feature encoder, a visual feature encoder, a feature fusion device and a classifier;
the visual feature encoder comprises n sequentially connected visual feature extraction blocks, the feature fusion device comprises n sequentially connected feature fusion blocks, and the output of the text feature encoder is connected to the level-1 feature fusion block; the output of the i-th visual feature extraction block is connected to the i-th feature fusion block, where i < n; and the outputs of the n-th visual feature extraction block and the (n-1)-th feature fusion block are connected to the n-th feature fusion block.
EXAMPLE III
The embodiment aims to provide an electronic device.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the progressive multimodal fusion network based false news detection method as described in embodiment one.
Example four
An object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements a false news detection method based on a progressive multimodal fusion network as described in the first embodiment.
The steps involved in the second to fourth embodiments correspond to the first embodiment of the method, and the detailed description thereof can be found in the relevant description of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any of the methods of the present invention.
Those skilled in the art will appreciate that the modules or steps of the present invention described above can be implemented using general purpose computer means, or alternatively, they can be implemented using program code that is executable by computing means, such that they are stored in memory means for execution by the computing means, or they are separately fabricated into individual integrated circuit modules, or multiple modules or steps of them are fabricated into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts by those skilled in the art based on the technical solution of the present invention.

Claims (10)

1. A false news detection method based on a progressive multimodal fusion network, characterized by comprising the following steps:
acquiring news data to be detected, wherein the news data comprises image information and text information;
detecting the news data to be detected based on a pre-trained false news detection model; the false news detection model comprises a text feature encoder, a visual feature encoder, a feature fusion device and a classifier;
the visual feature encoder comprises n levels of visual feature extraction blocks connected in sequence, and the feature fusion device comprises n levels of feature fusion blocks connected in sequence; the output of the text feature encoder is connected to the level-1 feature fusion block; the output of the level-i visual feature extraction block is connected to the level-i feature fusion block, where i < n; and the outputs of the level-n visual feature extraction block and the level-(n-1) feature fusion block are both connected to the level-n feature fusion block.
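The wiring recited in claim 1 can be sketched as follows; this is a minimal NumPy illustration only, in which a simple averaging step stands in for the actual (learned) fusion blocks and all feature dimensions are assumed:

```python
import numpy as np

def progressive_fusion(text_feat, visual_feats):
    """Sketch of the claim-1 wiring: the level-1 fusion block receives the
    text features and the level-1 visual features; each level-i fusion
    output (i < n) feeds the next fusion block together with the next
    level's visual features; the level-n block combines the level-n visual
    features with the level-(n-1) fusion output. Averaging is a stand-in
    for the real fusion blocks."""
    fused = None
    for i, v in enumerate(visual_feats):
        if i == 0:
            fused = (text_feat + v) / 2.0   # level-1 fusion block
        else:
            fused = (fused + v) / 2.0       # level-i fusion block, i > 1
    return fused

# toy example: n = 3 levels of 4-dimensional features
text = np.ones(4)
visuals = [np.full(4, 2.0), np.full(4, 4.0), np.full(4, 6.0)]
out = progressive_fusion(text, visuals)
```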
2. The false news detection method of claim 1, wherein the visual feature encoder comprises a spatial-domain feature encoder and a frequency-domain feature encoder.
3. The false news detection method of claim 2, wherein after the news data to be detected is acquired, the image information therein is segmented into a plurality of non-overlapping k × k patches; each patch is unfolded and its R, G and B components are extracted to obtain a feature vector of size k × k × 3, which is fed into the spatial-domain feature encoder through a linear embedding layer;
in the spatial-domain feature encoder, each subsequent visual feature extraction block downsamples the feature map produced by the previous block and expands its channels.
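The patch-unfolding step of claim 3 can be sketched as follows (a minimal NumPy illustration; image size and patch size are assumed, and the linear embedding layer is omitted):

```python
import numpy as np

def image_to_patch_vectors(img, k):
    """Split an H x W x 3 image into non-overlapping k x k patches and
    unfold each into a vector of length k * k * 3 (the R, G, B components
    of claim 3). Assumes H and W are divisible by k."""
    h, w, c = img.shape
    patches = (img.reshape(h // k, k, w // k, k, c)
                  .transpose(0, 2, 1, 3, 4)   # group by (row-block, col-block)
                  .reshape(-1, k * k * c))    # one flat vector per patch
    return patches

# toy 4 x 4 RGB "image" -> four 2 x 2 patches, each a 12-dim vector
img = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)
vecs = image_to_patch_vectors(img, 2)
```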
4. The false news detection method of claim 2, wherein after the news data to be detected is acquired, a discrete Fourier transform is applied to the image information therein to obtain frequency-domain information; and the imaginary and real parts of the frequency-domain information are separated and then concatenated as the input of the frequency-domain feature encoder.
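The transform-and-separate step of claim 4 can be sketched as follows; the stacking order of the real and imaginary parts is an illustrative assumption:

```python
import numpy as np

def frequency_domain_input(img_channel):
    """Claim-4 sketch: apply the 2-D discrete Fourier transform to an
    image channel, separate the real and imaginary parts, and stack them
    along a new leading axis as the frequency-domain encoder input."""
    spec = np.fft.fft2(img_channel)
    return np.stack([spec.real, spec.imag], axis=0)

x = np.eye(4)                      # toy 4 x 4 single-channel "image"
freq = frequency_domain_input(x)   # shape (2, 4, 4): [real; imag]
```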
5. The false news detection method of claim 1, wherein the text feature encoder performs feature extraction using a bidirectional Transformer pre-training model.
6. The false news detection method of claim 2, wherein the level-1 feature fusion block fuses the obtained spatial-domain visual features, frequency-domain visual features and text features using a multi-layer perceptron; the fused features are then combined with the features T to serve as the input of the next-level feature fusion block.
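The fusion step of claim 6 can be sketched as follows; this assumes that the features T denote the text features, and the perceptron weights, hidden size and ReLU activation are all illustrative choices not specified by the claim:

```python
import numpy as np

def mlp_fuse(spatial, freq, text, w1, w2):
    """Claim-6 sketch: concatenate the spatial-domain, frequency-domain
    and text features, pass them through a one-hidden-layer perceptron,
    then concatenate the fused result with the text features (assumed to
    be the features T) for the next-level fusion block."""
    x = np.concatenate([spatial, freq, text])
    hidden = np.maximum(0.0, w1 @ x)        # ReLU hidden layer
    fused = w2 @ hidden                     # fused d-dim output
    return np.concatenate([fused, text])    # [fused features ; T]

d = 4
rng = np.random.default_rng(0)
w1 = rng.standard_normal((8, 3 * d))        # illustrative weights
w2 = rng.standard_normal((d, 8))
out = mlp_fuse(np.ones(d), np.ones(d), np.ones(d), w1, w2)
```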
7. The false news detection method of claim 1, wherein the classifier comprises a fully-connected layer, the output of which is converted into a distribution over classification labels by a softmax function.
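The classifier of claim 7 can be sketched as follows (a minimal NumPy illustration; the two-class weights and bias are illustrative, not taken from the patent):

```python
import numpy as np

def classify(features, w, b):
    """Claim-7 sketch: one fully-connected layer followed by softmax,
    producing a probability distribution over the classification labels
    (e.g. real vs. fake news)."""
    logits = w @ features + b
    exp = np.exp(logits - logits.max())   # shift for numerical stability
    return exp / exp.sum()

w = np.array([[1.0, -1.0], [-1.0, 1.0]])  # illustrative 2-class weights
p = classify(np.array([2.0, 0.0]), w, np.zeros(2))
```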
8. A false news detection system based on a progressive multimodal fusion network, characterized by comprising:
the data acquisition module is used for acquiring news data to be detected, and the news data comprises image information and text information;
the false detection module is used for detecting the news data to be detected based on a pre-trained false news detection model; the false news detection model comprises a text feature encoder, a visual feature encoder, a feature fusion device and a classifier;
the visual feature encoder comprises n levels of visual feature extraction blocks connected in sequence, and the feature fusion device comprises n levels of feature fusion blocks connected in sequence; the output of the text feature encoder is connected to the level-1 feature fusion block; the output of the level-i visual feature extraction block is connected to the level-i feature fusion block, where i < n; and the outputs of the level-n visual feature extraction block and the level-(n-1) feature fusion block are both connected to the level-n feature fusion block.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for false news detection based on a progressive multimodal fusion network according to any one of claims 1 to 7 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the false news detection method based on progressive multimodal fusion network as claimed in any one of claims 1 to 7.
CN202210021501.2A 2022-01-10 2022-01-10 False news detection method and system based on progressive multi-mode converged network Pending CN114528912A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210021501.2A CN114528912A (en) 2022-01-10 2022-01-10 False news detection method and system based on progressive multi-mode converged network

Publications (1)

Publication Number Publication Date
CN114528912A true CN114528912A (en) 2022-05-24

Family

ID=81620277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210021501.2A Pending CN114528912A (en) 2022-01-10 2022-01-10 False news detection method and system based on progressive multi-mode converged network

Country Status (1)

Country Link
CN (1) CN114528912A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100014A (en) * 2022-06-24 2022-09-23 山东省人工智能研究院 Multi-level perception-based social network image copying and moving counterfeiting detection method
CN115100014B (en) * 2022-06-24 2023-03-24 山东省人工智能研究院 Multi-level perception-based social network image copying and moving counterfeiting detection method
CN115330898A (en) * 2022-08-24 2022-11-11 晋城市大锐金马工程设计咨询有限公司 Improved Swin transform-based magazine, book and periodical advertisement embedding method
CN116091709A (en) * 2023-04-10 2023-05-09 北京百度网讯科技有限公司 Three-dimensional reconstruction method and device for building, electronic equipment and storage medium
CN117370679A (en) * 2023-12-06 2024-01-09 之江实验室 Method and device for verifying false messages of multi-mode bidirectional implication social network
CN117370679B (en) * 2023-12-06 2024-03-26 之江实验室 Method and device for verifying false messages of multi-mode bidirectional implication social network

Similar Documents

Publication Publication Date Title
Rao et al. A deep learning approach to detection of splicing and copy-move forgeries in images
CN114528912A (en) False news detection method and system based on progressive multi-mode converged network
CN108427738B (en) Rapid image retrieval method based on deep learning
CN106919920B (en) Scene recognition method based on convolution characteristics and space vision bag-of-words model
Zhang et al. Patch strategy for deep face recognition
CN111160350B (en) Portrait segmentation method, model training method, device, medium and electronic equipment
Yu et al. Stratified pooling based deep convolutional neural networks for human action recognition
CN111738169B (en) Handwriting formula recognition method based on end-to-end network model
Huang et al. A novel method for detecting image forgery based on convolutional neural network
CN112667841B (en) Weak supervision depth context-aware image characterization method and system
CN112085120B (en) Multimedia data processing method and device, electronic equipment and storage medium
CN114282013A (en) Data processing method, device and storage medium
CN114691864A (en) Text classification model training method and device and text classification method and device
CN114299304B (en) Image processing method and related equipment
Peng et al. Detection of double JPEG compression with the same quantization matrix based on convolutional neural networks
CN113066089A (en) Real-time image semantic segmentation network based on attention guide mechanism
CN116561272A (en) Open domain visual language question-answering method and device, electronic equipment and storage medium
CN116257609A (en) Cross-modal retrieval method and system based on multi-scale text alignment
CN116955707A (en) Content tag determination method, device, equipment, medium and program product
CN111783734B (en) Original edition video recognition method and device
CN115861605A (en) Image data processing method, computer equipment and readable storage medium
Sui et al. Creating visual vocabulary based on SIFT descriptor in compressed domain
CN113792703B (en) Image question-answering method and device based on Co-Attention depth modular network
Zhan et al. Image orientation detection using convolutional neural network
Wei Research on chinese text classification algorithm based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination