CN114662497A - False news detection method based on cooperative neural network

False news detection method based on cooperative neural network

Info

Publication number
CN114662497A
CN114662497A (application CN202210173809.9A)
Authority
CN
China
Prior art keywords
visual
text
extraction module
features
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210173809.9A
Other languages
Chinese (zh)
Inventor
薛均晓
翟蓝航
石磊
高宇飞
刘成明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University
Original Assignee
Zhengzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University filed Critical Zhengzhou University
Priority to CN202210173809.9A priority Critical patent/CN114662497A/en
Publication of CN114662497A publication Critical patent/CN114662497A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 - Selection of the most significant subset of features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention is suitable for the technical field of computer vision and image processing, and provides a false news detection method based on a collaborative neural network. The method comprises a text feature extraction module, a visual semantic feature extraction module, a visual tampering feature extraction module, a similarity measurement module and a multi-modal fusion module, and further comprises the following steps: step S1: the text feature extraction module and the visual semantic feature extraction module extract text and visual semantic features and map them to the same space; step S2: the visual tampering feature extraction module extracts visual physical features and tampering features; step S3: the similarity measurement module directly measures the similarity of the multi-modal news data to handle mismatches between image and text. Through the cooperation of these five sub-networks, the method captures the similarity of the different modalities in multi-modal news data, the semantic-level features of texts and images, and physical-level features of the visual content, making it better suited than existing models to false news detection in complex scenes.

Description

False news detection method based on cooperative neural network
Technical Field
The invention belongs to the technical field of computer vision and image processing, and particularly relates to a false news detection method based on a collaborative neural network.
Background
Machine-learning-based methods first use feature engineering to extract features such as sentiment polarity, user influence and geographic location, and then train classifiers such as decision trees and support vector machines to classify events into fake news and real news. Later work applied decision trees to rumor detection using features such as sentiment scores, the number of websites linked in a microblog, and the number of days since user registration.
Such methods can improve the detection accuracy of false news, but they ignore the multi-modal nature of news data, so the textual and visual information of false news cannot be effectively exploited.
Deep-learning-based multimodal methods use image-text consistency to drive multimodal analysis of social media sentiment, new attention-based recurrent neural networks, and dynamically interpretable recommendations with visual fusion. These methods mainly address how to integrate information of different forms, but because the models adopt a pre-trained image-caption generation model, the similarity of multi-modal data cannot be computed directly, which greatly limits the usable scenarios.
In the present method, the image-text similarity of false news is measured directly and used as part of the identification features of false news in multi-modal data.
A branch network is designed to better extract the visual semantic vector, yielding a better semantic expression of the image and thus better capturing the semantic features of fake news in its visual expression.
Two further techniques, an error level analysis algorithm and a convolutional neural network, allow the authenticity of news pictures to be judged better at the physical level.
Disclosure of Invention
The invention provides a false news detection method based on a collaborative neural network, aiming to solve the above problems.
The invention is realized as follows: a false news detection method based on a collaborative neural network comprises a text feature extraction module, a visual semantic feature extraction module, a visual tampering feature extraction module, a similarity measurement module and a multi-modal fusion module, and further comprises the following steps:
step S1: the text feature extraction module and the visual semantic feature extraction module are responsible for extracting text and visual semantic features and mapping the text and the visual semantic features to the same space;
step S2: the visual tampering feature extraction module is responsible for extracting visual physical features and tampering features;
step S3: the similarity measurement module directly measures the similarity of the multi-modal news data, addressing the problem of mismatch between image and text.
Preferably, in the text feature extraction module, a BERT pre-training model is used to extract text features, and a BiGRU further processes the BERT features, extracting their temporal attributes and converting them into a text feature sequence.
Preferably, in the visual semantic feature extraction module, the output of the convolutional neural network is used as a low-level feature of the image, and then is fused with the tamper detection part.
Preferably, the input image is encoded by a ResNet50 pre-trained model, and the image features are encoded using a 1024-dimensional fully-connected layer before the classification layer of the pre-trained ResNet50 model.
Preferably, in the visual tampering feature extraction module, the image is processed by a visual transformation, and the ResNet50 model is applied to extract image tampering features.
Preferably, in the similarity measurement module, vector representations of the text and the image are obtained through a visual semantic feature extraction module and a text feature extraction module.
Preferably, in the multi-modal fusion module, the fused features of the image and the text are obtained through the text feature extraction module, the visual semantic feature extraction module, the visual tampering feature extraction module and the similarity measurement module, and attention weights are assigned.
Preferably, in the multi-modal fusion module, an attention mechanism assigns weights to the physical-level image features and the semantic-level features of the image and text.
Compared with the prior art, the invention has the following beneficial effects: through the joint work of five sub-networks (the text feature extraction module, visual semantic feature extraction module, visual tampering feature extraction module, similarity measurement module and multi-modal fusion module), the false news detection method based on the collaborative neural network captures the similarity of the different modalities in multi-modal news data, the semantic-level features of texts and images, and physical-level features of the visual content, making it better suited than existing models to false news detection in complex scenes.
Drawings
FIG. 1 is a schematic diagram of the process steps of the present invention;
FIG. 2 is a schematic diagram of a multi-modal data collaborative neural network architecture in accordance with the present invention;
FIG. 3 is a comparison of images before and after processing by the error level analysis algorithm in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to FIGS. 1-3, the present invention provides a technical solution: a false news detection method based on a collaborative neural network comprises a text feature extraction module, a visual semantic feature extraction module, a visual tampering feature extraction module, a similarity measurement module and a multi-modal fusion module, and further comprises the following steps:
step S1: the text feature extraction module and the visual semantic feature extraction module are responsible for extracting text and visual semantic features and mapping the text and the visual semantic features to the same space;
step S2: the visual tampering feature extraction module is responsible for extracting visual physical features and tampering features;
step S3: the similarity measurement module directly measures the similarity of the multi-modal news data, addressing the problem of mismatch between image and text.
FIG. 2 shows the architecture of the collaborative neural network, where the blue network is the text feature extraction module, the red network is the visual semantic feature extraction module, the orange network is the visual tampering feature extraction module, the purple network is the similarity measurement module, and the pink network is the multi-modal fusion module. The modules are implemented as follows:
In this embodiment, in the text feature extraction module, the sentences that must be processed in practice are not simple; for example, resolving word-sense ambiguity requires taking a word's context into account. For this reason, text features are extracted with a BERT pre-training model, as shown in equation (1):
$h_i^t = \mathrm{BERT}(t_i)$  (1)
where $t_i$ denotes the input text sequence and $h_i^t$ the text feature vector after BERT embedding. Then, to better capture global feature information and integrate with the image semantic information, a BiGRU is used to process the features extracted by BERT. (A BiGRU is a neural network composed of two unidirectional GRUs running in opposite directions; at each time step the input is fed to both GRUs, and the output is determined jointly by their two states.) The BiGRU further extracts the temporal attributes of the text features, converting them into a text feature sequence, as shown in equation (2):
$f_i^t = \mathrm{BiGRU}(h_i^t)$  (2)
where $h_i^t$ denotes the text feature vector after BERT embedding and $f_i^t$ the text feature sequence extracted by the BiGRU.
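A minimal sketch of this text branch, assuming PyTorch and the HuggingFace transformers library; the checkpoint name ("bert-base-chinese") and hidden size are illustrative choices, not specified by the patent:

```python
import torch
import torch.nn as nn
from transformers import BertModel

class TextBranch(nn.Module):
    """BERT embedding followed by a BiGRU, per Eqs. (1)-(2)."""

    def __init__(self, hidden_size=128):
        super().__init__()
        # Assumed checkpoint; the patent only says "BERT pre-training model".
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        self.bigru = nn.GRU(input_size=768, hidden_size=hidden_size,
                            batch_first=True, bidirectional=True)

    def forward(self, input_ids, attention_mask):
        # h_i^t: contextual token embeddings from BERT, Eq. (1)
        h_t = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state
        # f_i^t: text feature sequence with temporal attributes, Eq. (2)
        f_t, _ = self.bigru(h_t)
        return f_t
```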
In this embodiment, the visual semantic feature extraction module takes the output of the convolutional neural network as the low-level feature of the image and fuses it with the tampering detection part to analyze the image at the physical level. To obtain a better semantic representation of the visual part, the input image is first encoded with the pre-trained ResNet50 model; the image features are encoded by a 1024-dimensional fully-connected layer placed before the classification layer of the pre-trained ResNet50, so the image representation is a 1024-dimensional vector. The process is shown in equation (3):
$h_i^v = \mathrm{ResNet50}(v_i)$  (3)
where $v_i$ denotes the input original image and $h_i^v$ the visual semantic features extracted by ResNet50. The hierarchical representation of the input image is computed as a weighted sum of different feature vectors; during training, these vectors are initialized randomly and learned jointly. The semantic features of the image are then passed through an attention mechanism that highlights the parts of the image with strong emotional expression, so that when the visual modal representation is obtained, each feature is given a weight indicating its importance in that representation.
As shown in formulas (4) to (6):
$u_i = U^\top \tanh(W_i h_i^v + b_i)$  (4)

$\alpha_i = \dfrac{\exp(u_i)}{\sum_j \exp(u_j)}$  (5)

$\tilde{h}^v = \sum_i \alpha_i h_i^v$  (6)
where $W_i$ denotes a weight matrix, $b_i$ a bias term, and $U^\top$ a learnable (transposed) weight vector; $u$ is a scoring function that evaluates the importance of each feature vector. The $i$-th feature vector is then normalized and weighted with a SoftMax function to represent the high-level semantics of the image. To represent the images better, a BiGRU is used to form an image sequencing module. Such a module is typically used for generating image descriptions, aligning image features with text features. Here, the image sequencing module feeds the feature expression of the image into the BiGRU to obtain the semantic vector of the visual modality. This step is analogous to the embedding layer commonly used in text analysis, converting the semantic feature analysis of the image to the level of a semantic sequence.
Compared with using the image features directly, this better expresses the semantic information of the image, as shown in equation (7):
$s_v = \mathrm{BiGRU}(h^v)$  (7)
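A sketch of this visual semantic branch under the same PyTorch assumptions; how a single image becomes a feature sequence is not specified by the patent, so a sequence of n region crops is assumed here, and all dimensions are illustrative:

```python
import torch
import torch.nn as nn
from torchvision import models

class FeatureAttention(nn.Module):
    """Additive attention over a sequence of feature vectors, per Eqs. (4)-(6)."""

    def __init__(self, dim, att_dim=256):
        super().__init__()
        self.W = nn.Linear(dim, att_dim)            # W_i and bias b_i
        self.U = nn.Linear(att_dim, 1, bias=False)  # U^T

    def forward(self, h):                           # h: (batch, n, dim)
        u = self.U(torch.tanh(self.W(h)))           # importance scores u_i, Eq. (4)
        a = torch.softmax(u, dim=1)                 # normalized weights, Eq. (5)
        return (a * h).sum(dim=1)                   # weighted sum, Eq. (6)

class VisualSemanticBranch(nn.Module):
    """ResNet50 encoding, 1024-d projection, attention and BiGRU, per Eqs. (3)-(7)."""

    def __init__(self, hidden_size=128):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # drop classifier
        self.fc = nn.Linear(2048, 1024)  # 1024-d image representation
        self.attention = FeatureAttention(1024)
        self.bigru = nn.GRU(1024, hidden_size, batch_first=True, bidirectional=True)

    def forward(self, regions):          # regions: (batch, n, 3, 224, 224)
        b, n = regions.shape[:2]
        feats = self.encoder(regions.flatten(0, 1)).flatten(1)  # (b*n, 2048)
        h_v = self.fc(feats).view(b, n, -1)                     # h_i^v, Eq. (3)
        weighted = self.attention(h_v)                          # Eqs. (4)-(6)
        s_v, _ = self.bigru(h_v)                                # semantic sequence, Eq. (7)
        return weighted, s_v
```

The FeatureAttention module is reused later by the multi-modal fusion step, which applies the same scoring scheme in Eqs. (14)-(16).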
in the present embodiment, the visual tamper feature extraction module performs information processing on an image by visual transformation or tampering, and then extracts an image tamper feature by applying the ResNet50 model, as shown in equation (8) to equation (9):
$v_{ela} = \mathrm{ELA}(v_i)$  (8)

$h_i^{ela} = \mathrm{ResNet50}(v_{ela})$  (9)
where $v_i$ denotes the input original image, $v_{ela}$ the image after ELA processing, and $h_i^{ela}$ the tampering features extracted by ResNet50.
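A sketch of the ELA transformation, assuming the Pillow library; the recompression quality of 90 is an illustrative choice:

```python
from io import BytesIO
from PIL import Image, ImageChops

def ela_transform(image: Image.Image, quality: int = 90) -> Image.Image:
    """Error level analysis: re-save the image as JPEG and take the
    pixel-wise difference. Tampered or recompressed regions exhibit a
    different error level from the rest of the image (cf. FIG. 3); the
    result v_ela is then fed to ResNet50, Eq. (9).
    """
    buffer = BytesIO()
    image.convert("RGB").save(buffer, "JPEG", quality=quality)
    buffer.seek(0)
    recompressed = Image.open(buffer)
    return ImageChops.difference(image.convert("RGB"), recompressed)
```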
In this embodiment, in the similarity measurement module, vector representations of texts and images are obtained from the visual semantic feature extraction module and the text feature extraction module. To ensure that these two sub-network modules learn a common representation space for image and text patterns, a fully connected layer is applied as the last layer of each sub-network, and the two sub-networks are forced to share the weights of that layer. Through this feature sharing, semantic representations of images and texts are obtained in which image and text examples of the same category are represented alike. Cosine similarity is then used to measure the similarity between the image and the text, as shown in equation (10):
$s = \dfrac{s_t \cdot s_v}{\lVert s_t \rVert \, \lVert s_v \rVert}$  (10)
where $s_t$ and $s_v$ denote the text feature sequence and the image semantic sequence, respectively.
The value of $s$ ranges over $[-1, 1]$, where $-1$ corresponds to a similarity of 0 between text and image and $1$ to a similarity of 1. To map the similarity into $[0, 1]$, the sigmoid activation function is selected, as shown in equation (11):
$p_s = \mathrm{sigmoid}(s)$  (11)
where sigmoid is the activation function used to map the similarity between 0 and 1.
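A sketch of the similarity measurement, assuming the text and image representations have already been brought to a common dimension; the layer sizes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilarityModule(nn.Module):
    """Shared last layer plus cosine similarity mapped to [0, 1], per Eqs. (10)-(11)."""

    def __init__(self, dim=256, shared_dim=128):
        super().__init__()
        # One fully connected layer whose weights both sub-networks share,
        # so image and text land in a common representation space.
        self.shared_proj = nn.Linear(dim, shared_dim)

    def forward(self, s_t, s_v):
        z_t = self.shared_proj(s_t)                # projected text features
        z_v = self.shared_proj(s_v)                # projected image features
        s = F.cosine_similarity(z_t, z_v, dim=-1)  # Eq. (10), in [-1, 1]
        return torch.sigmoid(s)                    # Eq. (11), p_s in (0, 1)
```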
Based on this similarity analysis alone, news formed from mismatched text and visual information is more likely to involve tampering than news in which image and text match. A cross-entropy-based loss function can then be established, as shown in equations (12) and (13):
$\tau_s(\theta_t, \theta_v) = -\mathbb{E}_{(a,y)\sim(A,Y)}\left[ y \log(1 - p_s) + (1 - y)\log p_s \right]$  (12)
$(\hat{\theta}_t, \hat{\theta}_v) = \arg\min_{\theta_t, \theta_v} \tau_s(\theta_t, \theta_v)$  (13)
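A sketch of this similarity loss; the convention that label y = 1 marks fake news follows Eq. (12) and is an assumption:

```python
import torch

def similarity_loss(p_s: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Cross-entropy on the similarity score, Eq. (12).

    With y = 1 for fake news, a low image-text similarity p_s is rewarded
    for fake samples and a high one for real samples.
    """
    eps = 1e-7  # numerical stability
    return -(y * torch.log(1 - p_s + eps)
             + (1 - y) * torch.log(p_s + eps)).mean()
```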
In this embodiment, in the multi-modal fusion module, the four sub-network modules above provide the feature representations $h_v$ and $h_{ela}$ of the physical level of the image and the feature representation $s_c = [s_v, s_t]$ of the semantic level of image and text. The attention mechanism assigns weights to the physical-level image features together with the semantic-level features of image and text, which can be expressed as $f_c = [s_c, f_p, h_v, h_{ela}]$. To highlight the more valuable features, see equations (14)-(16):
$u_i = U^\top \tanh(W_i f_i^c + b_i)$  (14)

$\alpha_i = \dfrac{\exp(u_i)}{\sum_j \exp(u_j)}$  (15)

$s_e = \sum_i \alpha_i f_i^c$  (16)
where $W_i$ denotes a weight matrix, $b_i$ a bias term, and $U^\top$ a learnable (transposed) weight vector; $u$ is a scoring function that weighs the importance of each feature vector. This step yields the fused features of image and text with attention weights assigned. The goal is to map the textual and visual features of the news onto labels and thus predict the probability that it is false news. The correspondence between features and labels is realized with the Softmax function, as shown in equation (17):
$p_c = \mathrm{softmax}(W_p \cdot s_e + b_p)$  (17)
A cross-entropy-based loss function is then defined:
$\tau_p(\theta_t, \theta_v, \theta_p) = -\mathbb{E}_{(a,y)\sim(A,Y)}\left[ y \log p_c + (1 - y)\log(1 - p_c) \right]$
To better combine the image-text similarity, the semantic features of images and text, and the visual physical-level features, the final loss function is given by equation (18):
$\tau(\theta_t, \theta_v, \theta_p) = \alpha\,\tau_p(\theta_t, \theta_v, \theta_p) + \beta\,\tau_s(\theta_t, \theta_v)$  (18)
wherein the parameters can be jointly learned by:
$(\hat{\theta}_t, \hat{\theta}_v, \hat{\theta}_p) = \arg\min_{\theta_t, \theta_v, \theta_p} \tau(\theta_t, \theta_v, \theta_p)$
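A sketch of the fusion classifier and the joint objective, reusing the FeatureAttention module from the visual branch sketch; α and β are balancing hyper-parameters whose values the patent does not fix:

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Attention-weighted fusion of the stacked modal features f_c,
    followed by a Softmax classifier, per Eqs. (14)-(17)."""

    def __init__(self, dim, num_classes=2):
        super().__init__()
        self.attention = FeatureAttention(dim)         # Eqs. (14)-(16)
        self.classifier = nn.Linear(dim, num_classes)  # W_p, b_p

    def forward(self, f_c):            # f_c: (batch, n_features, dim)
        s_e = self.attention(f_c)      # attention-weighted fused representation
        return torch.softmax(self.classifier(s_e), dim=-1)  # p_c, Eq. (17)

def joint_loss(tau_p, tau_s, alpha=1.0, beta=1.0):
    """Final objective of Eq. (18); all parameters are learned jointly
    by minimizing this weighted sum of the two losses."""
    return alpha * tau_p + beta * tau_s
```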
referring to fig. 3, a comparison graph of the before and after state processed by the compression or error level analysis algorithm, wherein (a) is the tampered image and (b) is the image processed by ELA, it can be seen that the hastelly is highlighted as the tampered portion. (c) Is an image that has not been recompressed and processed by ELA. (d) After (c) recompression transformation after ELA processing, it can be seen that the recompressed image and the original image after ELA transformation show different features. Therefore, the ELA algorithm can better highlight the malicious splicing and recompression characteristics of the false image.
The above description is intended to be illustrative of the preferred embodiment of the present invention and should not be taken as limiting the invention, but rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims (8)

1. A false news detection method based on a collaborative neural network, characterized in that: the method comprises a text feature extraction module, a visual semantic feature extraction module, a visual tampering feature extraction module, a similarity measurement module and a multi-modal fusion module, and further comprises the following steps:
step S1: the text feature extraction module and the visual semantic feature extraction module are responsible for extracting text and visual semantic features and mapping the text and the visual semantic features to the same space;
step S2: the visual tampering feature extraction module is responsible for extracting visual physical features and tampering features;
step S3: the similarity measurement module directly measures the similarity of the multi-modal news data, addressing the problem of mismatch between image and text.
2. The false news detection method based on the collaborative neural network as claimed in claim 1, wherein: in the text feature extraction module, a BERT pre-training model is adopted to extract text features, and a BiGRU further processes the BERT features, extracting their temporal attributes and converting them into a text feature sequence.
3. The false news detection method based on the collaborative neural network as claimed in claim 1, wherein: in the visual semantic feature extraction module, the output of the convolutional neural network is used as the low-level feature of the image and is then fused with the tampering detection part.
4. A false news detection method based on a collaborative neural network as claimed in claim 3, characterized in that: the input image is encoded by a ResNet50 pre-trained model, and the image features are encoded using a 1024-dimensional fully connected layer before the classification layer of the pre-trained ResNet50 model.
5. The false news detection method based on the collaborative neural network as claimed in claim 1, wherein: in the visual tampering feature extraction module, the image is processed by a visual transformation, and a ResNet50 model is applied to extract image tampering features.
6. The false news detection method based on the collaborative neural network as claimed in claim 1, wherein: in the similarity measurement module, vector representations of texts and images are obtained through the visual semantic feature extraction module and the text feature extraction module.
7. The false news detection method based on the collaborative neural network as claimed in claim 1, wherein: in the multi-modal fusion module, the fused features of the image and the text are obtained through the text feature extraction module, the visual semantic feature extraction module, the visual tampering feature extraction module and the similarity measurement module, and attention weights are assigned.
8. The false news detection method based on the collaborative neural network as claimed in claim 7, wherein: in the multi-modal fusion module, an attention mechanism assigns weights to the physical-level image features and the semantic-level features of images and texts.
CN202210173809.9A 2022-02-24 2022-02-24 False news detection method based on cooperative neural network Pending CN114662497A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210173809.9A CN114662497A (en) 2022-02-24 2022-02-24 False news detection method based on cooperative neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210173809.9A CN114662497A (en) 2022-02-24 2022-02-24 False news detection method based on cooperative neural network

Publications (1)

Publication Number Publication Date
CN114662497A true CN114662497A (en) 2022-06-24

Family

ID=82027011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210173809.9A Pending CN114662497A (en) 2022-02-24 2022-02-24 False news detection method based on cooperative neural network

Country Status (1)

Country Link
CN (1) CN114662497A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115309860A (en) * 2022-07-18 2022-11-08 黑龙江大学 False news detection method based on pseudo twin network
CN115423050A (en) * 2022-11-04 2022-12-02 暨南大学 False news detection method and device, electronic equipment and storage medium
CN115809327A (en) * 2023-02-08 2023-03-17 四川大学 Real-time social network rumor detection method for multi-mode fusion and topics
CN115809327B (en) * 2023-02-08 2023-05-05 四川大学 Real-time social network rumor detection method based on multimode fusion and topics
CN117370679A (en) * 2023-12-06 2024-01-09 之江实验室 Method and device for verifying false messages of multi-mode bidirectional implication social network
CN117370679B (en) * 2023-12-06 2024-03-26 之江实验室 Method and device for verifying false messages of multi-mode bidirectional implication social network
CN117391051A (en) * 2023-12-12 2024-01-12 江西师范大学 Emotion-fused common attention network multi-modal false news detection method
CN117391051B (en) * 2023-12-12 2024-03-08 江西师范大学 Emotion-fused common attention network multi-modal false news detection method

Similar Documents

Publication Publication Date Title
CN114662497A (en) False news detection method based on cooperative neural network
CN106022300B (en) Traffic sign recognition method and system based on cascade deep study
Guo et al. Human attribute recognition by refining attention heat map
CN111061843A (en) Knowledge graph guided false news detection method
CN112036276B (en) Artificial intelligent video question-answering method
CN104504362A (en) Face detection method based on convolutional neural network
CN113705218B (en) Event element gridding extraction method based on character embedding, storage medium and electronic device
Li et al. Image manipulation localization using attentional cross-domain CNN features
CN117521012A (en) False information detection method based on multi-mode context hierarchical step alignment
CN113469214A (en) False news detection method and device, electronic equipment and storage medium
CN116955707A (en) Content tag determination method, device, equipment, medium and program product
Yuan et al. Vsr++: Improving visual semantic reasoning for fine-grained image-text matching
CN117609765A (en) Multi-modal false news detection method
CN117149944A (en) Multi-mode situation emotion recognition method and system based on wide time range
CN118211141A (en) Social media false news detection method based on multi-modal fusion
CN114662586A (en) Method for detecting false information based on common attention multi-mode fusion mechanism
Liu et al. Iterative deep neighborhood: a deep learning model which involves both input data points and their neighbors
CN117235605B (en) Sensitive information classification method and device based on multi-mode attention fusion
CN110472655A (en) A kind of marker machine learning identifying system and method for border tourism
CN111859925B (en) Emotion analysis system and method based on probability emotion dictionary
CN116958642A (en) Picture classification method and device, electronic equipment and storage medium
CN115982652A (en) Cross-modal emotion analysis method based on attention network
CN113283535B (en) False message detection method and device integrating multi-mode characteristics
CN113297934A (en) Multi-mode video behavior analysis method for detecting internet violent harmful scene
CN116361785B (en) Recommendation system malicious attack detection method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination