CN111079444A - Network rumor detection method based on multi-modal relationship - Google Patents

Network rumor detection method based on multi-modal relationship

Info

Publication number
CN111079444A
CN111079444A (application CN201911379313.1A)
Authority
CN
China
Prior art keywords
vector
semantic
information
visual feature
feature vector
Prior art date
Legal status
Granted
Application number
CN201911379313.1A
Other languages
Chinese (zh)
Other versions
CN111079444B (en)
Inventor
张勇东
毛震东
邓旭冉
赵博文
Current Assignee
Beijing Zhongke Research Institute
University of Science and Technology of China USTC
Original Assignee
Beijing Zhongke Research Institute
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by Beijing Zhongke Research Institute and University of Science and Technology of China (USTC)
Publication of CN111079444A
Application granted
Publication of CN111079444B
Legal status: Active
Anticipated expiration

Classifications

    • G06F18/2433 — Pattern recognition; classification techniques; single-class perspective, e.g. one-against-all classification; novelty detection; outlier detection
    • G06F18/253 — Pattern recognition; fusion techniques of extracted features
    • G06N3/045 — Neural networks; combinations of networks
    • G06N3/08 — Neural networks; learning methods
    • G06Q50/01 — ICT specially adapted for business sectors; social networking
    • G06V10/464 — Image or video feature extraction; salient features, e.g. scale invariant feature transform [SIFT], using a plurality of salient features, e.g. bag-of-words [BoW] representations


Abstract

The invention discloses a network rumor detection method based on multi-modal relationships, which comprises the following steps: acquiring an image to be detected and the related text published on a network platform; extracting visual feature vectors covering objects of different classes in the image through a pre-trained Faster R-CNN model; after preprocessing the text, extracting semantic vectors through a GRU; capturing the importance degree of the visual feature vectors and the semantic vectors through an attention mechanism and realizing cross-modal association between the image and the text, so as to update the visual feature vectors and the semantic vectors; for the visual feature vectors and the semantic vectors, modeling the relationships of internal dynamic information through an attention mechanism, so as to update them again; and connecting the visual feature vectors obtained from the two updates with the semantic vectors, and obtaining the probabilities that the information to be detected belongs to the rumor and real categories through a binary classifier. The method can automatically judge whether the information to be detected belongs to a network rumor, and has high detection accuracy.

Description

Network rumor detection method based on multi-modal relationship
Technical Field
The invention relates to the technical field of network space security, in particular to a network rumor detection method based on multi-modal relations.
Background
The rise of the online society brings both opportunities and challenges. In particular, the low admission threshold of Internet access and the freedom of information dissemination seriously affect the stability of cyberspace, and the unchecked spread of network rumors is one of the problems that must be taken seriously. Today's social networking platforms have hundreds of millions of highly active users, and information spreads widely and quickly, free of time and space constraints; this "magnifier" effect multiplies the impact of information, especially for sensitive topics, focus events, hot issues, major public events and emergencies, which can become widely known within days and cause loss of trust, damage to government and corporate images, and boiling public complaints. Automatic and rapid detection of network rumors is therefore of great importance to cyberspace security.
With the development of multimedia technology, both self-media and professional media have shifted toward multimedia news forms based on pictures, text and short videos. Multimedia content carries richer and more intuitive information, describes news events better, and is more easily and widely disseminated. Studies have shown that the average number of reposts of posts containing images is 11 times that of plain text. For this reason, false news and rumors often use highly provocative pictures to attract and mislead readers and spread quickly and widely, which makes detection of visual-modality content a non-negligible part of dealing with the network rumor challenge.
Traditional work on false content detection based on visual-modality content mainly relies on hand-crafted features, such as visual clarity, visual similarity histograms, and double JPEG (Joint Photographic Experts Group) compression traces. These usually work well on crude picture tampering, but with the continuous improvement of picture generation technology, such methods can no longer guarantee precision, and their resource costs rise significantly.
In recent years, with the rapid development of neural networks and deep learning models, corresponding detection technologies have emerged and achieved great success. For false information detection, multi-modal methods have appeared that distinguish the authenticity of news by jointly using textual and visual-modality information; representative examples in prior work include attRNN, EANN and MVAE. These methods, while offering heuristic approaches to detecting false information in multi-modal form, still have significant drawbacks. First, the extraction of image and text information is still coarse, especially for the semantic features of pictures; second, in the feature fusion stage, the features of the two modalities are simply concatenated, which makes it difficult to express the interaction and association between modalities.
Disclosure of Invention
The invention aims to provide a network rumor detection method based on multi-modal relationships, which can automatically judge whether information to be detected belongs to a network rumor and has high detection accuracy.
The purpose of the invention is realized by the following technical scheme:
a network rumor detection method based on multi-modal relations comprises the following steps:
acquiring information to be detected, including images and related texts, issued on a network platform;
for the image, extracting visual feature vectors covering objects of different classes in the image through a pre-trained Faster R-CNN model;
for the text, after preprocessing, extracting semantic vectors through a gated recurrent unit;
capturing the importance degree of the visual feature vectors and the semantic vectors through an attention mechanism and realizing cross-modal association between the image and the text, so as to update the visual feature vectors and the semantic vectors; based on the updated visual feature vectors and semantic vectors, modeling the relationships of internal dynamic information through an attention mechanism, so as to update the visual feature vectors and semantic vectors again; and connecting the visual feature vectors obtained from the second update with the semantic vectors, and obtaining the probabilities that the information to be detected belongs to the rumor and real categories through a binary classifier.
According to the technical scheme provided by the invention, text information and image information are examined simultaneously through multi-modal feature fusion, giving higher accuracy. Meanwhile, unlike other multi-modal methods using attention mechanisms, the scheme also attends to information within each modality, so the model can integrate richer information relations. The method can obtain accurate detection results using a single post as input, enabling fast detection and handling at the initial stage of rumor propagation.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
Fig. 1 is a schematic diagram of a model structure of a network rumor detection method based on a multi-modal relationship according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a network rumor detection method based on multi-modal relationships. In the feature extraction stage, image features are extracted with a Faster R-CNN-based target detection model, which attends to specific targets and salient regions in the image. In the feature fusion stage, unlike prior art that only attends to the relation between image and text, the method also applies the attention mechanism to information within the same modality; the advantage is that intra-modal information can supplement inter-modal information. The method achieves a good effect on the Weibo Rumor Set data set and can find false-information cases that are difficult to distinguish using a single modality in traditional schemes.
As shown in fig. 1, a schematic model structure diagram of a network rumor detection method based on multi-modal relationships according to an embodiment of the present invention mainly includes the following five parts:
1. multimodal data acquisition.
In the embodiment of the invention, the information to be detected, including images and related texts, issued on the network platform is acquired.
Illustratively, the retrieval may be by a social media platform, e.g., a micro-blogging platform.
In the embodiment of the invention, the related text includes the text contained in the information to be detected and the text attached when other users forward the information to be detected. For example, microblog information acquired from a microblog platform includes, in addition to the text of the microblog itself, the text attached when other users repost it.
2. And (5) extracting visual features.
In the embodiment of the invention, for the image, the visual feature vectors covering objects of different classes in the image are extracted through a Faster R-CNN model pre-trained on Visual Genome.
The Faster R-CNN model is a classical model commonly used in target detection. For a given picture I, the model outputs target-level information in the picture, namely the visual feature vectors of objects of different classes, V = {v_1, v_2, …, v_K}, where v_i is the visual feature vector of one object, i = 1, 2, …, K, and K is the total number of feature vectors (here 36). Illustratively, the visual features V may form a K × 2048-dimensional matrix. Compared with traditional picture feature extraction, the method of the embodiment concentrates on the targets and other salient regions of the image.
3. And preprocessing the text and extracting the features.
In the embodiment of the invention, after the text is preprocessed, the semantic vectors are extracted through a gated recurrent unit (GRU). It should be noted that the text in the dashed box at the lower left corner of fig. 1 is only schematic.
1) And (4) preprocessing.
For text, the complexity and disorder of social media information introduces much useless redundant information, such as emoticons, special characters and URLs (uniform resource locators), so preprocessing is required. Specifically, all such redundant information (URLs, special characters, emoticons, etc.) is discarded, only the remaining character information is retained and spliced into a text sequence, with separators used as identifiers at the splicing gaps. For example, after the text of a microblog is preprocessed, only the remaining text is retained; the remaining text of the source microblog and of the subsequent reposts is then spliced in order into a sequence L.
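A minimal sketch of this cleaning-and-splicing step, assuming a regex-based filter and a `[SEP]` separator token (the patent does not name a specific separator):

```python
import re

SEP = "[SEP]"  # assumed separator token; the patent does not specify one

def clean_text(raw: str) -> str:
    """Drop URLs, emoticons and special characters, keep the words."""
    text = re.sub(r"https?://\S+", " ", raw)           # remove URLs
    text = re.sub(r"[^\w\s\u4e00-\u9fff]", " ", text)  # remove symbols
    return " ".join(text.split())

def build_sequence(source: str, reposts: list) -> str:
    """Concatenate source text and repost texts, separator in each gap."""
    parts = [clean_text(source)] + [clean_text(t) for t in reposts]
    return f" {SEP} ".join(p for p in parts if p)

L_seq = build_sequence("Breaking news! http://t.cn/xyz :-(",
                       ["Is this real??", "Fake, see http://a.b/c"])
```

The cleaned source text and repost texts end up in one sequence, URLs and emoticons stripped, separators marking each splice point.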
2) And (5) feature extraction.
Statistically, 98% of the texts in the data set are no longer than 150 characters after preprocessing, so for computational efficiency a text sequence L contains at most 150 words; excess words are discarded and shorter sequences are padded. Word features are then vectorized using pre-trained GloVe embeddings (for Chinese, pre-trained on Wikipedia), the preprocessed text is expressed in matrix form, and feature extraction is performed with a Gated Recurrent Unit (GRU) to obtain the semantic vectors E.
Illustratively, the vectorization of word features is represented as a 150 × 300 matrix, with hidden state size 512 for GRU feature extraction.
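The truncate/pad, embed and GRU steps can be sketched as follows. This is a toy stand-in: a tiny vocabulary and a randomly initialised embedding table replace the pre-trained GloVe vectors, and out-of-vocabulary words fall back to the padding index.

```python
import torch
import torch.nn as nn

MAX_LEN, EMB_DIM, HIDDEN = 150, 300, 512  # dims stated in the patent

# Toy vocabulary; a random table stands in for pre-trained GloVe here.
vocab = {"<pad>": 0, "rumor": 1, "photo": 2, "breaking": 3}
embedding = nn.Embedding(len(vocab), EMB_DIM, padding_idx=0)
gru = nn.GRU(EMB_DIM, HIDDEN, batch_first=True)

def encode(tokens):
    ids = [vocab.get(t, 0) for t in tokens][:MAX_LEN]  # truncate to 150
    ids += [0] * (MAX_LEN - len(ids))                  # pad when shorter
    with torch.no_grad():
        x = embedding(torch.tensor([ids]))             # (1, 150, 300)
        states, _ = gru(x)                             # (1, 150, 512)
    return states.squeeze(0)                           # semantic vectors E

E = encode(["breaking", "photo", "rumor"])
```

Each of the 150 positions yields a 512-dimensional hidden state, forming the semantic matrix E used in the fusion stage.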
4. And (5) feature fusion.
1) Information circulation and interaction among the modalities.
In the embodiment of the invention, the importance degree of the visual feature vectors and the semantic vectors is captured through an attention mechanism, and cross-modal association between the image and the text is realized, so as to update the visual feature vectors and the semantic vectors. Specifically: the visual feature vectors and the semantic vectors each serve as one modality's information; for the inter-modal information, the importance degree of each (visual feature vector, semantic vector) pair is extracted through an attention mechanism, information flows between the modalities according to these importance degrees so as to update each modality's information, and this information flow realizes the cross-modal association between image and text. The operation process is as follows:
Linear transformations are applied to the visual feature vector V and the semantic vector E respectively to obtain the k, q and v values required by the attention mechanism, and the inter-modal attention weights are then obtained through vector inner products:

InterAtt_{E→V} = softmax(q_V · k_E^T / √dim)

InterAtt_{V→E} = softmax(q_E · k_V^T / √dim)

where E denotes the semantic vector and V the visual feature vector; q_V, k_V are the q and k values of the visual feature vector V, q_E, k_E are the q and k values of the semantic vector E, and dim is the vector dimension. InterAtt_{E→V} and InterAtt_{V→E} are, in turn, the attention weights from the semantic vector to the visual feature vector and from the visual feature vector to the semantic vector; these two bidirectional matrices contain the important information between paired image regions and words.
As those skilled in the art will understand, the k, q and v values are the standard variables of the attention mechanism: key, query and value. In brief, the attention mechanism computes the similarity between the query of the source end and the key of the target end, normalizes it to obtain the attention weight, and then multiplies by the value of the target end to obtain the attention update of the source-end vector from the target end, so that information flows between the modalities:

V′ = InterAtt_{E→V} × v_E

E′ = InterAtt_{V→E} × v_V

where v_E and v_V are the v values of the semantic vector E and the visual feature vector V respectively.
Then the updated visual feature vector V′ and semantic vector E′ are concatenated with the original visual feature vector V and semantic vector E and passed through a fully connected layer to obtain the visual feature vector V* and semantic vector E*, which are input to the subsequent intra-modality association module to further learn the information flow within each modality.
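The inter-modal step above can be sketched numerically. Random matrices stand in for the learned linear projections (q, k, v) of the 36 region features and 150 word features; the softmax-normalized scaled dot product is an assumption consistent with the attention mechanism described in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
K, L, DIM = 36, 150, 512  # 36 regions, 150 words, shared attention dim

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Stand-ins for the linear projections of V (K x DIM) and E (L x DIM).
qV, kV, vV = (rng.standard_normal((K, DIM)) for _ in range(3))
qE, kE, vE = (rng.standard_normal((L, DIM)) for _ in range(3))

# Inter-modal attention weights via scaled dot product.
InterAtt_E2V = softmax(qV @ kE.T / np.sqrt(DIM))  # (K, L): words -> regions
InterAtt_V2E = softmax(qE @ kV.T / np.sqrt(DIM))  # (L, K): regions -> words

# Each modality is updated from the other modality's value vectors.
V_upd = InterAtt_E2V @ vE   # (K, DIM) updated visual features V'
E_upd = InterAtt_V2E @ vV   # (L, DIM) updated semantic features E'
```

Each region's update is a convex combination of word value vectors (rows of the attention matrix sum to 1), and symmetrically for each word, which is exactly the bidirectional information flow described above.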
2) Dynamic information relationship modeling within a modality.
For the input visual feature vectors V* and semantic vectors E*, the relationships of internal dynamic information are modeled through an attention mechanism as supplementary information to the cross-modal association, and the visual feature vectors and semantic vectors are updated.
In addition, the association of information within a modality should also be conditioned on the other modality; for example, image regions should be associated differently according to different words and phrases. To this end, the visual feature vectors and semantic vectors are first pooled and affine-transformed to the same dimension as the k, q and v values, and the channel-wise conditional gate vectors M_{V→E}, M_{E→V} are then computed to introduce the other modality's information:

M_{V→E} = Sigmoid(Linear(V*_pool))

M_{E→V} = Sigmoid(Linear(E*_pool))

where Linear(V*_pool) and Linear(E*_pool) are the results of pooling and affine-transforming the visual feature vector V* and the semantic vector E* respectively, and Sigmoid denotes the sigmoid function.
Next, the two channel-wise conditional gate vectors modulate the k and q values of the two modalities, which are activated or deactivated by the conditional gate of the other modality. The updated k and q values are:

q̂_V = (1 + M_{E→V}) ⊙ q_{V*}

k̂_V = (1 + M_{E→V}) ⊙ k_{V*}

q̂_E = (1 + M_{V→E}) ⊙ q_{E*}

k̂_E = (1 + M_{V→E}) ⊙ k_{E*}

where q̂_V, k̂_V are the updated q and k values of the visual feature vector, q̂_E, k̂_E are the updated q and k values of the semantic vector, q_{V*}, k_{V*} are the q and k values of the input visual feature vector V*, and q_{E*}, k_{E*} are the q and k values of the input semantic vector E*.
Those skilled in the art will understand that the sigmoid output M lies in the (0, 1) interval, so 1 + M lies in (1, 2); multiplying it element-wise with the original q and k values acts as a channel-wise rescaling, i.e. the "activation or deactivation" above. The updated q and k values are the q and k values conditioned on the other modality's information, which is equivalent to introducing the other modality.
After the updated k and q values are obtained, attention weights are generated with the attention mechanism and the visual feature vectors and semantic vectors are updated, defined as:

IntraAtt_{V→V} = softmax(q̂_V · k̂_V^T / √dim)

IntraAtt_{E→E} = softmax(q̂_E · k̂_E^T / √dim)

V̂ = IntraAtt_{V→V} × v_{V*}

Ê = IntraAtt_{E→E} × v_{E*}

where IntraAtt_{V→V} and IntraAtt_{E→E} are, in turn, the attention weights within the visual feature vector and within the semantic vector, v_{V*} and v_{E*} are the v values of the input visual feature vector V* and semantic vector E*, and V̂ and Ê are the updated visual feature vector and semantic vector respectively.
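The conditional gating and intra-modal attention can be sketched as follows. Random matrices again stand in for V*, E* and their learned projections, and mean-pooling plus a random affine map stands in for the patent's pooling-and-Linear step.

```python
import numpy as np

rng = np.random.default_rng(1)
K, L, DIM = 36, 150, 512

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Stand-ins for the fused features V*, E* and their q/k/v projections.
V_star = rng.standard_normal((K, DIM))
E_star = rng.standard_normal((L, DIM))
qV, kV, vV = (rng.standard_normal((K, DIM)) for _ in range(3))
qE, kE, vE = (rng.standard_normal((L, DIM)) for _ in range(3))

# Channel-wise conditional gates: pool one modality, affine-map it
# (random weights stand in for the learned Linear layer), squash it.
W_v2e = rng.standard_normal((DIM, DIM)) / np.sqrt(DIM)
W_e2v = rng.standard_normal((DIM, DIM)) / np.sqrt(DIM)
M_V2E = sigmoid(V_star.mean(axis=0) @ W_v2e)  # visual info gates the text
M_E2V = sigmoid(E_star.mean(axis=0) @ W_e2v)  # text info gates the image

# Gates rescale q and k channel-wise before self-attention.
qV_hat, kV_hat = (1 + M_E2V) * qV, (1 + M_E2V) * kV
qE_hat, kE_hat = (1 + M_V2E) * qE, (1 + M_V2E) * kE

IntraAtt_V = softmax(qV_hat @ kV_hat.T / np.sqrt(DIM))  # (K, K)
IntraAtt_E = softmax(qE_hat @ kE_hat.T / np.sqrt(DIM))  # (L, L)
V_hat = IntraAtt_V @ vV   # region-to-region update
E_hat = IntraAtt_E @ vE   # word-to-word update
```

Each modality attends only to itself here, but the gate derived from the other modality rescales every attention channel, which is how "different associations between image regions according to different word phrases" enters the computation.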
In a specific implementation, the inter-modal information flow and interaction and the intra-modal dynamic information relationship modeling can each be realized by one sub-module; the two sub-modules form one basic module, and three such basic modules are stacked to obtain the final visual and semantic vectors. Finally, the visual feature vector and the semantic vector are point-multiplied to obtain the final fused feature vector (the multi-modal feature information).
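The final fusion step can be sketched in a few lines. Mean-pooling each modality to a single vector before the point-wise product is an assumption; the patent only states that the two vectors are point-multiplied together.

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-ins for the vectors produced by the three stacked basic modules.
V_final = rng.standard_normal((36, 512))   # final visual features
E_final = rng.standard_normal((150, 512))  # final semantic features

# Pool each modality to one 512-d vector, then fuse element-wise.
fused = V_final.mean(axis=0) * E_final.mean(axis=0)  # (512,)
```

The resulting fused vector is what the classification stage below consumes.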
5. And outputting the classification.
The network rumor detection problem is treated as a classification problem: the finally fused multi-modal feature information is input into a multi-layer perceptron serving as a binary classifier, and the probabilities that the information to be detected belongs to the rumor and real categories are obtained through a Softmax function.
In the embodiment of the invention, the whole method is regarded as one model; the loss function during training may be the cross-entropy loss, and through training the classifier learns to distinguish the rumor category from the real category from the multi-modal feature information.
Once the probabilities of the rumor and real categories are obtained, the final detection result can be determined in a conventional manner, for example by a set threshold: since there are only two categories, when the probability of a category exceeds 0.5, the result can be judged to belong to that category. A higher threshold may of course be set for greater confidence. For example, if the probabilities of the rumor and real categories are (0.99, 0.01), i.e. a 99% rumor probability and a 1% real probability, and the rumor probability exceeds the set threshold (e.g. 90%), then the information to be detected can be judged a rumor with high confidence. The specific threshold value can be set by the skilled person according to actual conditions or experience.
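The softmax-plus-threshold decision rule can be sketched directly. The logit values below are illustrative, not from the patent.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(logits, threshold=0.5):
    """Return (p_rumor, p_real) and a label once one probability
    clears the decision threshold."""
    p_rumor, p_real = softmax(np.asarray(logits, dtype=float))
    if p_rumor > threshold:
        return (p_rumor, p_real), "rumor"
    if p_real > threshold:
        return (p_rumor, p_real), "real"
    return (p_rumor, p_real), "undecided"  # possible only if threshold > 0.5

# Strongly rumor-like logits with a conservative 0.9 threshold.
probs, label = classify([4.6, 0.0], threshold=0.9)
```

With only two classes and the default threshold of 0.5, the rule reduces to picking the larger probability; raising the threshold trades coverage for confidence, as the description notes.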
In the model shown in fig. 1, the loss function during training may be the cross-entropy loss. The data set may be the Weibo Rumor Set, collected from the microblog platform, with the following distribution:

             Number of samples   Number of pictures
Real data    4779                5318
Rumor data   4748                7954

Table 1. Data set distribution
The scheme of the embodiment of the invention achieves a good effect on the data set shown in Table 1, and can find false-information cases that are difficult to distinguish using a single modality in traditional schemes.
Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. A network rumor detection method based on multi-modal relations is characterized by comprising the following steps:
acquiring information to be detected, including images and related texts, issued on a network platform;
for the image, extracting visual feature vectors covering objects of different classes in the image through a pre-trained Faster R-CNN model;
for the text, after preprocessing, extracting semantic vectors through a gated recurrent unit;
capturing the importance degree of the visual feature vectors and the semantic vectors through an attention mechanism and realizing cross-modal association between the image and the text, so as to update the visual feature vectors and the semantic vectors; based on the updated visual feature vectors and semantic vectors, modeling the relationships of internal dynamic information through an attention mechanism, so as to update the visual feature vectors and semantic vectors again; and connecting the visual feature vectors obtained from the second update with the semantic vectors, and obtaining the probabilities that the information to be detected belongs to the rumor and real categories through a binary classifier.
2. The method of claim 1, wherein the visual feature vectors containing objects of different classes are expressed as V = {v_1, v_2, …, v_K}, where v_i is the visual feature vector of one object, K is the total number of feature vectors, and i = 1, 2, …, K.
3. The method of claim 1, wherein the associated text comprises: the text contained in the information to be detected and the text attached when other users forward the information to be detected.
4. The method of claim 2, wherein preprocessing the text comprises: removing redundant information in the text, only reserving character information, splicing the character information into a text sequence, and using separators as identifiers in splicing gaps; the redundant information includes at least one or more of the following information: symbolic expressions, special characters, uniform resource locators.
5. The method for detecting network rumors based on multi-modal relationships according to claim 1, wherein, to extract the semantic vectors through the gated recurrent unit, word features are first vectorized using pre-trained GloVe embeddings, the preprocessed text is expressed in matrix form, and feature extraction is performed with the gated recurrent unit to obtain the semantic vectors.
6. The method of claim 1, wherein capturing the importance degree of the visual feature vectors and the semantic vectors through an attention mechanism and realizing cross-modal association between the image and the text, so as to update the visual feature vectors and the semantic vectors, comprises:
using the visual feature vectors and the semantic vectors as the two modalities' information respectively, extracting the importance degree of each (visual feature vector, semantic vector) pair through an attention mechanism, realizing the flow of information between the modalities according to the importance degrees so as to update each modality's information, and realizing the cross-modal association between the image and the text through the information flow; the operation process being as follows:
respectively applying linear transformations to the visual feature vectors and semantic vectors to obtain the k, q and v values required by the attention mechanism, and then obtaining the inter-modal attention weights through vector inner products:

InterAtt_{E→V} = softmax(q_V · k_E^T / √dim)

InterAtt_{V→E} = softmax(q_E · k_V^T / √dim)

where E denotes the semantic vector and V the visual feature vector; the k, q and v values are the standard variables of the attention mechanism, namely key, query and value; q_V, k_V are the q and k values of the visual feature vector V, q_E, k_E are the q and k values of the semantic vector E, and dim is the vector dimension; InterAtt_{E→V} and InterAtt_{V→E} are, in turn, the attention weights from the semantic vector to the visual feature matrix and from the visual feature matrix to the semantic vector;
and then updating each modality's feature vector with the other modality's information according to the attention weights, so as to realize the information flow between different modalities:

V′ = InterAtt_E→V × v_E

E′ = InterAtt_V→E × v_V

wherein v_E and v_V respectively represent the v values of the semantic vector E and the visual feature vector V;
then the updated visual feature vector V′ and semantic vector E′ are respectively concatenated with the original visual feature vector V and semantic vector E and passed through a fully connected layer, obtaining the visual feature vector V* and the semantic vector E*.
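The inter-modal attention and fusion of claim 6 can be sketched as follows in NumPy. All shapes and weight matrices are illustrative assumptions (4 hypothetical image regions, 5 word states, 16-dim features); only the structure — scaled inner-product attention weights, cross-modal information flow through the v values, and a fully connected layer over the concatenation — follows the claim.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 16

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical inputs: V = 4 regional visual features, E = 5 word-level states.
V = rng.standard_normal((4, dim))
E = rng.standard_normal((5, dim))

def proj(x):
    """Linear transformations giving the q / k / v values of the attention mechanism."""
    Wq, Wk, Wv = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))
    return x @ Wq, x @ Wk, x @ Wv

qV, kV, vV = proj(V)
qE, kE, vE = proj(E)

# Inter-modal attention weights via the scaled vector inner product.
InterAtt_E2V = softmax(qV @ kE.T / np.sqrt(dim))  # semantic -> visual
InterAtt_V2E = softmax(qE @ kV.T / np.sqrt(dim))  # visual -> semantic

# Information flow: each modality is updated from the other's v (context) values.
V_prime = InterAtt_E2V @ vE
E_prime = InterAtt_V2E @ vV

# Fully connected layer over the concatenation of updated and original features.
W_fc_v = rng.standard_normal((2 * dim, dim)) / np.sqrt(2 * dim)
W_fc_e = rng.standard_normal((2 * dim, dim)) / np.sqrt(2 * dim)
V_star = np.concatenate([V_prime, V], axis=1) @ W_fc_v
E_star = np.concatenate([E_prime, E], axis=1) @ W_fc_e
print(V_star.shape, E_star.shape)  # (4, 16) (5, 16)
```

Note that each row of an attention-weight matrix sums to one, so V′ is a convex combination of the semantic context values (and symmetrically for E′).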
7. The method of claim 6, wherein the step of updating the visual feature vector and the semantic vector again by using an attention mechanism based on the updated visual feature vector and the semantic vector to respectively model the relationship of the internal dynamic information comprises:
pooling and affine transformation are respectively applied to the visual feature vector V* and the semantic vector E*, down to the dimensions of the k value, q value and v value, and then the channel-wise conditional gate vectors M_V→E and M_E→V are calculated to introduce the other modality's information:

M_V→E = Sigmoid(Linear(V*_pool))

M_E→V = Sigmoid(Linear(E*_pool))

wherein Linear(V*_pool) and Linear(E*_pool) respectively denote the results of pooling and affine transformation of the visual feature vector V* and the semantic vector E*, and Sigmoid denotes the Sigmoid function;
the two channel-wise conditional gate vectors modulate the k values and q values of the two modalities, whose channels are activated or deactivated by the conditional gate of the other modality; the updated k values and q values are:

q̂_V = M_E→V ⊙ q_V*

k̂_V = M_E→V ⊙ k_V*

q̂_E = M_V→E ⊙ q_E*

k̂_E = M_V→E ⊙ k_E*

wherein ⊙ denotes the element-wise product; q̂_V, k̂_V represent the updated q and k values of the visual feature vector, and q̂_E, k̂_E represent the updated q and k values of the semantic vector; q_V*, k_V* represent the q and k values of the input visual feature vector V*, and q_E*, k_E* represent the q and k values of the input semantic vector E*;
after the updated k values and q values are obtained, weights are generated by the attention mechanism, and the respective internal dynamic information of the visual feature vector and the semantic vector is updated:

IntraAtt_V→V = softmax(q̂_V · k̂_V^T / √dim)

IntraAtt_E→E = softmax(q̂_E · k̂_E^T / √dim)

V̂ = IntraAtt_V→V × v_V*

Ê = IntraAtt_E→E × v_E*

wherein IntraAtt_V→V and IntraAtt_E→E respectively represent the attention weights within the visual feature vector and within the semantic vector, and v_V*, v_E* respectively represent the v values of the visual feature vector V* and the semantic vector E*.
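The gated intra-modal attention of claim 7 can be sketched as follows. The mean-pooling choice, the placement of the element-wise gates on q and k, and all shapes and weights are illustrative assumptions under which the mechanism described above (channel-wise sigmoid gates conditioned on the other modality, then self-attention within each modality) runs end to end:

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 16

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Stand-ins for the fused features V* and E* from the previous step
# (4 visual regions, 5 words; values are random placeholders).
V_star = rng.standard_normal((4, dim))
E_star = rng.standard_normal((5, dim))

def gate(x):
    """Pool a feature matrix, apply an affine map, squash with sigmoid."""
    W = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    return sigmoid(x.mean(axis=0) @ W)  # channel-wise gate in (0, 1)

M_V2E = gate(V_star)  # carries visual information, modulates the text branch
M_E2V = gate(E_star)  # carries semantic information, modulates the visual branch

def proj(x):
    """Linear projections to the q / k / v values of the attention mechanism."""
    Wq, Wk, Wv = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))
    return x @ Wq, x @ Wk, x @ Wv

qV, kV, vV = proj(V_star)
qE, kE, vE = proj(E_star)

# Each modality's q / k channels are activated or suppressed by the other
# modality's conditional gate (element-wise product).
qV_hat, kV_hat = M_E2V * qV, M_E2V * kV
qE_hat, kE_hat = M_V2E * qE, M_V2E * kE

# Intra-modal attention with the modulated q / k, applied to the v values.
IntraAtt_V2V = softmax(qV_hat @ kV_hat.T / np.sqrt(dim))
IntraAtt_E2E = softmax(qE_hat @ kE_hat.T / np.sqrt(dim))
V_new = IntraAtt_V2V @ vV
E_new = IntraAtt_E2E @ vE
print(V_new.shape, E_new.shape)  # (4, 16) (5, 16)
```

The gates let one modality decide which channels of the other modality's queries and keys participate in its self-attention, so the "internal" relationship modeling is still conditioned cross-modally.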
CN201911379313.1A 2019-12-25 2019-12-27 Network rumor detection method based on multi-modal relationship Active CN111079444B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911357589X 2019-12-25
CN201911357589 2019-12-25

Publications (2)

Publication Number Publication Date
CN111079444A true CN111079444A (en) 2020-04-28
CN111079444B CN111079444B (en) 2020-09-29

Family

ID=70318707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911379313.1A Active CN111079444B (en) 2019-12-25 2019-12-27 Network rumor detection method based on multi-modal relationship

Country Status (1)

Country Link
CN (1) CN111079444B (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582587A (en) * 2020-05-11 2020-08-25 深圳赋乐科技有限公司 Prediction method and prediction system for video public sentiment
CN111611981A (en) * 2020-06-28 2020-09-01 腾讯科技(深圳)有限公司 Information identification method and device and information identification neural network training method and device
CN111797326A (en) * 2020-05-27 2020-10-20 中国科学院计算技术研究所 False news detection method and system fusing multi-scale visual information
CN111967277A (en) * 2020-08-14 2020-11-20 厦门大学 Translation method based on multi-modal machine translation model
CN111985369A (en) * 2020-08-07 2020-11-24 西北工业大学 Course field multi-modal document classification method based on cross-modal attention convolution neural network
CN112015955A (en) * 2020-09-01 2020-12-01 清华大学 Multi-mode data association method and device
CN112035670A (en) * 2020-09-09 2020-12-04 中国科学技术大学 Multi-modal rumor detection method based on image emotional tendency
CN112035669A (en) * 2020-09-09 2020-12-04 中国科学技术大学 Social media multi-modal rumor detection method based on propagation heterogeneous graph modeling
CN112199606A (en) * 2020-10-30 2021-01-08 福州大学 Social media-oriented rumor detection system based on hierarchical user representation
CN112200197A (en) * 2020-11-10 2021-01-08 天津大学 Rumor detection method based on deep learning and multi-mode
CN112528015A (en) * 2020-10-26 2021-03-19 复旦大学 Method and device for judging rumor in message interactive transmission
CN112926569A (en) * 2021-03-16 2021-06-08 重庆邮电大学 Method for detecting natural scene image text in social network
CN113221872A (en) * 2021-05-28 2021-08-06 北京理工大学 False news detection method for generating convergence of countermeasure network and multi-mode
CN113239730A (en) * 2021-04-09 2021-08-10 哈尔滨工业大学 Method for automatically eliminating structural false modal parameters based on computer vision
CN113239926A (en) * 2021-06-17 2021-08-10 北京邮电大学 Multi-modal false information detection model based on countermeasures
CN113434684A (en) * 2021-07-01 2021-09-24 北京中科研究院 Rumor detection method, system, equipment and storage medium for self-supervision learning
CN113469214A (en) * 2021-05-20 2021-10-01 中国科学院自动化研究所 False news detection method and device, electronic equipment and storage medium
CN113688955A (en) * 2021-10-25 2021-11-23 北京世纪好未来教育科技有限公司 Text recognition method, device, equipment and medium
CN113743522A (en) * 2021-09-13 2021-12-03 五八同城信息技术有限公司 Detection method and device for illegal behavior and electronic equipment
CN113822224A (en) * 2021-10-12 2021-12-21 中国人民解放军国防科技大学 Rumor detection method and device integrating multi-modal learning and multi-granularity structure learning
CN114417001A (en) * 2022-03-29 2022-04-29 山东大学 Chinese writing intelligent analysis method, system and medium based on multi-mode
CN115809327A (en) * 2023-02-08 2023-03-17 四川大学 Real-time social network rumor detection method for multi-mode fusion and topics
CN117574261A (en) * 2023-10-19 2024-02-20 重庆理工大学 Multi-field false news reader cognition detection method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902621A (en) * 2012-12-28 2014-07-02 深圳先进技术研究院 Method and device for identifying network rumor
CN105045857A (en) * 2015-07-09 2015-11-11 中国科学院计算技术研究所 Social network rumor recognition method and system
CN109241379A (en) * 2017-07-11 2019-01-18 北京交通大学 A method of across Modal detection network navy
CN110019812A (en) * 2018-02-27 2019-07-16 中国科学院计算技术研究所 A kind of user is from production content detection algorithm and system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHI Lei et al.: "Sentiment Analysis Model Combining Self-Attention Mechanism and Tree-LSTM", Journal of Chinese Computer Systems (《小型微型计算机系统》) *

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582587B (en) * 2020-05-11 2021-06-04 深圳赋乐科技有限公司 Prediction method and prediction system for video public sentiment
CN111582587A (en) * 2020-05-11 2020-08-25 深圳赋乐科技有限公司 Prediction method and prediction system for video public sentiment
CN111797326A (en) * 2020-05-27 2020-10-20 中国科学院计算技术研究所 False news detection method and system fusing multi-scale visual information
CN111797326B (en) * 2020-05-27 2023-05-12 中国科学院计算技术研究所 False news detection method and system integrating multi-scale visual information
CN111611981A (en) * 2020-06-28 2020-09-01 腾讯科技(深圳)有限公司 Information identification method and device and information identification neural network training method and device
CN111985369A (en) * 2020-08-07 2020-11-24 西北工业大学 Course field multi-modal document classification method based on cross-modal attention convolution neural network
CN111985369B (en) * 2020-08-07 2021-09-17 西北工业大学 Course field multi-modal document classification method based on cross-modal attention convolution neural network
CN111967277B (en) * 2020-08-14 2022-07-19 厦门大学 Translation method based on multi-modal machine translation model
CN111967277A (en) * 2020-08-14 2020-11-20 厦门大学 Translation method based on multi-modal machine translation model
CN112015955B (en) * 2020-09-01 2021-07-30 清华大学 Multi-mode data association method and device
CN112015955A (en) * 2020-09-01 2020-12-01 清华大学 Multi-mode data association method and device
CN112035670B (en) * 2020-09-09 2021-05-14 中国科学技术大学 Multi-modal rumor detection method based on image emotional tendency
CN112035669A (en) * 2020-09-09 2020-12-04 中国科学技术大学 Social media multi-modal rumor detection method based on propagation heterogeneous graph modeling
CN112035670A (en) * 2020-09-09 2020-12-04 中国科学技术大学 Multi-modal rumor detection method based on image emotional tendency
CN112528015B (en) * 2020-10-26 2022-11-18 复旦大学 Method and device for judging rumor in message interactive transmission
CN112528015A (en) * 2020-10-26 2021-03-19 复旦大学 Method and device for judging rumor in message interactive transmission
CN112199606A (en) * 2020-10-30 2021-01-08 福州大学 Social media-oriented rumor detection system based on hierarchical user representation
CN112200197A (en) * 2020-11-10 2021-01-08 天津大学 Rumor detection method based on deep learning and multi-mode
CN112926569A (en) * 2021-03-16 2021-06-08 重庆邮电大学 Method for detecting natural scene image text in social network
CN113239730A (en) * 2021-04-09 2021-08-10 哈尔滨工业大学 Method for automatically eliminating structural false modal parameters based on computer vision
CN113239730B (en) * 2021-04-09 2022-04-05 哈尔滨工业大学 Method for automatically eliminating structural false modal parameters based on computer vision
CN113469214A (en) * 2021-05-20 2021-10-01 中国科学院自动化研究所 False news detection method and device, electronic equipment and storage medium
CN113221872B (en) * 2021-05-28 2022-09-20 北京理工大学 False news detection method for generating convergence of countermeasure network and multi-mode
CN113221872A (en) * 2021-05-28 2021-08-06 北京理工大学 False news detection method for generating convergence of countermeasure network and multi-mode
CN113239926B (en) * 2021-06-17 2022-10-25 北京邮电大学 Multi-modal false information detection model system based on countermeasure
CN113239926A (en) * 2021-06-17 2021-08-10 北京邮电大学 Multi-modal false information detection model based on countermeasures
CN113434684A (en) * 2021-07-01 2021-09-24 北京中科研究院 Rumor detection method, system, equipment and storage medium for self-supervision learning
CN113743522A (en) * 2021-09-13 2021-12-03 五八同城信息技术有限公司 Detection method and device for illegal behavior and electronic equipment
CN113822224A (en) * 2021-10-12 2021-12-21 中国人民解放军国防科技大学 Rumor detection method and device integrating multi-modal learning and multi-granularity structure learning
CN113688955A (en) * 2021-10-25 2021-11-23 北京世纪好未来教育科技有限公司 Text recognition method, device, equipment and medium
CN114417001A (en) * 2022-03-29 2022-04-29 山东大学 Chinese writing intelligent analysis method, system and medium based on multi-mode
CN114417001B (en) * 2022-03-29 2022-07-01 山东大学 Chinese writing intelligent analysis method, system and medium based on multi-mode
CN115809327A (en) * 2023-02-08 2023-03-17 四川大学 Real-time social network rumor detection method for multi-mode fusion and topics
CN115809327B (en) * 2023-02-08 2023-05-05 四川大学 Real-time social network rumor detection method based on multimode fusion and topics
CN117574261A (en) * 2023-10-19 2024-02-20 重庆理工大学 Multi-field false news reader cognition detection method
CN117574261B (en) * 2023-10-19 2024-06-21 重庆理工大学 Multi-field false news reader cognition detection method

Also Published As

Publication number Publication date
CN111079444B (en) 2020-09-29

Similar Documents

Publication Publication Date Title
CN111079444B (en) Network rumor detection method based on multi-modal relationship
Kumar et al. Sentiment analysis of multimodal twitter data
Li et al. Visual to text: Survey of image and video captioning
Kumar et al. Identifying clickbait: A multi-strategy approach using neural networks
Roy et al. Automated detection of substance use-related social media posts based on image and text analysis
Salur et al. A soft voting ensemble learning-based approach for multimodal sentiment analysis
Lin et al. Detecting multimedia generated by large ai models: A survey
CN111783903A (en) Text processing method, text model processing method and device and computer equipment
CN116955707A (en) Content tag determination method, device, equipment, medium and program product
Liu et al. Correlation identification in multimodal weibo via back propagation neural network with genetic algorithm
Peng et al. An effective strategy for multi-modal fake news detection
CN117251551A (en) Natural language processing system and method based on large language model
Illendula et al. Which emoji talks best for my picture?
CN117131923A (en) Back door attack method and related device for cross-modal learning
CN116756363A (en) Strong-correlation non-supervision cross-modal retrieval method guided by information quantity
Kim et al. A deep learning approach for identifying user interest from targeted advertising
Wieczorek et al. Semantic Image-Based Profiling of Users' Interests with Neural Networks
CN113283535B (en) False message detection method and device integrating multi-mode characteristics
Shan et al. Multimodal Social Media Fake News Detection Based on Similarity Inference and Adversarial Networks.
CN114443916A (en) Supply and demand matching method and system for test data
CN114817697A (en) Method and device for determining label information, electronic equipment and storage medium
CN117972497B (en) False information detection method and system based on multi-view feature decomposition
Neela et al. An Ensemble Learning Frame Work for Robust Fake News Detection
Li Multimodal visual pattern mining with convolutional neural networks
CN116522895B (en) Text content authenticity assessment method and device based on writing style

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant