CN111079444A - Network rumor detection method based on multi-modal relationship - Google Patents
- Publication number
- CN111079444A (application CN201911379313.1A)
- Authority
- CN
- China
- Prior art keywords
- vector
- semantic
- information
- visual feature
- feature vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/464—Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
Abstract
The invention discloses a network rumor detection method based on multi-modal relationships, comprising the following steps: acquiring an image to be detected and its related text published on a network platform; extracting visual feature vectors for the different classes of objects in the image with a pre-trained Faster R-CNN model; preprocessing the text and then extracting semantic vectors with a GRU; capturing the relative importance of the visual feature vectors and semantic vectors through an attention mechanism and realizing cross-modal association between the image and the text, thereby updating the visual feature vectors and semantic vectors; separately modeling the relationships of the dynamic information inside each modality through an attention mechanism, thereby updating the visual feature vectors and semantic vectors again; and concatenating the visual feature vectors obtained from the two updates with the semantic vectors and obtaining, through a binary classifier, the probabilities that the information to be detected belongs to the rumor and real categories. The method automatically judges whether information to be detected is a network rumor and achieves high detection accuracy.
Description
Technical Field
The invention relates to the technical field of network space security, in particular to a network rumor detection method based on multi-modal relations.
Background
The rise of the networked society brings opportunities and challenges alike. In particular, the low barrier to internet access and the freedom of information dissemination seriously affect the stability of cyberspace, and the unchecked spread of network rumors is one of the problems that must be taken seriously. Users of today's social networking platforms number in the hundreds of millions and are highly active; information spreads widely and quickly, unconstrained by time and place, and the platforms' amplifier effect multiplies its impact. Rumors attached to sensitive topics, focal events, hot-button issues, major public events, or emergencies can become widely known within days and cause loss of public trust, damage to government and corporate images, and boiling public resentment. Automatic and rapid detection of network rumors is therefore of great importance to cyberspace security.
With the development of multimedia technology, both self-media and professional media have shifted to multimedia news forms built on pictures, texts and short videos. Multimedia content carries richer and more intuitive information, describes news events better, and is more easily and widely disseminated. Studies have shown that posts containing images are reposted on average 11 times as often as plain-text posts. Accordingly, false news and rumors often use highly provocative pictures to attract and mislead readers, spreading quickly and widely, which makes the detection of visual-modality content a non-negligible part of meeting the network rumor challenge.
Traditional false-content detection based on visual-modality content mainly relies on hand-crafted features such as visual clarity, visual similarity histograms, and double-JPEG compression traces. These usually work well on crude picture tampering, but as picture-generation technology keeps improving, such methods can no longer guarantee accuracy and also markedly raise resource costs.
In recent years, with the rapid development of neural networks and deep learning models, corresponding detection technologies have emerged and achieved great success. For false-information detection, multi-modal methods that judge the authenticity of news using both text and visual modality information have also appeared; representative prior work includes att-RNN, EANN and MVAE. Although these methods provide heuristic approaches to detecting false information in multi-modal form, they still have notable drawbacks. First, the extraction of image and text information remains coarse, especially for the semantic features of pictures; second, in the feature-fusion stage, the features of the two modalities are simply concatenated, making it difficult to express the interaction and association between modalities.
Disclosure of Invention
The invention aims to provide a network rumor detection method based on multi-modal relationships, which can automatically judge whether information to be detected is a network rumor, with high detection accuracy.
The purpose of the invention is realized by the following technical scheme:
a network rumor detection method based on multi-modal relations comprises the following steps:
acquiring information to be detected, including images and related texts, issued on a network platform;
for the image, extracting visual feature vectors containing objects of different classes in the image through a pre-trained Faster R-CNN model;
for the text, after preprocessing, extracting semantic vectors through a gated recurrent unit;
capturing the relative importance of the visual feature vectors and semantic vectors through an attention mechanism and realizing cross-modal association between the image and the text, thereby updating the visual feature vectors and semantic vectors; based on the updated vectors, separately modeling the relationships of the dynamic information inside each modality through an attention mechanism, thereby updating the visual feature vectors and semantic vectors again; and concatenating the re-updated visual feature vectors with the semantic vectors and obtaining, through a binary classifier, the probabilities that the information to be detected belongs to the rumor category and the real category.
According to the technical scheme provided by the invention, text information and image information are examined simultaneously through multi-modal feature fusion, giving higher accuracy. Meanwhile, unlike other multi-modal methods that use attention mechanisms, the method also attends to information within each modality, so the model can integrate richer information relationships. The method can obtain accurate detection results with a single piece of information as input, enabling rapid detection and handling at the initial stage of rumor propagation.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
Fig. 1 is a schematic diagram of a model structure of a network rumor detection method based on a multi-modal relationship according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a network rumor detection method based on multi-modal relationships. In the feature extraction stage, image features are extracted with a Faster R-CNN-based target detection model, which attends to specific targets and salient regions in the image. In the feature fusion stage, unlike prior art that only attends to the relationship between images and texts, the method also applies an attention mechanism to information within the same modality; the advantage is that related information within a modality can supplement the information between modalities. The method performs well on the Weibo RumorSet data set and can uncover false-information cases that traditional single-modality schemes find hard to distinguish.
As shown in fig. 1, a schematic model structure diagram of a network rumor detection method based on multi-modal relationships according to an embodiment of the present invention mainly includes the following five parts:
1. Multimodal data acquisition.
In the embodiment of the invention, the information to be detected, including images and related texts, issued on the network platform is acquired.
Illustratively, the acquisition may be from a social media platform, e.g., a microblog platform.
In the embodiment of the invention, the related text includes the text contained in the information to be detected and the text attached when other users forward it. For example, the microblog information acquired from the microblog platform includes, besides the text of the microblog itself, the text other users attach when forwarding that microblog.
2. Visual feature extraction.
In the embodiment of the invention, for the image, the visual feature vectors containing objects of different classes are extracted through a Faster R-CNN model pre-trained on Visual Genome.
The Faster R-CNN model is a classical model commonly used in target detection. For a given picture I, the model outputs target-level information in the picture, namely the visual feature vectors of objects of different classes, V = {v1, v2, …, vK}, where vi denotes the visual feature vector of one object, i = 1, 2, …, K, and K denotes the total number of feature vectors (here 36). Illustratively, the visual features V may form a K × 2048-dimensional visual feature matrix. Compared with traditional picture feature extraction, the method of the embodiment of the invention concentrates on the targets and other salient regions of the image.
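As a minimal illustration of the shape handling just described, the sketch below stacks per-object region features into the fixed K × 2048 matrix V. The detector output is simulated with random features, and the function name and zero-padding scheme are illustrative assumptions, not the invention's actual implementation.

```python
import numpy as np

K, FEAT_DIM = 36, 2048  # as in the embodiment: 36 region vectors, 2048-d each

def build_visual_matrix(region_feats):
    """Stack per-object region features (stand-in for Faster R-CNN output)
    into a fixed K x 2048 matrix V, truncating extra regions and
    zero-padding when fewer than K objects are detected."""
    V = np.zeros((K, FEAT_DIM), dtype=np.float32)
    n = min(len(region_feats), K)
    V[:n] = region_feats[:n]
    return V

# stand-in for detector output: 10 detected objects
feats = np.random.randn(10, FEAT_DIM).astype(np.float32)
V = build_visual_matrix(feats)
print(V.shape)  # (36, 2048)
```

In a real pipeline the rows would come from the region-of-interest features of the pre-trained detector rather than random numbers.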
3. Text preprocessing and feature extraction.
In the embodiment of the invention, after the text is preprocessed, the semantic vector is extracted through a gated recurrent unit. Note that the text in the dashed box at the lower left corner of fig. 1 is only schematic.
1) Preprocessing.
For text, the complexity and disorder of social media information introduce much useless redundant information such as emoticons, special characters and URLs (uniform resource locators), so preprocessing is required. Specifically, all redundant information such as URLs, special characters and emoticons is ignored and only the remaining character information is retained; it is then spliced into a text sequence, with separators used as identifiers at the splicing gaps. For example, after the text in a microblog is preprocessed, only the remaining text information is kept, and the remaining text of the source microblog and of the subsequent forwarding microblogs is spliced in order into one sequence L.
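The preprocessing step can be sketched as follows; the regular expressions, the `[SEP]` separator token, and the function names are illustrative assumptions rather than the invention's exact rules.

```python
import re

SEP = "[SEP]"  # separator token at splicing gaps; the token itself is illustrative

def clean(text):
    """Strip URLs, emoticons and special characters; keep word characters
    (Python's \\w is Unicode-aware, so CJK characters survive)."""
    text = re.sub(r"https?://\S+", " ", text)         # remove URLs
    text = re.sub(r"[^\w\u4e00-\u9fff]+", " ", text)  # remove punctuation/emoticons
    return " ".join(text.split())

def build_sequence(source, forwards):
    """Splice the cleaned source text and forwarding texts into one sequence L."""
    parts = [clean(source)] + [clean(t) for t in forwards if clean(t)]
    return f" {SEP} ".join(parts)

seq = build_sequence("Breaking!!! see http://t.cn/xyz :-(",
                     ["so fake...", "shocking pic!!"])
print(seq)  # Breaking see [SEP] so fake [SEP] shocking pic
```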
2) Feature extraction.
Statistically, 98% of the texts in the data set are no longer than 150 characters after preprocessing, so for computational efficiency a text segment L contains at most 150 words; excess words are discarded and shorter texts are padded. Word features are then vectorized with pre-trained GloVe (for Chinese, pre-trained on Chinese Wikipedia), the preprocessed text is expressed in matrix form, and feature extraction is performed with a Gated Recurrent Unit (GRU) to obtain the semantic vector E.
Illustratively, the vectorization of word features is represented as a 150 × 300 matrix, with hidden state size 512 for GRU feature extraction.
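A minimal sketch of the truncate/pad-and-embed step follows. The toy embedding table stands in for the pre-trained GloVe vectors, and mapping out-of-vocabulary words to a zero vector is a simplifying assumption; the resulting 150 × 300 matrix is what would be fed to the GRU.

```python
import numpy as np

MAX_LEN, EMB_DIM = 150, 300  # 150 words max, 300-d word vectors

def embed_sequence(tokens, emb_table):
    """Look up each token's 300-d vector from a GloVe-style table,
    truncating to MAX_LEN tokens and zero-padding shorter sequences."""
    pad_vec = np.zeros(EMB_DIM, dtype=np.float32)
    rows = [emb_table.get(t, pad_vec) for t in tokens[:MAX_LEN]]
    rows += [pad_vec] * (MAX_LEN - len(rows))
    return np.stack(rows)  # 150 x 300 matrix fed to the GRU

# toy embedding table with a single known word
emb_table = {"rumor": np.ones(EMB_DIM, dtype=np.float32)}
X = embed_sequence(["rumor", "unknown"], emb_table)
print(X.shape)  # (150, 300)
```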
4. Feature fusion.
1) Information circulation and interaction among the modalities.
In the embodiment of the invention, the importance of the visual feature vectors and semantic vectors is captured through an attention mechanism, and cross-modal association between the image and the text is realized so as to update the visual feature vectors and semantic vectors. Specifically, the visual feature vectors and semantic vectors are each treated as one modality's information. For the information between the modalities, the importance of every (visual feature vector, semantic vector) pair is extracted through an attention mechanism, and information flows between the modalities according to this importance so as to update each modality's information; the cross-modal association between image and text is realized through this information-flow process. The operation is as follows:
Linear transformations are first applied to the visual feature vector V and the semantic vector E, respectively, to obtain the k, q and v values required by the attention mechanism; the inter-modal attention weights are then obtained via vector inner products:

InterAtt_{E→V} = softmax(q_V k_E^T / √dim)
InterAtt_{V→E} = softmax(q_E k_V^T / √dim)

where E denotes the semantic vector and V the visual feature vector; q_V, k_V denote the q and k values of the visual feature vector V; q_E, k_E denote the q and k values of the semantic vector E; and dim denotes the vector dimension. InterAtt_{E→V} and InterAtt_{V→E} denote, in turn, the attention weights from the semantic vector to the visual feature vector and from the visual feature vector to the semantic vector; these two directional matrices contain the important information between paired image regions and words.
As those skilled in the art will understand, the k, q and v values are the standard variables of an attention mechanism: key, query and value. In brief, the mechanism computes the similarity between the query of the modality being updated and the key of the other modality, normalizes it into attention weights, and multiplies these by the other modality's value to obtain the attention update, so that information flows between the modalities:

V′ = InterAtt_{E→V} × v_E
E′ = InterAtt_{V→E} × v_V

where v_E and v_V denote the v values of the semantic vector E and the visual feature vector V, respectively.
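The inter-modal attention step can be sketched as below; the random projection matrices stand in for the learned linear transformations, and all names and dimensions (dim = 64) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inter_modal_update(V, E, dim=64, seed=0):
    """Project each modality to q/k/v, compute the InterAtt weights via
    scaled dot products, and let information flow across modalities.
    Projection weights are random stand-ins for learned linear layers."""
    rng = np.random.default_rng(seed)
    def proj(X):  # one linear map each for q, k, v
        return [X @ rng.normal(0, 0.02, (X.shape[1], dim)) for _ in range(3)]
    qV, kV, vV = proj(V)
    qE, kE, vE = proj(E)
    inter_E2V = softmax(qV @ kE.T / np.sqrt(dim))  # K x L attention weights
    inter_V2E = softmax(qE @ kV.T / np.sqrt(dim))  # L x K attention weights
    V_new = inter_E2V @ vE  # text -> image information flow
    E_new = inter_V2E @ vV  # image -> text information flow
    return V_new, E_new

V = np.random.randn(36, 2048)   # K = 36 region features
E = np.random.randn(150, 512)   # L = 150 GRU hidden states
V_new, E_new = inter_modal_update(V, E)
print(V_new.shape, E_new.shape)  # (36, 64) (150, 64)
```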
Then, the updated visual feature vector V′ and semantic vector E′ are concatenated with the original visual feature vector V and semantic vector E and passed through a fully connected layer to obtain the visual feature vector V* and semantic vector E*, which are input to the subsequent intra-modality association module to further learn the information flow within each modality.
2) Dynamic information relationship modeling within a modality.
For the input visual feature vector V* and semantic vector E*, the relationships of the internal dynamic information are modeled separately through an attention mechanism, serving as supplementary information to the cross-modal association, and the visual feature vector and semantic vector are updated.
In addition, the association of information within a modality should also be conditioned on the other modality; for example, image regions should be associated differently according to different words and phrases. To this end, the visual feature vectors and semantic vectors are first pooled and affine-transformed to the same dimension as the k, q and v values, and channel-wise conditional gate vectors M_{V→E} and M_{E→V} are then computed to introduce the other modality's information:

M_{V→E} = Sigmoid(Linear(V*_pool))
M_{E→V} = Sigmoid(Linear(E*_pool))

where Linear(V*_pool) and Linear(E*_pool) are the results of pooling and affine-transforming the visual feature vector V* and the semantic vector E*, respectively, and Sigmoid denotes the sigmoid function.
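A minimal sketch of the channel-wise conditional gate, assuming mean pooling and a randomly initialised affine map in place of the learned layers:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_gate(X, dim=64, seed=0):
    """Mean-pool one modality's vectors, apply an affine map (random
    stand-in for the learned layer), and squash with a sigmoid so that
    every channel of the gate vector M lies in (0, 1)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0, 0.02, (X.shape[1], dim))
    b = np.zeros(dim)
    pooled = X.mean(axis=0)        # pooling over regions / words
    return sigmoid(pooled @ W + b)  # gate vector M, shape (dim,)

M_v2e = channel_gate(np.random.randn(36, 2048))  # gate from the visual side
print(M_v2e.shape)  # (64,)
```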
Next, the two channel-wise conditional gate vectors modulate the k and q values of the two modalities, which are activated or deactivated by the other modality's channel conditional gate; the updated k and q values are:

q̂_V = (1 + M_{E→V}) ⊙ q*_V,  k̂_V = (1 + M_{E→V}) ⊙ k*_V
q̂_E = (1 + M_{V→E}) ⊙ q*_E,  k̂_E = (1 + M_{V→E}) ⊙ k*_E

where q̂_V, k̂_V denote the updated q and k values of the visual feature vector; q̂_E, k̂_E denote the updated q and k values of the semantic vector; q*_V, k*_V and q*_E, k*_E denote the q and k values of the input visual feature vector V* and semantic vector E*; and ⊙ denotes element-wise multiplication.
Those skilled in the art will understand that the sigmoid output M lies in the (0, 1) interval, so 1 + M lies in (1, 2); multiplying the original q and k values element-wise by it scales each channel, strengthening channels whose gate value is close to 1 and leaving channels whose gate value is close to 0 nearly unchanged, which corresponds to the "activation or deactivation" above. The updated q and k values are the q and k values conditioned on the other modality's information, which is equivalent to introducing that information.
After the updated k and q values are obtained, the attention mechanism generates the weights and updates the visual feature vector and semantic vector, defined as:

IntraAtt_{V→V} = softmax(q̂_V k̂_V^T / √dim),  V̂ = IntraAtt_{V→V} × v*_V
IntraAtt_{E→E} = softmax(q̂_E k̂_E^T / √dim),  Ê = IntraAtt_{E→E} × v*_E

where IntraAtt_{V→V} and IntraAtt_{E→E} denote, in turn, the attention weights within the visual feature vector and within the semantic vector; v*_V and v*_E denote the v values of the input visual feature vector V* and semantic vector E*; and V̂ and Ê are the updated visual feature vector and semantic vector, respectively.
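The gated intra-modal attention update can be sketched as below; the inputs are random stand-ins for the projected q, k, v values and for the other modality's gate vector M, so only the shapes and the gating logic are meaningful.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def intra_modal_update(q, k, v, M_other):
    """Modulate q and k channel-wise by (1 + M) using the other modality's
    gate vector M, then self-attend within the modality."""
    q_hat = (1.0 + M_other) * q  # channel-wise (de)activation
    k_hat = (1.0 + M_other) * k
    att = softmax(q_hat @ k_hat.T / np.sqrt(q.shape[1]))  # IntraAtt weights
    return att @ v, att

rng = np.random.default_rng(0)
q = rng.normal(size=(36, 64))
k = rng.normal(size=(36, 64))
v = rng.normal(size=(36, 64))
M = 1.0 / (1.0 + np.exp(-rng.normal(size=64)))  # a gate vector in (0, 1)
V_hat, att = intra_modal_update(q, k, v, M)
print(V_hat.shape)  # (36, 64); each row of att sums to 1
```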
In a specific implementation, inter-modal information flow and interaction and intra-modal dynamic relationship modeling can each be realized by one sub-module; the two sub-modules form one basic module, and three basic modules are stacked to obtain the final visual and semantic vectors. Finally, the visual feature vector and semantic vector are combined by element-wise multiplication to obtain the final fused feature vector (the multi-modal feature information).
5. Classification output.
The network rumor detection problem is treated as a classification problem: the finally fused multi-modal feature information is input to a multi-layer perceptron serving as a binary classifier, and the probabilities that the information to be detected belongs to the rumor and real categories are obtained through a Softmax function.
In the embodiment of the invention, the whole method is regarded as one model; the loss function during training may be a cross-entropy loss function, and through training the classifier learns to distinguish the rumor category from the real category based on the multi-modal feature information.
After the probabilities of the rumor and real categories are obtained, the final detection result can be determined in a conventional manner, for example by a set threshold: since there are only two categories, when the probability of one category exceeds 0.5, the result can be judged to belong to that category. A higher threshold may also be set to obtain greater confidence. For example, if the probabilities of the rumor and real categories are (0.99, 0.01), i.e., a 99% probability of rumor and a 1% probability of real, and the rumor probability exceeds the set threshold (e.g., 90%), then the information to be detected can be judged a rumor with high confidence. The specific threshold value can be set by the skilled person according to actual conditions or experience.
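The decision rule described above can be sketched as follows; the logits and the threshold value are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decide(logits, threshold=0.5):
    """Turn the binary classifier's logits into (p_rumor, p_real) and a
    label; the threshold is configurable for higher-confidence decisions."""
    p_rumor, p_real = softmax(np.asarray(logits, dtype=float))
    label = "rumor" if p_rumor > threshold else "real"
    return (p_rumor, p_real), label

probs, label = decide([4.6, 0.0])  # toy logits
print(label, round(probs[0], 2))   # rumor 0.99
```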
In the model shown in fig. 1, the loss function during training may be a cross-entropy loss function. The data set may be the Weibo RumorSet, collected from the microblog platform, with the following distribution:
| | Number of samples | Number of pictures |
| --- | --- | --- |
| Real data | 4779 | 5318 |
| Rumor data | 4748 | 7954 |

Table 1. Data set distribution
The scheme of the embodiment of the invention performs well on the data set shown in Table 1 and can uncover false-information cases that traditional single-modality schemes find hard to distinguish.
Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (7)
1. A network rumor detection method based on multi-modal relations is characterized by comprising the following steps:
acquiring information to be detected, including images and related texts, issued on a network platform;
for the image, extracting visual feature vectors containing objects of different classes in the image through a pre-trained Faster R-CNN model;
for the text, after preprocessing, extracting semantic vectors through a gated recurrent unit;
capturing the relative importance of the visual feature vectors and semantic vectors through an attention mechanism and realizing cross-modal association between the image and the text, thereby updating the visual feature vectors and semantic vectors; based on the updated vectors, separately modeling the relationships of the dynamic information inside each modality through an attention mechanism, thereby updating the visual feature vectors and semantic vectors again; and concatenating the re-updated visual feature vectors with the semantic vectors and obtaining, through a binary classifier, the probabilities that the information to be detected belongs to the rumor category and the real category.
2. The method of claim 1, wherein the visual feature vectors containing objects of different classes are expressed as V = {v1, v2, …, vK}, where vi denotes the visual feature vector of one object, K denotes the total number of feature vectors, and i = 1, 2, …, K.
3. The method of claim 1, wherein the associated text comprises: the text contained in the information to be detected and the text attached when other users forward the information to be detected.
4. The method of claim 2, wherein preprocessing the text comprises: removing redundant information from the text, retaining only the character information, splicing the character information into a text sequence, and using separators as identifiers at the splicing gaps; the redundant information includes one or more of: emoticons, special characters, and uniform resource locators.
5. The method for detecting network rumors based on multi-modal relationships according to claim 1, wherein before the semantic vector is extracted through the gated recurrent unit, word features are vectorized using pre-trained GloVe, the preprocessed text is expressed in matrix form, and feature extraction is performed with the gated recurrent unit to obtain the semantic vector.
6. The method of claim 1, wherein the updating the visual feature vectors and the semantic vectors by capturing importance of the visual feature vectors and the semantic vectors and implementing cross-modal association between images and texts through an attention mechanism comprises:
the visual feature vectors and semantic vectors are each treated as one modality's information; the importance of every (visual feature vector, semantic vector) pair is extracted through an attention mechanism, information flows between the modalities according to this importance so as to update each modality's information, and cross-modal association between the image and the text is realized through the information-flow process; the operation is as follows:
respectively carrying out linear transformation on the visual feature vector and the semantic vector to obtain the k, q and v values required by the attention mechanism, and then obtaining the inter-modal attention weights through vector inner products:

InterAtt_E→V = softmax(q_V · k_E^T / √dim)
InterAtt_V→E = softmax(q_E · k_V^T / √dim)

wherein E represents the semantic vector and V represents the visual feature vector; the k, q and v values are the variables inherent to the attention mechanism, namely the key value, the query value and the context vector, respectively; q_V, k_V represent the q and k values of the visual feature vector V, and q_E, k_E represent the q and k values of the semantic vector E; dim represents the vector dimension; InterAtt_E→V and InterAtt_V→E represent, in turn, the attention weight from the semantic vector to the visual feature matrix and the attention weight from the visual feature matrix to the semantic vector;
and then updating the feature vector of each modality with the other modality's information according to the attention weights, thereby realizing the information flow between different modalities:

V′ = InterAtt_E→V × v_E
E′ = InterAtt_V→E × v_V

wherein v_E and v_V represent the v values of the semantic vector E and the visual feature vector V, respectively;
then the updated visual feature vector V′ and semantic vector E′ are concatenated with the original visual feature vector V and semantic vector E and passed through a fully connected layer to obtain the visual feature vector V* and the semantic feature vector E*.
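The inter-modal update of claim 6 can be sketched as follows in NumPy; the random projection matrices stand in for the learned linear transformations, and the function and variable names (`inter_modal_attention`, `V_star`, `E_star`) are assumptions of this sketch:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inter_modal_attention(V, E, dim, seed=0):
    """Cross-modal attention update between visual (V) and semantic (E) vectors.

    V: (K, dv) visual feature vectors; E: (L, de) semantic vectors.
    Each modality is linearly mapped to q, k, v values; inner products give
    the inter-modal attention weights, which pull the other modality's v
    values into the update; the result is concatenated with the original
    vectors and passed through a fully connected layer to give V*, E*.
    """
    rng = np.random.default_rng(seed)
    def lin(d_in):                      # random stand-in for a learned linear layer
        return rng.normal(scale=d_in ** -0.5, size=(d_in, dim))
    dv, de = V.shape[1], E.shape[1]
    qV, kV, vV = V @ lin(dv), V @ lin(dv), V @ lin(dv)
    qE, kE, vE = E @ lin(de), E @ lin(de), E @ lin(de)
    att_E2V = softmax(qV @ kE.T / np.sqrt(dim))  # InterAtt_{E->V}: (K, L)
    att_V2E = softmax(qE @ kV.T / np.sqrt(dim))  # InterAtt_{V->E}: (L, K)
    V_upd = att_E2V @ vE                         # V' updated with semantic info
    E_upd = att_V2E @ vV                         # E' updated with visual info
    # Fully connected layer over [original ; updated] -> V*, E*
    V_star = np.concatenate([V, V_upd], axis=1) @ lin(dv + dim)
    E_star = np.concatenate([E, E_upd], axis=1) @ lin(de + dim)
    return V_star, E_star

V = np.random.default_rng(1).normal(size=(3, 5))  # 3 object regions, 5-dim features
E = np.random.default_rng(2).normal(size=(4, 7))  # 4 words, 7-dim semantics
V_star, E_star = inter_modal_attention(V, E, dim=6)
print(V_star.shape, E_star.shape)  # (3, 6) (4, 6)
```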
7. The method of claim 6, wherein the step of updating the visual feature vector and the semantic vector again through the attention mechanism, on the basis of the updated visual feature vector and semantic vector, so as to model the relationship of the internal dynamic information of each modality respectively, comprises:
pooling and affine transformation are respectively applied to the visual feature vector V* and the semantic vector E*, down to the dimensions of the k, q and v values, and the channel-wise conditional gate vectors M_V→E and M_E→V are then computed to introduce the other modality's information:

M_V→E = Sigmoid(Linear(V*_pool))
M_E→V = Sigmoid(Linear(E*_pool))

wherein Linear(V*_pool) and Linear(E*_pool) represent the results of pooling and affine transformation of the visual feature vector V* and the semantic vector E*, respectively; Sigmoid represents the sigmoid function;
the two channel-wise conditional gate vectors modulate the k and q values of the two modalities, each channel being activated or deactivated by the conditional gate of the other modality; the updated k and q values are:

q̂_V = M_E→V ⊙ q_V*,  k̂_V = M_E→V ⊙ k_V*
q̂_E = M_V→E ⊙ q_E*,  k̂_E = M_V→E ⊙ k_E*

wherein ⊙ represents channel-wise multiplication; q̂_V and k̂_V represent the updated q and k values of the visual feature vector, and q̂_E and k̂_E represent the updated q and k values of the semantic vector; q_V* and k_V* represent the q and k values of the input visual feature vector V*, and q_E* and k_E* represent those of the input semantic vector E*;
after the updated k and q values are obtained, weights are generated with the attention mechanism, and the respective internal dynamic information of the visual feature vector and of the semantic vector is updated.
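The channel-wise gating of claim 7 can be sketched as follows; again the random matrices stand in for trained weights, and the name `conditional_gates` and the mean-pooling choice are assumptions of this sketch (the claim fixes pooling plus affine transformation, not the pooling operator):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conditional_gates(V_star, E_star, dim, seed=0):
    """Channel-wise conditional gates modulating the other modality's q, k values.

    Each modality is mean-pooled over its vectors and affinely mapped to a
    gate vector in (0, 1); the gate derived from one modality then scales
    (activates or suppresses, channel by channel) the q and k values of
    the other modality, as claim 7 describes.
    """
    rng = np.random.default_rng(seed)
    lin = lambda d: rng.normal(scale=d ** -0.5, size=(d, dim))  # stand-in linear layer
    dv, de = V_star.shape[1], E_star.shape[1]
    # Pooling + affine transformation, squashed by a sigmoid -> gate vectors.
    M_V2E = sigmoid(V_star.mean(axis=0) @ lin(dv))  # gates E's q, k channels
    M_E2V = sigmoid(E_star.mean(axis=0) @ lin(de))  # gates V's q, k channels
    qV, kV = V_star @ lin(dv), V_star @ lin(dv)
    qE, kE = E_star @ lin(de), E_star @ lin(de)
    # Channel-wise modulation by the other modality's gate.
    qV_hat, kV_hat = M_E2V * qV, M_E2V * kV
    qE_hat, kE_hat = M_V2E * qE, M_V2E * kE
    return qV_hat, kV_hat, qE_hat, kE_hat

V_star = np.random.default_rng(1).normal(size=(3, 6))
E_star = np.random.default_rng(2).normal(size=(4, 6))
qV_hat, kV_hat, qE_hat, kE_hat = conditional_gates(V_star, E_star, dim=6)
print(qV_hat.shape, qE_hat.shape)  # (3, 6) (4, 6)
```

The gated q̂ and k̂ values would then feed an ordinary intra-modal attention step over each modality's own vectors.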
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911357589X | 2019-12-25 | ||
CN201911357589 | 2019-12-25 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111079444A true CN111079444A (en) | 2020-04-28 |
CN111079444B CN111079444B (en) | 2020-09-29 |
Family
ID=70318707
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911379313.1A Active CN111079444B (en) | 2019-12-25 | 2019-12-27 | Network rumor detection method based on multi-modal relationship |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111079444B (en) |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111582587A (en) * | 2020-05-11 | 2020-08-25 | 深圳赋乐科技有限公司 | Prediction method and prediction system for video public sentiment |
CN111611981A (en) * | 2020-06-28 | 2020-09-01 | 腾讯科技(深圳)有限公司 | Information identification method and device and information identification neural network training method and device |
CN111797326A (en) * | 2020-05-27 | 2020-10-20 | 中国科学院计算技术研究所 | False news detection method and system fusing multi-scale visual information |
CN111967277A (en) * | 2020-08-14 | 2020-11-20 | 厦门大学 | Translation method based on multi-modal machine translation model |
CN111985369A (en) * | 2020-08-07 | 2020-11-24 | 西北工业大学 | Course field multi-modal document classification method based on cross-modal attention convolution neural network |
CN112015955A (en) * | 2020-09-01 | 2020-12-01 | 清华大学 | Multi-mode data association method and device |
CN112035670A (en) * | 2020-09-09 | 2020-12-04 | 中国科学技术大学 | Multi-modal rumor detection method based on image emotional tendency |
CN112035669A (en) * | 2020-09-09 | 2020-12-04 | 中国科学技术大学 | Social media multi-modal rumor detection method based on propagation heterogeneous graph modeling |
CN112199606A (en) * | 2020-10-30 | 2021-01-08 | 福州大学 | Social media-oriented rumor detection system based on hierarchical user representation |
CN112200197A (en) * | 2020-11-10 | 2021-01-08 | 天津大学 | Rumor detection method based on deep learning and multi-mode |
CN112528015A (en) * | 2020-10-26 | 2021-03-19 | 复旦大学 | Method and device for judging rumor in message interactive transmission |
CN112926569A (en) * | 2021-03-16 | 2021-06-08 | 重庆邮电大学 | Method for detecting natural scene image text in social network |
CN113221872A (en) * | 2021-05-28 | 2021-08-06 | 北京理工大学 | False news detection method for generating convergence of countermeasure network and multi-mode |
CN113239730A (en) * | 2021-04-09 | 2021-08-10 | 哈尔滨工业大学 | Method for automatically eliminating structural false modal parameters based on computer vision |
CN113239926A (en) * | 2021-06-17 | 2021-08-10 | 北京邮电大学 | Multi-modal false information detection model based on countermeasures |
CN113434684A (en) * | 2021-07-01 | 2021-09-24 | 北京中科研究院 | Rumor detection method, system, equipment and storage medium for self-supervision learning |
CN113469214A (en) * | 2021-05-20 | 2021-10-01 | 中国科学院自动化研究所 | False news detection method and device, electronic equipment and storage medium |
CN113688955A (en) * | 2021-10-25 | 2021-11-23 | 北京世纪好未来教育科技有限公司 | Text recognition method, device, equipment and medium |
CN113743522A (en) * | 2021-09-13 | 2021-12-03 | 五八同城信息技术有限公司 | Detection method and device for illegal behavior and electronic equipment |
CN113822224A (en) * | 2021-10-12 | 2021-12-21 | 中国人民解放军国防科技大学 | Rumor detection method and device integrating multi-modal learning and multi-granularity structure learning |
CN114417001A (en) * | 2022-03-29 | 2022-04-29 | 山东大学 | Chinese writing intelligent analysis method, system and medium based on multi-mode |
CN115809327A (en) * | 2023-02-08 | 2023-03-17 | 四川大学 | Real-time social network rumor detection method for multi-mode fusion and topics |
CN117574261A (en) * | 2023-10-19 | 2024-02-20 | 重庆理工大学 | Multi-field false news reader cognition detection method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103902621A (en) * | 2012-12-28 | 2014-07-02 | 深圳先进技术研究院 | Method and device for identifying network rumor |
CN105045857A (en) * | 2015-07-09 | 2015-11-11 | 中国科学院计算技术研究所 | Social network rumor recognition method and system |
CN109241379A (en) * | 2017-07-11 | 2019-01-18 | 北京交通大学 | A method of across Modal detection network navy |
CN110019812A (en) * | 2018-02-27 | 2019-07-16 | 中国科学院计算技术研究所 | A kind of user is from production content detection algorithm and system |
Non-Patent Citations (1)
Title |
---|
SHI Lei et al.: "Sentiment Analysis Model Combining the Self-Attention Mechanism and Tree-LSTM", Journal of Chinese Computer Systems (《小型微型计算机系统》) * |
Also Published As
Publication number | Publication date |
---|---|
CN111079444B (en) | 2020-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111079444B (en) | Network rumor detection method based on multi-modal relationship | |
Kumar et al. | Sentiment analysis of multimodal twitter data | |
Li et al. | Visual to text: Survey of image and video captioning | |
Kumar et al. | Identifying clickbait: A multi-strategy approach using neural networks | |
Roy et al. | Automated detection of substance use-related social media posts based on image and text analysis | |
Salur et al. | A soft voting ensemble learning-based approach for multimodal sentiment analysis | |
Lin et al. | Detecting multimedia generated by large ai models: A survey | |
CN111783903A (en) | Text processing method, text model processing method and device and computer equipment | |
CN116955707A (en) | Content tag determination method, device, equipment, medium and program product | |
Liu et al. | Correlation identification in multimodal weibo via back propagation neural network with genetic algorithm | |
Peng et al. | An effective strategy for multi-modal fake news detection | |
CN117251551A (en) | Natural language processing system and method based on large language model | |
Illendula et al. | Which emoji talks best for my picture? | |
CN117131923A (en) | Back door attack method and related device for cross-modal learning | |
CN116756363A (en) | Strong-correlation non-supervision cross-modal retrieval method guided by information quantity | |
Kim et al. | A deep learning approach for identifying user interest from targeted advertising | |
Wieczorek et al. | Semantic Image-Based Profiling of Users' Interests with Neural Networks | |
CN113283535B (en) | False message detection method and device integrating multi-mode characteristics | |
Shan et al. | Multimodal Social Media Fake News Detection Based on Similarity Inference and Adversarial Networks. | |
CN114443916A (en) | Supply and demand matching method and system for test data | |
CN114817697A (en) | Method and device for determining label information, electronic equipment and storage medium | |
CN117972497B (en) | False information detection method and system based on multi-view feature decomposition | |
Neela et al. | An Ensemble Learning Frame Work for Robust Fake News Detection | |
Li | Multimodal visual pattern mining with convolutional neural networks | |
CN116522895B (en) | Text content authenticity assessment method and device based on writing style |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||