CN111079444B - Network rumor detection method based on multi-modal relationship - Google Patents


Info

Publication number
CN111079444B
Authority
CN
China
Prior art keywords
vector
visual feature
feature vector
information
semantic
Prior art date
Legal status
Active
Application number
CN201911379313.1A
Other languages
Chinese (zh)
Other versions
CN111079444A (en)
Inventor
Zhang Yongdong
Mao Zhendong
Deng Xuran
Zhao Bowen
Current Assignee
Beijing Zhongke Research Institute
University of Science and Technology of China USTC
Original Assignee
Beijing Zhongke Research Institute
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by Beijing Zhongke Research Institute and University of Science and Technology of China (USTC)
Publication of CN111079444A (application publication, 2020-04-28)
Application granted; publication of CN111079444B (granted publication, 2020-09-29)
Legal status: Active

Classifications

    • G06F18/2433 — Pattern recognition; classification techniques; single-class perspective, e.g. one-against-all classification; novelty detection; outlier detection
    • G06F18/253 — Pattern recognition; fusion techniques of extracted features
    • G06N3/045 — Neural networks; architecture; combinations of networks
    • G06N3/08 — Neural networks; learning methods
    • G06Q50/01 — ICT specially adapted for specific business sectors; social networking
    • G06V10/464 — Image or video recognition; salient features, e.g. scale invariant feature transforms [SIFT], using a plurality of salient features, e.g. bag-of-words [BoW] representations

Abstract

The invention discloses a network rumor detection method based on multi-modal relationships, which comprises the following steps: acquiring an image to be detected and its related text published on a network platform; extracting visual feature vectors containing objects of different classes in the image through a pre-trained Faster R-CNN model; after preprocessing the text, extracting semantic vectors through a GRU; capturing the importance degrees of the visual feature vectors and the semantic vectors through an attention mechanism and realizing cross-modal association between the image and the text, so as to update the visual feature vectors and the semantic vectors; for the visual feature vectors and the semantic vectors, modeling the relationships of internal dynamic information respectively through an attention mechanism, so as to update them again; and concatenating the re-updated visual feature vectors with the semantic vectors, and obtaining the probabilities that the information to be detected is of the rumor class and of the real class through a binary classifier. The method can automatically judge whether the information to be detected belongs to a network rumor or not, and has high detection accuracy.

Description

Network rumor detection method based on multi-modal relationship
Technical Field
The invention relates to the technical field of network space security, in particular to a network rumor detection method based on multi-modal relations.
Background
The rise of the network society brings opportunities and challenges together. In particular, the low threshold for Internet access and the freedom of information dissemination seriously affect the stability of cyberspace, and the unchecked spread of network rumors is one of the problems that must be taken seriously. Today's social networking platforms have well over hundreds of millions of highly active users; information spreads widely and rapidly, free of time and space constraints, and the platforms' magnifier effect multiplies the impact of information. Sensitive topics, focus events, hot-spot issues, major public events and emergencies can become widely known within days, and may cause loss of trust, damage to government and corporate images, and boiling public grievances. Automatic and rapid detection of network rumors is therefore of great importance to cyberspace security.
With the development of multimedia technology, both self-media and professional media have shifted to multimedia news forms based on pictures, text and short videos. Multimedia content carries richer and more intuitive information, describes news events better, and is more easily and widely disseminated. Studies have shown that the average number of reposts of media posts containing images is 11 times that of plain text. As such, false news and rumors often use highly provocative pictures to attract and mislead readers, spreading quickly and widely; this has made detection of visual-modality content a non-negligible part of dealing with the network rumor challenge.
Traditional false-content detection based on visual-modality content mainly uses hand-crafted features, such as visual clarity, visual similarity histograms and double JPEG (Joint Photographic Experts Group) compression traces. These features usually work well for crude picture tampering, but with the continuous improvement of image generation technology, such methods can no longer guarantee precision, and their resource costs rise significantly.
In recent years, with the rapid development of neural networks and deep learning models, corresponding detection technologies have emerged and achieved great success. In false-information detection, multi-modal methods that use text and visual modality information simultaneously to distinguish the authenticity of news have also appeared; representative prior work includes attRNN, EANN and MVAE. Although these methods provide heuristic approaches to detecting false information in multi-modal form, they still have significant drawbacks. First, the extraction of image and text information is still coarse, especially the semantic features of the picture; second, in the feature fusion stage, the features of the two modalities are simply concatenated, making it difficult to express the interaction and association between the modalities.
Disclosure of Invention
The invention aims to provide a network rumor detection method based on multi-modal relationships, which can automatically judge whether information to be detected belongs to a network rumor and has high detection accuracy.
The purpose of the invention is realized by the following technical scheme:
a network rumor detection method based on multi-modal relations comprises the following steps:
acquiring information to be detected, including images and related texts, issued on a network platform;
for the image, extracting visual feature vectors containing objects of different classes in the image through a pre-trained Faster R-CNN model;
for the text, after preprocessing, extracting semantic vectors through a gated recurrent unit (GRU);
capturing the importance degrees of the visual feature vectors and the semantic vectors through an attention mechanism and realizing cross-modal association between the image and the text, so as to update the visual feature vectors and the semantic vectors; based on the updated visual feature vectors and semantic vectors, modeling the relationships of internal dynamic information respectively through an attention mechanism, so as to update the visual feature vectors and semantic vectors again; and concatenating the re-updated visual feature vectors with the semantic vectors, and obtaining the probabilities that the information to be detected is of the rumor class and of the real class through a binary classifier.
According to the technical scheme provided by the invention, multi-modal feature fusion examines the text information and the image information jointly, yielding higher accuracy. Meanwhile, unlike other multi-modal methods that use attention mechanisms, the method also attends to information within each modality, so the model can integrate richer information relationships. The method can obtain accurate detection results with only a single post as input, enabling rapid detection and handling at the initial stage of rumor propagation.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic diagram of a model structure of a network rumor detection method based on a multi-modal relationship according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a network rumor detection method based on multi-modal relationships. In the feature extraction stage, image features are extracted with a Faster R-CNN-based object detection model, which attends to specific targets and salient regions in the image. In the feature fusion stage, unlike prior art that attends only to the relationship between images and texts, the method also applies an attention mechanism to information within the same modality; the advantage is that intra-modal related information can supplement the inter-modal information. The method achieves good results on the Weibo RumorSet data set and can identify false-information cases that are difficult to distinguish with the single-modality traditional schemes.
As shown in fig. 1, a schematic model structure diagram of a network rumor detection method based on multi-modal relationships according to an embodiment of the present invention mainly includes the following five parts:
1. multimodal data acquisition.
In the embodiment of the invention, the information to be detected, including images and related texts, issued on the network platform is acquired.
Illustratively, the information may be acquired from a social media platform, e.g., a microblogging platform.
In the embodiment of the invention, the related text includes the text contained in the information to be detected and the text attached when other users forward the information to be detected. For example, microblog information acquired from the microblog platform includes, in addition to the text of the microblog itself, the text attached when other users forward it.
2. And (5) extracting visual features.
In the embodiment of the invention, for the image, the visual feature vectors containing objects of different classes in the image are extracted through a Faster R-CNN model pre-trained on Visual Genome.
The Faster R-CNN model is a classical model commonly used in object detection. For a given picture I, the model outputs object-level information in the picture, namely the visual feature vectors of objects of different classes, V = {v_1, v_2, ..., v_K}, where v_i (i = 1, 2, ..., K) represents the visual feature vector of one object and K is the total number of feature vectors (36 here). Illustratively, the visual feature vectors V may form a K × 2048-dimensional visual feature matrix. Compared with extracting features from the whole image, this method focuses more on objects and other salient regions of the image.
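For illustration only, the object-level extraction can be sketched as follows in PyTorch. The sketch assumes torchvision's COCO-pretrained Faster R-CNN in place of the Visual-Genome-pretrained detector used in the embodiment, and obtains 2048-dimensional region features by encoding each detected region with a headless ResNet-50; the function name extract_visual_features and the crop-and-encode strategy are illustrative assumptions, not the patent's exact pipeline:

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights
from torchvision.models.detection import (fasterrcnn_resnet50_fpn,
                                          FasterRCNN_ResNet50_FPN_Weights)
from torchvision.transforms.functional import normalize, resized_crop

K = 36  # number of object regions kept, as in the embodiment

# COCO-pretrained detector standing in for the Visual-Genome-pretrained model.
detector = fasterrcnn_resnet50_fpn(weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT).eval()
# ResNet-50 with its classifier removed yields one 2048-d vector per region crop.
encoder = resnet50(weights=ResNet50_Weights.DEFAULT)
encoder.fc = torch.nn.Identity()
encoder.eval()

@torch.no_grad()
def extract_visual_features(image):            # image: 3 x H x W float tensor in [0, 1]
    boxes = detector([image])[0]["boxes"][:K]  # top-K detections, sorted by score
    crops = [                                  # crop, resize and normalize each region
        normalize(resized_crop(image, int(y1), int(x1),
                               max(int(y2 - y1), 1), max(int(x2 - x1), 1), [224, 224]),
                  mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        for x1, y1, x2, y2 in boxes.tolist()
    ]
    return encoder(torch.stack(crops))         # at most K x 2048 feature matrix V
```

The sketch assumes at least one detection; a production pipeline would pad the matrix or fall back to whole-image features when fewer than K regions are found.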
3. And preprocessing the text and extracting the features.
In the embodiment of the invention, after the text is preprocessed, the semantic vector is extracted through a gated recurrent unit (GRU). It should be noted that the text in the dashed box at the lower left corner of fig. 1 is only schematic.
1) And (4) preprocessing.
For text, the complexity and disorder of social media information introduce much useless redundant information, such as emoticons, special characters and URLs (uniform resource locators), so preprocessing is required. Specifically, all redundant information such as URLs, special characters and emoticons is discarded, only the remaining character information is retained, and the remaining character information is then spliced into a text sequence with separators used as identifiers at the splicing gaps. For example, after the text in a microblog is preprocessed, only the remaining text information is retained, and the remaining text of the source microblog and of the subsequent forwarded microblogs is spliced in order into a sequence L.
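A minimal preprocessing sketch is given below; the separator token [SEP] and the character classes kept are assumptions, since the patent names the separator only as an identifier:

```python
import re

SEP = "[SEP]"  # illustrative separator; the patent does not name a specific token

URL_RE = re.compile(r"https?://\S+")
# Keep CJK characters, letters, digits and basic CJK punctuation; drop
# emoticons, special symbols and everything else (an assumed character set).
KEEP_RE = re.compile(r"[^0-9A-Za-z\u4e00-\u9fff，。！？、]")

def preprocess(texts):
    """Clean the source post and its forwarded posts, then splice the
    remaining character information into one sequence L with separators."""
    cleaned = []
    for t in texts:
        t = URL_RE.sub("", t)        # remove URLs first
        t = KEEP_RE.sub("", t)       # then remove special characters/emoticons
        if t:
            cleaned.append(t)
    return f" {SEP} ".join(cleaned)  # sequence L
```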
2) And (5) feature extraction.
Statistically, 98% of the texts in the data set are no longer than 150 characters after preprocessing, so to limit computational cost a text sequence L contains at most 150 words; longer texts are truncated and shorter ones are padded. Word features are then vectorized with pre-trained GloVe vectors (pre-trained on Chinese Wikipedia for Chinese), the preprocessed text is expressed in matrix form, and a gated recurrent unit (GRU) performs feature extraction to obtain the semantic vector E.
Illustratively, the word-feature vectorization yields a 150 × 300 matrix, and the hidden state size of the GRU is 512.
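A sketch of this encoding step, assuming the 150 × 300 GloVe matrix and the 512-dimensional GRU hidden state stated above (the random input merely stands in for real GloVe vectors):

```python
import torch
import torch.nn as nn

MAX_LEN, EMB_DIM, HIDDEN = 150, 300, 512    # values stated in the embodiment

gru = nn.GRU(input_size=EMB_DIM, hidden_size=HIDDEN, batch_first=True)

def encode_text(token_vectors):             # token_vectors: B x 150 x 300 GloVe matrix
    outputs, _ = gru(token_vectors)         # B x 150 x 512, one state per word
    return outputs                          # semantic vectors E

E = encode_text(torch.randn(1, MAX_LEN, EMB_DIM))  # toy stand-in for GloVe input
```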
4. And (5) feature fusion.
1) Information circulation and interaction among the modalities.
In the embodiment of the invention, the importance degrees of the visual feature vectors and the semantic vectors are captured through an attention mechanism, and cross-modal association between the image and the text is realized, so as to update the visual feature vectors and the semantic vectors. Specifically, the visual feature vectors and the semantic vectors are each taken as one modality's information; for the inter-modal information, the importance degree of each (visual feature vector, semantic vector) pair is extracted through an attention mechanism, information flows between the modalities according to these importance degrees so as to update each modality's information, and the information-flow process realizes the cross-modal association between the image and the text. The operation process is as follows:
The visual feature vector V and the semantic vector E are each linearly transformed to obtain the k, q and v values required by the attention mechanism, and the inter-modal attention weights are then obtained through vector inner products:
$$\mathrm{InterAtt}_{E \to V} = \mathrm{softmax}\!\left(\frac{q_V k_E^{\top}}{\sqrt{\dim}}\right)$$
$$\mathrm{InterAtt}_{V \to E} = \mathrm{softmax}\!\left(\frac{q_E k_V^{\top}}{\sqrt{\dim}}\right)$$
wherein E represents the semantic vector and V represents the visual feature vector; q_V, k_V represent the q and k values of the visual feature vector V, q_E, k_E represent the q and k values of the semantic vector E, and dim represents the vector dimension; InterAtt_{E→V} and InterAtt_{V→E} denote, in turn, the attention weight from the semantic vector to the visual feature vector and the attention weight from the visual feature vector to the semantic vector. These two bidirectional matrices contain the important information between paired image regions and words.
As will be understood by those skilled in the art, the k, q and v values are the inherent variables of the attention mechanism, namely the key, the query and the value. In brief, the attention mechanism computes the similarity between the query of the modality being updated and the key of the other modality, normalizes it to obtain the attention weight, and then multiplies by the value of the other modality, yielding an attention update of the first modality's vectors; in this way information flows between the modalities:
$$V' = \mathrm{InterAtt}_{E \to V} \times v_E$$
$$E' = \mathrm{InterAtt}_{V \to E} \times v_V$$
wherein v_E and v_V represent the v values of the semantic vector E and the visual feature vector V, respectively.
Then, the updated visual feature vector V′ and semantic vector E′ are concatenated with the original visual feature vector V and semantic vector E and passed through a fully connected layer, giving the visual feature vector V* and semantic vector E*, which are input to the subsequent intra-modality association module to further learn information flow within each modality.
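The cross-modal step can be sketched as the following PyTorch module; the common projection dimension and the class name are assumptions (the patent fixes only the GRU size), and the softmax scaling follows the formulas reconstructed above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM = 512  # assumed common projection dimension for both modalities

class InterModalAttention(nn.Module):
    """Sketch of inter-modal information flow: q/k/v projections, bidirectional
    attention weights, value-weighted updates, and fusion with the originals."""
    def __init__(self, dim=DIM):
        super().__init__()
        self.q_v, self.k_v, self.v_v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.q_e, self.k_e, self.v_e = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.fuse_v = nn.Linear(2 * dim, dim)   # concat(V, V') -> V*
        self.fuse_e = nn.Linear(2 * dim, dim)   # concat(E, E') -> E*

    def forward(self, V, E):                    # V: B x K x dim, E: B x L x dim
        scale = V.size(-1) ** 0.5
        att_ev = F.softmax(self.q_v(V) @ self.k_e(E).transpose(1, 2) / scale, dim=-1)
        att_ve = F.softmax(self.q_e(E) @ self.k_v(V).transpose(1, 2) / scale, dim=-1)
        V_upd = att_ev @ self.v_e(E)            # V': text-to-image update
        E_upd = att_ve @ self.v_v(V)            # E': image-to-text update
        V_star = self.fuse_v(torch.cat([V, V_upd], dim=-1))
        E_star = self.fuse_e(torch.cat([E, E_upd], dim=-1))
        return V_star, E_star
```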
2) Dynamic information relationship modeling within a modality.
For the input visual feature vector V* and semantic vector E*, the relationships of internal dynamic information are modeled separately through an attention mechanism, serving as supplementary information to the cross-modal association, and the visual feature vector and the semantic vector are updated again.
In addition, the association of information within a modality should also be conditioned on the other modality; for example, image regions should be associated differently according to different words and phrases. To this end, the visual feature vector and the semantic vector are first pooled and affine-transformed to the same dimension as the k, q and v values, and then channel-wise conditional gate vectors M_{V→E}, M_{E→V} are computed to introduce the other modality's information:
$$M_{V \to E} = \mathrm{Sigmoid}(\mathrm{Linear}(V^*_{\mathrm{pool}}))$$
$$M_{E \to V} = \mathrm{Sigmoid}(\mathrm{Linear}(E^*_{\mathrm{pool}}))$$
wherein Linear(V*_pool) and Linear(E*_pool) are the results of pooling and affine-transforming the visual feature vector V* and the semantic vector E*, respectively, and Sigmoid denotes the Sigmoid function.
Next, the two channel-wise conditional gate vectors modulate the k and q values of the two modalities, each modality's k and q values being activated or suppressed by the conditional gate of the other modality. The updated k and q values are:
$$\hat{q}_V = (1 + M_{E \to V}) \odot q_{V^*},\qquad \hat{k}_V = (1 + M_{E \to V}) \odot k_{V^*}$$
$$\hat{q}_E = (1 + M_{V \to E}) \odot q_{E^*},\qquad \hat{k}_E = (1 + M_{V \to E}) \odot k_{E^*}$$
In the above formulas, \hat{q}_V and \hat{k}_V represent the re-updated q and k values of the visual feature vector, \hat{q}_E and \hat{k}_E represent the re-updated q and k values of the semantic vector, and q_{V^*}, k_{V^*} and q_{E^*}, k_{E^*} represent the q and k values of the input visual feature vector V* and semantic vector E*, respectively.
Those skilled in the art will understand that the result M of the Sigmoid function lies in the (0, 1) interval, so 1 + M lies in the (1, 2) interval; multiplying it element-wise with the original q and k values acts as a scaling, i.e., the "activation or deactivation" above. The updated q and k values are the q and k values conditioned on the other modality's information, which is equivalent to introducing the other modality's information.
After the re-updated k and q values are obtained, the attention mechanism generates the weights and the visual feature vector and semantic vector are updated, defined as:
$$\mathrm{IntraAtt}_{V \to V} = \mathrm{softmax}\!\left(\frac{\hat{q}_V \hat{k}_V^{\top}}{\sqrt{\dim}}\right),\qquad \mathrm{IntraAtt}_{E \to E} = \mathrm{softmax}\!\left(\frac{\hat{q}_E \hat{k}_E^{\top}}{\sqrt{\dim}}\right)$$
$$\hat{V} = \mathrm{IntraAtt}_{V \to V} \times v_{V^*},\qquad \hat{E} = \mathrm{IntraAtt}_{E \to E} \times v_{E^*}$$
wherein IntraAtt_{V→V} and IntraAtt_{E→E} denote, in turn, the attention weight inside the visual feature vector and the attention weight inside the semantic vector, v_{V^*} and v_{E^*} are the v values of the input visual feature vector V* and semantic vector E*, and the re-updated visual feature vector and semantic vector are \hat{V} and \hat{E}, respectively.
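Correspondingly, the gated intra-modal step can be sketched as follows; mean pooling and the class name are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM = 512  # assumed common dimension, as in the inter-modal sketch above

class GatedIntraModalAttention(nn.Module):
    """Sketch of self-attention within one modality whose q and k values are
    modulated by a channel-wise conditional gate from the other modality."""
    def __init__(self, dim=DIM):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)  # affine transform of the pooled other modality

    def forward(self, X, other):         # X: B x N x dim, other: B x M x dim
        m = torch.sigmoid(self.gate(other.mean(dim=1)))   # gate M in (0, 1), B x dim
        q = (1 + m).unsqueeze(1) * self.q(X)              # modulated q values
        k = (1 + m).unsqueeze(1) * self.k(X)              # modulated k values
        att = F.softmax(q @ k.transpose(1, 2) / X.size(-1) ** 0.5, dim=-1)
        return att @ self.v(X)           # re-updated vectors of this modality
```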
in the specific implementation process, information circulation and interaction among the modes and dynamic information relation modeling in the modes can be realized by one sub-module respectively, the two sub-modules form one basic module, the three basic modules are stacked to obtain final visual and semantic vectors, and finally, the visual feature vectors and the semantic vectors are point-multiplied together to obtain final fusion feature vectors (multi-mode feature information).
5. And outputting the classification.
The network rumor detection problem is treated as a classification problem: the finally fused multi-modal feature information is input into a multi-layer perceptron serving as a binary classifier, and the probabilities that the information to be detected is of the rumor class and of the real class are obtained through a Softmax function.
In the embodiment of the invention, the whole method is regarded as one model; the loss function in the training process may be the cross-entropy loss function, and through training the classifier learns to distinguish the rumor class from the real class from the multi-modal feature information.
After the probabilities of the rumor class and the real class are obtained, the final detection result can be determined in a conventional manner, for example by a set threshold: since there are only two classes, when the probability of a class exceeds 0.5, the result can be judged to belong to that class. A higher threshold may be set to obtain greater confidence: for example, if the probabilities of the rumor and real classes are (0.99, 0.01), i.e., the rumor probability is 99% and the real probability is 1%, and the rumor probability exceeds the set threshold (e.g., 90%), then the information to be detected can be judged a rumor with high confidence. The specific threshold value can be set by the skilled person according to actual conditions or experience.
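A sketch of the classification head and the threshold rule described above; the hidden size and the class ordering (rumor first) are illustrative assumptions:

```python
import torch
import torch.nn as nn

classifier = nn.Sequential(            # multi-layer perceptron as binary classifier
    nn.Linear(512, 128), nn.ReLU(),
    nn.Linear(128, 2),                 # two classes: rumor, real (assumed order)
)

def detect(fused, threshold=0.5):      # fused: B x 512 multi-modal feature
    probs = torch.softmax(classifier(fused), dim=-1)
    p_rumor = probs[:, 0]
    return p_rumor > threshold, p_rumor  # decision and rumor probability

# Training would minimize nn.CrossEntropyLoss() over rumor/real labels,
# matching the cross-entropy loss mentioned for the model.
```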
In the model shown in fig. 1, the loss function during training may be the cross-entropy loss function. The Weibo RumorSet data set may be used; its data were collected on the microblog platform, with the following distribution:

             Number of samples   Number of pictures
Real data    4779                5318
Rumor data   4748                7954

TABLE 1 Data set distribution
The scheme of the embodiment of the invention performs well on the data set shown in Table 1 and can identify false-information cases that are difficult to distinguish using a single modality in traditional schemes.
Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A network rumor detection method based on multi-modal relations is characterized by comprising the following steps:
acquiring information to be detected, including images and related texts, issued on a network platform;
for the image, extracting visual feature vectors containing objects of different classes in the image through a pre-trained Faster R-CNN model;
for the text, after preprocessing, extracting semantic vectors through a gated recurrent unit;
capturing the importance degrees of the visual feature vectors and the semantic vectors through an attention mechanism and realizing cross-modal association between the image and the text, so as to update the visual feature vectors and the semantic vectors; based on the updated visual feature vectors and semantic vectors, modeling the relationships of internal dynamic information respectively through an attention mechanism, so as to update the visual feature vectors and semantic vectors again; concatenating the re-updated visual feature vectors with the semantic vectors, and obtaining the probabilities that the information to be detected is of the rumor class and of the real class through a binary classifier;
wherein capturing the importance degrees of the visual feature vectors and the semantic vectors through the attention mechanism and realizing cross-modal association between the image and the text, so as to update the visual feature vectors and the semantic vectors, comprises:
taking the visual feature vectors and the semantic vectors each as one modality's information, extracting the importance degree of each (visual feature vector, semantic vector) pair through the attention mechanism, realizing information flow between the modalities according to the importance degrees so as to update each modality's information, and realizing the cross-modal association between the image and the text through the information-flow process; the operation process is as follows:
linearly transforming the visual feature vectors and the semantic vectors respectively to obtain the k, q and v values required by the attention mechanism, and then obtaining the inter-modal attention weights through vector inner products:
$$\mathrm{InterAtt}_{E \to V} = \mathrm{softmax}\!\left(\frac{q_V k_E^{\top}}{\sqrt{\dim}}\right)$$
$$\mathrm{InterAtt}_{V \to E} = \mathrm{softmax}\!\left(\frac{q_E k_V^{\top}}{\sqrt{\dim}}\right)$$
wherein E represents the semantic vector and V represents the visual feature vector; the k, q and v values are the inherent variables of the attention mechanism, namely the key, the query and the value; q_V, k_V represent the q and k values of the visual feature vector V, q_E, k_E represent the q and k values of the semantic vector E, and dim represents the vector dimension; InterAtt_{E→V} and InterAtt_{V→E} denote, in turn, the attention weight from the semantic vector to the visual feature vector and the attention weight from the visual feature vector to the semantic vector;
and then updating each modality's feature vectors with the other modality's information according to the attention weights, realizing information flow between the different modalities:
$$V' = \mathrm{InterAtt}_{E \to V} \times v_E$$
$$E' = \mathrm{InterAtt}_{V \to E} \times v_V$$
wherein v_E and v_V represent the v values of the semantic vector E and the visual feature vector V, respectively;
then connecting the updated visual feature vector V′ and semantic vector E′ in series with the original visual feature vector V and semantic vector E through a fully connected layer to obtain the visual feature vector V* and the semantic vector E*.
2. The method of claim 1, wherein the visual feature vectors containing objects of different classes are expressed as V = {v_1, v_2, ..., v_K}, wherein v_i represents the visual feature vector of one object, i = 1, 2, ..., K, and K represents the total number of feature vectors.
3. The method of claim 1, wherein the associated text comprises: the text contained in the information to be detected and the text attached when other users forward the information to be detected.
4. The method of claim 2, wherein preprocessing the text comprises: removing redundant information from the text, retaining only the character information, splicing the character information into a text sequence, and using separators as identifiers at the splicing gaps; the redundant information comprises one or more of the following: symbolic emoticons, special characters, uniform resource locators.
5. The method of claim 1, wherein, before the semantic vector is extracted through the gated recurrent unit, word features are vectorized with pre-trained GloVe vectors, the preprocessed text is expressed in matrix form, and feature extraction is performed with the gated recurrent unit to obtain the semantic vector.
6. The method of claim 1, wherein, based on the updated visual feature vector and semantic vector, modeling the relationships of internal dynamic information respectively through an attention mechanism so as to update the visual feature vector and the semantic vector again comprises:
pooling and affine-transforming the visual feature vector V* and the semantic vector E* respectively to the same dimension as the k, q and v values, and then computing channel-wise conditional gate vectors M_{V→E}, M_{E→V} to introduce the other modality's information:
$$M_{V \to E} = \mathrm{Sigmoid}(\mathrm{Linear}(V^*_{\mathrm{pool}}))$$
$$M_{E \to V} = \mathrm{Sigmoid}(\mathrm{Linear}(E^*_{\mathrm{pool}}))$$
wherein Linear(V*_pool) and Linear(E*_pool) are the results of pooling and affine-transforming the visual feature vector V* and the semantic vector E*, respectively, and Sigmoid represents the Sigmoid function;
the two channel-wise conditional gate vectors modulate the k and q values of the two modalities, each modality's k and q values being activated or suppressed by the conditional gate of the other modality; the updated k and q values are:
$$\hat{q}_V = (1 + M_{E \to V}) \odot q_{V^*},\qquad \hat{k}_V = (1 + M_{E \to V}) \odot k_{V^*}$$
$$\hat{q}_E = (1 + M_{V \to E}) \odot q_{E^*},\qquad \hat{k}_E = (1 + M_{V \to E}) \odot k_{E^*}$$
in the above formulas, \hat{q}_V and \hat{k}_V represent the re-updated q and k values of the visual feature vector, \hat{q}_E and \hat{k}_E represent the re-updated q and k values of the semantic vector, and q_{V^*}, k_{V^*} and q_{E^*}, k_{E^*} represent the q and k values of the input visual feature vector V* and semantic vector E*, respectively;
after the re-updated k and q values are obtained, generating weights with the attention mechanism and updating the respective internal dynamic information of the visual feature vector and the semantic vector:
$$\mathrm{IntraAtt}_{V \to V} = \mathrm{softmax}\!\left(\frac{\hat{q}_V \hat{k}_V^{\top}}{\sqrt{\dim}}\right),\qquad \mathrm{IntraAtt}_{E \to E} = \mathrm{softmax}\!\left(\frac{\hat{q}_E \hat{k}_E^{\top}}{\sqrt{\dim}}\right)$$
$$\hat{V} = \mathrm{IntraAtt}_{V \to V} \times v_{V^*},\qquad \hat{E} = \mathrm{IntraAtt}_{E \to E} \times v_{E^*}$$
wherein IntraAtt_{V→V} and IntraAtt_{E→E} denote, in turn, the attention weight inside the visual feature vector and the attention weight inside the semantic vector, and v_{V^*} and v_{E^*} are the v values of the input visual feature vector V* and semantic vector E*; the re-updated visual feature vector and semantic vector are \hat{V} and \hat{E}, respectively.
CN201911379313.1A 2019-12-25 2019-12-27 Network rumor detection method based on multi-modal relationship Active CN111079444B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911357589X 2019-12-25
CN201911357589 2019-12-25

Publications (2)

Publication Number Publication Date
CN111079444A CN111079444A (en) 2020-04-28
CN111079444B (en) 2020-09-29

Family

ID: 70318707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911379313.1A Active CN111079444B (en) 2019-12-25 2019-12-27 Network rumor detection method based on multi-modal relationship

Country Status (1)

Country Link
CN (1) CN111079444B (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582587B (en) * 2020-05-11 2021-06-04 深圳赋乐科技有限公司 Prediction method and prediction system for video public sentiment
CN111797326B (en) * 2020-05-27 2023-05-12 中国科学院计算技术研究所 False news detection method and system integrating multi-scale visual information
CN111611981A (en) * 2020-06-28 2020-09-01 腾讯科技(深圳)有限公司 Information identification method and device and information identification neural network training method and device
CN111985369B (en) * 2020-08-07 2021-09-17 西北工业大学 Course field multi-modal document classification method based on cross-modal attention convolution neural network
CN111967277B (en) * 2020-08-14 2022-07-19 厦门大学 Translation method based on multi-modal machine translation model
CN112015955B (en) * 2020-09-01 2021-07-30 清华大学 Multi-mode data association method and device
CN112035669B (en) * 2020-09-09 2021-05-14 中国科学技术大学 Social media multi-modal rumor detection method based on propagation heterogeneous graph modeling
CN112035670B (en) * 2020-09-09 2021-05-14 中国科学技术大学 Multi-modal rumor detection method based on image emotional tendency
CN112528015B (en) * 2020-10-26 2022-11-18 复旦大学 Method and device for judging rumor in message interactive transmission
CN112199606B (en) * 2020-10-30 2022-06-03 福州大学 Social media-oriented rumor detection system based on hierarchical user representation
CN112200197A (en) * 2020-11-10 2021-01-08 天津大学 Rumor detection method based on deep learning and multi-mode
CN112926569B (en) * 2021-03-16 2022-10-18 重庆邮电大学 Method for detecting natural scene image text in social network
CN113239730B (en) * 2021-04-09 2022-04-05 哈尔滨工业大学 Method for automatically eliminating structural false modal parameters based on computer vision
CN113469214A (en) * 2021-05-20 2021-10-01 中国科学院自动化研究所 False news detection method and device, electronic equipment and storage medium
CN113221872B (en) * 2021-05-28 2022-09-20 北京理工大学 False news detection method for generating convergence of countermeasure network and multi-mode
CN113239926B (en) * 2021-06-17 2022-10-25 北京邮电大学 Multi-modal false information detection model system based on countermeasure
CN113434684B (en) * 2021-07-01 2022-03-08 北京中科研究院 Rumor detection method, system, equipment and storage medium for self-supervision learning
CN113743522A (en) * 2021-09-13 2021-12-03 五八同城信息技术有限公司 Detection method and device for illegal behavior and electronic equipment
CN113822224B (en) * 2021-10-12 2023-12-26 中国人民解放军国防科技大学 Rumor detection method and device integrating multi-mode learning and multi-granularity structure learning
CN113688955B (en) * 2021-10-25 2022-02-15 北京世纪好未来教育科技有限公司 Text recognition method, device, equipment and medium
CN114417001B (en) * 2022-03-29 2022-07-01 山东大学 Chinese writing intelligent analysis method, system and medium based on multi-mode
CN115809327B (en) * 2023-02-08 2023-05-05 四川大学 Real-time social network rumor detection method based on multimode fusion and topics

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902621A (en) * 2012-12-28 2014-07-02 深圳先进技术研究院 Method and device for identifying network rumor
CN105045857A (en) * 2015-07-09 2015-11-11 中国科学院计算技术研究所 Social network rumor recognition method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241379A (en) * 2017-07-11 2019-01-18 北京交通大学 A method of across Modal detection network navy
CN110019812B (en) * 2018-02-27 2021-08-20 中国科学院计算技术研究所 User self-production content detection method and system


Also Published As

Publication number Publication date
CN111079444A (en) 2020-04-28

Similar Documents

Publication Publication Date Title
CN111079444B (en) Network rumor detection method based on multi-modal relationship
Li et al. Visual to text: Survey of image and video captioning
Rohrbach et al. Grounding of textual phrases in images by reconstruction
Kumar et al. Identifying clickbait: A multi-strategy approach using neural networks
Ortis et al. Exploiting objective text description of images for visual sentiment analysis
CN111783903A (en) Text processing method, text model processing method and device and computer equipment
Liu et al. Fact-based visual question answering via dual-process system
Peng et al. An effective strategy for multi-modal fake news detection
CN113627550A (en) Image-text emotion analysis method based on multi-mode fusion
Lin et al. Detecting multimedia generated by large ai models: A survey
Illendula et al. Which emoji talks best for my picture?
Maynard et al. Entity-based opinion mining from text and multimedia
CN110895656A (en) Text similarity calculation method and device, electronic equipment and storage medium
Demidova et al. Semantic image-based profiling of users’ interests with neural networks
Kim et al. A deep learning approach for identifying user interest from targeted advertising
Li et al. Multi-modal fusion network for rumor detection with texts and images
CN114443916A (en) Supply and demand matching method and system for test data
Kumari et al. Emotion aided multi-task framework for video embedded misinformation detection
CN110765108A (en) False message early detection method based on crowd-sourcing data fusion
Li Multimodal visual pattern mining with convolutional neural networks
Garg et al. On-Device Document Classification using multimodal features
CN116522895B (en) Text content authenticity assessment method and device based on writing style
Shetty et al. Deep Learning Photograph Caption Generator
Srivastava et al. Improving scene text image captioning using transformer-based multilevel attention
CN113283535B (en) False message detection method and device integrating multi-mode characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant