CN112035670B - Multi-modal rumor detection method based on image emotional tendency - Google Patents
Multi-modal rumor detection method based on image emotional tendency
- Publication number
- CN112035670B CN112035670B CN202010940956.5A CN202010940956A CN112035670B CN 112035670 B CN112035670 B CN 112035670B CN 202010940956 A CN202010940956 A CN 202010940956A CN 112035670 B CN112035670 B CN 112035670B
- Authority
- CN
- China
- Prior art keywords
- image
- text
- emotional tendency
- features
- rumor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention discloses a multi-modal rumor detection method based on image emotional tendency. A method for extracting the emotional tendency of an image is proposed based on a conditional variational autoencoder (CVAE); it differs from conventional sentiment-analysis approaches, and its effectiveness is confirmed by experiments. The method can obtain accurate detection results using only a single image as input, enabling fast detection and handling at the initial stage of rumor propagation.
Description
Technical Field
The invention relates to the technical field of network space security, in particular to a multi-modal rumor detection method based on image emotional tendency.
Background
The development of social media has accelerated information dissemination, but it has also brought a flood of false rumor information, which often introduces instability and has a great influence on the economy and society. Social network platforms today have hundreds of millions of users; information on them spreads widely and rapidly, is not limited by time or space, and, like a magnifier, amplifies its own influence. Unrealistic rumors "manipulate" public sentiment, mislead public judgment, and affect social stability, so the automatic and rapid detection of network rumors is of great significance for cyberspace security.
Social media rumors often carry distinctly inflammatory wording, and from this perspective text-based sentiment analysis has substantially advanced rumor detection. However, with the development of multimedia production technology, rumors increasingly attract and mislead readers through combined pictures and text; pictures often have strong visual impact and contain abundant latent information that can be mined. Moreover, in massive social media data, image and text information are not presented in completely separated form: a portion of image data still contains a large amount of text, and this embedded text often carries semantic information closely related to the topic, which helps establish the relationship between an image and its emotional tendency. Conventional multi-modal detection methods fail to exploit this auxiliary information well.
Disclosure of Invention
The invention aims to provide a multi-modal rumor detection method based on image emotional tendency, which can obtain accurate detection results using only a single image as input and can quickly detect and handle rumors at the initial stage of their propagation.
The purpose of the invention is realized by the following technical scheme:
a multi-modal rumor detection method based on image emotion tendencies comprises the following steps:
in the training stage, texts and images containing text information are used as training data; for each training sample group consisting of a text and an image, three kinds of multi-modal features are extracted: text features, image features, and features of the text embedded in the image; based on a conditional variational autoencoder, the prior distribution and the classifier are updated by combining the image features, the embedded-text features, the text features, a hidden variable of the semantic space, and the given emotional-tendency label, wherein the hidden variable represents the semantics of the image;
and in the testing stage, embedded-text features are extracted from the image to be detected together with its corresponding text; the emotional tendency is generated by decoding a hidden variable sampled from the updated prior distribution together with these features; the result is then concatenated with the text features, and a classifier outputs the probability that the sample is a rumor.
According to the technical scheme provided by the invention, on one hand, the method is well targeted at samples that contain text within their images. At the same time, a method for extracting the emotional tendency of an image is proposed based on a conditional variational autoencoder (CVAE); it differs from conventional sentiment-analysis approaches, and its effectiveness is confirmed by experiments. The method can obtain accurate detection results using only a single image as input, enabling fast detection and handling at the initial stage of rumor propagation.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
Fig. 1 is a schematic diagram of a multi-modal rumor detection method based on image emotional tendency according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a relationship between an image, a text, a hidden variable and an emotional tendency provided in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a multi-modal rumor detection method based on image emotional tendency. Sentiment analysis of text can often start from certain keywords, but the emotional tendency of an image generally cannot be extracted from, or explicitly represented by, any specific region. To this end, the invention trains an emotional-tendency judgment model based on a conditional variational autoencoder (CVAE), using emotional tendency as the label, so that the emotional "features" of an image are learned implicitly; at the same time, Optical Character Recognition (OCR) is used to obtain the text within the image as additional auxiliary information. In the testing stage, the social media post to be tested is input into the model, and whether it is a rumor is judged according to the learned emotional-tendency features. The proposed method performs well on social media rumors containing text-bearing pictures, showing a degree of specificity and effectiveness.
The general technical framework of the method is shown in fig. 1, mainly as follows:
I. Training stage.
In the training stage, texts and images containing text information are used as training data (they can be collected directly from a social platform). For each training sample group consisting of a text and an image, three kinds of multi-modal features are extracted: text features, image features, and features of the text embedded in the image. Based on a conditional variational autoencoder, the prior distribution and the classifier are updated by combining the image features, the embedded-text features, the text features, a hidden variable of the image semantic space, and the given emotional-tendency label, where the hidden variable represents the semantics of the image.
The training stage mainly comprises the following parts:
1. Data preprocessing.
1) Symbol expressions, special characters, URLs, and the like in the original text content are redundant. All such information is discarded in a redundancy-removal step; only the textual content is kept and concatenated into a single text sequence, with a separator inserted at each concatenation gap.
2) To facilitate extraction of the embedded text, the image is denoised.
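The text side of this preprocessing can be sketched in plain Python. The patent does not specify its cleaning rules or separator, so the regular expressions and the `[SEP]` token below are illustrative assumptions only:

```python
import re

SEP = "[SEP]"  # assumed separator token; the patent does not name one

def clean_text(fragments):
    """Drop URLs, emoticon codes, and special symbols; keep plain text
    and join the surviving fragments with a separator."""
    cleaned = []
    for frag in fragments:
        frag = re.sub(r"https?://\S+", "", frag)      # drop URLs
        frag = re.sub(r"\[[^\]]{1,8}\]", "", frag)    # drop emoticon codes like [smile]
        frag = re.sub(r"[#@~^*_=+<>|\\]", "", frag)   # drop special symbols
        frag = re.sub(r"\s+", " ", frag).strip()      # collapse whitespace
        if frag:
            cleaned.append(frag)
    return f" {SEP} ".join(cleaned)

print(clean_text(["Breaking news!! http://t.cn/abc", "[shock] see photo @user"]))
# -> Breaking news!! [SEP] see photo user
```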
2. Multi-modal feature extraction.
1) Text feature extraction.
Statistics show that, after preprocessing, 98% of the texts in the data set are no more than 150 characters long. To bound the computation, a text is therefore limited to at most 150 words: excess words are discarded and short texts are padded. The value 150 here is only an example; in practical applications the length limit can be set as circumstances require.
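The length-limiting step described above can be sketched as follows; `max_len=150` comes from the text, while the padding token is an assumption:

```python
PAD = "<pad>"  # assumed padding token; the patent does not name one

def pad_or_truncate(tokens, max_len=150):
    """Discard words beyond max_len; pad short sequences up to max_len."""
    if len(tokens) >= max_len:
        return tokens[:max_len]
    return tokens + [PAD] * (max_len - len(tokens))

seq = pad_or_truncate(["网络", "谣言"])          # short text -> padded to 150
assert len(seq) == 150 and seq[2] == PAD
assert len(pad_or_truncate(["w"] * 300)) == 150  # long text -> truncated
```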
Word-level feature vectorization is performed on the text using GloVe embeddings pre-trained on Chinese Wikipedia, and the result is fed into a GRU (Gated Recurrent Unit) with a hidden-state size of 512 for feature extraction; the resulting semantic vector is the text feature E.
2) Image feature extraction.
Since object-level features are not needed, the method differs from prior approaches by adopting the pre-trained model ResNeXt to extract a general feature representation. ResNeXt performs well in many computer vision tasks; its distinctive structure combines grouped convolutions with residual connections.

In the embodiment of the invention, only the feature-extraction part of ResNeXt is retained, and the global feature vector of the image obtained after the last pooling layer is used as the image feature I.
3) Embedded-text feature extraction.
In the embodiment of the invention, the set of OCR tokens in an image is obtained through the open-source Chinese optical character recognition toolkit CNOCR; this set carries the semantic information of the text in the image. The tokens are then vectorized using GloVe pre-trained on Chinese Wikipedia, and the embedded-text feature O is finally obtained through a linear transformation.
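A minimal sketch of this token-to-feature pipeline, with a toy two-dimensional embedding table and linear map standing in for the 300-dimensional GloVe vectors and the learned transformation (both stand-ins are illustrative assumptions; averaging the token vectors is also an assumed pooling choice):

```python
# toy stand-ins: in the method, CNOCR supplies the tokens and GloVe the vectors
EMBED = {"紧急": [1.0, 0.0], "转发": [0.0, 1.0]}   # token -> embedding vector
W = [[0.5, 0.5], [1.0, -1.0]]                      # assumed 2x2 linear layer, no bias

def text_info_feature(ocr_tokens):
    """Average the token embeddings, then apply the linear transformation."""
    vecs = [EMBED[t] for t in ocr_tokens if t in EMBED]
    if not vecs:
        return [0.0, 0.0]
    avg = [sum(col) / len(vecs) for col in zip(*vecs)]
    return [sum(w * x for w, x in zip(row, avg)) for row in W]

print(text_info_feature(["紧急", "转发"]))  # -> [0.5, 0.0]
```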
3. Image emotional-tendency feature extraction based on CVAE.
The pictures attached to rumors usually have strong visual impact, but the emotional tendency of a whole picture cannot be read off from any single local region, unlike text, where analysis can start from clearly inflammatory words. How to extract the emotional tendency of an image is therefore the key point, and the difficulty, of this invention's research.
It is assumed that the emotional tendency Y of an image and the image feature I follow some distribution, but one that cannot be expressed by an explicit formula. Based on the design of the conditional variational autoencoder, it can therefore be reasonably assumed that the factors determining the generated emotional tendency Y are the given image feature I and a hidden variable Z in a semantic space (Z can be understood as the semantics of the attached picture, modeled as a multivariate Gaussian with a diagonal, isotropic covariance matrix), where Z and I satisfy a certain prior distribution. Meanwhile, the embedded text O extracted from the picture is itself generated from the image and contributes to the generation of Y. These relationships are shown in fig. 2.
Given the image feature I as the condition, the embedded-text feature O can be extracted with CNOCR, and the prior distribution pθ(Z|I) of the hidden variable can be determined. Each possible emotional tendency Y can then be generated by sampling a hidden variable Z from the prior distribution and passing it, together with the embedded-text feature O, through a decoder pθ(Y|I,Z,O). The currently available inputs are the image feature I and the given emotional-tendency label Y (positive or negative, corresponding to truth and rumor respectively). Following the principle of the conditional variational autoencoder, the prior distribution pθ(Z|I) is not straightforward to fit directly, so the invention approximates it with a posterior distribution qφ(Z|I,Y) in the form of a deep neural network; the optimization target is the KL divergence between the prior and posterior distributions. The training strategy is as follows:
1) Obtain the posterior distribution qφ(Z|I,Y) from the image feature I and the given emotional-tendency label Y, and initialize the prior distribution pθ(Z|I) of the hidden variable. Sample a hidden variable, decode it together with the embedded-text feature O to predict the emotional tendency, and compute the reconstruction error of the emotional-tendency label.
2) Minimize, via the KL divergence, the distance between the posterior distribution qφ(Z|I,Y) and the prior distribution pθ(Z|I), thereby correcting the prior distribution pθ(Z|I); and minimize the reconstruction error of the emotional-tendency label Y. The loss for the two processes follows the CVAE evidence lower bound:

log pθ(Y|I) >= E_{qφ(Z|I,Y)}[log pθ(Y|I,Z,O)] - KL(qφ(Z|I,Y) || pθ(Z|I))

where θ and φ denote the adjustable parameters of the prior and posterior distributions respectively, the expectation term corresponds to the reconstruction error of the emotional-tendency label Y, and p(Y|I) takes the fixed form of the CVAE loss function. Following Stochastic Gradient Variational Bayes (SGVB), the lower-bound function on the right-hand side of the inequality is maximized during training.
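The KL divergence between the two diagonal Gaussians used in this objective has a closed form; a stdlib-only sketch (the one- and two-dimensional parameter values are illustrative):

```python
import math

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, diag(exp(logvar_q))) || N(mu_p, diag(exp(logvar_p))) ),
    summed over dimensions."""
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        kl += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return kl

# identical distributions -> zero divergence
assert abs(kl_diag_gaussians([0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])) < 1e-12
print(kl_diag_gaussians([1.0], [0.0], [0.0], [0.0]))  # -> 0.5
```

Minimizing this quantity between qφ(Z|I,Y) and pθ(Z|I) is the step that corrects the prior during training.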
3) Sample a hidden variable Ẑ from the trained prior distribution pθ(Z|I), decode it together with the embedded-text feature O to generate the emotional tendency Ŷ, concatenate Ŷ with the text feature E, and input the result into a classifier to judge whether the sample is a rumor; the classifier is trained with a chosen loss function (e.g., cross-entropy).
II. Testing stage.
The flow of the testing stage is similar to that of the training stage, as shown in fig. 1. The input is an image containing text information together with its corresponding text. For the image to be detected, embedded-text features are extracted; a hidden variable sampled from the updated prior distribution is decoded together with them to generate the emotional tendency; the emotional tendency is concatenated with the text features; and the updated classifier outputs the probability that the sample is a rumor.
The final detection result can then be determined in the conventional way: since there are only two classes, the sample is assigned to whichever class has the higher probability.
Of course, a higher threshold may be set to obtain greater confidence, and the specific value can be chosen by the practitioner according to actual conditions or experience. For example, if the probabilities of the rumor and real classes are (0.99, 0.01), that is, 99% rumor and 1% real, and the rumor probability exceeds a set threshold (e.g., 90%), then the information under test is considered a rumor with high confidence.
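The decision rule described above can be sketched as follows; the 0.90 threshold mirrors the example in the text, and the label names are assumptions:

```python
def decide(p_rumor, p_real, threshold=0.90):
    """Argmax decision over the two classes, flagged as confident
    only when the winning probability passes the threshold."""
    label = "rumor" if p_rumor >= p_real else "real"
    confident = max(p_rumor, p_real) >= threshold
    return label, confident

assert decide(0.99, 0.01) == ("rumor", True)    # the worked example: confident rumor
assert decide(0.55, 0.45) == ("rumor", False)   # majority class, but below threshold
```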
According to the scheme provided by the embodiment of the invention, on one hand, the method is well targeted at samples that contain text within their images. At the same time, a method for extracting the emotional tendency of an image is proposed based on a conditional variational autoencoder (CVAE); it differs from conventional sentiment-analysis approaches, and its effectiveness is confirmed by experiments. The method can obtain accurate detection results using only a single image as input, enabling fast detection and handling at the initial stage of rumor propagation.
In addition, in order to illustrate the effects of the above-described aspects of the present invention, related experiments were also performed.
In the experiment, the data set consists of microblogs that contain both pictures and text, where the pictures contain embedded text. The data were collected from the Weibo platform, mainly as a supplement built on the Weibo RumorSet data set; each microblog is matched with 1.47 pictures on average, and every picture is annotated with an emotional-tendency label. The distribution is shown in Table 1:
| | Number of samples | Number of pictures |
|---|---|---|
| Real data | 2729 | 4262 |
| Rumor data | 2555 | 3517 |

TABLE 1 Data set distribution
The training set and the test set are divided in a 9:1 ratio. The final model reaches 77.8% accuracy and 76.3% recall on the test set, while the best conventional multi-modal method reaches only 54.6% accuracy, showing that the proposed method has a degree of specificity and effectiveness for multi-modal microblog information containing text within images.
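The reported metrics are the standard accuracy and positive-class recall; a stdlib sketch over toy predictions (the toy labels are illustrative, not the experiment's data):

```python
def accuracy_and_recall(y_true, y_pred, positive="rumor"):
    """Accuracy over all samples; recall of the positive (rumor) class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    tp = sum(t == p == positive for t, p in zip(y_true, y_pred))
    pos = sum(t == positive for t in y_true)
    return correct / len(y_true), tp / pos

acc, rec = accuracy_and_recall(
    ["rumor", "rumor", "real", "real"],
    ["rumor", "real", "real", "real"],
)
assert (acc, rec) == (0.75, 0.5)  # 3 of 4 correct; 1 of 2 rumors recovered
```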
Through the above description of the embodiments, it will be clear to those skilled in the art that the embodiments can be implemented by software, or by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (a CD-ROM, a USB disk, a removable hard disk, etc.) and includes instructions that enable a computer device (a personal computer, a server, a network device, etc.) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (4)
1. A multi-modal rumor detection method based on image emotion tendencies is characterized by comprising the following steps:
in the training stage, texts and images containing text information are used as training data; for each training sample group consisting of a text and an image, multi-modal features are extracted: text features, image features, and features of the text embedded in the image; based on a conditional variational autoencoder, the prior distribution and classifier are updated by combining the image features, embedded-text features, text features, hidden variables of the semantic space, and the given emotional-tendency label, comprising: obtaining the posterior distribution qφ(Z|I,Y) from the image feature I and the given emotional-tendency label Y, and initializing the prior distribution pθ(Z|I) of the hidden variable; sampling a hidden variable, decoding it together with the embedded-text feature O to predict the emotional tendency, and calculating the reconstruction error of the emotional-tendency label; obtaining the updated prior distribution pθ(Z|I) by minimizing the KL divergence between the posterior and prior distributions and minimizing the reconstruction error of the emotional-tendency label Y; sampling a hidden variable Ẑ from the trained prior distribution pθ(Z|I), decoding it together with the embedded-text feature O to generate the emotional tendency Ŷ, concatenating Ŷ with the text feature E, and inputting the result into a classifier to judge whether the sample is a rumor; training the classifier with a set loss function; the hidden variable being the semantics of the image;
and in the testing stage, extracting embedded-text features from the image to be detected and its corresponding text, generating the emotional tendency by decoding a hidden variable sampled from the updated prior distribution together with these features, concatenating the result with the text features, and obtaining the probability that the sample is a rumor through the classifier.
2. The multi-modal rumor detection method based on image emotional tendency according to claim 1, wherein data preprocessing performed before the multi-modal feature extraction comprises:
performing a redundancy-removal operation on the text, retaining only the textual content and concatenating it into a text sequence;

and denoising the image.
3. The multi-modal rumor detection method based on image emotional tendency according to claim 1, wherein the multi-modal feature extraction comprises:

vectorizing the word features of the text with pre-trained GloVe and feeding them into a GRU for feature extraction to obtain a semantic vector as the text feature;

extracting a general feature representation of the image with the pre-trained model ResNeXt, taking the features output by its last pooling layer as the image features;

acquiring the set of OCR tokens in the image through the open-source Chinese optical character recognition toolkit CNOCR, the set comprising the semantic information of the text in the image; then vectorizing these tokens with the pre-trained GloVe, and finally obtaining the embedded-text features through a linear transformation.
4. The multi-modal rumor detection method based on image emotional tendency according to claim 1, wherein the distance between the posterior distribution qφ(Z|I,Y) and the prior distribution pθ(Z|I) is minimized via the KL divergence, thereby correcting the prior distribution pθ(Z|I); and the reconstruction error of the emotional-tendency label Y is minimized; the loss of this process being:

log pθ(Y|I) >= E_{qφ(Z|I,Y)}[log pθ(Y|I,Z,O)] - KL(qφ(Z|I,Y) || pθ(Z|I)).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010940956.5A CN112035670B (en) | 2020-09-09 | 2020-09-09 | Multi-modal rumor detection method based on image emotional tendency |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112035670A CN112035670A (en) | 2020-12-04 |
CN112035670B true CN112035670B (en) | 2021-05-14 |
Family
ID=73584556
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010940956.5A Active CN112035670B (en) | 2020-09-09 | 2020-09-09 | Multi-modal rumor detection method based on image emotional tendency |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112035670B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116502092A (en) * | 2023-06-26 | 2023-07-28 | 国网智能电网研究院有限公司 | Semantic alignment method, device, equipment and storage medium for multi-source heterogeneous data |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109829499A (en) * | 2019-01-31 | 2019-05-31 | 中国科学院信息工程研究所 | Image, text and data fusion sensibility classification method and device based on same feature space |
CN110580501A (en) * | 2019-08-20 | 2019-12-17 | 天津大学 | Zero sample image classification method based on variational self-coding countermeasure network |
CN111079444A (en) * | 2019-12-25 | 2020-04-28 | 北京中科研究院 | Network rumor detection method based on multi-modal relationship |
CN111160452A (en) * | 2019-12-25 | 2020-05-15 | 北京中科研究院 | Multi-modal network rumor detection method based on pre-training language model |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5008024B2 (en) * | 2006-12-28 | 2012-08-22 | 独立行政法人情報通信研究機構 | Reputation information extraction device and reputation information extraction method |
US9959365B2 (en) * | 2015-01-16 | 2018-05-01 | The Trustees Of The Stevens Institute Of Technology | Method and apparatus to identify the source of information or misinformation in large-scale social media networks |
Non-Patent Citations (2)
Title |
---|
MVAE: Multimodal Variational Autoencoder for Fake News Detection; Dhruv Khattar et al.; The Web Conference 2019; 2019-05-31; pp. 1-8 *
Network rumor identification method based on sentiment analysis; Shou Huanrong et al.; Data Analysis and Knowledge Discovery; 2017-07-25 (No. 7); pp. 44-51 *
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||