CN114936285B - Crisis information detection method and system based on antagonistic multi-mode automatic encoder
- Publication number: CN114936285B
- Application number: CN202210575072.3A
- Authority: CN (China)
- Prior art keywords: representation, modal, mode, feature representation, antagonistic
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/353: Information retrieval of unstructured textual data; clustering; classification into predefined classes
- G06F18/253: Pattern recognition; fusion techniques of extracted features
- G06F40/279: Natural language analysis; recognition of textual entities
- G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/30: Semantic analysis
- G06N3/044: Neural networks; recurrent networks, e.g. Hopfield networks
- G06N3/045: Neural networks; combinations of networks
- G06N3/08: Neural networks; learning methods
- G06V10/40: Image or video recognition or understanding; extraction of image or video features
- G06V10/82: Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
Abstract
The invention belongs to the technical field of information processing and provides a crisis information detection method and system based on an antagonistic multi-modal automatic encoder. An end-to-end AMAE model is proposed for detecting and analyzing crisis-related information on social media, and comprises four modules: a feature extraction module, an automatic encoder module, an antagonism module and a detection module. The feature extraction module and the automatic encoder module extract the multi-modal representation, the detection module produces the detection and analysis decisions, and the antagonism module optimizes the multi-modal representation by reducing the heterogeneity differences between heterogeneous modalities and the loss of unimodal information. The validity of the model is verified on a large real-world dataset.
Description
Technical Field
The invention belongs to the technical field of information processing, and particularly relates to a crisis information detection method and system based on an antagonistic multi-mode automatic encoder.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
In recent years, social media has become an important channel for communication and discussion during various public events by virtue of its immediacy and convenience. Social media greatly simplifies the way people acquire and disseminate information; it has become an important source of information and plays a role complementary to traditional media.
Massive numbers of social media tweets have become an important resource for data mining and analysis. In particular, during crisis events, users often publish tweets reporting infrastructure damage and casualties; if the potential information in such tweets can be effectively detected and analyzed, it will contribute to situational awareness and rescue decision-making.
Existing tweet detection includes unimodal rumor detection and multi-modal rumor detection.
Rumor detection based on a single modality has been widely studied. Neppali et al. detect informative tweets during crises by combining deep neural networks with a naive Bayes classifier, showing excellent performance on real datasets. Alam et al. propose a semi-supervised framework combining a convolutional neural network (CNN) with a graph neural network, which learns better from unlabeled data; the performance of the model is verified using two real crisis-related Twitter datasets. For the unimodal visual information in crisis-related tweets, Alam et al. adopt a transfer learning approach and use several pre-trained convolutional neural networks to classify crisis-related tweets.
In multi-modal rumor detection, Ma et al. propose a rumor detection framework based on a recurrent neural network (RNN) that mines the changes in contextual information in tweets by using different recurrent units. Ma et al. also mine the content semantics and propagation cues in tweets based on a tree structure and learn joint feature representations of social media rumors in combination with RNNs.
The multi-modal content in tweets can provide complementary information, so some work analyzes the multi-modal content comprehensively to detect and analyze tweets. Gao et al. propose a multimodal adversarial neural network (MANN) that captures transferable, disaster-invariant feature representations through adversarial training, and verify the validity of the method on a large real dataset. Jin et al. propose an end-to-end attention-based recurrent neural network model (att-RNN) that fuses features from text, images and social context to generate a joint representation; they use the attention mechanism to mine the relationships between multi-modal features and verify the validity of the model through component analysis. Wang et al. propose an event adversarial neural network (EANN) for social media rumor detection, which uses an adversarial event discrimination module to remove event-specific features while preserving shared features, so as to learn transferable, event-invariant feature representations; the performance of the model is verified through extensive experiments and quantitative analysis. Abavisani et al. propose a multi-modal framework (SCBD) fusing visual and text input, which uses a cross-attention module to filter out uninformative or misleading information in the heterogeneous modal feature representations. Khattar et al. propose a multi-modal variational autoencoder (MVAE) that mines fake news and rumors in social media by learning joint representations of the multi-modal content in tweets.
The inventor finds that current tweet detection has the following technical problems: on one hand, the above methods aim to capture a shared semantic representation of the image-text data, ignoring to some extent the effect of its modality-specific information.
On the other hand, the above fusion learning models based on the encoder-decoder framework fail to effectively solve the problem of data redundancy in the mapping process.
Thus, technical means are lacking to reduce the loss of modality-specific information and to solve the problem of data redundancy in the mapping process.
Disclosure of Invention
In order to solve at least one of the technical problems in the background art, the invention provides a crisis information detection method and system based on an antagonistic multi-modal automatic encoder, which optimize the multi-modal representation by reducing the heterogeneity differences between heterogeneous modalities and reducing the loss of unimodal information, and can effectively identify and analyze tweets on social media.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A first aspect of the present invention provides a crisis information detection method based on an antagonistic multi-modal automatic encoder, comprising the following steps:
Acquiring multi-modal data to be detected;
Based on the multi-modal data to be detected, learning the image modality data and the text modality data with an antagonistic multi-modal automatic encoder to obtain a multi-modal joint representation;
The construction process of the antagonistic multi-modal automatic encoder comprises: obtaining an initial multi-modal joint representation based on the text feature representation, the visual feature representation and the automatic encoder module;
Based on the initial multi-modal joint representation and the antagonism module, performing direct optimization and indirect optimization with a min-max game method to obtain a direct multi-modal joint representation and an indirect multi-modal joint representation, and obtaining the multi-modal joint representation from the direct and indirect multi-modal joint representations;
Detecting crisis information based on the multi-modal joint representation and the detection model.
A second aspect of the present invention provides a crisis information detection system based on an antagonistic multi-modal automatic encoder, comprising:
a data acquisition module, used for acquiring multi-modal data to be detected;
an antagonistic multi-modal automatic encoding module, used for learning the image modality data and the text modality data with an antagonistic multi-modal automatic encoder, based on the multi-modal data to be detected, to obtain a multi-modal joint representation;
the construction process of the antagonistic multi-modal automatic encoder comprises: obtaining an initial multi-modal joint representation based on the original text feature representation, the visual feature representation and the automatic encoder module;
based on the initial multi-modal joint representation and the antagonism module, performing direct optimization and indirect optimization with a min-max game method to obtain a direct multi-modal joint representation and an indirect multi-modal joint representation, and obtaining the multi-modal joint representation from the direct and indirect multi-modal joint representations;
and a tweet detection module, used for detecting crisis information based on the multi-modal joint representation and the detection model.
A third aspect of the present invention provides a computer-readable storage medium.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps in the crisis information detection method based on an antagonistic multi-modal automatic encoder as described above.
A fourth aspect of the invention provides a computer device.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps in the crisis information detection method based on an antagonistic multi-modal automatic encoder as described above.
Compared with the prior art, the invention has the beneficial effects that:
Aiming at the problem that joint image-text fusion frameworks tend to learn the shared semantic content across modalities while neglecting modality-specific content, the invention performs direct optimization and indirect optimization with a min-max game method, based on the original text feature representation and visual feature representation, to obtain a direct multi-modal joint representation and an indirect multi-modal joint representation, from which the multi-modal joint representation is derived. This effectively reduces the heterogeneity differences between heterogeneous modal feature representations and reduces the loss of unimodal information, thereby further reducing the loss of modality-specific information.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a flow chart of the crisis information detection method based on an antagonistic multi-modal automatic encoder according to the present invention;
FIG. 2 is a structural diagram of the crisis information detection method based on an antagonistic multi-modal automatic encoder according to the present invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to the present invention. As used herein, the singular forms are also intended to include the plural forms unless the context clearly indicates otherwise. Furthermore, it is to be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Example 1
This embodiment provides a crisis information detection method based on an antagonistic multi-modal automatic encoder, comprising the following steps:
Step 1: acquiring the real multi-modal tweet content of users on social media;
Step 2: extracting the original text feature representation and the visual feature representation of the multi-modal tweet content, respectively;
Step 3: based on the multi-modal tweet content, learning the image modality data and the text modality data with an antagonistic multi-modal automatic encoder to obtain a multi-modal joint representation;
The construction process of the antagonistic multi-modal automatic encoder comprises: obtaining an initial multi-modal joint representation based on the original text feature representation, the visual feature representation and the automatic encoder module;
Based on the initial multi-modal joint representation and the antagonism module, performing direct optimization and indirect optimization with a min-max game method to obtain a direct multi-modal joint representation and an indirect multi-modal joint representation, and obtaining the multi-modal joint representation from the direct and indirect multi-modal joint representations;
Step 4: detecting crisis information based on the multi-modal joint representation and the detection model.
As one or more embodiments, in step 1, this embodiment conducts experiments on the CrisisMMD dataset. The dataset is obtained by capturing the real tweets of users on social media, where a tweet with multi-modal content is Tw = {T, V}: V represents the visual content in the tweet and T represents the text content. Each tweet contains data of the two modalities, image and text. The dataset structure and its division are shown in Table 1.
Task-1 is intended to determine whether a tweet in the dataset contains potential information related to a crisis event; if so, it is labeled informative, otherwise it is labeled not-informative.
Task-2 aims at further mining the information types of the tweets, covering five types in total: affected-individuals, rescue-volunteering-or-donation-effort, infrastructure-and-utility-damage, other-relevant-information and not-humanitarian. For convenience, they are abbreviated A, R, I, O, N.
Table 1 Dataset structure and its division
As one or more embodiments, in step 2, in the process of extracting the original text feature representation and the visual feature representation of the multi-modal tweet content:
The original text feature representation is extracted from the text content by a fine-tuned BERT, as follows:
Preprocessing is required before extracting the text modality data features: retweet headers, user handles, stop words and punctuation marks are deleted from the text sentences; the BERT tokenizer is called to perform the word segmentation operation on the sentences; finally, the set characters are added to the beginning and end of each sentence, respectively, to generate the preprocessed text modality data.
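A minimal sketch of this preprocessing step, assuming the Hugging Face BertTokenizer; the stop-word subset and the handle/URL patterns are illustrative assumptions, as the patent does not fix them:

```python
import re
from transformers import BertTokenizer

STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of"}  # illustrative subset

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def preprocess_tweet(text: str) -> list:
    # Delete retweet headers, user handles and URLs from the text sentence.
    text = re.sub(r"^RT\b", " ", text)
    text = re.sub(r"@\w+|https?://\S+", " ", text)
    # Delete punctuation marks and stop words.
    text = re.sub(r"[^\w\s]", " ", text)
    words = [w for w in text.split() if w.lower() not in STOP_WORDS]
    # Word segmentation with the BERT tokenizer, then add the set characters
    # [CLS] and [SEP] to the beginning and end of the sentence, respectively.
    tokens = tokenizer.tokenize(" ".join(words))
    return ["[CLS]"] + tokens + ["[SEP]"]
```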
The preprocessed text content is processed into the input form required by BERT, the text feature representation is extracted with BERT, and a fully connected layer constrains the dimension of the output text feature representation.
The calculation is shown in equation (1):
R_t = σ(w_t · p_t + b_t)  (1)
where σ represents the activation function, w_t represents the weight of the fully connected layer, p_t represents the output of BERT, and b_t represents the bias of the fully connected layer.
The original visual feature representation is extracted from the visual content by ResNet, as follows:
The image size is uniformly adjusted to 224 × 224, and augmentation operations such as random horizontal flipping and random vertical flipping are applied; this increases the number of samples while enhancing the generalization ability of the model.
When extracting the visual features, a fully connected layer constrains the dimension of the output visual feature representation, as shown in equation (2):
R_v = σ(w_v · p_v + b_v)  (2)
where σ represents the ReLU activation function, w_v represents the weight of the fully connected layer, p_v represents the output of the adaptive pooling layer in ResNet, and b_v represents the bias of the fully connected layer.
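The two extraction branches can be sketched in PyTorch as follows; bert-base-uncased and ResNet-50 are assumptions (the patent only names BERT and ResNet), and the 128-dimensional output follows the fully connected layer dimension reported in the experimental settings below:

```python
import torch.nn as nn
from transformers import BertModel
from torchvision.models import resnet50

class FeatureExtractor(nn.Module):
    """Equations (1)-(2): backbone outputs constrained to a common
    dimension by fully connected layers with a ReLU activation (sigma)."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        resnet = resnet50(weights="IMAGENET1K_V1")
        self.cnn = nn.Sequential(*list(resnet.children())[:-1])  # ends at the adaptive pooling layer
        self.fc_t = nn.Linear(768, dim)    # w_t, b_t
        self.fc_v = nn.Linear(2048, dim)   # w_v, b_v
        self.act = nn.ReLU()

    def forward(self, input_ids, attention_mask, images):
        p_t = self.bert(input_ids=input_ids, attention_mask=attention_mask).pooler_output
        R_t = self.act(self.fc_t(p_t))              # equation (1)
        p_v = self.cnn(images).flatten(1)           # adaptive pooling output, 2048-d
        R_v = self.act(self.fc_v(p_v))              # equation (2)
        return R_t, R_v
```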
As one or more embodiments, in step 3, the antagonistic multi-modal automatic encoder includes two encoder components and two decoder components built from attention-based bidirectional gated recurrent unit components (BGA blocks); each component consists of N_B BGA blocks and a fully connected layer.
The encoder components, collectively denoted C_enc, encode and aggregate the visual and text feature representations R_v and R_t into a multi-modal joint representation R_j.
The decoder components, collectively denoted C_dec, reconstruct the aggregated multi-modal joint representation R_j into a reconstructed visual feature representation R̂_v and a reconstructed text feature representation R̂_t; the reconstruction process prevents the loss of unimodal information and thereby optimizes the multi-modal joint representation.
The calculation process is shown in equation (3):
R_j = C_enc({R_v, R_t}, θ_enc), {R̂_v, R̂_t} = C_dec(R_j, θ_dec)  (3)
where θ_enc and θ_dec represent the parameters of the encoder and decoder components, respectively.
One BGA block contains one bidirectional gated recurrent unit (Bi-GRU) layer and one Self-Attention layer, and is used for extracting features with enhanced context information.
The method specifically comprises the following steps:
(1) Acquiring a feature representation x_G carrying context information using the Bi-GRU;
(2) Obtaining a feature representation x_A with enhanced context information using the Self-Attention mechanism: x_G, which carries the context information, is weighted into three intermediate variable matrices q, k and v; the attention scores are calculated and normalized with softmax to obtain the attention values corresponding to v; the attention values are multiplied by the corresponding v and accumulated to obtain the output value x_A^i at the i-th position.
The calculation process is shown in equation (4):
x_A^i = Σ_j softmax_j(q_i · k_j^T) · v_j  (4)
(3) Finally, x_G and x_A are added and normalized using a residual-like process.
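A minimal PyTorch sketch of one BGA block following these three steps; the single-head nn.MultiheadAttention stands in for the q/k/v Self-Attention of equation (4), and LayerNorm is an assumption for the residual-like add-and-normalize:

```python
import torch.nn as nn

class BGABlock(nn.Module):
    """One BGA block: Bi-GRU -> Self-Attention -> residual add and normalize."""
    def __init__(self, dim: int = 128):
        super().__init__()
        # Hidden size dim//2 per direction, so the bidirectional output is dim-sized.
        self.bigru = nn.GRU(dim, dim // 2, bidirectional=True, batch_first=True)
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                     # x: (batch, seq, dim)
        x_g, _ = self.bigru(x)                # x_G: features with context information
        x_a, _ = self.attn(x_g, x_g, x_g)     # x_A: q, k, v all derived from x_G, equation (4)
        return self.norm(x_g + x_a)           # residual-like add and normalize
```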
As one or more embodiments, in step 3, the process of performing direct optimization and indirect optimization with the min-max game method, based on the original text feature representation and the visual feature representation, to obtain the direct multi-modal joint representation and the indirect multi-modal joint representation, includes:
encoding the original text feature representation and the visual feature representation, and directly optimizing them based on the encoded text feature representation and visual feature representation to obtain the direct multi-modal joint representation; reconstructing based on the direct multi-modal joint representation to obtain the reconstructed text feature representation and visual feature representation; and performing indirect optimization based on the original and reconstructed text and visual feature representations to obtain the indirect multi-modal joint representation.
The min-max game method ensures that the best possible outcome is obtained among the worst possible outcomes, regardless of what the other player does.
The method specifically comprises the following steps:
In order to be able to effectively reduce the heterogeneity differences between heterogeneous modal feature representations and to reduce the loss of unimodal information, an antagonism module is added for directly or indirectly optimizing the multimodal joint representation.
The structure is shown in FIG. 2. According to the position at which the antagonism module is added, the variants are named AMAE-M (added in the middle, to directly optimize the multi-modal joint representation) and AMAE-L (added at a later position, to indirectly optimize the multi-modal joint representation).
For AMAE-M, as shown in FIG. 1, a first antagonism module is added before the multi-modal joint representation is generated. Its input is the encoded visual feature representation R_v' and the encoded text feature representation R_t', and it outputs a discrimination label for distinguishing R_v' from R_t'. This directly optimizes the aggregation process of the multi-modal joint representation.
The first antagonism module aims to distinguish R_v' from R_t', while the encoder component (acting as a generator) attempts to confuse the antagonism module into classifying R_v' and R_t' as one class.
The heterogeneity difference between R_v' and R_t' is reduced by this min-max game, so that the joint representation can be directly optimized.
Defining the first antagonism module as C_adv, the calculation process can be expressed as equation (5):
ŷ_a = C_adv({R_v', R_t'}, θ_adv)  (5)
where ŷ_a represents the predicted discrimination label and θ_adv represents the parameters of the antagonism module. The loss of the antagonism module, L_adv, can be expressed as equation (6):
L_adv = -[y_a · log ŷ_a + (1 - y_a) · log(1 - ŷ_a)]  (6)
where y_a represents the true discrimination label.
The reconstruction losses L_rcv and L_rct of the unimodal information are calculated using MSELoss, where N_v and N_t represent the lengths of the visual feature representation and the text feature representation, respectively, as shown in equation (7):
L_rcv = (1/N_v) Σ_{i=1}^{N_v} (R_v^i - R̂_v^i)², L_rct = (1/N_t) Σ_{i=1}^{N_t} (R_t^i - R̂_t^i)²  (7)
The final loss function L_fin can be expressed as equation (8):
L_fin(θ_enc, θ_dec, θ_adv, θ_det) = L_det + L_rcv + L_rct - L_adv  (8)
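A sketch of one AMAE-M training step under equation (8); the model sub-module names and the alternating update schedule are assumptions, since the patent does not spell out the optimization loop, and opt_main is assumed to cover only the encoder, decoder and detector parameters:

```python
import torch
import torch.nn.functional as F

def amae_m_step(model, images, text_ids, text_mask, labels, opt_main, opt_adv):
    R_t, R_v = model.extract(text_ids, text_mask, images)   # equations (1)-(2)
    R_t_enc, R_v_enc, R_j = model.encode(R_t, R_v)          # encoder side of equation (3)

    # Discrimination labels y_a: class 0 for visual, class 1 for text features.
    y_a = torch.cat([torch.zeros(R_v_enc.size(0)),
                     torch.ones(R_t_enc.size(0))]).long().to(R_j.device)

    # Maximizing side: the antagonism module learns to tell R_v' from R_t'.
    opt_adv.zero_grad()
    F.cross_entropy(model.adv(torch.cat([R_v_enc, R_t_enc]).detach()), y_a).backward()
    opt_adv.step()

    # Minimizing side: the -L_adv term pushes the encoder to confuse the
    # antagonism module, reducing the heterogeneity of R_v' and R_t'.
    R_v_hat, R_t_hat = model.decode(R_j)                    # decoder side of equation (3)
    L_det = F.cross_entropy(model.detect(R_j), labels)      # equation (13)
    L_rcv = F.mse_loss(R_v_hat, R_v)                        # equation (7)
    L_rct = F.mse_loss(R_t_hat, R_t)
    L_adv = F.cross_entropy(model.adv(torch.cat([R_v_enc, R_t_enc])), y_a)
    L_fin = L_det + L_rcv + L_rct - L_adv                   # equation (8)
    opt_main.zero_grad()
    L_fin.backward()
    opt_main.step()
    return L_fin.item()
```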
For AMAE-L, as shown in FIG. 1, this embodiment adds a second antagonism module during reconstruction, so that the reconstruction loss of the unimodal information can be optimized through the second antagonism module, thereby indirectly optimizing the multi-modal joint representation.
The second antagonism module is used for distinguishing the reconstructed features {R̂_v, R̂_t} from the source features {R_v, R_t}, while the encoder and decoder components (acting as generators) attempt to confuse the second antagonism module into classifying the reconstructed features {R̂_v, R̂_t} and {R_v, R_t} as one class. This min-max game reduces the loss of unimodal information and indirectly optimizes the multi-modal joint representation.
The second antagonism modules are defined as C_adv1 and C_adv2, respectively. The calculation process can be expressed as equation (9):
ŷ_a1 = C_adv1({R_v, R̂_v}, θ_adv1), ŷ_a2 = C_adv2({R_t, R̂_t}, θ_adv2)  (9)
where ŷ_a(i) represents the predicted discrimination label and θ_adv(i) represents the parameters of the second antagonism modules. The losses L_adv1 and L_adv2 of the second antagonism modules can be expressed as equation (10):
L_adv(i) = -[y_a(i) · log ŷ_a(i) + (1 - y_a(i)) · log(1 - ŷ_a(i))], i = 1, 2  (10)
where y_a1 and y_a2 represent the true discrimination labels.
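A sketch of the second antagonism losses of equations (9)-(10); adv_v and adv_t are assumed small classifiers, and assigning C_adv1 to the visual branch and C_adv2 to the text branch is an assumption:

```python
import torch
import torch.nn.functional as F

def second_antagonism_losses(adv_v, adv_t, R_v, R_t, R_v_hat, R_t_hat):
    # Discrimination labels: class 0 for source features, class 1 for reconstructed ones.
    y_a1 = torch.cat([torch.zeros(R_v.size(0)),
                      torch.ones(R_v_hat.size(0))]).long().to(R_v.device)
    L_adv1 = F.cross_entropy(adv_v(torch.cat([R_v, R_v_hat])), y_a1)  # C_adv1, visual side
    y_a2 = torch.cat([torch.zeros(R_t.size(0)),
                      torch.ones(R_t_hat.size(0))]).long().to(R_t.device)
    L_adv2 = F.cross_entropy(adv_t(torch.cat([R_t, R_t_hat])), y_a2)  # C_adv2, text side
    return L_adv1, L_adv2
```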
The final loss function can be expressed as L_fin:
L_fin(θ_enc, θ_dec, θ_adv, θ_det) = L_det - L_adv1 - L_adv2
For AMAE-ML, the antagonism modules are added simultaneously before the multi-modal joint representation is generated and during reconstruction; the loss function can then be expressed as equation (11):
L_fin(θ_enc, θ_dec, θ_adv, θ_det) = L_det - L_adv - L_adv1 - L_adv2  (11)
For AMAE-M, AMAE-L and AMAE-ML, the best parameters θ̂_enc, θ̂_dec, θ̂_adv and θ̂_det are found by minimizing the final loss L_fin.
As one or more embodiments, in step 4, crisis information is detected based on the direct and indirect multi-modal joint representations and the detection model.
The detection model comprises three fully connected layers; Dropout is added between the first and second fully connected layers to prevent overfitting, and a tanh activation function is added between the second and third fully connected layers.
The detection model is defined as C_det, and the calculation is shown in equation (12):
ŷ = C_det(R_j, θ_det)  (12)
where ŷ represents the predicted label and θ_det represents the parameters of the detection model.
The cross-entropy function is used to calculate the detection loss L_det, as shown in equation (13):
L_det = -Σ_c y_c · log ŷ_c  (13)
where y represents the real label.
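A sketch of C_det consistent with the layer dimensions reported in the experimental settings below; the Dropout probability is an assumption:

```python
import torch.nn as nn

class Detector(nn.Module):
    """C_det of equation (12): three fully connected layers, Dropout between
    the first and second, tanh between the second and third."""
    def __init__(self, in_dim: int = 128, num_classes: int = 2, p_drop: float = 0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512),
            nn.Dropout(p_drop),              # prevents over-fitting
            nn.Linear(512, 128),
            nn.Tanh(),
            nn.Linear(128, num_classes),     # 2 for Task-1, 5 for Task-2
        )

    def forward(self, R_j):
        return self.net(R_j)                 # logits for the cross-entropy loss of equation (13)
```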
The specific experimental verification process is as follows:
A fully connected layer of dimension 128 constrains the dimension of the feature representations; the number N_B of BGA blocks composing the encoder and decoder is set to 3; the fully connected layer dimension in the encoder components is set to 128, and the fully connected layer dimensions in the decoder components are set to 2048 and 49, respectively. For the detection model, the dimensions of the three fully connected layers are set to 512, 128 and 2 or 5 (corresponding to the number of categories of Task-1 or Task-2). For the antagonism module, the GRU dimension is set to 128 and the dimensions of the fully connected layers are set to 512, 128 and 2.
A batch size of 128 instances was used, and the whole model was trained for 500 epochs. The initial learning rate was set to 1e-3, the decay period was set to 50 epochs, and the decay rate to 1e-4. Adam was used as the optimizer to find the best parameters of the model.
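The reported configuration maps onto a standard PyTorch loop as sketched below; compute_final_loss and loader are hypothetical helpers, and reading the 50-epoch decay period as a StepLR learning-rate schedule with factor 0.1 is an assumption, since the text is ambiguous about the exact decay mechanism:

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)  # gamma assumed

for epoch in range(500):                                  # trained for 500 epochs
    for images, text_ids, text_mask, labels in loader:    # batches of 128 instances
        optimizer.zero_grad()
        loss = compute_final_loss(model, images, text_ids, text_mask, labels)  # L_fin, equation (8)
        loss.backward()
        optimizer.step()
    scheduler.step()
```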
In order to study the proposed AMAE model and the functions of its component modules in depth, several ablation experiments were designed; each ablated model is an individual component or a simplified variant of AMAE.
Textual: only the BERT model in the feature extraction module is used, accomplishing tweet detection based on unimodal text content.
Visual: only the ResNet model in the feature extraction module is used, accomplishing tweet detection based on unimodal visual content.
MAE: the automatic encoder model with the antagonism module removed.
AMAE-M: adds an antagonism module before generating the multi-modal joint representation, for directly optimizing the multi-modal joint representation.
AMAE-L: adds an antagonism module during reconstruction, for indirectly optimizing the multi-modal joint representation.
AMAE-ML: the complete model, which adds antagonism modules both before generating the multi-modal joint representation and during reconstruction.
The performance results of all the above models are shown in Table 2. It can be observed that the proposed AMAE model is significantly better than all baseline models on both tasks. Compared with the unimodal baseline detection models, the AMAE model improves the accuracy by 4%-8%; compared with the multi-modal baseline detection models, it improves the accuracy by 2%-4%. This demonstrates that the antagonism module effectively optimizes the multi-modal joint representation and enables further mining of the complementary information in the multi-modal content.
To further verify the performance of the AMAE model, the proposed model was compared with representative methods in the related work, specifically including:
att-RNN: a recurrent neural network with an added attention mechanism, which fuses text, visual and social-context features to generate a joint representation for mining rumors on social media. In the experiments herein, the part dealing with social-context features was removed.
EANN: includes three parts, a multi-modal feature extractor, a fake news detector and an event discriminator, where the event discriminator is used to generate transferable, event-invariant features. In the experiments herein, its complete structure was used.
MVAE: a multi-modal variational autoencoder aimed at fusing text with visual features to generate a joint representation for mining rumors on social media. In the experiments herein, its complete structure was used.
SCBD: comprises three modules, a feature extraction module, a cross-attention module and a stochastic shared embedding module. In the experiments herein, its complete structure was used.
Table 3 shows the results of the ablation experiments and Table 4 shows the results of the comparative experiments. It can be observed that the proposed AMAE model is significantly better than the related work. Compared with MVAE, the accuracy is improved by 2%-3%, which shows that the antagonism strategy effectively optimizes the multi-modal joint representation. Compared with EANN, which also uses an antagonism strategy, the accuracy is improved by 1%-3%, which shows that AMAE learns a better multi-modal joint representation. Overall, the performance of AMAE exceeds all baselines, which verifies the validity of the proposed model.
Table 3 ablation experimental results
Table 4 results of comparative experiments
Example two
This embodiment provides a crisis information detection system based on an antagonistic multi-modal automatic encoder, comprising:
the data acquisition module is used for acquiring multi-mode data to be detected;
the antagonistic multi-modal automatic coding module is used for obtaining multi-modal joint representation by adopting an antagonistic multi-modal automatic coder to learn image modal data and text modal data based on multi-modal data to be detected;
the construction process of the antagonistic multi-mode automatic encoder comprises the following steps: obtaining an initial multi-modal joint representation based on the original text feature representation and the visual feature representation and the automatic encoder module;
Based on the initial multi-mode joint representation and the antagonism module, adopting a minimum maximum game method to perform direct optimization and indirect optimization to obtain a direct multi-mode joint representation and an indirect multi-mode joint representation, and obtaining the multi-mode joint representation based on the direct multi-mode joint representation and the indirect multi-mode joint representation;
And the text detection module is used for detecting and obtaining crisis information based on the multi-mode joint representation and the detection model.
Example III
This embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps in the crisis information detection method based on an antagonistic multi-modal automatic encoder as described above.
Example IV
This embodiment provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps in the crisis information detection method based on an antagonistic multi-modal automatic encoder as described above.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a Read-Only Memory (ROM), a Random access Memory (Random AccessMemory, RAM), or the like.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (9)
1. The crisis information detection method based on the antagonistic multi-mode automatic encoder is characterized by comprising the following steps of:
Acquiring multi-mode data to be detected;
Based on the multi-modal data to be detected, learning image modal data and text modal data by adopting an antagonistic multi-modal automatic encoder to obtain multi-modal joint representation;
the construction process of the antagonistic multi-mode automatic encoder comprises the following steps: obtaining an initial multi-modal joint representation based on the text feature representation and the visual feature representation and the automatic encoder module;
Based on the initial multi-mode joint representation and the antagonism module, adopting a minimum maximum game method to perform direct optimization and indirect optimization to obtain a direct multi-mode joint representation and an indirect multi-mode joint representation, and obtaining the multi-mode joint representation based on the direct multi-mode joint representation and the indirect multi-mode joint representation;
obtaining crisis information based on multi-mode joint representation and detection model detection;
the method for obtaining the direct multi-mode joint representation and the indirect multi-mode joint representation by adopting the minimum and maximum game method for direct optimization and indirect optimization comprises the following steps:
Encoding the original text feature representation and the visual feature representation, and directly optimizing the original text feature representation and the visual feature representation based on the encoded text feature representation and visual feature representation to obtain a direct multi-mode joint representation; reconstructing based on the direct multi-modal joint representation to obtain a reconstructed text feature representation and a visual feature representation; and performing indirect optimization based on the original text feature representation and the visual feature representation and the reconstructed text feature representation and the visual feature representation to obtain an indirect multi-modal joint representation.
2. The method for detecting crisis information based on an antagonistic multimodal automatic encoder according to claim 1, wherein the original text feature representation is extracted by text content in a multimodal content and fine-tuned Bert, and the visual feature representation is extracted by visual content in a multimodal content and ResNet.
3. The method for detecting crisis information based on an antagonistic multimodal automatic encoder according to claim 1, wherein preprocessing is required before extracting the text modality data features: deleting the user handle of the forwarding title, the stop words and punctuation marks in the text sentence, then performing word segmentation operation on the sentence, and finally adding set characters into the beginning and the end of each sentence respectively to generate preprocessed text modal data.
4. The crisis information detection method based on an antagonistic multi-modal automatic encoder according to claim 1, wherein in the encoding of the original text feature representation and the visual feature representation, encoder components are used for encoding, each component comprising a plurality of BGA blocks and a fully connected layer, wherein each BGA block comprises a Bi-GRU and a Self-Attention layer; a feature representation x_G with context information is acquired through the Bi-GRU, and a feature representation x_A with enhanced context information is acquired through Self-Attention.
5. The crisis information detection method based on an antagonistic multi-mode automatic encoder according to claim 1, wherein the detection model comprises three fully connected layers, dropout is added between a first fully connected layer and a second fully connected layer, and a tanh activation function is added between the second fully connected layer and a third fully connected layer.
6. The method for detecting crisis information based on an antagonistic multi-modal automatic encoder according to claim 1, wherein the detection loss is calculated using a cross entropy function.
7. Crisis information detecting system based on antagonism multimode automatic encoder, characterized by comprising:
the data acquisition module is used for acquiring multi-mode data to be detected;
the antagonistic multi-modal automatic coding module is used for obtaining multi-modal joint representation by adopting an antagonistic multi-modal automatic coder to learn image modal data and text modal data based on multi-modal data to be detected;
the construction process of the antagonistic multi-mode automatic encoder comprises the following steps: obtaining an initial multi-modal joint representation based on the text feature representation and the visual feature representation and the automatic encoder module;
Based on the initial multi-mode joint representation and the antagonism module, adopting a minimum maximum game method to perform direct optimization and indirect optimization to obtain a direct multi-mode joint representation and an indirect multi-mode joint representation, and obtaining the multi-mode joint representation based on the direct multi-mode joint representation and the indirect multi-mode joint representation;
the method for obtaining the direct multi-mode joint representation and the indirect multi-mode joint representation by adopting the minimum and maximum game method for direct optimization and indirect optimization comprises the following steps:
encoding the original text feature representation and the visual feature representation, and directly optimizing the original text feature representation and the visual feature representation based on the encoded text feature representation and visual feature representation to obtain a direct multi-mode joint representation; reconstructing based on the direct multi-modal joint representation to obtain a reconstructed text feature representation and a visual feature representation; performing indirect optimization based on the original text feature representation and the visual feature representation and the reconstructed text feature representation and the visual feature representation to obtain an indirect multi-modal joint representation;
And the text detection module is used for detecting and obtaining crisis information based on the multi-mode joint representation and the detection model.
8. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps in the method for detecting crisis information based on an antagonistic multimodal automatic encoder according to any one of claims 1 to 6.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps in the method for detecting crisis information based on an antagonistic multimodal automatic encoder according to any of the claims 1-6 when the program is executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210575072.3A CN114936285B (en) | 2022-05-25 | 2022-05-25 | Crisis information detection method and system based on antagonistic multi-mode automatic encoder |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210575072.3A CN114936285B (en) | 2022-05-25 | 2022-05-25 | Crisis information detection method and system based on antagonistic multi-mode automatic encoder |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114936285A CN114936285A (en) | 2022-08-23 |
CN114936285B true CN114936285B (en) | 2024-07-12 |
Family
ID=82865057
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210575072.3A Active CN114936285B (en) | 2022-05-25 | 2022-05-25 | Crisis information detection method and system based on antagonistic multi-mode automatic encoder |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114936285B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111858928A (en) * | 2020-06-17 | 2020-10-30 | 北京邮电大学 | Social media rumor detection method and device based on graph structure counterstudy |
CN112148997A (en) * | 2020-08-07 | 2020-12-29 | 江汉大学 | Multi-modal confrontation model training method and device for disaster event detection |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108319686B (en) * | 2018-02-01 | 2021-07-30 | 北京大学深圳研究生院 | Antagonism cross-media retrieval method based on limited text space |
WO2022101515A1 (en) * | 2020-11-16 | 2022-05-19 | UMNAI Limited | Method for an explainable autoencoder and an explainable generative adversarial network |
CN113515634B (en) * | 2021-07-09 | 2023-08-01 | 福州大学 | Social media rumor detection method and system based on hierarchical heterogeneous graph neural network |
- 2022-05-25: CN application CN202210575072.3A granted as patent CN114936285B (en), status Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111858928A (en) * | 2020-06-17 | 2020-10-30 | 北京邮电大学 | Social media rumor detection method and device based on graph structure counterstudy |
CN112148997A (en) * | 2020-08-07 | 2020-12-29 | 江汉大学 | Multi-modal confrontation model training method and device for disaster event detection |
Also Published As
Publication number | Publication date |
---|---|
CN114936285A (en) | 2022-08-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address |
Address after: 250353 University Road, Changqing District, Ji'nan, Shandong Province, No. 3501 Patentee after: Qilu University of Technology (Shandong Academy of Sciences) Country or region after: China Address before: 250353 University Road, Changqing District, Ji'nan, Shandong Province, No. 3501 Patentee before: Qilu University of Technology Country or region before: China |