CN114936285B - Crisis information detection method and system based on antagonistic multi-mode automatic encoder
- Publication number: CN114936285B
- Application number: CN202210575072.3A
- Authority: CN (China)
- Prior art keywords: representation, modal, mode, feature representation, antagonistic
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/353: Information retrieval of unstructured textual data; clustering; classification into predefined classes
- G06F18/253: Pattern recognition; fusion techniques of extracted features
- G06F40/279: Natural language analysis; recognition of textual entities
- G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/30: Semantic analysis
- G06N3/044: Neural networks; recurrent networks, e.g. Hopfield networks
- G06N3/045: Neural networks; combinations of networks
- G06N3/08: Neural networks; learning methods
- G06V10/40: Image or video recognition or understanding; extraction of image or video features
- G06V10/82: Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
Abstract
The invention belongs to the technical field of information processing and provides a crisis information detection method and system based on an antagonistic multi-modal automatic encoder. An end-to-end AMAE model is proposed for detecting and analyzing crisis-related information on social media, and comprises four modules: a feature extraction module, an automatic encoder module, an antagonism module and a detection module. The feature extraction module and the automatic encoder module extract the multi-modal representation, the detection module produces the detection and analysis decisions, and the antagonism module optimizes the multi-modal representation by reducing the heterogeneity differences between heterogeneous modalities and the loss of unimodal information. The validity of the model is verified on a large real-world dataset.
Description
Technical Field
The invention belongs to the technical field of information processing, and particularly relates to a crisis information detection method and system based on an antagonistic multi-mode automatic encoder.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
In recent years, social media has become an important channel for communication and discussion during various public events by virtue of its immediacy and convenience. Social media greatly simplifies the way people acquire and disseminate information; it has become an important source of information and plays a role complementary to traditional media.
Massive numbers of social media tweets have become an important resource for data mining and analysis. In particular, during crisis events, users often publish tweets reporting infrastructure damage and casualties; if the potential information in such tweets can be effectively detected and analyzed, it will contribute to situational awareness and rescue decision-making.
Existing tweet detection includes unimodal rumor detection and multi-modal rumor detection.
Rumor detection based on a single modality has been widely studied. Neppali et al. detect informative tweets during crises by combining deep neural networks with a naive Bayes classifier, showing excellent performance on real datasets. Alam et al. propose a semi-supervised framework combining a convolutional neural network (CNN) with a graph neural network, which learns better from unlabeled data; the performance of the model is verified using two real crisis-related Twitter datasets. For the unimodal visual information in crisis-related tweets, Alam et al. adopt a transfer learning approach and use several pre-trained convolutional neural networks to classify crisis-related tweets.
In multi-modal rumor detection, Ma et al. propose a rumor detection framework based on a recurrent neural network (RNN) that mines the changes in contextual information in tweets by using different recurrent units. Ma et al. also mine the content semantics and propagation cues in tweets based on a tree structure and learn joint feature representations of social media rumors in combination with RNNs.
The multi-modal content in tweets can provide complementary information, so some work analyzes the multi-modal content comprehensively to detect and analyze tweets. Gao et al. propose a multimodal adversarial neural network (MANN) that captures transferable, disaster-invariant feature representations through adversarial training, and verify the validity of the method on a large real dataset. Jin et al. propose an end-to-end attention-based recurrent neural network model (att-RNN) that fuses features from text, images and social context to generate a joint representation; they use the attention mechanism to mine the relationships between multi-modal features and verify the validity of the model through component analysis. Wang et al. propose an event adversarial neural network (EANN) for social media rumor detection, which uses an adversarial event discrimination module to remove event-specific features while preserving shared features, so as to learn transferable, event-invariant feature representations; the performance of the model is verified through extensive experiments and quantitative analysis. Abavisani et al. propose a multi-modal framework (SCBD) fusing visual and text input, which uses a cross-attention module to filter out uninformative or misleading information in the heterogeneous modal feature representations. Khattar et al. propose a multi-modal variational autoencoder (MVAE) that mines fake news and rumors in social media by learning joint representations of the multi-modal content in tweets.
The inventor finds that current tweet detection has the following technical problems: on one hand, the above methods aim to capture a shared semantic representation of the image-text data, ignoring to some extent the effect of its modality-specific information.
On the other hand, the above fusion learning models based on the encoder-decoder framework fail to effectively solve the problem of data redundancy in the mapping process.
Thus, technical means are lacking to reduce the loss of modality-specific information and to solve the problem of data redundancy in the mapping process.
Disclosure of Invention
In order to solve at least one of the technical problems in the background art, the invention provides a crisis information detection method and system based on an antagonistic multi-modal automatic encoder, which optimize the multi-modal representation by reducing the heterogeneity differences between heterogeneous modalities and reducing the loss of unimodal information, and can effectively identify and analyze tweets on social media.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A first aspect of the present invention provides a crisis information detection method based on an antagonistic multi-modal automatic encoder, comprising the following steps:
Acquiring multi-modal data to be detected;
Based on the multi-modal data to be detected, learning the image modality data and the text modality data with an antagonistic multi-modal automatic encoder to obtain a multi-modal joint representation;
The construction process of the antagonistic multi-modal automatic encoder comprises: obtaining an initial multi-modal joint representation based on the text feature representation, the visual feature representation and the automatic encoder module;
Based on the initial multi-modal joint representation and the antagonism module, performing direct optimization and indirect optimization with a min-max game method to obtain a direct multi-modal joint representation and an indirect multi-modal joint representation, and obtaining the multi-modal joint representation from the direct and indirect multi-modal joint representations;
Detecting crisis information based on the multi-modal joint representation and the detection model.
A second aspect of the present invention provides a crisis information detection system based on an antagonistic multi-modal automatic encoder, comprising:
a data acquisition module, used for acquiring multi-modal data to be detected;
an antagonistic multi-modal automatic encoding module, used for learning the image modality data and the text modality data with an antagonistic multi-modal automatic encoder, based on the multi-modal data to be detected, to obtain a multi-modal joint representation;
the construction process of the antagonistic multi-modal automatic encoder comprises: obtaining an initial multi-modal joint representation based on the original text feature representation, the visual feature representation and the automatic encoder module;
based on the initial multi-modal joint representation and the antagonism module, performing direct optimization and indirect optimization with a min-max game method to obtain a direct multi-modal joint representation and an indirect multi-modal joint representation, and obtaining the multi-modal joint representation from the direct and indirect multi-modal joint representations;
and a tweet detection module, used for detecting crisis information based on the multi-modal joint representation and the detection model.
A third aspect of the present invention provides a computer-readable storage medium.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps in the crisis information detection method based on an antagonistic multi-modal automatic encoder as described above.
A fourth aspect of the invention provides a computer device.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps in the crisis information detection method based on an antagonistic multi-modal automatic encoder as described above.
Compared with the prior art, the invention has the beneficial effects that:
Aiming at the problem that joint image-text fusion frameworks tend to learn the shared semantic content across modalities while neglecting modality-specific content, the invention performs direct optimization and indirect optimization with a min-max game method, based on the original text feature representation and visual feature representation, to obtain a direct multi-modal joint representation and an indirect multi-modal joint representation, from which the multi-modal joint representation is derived. This effectively reduces the heterogeneity differences between heterogeneous modal feature representations and reduces the loss of unimodal information, thereby further reducing the loss of modality-specific information.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a flow chart of the crisis information detection method based on an antagonistic multi-modal automatic encoder according to the present invention;
FIG. 2 is a structural diagram of the crisis information detection method based on an antagonistic multi-modal automatic encoder according to the present invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to the present invention. As used herein, the singular forms are also intended to include the plural forms unless the context clearly indicates otherwise. Furthermore, it is to be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Example 1
This embodiment provides a crisis information detection method based on an antagonistic multi-modal automatic encoder, comprising the following steps:
Step 1: acquiring the real multi-modal tweet content of users on social media;
Step 2: extracting the original text feature representation and the visual feature representation of the multi-modal tweet content, respectively;
Step 3: based on the multi-modal tweet content, learning the image modality data and the text modality data with an antagonistic multi-modal automatic encoder to obtain a multi-modal joint representation;
The construction process of the antagonistic multi-modal automatic encoder comprises: obtaining an initial multi-modal joint representation based on the original text feature representation, the visual feature representation and the automatic encoder module;
Based on the initial multi-modal joint representation and the antagonism module, performing direct optimization and indirect optimization with a min-max game method to obtain a direct multi-modal joint representation and an indirect multi-modal joint representation, and obtaining the multi-modal joint representation from the direct and indirect multi-modal joint representations;
Step 4: detecting crisis information based on the multi-modal joint representation and the detection model.
As one or more embodiments, in step 1, this embodiment conducts experiments on the CrisisMMD dataset. The dataset is obtained by capturing the real tweets of users on social media, where a tweet with multi-modal content is Tw = {T, V}: V represents the visual content in the tweet and T represents the text content. Each tweet contains data of the two modalities, image and text. The dataset structure and its division are shown in Table 1.
Task-1 is intended to determine whether a tweet in the dataset contains potential information related to a crisis event; if so, it is labeled informative, otherwise it is labeled not-informative.
Task-2 aims at further mining the information types of the tweets, covering five types in total: affected-individuals, rescue-volunteering-or-donation-effort, infrastructure-and-utility-damage, other-relevant-information and not-humanitarian. For convenience, they are abbreviated A, R, I, O, N.
Table 1 Dataset structure and its division
As one or more embodiments, in step 2, in the process of extracting the original text feature representation and the visual feature representation of the multi-modal tweet content:
The original text feature representation is extracted from the text content by a fine-tuned BERT, as follows:
Preprocessing is required before extracting the text modality data features: retweet headers, user handles, stop words and punctuation marks are deleted from the text sentences; the BERT tokenizer is called to perform the word segmentation operation on the sentences; finally, the set characters are added to the beginning and end of each sentence, respectively, to generate the preprocessed text modality data.
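A minimal sketch of this preprocessing step, assuming the Hugging Face BertTokenizer; the stop-word subset and the handle/URL patterns are illustrative assumptions, as the patent does not fix them:

```python
import re
from transformers import BertTokenizer

STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of"}  # illustrative subset

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def preprocess_tweet(text: str) -> list:
    # Delete retweet headers, user handles and URLs from the text sentence.
    text = re.sub(r"^RT\b", " ", text)
    text = re.sub(r"@\w+|https?://\S+", " ", text)
    # Delete punctuation marks and stop words.
    text = re.sub(r"[^\w\s]", " ", text)
    words = [w for w in text.split() if w.lower() not in STOP_WORDS]
    # Word segmentation with the BERT tokenizer, then add the set characters
    # [CLS] and [SEP] to the beginning and end of the sentence, respectively.
    tokens = tokenizer.tokenize(" ".join(words))
    return ["[CLS]"] + tokens + ["[SEP]"]
```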
The preprocessed text content is processed into the input form required by BERT, the text feature representation is extracted with BERT, and a fully connected layer constrains the dimension of the output text feature representation.
The calculation is shown in equation (1):
R_t = σ(w_t · p_t + b_t)  (1)
where σ represents the activation function, w_t represents the weight of the fully connected layer, p_t represents the output of BERT, and b_t represents the bias of the fully connected layer.
The original visual feature representation is extracted from the visual content by ResNet, as follows:
The image size is uniformly adjusted to 224 × 224, and augmentation operations such as random horizontal flipping and random vertical flipping are applied; this increases the number of samples while enhancing the generalization ability of the model.
When extracting the visual features, a fully connected layer constrains the dimension of the output visual feature representation, as shown in equation (2):
R_v = σ(w_v · p_v + b_v)  (2)
where σ represents the ReLU activation function, w_v represents the weight of the fully connected layer, p_v represents the output of the adaptive pooling layer in ResNet, and b_v represents the bias of the fully connected layer.
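The two extraction branches can be sketched in PyTorch as follows; bert-base-uncased and ResNet-50 are assumptions (the patent only names BERT and ResNet), and the 128-dimensional output follows the fully connected layer dimension reported in the experimental settings below:

```python
import torch.nn as nn
from transformers import BertModel
from torchvision.models import resnet50

class FeatureExtractor(nn.Module):
    """Equations (1)-(2): backbone outputs constrained to a common
    dimension by fully connected layers with a ReLU activation (sigma)."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        resnet = resnet50(weights="IMAGENET1K_V1")
        self.cnn = nn.Sequential(*list(resnet.children())[:-1])  # ends at the adaptive pooling layer
        self.fc_t = nn.Linear(768, dim)    # w_t, b_t
        self.fc_v = nn.Linear(2048, dim)   # w_v, b_v
        self.act = nn.ReLU()

    def forward(self, input_ids, attention_mask, images):
        p_t = self.bert(input_ids=input_ids, attention_mask=attention_mask).pooler_output
        R_t = self.act(self.fc_t(p_t))              # equation (1)
        p_v = self.cnn(images).flatten(1)           # adaptive pooling output, 2048-d
        R_v = self.act(self.fc_v(p_v))              # equation (2)
        return R_t, R_v
```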
As one or more embodiments, in step 3, the antagonistic multi-modal automatic encoder includes two encoder components and two decoder components built from attention-based bidirectional gated recurrent unit components (BGA blocks); each component consists of N_B BGA blocks and a fully connected layer.
The encoder components, collectively denoted C_enc, encode and aggregate the visual and text feature representations R_v and R_t into a multi-modal joint representation R_j.
The decoder components, collectively denoted C_dec, reconstruct the aggregated multi-modal joint representation R_j into a reconstructed visual feature representation R̂_v and a reconstructed text feature representation R̂_t; the reconstruction process prevents the loss of unimodal information and thereby optimizes the multi-modal joint representation.
The calculation process is shown in equation (3):
R_j = C_enc({R_v, R_t}, θ_enc), {R̂_v, R̂_t} = C_dec(R_j, θ_dec)  (3)
where θ_enc and θ_dec represent the parameters of the encoder and decoder components, respectively.
One BGA block contains one bidirectional gated recurrent unit (Bi-GRU) layer and one Self-Attention layer, and is used for extracting features with enhanced context information.
The method specifically comprises the following steps:
(1) Acquiring a feature representation x_G carrying context information using the Bi-GRU;
(2) Obtaining a feature representation x_A with enhanced context information using the Self-Attention mechanism: x_G, which carries the context information, is weighted into three intermediate variable matrices q, k and v; the attention scores are calculated and normalized with softmax to obtain the attention values corresponding to v; the attention values are multiplied by the corresponding v and accumulated to obtain the output value x_A^i at the i-th position.
The calculation process is shown in equation (4):
x_A^i = Σ_j softmax_j(q_i · k_j^T) · v_j  (4)
(3) Finally, x_G and x_A are added and normalized using a residual-like process.
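A minimal PyTorch sketch of one BGA block following these three steps; the single-head nn.MultiheadAttention stands in for the q/k/v Self-Attention of equation (4), and LayerNorm is an assumption for the residual-like add-and-normalize:

```python
import torch.nn as nn

class BGABlock(nn.Module):
    """One BGA block: Bi-GRU -> Self-Attention -> residual add and normalize."""
    def __init__(self, dim: int = 128):
        super().__init__()
        # Hidden size dim//2 per direction, so the bidirectional output is dim-sized.
        self.bigru = nn.GRU(dim, dim // 2, bidirectional=True, batch_first=True)
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                     # x: (batch, seq, dim)
        x_g, _ = self.bigru(x)                # x_G: features with context information
        x_a, _ = self.attn(x_g, x_g, x_g)     # x_A: q, k, v all derived from x_G, equation (4)
        return self.norm(x_g + x_a)           # residual-like add and normalize
```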
As one or more embodiments, in step 3, the process of performing direct optimization and indirect optimization with the min-max game method, based on the original text feature representation and the visual feature representation, to obtain the direct multi-modal joint representation and the indirect multi-modal joint representation, includes:
encoding the original text feature representation and the visual feature representation, and directly optimizing them based on the encoded text feature representation and visual feature representation to obtain the direct multi-modal joint representation; reconstructing based on the direct multi-modal joint representation to obtain the reconstructed text feature representation and visual feature representation; and performing indirect optimization based on the original and reconstructed text and visual feature representations to obtain the indirect multi-modal joint representation.
The min-max game method ensures that the best possible outcome is obtained among the worst possible outcomes, regardless of what the other player does.
The method specifically comprises the following steps:
In order to be able to effectively reduce the heterogeneity differences between heterogeneous modal feature representations and to reduce the loss of unimodal information, an antagonism module is added for directly or indirectly optimizing the multimodal joint representation.
The structure is shown in FIG. 2. According to the position at which the antagonism module is added, the variants are named AMAE-M (added in the middle, to directly optimize the multi-modal joint representation) and AMAE-L (added at a later position, to indirectly optimize the multi-modal joint representation).
For AMAE-M, as shown in FIG. 1, a first antagonism module is added before the multi-modal joint representation is generated. Its input is the encoded visual feature representation R_v' and the encoded text feature representation R_t', and it outputs a discrimination label for distinguishing R_v' from R_t'. This directly optimizes the aggregation process of the multi-modal joint representation.
The first antagonism module aims to distinguish R_v' from R_t', while the encoder component (acting as a generator) attempts to confuse the antagonism module into classifying R_v' and R_t' as one class.
The heterogeneity difference between R_v' and R_t' is reduced by this min-max game, so that the joint representation can be directly optimized.
Defining the first antagonism module as C_adv, the calculation process can be expressed as equation (5):
ŷ_a = C_adv({R_v', R_t'}, θ_adv)  (5)
where ŷ_a represents the predicted discrimination label and θ_adv represents the parameters of the antagonism module. The loss of the antagonism module, L_adv, can be expressed as equation (6):
L_adv = -[y_a · log ŷ_a + (1 - y_a) · log(1 - ŷ_a)]  (6)
where y_a represents the true discrimination label.
The reconstruction losses L_rcv and L_rct of the unimodal information are calculated using MSELoss, where N_v and N_t represent the lengths of the visual feature representation and the text feature representation, respectively, as shown in equation (7):
L_rcv = (1/N_v) Σ_{i=1}^{N_v} (R_v^i - R̂_v^i)², L_rct = (1/N_t) Σ_{i=1}^{N_t} (R_t^i - R̂_t^i)²  (7)
The final loss function L_fin can be expressed as equation (8):
L_fin(θ_enc, θ_dec, θ_adv, θ_det) = L_det + L_rcv + L_rct - L_adv  (8)
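A sketch of one AMAE-M training step under equation (8); the model sub-module names and the alternating update schedule are assumptions, since the patent does not spell out the optimization loop, and opt_main is assumed to cover only the encoder, decoder and detector parameters:

```python
import torch
import torch.nn.functional as F

def amae_m_step(model, images, text_ids, text_mask, labels, opt_main, opt_adv):
    R_t, R_v = model.extract(text_ids, text_mask, images)   # equations (1)-(2)
    R_t_enc, R_v_enc, R_j = model.encode(R_t, R_v)          # encoder side of equation (3)

    # Discrimination labels y_a: class 0 for visual, class 1 for text features.
    y_a = torch.cat([torch.zeros(R_v_enc.size(0)),
                     torch.ones(R_t_enc.size(0))]).long().to(R_j.device)

    # Maximizing side: the antagonism module learns to tell R_v' from R_t'.
    opt_adv.zero_grad()
    F.cross_entropy(model.adv(torch.cat([R_v_enc, R_t_enc]).detach()), y_a).backward()
    opt_adv.step()

    # Minimizing side: the -L_adv term pushes the encoder to confuse the
    # antagonism module, reducing the heterogeneity of R_v' and R_t'.
    R_v_hat, R_t_hat = model.decode(R_j)                    # decoder side of equation (3)
    L_det = F.cross_entropy(model.detect(R_j), labels)      # equation (13)
    L_rcv = F.mse_loss(R_v_hat, R_v)                        # equation (7)
    L_rct = F.mse_loss(R_t_hat, R_t)
    L_adv = F.cross_entropy(model.adv(torch.cat([R_v_enc, R_t_enc])), y_a)
    L_fin = L_det + L_rcv + L_rct - L_adv                   # equation (8)
    opt_main.zero_grad()
    L_fin.backward()
    opt_main.step()
    return L_fin.item()
```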
For AMAE-L, as shown in FIG. 1, this embodiment adds a second antagonism module during reconstruction, so that the reconstruction loss of the unimodal information can be optimized through the second antagonism module, thereby indirectly optimizing the multi-modal joint representation.
The second antagonism module is used for distinguishing the reconstructed features {R̂_v, R̂_t} from the source features {R_v, R_t}, while the encoder and decoder components (acting as generators) attempt to confuse the second antagonism module into classifying the reconstructed features {R̂_v, R̂_t} and {R_v, R_t} as one class. This min-max game reduces the loss of unimodal information and indirectly optimizes the multi-modal joint representation.
The second antagonism modules are defined as C_adv1 and C_adv2, respectively. The calculation process can be expressed as equation (9):
ŷ_a1 = C_adv1({R_v, R̂_v}, θ_adv1), ŷ_a2 = C_adv2({R_t, R̂_t}, θ_adv2)  (9)
where ŷ_a(i) represents the predicted discrimination label and θ_adv(i) represents the parameters of the second antagonism modules. The losses L_adv1 and L_adv2 of the second antagonism modules can be expressed as equation (10):
L_adv(i) = -[y_a(i) · log ŷ_a(i) + (1 - y_a(i)) · log(1 - ŷ_a(i))], i = 1, 2  (10)
where y_a1 and y_a2 represent the true discrimination labels.
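A sketch of the second antagonism losses of equations (9)-(10); adv_v and adv_t are assumed small classifiers, and assigning C_adv1 to the visual branch and C_adv2 to the text branch is an assumption:

```python
import torch
import torch.nn.functional as F

def second_antagonism_losses(adv_v, adv_t, R_v, R_t, R_v_hat, R_t_hat):
    # Discrimination labels: class 0 for source features, class 1 for reconstructed ones.
    y_a1 = torch.cat([torch.zeros(R_v.size(0)),
                      torch.ones(R_v_hat.size(0))]).long().to(R_v.device)
    L_adv1 = F.cross_entropy(adv_v(torch.cat([R_v, R_v_hat])), y_a1)  # C_adv1, visual side
    y_a2 = torch.cat([torch.zeros(R_t.size(0)),
                      torch.ones(R_t_hat.size(0))]).long().to(R_t.device)
    L_adv2 = F.cross_entropy(adv_t(torch.cat([R_t, R_t_hat])), y_a2)  # C_adv2, text side
    return L_adv1, L_adv2
```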
The final loss function can be expressed as L_fin:
L_fin(θ_enc, θ_dec, θ_adv, θ_det) = L_det - L_adv1 - L_adv2
For AMAE-ML, the antagonism modules are added simultaneously before the multi-modal joint representation is generated and during reconstruction; the loss function can then be expressed as equation (11):
L_fin(θ_enc, θ_dec, θ_adv, θ_det) = L_det - L_adv - L_adv1 - L_adv2  (11)
For AMAE-M, AMAE-L and AMAE-ML, the best parameters θ̂_enc, θ̂_dec, θ̂_adv and θ̂_det are found by minimizing the final loss L_fin.
As one or more embodiments, in step 4, crisis information is detected based on the direct and indirect multi-modal joint representations and the detection model.
The detection model comprises three fully connected layers; Dropout is added between the first and second fully connected layers to prevent overfitting, and a tanh activation function is added between the second and third fully connected layers.
The detection model is defined as C_det, and the calculation is shown in equation (12):
ŷ = C_det(R_j, θ_det)  (12)
where ŷ represents the predicted label and θ_det represents the parameters of the detection model.
The cross-entropy function is used to calculate the detection loss L_det, as shown in equation (13):
L_det = -Σ_c y_c · log ŷ_c  (13)
where y represents the real label.
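A sketch of C_det consistent with the layer dimensions reported in the experimental settings below; the Dropout probability is an assumption:

```python
import torch.nn as nn

class Detector(nn.Module):
    """C_det of equation (12): three fully connected layers, Dropout between
    the first and second, tanh between the second and third."""
    def __init__(self, in_dim: int = 128, num_classes: int = 2, p_drop: float = 0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512),
            nn.Dropout(p_drop),              # prevents over-fitting
            nn.Linear(512, 128),
            nn.Tanh(),
            nn.Linear(128, num_classes),     # 2 for Task-1, 5 for Task-2
        )

    def forward(self, R_j):
        return self.net(R_j)                 # logits for the cross-entropy loss of equation (13)
```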
The specific experimental verification process is as follows:
A fully connected layer of dimension 128 constrains the dimension of the feature representations; the number N_B of BGA blocks composing the encoder and decoder is set to 3; the fully connected layer dimension in the encoder components is set to 128, and the fully connected layer dimensions in the decoder components are set to 2048 and 49, respectively. For the detection model, the dimensions of the three fully connected layers are set to 512, 128 and 2 or 5 (corresponding to the number of categories of Task-1 or Task-2). For the antagonism module, the GRU dimension is set to 128 and the dimensions of the fully connected layers are set to 512, 128 and 2.
A batch size of 128 instances was used, and the whole model was trained for 500 epochs. The initial learning rate was set to 1e-3, the decay period was set to 50 epochs, and the decay rate to 1e-4. Adam was used as the optimizer to find the best parameters of the model.
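The reported configuration maps onto a standard PyTorch loop as sketched below; compute_final_loss and loader are hypothetical helpers, and reading the 50-epoch decay period as a StepLR learning-rate schedule with factor 0.1 is an assumption, since the text is ambiguous about the exact decay mechanism:

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)  # gamma assumed

for epoch in range(500):                                  # trained for 500 epochs
    for images, text_ids, text_mask, labels in loader:    # batches of 128 instances
        optimizer.zero_grad()
        loss = compute_final_loss(model, images, text_ids, text_mask, labels)  # L_fin, equation (8)
        loss.backward()
        optimizer.step()
    scheduler.step()
```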
In order to study the proposed AMAE model and the functions of its component modules in depth, several ablation experiments were designed; each ablated model is an individual component or a simplified variant of AMAE.
Textual: only the BERT model in the feature extraction module is used, accomplishing tweet detection based on unimodal text content.
Visual: only the ResNet model in the feature extraction module is used, accomplishing tweet detection based on unimodal visual content.
MAE: the automatic encoder model with the antagonism module removed.
AMAE-M: adds an antagonism module before generating the multi-modal joint representation, for directly optimizing the multi-modal joint representation.
AMAE-L: adds an antagonism module during reconstruction, for indirectly optimizing the multi-modal joint representation.
AMAE-ML: the complete model, which adds antagonism modules both before generating the multi-modal joint representation and during reconstruction.
The performance results of all the above models are shown in Table 2. It can be observed that the proposed AMAE model is significantly better than all baseline models on both tasks. Compared with the unimodal baseline detection models, the AMAE model improves the accuracy by 4%-8%; compared with the multi-modal baseline detection models, it improves the accuracy by 2%-4%. This demonstrates that the antagonism module effectively optimizes the multi-modal joint representation and enables further mining of the complementary information in the multi-modal content.
To further verify the performance of the AMAE model, the proposed model was compared with representative methods in the related work, specifically including:
att-RNN: a recurrent neural network with an added attention mechanism, which fuses text, visual and social-context features to generate a joint representation for mining rumors on social media. In the experiments herein, the part dealing with social-context features was removed.
EANN: includes three parts, a multi-modal feature extractor, a fake news detector and an event discriminator, where the event discriminator is used to generate transferable, event-invariant features. In the experiments herein, its complete structure was used.
MVAE: a multi-modal variational autoencoder aimed at fusing text with visual features to generate a joint representation for mining rumors on social media. In the experiments herein, its complete structure was used.
SCBD: comprises three modules, a feature extraction module, a cross-attention module and a stochastic shared embedding module. In the experiments herein, its complete structure was used.
Table 3 shows the results of the ablation experiments and Table 4 shows the results of the comparative experiments. It can be observed that the proposed AMAE model is significantly better than the related work. Compared with MVAE, the accuracy is improved by 2%-3%, which shows that the antagonism strategy effectively optimizes the multi-modal joint representation. Compared with EANN, which also uses an antagonism strategy, the accuracy is improved by 1%-3%, which shows that AMAE learns a better multi-modal joint representation. Overall, the performance of AMAE exceeds all baselines, which verifies the validity of the proposed model.
Table 3 ablation experimental results
Table 4 results of comparative experiments
Example two
This embodiment provides a crisis information detection system based on an antagonistic multi-modal automatic encoder, comprising:
the data acquisition module is used for acquiring multi-mode data to be detected;
the antagonistic multi-modal automatic coding module is used for obtaining multi-modal joint representation by adopting an antagonistic multi-modal automatic coder to learn image modal data and text modal data based on multi-modal data to be detected;
the construction process of the antagonistic multi-mode automatic encoder comprises the following steps: obtaining an initial multi-modal joint representation based on the original text feature representation and the visual feature representation and the automatic encoder module;
Based on the initial multi-mode joint representation and the antagonism module, adopting a minimum maximum game method to perform direct optimization and indirect optimization to obtain a direct multi-mode joint representation and an indirect multi-mode joint representation, and obtaining the multi-mode joint representation based on the direct multi-mode joint representation and the indirect multi-mode joint representation;
And the text detection module is used for detecting and obtaining crisis information based on the multi-mode joint representation and the detection model.
Example III
This embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps in the crisis information detection method based on an antagonistic multi-modal automatic encoder as described above.
Example IV
This embodiment provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps in the crisis information detection method based on an antagonistic multi-modal automatic encoder as described above.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a Read-Only Memory (ROM), a Random access Memory (Random AccessMemory, RAM), or the like.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (9)
1. The crisis information detection method based on the antagonistic multi-mode automatic encoder is characterized by comprising the following steps of:
Acquiring multi-mode data to be detected;
Based on the multi-modal data to be detected, learning image modal data and text modal data by adopting an antagonistic multi-modal automatic encoder to obtain multi-modal joint representation;
the construction process of the antagonistic multi-mode automatic encoder comprises the following steps: obtaining an initial multi-modal joint representation based on the text feature representation and the visual feature representation and the automatic encoder module;
Based on the initial multi-mode joint representation and the antagonism module, adopting a minimum maximum game method to perform direct optimization and indirect optimization to obtain a direct multi-mode joint representation and an indirect multi-mode joint representation, and obtaining the multi-mode joint representation based on the direct multi-mode joint representation and the indirect multi-mode joint representation;
obtaining crisis information based on multi-mode joint representation and detection model detection;
the method for obtaining the direct multi-mode joint representation and the indirect multi-mode joint representation by adopting the minimum and maximum game method for direct optimization and indirect optimization comprises the following steps:
Encoding the original text feature representation and the visual feature representation, and directly optimizing the original text feature representation and the visual feature representation based on the encoded text feature representation and visual feature representation to obtain a direct multi-mode joint representation; reconstructing based on the direct multi-modal joint representation to obtain a reconstructed text feature representation and a visual feature representation; and performing indirect optimization based on the original text feature representation and the visual feature representation and the reconstructed text feature representation and the visual feature representation to obtain an indirect multi-modal joint representation.
2. The method for detecting crisis information based on an antagonistic multimodal automatic encoder according to claim 1, wherein the original text feature representation is extracted by text content in a multimodal content and fine-tuned Bert, and the visual feature representation is extracted by visual content in a multimodal content and ResNet.
3. The method for detecting crisis information based on an antagonistic multimodal automatic encoder according to claim 1, wherein preprocessing is required before extracting the text modality data features: deleting the user handle of the forwarding title, the stop words and punctuation marks in the text sentence, then performing word segmentation operation on the sentence, and finally adding set characters into the beginning and the end of each sentence respectively to generate preprocessed text modal data.
4. The crisis information detection method based on an antagonistic multi-modal automatic encoder according to claim 1, wherein in the encoding of the original text feature representation and the visual feature representation, encoder components are used for encoding, each component comprising a plurality of BGA blocks and a fully connected layer, wherein each BGA block comprises a Bi-GRU and a Self-Attention layer; a feature representation x_G with context information is acquired through the Bi-GRU, and a feature representation x_A with enhanced context information is acquired through Self-Attention.
5. The crisis information detection method based on an antagonistic multi-mode automatic encoder according to claim 1, wherein the detection model comprises three fully connected layers, dropout is added between a first fully connected layer and a second fully connected layer, and a tanh activation function is added between the second fully connected layer and a third fully connected layer.
6. The method for detecting crisis information based on an antagonistic multi-modal automatic encoder according to claim 1, wherein the detection loss is calculated using a cross entropy function.
7. Crisis information detecting system based on antagonism multimode automatic encoder, characterized by comprising:
the data acquisition module is used for acquiring multi-mode data to be detected;
the antagonistic multi-modal automatic coding module is used for obtaining multi-modal joint representation by adopting an antagonistic multi-modal automatic coder to learn image modal data and text modal data based on multi-modal data to be detected;
the construction process of the antagonistic multi-mode automatic encoder comprises the following steps: obtaining an initial multi-modal joint representation based on the text feature representation and the visual feature representation and the automatic encoder module;
Based on the initial multi-mode joint representation and the antagonism module, adopting a minimum maximum game method to perform direct optimization and indirect optimization to obtain a direct multi-mode joint representation and an indirect multi-mode joint representation, and obtaining the multi-mode joint representation based on the direct multi-mode joint representation and the indirect multi-mode joint representation;
the method for obtaining the direct multi-mode joint representation and the indirect multi-mode joint representation by adopting the minimum and maximum game method for direct optimization and indirect optimization comprises the following steps:
encoding the original text feature representation and the visual feature representation, and directly optimizing the original text feature representation and the visual feature representation based on the encoded text feature representation and visual feature representation to obtain a direct multi-mode joint representation; reconstructing based on the direct multi-modal joint representation to obtain a reconstructed text feature representation and a visual feature representation; performing indirect optimization based on the original text feature representation and the visual feature representation and the reconstructed text feature representation and the visual feature representation to obtain an indirect multi-modal joint representation;
And the text detection module is used for detecting and obtaining crisis information based on the multi-mode joint representation and the detection model.
8. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps in the method for detecting crisis information based on an antagonistic multimodal automatic encoder according to any one of claims 1 to 6.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps in the method for detecting crisis information based on an antagonistic multimodal automatic encoder according to any of the claims 1-6 when the program is executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210575072.3A CN114936285B (en) | 2022-05-25 | 2022-05-25 | Crisis information detection method and system based on antagonistic multi-mode automatic encoder |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210575072.3A CN114936285B (en) | 2022-05-25 | 2022-05-25 | Crisis information detection method and system based on antagonistic multi-mode automatic encoder |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114936285A CN114936285A (en) | 2022-08-23 |
CN114936285B true CN114936285B (en) | 2024-07-12 |
Family
ID=82865057
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210575072.3A Active CN114936285B (en) | 2022-05-25 | 2022-05-25 | Crisis information detection method and system based on antagonistic multi-mode automatic encoder |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114936285B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111858928A (en) * | 2020-06-17 | 2020-10-30 | 北京邮电大学 | Social media rumor detection method and device based on graph structure counterstudy |
CN112148997A (en) * | 2020-08-07 | 2020-12-29 | 江汉大学 | Multi-modal confrontation model training method and device for disaster event detection |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108319686B (en) * | 2018-02-01 | 2021-07-30 | 北京大学深圳研究生院 | Antagonism cross-media retrieval method based on limited text space |
WO2022101515A1 (en) * | 2020-11-16 | 2022-05-19 | UMNAI Limited | Method for an explainable autoencoder and an explainable generative adversarial network |
CN113515634B (en) * | 2021-07-09 | 2023-08-01 | 福州大学 | Social media rumor detection method and system based on hierarchical heterogeneous graph neural network |
- 2022-05-25: CN application CN202210575072.3A granted as patent CN114936285B (en), status Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111858928A (en) * | 2020-06-17 | 2020-10-30 | 北京邮电大学 | Social media rumor detection method and device based on graph structure counterstudy |
CN112148997A (en) * | 2020-08-07 | 2020-12-29 | 江汉大学 | Multi-modal confrontation model training method and device for disaster event detection |
Also Published As
Publication number | Publication date |
---|---|
CN114936285A (en) | 2022-08-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address |
Address after: 250353 University Road, Changqing District, Ji'nan, Shandong Province, No. 3501 Patentee after: Qilu University of Technology (Shandong Academy of Sciences) Country or region after: China Address before: 250353 University Road, Changqing District, Ji'nan, Shandong Province, No. 3501 Patentee before: Qilu University of Technology Country or region before: China |