CN114973041A - Method for overcoming language priors in visual question answering based on self-contrast learning - Google Patents

Method for overcoming language priors in visual question answering based on self-contrast learning

Info

Publication number
CN114973041A
Authority
CN
China
Prior art keywords
image
attention
self
contrast
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111557673.3A
Other languages
Chinese (zh)
Inventor
孔凡彦
刘利军
黄青松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN202111557673.3A
Publication of CN114973041A
Legal status: Pending (Current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method for overcoming language priors in visual question answering based on self-contrast learning. First, image feature extraction is completed with a pre-trained model, and the question words are embedded and fed into a GRU (gated recurrent unit) to generate the question feature. Then an attention mechanism is applied to the question and the image: the question feature and the weighted image feature are fused into a joint representation feature, the weight values learned by the attention mechanism are passed to an inverse-attention layer, the question feature and the weighted inverse image feature are fused into another joint representation feature, and the two features are contrasted. The correlation between the question and the image is increased by optimizing the basic VQA classification loss together with the self-contrast learning loss Lscl, and the two losses are combined into a joint loss for training. The method is built on the LMH model and, without using any auxiliary task, achieves a state-of-the-art accuracy of 59.00% on the most common benchmark VQA-CP v2, an absolute improvement of 6.51%.

Description

Method for overcoming language priors in visual question answering based on self-contrast learning
Technical Field
The invention relates to a method for overcoming language priors in visual question answering based on self-contrast learning, and belongs to the technical fields of natural language processing and computer vision.
Background
The purpose of visual question answering (VQA) is to automatically answer natural language questions based on visual content, and it is one of the benchmark tasks for multimodal learning (e.g., language and images). It requires visual analysis, language understanding, multimodal information fusion and reasoning. In recent years VQA has attracted considerable interest, and various benchmark datasets have been published. With the introduction of a large number of works, the VQA task has made significant progress. Many works attempt to understand both the image and the question, but recent studies have found that these works are largely driven by superficial language correlations (i.e., language priors) in the training QA pairs while ignoring the image content. For example, such models tend to answer "what sport ...?" questions with "tennis" and to answer "how many ...?" questions from the question statistics alone, without reasoning over the image content. To help address these bias factors, Agrawal et al. created the diagnostic benchmark VQA-CP (VQA under Changing Priors) in 2018 by reorganizing the training and validation splits of the respective VQA datasets. Most prior works design various attention mechanisms to learn the relationships between modalities, and they work well on the standard VQA benchmarks (e.g., VQA v2); however, their performance drops significantly on VQA-CP because of language priors. To alleviate language priors, existing work focuses on reducing the statistical priors of the question and on increasing image dependency and interpretability, and it can be roughly divided into learning with auxiliary tasks and learning without auxiliary tasks. Learning without auxiliary tasks uses an auxiliary QA branch, or a specially designed learning strategy, to regularize the training of the target VQA model; these methods add an auxiliary branch that captures the language prior so as to diminish its effect. Learning with auxiliary tasks introduces additional manual supervision and auxiliary tasks (visual grounding, image captioning, etc.) to increase image dependency and interpretability; guided by the auxiliary tasks, these methods understand the image content better and therefore achieve better performance. However, the inherent data bias remains severe and still leads to superficial language correlations. It is therefore important to reduce the native language prior without introducing additional annotations, while still relying on the relevant visual regions when making decisions.
Disclosure of Invention
The invention provides a method for overcoming language priors in visual question answering based on self-contrast learning. The method overcomes the VQA language prior problem through novel self-contrast learning, can focus on the relevant regions to predict the correct answer to a given question about an input image, and improves the reasoning ability and robustness of the VQA model.
The technical scheme of the invention is as follows: a method for overcoming language priors in visual question answering based on self-contrast learning comprises the following specific steps:
Step1, first, the questions, the images and the answer options are taken as experimental data; then the images are preprocessed to extract feature maps, and the questions are preprocessed to generate question feature vectors;
Step2, an attention layer is used to learn to identify the image regions relevant to the question; after the preprocessing of Step1, the attention mechanism uses the question to compute attention weights over the image regions so as to locate the image region associated with the question, and the resulting question feature q and the weighted image feature v̂ are fused into a joint representation r;
Step3, an inverse-attention layer is used to identify the image regions that are currently irrelevant or weakly relevant; using the attention weight values obtained in Step2, the question feature q and the weighted inverse image feature v̂' are fused into a joint representation r0, so that the question is focused on the irrelevant regions and the relevant regions of the image are ignored, forming a contrast;
Step4, post-processing: with the joint representation r0 obtained in Step3 and the joint representation r obtained in Step2, the proposed network is trained to optimize the joint loss composed of the self-contrast loss Lscl and the basic VQA classification loss Lvqa, so that the model can focus on the relevant regions to predict the correct answer to a given question about the input image.
Further, the Step1 includes the following steps:
Step1.1, first, a series of visual object features are extracted from the image using the pre-trained Faster R-CNN model;
Step1.2, the question is word-embedded and passed to a single-layer GRU to generate the question feature q;
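To make Step1 concrete, the following is a minimal sketch, assuming PyTorch, a pre-built word vocabulary, and illustrative embedding/hidden sizes (300 and 1024 are assumptions, not values fixed by the patent); the Faster R-CNN region features are assumed to be precomputed:

```python
import torch
import torch.nn as nn

class QuestionEncoder(nn.Module):
    """Embed the question words and encode them with a single-layer GRU (Step1.2)."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=1024):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.gru = nn.GRU(embed_dim, hidden_dim, num_layers=1, batch_first=True)

    def forward(self, question_tokens):
        # question_tokens: [batch, seq_len] integer word indices
        emb = self.embedding(question_tokens)   # [batch, seq_len, embed_dim]
        _, h_n = self.gru(emb)                  # h_n: [1, batch, hidden_dim]
        return h_n.squeeze(0)                   # question feature q: [batch, hidden_dim]

# Step1.1: the Faster R-CNN visual object features are assumed precomputed,
# e.g. 36 regions of 2048 dimensions per image:
# v = torch.randn(batch, 36, 2048)
```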
Further, the specific steps of Step2 are as follows:
Step2.1, after the image features and the question feature are extracted, they are passed to the attention layer, and the image features and the question feature are projected into a space of the same dimension;
Step2.2, the attention weights are computed, and a normalized attention weight is generated for each region feature map; the final image feature is the weighted sum of all input features;
Step2.3, the weighted image feature v̂ and the question feature q obtained in Step1.2 are fused into a joint feature representation r, and the probability distribution P(a) of each answer a in the candidate answer set A is further computed.
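The following is a hedged sketch of Steps 2.1-2.3 under the same PyTorch assumptions; the single-glimpse product attention, the Hadamard-product fusion and all dimensions are illustrative choices, since the patent does not fix a particular attention or fusion form:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionLayer(nn.Module):
    """Locate question-relevant regions and fuse them with the question (Step2)."""
    def __init__(self, v_dim=2048, q_dim=1024, joint_dim=1024, num_answers=3129):
        super().__init__()
        self.v_proj = nn.Linear(v_dim, joint_dim)   # Step2.1: project image regions
        self.q_proj = nn.Linear(q_dim, joint_dim)   # Step2.1: project question
        self.att = nn.Linear(joint_dim, 1)          # Step2.2: one score per region
        self.v_fuse = nn.Linear(v_dim, joint_dim)
        self.q_fuse = nn.Linear(q_dim, joint_dim)
        self.classifier = nn.Linear(joint_dim, num_answers)

    def forward(self, v, q):
        # v: [batch, regions, v_dim], q: [batch, q_dim]
        joint = self.v_proj(v) * self.q_proj(q).unsqueeze(1)    # [batch, regions, joint_dim]
        alpha = F.softmax(self.att(joint).squeeze(-1), dim=1)   # Step2.2: normalized weights
        v_hat = (alpha.unsqueeze(-1) * v).sum(dim=1)            # weighted sum of region features
        r = self.v_fuse(v_hat) * self.q_fuse(q)                 # Step2.3: joint representation r
        logits = self.classifier(r)                             # scores over candidate answers A
        return logits, alpha, r
```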
Further, the specific steps of Step3 are as follows:
Step3.1, the attention mechanism uses the question to compute attention weights over the image regions so as to locate the image region associated with the question; the inverse-attention mechanism is the opposite of the attention mechanism, and it helps the VQA model overcome language priors by focusing the question on the irrelevant regions and ignoring the relevant regions of the image so as to form a contrast;
Step3.2, centered on the attention weight values obtained in Step2, the attention weights α obtained from the attention layer are used to compute the normalized inverse attention weights α' by applying a negating operation, opponent(α) = -α or opponent(α) = e^(-α), which makes the larger weights smaller and the smaller weights larger, so that the attention weights output by the softmax function focus on the irrelevant regions;
Step3.3, after the inverse attention weights are learned, the weighted inverse image feature v̂' is generated; then, similarly to the attention layer, the weighted inverse image feature v̂' is fused with the question feature q obtained in Step1.2 into the joint feature representation r0, and the probability distribution P0(a) of each answer a in the candidate answer set A is further computed.
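As a small illustration of the negating operation in Step3.2 (the weight values below are made up for the example), negating the attention weights before the softmax sends the most attended region to the smallest inverse attention weight:

```python
import torch
import torch.nn.functional as F

alpha = torch.tensor([0.7, 0.2, 0.1])   # attention weights over three regions (made-up values)
alpha_inv = F.softmax(-alpha, dim=0)    # inverse attention: opponent(alpha) = -alpha
print(alpha_inv)                        # approximately tensor([0.2237, 0.3688, 0.4076])
```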
Further, the specific steps of Step4 are as follows:
Step4.1, the loss layer contains two branches; the first branch uses the probability distribution P(a) of the basic VQA model and is optimized by minimizing the binary cross-entropy loss, and this loss function is defined as Lvqa;
Step4.2, the other branch is the self-contrast layer, which aims to increase the correlation and dependency between the question and the image by using the answer distributions predicted by the self-contrast layer; an objective function similar to QICE [36] is first considered, and, building on it, it is observed that the answers predicted for the same question from the relevant and the irrelevant regions of the same image are related to each other, namely the two predicted answers should be mutually exclusive; on this basis the self-contrast learning loss Lscl is proposed to increase the correlation between the question and the image;
Step4.3, the proposed network is trained to optimize the joint loss of the self-contrast loss Lscl from Step4.2 and the basic VQA classification loss Lvqa from Step4.1; in this way the model can focus on the relevant regions to predict the correct answer to a given question about the input image.
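A hedged sketch of the loss layer of Step4: the binary cross-entropy term follows standard VQA practice, while the form of Lscl shown here, which penalizes overlap between the answer distributions of the two branches, is only an assumed instantiation of the mutual-exclusion idea, because the patent does not give the exact formula:

```python
import torch
import torch.nn.functional as F

def joint_loss(logits_rel, logits_irr, answer_targets, lam=1.0):
    """Joint objective of Step4: basic VQA loss plus a self-contrast term.

    logits_rel:     answer scores from the attention branch (relevant regions)
    logits_irr:     answer scores from the inverse-attention branch (irrelevant regions)
    answer_targets: soft ground-truth answer scores in [0, 1], shape [batch, num_answers]
    lam:            weight of the self-contrast term (assumed hyperparameter)
    """
    # Step4.1: binary cross-entropy on the basic VQA branch (Lvqa)
    l_vqa = F.binary_cross_entropy_with_logits(logits_rel, answer_targets)

    # Step4.2 (assumed form of Lscl): the answer distributions predicted from relevant
    # and irrelevant regions should be mutually exclusive, so their overlap is penalized.
    p_rel = torch.sigmoid(logits_rel)
    p_irr = torch.sigmoid(logits_irr)
    l_scl = (p_rel * p_irr).sum(dim=1).mean()

    # Step4.3: joint loss used for training
    return l_vqa + lam * l_scl
```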
Further, the model of the inverse-attention layer is similar to that of the attention layer. First, using the attention weights α obtained from the attention layer, the normalized inverse attention weights α' are computed as α' = softmax(opponent(α)). After the inverse attention weights are learned, the weighted inverse image feature v̂' is generated as the α'-weighted sum of the input image features. The weighted inverse image feature v̂' is then fused with the question feature q obtained in Step1.2 into the joint feature representation r0, and the probability distribution P0(a) of each answer a in the candidate answer set A is further computed. The formulas are as follows: q'0 = f″q(q), v'0 = f″v(v̂'); r0 is obtained by fusing q'0 and v'0, and P0(a) is predicted from r0 through the weight matrix w0; wherein fv, fq, f0 are transformation functions and w0 denotes the weight matrix to be learned.
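A minimal sketch of the inverse-attention branch under the same assumptions as the earlier snippets; the linear layers standing in for the transformation functions f″q and f″v, the Hadamard-product fusion and the classifier playing the role of w0 are illustrative choices rather than the patent's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InverseAttentionLayer(nn.Module):
    """Fuse the question with the regions the attention layer did NOT focus on (Step3)."""
    def __init__(self, v_dim=2048, q_dim=1024, joint_dim=1024, num_answers=3129):
        super().__init__()
        self.q_transform = nn.Linear(q_dim, joint_dim)        # stands in for f''_q
        self.v_transform = nn.Linear(v_dim, joint_dim)        # stands in for f''_v
        self.classifier = nn.Linear(joint_dim, num_answers)   # plays the role of w_0

    def forward(self, v, q, alpha):
        # v: [batch, regions, v_dim], q: [batch, q_dim], alpha: attention weights [batch, regions]
        alpha_inv = F.softmax(-alpha, dim=1)                    # alpha' = softmax(opponent(alpha))
        v_hat_inv = (alpha_inv.unsqueeze(-1) * v).sum(dim=1)    # weighted inverse image feature
        r0 = self.q_transform(q) * self.v_transform(v_hat_inv)  # joint representation r_0
        logits0 = self.classifier(r0)                           # answer scores from irrelevant regions
        return logits0, r0
```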
The invention has the beneficial effects that:
1. The present invention solves the VQA language prior problem with a novel self-contrast learning method that overcomes language priors by comparing the answers generated from the question-relevant and the question-irrelevant regions of an image.
2. After training with self-contrast learning, the model is forced to learn more information from the relevant image regions, which effectively increases the semantic dependency on the image and the interpretability. In this way, the image features and the question context no longer exist in isolation during modeling.
3. Extensive experiments were performed on the popular benchmarks VQA-CP v1 and VQA-CP v2. The experimental results show that the method significantly improves performance on these benchmark datasets without using additional annotations. In particular, built on top of the LMH model, the method achieves a state-of-the-art accuracy of 59.00% on VQA-CP v2, an absolute improvement of 6.51%.
Drawings
FIG. 1 is a block diagram of the method for overcoming language priors in visual question answering based on self-contrast learning;
FIG. 2 is a comparison of the present invention with several variants of VQA models that overcome language priors;
FIG. 3 is an example of the self-contrast learning of the present invention;
FIG. 4 is a flow chart of the present invention.
Detailed Description
Embodiment 1: as shown in FIGS. 1-4, a method for overcoming language priors in visual question answering based on self-contrast learning comprises the following specific steps:
Step1, first, the questions, the images and the answer options are taken as experimental data; then the images are preprocessed to extract feature maps, and the questions are preprocessed to generate question feature vectors;
Step2, an attention layer is used to learn to identify the image regions relevant to the question; after the preprocessing of Step1, the attention mechanism uses the question to compute attention weights over the image regions so as to locate the image region associated with the question, and the resulting question feature q and the weighted image feature v̂ are fused into a joint representation r;
Step3, an inverse-attention layer is used to identify the image regions that are currently irrelevant or weakly relevant; using the attention weight values obtained in Step2, the question feature q and the weighted inverse image feature v̂' are fused into a joint representation r0, so that the question is focused on the irrelevant regions and the relevant regions of the image are ignored, forming a contrast;
Step4, post-processing: with the joint representation r0 obtained in Step3 and the joint representation r obtained in Step2, the proposed network is trained to optimize the joint loss composed of the self-contrast loss Lscl and the basic VQA classification loss Lvqa, so that the model can focus on the relevant regions to predict the correct answer to a given question about the input image.
Further, the Step1 includes the following steps:
Step1.1, first, a series of visual object features are extracted from the image using the pre-trained Faster R-CNN model;
Step1.2, the question is word-embedded and passed to a single-layer GRU to generate the question feature q;
Further, the specific steps of Step2 are as follows:
Step2.1, after the image features and the question feature are extracted, they are passed to the attention layer, and the image features and the question feature are projected into a space of the same dimension;
Step2.2, the attention weights are computed, and a normalized attention weight is generated for each region feature map; the final image feature is the weighted sum of all input features;
Step2.3, the weighted image feature v̂ and the question feature q obtained in Step1.2 are fused into a joint feature representation r, and the probability distribution P(a) of each answer a in the candidate answer set A is further computed.
Further, the specific steps of Step3 are as follows:
Step3.1, the attention mechanism uses the question to compute attention weights over the image regions so as to locate the image region associated with the question; the inverse-attention mechanism is the opposite of the attention mechanism, and it helps the VQA model overcome language priors by focusing the question on the irrelevant regions and ignoring the relevant regions of the image so as to form a contrast;
Step3.2, centered on the attention weight values obtained in Step2, the attention weights α obtained from the attention layer are used to compute the normalized inverse attention weights α' by applying a negating operation, opponent(α) = -α or opponent(α) = e^(-α), which makes the larger weights smaller and the smaller weights larger, so that the attention weights output by the softmax function focus on the irrelevant regions;
Step3.3, after the inverse attention weights are learned, the weighted inverse image feature v̂' is generated; then, similarly to the attention layer, the weighted inverse image feature v̂' is fused with the question feature q obtained in Step1.2 into the joint feature representation r0, and the probability distribution P0(a) of each answer a in the candidate answer set A is further computed.
Further, the specific steps of Step4 are as follows:
Step4.1, the loss layer contains two branches; the first branch uses the probability distribution P(a) of the basic VQA model and is optimized by minimizing the binary cross-entropy loss, and this loss function is defined as Lvqa;
Step4.2, the other branch is the self-contrast layer, which aims to increase the correlation and dependency between the question and the image by using the answer distributions predicted by the self-contrast layer; an objective function similar to QICE [36] is first considered, and, building on it, it is observed that the answers predicted for the same question from the relevant and the irrelevant regions of the same image are related to each other, namely the two predicted answers should be mutually exclusive; on this basis the self-contrast learning loss Lscl is proposed to increase the correlation between the question and the image;
Step4.3, the proposed network is trained to optimize the joint loss of the self-contrast loss Lscl from Step4.2 and the basic VQA classification loss Lvqa from Step4.1; in this way the model can focus on the relevant regions to predict the correct answer to a given question about the input image.
Further, the model of the inverse-attention layer is similar to that of the attention layer. First, using the attention weights α obtained from the attention layer, the normalized inverse attention weights α' are computed as α' = softmax(opponent(α)). After the inverse attention weights are learned, the weighted inverse image feature v̂' is generated as the α'-weighted sum of the input image features. The weighted inverse image feature v̂' is then fused with the question feature q obtained in Step1.2 into the joint feature representation r0, and the probability distribution P0(a) of each answer a in the candidate answer set A is further computed. The formulas are as follows: q'0 = f″q(q), v'0 = f″v(v̂'); r0 is obtained by fusing q'0 and v'0, and P0(a) is predicted from r0 through the weight matrix w0; wherein fv, fq, f0 are transformation functions and w0 denotes the weight matrix to be learned.
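To make the training procedure of the embodiment concrete, the following hedged sketch combines the QuestionEncoder, AttentionLayer, InverseAttentionLayer and joint_loss sketched above (all of which are illustrative assumptions, including the optimizer, learning rate and batch sizes) into a single training step:

```python
import torch

# Illustrative shapes: 36 precomputed Faster R-CNN regions of 2048 dims, batch of 8
batch, regions, v_dim = 8, 36, 2048
v = torch.randn(batch, regions, v_dim)              # precomputed region features
question_tokens = torch.randint(1, 1000, (batch, 14))
answer_targets = torch.rand(batch, 3129)            # soft VQA answer scores

encoder = QuestionEncoder(vocab_size=1000)
att_branch = AttentionLayer()
inv_branch = InverseAttentionLayer()
params = list(encoder.parameters()) + list(att_branch.parameters()) + list(inv_branch.parameters())
optimizer = torch.optim.Adamax(params, lr=2e-3)

q = encoder(question_tokens)                        # Step1: question feature
logits_rel, alpha, r = att_branch(v, q)             # Step2: relevant-region branch
logits_irr, r0 = inv_branch(v, q, alpha)            # Step3: irrelevant-region branch
loss = joint_loss(logits_rel, logits_irr, answer_targets)   # Step4: Lvqa + Lscl
optimizer.zero_grad()
loss.backward()
optimizer.step()
```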
Extensive experiments were performed on the popular benchmarks VQA-CP v1 and VQA-CP v2. The experimental results show that the method of the present invention significantly improves performance on these benchmark datasets without using additional annotations. In particular, built on top of the LMH model, a state-of-the-art accuracy of 59.00% is achieved on VQA-CP v2, an absolute improvement of 6.51%; the results are shown in Table 1 and Table 2.
Table 1 shows the experimental results of the present invention on VQA-CP v2.
Table 2 shows the experimental results of the present invention on VQA-CP v1.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (6)

1. A method for overcoming language priors in visual question answering based on self-contrast learning, characterized in that the method comprises the following specific steps:
Step1, first, the questions, the images and the answer options are taken as experimental data; then the images are preprocessed to extract feature maps, and the questions are preprocessed to generate question feature vectors;
Step2, an attention layer is used to learn to identify the image regions relevant to the question; after the preprocessing of Step1, the attention mechanism uses the question to compute attention weights over the image regions so as to locate the image region associated with the question, and the resulting question feature q and the weighted image feature v̂ are fused into a joint representation r;
Step3, an inverse-attention layer is used to identify the image regions that are currently irrelevant or weakly relevant; using the attention weight values obtained in Step2, the question feature q and the weighted inverse image feature v̂' are fused into a joint representation r0, so that the question is focused on the irrelevant regions and the relevant regions of the image are ignored, forming a contrast;
Step4, post-processing: with the joint representation r0 obtained in Step3 and the joint representation r obtained in Step2, the network is trained to optimize the joint loss composed of the self-contrast loss Lscl and the basic VQA classification loss Lvqa, so that the model can focus on the relevant regions to predict the correct answer to a given question about the input image.
2. The method for overcoming language priors in visual question answering based on self-contrast learning according to claim 1, characterized in that Step1 comprises the following specific steps:
Step1.1, first, a series of visual object features are extracted from the image using the pre-trained Faster R-CNN model;
Step1.2, the question is word-embedded and passed to a single-layer GRU to generate the question feature q.
3. The method for overcoming language priors in visual question answering based on self-contrast learning according to claim 1, characterized in that the specific steps of Step2 are as follows:
Step2.1, after the image features and the question feature are extracted, they are passed to the attention layer, and the image features and the question feature are projected into a space of the same dimension;
Step2.2, the attention weights are computed, and a normalized attention weight is generated for each region feature map; the final image feature is the weighted sum of all input features;
Step2.3, the weighted image feature v̂ and the question feature q are fused into a joint feature representation r, and the probability distribution P(a) of each answer a in the candidate answer set A is further computed.
4. The method for overcoming language priors in visual question answering based on self-contrast learning according to claim 1, characterized in that the specific steps of Step3 are as follows:
Step3.1, the attention mechanism uses the question to compute attention weights over the image regions so as to locate the image region associated with the question; the inverse-attention mechanism is the opposite of the attention mechanism, and it helps the VQA model overcome language priors by focusing the question on the irrelevant regions and ignoring the relevant regions of the image so as to form a contrast;
Step3.2, centered on the attention weight values obtained in Step2, the attention weights α obtained from the attention layer are used to compute the normalized inverse attention weights α' by applying a negating operation, opponent(α) = -α or opponent(α) = e^(-α), which makes the large weights small and the small weights large, so that the attention weights output by the softmax function focus on the irrelevant regions;
Step3.3, after the inverse attention weights are learned, the weighted inverse image feature v̂' is generated; then, similarly to the attention layer, the weighted inverse image feature v̂' is fused with the question feature q obtained in Step1 into the joint feature representation r0, and the probability distribution P0(a) of each answer a in the candidate answer set A is further computed.
5. The method for overcoming language priors in visual question answering based on self-contrast learning according to claim 1, characterized in that the specific steps of Step4 are as follows:
Step4.1, the loss layer contains two branches; the first branch uses the probability distribution P(a) of the basic VQA model and is optimized by minimizing the binary cross-entropy loss, and this loss function is defined as Lvqa;
Step4.2, the other branch is the self-contrast layer, which aims to increase the correlation and dependency between the question and the image by using the answer distributions predicted by the self-contrast layer; an objective function similar to QICE [36] is first considered, and, building on it, it is observed that the answers predicted for the same question from the relevant and the irrelevant regions of the same image are related to each other, namely the two predicted answers should be mutually exclusive; on this basis the self-contrast learning loss Lscl is proposed to increase the correlation between the question and the image;
Step4.3, the proposed network is trained to optimize the joint loss of the self-contrast loss Lscl from Step4.2 and the basic VQA classification loss Lvqa from Step4.1; in this way the model can focus on the relevant regions to predict the correct answer to a given question about the input image.
6. The method for overcoming language priors in visual question answering based on self-contrast learning according to claim 1, characterized in that the model of the inverse-attention layer is similar to that of the attention layer: first, using the attention weights α obtained from the attention layer, the normalized inverse attention weights α' are computed as α' = softmax(opponent(α)); after the inverse attention weights are learned, the weighted inverse image feature v̂' is generated as the α'-weighted sum of the input image features; the weighted inverse image feature v̂' is fused with the obtained question feature q into the joint feature representation r0, and the probability distribution P0(a) of each answer a in the candidate answer set A is further computed; the formulas are as follows: q'0 = f″q(q), v'0 = f″v(v̂'); r0 is obtained by fusing q'0 and v'0, and P0(a) is predicted from r0 through the weight matrix w0; wherein fv, fq, f0 are transformation functions and w0 denotes the weight matrix to be learned.
CN202111557673.3A 2021-12-20 2021-12-20 Language prior method for overcoming visual question and answer based on self-contrast learning Pending CN114973041A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111557673.3A CN114973041A (en) 2021-12-20 2021-12-20 Language prior method for overcoming visual question and answer based on self-contrast learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111557673.3A CN114973041A (en) 2021-12-20 2021-12-20 Language prior method for overcoming visual question and answer based on self-contrast learning

Publications (1)

Publication Number Publication Date
CN114973041A true CN114973041A (en) 2022-08-30

Family

ID=82974506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111557673.3A Pending CN114973041A (en) 2021-12-20 2021-12-20 Language prior method for overcoming visual question and answer based on self-contrast learning

Country Status (1)

Country Link
CN (1) CN114973041A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117079142A (en) * 2023-10-13 2023-11-17 昆明理工大学 Anti-attention generation countermeasure road center line extraction method for automatic inspection of unmanned aerial vehicle
CN117079142B (en) * 2023-10-13 2024-01-26 昆明理工大学 Anti-attention generation countermeasure road center line extraction method for automatic inspection of unmanned aerial vehicle

Similar Documents

Publication Publication Date Title
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN109271505B (en) Question-answering system implementation method based on question-answer pairs
CN110263912B (en) Image question-answering method based on multi-target association depth reasoning
CN110490946B (en) Text image generation method based on cross-modal similarity and antagonism network generation
CN112100351A (en) Method and equipment for constructing intelligent question-answering system through question generation data set
CN112364150A (en) Intelligent question and answer method and system combining retrieval and generation
CN109670168B (en) Short answer automatic scoring method, system and storage medium based on feature learning
Jandial et al. SAC: Semantic attention composition for text-conditioned image retrieval
CN108765383A (en) Video presentation method based on depth migration study
CN112860945B (en) Method for multi-mode video question answering by using frame-subtitle self-supervision
CN112527993B (en) Cross-media hierarchical deep video question-answer reasoning framework
CN116166782A (en) Intelligent question-answering method based on deep learning
CN113220856A (en) Multi-round dialogue system based on Chinese pre-training model
CN112905762A (en) Visual question-answering method based on equal attention-deficit-diagram network
Gomez-Perez et al. ISAAQ--Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention
CN110516240B (en) Semantic similarity calculation model DSSM (direct sequence spread spectrum) technology based on Transformer
Guo et al. Which is the effective way for gaokao: Information retrieval or neural networks?
CN114973041A (en) Language prior method for overcoming visual question and answer based on self-contrast learning
CN112749257A (en) Intelligent marking system based on machine learning algorithm
CN116821297A (en) Stylized legal consultation question-answering method, system, storage medium and equipment
CN116561274A (en) Knowledge question-answering method based on digital human technology and natural language big model
CN111144134A (en) Translation engine automatic evaluation system based on OpenKiwi
CN111259860B (en) Multi-order characteristic dynamic fusion sign language translation method based on data self-driving
CN114997175A (en) Emotion analysis method based on field confrontation training
CN114218439A (en) Video question-answering method based on self-driven twin sampling and reasoning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination