CN114691847A - Relational attention network visual question-answering method based on deep perception and semantic guidance - Google Patents
Relational attention network visual question-answering method based on depth perception and semantic guidance
- Publication number
- CN114691847A (application CN202210231121.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- attention
- correlation
- visual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
- G06T7/62—Analysis of geometric attributes of area, perimeter, diameter or volume
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a relational attention network visual question-answering method based on depth perception and semantic guidance, which comprises the following steps: 1) constructing the three-dimensional spatial relationship between image objects; 2) obtaining the correlation score of image objects i and j in the spatial dimension from the three-dimensional spatial relationship between the image objects; 3) obtaining the correlation between image objects i and j by combining implicit attention and explicit attention; 4) following the Transformer framework, replacing the conventional self-attention layer with the improved attention mechanism to obtain the visual question-answering model. The invention introduces three-dimensional spatial correlation into the conventional self-attention mechanism and improves the accuracy of visual question answering.
Description
Technical Field
The invention relates to natural language processing technology, and in particular to a relational attention network visual question-answering method based on depth perception and semantic guidance.
Background
Conventional visual question-answering methods are generally based on deep feature-fusion models, such as bilinear block-diagonal fusion (BLOCK) and self-attention fusion, but these methods have difficulty answering complex questions that require spatial-relationship reasoning. With the progress of deep learning, many studies based on deep neural network models have been devoted to improving the visual question-answering task. These models generally extract image-object visual representations and word-vector representations from the image and the text as input, achieve multi-modal entity alignment through end-to-end training, and then predict the answer with a multi-class classification strategy. Recently, much research has built models on attention networks; although these models show excellent performance on the visual question-answering task, they do not consider the spatial or semantic relationships between image objects, and are therefore limited on complex questions involving visual reasoning.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a relational attention network visual question-answering method based on depth perception and semantic guidance that addresses the defects in the prior art.
The technical scheme adopted by the invention to solve this technical problem is as follows: a relational attention network visual question-answering method based on depth perception and semantic guidance, comprising the following steps:
1) three-dimensional spatial relationship construction between image objects
The visual relationship in two-dimensional space is calculated from the rectangular-box coordinates of the image objects. For image objects i and j, the two-dimensional spatial relationship is denoted $\lambda_{ij}^{2D}$ and is derived from the rectangular boxes of the two image objects, where $(x_i, y_i)$, $w_i$ and $h_i$ denote the center-point coordinates, width and height of the rectangular box of image object i, and likewise for image object j.

According to the depth distance values $dep_i$ and $dep_j$ at the center points of the rectangular boxes of image objects i and j, the visual relationship in depth space $\lambda_{ij}^{dep}$ is then calculated, where $S_{i \cap j}$ denotes the area of the overlapping region of rectangular boxes i and j.

From the two-dimensional spatial relationship $\lambda_{ij}^{2D}$ and the depth spatial relationship $\lambda_{ij}^{dep}$ between image objects i and j, the three-dimensional spatial relationship $r_{ij}$ between the image objects is obtained, namely:

$$r_{ij} = \sigma\!\left(W_s\left[\lambda_{ij}^{2D};\, \lambda_{ij}^{dep}\right]\right)$$

where $W_s$ is a learnable weight parameter, $d_s = 64$ is the dimension of the explicit spatial-relationship representation, $[\cdot\,;\cdot]$ denotes concatenation, and $\sigma$ is the ReLU activation function;
2) depth perception and semantic guidance attention mechanism
Using the explicitly modeled three-dimensional spatial relationship above, the correlation score $\beta_{ij}$ between image objects i and j in the spatial dimension can be calculated, namely:

$$\beta_{ij} = f_{spa}(q_i, r_{ij}) + f_{sem}(s, r_{ij})$$

where $f_{spa}$ calculates the correlation of the two image objects in the spatial dimension and is obtained as the dot product of the visual feature $q_i$ of the i-th image object and the three-dimensional spatial-relationship representation $r_{ij}$, namely:

$$f_{spa}(q_i, r_{ij}) = q_i \cdot r_{ij}$$

and $f_{sem}$ calculates the correlation between the spatial relationship of the two image objects and the text semantics, namely:

$$f_{sem}(s, r_{ij}) = s^{\top} W_{sem}\, r_{ij}$$

where $W_{sem}$ is a learnable weight parameter and $s$ is the textual feature representation of the question, taken from the last-layer feature at the [CLS] position of the BERT model;
3) combining implicit attention and explicit attention
The final correlation $\alpha_{ij}$ between image objects i and j is obtained by weighting the implicit correlation $\alpha_{ij}^{imp}$ and the explicit correlation $\alpha_{ij}^{exp}$, namely:

$$\alpha_{ij} = \frac{1}{2}\left(\alpha_{ij}^{imp} + \alpha_{ij}^{exp}\right)$$
4) attention mechanism incorporated into visual question-answering model
Following the Transformer framework, the improved attention mechanism $\alpha_{ij}$ replaces the conventional self-attention layer. All $\alpha_{ij}$ are collected in matrix form, i.e. $A = [\alpha_{ij}] \in \mathbb{R}^{N \times N}$, and the improved Transformer computation is:

$$X^{(l+1)} = \mathrm{FFN}\!\left(A\, V^{(l)}\right), \quad l = 1, \dots, L$$

where $L$ is the number of Transformer layers and FFN is a multilayer perceptron (MLP) consisting of two fully connected layers with a ReLU-activated hidden layer, namely:

$$\mathrm{FFN}(X) = W_2\, \sigma(W_1 X + b_1) + b_2$$
The invention has the following beneficial effects:
1. The invention introduces three-dimensional spatial correlation into the conventional self-attention mechanism and flexibly extends it, realizing explicit modeling and calculation of the three-dimensional spatial relationships between image objects;
2. By modeling the three-dimensional spatial relationships between image objects and, on this basis, designing a depth-aware and semantically guided attention mechanism, two different attention-weight bias terms (the correlation weights of the spatial dimension and the semantic dimension) are introduced to compute more accurate spatial correlations between the input image objects, thereby improving the accuracy of visual question answering.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a schematic diagram of an overall model structure of the visual question answering method of the present invention;
FIG. 2 is a schematic structural diagram of the depth-aware and semantically guided relational attention mechanism in the visual question-answering method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in FIG. 1, the relational attention network visual question-answering method based on depth perception and semantic guidance comprises the following steps:
The invention provides a depth-aware and semantically guided relational attention network, which calculates the spatial correlation and the semantic correlation between image objects by explicitly modeling the three-dimensional spatial relationships between them. The method mainly comprises the following parts:
1) three-dimensional spatial relationship construction between image objects
The visual relationship in two-dimensional space is calculated from the rectangular-box coordinates of the image objects. For image objects i and j, the two-dimensional spatial relationship is denoted $\lambda_{ij}^{2D}$ and is derived from the rectangular boxes of the two image objects, where $(x_i, y_i)$, $w_i$ and $h_i$ denote the center-point coordinates, width and height of the rectangular box of image object i, and likewise for image object j.

According to the depth distance values $dep_i$ and $dep_j$ at the center points of the rectangular boxes of image objects i and j, the visual relationship in depth space $\lambda_{ij}^{dep}$ is then calculated, where $S_{i \cap j}$ denotes the area of the overlapping region of rectangular boxes i and j.

From the two-dimensional spatial relationship $\lambda_{ij}^{2D}$ and the depth spatial relationship $\lambda_{ij}^{dep}$ between image objects i and j, the three-dimensional spatial relationship $r_{ij}$ between the image objects is obtained, namely:

$$r_{ij} = \sigma\!\left(W_s\left[\lambda_{ij}^{2D};\, \lambda_{ij}^{dep}\right]\right)$$

where $W_s$ is a learnable weight parameter, $d_s = 64$ is the dimension of the explicit spatial-relationship representation, $[\cdot\,;\cdot]$ denotes concatenation, and $\sigma$ is the ReLU activation function;
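As an illustration, a minimal PyTorch sketch of this construction follows. The description above does not spell out the exact geometric features inside $\lambda_{ij}^{2D}$ and $\lambda_{ij}^{dep}$, so the relative center offsets, log size ratios, depth difference and normalized overlap area used here, as well as the names `Spatial3DRelation`, `boxes` and `depths`, are illustrative assumptions; only the fusion step, a learnable projection $W_s$ of the concatenated relations followed by ReLU with $d_s = 64$, follows the text.

```python
import torch
import torch.nn as nn


def pairwise_overlap_area(boxes: torch.Tensor) -> torch.Tensor:
    """Overlap area of every box pair, normalized by the area of box i (assumed form).

    boxes: (N, 4) tensor of (center_x, center_y, width, height).
    """
    cx, cy, w, h = boxes.unbind(-1)
    x1, y1, x2, y2 = cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2
    iw = (torch.min(x2[:, None], x2[None, :]) - torch.max(x1[:, None], x1[None, :])).clamp(min=0)
    ih = (torch.min(y2[:, None], y2[None, :]) - torch.max(y1[:, None], y1[None, :])).clamp(min=0)
    return (iw * ih) / (w * h)[:, None]


class Spatial3DRelation(nn.Module):
    """Fuse a 2D box relation and a depth relation into r_ij of dimension d_s."""

    def __init__(self, d_s: int = 64):
        super().__init__()
        # W_s projects the concatenated [lambda_2d ; lambda_dep] (4 + 2 dims assumed)
        self.W_s = nn.Linear(4 + 2, d_s)

    def forward(self, boxes: torch.Tensor, depths: torch.Tensor) -> torch.Tensor:
        cx, cy, w, h = boxes.unbind(-1)
        # lambda_2d: relative center offsets and log size ratios (assumed features)
        dx = (cx[None, :] - cx[:, None]) / w[:, None]
        dy = (cy[None, :] - cy[:, None]) / h[:, None]
        dw = torch.log(w[None, :] / w[:, None])
        dh = torch.log(h[None, :] / h[:, None])
        lam_2d = torch.stack([dx, dy, dw, dh], dim=-1)                       # (N, N, 4)
        # lambda_dep: center-point depth difference and overlap area (assumed features)
        ddep = depths[None, :] - depths[:, None]
        lam_dep = torch.stack([ddep, pairwise_overlap_area(boxes)], dim=-1)  # (N, N, 2)
        # r_ij = ReLU(W_s [lambda_2d ; lambda_dep]), with d_s = 64
        return torch.relu(self.W_s(torch.cat([lam_2d, lam_dep], dim=-1)))
```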
2) depth perception and semantic guidance attention mechanism
As shown in FIG. 2, the three-dimensional spatial relationship explicitly modeled above can be used to calculate the correlation score $\beta_{ij}$ between image objects i and j in the spatial dimension, namely:

$$\beta_{ij} = f_{spa}(q_i, r_{ij}) + f_{sem}(s, r_{ij})$$

where $f_{spa}$ calculates the correlation of the two image objects in the spatial dimension and is obtained as the dot product of the visual feature $q_i$ of the i-th image object and the three-dimensional spatial-relationship representation $r_{ij}$, namely:

$$f_{spa}(q_i, r_{ij}) = q_i \cdot r_{ij}$$

and $f_{sem}$ calculates the correlation between the spatial relationship of the two image objects and the text semantics, namely:

$$f_{sem}(s, r_{ij}) = s^{\top} W_{sem}\, r_{ij}$$

where $W_{sem}$ is a learnable weight parameter and $s$ is the textual feature representation of the question, taken from the last-layer feature at the [CLS] position of the BERT model;
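Continuing the sketch, the correlation scores of this step can be written as below. The projection of $q_i$ to the relation dimension $d_s$ (so that the dot product is well defined) and the softmax normalization of $\beta_{ij}$ into the explicit correlation $\alpha_{ij}^{exp}$ are assumptions made for concreteness; $W_{sem}$ and the BERT [CLS] question feature follow the description.

```python
import torch
import torch.nn as nn


class ExplicitRelationAttention(nn.Module):
    """Depth-aware (f_spa) and semantically guided (f_sem) correlation scores."""

    def __init__(self, d_h: int = 512, d_s: int = 64):
        super().__init__()
        self.proj_q = nn.Linear(d_h, d_s)   # assumed: map q_i to the dimension of r_ij
        self.W_sem = nn.Linear(d_h, d_s)    # learnable weight of the semantic term

    def forward(self, q: torch.Tensor, r: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
        # q: (N, d_h) object queries; r: (N, N, d_s) relations; s: (d_h,) BERT [CLS]
        f_spa = torch.einsum('id,ijd->ij', self.proj_q(q), r)   # spatial dot product
        f_sem = torch.einsum('d,ijd->ij', self.W_sem(s), r)     # semantic guidance
        beta = f_spa + f_sem
        # normalize over neighbors j into the explicit correlation (assumed softmax)
        return torch.softmax(beta, dim=-1)
```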
3) combining implicit attention and explicit attention
The final correlation $\alpha_{ij}$ between image objects i and j is obtained by weighting the implicit correlation $\alpha_{ij}^{imp}$ and the explicit correlation $\alpha_{ij}^{exp}$, namely:

$$\alpha_{ij} = \frac{1}{2}\left(\alpha_{ij}^{imp} + \alpha_{ij}^{exp}\right)$$
the original Transformer model uses an implicit correlation self-attention mechanism to calculate the outputThe correlation between entries assumes that the feature matrix formed by the input image object RoI features isWhere N is the number of detected image objects, dhFor the characteristic dimension, in order to measure the implicit relation between image targets, the invention firstly adopts a scaled dot-product (scaled dot-product) f (-) to calculate the implicit correlation between the image targets i and j, and then adopts a softmax function to normalize all image target neighbors to convert into a correlation scoreSpecifically, the input features X are first mapped to the hidden space of query, key and value, and then used to measure the implicit correlation between two image objects, namely:
qi=Wqxi
kj=Wkxj
vj=Wvxj
wherein, Wq,Wk,Is a learnable full connectivity layer parameter. x is a radical of a fluorine atomi,xjVisual characteristics of the ith, jth image object, qi,kj,vjTo map to the visual features of the hidden space, f (·,) is the scaled dot product function, exp (·) is an exponential function based on the natural number e.
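This implicit branch is standard scaled dot-product self-attention; a direct sketch matching the formulas above follows (the text does not state a head count, so a single head is assumed).

```python
import math
import torch
import torch.nn as nn


class ImplicitSelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention over RoI features X (N, d_h)."""

    def __init__(self, d_h: int = 512):
        super().__init__()
        self.W_q = nn.Linear(d_h, d_h)
        self.W_k = nn.Linear(d_h, d_h)
        self.W_v = nn.Linear(d_h, d_h)
        self.d_h = d_h

    def forward(self, X: torch.Tensor):
        Q, K, V = self.W_q(X), self.W_k(X), self.W_v(X)
        scores = Q @ K.T / math.sqrt(self.d_h)       # f(q_i, k_j)
        alpha_imp = torch.softmax(scores, dim=-1)    # normalize over neighbors j
        return alpha_imp, V
```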
By combining the implicit and explicit attention mechanisms in this way, the correlation between image objects can be measured from both the feature dimension and the spatial dimension. Compared with the original Transformer, which only considers the correlation of the input at the feature level, the method also considers the correlation of the input in the spatial dimension, thereby improving the ability to answer complex questions involving visual reasoning.
4) Attention mechanism incorporated into visual question-answering model
Following the Transformer framework, the improved attention mechanism $\alpha_{ij}$ replaces the conventional self-attention layer. All $\alpha_{ij}$ are collected in matrix form, i.e. $A = [\alpha_{ij}] \in \mathbb{R}^{N \times N}$, and the improved Transformer computation is:

$$X^{(l+1)} = \mathrm{FFN}\!\left(A\, V^{(l)}\right), \quad l = 1, \dots, L$$

where $L$ is the number of Transformer layers and FFN is a multilayer perceptron (MLP) consisting of two fully connected layers with a ReLU-activated hidden layer, namely:

$$\mathrm{FFN}(X) = W_2\, \sigma(W_1 X + b_1) + b_2$$
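Putting the pieces together, one improved Transformer layer can be sketched as below, reusing the `ImplicitSelfAttention` and `ExplicitRelationAttention` sketches above. The equal-weight averaging follows claim 5; the update $X^{(l+1)} = \mathrm{FFN}(A V^{(l)})$ and the omission of residual connections and layer normalization are simplifying assumptions rather than the patent's exact architecture.

```python
class DepthSemanticTransformerLayer(nn.Module):
    """One improved layer: the self-attention weights are the combined alpha."""

    def __init__(self, d_h: int = 512, d_s: int = 64, d_ff: int = 2048):
        super().__init__()
        self.implicit = ImplicitSelfAttention(d_h)
        self.explicit = ExplicitRelationAttention(d_h, d_s)
        # FFN(X) = W2 * ReLU(W1 X + b1) + b2
        self.ffn = nn.Sequential(nn.Linear(d_h, d_ff), nn.ReLU(), nn.Linear(d_ff, d_h))

    def forward(self, X: torch.Tensor, r: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
        alpha_imp, V = self.implicit(X)
        alpha_exp = self.explicit(self.implicit.W_q(X), r, s)
        A = 0.5 * (alpha_imp + alpha_exp)    # combined correlation matrix (claim 5)
        return self.ffn(A @ V)               # assumed update X' = FFN(A V)


if __name__ == "__main__":
    N, d_h = 36, 512                         # e.g. 36 detected objects (assumed)
    X, s = torch.randn(N, d_h), torch.randn(d_h)
    boxes = torch.rand(N, 4) + 0.1           # positive widths and heights
    depths = torch.rand(N)
    r = Spatial3DRelation()(boxes, depths)   # (N, N, 64)
    out = DepthSemanticTransformerLayer()(X, r, s)
    print(out.shape)                         # torch.Size([36, 512])
```

Stacking $L$ such layers yields the full question-answering backbone described above.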
The invention provides a neural network architecture for the visual question-answering task that models image-object relationships both implicitly and explicitly; by constructing the spatial and semantic relationships between image objects in both ways, relational reasoning is better realized. The proposed depth-aware and semantically guided relational attention module is incorporated into the self-attention layer of the Transformer framework, i.e., a layer measuring the similarity between image-object spatial relationships and text semantics is added, and a new image-object correlation matrix is obtained by adjusting the original self-attention weights; this correlation matrix reflects the correlation between image objects at the relational level.
Experiments show that the method provided by the invention achieves better results than existing mainstream methods. The experiments were evaluated on two benchmark visual question-answering datasets, Visual Question Answering v2 (VQA v2) and GQA. Detailed information on the datasets is shown in Table 1.
Table 1. Dataset information
This experimental section evaluates the effectiveness of the proposed visual question-answering model on different datasets. Specifically, we report the accuracy on the VQA v2 dataset and the GQA dataset as the evaluation metric of the model; the experimental comparison results are given in Table 2 and Table 3, respectively.
Table 2. Comparison results on the VQA v2 dataset
Table 3. Comparison results on the GQA dataset
Notably, as can be observed from the two tables above, the proposed method consistently outperforms all of the baseline models across the different visual question-answering tasks. Most of these models focus attention on the image-object entities themselves and ignore the modeling of the spatial and semantic relationships between image objects, so they lack the ability to reason across image objects. By explicitly modeling the three-dimensional spatial positions of image objects and incorporating the resulting attention mechanism into the neural network structure, the proposed method explicitly models the relationships between image objects and thus enables relational reasoning between them.
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.
Claims (5)
1. A relational attention network visual question-answering method based on depth perception and semantic guidance is characterized by comprising the following steps:
1) three-dimensional spatial relationship construction between image objects
1.1) Using the rectangular-box coordinates of the image objects, calculate the visual relationship in two-dimensional space; for image objects i and j, the two-dimensional spatial relationship is denoted $\lambda_{ij}^{2D}$;
1.2) According to the depth distance values $dep_i$ and $dep_j$ at the center points of the rectangular boxes of image objects i and j, calculate the visual relationship $\lambda_{ij}^{dep}$ in depth space;
1.3) From the two-dimensional spatial relationship $\lambda_{ij}^{2D}$ and the depth spatial relationship $\lambda_{ij}^{dep}$ between image objects i and j, obtain the three-dimensional spatial relationship $r_{ij}$ between the image objects, namely:

$$r_{ij} = \sigma\!\left(W_s\left[\lambda_{ij}^{2D};\, \lambda_{ij}^{dep}\right]\right)$$

where $W_s$ is a learnable weight parameter, $d_s$ is the dimension of the explicit spatial-relationship representation, and $\sigma$ is the ReLU activation function;
2) depth perception and semantic guidance attention mechanism
According to the three-dimensional spatial relationship between the image objects, obtain the correlation score $\beta_{ij}$ of image objects i and j in the spatial dimension, namely:

$$\beta_{ij} = f_{spa}(q_i, r_{ij}) + f_{sem}(s, r_{ij})$$

where $f_{spa}$ calculates the correlation of the two image objects in the spatial dimension and is obtained as the dot product of the visual feature $q_i$ of the i-th image object and the three-dimensional spatial-relationship representation $r_{ij}$, namely:

$$f_{spa}(q_i, r_{ij}) = q_i \cdot r_{ij}$$

and $f_{sem}$ calculates the correlation between the spatial relationship of the two image objects and the text semantics, namely:

$$f_{sem}(s, r_{ij}) = s^{\top} W_{sem}\, r_{ij}$$

where $W_{sem}$ is a learnable weight parameter and $s$ is the textual feature representation of the question, taken from the last-layer feature at the [CLS] position of the BERT model;
3) combining implicit attention and explicit attention
The correlation $\alpha_{ij}$ between image objects i and j is obtained by weighting the implicit correlation $\alpha_{ij}^{imp}$ and the explicit correlation $\alpha_{ij}^{exp}$;
4) attention mechanism incorporated into visual question-answering model
Following the Transformer framework, the improved attention mechanism $\alpha_{ij}$ replaces the conventional self-attention layer. All $\alpha_{ij}$ are collected in matrix form, i.e. $A = [\alpha_{ij}] \in \mathbb{R}^{N \times N}$, and the improved Transformer computation is:

$$X^{(l+1)} = \mathrm{FFN}\!\left(A\, V^{(l)}\right), \quad l = 1, \dots, L$$

where $L$ is the number of Transformer layers and FFN is a multilayer perceptron consisting of two fully connected layers with a ReLU-activated hidden layer, namely:

$$\mathrm{FFN}(X) = W_2\, \sigma(W_1 X + b_1) + b_2$$
2. The relational attention network visual question-answering method based on depth perception and semantic guidance according to claim 1, characterized in that in step 1.1), the two-dimensional spatial relationship $\lambda_{ij}^{2D}$ is calculated from the rectangular boxes of the two image objects, where $(x_i, y_i)$, $w_i$ and $h_i$ respectively denote the center-point coordinates, width and height of the rectangular box of image object i, and $(x_j, y_j)$, $w_j$ and $h_j$ respectively denote the center-point coordinates, width and height of the rectangular box of image object j.
3. The relational attention network visual question-answering method based on depth perception and semantic guidance according to claim 1, characterized in that in step 1.2), the depth-space visual relationship $\lambda_{ij}^{dep}$ is calculated from the depth distance values $dep_i$ and $dep_j$ of the two rectangular-box center points and the area $S_{i \cap j}$ of the overlapping region of rectangular boxes i and j.
4. The relational attention network visual question-answering method based on depth perception and semantic guidance according to claim 1, characterized in that in step 3), the implicit correlation $\alpha_{ij}^{imp}$ adopts the self-attention mechanism of the Transformer model, specifically:

$$q_i = W_q x_i, \quad k_j = W_k x_j, \quad v_j = W_v x_j$$

$$f(q_i, k_j) = \frac{q_i^{\top} k_j}{\sqrt{d_h}}, \quad \alpha_{ij}^{imp} = \frac{\exp\!\left(f(q_i, k_j)\right)}{\sum_{j=1}^{N} \exp\!\left(f(q_i, k_j)\right)}$$

where $W_q$, $W_k$ and $W_v$ are learnable fully connected layer parameters, $x_i$ and $x_j$ are the visual features of the i-th and j-th image objects, $q_i$, $k_j$ and $v_j$ are the visual features mapped to the hidden space, $f(\cdot,\cdot)$ is the scaled dot-product function, and $\exp(\cdot)$ is the exponential function with base e.
5. The relational attention network visual question-answering method based on depth perception and semantic guidance according to claim 1, characterized in that in step 3), the correlation $\alpha_{ij}$ between image objects i and j is obtained by averaging the implicit correlation $\alpha_{ij}^{imp}$ and the explicit correlation $\alpha_{ij}^{exp}$, namely:

$$\alpha_{ij} = \frac{1}{2}\left(\alpha_{ij}^{imp} + \alpha_{ij}^{exp}\right)$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210231121.1A CN114691847B (en) | 2022-03-10 | 2022-03-10 | Relation attention network vision question-answering method based on depth perception and semantic guidance |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210231121.1A CN114691847B (en) | 2022-03-10 | 2022-03-10 | Relation attention network vision question-answering method based on depth perception and semantic guidance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114691847A true CN114691847A (en) | 2022-07-01 |
CN114691847B CN114691847B (en) | 2024-04-26 |
Family
ID=82138315
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210231121.1A Active CN114691847B (en) | 2022-03-10 | 2022-03-10 | Relation attention network vision question-answering method based on depth perception and semantic guidance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114691847B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2711670A1 (en) * | 2012-09-21 | 2014-03-26 | Technische Universität München | Visual localisation |
US20180322646A1 (en) * | 2016-01-05 | 2018-11-08 | California Institute Of Technology | Gaussian mixture models for temporal depth fusion |
CN110377710A (en) * | 2019-06-17 | 2019-10-25 | 杭州电子科技大学 | A kind of vision question and answer fusion Enhancement Method based on multi-modal fusion |
US20210081728A1 (en) * | 2019-09-12 | 2021-03-18 | Nec Laboratories America, Inc | Contextual grounding of natural language phrases in images |
EP3920048A1 (en) * | 2020-06-02 | 2021-12-08 | Siemens Aktiengesellschaft | Method and system for automated visual question answering |
CN111984772A (en) * | 2020-07-23 | 2020-11-24 | 中山大学 | Medical image question-answering method and system based on deep learning |
Non-Patent Citations (2)
Title |
---|
HUASONG ZHONG ET AL.: "Self-Adaptive Neural Module Transformer for Visual Question Answering", IEEE Transactions on Multimedia, vol. 23, 18 May 2020 (2020-05-18), pages 1264-1273, XP011851901, DOI: 10.1109/TMM.2020.2995278 *
QIU Nan et al.: "Research on Visual Question Answering Models Based on Composite Image-Text Features", Application Research of Computers, vol. 38, no. 08, 23 April 2021 (2021-04-23), pages 2293-2298 *
Also Published As
Publication number | Publication date |
---|---|
CN114691847B (en) | 2024-04-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gan et al. | Sparse attention based separable dilated convolutional neural network for targeted sentiment analysis | |
CN110263912B (en) | Image question-answering method based on multi-target association depth reasoning | |
CN113610126B (en) | Label-free knowledge distillation method based on multi-target detection model and storage medium | |
CN109766427B (en) | Intelligent question-answering method based on collaborative attention for virtual learning environment | |
CN113191357B (en) | Multilevel image-text matching method based on graph attention network | |
CN110309503A (en) | A kind of subjective item Rating Model and methods of marking based on deep learning BERT--CNN | |
CN111897944B (en) | Knowledge graph question-answering system based on semantic space sharing | |
CN111242197B (en) | Image text matching method based on double-view semantic reasoning network | |
Jiang et al. | An eight-layer convolutional neural network with stochastic pooling, batch normalization and dropout for fingerspelling recognition of Chinese sign language | |
CN114936623B (en) | Aspect-level emotion analysis method integrating multi-mode data | |
CN111563149A (en) | Entity linking method for Chinese knowledge map question-answering system | |
CN111611367B (en) | Visual question-answering method introducing external knowledge | |
CN114973125A (en) | Method and system for assisting navigation in intelligent navigation scene by using knowledge graph | |
CN112632250A (en) | Question and answer method and system under multi-document scene | |
CN116611024A (en) | Multi-mode trans mock detection method based on facts and emotion oppositivity | |
CN115331075A (en) | Countermeasures type multi-modal pre-training method for enhancing knowledge of multi-modal scene graph | |
Yu et al. | Question classification based on MAC-LSTM | |
CN114595306A (en) | Text similarity calculation system and method based on distance perception self-attention mechanism and multi-angle modeling | |
Zhou et al. | Stock prediction based on bidirectional gated recurrent unit with convolutional neural network and feature selection | |
CN116187349A (en) | Visual question-answering method based on scene graph relation information enhancement | |
CN110889340A (en) | Visual question-answering model based on iterative attention mechanism | |
CN116303976B (en) | Penetration test question-answering method, system and medium based on network security knowledge graph | |
CN114691847A (en) | Relational attention network visual question-answering method based on deep perception and semantic guidance | |
Tian et al. | Scene graph generation by multi-level semantic tasks | |
Xin et al. | Knowledge-based intelligent education recommendation system with IoT networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||