CN110879844B - Cross-media reasoning method and system based on heterogeneous interactive learning - Google Patents

Cross-media reasoning method and system based on heterogeneous interactive learning

Info

Publication number
CN110879844B
CN110879844B CN201911023636.7A CN201911023636A CN110879844B CN 110879844 B CN110879844 B CN 110879844B CN 201911023636 A CN201911023636 A CN 201911023636A CN 110879844 B CN110879844 B CN 110879844B
Authority
CN
China
Prior art keywords
media
cross
heterogeneous
reasoning
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911023636.7A
Other languages
Chinese (zh)
Other versions
CN110879844A (en)
Inventor
彭宇新
黄鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN201911023636.7A
Publication of CN110879844A
Application granted
Publication of CN110879844B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/45 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/435 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a cross-media reasoning method and system based on heterogeneous interactive learning. The method comprises the following steps: 1. A cross-media implication reasoning data set is established, in which the premise comprises two different media types and the conclusion comprises one media type. 2. A heterogeneous interactive learning network structure is trained on the cross-media implication reasoning data set; the structure mainly comprises cross-media interactive attention learning and heterogeneous tensor space construction. 3. The trained heterogeneous interactive learning network is used for reasoning, judging the implication relation between a given premise and conclusion. Compared with the prior art, the method can perform implication reasoning based on premises of different media and improves the accuracy of implication reasoning.

Description

Cross-media reasoning method and system based on heterogeneous interactive learning
Technical Field
The invention relates to the field of multimedia analysis, in particular to a cross-media reasoning method and system based on heterogeneous interactive learning.
Background
Reasoning is a key ability by which humans perceive the external world, and implication reasoning is an important basic form of reasoning. Implication reasoning refers to judging whether a conclusion H holds given a premise P. It has wide application value in semantic retrieval, intelligent question answering, and other applications. Existing implication reasoning methods focus on text, that is, the case where the premise and the conclusion are each a piece of text, and the emphasis is on judging the textual similarity between the premise and the conclusion. However, human cognition and reasoning often involve vision, language, and other senses, so a form of reasoning that relies on text alone greatly limits the breadth and depth of reasoning. Therefore, how to extend the existing, mainly text-based form of reasoning to cross-media reasoning in which multiple media participate has become a key problem in research and application.
The related technologies fall into two categories: text implication reasoning and cross-media analysis. In text implication reasoning, the goal is to decide, for a given premise P, which of three cases holds for the conclusion H: it definitely holds (implication), it definitely does not hold (contradiction), or it cannot be determined (irrelevance). As a basic task of natural language processing, text implication reasoning has received extensive attention from researchers. Inference-rule-based approaches, such as the one proposed by Mirkin et al. in the document "Source-Language Entailment Modeling for Translating Unknown Terms", attempt to transform the premise into the conclusion using known text rules. The rules involved include inclusion relationships (e.g., dog → animal) and causal relationships (e.g., buy → own), among others. If the conclusion can be obtained from the premise through rule transformation, the premise and the conclusion are in an implication relationship. Bowman et al. propose a deep-network-based method in the document "A large annotated corpus for learning natural language inference", which uses two independent recurrent neural network models to extract text features of the premise and the conclusion and then judges the implication relation through several fully connected layers. However, these methods all take a text premise and a text conclusion as input, so they can only reason about implication relations between texts. This greatly limits the depth and breadth of the reasoning.
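The rule-transformation idea described above can be illustrated with a toy sketch. This is not the system of Mirkin et al.; the word-by-word matching strategy and the function name `entails` are assumptions made only for illustration, and the rule table contains just the two example rules mentioned in the text.

```python
# Toy rule table: the two example rules from the text (inclusion and causal).
RULES = {"dog": "animal", "buy": "own"}


def entails(premise: str, conclusion: str, rules=RULES) -> bool:
    """Return True if rewriting premise words with the rules yields the conclusion.

    Assumption for this sketch: both sentences have the same number of words
    and rules are applied to single words, position by position.
    """
    words = premise.lower().split()
    target = conclusion.lower().split()
    if len(words) != len(target):
        return False
    # A position matches if the words are equal or a rule maps one to the other.
    return all(w == t or rules.get(w) == t for w, t in zip(words, target))


forward = entails("the dog runs", "the animal runs")
backward = entails("the animal runs", "the dog runs")
```

The rules are directional: "dog" can be rewritten to "animal", but not the reverse, so only the first call reports an implication.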
In cross-media analysis, existing research has focused on the retrieval task. The mainstream approach is unified representation learning, in which different media such as images and text are mapped into the same semantic space so that similarity can be measured between their representations. For example, Rasiwasia et al. proposed a high-level semantic mapping method in the document "A New Approach to Cross-Modal Multimedia Retrieval", which maps images and text into the same space using canonical correlation analysis, labels the data according to their categories, and learns the semantics using logistic regression. Ngiam et al., in the document "Multimodal Deep Learning", propose a multimodal autoencoder method, which uses two autoencoders to receive the inputs of two media simultaneously and is trained by minimizing the reconstruction error. The two autoencoders share an encoding layer, so that the associations between different media can be learned. However, these methods are all aimed at retrieval tasks, where the emphasis is on judging the similarity of data from different media, and they cannot support the implication reasoning task.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a cross-media reasoning method and system based on heterogeneous interactive learning, which can jointly consider premises given in two different media and judge whether the conclusion holds. Through cross-media interactive attention learning and heterogeneous tensor space construction, complementary fine-grained cross-media clues can be fully mined, achieving comprehensive reasoning.
To achieve this purpose, the technical scheme adopted by the invention is as follows:
A cross-media reasoning method based on heterogeneous interactive learning, used for jointly analyzing the reasoning clues contained in different media and judging how likely the conclusion is to hold, thereby realizing cross-media implication reasoning. The method comprises the following steps:
(1) Establishing a cross-media implication reasoning data set, wherein the premise comprises two different media types and the conclusion comprises one media type;
(2) Training a heterogeneous interactive learning network structure by using the cross-media implication reasoning data set, the structure mainly comprising cross-media interactive attention learning and heterogeneous tensor space construction;
(3) Reasoning by using the trained heterogeneous interactive learning network, and judging the implication relation between the given premise and conclusion.
Further, in the above cross-media reasoning method based on heterogeneous interactive learning, the media types of the premise in step (1) are text and image; the media type of the conclusion is text.
Further, in the above cross-media reasoning method based on heterogeneous interactive learning, the network structure in step (2) includes two main parts: cross-media interactive attention learning and heterogeneous tensor space construction. The method first generates fine-grained representations for images and text, and then simultaneously mines the reasoning clues of the image premise, the text premise, and the conclusion in a tensor space to realize implication reasoning.
Further, in the above cross-media reasoning method based on heterogeneous interactive learning, the implication relation in step (3) is one of implication, contradiction, and irrelevance. The reasoning proceeds as follows: the image premise, the text premise, and the text conclusion are input simultaneously, the network outputs probability values for the three implication relations, and the relation with the maximum probability is taken as the output result.
Based on the same inventive concept, the invention also provides a cross-media reasoning system based on heterogeneous interactive learning, which comprises:
a data set establishing module, responsible for establishing a cross-media implication reasoning data set, wherein the premise comprises two different media types and the conclusion comprises one media type;
a network training module, responsible for training a heterogeneous interactive learning network structure by using the cross-media implication reasoning data set, the structure comprising cross-media interactive attention learning and heterogeneous tensor space construction;
a reasoning module, responsible for reasoning by using the trained heterogeneous interactive learning network and judging the implication relation between the given premise and conclusion.
The beneficial effects of the invention are as follows: compared with the prior art, implication reasoning based on premises of different media can be realized. In addition, through interactive attention learning and heterogeneous tensor space construction, the complementary clues of different media are used more fully, and the accuracy of implication reasoning is improved.
The invention achieves these effects for the following reasons: interactive attention learning realizes fine-grained semantic alignment between image and text and between text and text, fully mining the cross-media associations of different media; heterogeneous tensor space construction places the premises and conclusions of different media in the same tensor space, so that the reasoning clues between them can be analyzed jointly, which improves the accuracy of cross-media implication reasoning.
Drawings
FIG. 1 is a flow chart of a cross-media reasoning method based on heterogeneous interactive learning according to the present invention.
FIG. 2 is a schematic diagram of the complete network architecture of the present invention.
FIG. 3 is a schematic diagram of cross-media interactive attention learning in the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and specific embodiments.
The invention discloses a cross-media reasoning method based on heterogeneous interactive learning, a flow chart of which is shown in FIG. 1. The method comprises the following steps:
(1) A cross-media implication reasoning data set is established, wherein the premise comprises two different media types and the conclusion comprises one media type.
In this embodiment, the two media types of the premise are text and image, and the media type of the conclusion is text. Let Data denote the cross-media implication reasoning data set; then
$\mathrm{Data} = \{((P(I)_n, P(T)_n), h_n, e_n)\}_{n=1}^{N}$
where N is the number of data items, and (P(I)_n, P(T)_n) and h_n form the n-th premise-conclusion pair. P(I)_n denotes the image premise and P(T)_n denotes the text premise. e_n is the implication relation label of the n-th premise-conclusion pair; it is a 1 × 3 vector in which exactly one dimension is 1 and the rest are 0. The dimension equal to 1 indicates one of the three implication relations: implication, contradiction, or irrelevance.
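As a minimal sketch of one record in such a data set, the structure below holds an image premise, a text premise, a text conclusion, and a 1 × 3 one-hot label. The class name `Sample`, the field names, the order of the three relations inside the label vector, and the example sentences and file path are all assumptions made for illustration; none of them are specified in the text.

```python
from dataclasses import dataclass
from typing import List

# ASSUMPTION: the order of the three relations in the label vector is arbitrary here.
RELATIONS = ("implication", "contradiction", "irrelevance")


@dataclass
class Sample:
    image_premise: str   # hypothetical path to the premise image P(I)_n
    text_premise: str    # premise text P(T)_n
    conclusion: str      # conclusion text h_n
    label: List[int]     # e_n, a 1 x 3 one-hot vector


def one_hot(relation: str) -> List[int]:
    """Build the 1 x 3 label with a single 1 at the index of the given relation."""
    index = RELATIONS.index(relation)
    return [1 if i == index else 0 for i in range(len(RELATIONS))]


# Made-up example record, for illustration only.
sample = Sample(
    image_premise="images/0001.jpg",
    text_premise="A man is playing a guitar on stage.",
    conclusion="A person is performing music.",
    label=one_hot("implication"),
)
```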
(2) The heterogeneous interactive learning network structure is trained by using a cross-media implication reasoning data set, and the method mainly comprises cross-media interactive attention learning and heterogeneous tensor space construction.
The network structure of this step is shown in FIG. 2. In this embodiment, for image input, a VGG19 convolutional neural network is used to extract pool5 layer features as the local features of an image (each image has v = 49 regions); for text input, the method of the document "Natural Language Inference over Interaction Space" (authors: Yichen Gong, Heng Luo, and Jian Zhang; published in the International Conference on Learning Representations 2018) is used to extract the feature of each word as the local features of the text. For convenience, features are extracted for w = 49 words per text: longer texts are truncated and shorter ones are padded with zeros. The local features of the image are converted by a fully connected layer into representations with the same dimensionality as the local features of the text, and these representations serve as the input of interactive attention learning.
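The fixed-length handling of text features described above (keep 49 word positions, cut off the excess, fill the remainder with zeros) can be sketched as follows. The function name `pad_or_truncate` and the toy two-dimensional word vectors are assumptions for illustration; real word features would come from the text encoder.

```python
from typing import List

W = 49  # number of word positions per text in this embodiment


def pad_or_truncate(word_features: List[List[float]], w: int = W) -> List[List[float]]:
    """Keep the first w word vectors and fill any missing positions with zero vectors."""
    if not word_features:
        raise ValueError("need at least one word vector to infer the feature dimension")
    dim = len(word_features[0])
    clipped = [list(vec) for vec in word_features[:w]]
    padding = [[0.0] * dim for _ in range(w - len(clipped))]
    return clipped + padding


short_text = pad_or_truncate([[0.1, 0.2], [0.3, 0.4]])   # 2 words, padded to 49
long_text = pad_or_truncate([[1.0, 1.0]] * 60)           # 60 words, truncated to 49
```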
In this embodiment, cross-media interactive attention learning takes place simultaneously between text and text and between text and image. Its goal is to cross-encode the premise and the conclusion and to mine cross-media semantic associations; it is illustrated in FIG. 3. Specifically, let the local features of P(I)_n be
Figure BDA0002248006770000041
where
Figure BDA0002248006770000042
is the 1st local feature, and so on. Similarly, the local features of P(T)_n are
Figure BDA0002248006770000043
and the local features of h_n are
Figure BDA0002248006770000044
Taking P(I)_n and h_n as an example, the interactive attention is expressed as a v × w matrix, where v is the number of image regions and w is the number of text words, and each element is
Figure BDA0002248006770000045
where the symbol "∘" denotes element-wise multiplication of vectors, and
Figure BDA0002248006770000046
denotes a fully connected layer with an output dimension of 1 that takes x as input, with θ denoting the network parameters. From the interactive attention, the cross-coding of
Figure BDA0002248006770000047
is obtained as follows:
Figure BDA0002248006770000048
Likewise, the cross-codings of the text premise and the text conclusion,
Figure BDA0002248006770000049
and
Figure BDA00022480067700000410
can be calculated by the same method.
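The following sketch illustrates the kind of computation described above: a v × w attention matrix whose entries come from a fully connected layer with output dimension 1 applied to the element-wise product of a region feature and a word feature. The exact formulas are given only as images in the source, so two points are assumptions of this sketch rather than the patented formulas: the fully connected layer is modeled as a plain dot product with a parameter vector `theta`, and the cross-coding is modeled as a softmax-weighted sum of the conclusion's word vectors.

```python
import math
from typing import List

Vector = List[float]


def fc_scalar(x: Vector, theta: Vector, bias: float = 0.0) -> float:
    """Fully connected layer with output dimension 1: theta . x + bias."""
    return sum(t * xi for t, xi in zip(theta, x)) + bias


def attention_matrix(P: List[Vector], H: List[Vector], theta: Vector) -> List[List[float]]:
    """Entry (i, j) scores the element-wise product of region i and word j."""
    return [[fc_scalar([a * b for a, b in zip(p, h)], theta) for h in H] for p in P]


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_encode(P: List[Vector], H: List[Vector], theta: Vector) -> List[Vector]:
    """ASSUMPTION: each region is re-expressed as a softmax-weighted sum of word vectors."""
    dim = len(H[0])
    encoded = []
    for row in attention_matrix(P, H, theta):
        weights = softmax(row)
        encoded.append([sum(w * h[k] for w, h in zip(weights, H)) for k in range(dim)])
    return encoded


P = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # v = 3 toy image regions
H = [[1.0, 0.0], [0.0, 1.0]]               # w = 2 toy conclusion words
theta = [1.0, 1.0]                         # toy parameters of the scoring layer
A = attention_matrix(P, H, theta)          # 3 x 2 matrix
P_cross = cross_encode(P, H, theta)        # one cross-coded vector per region
```

In the embodiment both v and w are 49; the toy sizes here only keep the example readable.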
In this embodiment, the heterogeneous tensor space is constructed so as to express the interactions between premises and conclusions of different media types in a single tensor space. Specifically, taking the image premise and text conclusion branch of the network as an example, the inputs required for tensor space construction are as follows:
Figure BDA00022480067700000411
Figure BDA00022480067700000412
where the symbol ";" denotes concatenation of vectors. The tensor of the image premise and text conclusion branch is then obtained:
Figure BDA00022480067700000413
Similarly, the tensor Tensor(TT) of the text premise and text conclusion branch is obtained, and the final heterogeneous tensor is Tensor(HT) = [Tensor(IT); Tensor(TT)]. A convolutional neural network model (DenseNet in this embodiment) then takes the heterogeneous tensor Tensor(HT) as input, a classifier infers the probabilities of the implication relations, and the relation with the maximum probability is the inference result.
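As a rough sketch of how two branch tensors could be combined into one heterogeneous tensor, the code below builds an interaction tensor for each branch and joins the two. The branch formulas are available only as images in the source, so this sketch makes two explicit assumptions: each tensor entry (i, j) is the element-wise product of a premise feature and a conclusion feature, and the concatenation "[Tensor(IT); Tensor(TT)]" is taken along the channel (last) axis. The function names and toy features are hypothetical.

```python
from typing import List

Vector = List[float]
Tensor3 = List[List[Vector]]


def interaction_tensor(A: List[Vector], B: List[Vector]) -> Tensor3:
    """ASSUMPTION: entry (i, j) is the element-wise product of A[i] and B[j]."""
    return [[[x * y for x, y in zip(a, b)] for b in B] for a in A]


def concat_channels(T1: Tensor3, T2: Tensor3) -> Tensor3:
    """ASSUMPTION: join two tensors of equal grid size along the last (channel) axis."""
    if len(T1) != len(T2) or len(T1[0]) != len(T2[0]):
        raise ValueError("grid sizes must match")
    return [[c1 + c2 for c1, c2 in zip(r1, r2)] for r1, r2 in zip(T1, T2)]


image_feats = [[1.0, 2.0], [3.0, 4.0]]   # toy image premise features
text_feats = [[0.5, 0.5], [1.0, 0.0]]    # toy text premise features
concl_feats = [[2.0, 1.0], [0.0, 1.0]]   # toy text conclusion features

tensor_it = interaction_tensor(image_feats, concl_feats)   # image premise vs. conclusion
tensor_tt = interaction_tensor(text_feats, concl_feats)    # text premise vs. conclusion
tensor_ht = concat_channels(tensor_it, tensor_tt)          # combined heterogeneous tensor
```

Because v = w = 49 in the embodiment, both branch tensors have the same grid size, which is what makes a channel-wise join possible in this sketch; the resulting tensor would then be passed to the convolutional classifier.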
(3) The trained heterogeneous interactive learning network is used for reasoning, judging the implication relation between the given premise and conclusion.
After the deep network has been trained, the implication relation between the premise and the conclusion can be judged from the input image premise, text premise, and text conclusion. Specifically, the image premise, the text premise, and the text conclusion are input into the network structure simultaneously, the network outputs one probability value each for implication, contradiction, and irrelevance, and the relation with the maximum probability value is taken as the inference result.
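The final decision step can be sketched as follows: three scores are turned into probabilities and the most probable relation is returned. The use of a softmax over raw scores, the order of the relations, and the example scores are assumptions of this sketch; the text only states that the network outputs three probability values and the largest one wins.

```python
import math
from typing import List, Tuple

# ASSUMPTION: the output order of the three relations is chosen for this sketch.
RELATIONS = ("implication", "contradiction", "irrelevance")


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores: List[float]) -> Tuple[str, List[float]]:
    """Return the relation with the maximum probability, plus all three probabilities."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return RELATIONS[best], probs


# Made-up network scores for one (image premise, text premise, text conclusion) input.
relation, probs = predict([2.0, 0.5, -1.0])
```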
The following experimental results show that compared with the existing method, the cross-media reasoning method based on heterogeneous interactive learning can obtain higher implication reasoning accuracy.
In this embodiment, experiments were carried out on the implication reasoning data set SNLI, which was proposed in the document "A large annotated corpus for learning natural language inference" (authors: Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning; published in the 2015 Conference on Empirical Methods in Natural Language Processing) and contains 570,000 premise text-conclusion text pairs. Since each premise in the SNLI data set corresponds to one image in Flickr30k, we added this image to construct data consisting of a premise image, a premise text, and a conclusion text. We tested the following 4 methods for experimental comparison:
Existing method 1: the 100-d LSTM encoders method in the document "A large annotated corpus for learning natural language inference" (authors: Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning), which extracts text features through two independent recurrent neural networks and then infers the implication relation using several fully connected layers.
Existing method 2: the BiMPM method in the document "Bilateral Multi-Perspective Matching for Natural Language Sentences" (authors: Zhiguo Wang, Wael Hamza, and Radu Florian), which treats implication reasoning as a bidirectional matching problem, from premise to conclusion and from conclusion to premise, and matches the texts from multiple perspectives.
Existing method 3: the DIIN method in the document "Natural Language Inference over Interaction Space" (authors: Yichen Gong, Heng Luo, and Jian Zhang), which extracts features from the text premise and the text conclusion using a self-attention mechanism and then models the association between the premise and the conclusion in an interaction tensor space to perform implication reasoning.
The present invention: the method of this embodiment.
The experiments use the accuracy metric to evaluate implication reasoning. Accuracy is the ratio of the number of data items whose implication relation is judged correctly to the total number of data items in the data set. The higher the accuracy, the better the implication reasoning.
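The accuracy metric defined above is simply the fraction of correctly judged items. A minimal sketch follows; the function name and the four made-up predictions are illustrative only and are not experimental data.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of items whose predicted implication relation equals the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Made-up labels: three of the four predictions agree with the gold labels.
acc = accuracy(
    ["implication", "contradiction", "irrelevance", "implication"],
    ["implication", "contradiction", "implication", "implication"],
)
```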
TABLE 1 Experimental results
Method                   Accuracy
Existing method 1        77.6%
Existing method 2        86.9%
Existing method 3        88.0%
The present invention    90.3%
As can be seen from Table 1, the present method achieves higher implication reasoning accuracy than the existing methods. Existing method 1 has a simple network structure, using only recurrent neural networks and several fully connected layers, so its accuracy is low. Existing methods 2 and 3 adopt a bidirectional matching mechanism and an attention mechanism, respectively, and thereby obtain higher accuracy. However, all of these methods can only perform implication reasoning on text and cannot use the complementary information provided by images, so their accuracy is limited. The present invention, on the one hand, realizes fine-grained semantic alignment between image and text and between text and text through interactive attention learning, fully mining cross-media associations; on the other hand, through heterogeneous tensor space construction, it jointly analyzes the reasoning clues between premises and conclusions of different media, which improves the accuracy of cross-media implication reasoning.
Based on the same inventive concept, another embodiment of the present invention provides a cross-media inference system based on heterogeneous interactive learning, which includes:
the data set establishing module is responsible for establishing a cross-media implication reasoning data set, wherein the premise comprises two different media types, and the conclusion comprises one media type;
the network training module is in charge of training a heterogeneous interactive learning network structure by using a cross-media implication reasoning data set, and comprises cross-media interactive attention learning and heterogeneous tensor space construction;
and the reasoning module is responsible for reasoning by utilizing the trained heterogeneous interactive learning network and judging the implication relation of the given premise and the conclusion.
Based on the same inventive concept, another embodiment of the present invention provides a computer/server comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the inventive method.
Based on the same inventive concept, another embodiment of the present invention provides a computer-readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program, which when executed by a computer, performs the steps of the inventive method.
In the foregoing embodiment, the media types of the premise are text and image, and the media type of the conclusion is text. The method of the invention also supports implication reasoning over other media types, for example a premise comprising image and audio with a text conclusion.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A cross-media reasoning method based on heterogeneous interactive learning comprises the following steps:
establishing a cross-media implication reasoning data set, wherein the premises comprise two different media types, and the conclusion comprises one media type;
training a heterogeneous interactive learning network structure by using a cross-media implication reasoning data set, wherein the structure comprises cross-media interactive attention learning and heterogeneous tensor space construction;
reasoning by using the trained heterogeneous interactive learning network, and judging the implication relation of the given premise and the conclusion;
the cross-media interactive attention learning takes place simultaneously between text and text and between text and image, and aims to perform cross coding between the premise and the conclusion and to emphasize cross-media semantic associations;
assuming that the number of image regions is v and the number of text words is w, the local features of the image premise P(I)_n are
Figure FDA0003798285950000011
and the local features of the text conclusion h_n are
Figure FDA0003798285950000012
The cross-media interactive attention is expressed as a v × w matrix, each element being
Figure FDA0003798285950000013
wherein the symbol
Figure FDA0003798285950000014
denotes element-wise multiplication of vectors, and
Figure FDA0003798285950000015
denotes a fully connected layer with an output dimension of 1 that takes x as input, with θ denoting the network parameters; according to the interactive attention, the cross coding of the i-th local feature
Figure FDA0003798285950000016
of the image premise P(I)_n is as follows:
Figure FDA0003798285950000017
similarly, the cross codings of the text premise and the text conclusion,
Figure FDA0003798285950000018
and
Figure FDA0003798285950000019
are calculated by the same method.
2. The method of claim 1, wherein the media types of the premise are text and image; the media type of the conclusion is text; the n-th data item in the cross-media implication reasoning data set consists of an image premise P(I)_n, a text premise P(T)_n, and a text conclusion h_n, together with an implication relation label e_n.
3. The method of claim 1, wherein the heterogeneous interactive learning network structure comprises two main parts: cross-media interactive attention learning and heterogeneous tensor space construction; the method first generates fine-grained representations of the image and the text, and then simultaneously mines the implication relations of the image premise, the text premise, and the conclusion in the heterogeneous tensor space to realize implication reasoning.
4. A method as in claim 3, wherein the heterogeneous tensor space is constructed with the goal of expressing inference cues for the premises and conclusions of different media types in the same tensor space.
5. The method of claim 4, wherein for the image premise and text conclusion branch of the heterogeneous interactive learning network structure, the inputs required for the heterogeneous tensor space construction are as follows:
Figure FDA0003798285950000021
Figure FDA0003798285950000022
wherein
Figure FDA0003798285950000023
is the i-th local feature of the image premise P(I)_n;
Figure FDA0003798285950000024
is the i-th local feature of the text conclusion h_n;
Figure FDA0003798285950000025
is the cross coding of the image premise;
Figure FDA0003798285950000026
is the cross coding of the text conclusion; the symbol
Figure FDA0003798285950000027
denotes element-wise multiplication of vectors;
Figure FDA0003798285950000028
denotes a fully connected layer with an output dimension of 1 that takes x as input, with θ denoting the network parameters; the symbol ";" denotes concatenation of vectors; the tensor of the image premise and text conclusion branch is then obtained:
Figure FDA0003798285950000029
similarly, the tensor Tensor(TT) of the text premise and text conclusion branch is obtained, and the final heterogeneous tensor is Tensor(HT) = [Tensor(IT); Tensor(TT)]; a convolutional neural network model then takes the heterogeneous tensor Tensor(HT) as input, a classifier infers the probabilities of the implication relations, and the relation with the maximum probability is the inference result.
6. The method of claim 1, wherein reasoning by using the trained heterogeneous interactive learning network means that the image premise, the text premise, and the text conclusion are simultaneously input into the heterogeneous interactive learning network structure, the network outputs one probability value each for implication, contradiction, and irrelevance, and the relation with the maximum probability value is taken as the reasoning result.
7. A cross-media inference system based on heterogeneous interactive learning using the method of any one of claims 1 to 6, comprising:
the data set establishing module is responsible for establishing a cross-media implication reasoning data set, wherein the premise comprises two different media types, and the conclusion comprises one media type;
the network training module is responsible for training a heterogeneous interactive learning network structure by using a cross-media implication reasoning data set, and comprises cross-media interactive attention learning and heterogeneous tensor space construction;
and the reasoning module is used for reasoning by utilizing the trained heterogeneous interactive learning network and judging the implication relation of the given premise and the conclusion.
8. A computer comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the method according to any one of claims 1 to 6.
CN201911023636.7A 2019-10-25 2019-10-25 Cross-media reasoning method and system based on heterogeneous interactive learning Active CN110879844B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911023636.7A CN110879844B (en) 2019-10-25 2019-10-25 Cross-media reasoning method and system based on heterogeneous interactive learning

Publications (2)

Publication Number Publication Date
CN110879844A CN110879844A (en) 2020-03-13
CN110879844B CN110879844B (en) 2022-10-14

Family

ID=69728037

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911023636.7A Active CN110879844B (en) 2019-10-25 2019-10-25 Cross-media reasoning method and system based on heterogeneous interactive learning

Country Status (1)

Country Link
CN (1) CN110879844B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102667765A (en) * 2009-09-08 2012-09-12 诺基亚公司 Method and apparatus for selective sharing of semantic information sets
CN103971311A (en) * 2014-05-09 2014-08-06 北京化工大学 Reasoning drill method and system based on man-machine coordination
CN105718532A (en) * 2016-01-15 2016-06-29 北京大学 Cross-media sequencing method based on multi-depth network structure
CN110263912A (en) * 2019-05-14 2019-09-20 杭州电子科技大学 A kind of image answering method based on multiple target association depth reasoning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
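"A Fast Unified Model for Parsing and Sentence Understanding"; Bowman, SR et al.; 54th Annual Meeting of the Association for Computational Linguistics (ACL); 2016-12-31; entire document *
"Multimedia semantic mining and cross-media retrieval based on comprehensive reasoning" (基于综合推理的多媒体语义挖掘和跨媒体检索); Yang Yi et al.; Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报); 2009-09-30; Vol. 21, No. 9; entire document *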

Also Published As

Publication number Publication date
CN110879844A (en) 2020-03-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant