CN110879844B - Cross-media reasoning method and system based on heterogeneous interactive learning - Google Patents
Cross-media reasoning method and system based on heterogeneous interactive learning
- Publication number
- CN110879844B (application number CN201911023636.7A)
- Authority
- CN
- China
- Prior art keywords
- media
- cross
- heterogeneous
- reasoning
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/45—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/43—Querying
- G06F16/435—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a cross-media reasoning method and system based on heterogeneous interactive learning. The method comprises the following steps: (1) establishing a cross-media implication reasoning data set, in which the premise comprises two different media types and the conclusion comprises one media type; (2) training a heterogeneous interactive learning network structure on the cross-media implication reasoning data set, the structure mainly comprising cross-media interactive attention learning and heterogeneous tensor space construction; (3) reasoning with the trained heterogeneous interactive learning network to judge the implication relation between a given premise and conclusion. Compared with the prior art, the method enables implication reasoning over premises of different media types and improves the accuracy of implication reasoning.
Description
Technical Field
The invention relates to the field of multimedia analysis, in particular to a cross-media reasoning method and system based on heterogeneous interactive learning.
Background
Reasoning is a key ability by which humans perceive the external world, and implication reasoning is an important basic form of reasoning. Implication reasoning means judging whether a conclusion H holds given a premise P. It has wide application value in semantic retrieval, intelligent question answering and other applications. Existing implication reasoning methods focus on text, that is, the case where the premise and the conclusion are each a passage of text, and they emphasize judging the textual similarity between premise and conclusion. However, human knowledge and reasoning often involve vision, language and other senses together, so a form of reasoning that relies on text alone greatly limits the breadth and depth of reasoning. How to extend the existing text-centered form of inference to cross-media inference in which multiple media participate has therefore become a key problem for research and application.
The related technology falls into two categories: text implication reasoning and cross-media analysis. In text implication reasoning, the goal is to assign the conclusion H, given a premise P, to one of three cases: it certainly holds (implication), it certainly does not hold (contradiction), or it cannot be determined (irrelevance). As a basic task of natural language processing, text implication reasoning has received extensive attention from researchers. Inference rule-based approaches, such as the one proposed by Mirkin et al. in the document "Source-Language Entailment Modeling for Translating Unknown Terms", attempt to transform the premise into the conclusion using known textual rules. The rules involved include inclusion relationships (e.g., dog → animal) and causal relationships (e.g., buy → own), among others. If the conclusion can be obtained from the premise through rule transformation, the premise and the conclusion are in an implication relationship. Bowman et al. propose a deep network-based method in the document "A large annotated corpus for learning natural language inference", which uses two independent recurrent neural network models to extract text features of the premise and the conclusion and then judges the implication relation through several fully connected layers. However, these methods all take a text premise and a text conclusion as input, so they can only reason about text implication relationships. This greatly limits the depth and breadth of the inference.
In cross-media analysis, existing research has focused on the retrieval task. The mainstream method is unified representation learning, that is, mapping different media such as images and texts into the same semantic space so that their representations can be compared by a similarity measure. For example, Rasiwasia et al. proposed a high-level semantic mapping method in the document "A New Approach to Cross-Modal Multimedia Retrieval", which maps images and texts into the same space using canonical correlation analysis, labels the data according to their categories, and learns the semantics using logistic regression. Ngiam et al., in the document "Multimodal Deep Learning", propose a multimodal autoencoder method, which uses two autoencoders to receive the inputs of two media simultaneously and trains them by minimizing the reconstruction error. The two autoencoders share an encoding layer, so that the association between different media can be learned. However, these methods are all designed for retrieval tasks and emphasize judging the similarity of data from different media, so they cannot support the implication reasoning task.
Disclosure of Invention
To address the deficiencies of the prior art, the invention provides a cross-media reasoning method and system based on heterogeneous interactive learning, which jointly consider premises given as data of two different media types and judge whether the conclusion holds. Through cross-media interactive attention learning and heterogeneous tensor space construction, complementary cross-media fine-grained clues can be fully mined, and comprehensive reasoning is achieved.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
A cross-media reasoning method based on heterogeneous interactive learning comprehensively analyzes the reasoning clues contained in different media and judges how likely the conclusion is to hold, thereby realizing cross-media implication reasoning. The method comprises the following steps:
(1) Establishing a cross-media implication reasoning data set, wherein the premises comprise two different media types, and the conclusion comprises one media type;
(2) Training a heterogeneous interactive learning network structure by using a cross-media implication reasoning data set, mainly comprising cross-media interactive attention learning and heterogeneous tensor space construction;
(3) Reasoning with the trained heterogeneous interactive learning network to judge the implication relation between the given premise and conclusion.
Further, in the above cross-media reasoning method based on heterogeneous interactive learning, the media types of the premise in step (1) are text and image, and the media type of the conclusion is text.
Further, in the above cross-media reasoning method based on heterogeneous interactive learning, the network structure in step (2) includes two main parts: cross-media interactive attention learning and heterogeneous tensor space construction. The method first generates fine-grained representations of the images and texts, and then mines the reasoning clues among the image premise, the text premise and the conclusion simultaneously in a tensor space to realize implication reasoning.
Further, in the above cross-media reasoning method based on heterogeneous interactive learning, the implication relation in step (3) is one of implication, contradiction and irrelevance. Reasoning proceeds as follows: the image premise, the text premise and the text conclusion are input simultaneously, the network outputs probability values for the three implication relations, and the relation with the maximum probability is taken as the output result.
Based on the same inventive concept, the invention also provides a cross-media reasoning system based on heterogeneous interactive learning, which comprises:
the data set establishing module is responsible for establishing a cross-media implication reasoning data set, wherein the premises comprise two different media types, and the conclusion comprises one media type;
the network training module is in charge of training a heterogeneous interactive learning network structure by using a cross-media implication reasoning data set, and comprises cross-media interactive attention learning and heterogeneous tensor space construction;
and the reasoning module is used for reasoning by utilizing the trained heterogeneous interactive learning network and judging the implication relation of the given premise and the conclusion.
The invention has the beneficial effects that: compared with the prior art, the implication relationship reasoning based on different media premises can be realized. In addition, through interactive attention learning and heterogeneous tensor space construction, complementary clues of different media are more fully utilized, and accuracy of implication reasoning is improved.
The present invention has the above-described effects because: fine-grained semantic alignment of images-texts and texts-texts is realized through interactive attention learning, and cross-media association relations of different media are fully mined; through the construction of the heterogeneous tensor space, the premises and the conclusions of different media are constructed in the same tensor space, reasoning clues between the premises and the conclusions of the different media can be comprehensively analyzed, and the accuracy of cross-media implication reasoning is improved.
Drawings
FIG. 1 is a flow chart of a cross-media reasoning method based on heterogeneous interactive learning according to the present invention.
FIG. 2 is a schematic diagram of the complete network architecture of the present invention.
FIG. 3 is a schematic diagram of cross-media interactive attention learning in the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and specific embodiments.
The invention discloses a cross-media reasoning method based on heterogeneous interactive learning, a flow chart of which is shown in figure 1, and the method comprises the following steps:
(1) A cross-media implication reasoning data set is established, wherein the premise includes two different media types and the conclusion includes one media type.
In this embodiment, the two media types of the premise are text and image, and the media type of the conclusion is text. Let $Data$ denote the cross-media implication reasoning data set; then $Data = \{(P(I)_n, P(T)_n, h_n, e_n)\}_{n=1}^{N}$, where $N$ is the number of data items, and $(P(I)_n, P(T)_n)$ and $h_n$ form the $n$-th premise-conclusion pair. $P(I)_n$ denotes the image premise and $P(T)_n$ denotes the text premise. $e_n$ is the implication relation label of the $n$-th premise-conclusion pair; it is a $1 \times 3$ vector in which exactly one dimension is 1 and the rest are 0. The dimension equal to 1 indicates one of the three implication relationships: implication, contradiction and irrelevance.
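As an illustration of the data layout described above, the following is a minimal sketch of one data item and its 1 × 3 one-hot label. The `Sample` container, the field names, the file path and the ordering of the three relations inside the label vector are assumptions made for this sketch; the source only states that exactly one dimension of the label is 1.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed label order; the source does not fix which dimension means what.
RELATIONS: Tuple[str, str, str] = ("implication", "contradiction", "irrelevance")


@dataclass
class Sample:
    """One premise-conclusion pair (hypothetical container)."""
    image_premise: str   # P(I)_n, stored here as a path to the image file
    text_premise: str    # P(T)_n
    conclusion: str      # h_n
    label: List[int]     # e_n, a 1 x 3 one-hot vector


def one_hot(relation: str) -> List[int]:
    """Encode a relation name as a 1 x 3 one-hot vector."""
    index = RELATIONS.index(relation)
    return [1 if i == index else 0 for i in range(len(RELATIONS))]


def is_valid_label(label: List[int]) -> bool:
    """Check that exactly one dimension is 1 and the other two are 0."""
    return len(label) == 3 and sorted(label) == [0, 0, 1]


sample = Sample(
    image_premise="images/0001.jpg",            # placeholder path
    text_premise="A dog runs across a field.",
    conclusion="An animal is outdoors.",
    label=one_hot("implication"),
)
```

The data set is then simply a list of N such items.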
(2) The heterogeneous interactive learning network structure is trained by using a cross-media implication reasoning data set, and the method mainly comprises cross-media interactive attention learning and heterogeneous tensor space construction.
The network structure of this step is shown in FIG. 2. In this embodiment, for image input, a VGG19 convolutional neural network is used and its pool5 layer features are extracted as the local features of the image (each image has v = 49 regions). For text input, the method of the document "Natural Language Inference over Interaction Space" (authors: Yichen Gong, Heng Luo, and Jian Zhang; published in International Conference on Learning Representations 2018) is used to extract the feature of each word as the local features of the text. For convenience, features are extracted for w = 49 words per text: longer texts are truncated and shorter texts are padded with zeros. The local features of the image are converted through a fully connected layer into representations with the same dimensionality as the local features of the text, and these serve as the input to interactive attention learning.
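The shape handling in this step can be sketched as follows, assuming the pool5 output is a 7 × 7 grid with C channels (which gives the 49 regions) and using random arrays in place of real VGG19 and word features. The function names, the text feature dimension D = 64 and the random projection weights are illustrative assumptions, not values from the source.

```python
import numpy as np

V = 49  # image regions: a 7 x 7 pool5 grid
W = 49  # words kept per text


def image_regions(pool5: np.ndarray) -> np.ndarray:
    """Flatten a (7, 7, C) pool5 feature map into (49, C) region features."""
    height, width, channels = pool5.shape
    return pool5.reshape(height * width, channels)


def pad_or_truncate(words: np.ndarray, length: int = W) -> np.ndarray:
    """Keep the first `length` word features, or pad with zero rows."""
    count, dim = words.shape
    if count >= length:
        return words[:length]
    padding = np.zeros((length - count, dim), dtype=words.dtype)
    return np.vstack([words, padding])


def project(regions: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Fully connected layer mapping region features to the text dimension."""
    return regions @ weight + bias


rng = np.random.default_rng(0)
C, D = 512, 64                      # D is an illustrative text feature size
pool5 = rng.normal(size=(7, 7, C))  # stand-in for a VGG19 pool5 output
words = rng.normal(size=(12, D))    # stand-in for the features of 12 words

image_feats = project(image_regions(pool5), rng.normal(size=(C, D)), np.zeros(D))
text_feats = pad_or_truncate(words)
```

After this step both inputs have shape (49, D), so every image region can be paired with every word.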
In this embodiment, cross-media interactive attention learning takes place simultaneously between text and text and between text and image. Its goal is to cross-code the premise and the conclusion and to mine cross-media semantic associations; it is illustrated in FIG. 3. Specifically, let the local features of $P(I)_n$ be $\{p^I_1, \dots, p^I_v\}$, where $p^I_i$ is the $i$-th local feature in order; similarly, let the local features of $P(T)_n$ be $\{p^T_1, \dots, p^T_w\}$ and those of $h_n$ be $\{h_1, \dots, h_w\}$.
Taking $P(I)_n$ and $h_n$ as an example, the interactive attention is expressed as a $v \times w$ matrix $A$, where $v$ is the number of image regions and $w$ is the number of text words, and each element is $A_{ij} = F(p^I_i \circ h_j; \theta)$, where the symbol "$\circ$" denotes element-wise multiplication of vectors and $F(x; \theta)$ denotes a fully connected layer with output dimension 1 that takes $x$ as input, with $\theta$ denoting the network parameters. From the interactive attention, the cross-coding $\tilde{p}^I_i$ of $p^I_i$ is obtained as follows:
[equation not reproduced in this text]
Likewise, the cross-codings of the text premise and the text conclusion can be calculated by the same method.
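A rough sketch of interactive attention between an image premise and a text conclusion follows. It assumes that each attention element is the fully connected layer F (modelled here as a bias-free linear map `theta`) applied to the element-wise product of one region feature and one word feature. The cross-coding step is a further assumption: the source's cross-coding equation is not available, so a softmax-weighted sum over the other input, a common choice in attention models, is swapped in for illustration.

```python
import numpy as np


def interactive_attention(p: np.ndarray, h: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Return the v x w matrix A with A[i, j] = F(p_i * h_j; theta)."""
    # (v, 1, d) * (1, w, d) -> (v, w, d): element-wise products of all pairs
    products = p[:, None, :] * h[None, :, :]
    # F as a linear layer with output dimension 1: (v, w, d) @ (d,) -> (v, w)
    return products @ theta


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=axis, keepdims=True)


def cross_code(p: np.ndarray, h: np.ndarray, theta: np.ndarray):
    """Assumed cross-coding: attention-weighted sums over the other input."""
    a = interactive_attention(p, h, theta)
    p_cross = softmax(a, axis=1) @ h    # (v, d): each region attends to words
    h_cross = softmax(a, axis=0).T @ p  # (w, d): each word attends to regions
    return a, p_cross, h_cross


rng = np.random.default_rng(0)
d = 16                                   # illustrative feature dimension
p = rng.normal(size=(49, d))             # image premise local features
h = rng.normal(size=(49, d))             # text conclusion local features
theta = rng.normal(size=d)               # parameters of F

attention, p_cross, h_cross = cross_code(p, h, theta)
```

The same functions apply unchanged to the text premise and text conclusion branch, since both inputs are (49, d) arrays.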
In this embodiment, the heterogeneous tensor space is constructed so as to express the interactions between premises and conclusions of different media types in the same tensor space. Specifically, taking the image premise and text conclusion branch of the network as an example, the inputs required for the tensor space construction are as follows:
[equations not reproduced in this text]
where the symbol ";" denotes the concatenation of vectors. The tensor of the image premise and text conclusion branch, Tensor(IT), is then obtained:
[equation not reproduced in this text]
Similarly, the tensor Tensor(TT) of the text premise and text conclusion branch is obtained, and the final heterogeneous tensor is Tensor(HT) = [Tensor(IT); Tensor(TT)]. Then a convolutional neural network model (DenseNet in this embodiment) takes the heterogeneous tensor Tensor(HT) as input, a classifier infers the probabilities of the implication relations, and the relation with the highest probability is the inference result.
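The concatenation Tensor(HT) = [Tensor(IT); Tensor(TT)] can be sketched as below. How each branch tensor is computed is not recoverable from this text, so the sketch assumes a pairwise element-wise product of premise and conclusion features, in the style of an interaction tensor, and assumes the two branch tensors are joined along the channel axis. Both choices, and all function names, are illustrative only.

```python
import numpy as np


def interaction_tensor(premise: np.ndarray, conclusion: np.ndarray) -> np.ndarray:
    """Assumed branch tensor: T[i, j, :] = premise_i * conclusion_j."""
    return premise[:, None, :] * conclusion[None, :, :]


def heterogeneous_tensor(tensor_it: np.ndarray, tensor_tt: np.ndarray) -> np.ndarray:
    """Tensor(HT) = [Tensor(IT); Tensor(TT)], joined on the channel axis here."""
    return np.concatenate([tensor_it, tensor_tt], axis=-1)


rng = np.random.default_rng(0)
d = 8                                        # illustrative feature dimension
image_premise = rng.normal(size=(49, d))     # stand-in image premise features
text_premise = rng.normal(size=(49, d))      # stand-in text premise features
conclusion = rng.normal(size=(49, d))        # stand-in text conclusion features

tensor_it = interaction_tensor(image_premise, conclusion)  # image-text branch
tensor_tt = interaction_tensor(text_premise, conclusion)   # text-text branch
tensor_ht = heterogeneous_tensor(tensor_it, tensor_tt)     # (49, 49, 2 * d)
```

A tensor of this shape can be treated like a multi-channel image, which is what allows a convolutional network such as DenseNet to consume it.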
(3) And reasoning by using the trained heterogeneous interactive learning network to judge the implication relation of the given premise and the conclusion.
After the deep network has been trained, the implication relation between premise and conclusion can be judged from an input image premise, text premise and text conclusion. Specifically, the image premise, the text premise and the text conclusion are input into the network structure simultaneously, the network outputs a probability value for each of implication, contradiction and irrelevance, and the relation with the maximum probability value is taken as the inference result.
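The final decision step can be sketched as follows, assuming the network ends in three raw scores that are turned into probabilities by a softmax. The score values, the order of the relations and the function names are assumptions for this sketch.

```python
import math
from typing import List, Sequence, Tuple

RELATIONS = ("implication", "contradiction", "irrelevance")  # assumed order


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores: Sequence[float]) -> Tuple[str, List[float]]:
    """Return the relation with the highest probability, plus all probabilities."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return RELATIONS[best], probs


relation, probs = predict([2.0, 0.5, -1.0])  # made-up network outputs
```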
The following experimental results show that compared with the existing method, the cross-media reasoning method based on heterogeneous interactive learning can obtain higher implication reasoning accuracy.
In this embodiment, experiments were carried out on the implication reasoning data set SNLI, which was proposed in the document "A large annotated corpus for learning natural language inference" (authors: Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning; published in the 2015 Conference on Empirical Methods in Natural Language Processing) and contains 570,000 premise text-conclusion text pairs. Since each premise in the SNLI data set corresponds to an image in Flickr30k, we added this image to construct data consisting of a premise image, a premise text, and a conclusion text, and ran the experiments on these data. We tested the following 4 methods for comparison:
Existing method 1: the 100d LSTM encoders method in the document "A large annotated corpus for learning natural language inference" (authors: Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning) extracts text features with two independent recurrent neural networks and then infers the implication relation using several fully connected layers.
Existing method 2: the BiMPM method in the document "Bilateral Multi-Perspective Matching for Natural Language Sentences" (authors: Zhiguo Wang, Wael Hamza, and Radu Florian) treats implication reasoning as a bidirectional matching problem, from premise to conclusion and from conclusion to premise, and matches the texts from multiple perspectives.
Existing method 3: the DIIN method in the document "Natural Language Inference over Interaction Space" (authors: Yichen Gong, Heng Luo, and Jian Zhang) extracts features from the text premise and the text conclusion using a self-attention mechanism, and then models the association between premise and conclusion in an interaction tensor space to perform implication reasoning.
The present invention: the method of this embodiment.
The experiment uses accuracy as the metric for evaluating implication reasoning. Accuracy is the ratio of the number of data items whose implication relation is judged correctly to the total number of data items in the data set. The higher the accuracy, the better the implication reasoning.
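The accuracy metric defined above amounts to the following computation. The function name and the toy labels are illustrative and are not taken from the experiments.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of items whose predicted relation equals the true relation."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one item is required")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Toy example: three of four judgments are correct.
score = accuracy(
    ["implication", "contradiction", "irrelevance", "implication"],
    ["implication", "contradiction", "implication", "implication"],
)
```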
Table 1. Experimental results
Comparison method | Accuracy |
---|---|
Existing method 1 | 77.6% |
Existing method 2 | 86.9% |
Existing method 3 | 88.0% |
The present invention | 90.3% |
As can be seen from Table 1, the present method achieves higher implication reasoning accuracy than the existing methods. Existing method 1 has a simple network structure, using only recurrent neural networks and several fully connected layers, so its accuracy is low. Existing methods 2 and 3 adopt a bidirectional matching mechanism and an attention mechanism respectively, and thus obtain higher accuracy. However, these methods can only perform implication reasoning over text and cannot exploit the complementary information brought by images, so their accuracy is limited. In the present invention, on the one hand, fine-grained semantic alignment of image-text and text-text is realized through interactive attention learning, which fully mines cross-media associations; on the other hand, through heterogeneous tensor space construction, the reasoning clues between premises and conclusions of different media can be analyzed comprehensively, which improves the accuracy of cross-media implication reasoning.
Based on the same inventive concept, another embodiment of the present invention provides a cross-media inference system based on heterogeneous interactive learning, which includes:
the data set establishing module is responsible for establishing a cross-media implication reasoning data set, wherein the premise comprises two different media types, and the conclusion comprises one media type;
the network training module is in charge of training a heterogeneous interactive learning network structure by using a cross-media implication reasoning data set, and comprises cross-media interactive attention learning and heterogeneous tensor space construction;
and the reasoning module is responsible for reasoning by utilizing the trained heterogeneous interactive learning network and judging the implication relation of the given premise and the conclusion.
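The division into three modules can be sketched as a small pipeline. Everything here is a hypothetical skeleton: the class and method names are invented for illustration, and the "training" step is a placeholder that returns the most frequent relation in the data set rather than a trained heterogeneous interactive learning network.

```python
from collections import Counter
from typing import Callable, List, Tuple

# (image premise, text premise, text conclusion, relation)
Sample = Tuple[str, str, str, str]
Model = Callable[[str, str, str], str]


class DatasetModule:
    """Builds the cross-media implication reasoning data set."""

    def __init__(self) -> None:
        self.samples: List[Sample] = []

    def add(self, image: str, text: str, conclusion: str, relation: str) -> None:
        self.samples.append((image, text, conclusion, relation))


class TrainingModule:
    """Trains a model on the data set (placeholder: majority relation)."""

    def train(self, samples: List[Sample]) -> Model:
        majority = Counter(s[3] for s in samples).most_common(1)[0][0]
        return lambda image, text, conclusion: majority


class ReasoningModule:
    """Judges the implication relation of a given premise and conclusion."""

    def __init__(self, model: Model) -> None:
        self.model = model

    def infer(self, image: str, text: str, conclusion: str) -> str:
        return self.model(image, text, conclusion)


dataset = DatasetModule()
dataset.add("a.jpg", "A dog runs.", "An animal moves.", "implication")
dataset.add("b.jpg", "A man sleeps.", "A man is running.", "contradiction")
dataset.add("c.jpg", "A child plays.", "A person is active.", "implication")

reasoner = ReasoningModule(TrainingModule().train(dataset.samples))
result = reasoner.infer("d.jpg", "A cat sits.", "An animal is present.")
```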
Based on the same inventive concept, another embodiment of the present invention provides a computer/server comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the inventive method.
Based on the same inventive concept, another embodiment of the present invention provides a computer-readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program, which when executed by a computer, performs the steps of the inventive method.
In the foregoing embodiment, the media types of the premise are text and image, and the media type of the conclusion is text. The method of the invention also supports implication reasoning over other media types, for example premises of the image and audio media types with a conclusion of the text media type.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (8)
1. A cross-media reasoning method based on heterogeneous interactive learning comprises the following steps:
establishing a cross-media implication reasoning data set, wherein the premises comprise two different media types, and the conclusion comprises one media type;
training a heterogeneous interactive learning network structure by using a cross-media implication reasoning data set, wherein the structure comprises cross-media interactive attention learning and heterogeneous tensor space construction;
reasoning by using the trained heterogeneous interactive learning network, and judging the implication relation of the given premise and the conclusion;
the cross-media interactive attention learning occurs simultaneously between text and text and between text and image, and aims to perform cross-coding between the premise and the conclusion and to mine the cross-media semantic association;
assuming that the number of image regions is $v$ and the number of text words is $w$, the local features of the image premise $P(I)_n$ are $\{p^I_1, \dots, p^I_v\}$ and the local features of the text conclusion $h_n$ are $\{h_1, \dots, h_w\}$; the cross-media interactive attention is expressed as a $v \times w$ matrix, each element being $A_{ij} = F(p^I_i \circ h_j; \theta)$, wherein the symbol "$\circ$" represents element-wise multiplication of vectors, and $F(x; \theta)$ represents a fully connected layer with an output dimension of 1 that takes $x$ as input, $\theta$ representing the network parameters; according to the interactive attention, the cross-coding of the $i$-th local feature $p^I_i$ of the image premise $P(I)_n$ is as follows:
[equation not reproduced in this text]
2. The method of claim 1, wherein the media types of the premise are text and image; the media type of the conclusion is text; the $n$-th data item in the cross-media implication reasoning data set is composed of an image premise $P(I)_n$, a text premise $P(T)_n$, and a text conclusion $h_n$ with an implication relation label $e_n$.
3. The method of claim 1, the heterogeneous interactive learning network structure comprising two main parts: performing cross-media interactive attention learning and constructing a heterogeneous tensor space; the method comprises the steps of firstly generating fine-grained representation of an image and a text, and then simultaneously mining implication relations of image preconditions, text preconditions and conclusions in a heterogeneous tensor space to realize implication reasoning.
4. A method as in claim 3, wherein the heterogeneous tensor space is constructed with the goal of expressing inference cues for the premises and conclusions of different media types in the same tensor space.
5. The method of claim 4, wherein, for the image premise and text conclusion branch of the heterogeneous interactive learning network structure, the inputs required for the heterogeneous tensor space construction are as follows:
[equations not reproduced in this text]
wherein $p^I_i$ is the $i$-th local feature of the image premise $P(I)_n$; $h_i$ is the $i$-th local feature of the text conclusion $h_n$; $\tilde{p}^I_i$ is the cross-coding for the image premise; $\tilde{h}_i$ is the cross-coding for the text conclusion; the symbol "$\circ$" represents element-wise multiplication of vectors; $F(x; \theta)$ represents a fully connected layer with an output dimension of 1 that takes $x$ as input, $\theta$ representing the network parameters; the symbol ";" represents concatenation of vectors; the tensor of the image premise and text conclusion branch, Tensor(IT), is then obtained:
[equation not reproduced in this text]
similarly, the tensor Tensor(TT) of the text premise and text conclusion branch is obtained, and the final heterogeneous tensor Tensor(HT) = [Tensor(IT); Tensor(TT)] is obtained; then, using a convolutional neural network model, the heterogeneous tensor Tensor(HT) is taken as input and probability inference of the implication relation is performed through a classifier, the relation with the maximum probability being the inference result.
6. The method of claim 1, wherein the utilizing the trained heterogeneous interactive learning network for reasoning means that the image precondition, the text precondition and the text conclusion are simultaneously input into the heterogeneous interactive learning network structure, the network finally obtains a probability value for the implication, the contradiction and the irrelevance, and takes the item with the maximum probability value as the reasoning result.
7. A cross-media reasoning system based on heterogeneous interactive learning using the method of any one of claims 1 to 6, comprising:
a data set establishing module, responsible for establishing a cross-media implication reasoning data set, wherein the premise comprises two different media types and the conclusion comprises one media type;
a network training module, responsible for training a heterogeneous interactive learning network structure using the cross-media implication reasoning data set, the training comprising cross-media interactive attention learning and heterogeneous tensor space construction;
and a reasoning module, which performs reasoning with the trained heterogeneous interactive learning network and judges the implication relation between a given premise and conclusion.
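One record of the data set described for the data set establishing module could be represented as in the sketch below. It assumes, following claim 6, that the two premise media types are image and text and that the conclusion is text; the class name `CrossMediaSample`, its field names, and the use of a file path for the image are hypothetical.

```python
from dataclasses import dataclass

# Relation labels taken from claim 6.
RELATIONS = ("implication", "contradiction", "irrelevance")


@dataclass(frozen=True)
class CrossMediaSample:
    """One premise/conclusion pair with its annotated relation."""

    image_premise: str    # hypothetical: path or identifier of the premise image
    text_premise: str     # premise sentence
    text_conclusion: str  # conclusion sentence
    relation: str         # one of RELATIONS

    def __post_init__(self):
        # Reject labels outside the three relations named in the claims.
        if self.relation not in RELATIONS:
            raise ValueError(f"unknown relation: {self.relation!r}")


# Toy usage with made-up content.
sample = CrossMediaSample(
    image_premise="images/000001.jpg",
    text_premise="A man is riding a bicycle on a street.",
    text_conclusion="A person is outdoors.",
    relation="implication",
)
```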
8. A computer comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201911023636.7A | CN110879844B (en) | 2019-10-25 | 2019-10-25 | Cross-media reasoning method and system based on heterogeneous interactive learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110879844A (en) | 2020-03-13 |
CN110879844B (en) | 2022-10-14 |
Family
ID=69728037
Family Applications (1)
Application Number | Status | Publication | Title | Priority Date | Filing Date |
---|---|---|---|---|---|
CN201911023636.7A | Active | CN110879844B (en) | Cross-media reasoning method and system based on heterogeneous interactive learning | 2019-10-25 | 2019-10-25 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110879844B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102667765A (en) * | 2009-09-08 | 2012-09-12 | Nokia Corporation | Method and apparatus for selective sharing of semantic information sets |
CN103971311A (en) * | 2014-05-09 | 2014-08-06 | Beijing University of Chemical Technology | Reasoning drill method and system based on man-machine coordination |
CN105718532A (en) * | 2016-01-15 | 2016-06-29 | Peking University | Cross-media sequencing method based on multi-depth network structure |
CN110263912A (en) * | 2019-05-14 | 2019-09-20 | Hangzhou Dianzi University | A kind of image answering method based on multiple target association depth reasoning |
Non-Patent Citations (2)
Title |
---|
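"A Fast Unified Model for Parsing and Sentence Understanding"; Bowman, SR et al.; 54th Annual Meeting of the Association for Computational Linguistics (ACL); 2016-12-31; full text * |
"Multimedia semantic mining and cross-media retrieval based on comprehensive reasoning" (基于综合推理的多媒体语义挖掘和跨媒体检索); Yang Yi et al.; Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报); 2009-09-30; Vol. 21, No. 9; full text * |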
Also Published As
Publication number | Publication date |
---|---|
CN110879844A (en) | 2020-03-13 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |