CN110704606B - Abstractive summary generation method based on image-text fusion - Google Patents
Info
- Publication number
- CN110704606B (application CN201910764261.3A)
- Authority
- CN
- China
- Prior art keywords: image, text, summary, features, model
- Prior art date: 2019-08-19
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
        - G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
          - G06F16/34—Browsing; Visualisation therefor
            - G06F16/345—Summarisation for human users
          - G06F16/35—Clustering; Classification
          - G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
            - G06F16/367—Ontology
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T11/00—2D [Two Dimensional] image generation
        - G06T11/60—Editing figures and text; Combining figures or text
Abstract
The invention discloses an abstractive summary generation method based on image-text fusion, comprising the following steps: 1) divide a given text data set into a training set, a validation set and a test set, where each sample is a triple (X, I, Y) in which X is a text, I is the image corresponding to the text X, and Y is the summary of the text X; 2) extract entity features from the images of the text data set and represent them as image feature vectors with the same dimension as the text; 3) train the abstractive summary model with the training set and its corresponding image feature vectors; 4) input a text and its corresponding image, generate the image feature vector of the image, and feed the text and the image feature vector into the trained abstractive summary model to obtain the summary of the text. The summaries generated by the invention effectively adjust the weights of entities in the text and alleviate the out-of-vocabulary problem to a certain extent.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence and relates to an abstractive summary generation method based on image-text fusion.
Background
Existing abstractive summarization methods are mainly built on the deep-learning seq2seq framework with an attention mechanism. The seq2seq framework consists of an encoder and a decoder, both implemented by neural networks, which may be recurrent neural networks (RNN) or convolutional neural networks (CNN). The process is as follows: the encoder encodes the input text into a context vector, which is a representation of the original text; the decoder is then responsible for extracting the important information from this vector and generating the text summary. The attention mechanism addresses the information-loss bottleneck caused by compressing a long sequence into a fixed-length vector, i.e., it lets the decoder focus on the relevant parts of the source at each decoding step.
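To make the encoder-decoder flow concrete, the following is a minimal sketch in PyTorch (an illustrative assumption — the patent does not prescribe a framework; a GRU stands in for the RNN, and all dimensions are arbitrary):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: the encoder compresses the source sequence
    into a context vector; the decoder unrolls the summary from it."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src_ids, tgt_ids):
        _, context = self.encoder(self.embed(src_ids))  # fixed-length representation
        dec_out, _ = self.decoder(self.embed(tgt_ids), context)
        return self.out(dec_out)                        # per-step vocabulary logits
```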
Although the deep-learning seq2seq framework with attention has reached a respectable level of performance in summary generation, it tends to generate high-frequency words, which leads to the problem of key-entity deviation. In general, this deviation takes two forms: first, because of hardware constraints a limited vocabulary is generally adopted, so rare key-entity words in the article that do not appear in the vocabulary are lost from the generated summary; second, entities of relatively low frequency are simply ignored.
To solve the key-entity deviation problem, the invention proposes an abstractive summarization method based on image-text fusion.
Disclosure of Invention
The method can remedy the loss of key entities in existing abstractive summaries and thereby improve the quality and readability of the generated summary.
This technical problem is solved by the following technical scheme.
An abstractive summary generation method based on image-text fusion comprises the following steps:
Step 1: preprocess the given text data set, in which each sample is a triple (X, I, Y), and divide it into a training set, a validation set and a test set.
Step 2: extract the main entity features from the images corresponding to the text data set of step 1 and represent them as image features with the same dimension as the text. These features comprise one global representation of the whole image and three representations of key entities. Taking a text A with 30 words and 128-dimensional word vectors as an example: the text is represented by 30 128-dimensional vectors; the image contributes the global feature plus the three largest-region entities, i.e., 4 128-dimensional vectors; together the sample is represented by 34 128-dimensional vectors, as the sketch below checks.
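The dimension bookkeeping of this step can be verified in a few lines (NumPy; the 30-word, 128-dimension figures are the example from the text):

```python
import numpy as np

text = np.random.randn(30, 128)   # 30 words, each a 128-dimensional word vector
image = np.random.randn(4, 128)   # 1 global + 3 largest-region entity features,
                                  # already projected to the text dimension
fused = np.concatenate([text, image], axis=0)
assert fused.shape == (34, 128)   # 34 vectors of 128 dimensions in total
```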
Step 3: train the model with the training set processed in step 1 and the corresponding image features from step 2.
Step 4: after the summary generation model is trained, test its performance on the test set; the Rouge evaluation metric can be used.
Step 5: in practical application, input a text and its corresponding image on the interactive interface, generate the image features of the image, and feed the text and the features into the trained abstractive summary model to obtain the corresponding summary.
In step 1, the text data is preprocessed as follows (a code sketch follows step 1.7):
Step 1.1: put the texts, summaries and images of the given original data set into one-to-one correspondence to obtain a triple (X, I, Y) for each sample.
Step 1.2: remove special characters, emoticons, full-width characters and the like from the texts and summaries.
Step 1.3: in the data set obtained in step 1.2, replace all hyperlink URLs with TAGURL, all dates with TAGDATA, all numbers with TAGNUM and all punctuation marks with TAGPUN.
Step 1.4: filter stop words from the data cleaned in step 1.3 using a stop-word list.
Step 1.5: shuffle the texts, summaries and images simultaneously, keeping their one-to-one correspondence, and divide them proportionally into a training set, a validation set and a test set.
Step 1.6: construct a vocabulary of fixed size from the data set; represent words in the texts and summaries that do not appear in it as 'UNK'; add the mark 'BOS' at the beginning of each document and 'EOS' at the end; truncate texts and summaries to fixed lengths, cutting off surplus words, and pad shorter sequences with the placeholder 'PAD'.
Step 1.7: using Gensim's word-embedding toolkit, represent each word in the text-summary data set, including the special marks of step 1.6, by a word vector of fixed dimension k.
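The sketch referenced above — a minimal illustration of steps 1.2–1.6 in Python; the exact regular expressions, the date format, and the helper names are assumptions, not prescribed by the patent:

```python
import re

def clean(text):
    """Steps 1.2-1.3: normalize volatile tokens to tags (URLs first, dates
    before plain numbers, so the more specific pattern wins)."""
    text = re.sub(r"https?://\S+", " TAGURL ", text)
    text = re.sub(r"\d{4}-\d{2}-\d{2}", " TAGDATA ", text)  # assumed date format
    text = re.sub(r"\d+", " TAGNUM ", text)
    text = re.sub(r"[^\w\s]", " TAGPUN ", text)
    return text.split()

def to_fixed_length(tokens, vocab, max_len):
    """Step 1.6: map out-of-vocabulary words to UNK, wrap with BOS/EOS,
    truncate to max_len words and pad with PAD."""
    ids = ["BOS"] + [w if w in vocab else "UNK" for w in tokens[:max_len]] + ["EOS"]
    return ids + ["PAD"] * (max_len + 2 - len(ids))
```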
In step 2, the abstractive summary model based on image-text fusion, shown in FIG. 1, comprises three modules: a feature extraction module, a feature fusion module and a summary generation module. Step 2 details the feature extraction, as follows (a sketch of the fc7 extraction follows step 2.3):
Step 2.1: apply a Region-based Convolutional Neural Network (RCNN) tool to each image from step 1.5 to capture its key entity features. The RCNN algorithm comprises four stages: candidate-region generation, feature extraction, category labeling and bounding-box refinement. In detail:
Step 2.1.1: first, apply an over-segmentation technique to divide each image into as many independent regions as possible, typically more than 1000. Then merge regions of the same image according to set rules, including merging by similar color and merging by similar texture. Finally, take all regions that appear during merging as preliminary candidate regions.
Step 2.1.2: extract features from each preliminary candidate region of step 2.1.1 with a CNN.
Step 2.1.3: feed the feature representation of each preliminary candidate region into a Support Vector Machine (SVM) classifier to judge whether it matches the corresponding entity label; if so, mark it 1 and proceed to step 2.1.4; if not, mark it 0 and discard the candidate region.
Step 2.1.4: refine the bounding-box position of each retained candidate region according to the category label with a regression model; specifically, a Linear Ridge Regressor (LRR) is used per object class.
Step 2.2: sort the regional entity features of each image obtained in step 2.1 by region size and select the three features with the largest regions as candidate regions.
Step 2.3: uniformly use the VGG-16 network shown in FIG. 2; represent each candidate-region feature obtained in step 2.2 by the fc7 layer as a 4096-dimensional image feature, and likewise represent the image's global view as a 4096-dimensional feature.
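The fc7 extraction of step 2.3 might look as follows (a sketch assuming torchvision's VGG-16; the region crops from steps 2.1–2.2 are taken as given):

```python
import torch
from torchvision import models, transforms

vgg = models.vgg16(weights="IMAGENET1K_V1").eval()
# fc7 = the second 4096-unit fully connected layer: keep everything in the
# classifier except the final 1000-way layer
fc7 = torch.nn.Sequential(vgg.features, vgg.avgpool, torch.nn.Flatten(),
                          *list(vgg.classifier.children())[:-1])

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def region_features(crops):
    """crops: PIL images - the whole image plus the 3 largest candidate regions."""
    batch = torch.stack([preprocess(c) for c in crops])
    with torch.no_grad():
        return fc7(batch)  # shape (len(crops), 4096)
```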
In step 3, feature fusion and summary generation proceed as follows (a combined sketch follows step 3.5):
step 3.1, converting each 4096-dimensional image feature obtained by 2.3 into a feature with the same dimension as the text by using a bilinear network, wherein the feature can be represented as It=WiIvIn which IvRepresenting the image characteristics, W, obtained in step 2.3iIs a parameter of the bilinear network, ItRepresenting image feature vectors of the same dimension as the text.
Step 3.2: for each sample, concatenate the text vector obtained in step 1.7 with the image feature vector obtained in step 3.1 into A, and pair A with the original summary Y to obtain the tuple (A, Y), yielding vectorized representations of the training, validation and test sets.
Step 3.3: sample k samples from the new training set obtained in step 3.2 and feed them into the encoder in turn to obtain the joint encoding h_s of text and image; the current decoder state h_t is computed through the intermediate semantic vector c_t, thereby realizing feature fusion. The detailed settings are as follows:
The summary generation module generates the summary from the fused features. A training sample is denoted (A, Y), where A = {a_1, a_2, …, a_n} are the n text and image features and Y = {y_1, y_2, …, y_m} is the summary.
In the encoding stage, the input feature vector at the current time step i is denoted a_i (a vector from the concatenation of text and image), and the hidden output of the previous step is h_{s-1}; the hidden output at step i is then h_s = f(h_{s-1}, a_i).
In the decoding stage, h_t denotes the hidden state of the decoder at the current time step i.
The degree of association between the current decoder state h_t and each encoder state h_s is computed through the transfer matrix W_a as score(h_t, h_s) = h_t W_a h_s. Normalizing gives the attention weights a_t(s) = exp(score(h_t, h_s)) / Σ_{s'} exp(score(h_t, h_{s'})), whence the intermediate semantic vector c_t = Σ_s a_t(s) · h_s. The corresponding fused decoder hidden state is obtained through the parameter matrix W_c and an activation function: h̃_t = tanh(W_c [c_t; h_t]).
Step 3.4: pass the fused decoder state h̃_t of step 3.3 through a softmax layer to obtain the generated summary: p(y_t | y_{<t}, A) = softmax(W_s · h̃_t), where y_t is the t-th word of the generated summary Y, A is the concatenation of the sample's text vector and image feature vector, and W_s is a parameter matrix.
Step 3.5: take L(θ) = − Σ_{n=1}^{N} log p(y_n | a_n; θ) as the optimization objective and repeat steps 3.3 and 3.4 to train the model until it converges; N is the total number of samples in the training set, θ are the model parameters, y_n is the n-th word of the summary, and a_n is the corresponding feature.
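The combined sketch referenced above, putting steps 3.1–3.5 together in PyTorch. The patent fixes the bilinear projection, the Luong-style attention and the log-likelihood objective; the GRU encoder/decoder, layer names and sizes here are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionSummarizer(nn.Module):
    def __init__(self, vocab_size, emb=256, hid=256, img_dim=4096):
        super().__init__()
        self.proj = nn.Linear(img_dim, emb, bias=False)  # step 3.1: I_t = W_i * I_v
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.decoder = nn.GRU(emb, hid, batch_first=True)
        self.W_a = nn.Linear(hid, hid, bias=False)       # score(h_t, h_s) = h_t W_a h_s
        self.W_c = nn.Linear(2 * hid, hid, bias=False)
        self.W_s = nn.Linear(hid, vocab_size)

    def forward(self, text_ids, img_feats, tgt_ids):
        # step 3.2: prepend the projected image features to the text embeddings
        a = torch.cat([self.proj(img_feats), self.embed(text_ids)], dim=1)
        h_s, h_last = self.encoder(a)                    # joint text+image encoding h_s
        h_t, _ = self.decoder(self.embed(tgt_ids), h_last)
        # step 3.3: Luong attention -> weights a_t(s), context c_t, fused state
        score = torch.bmm(self.W_a(h_t), h_s.transpose(1, 2))
        a_ts = F.softmax(score, dim=-1)
        c_t = torch.bmm(a_ts, h_s)                       # c_t = sum_s a_t(s) * h_s
        h_tilde = torch.tanh(self.W_c(torch.cat([c_t, h_t], dim=-1)))
        return self.W_s(h_tilde)                         # step 3.4: logits before softmax

# step 3.5: negative log-likelihood of the reference summary words
def loss_fn(logits, gold_ids):
    return F.cross_entropy(logits.transpose(1, 2), gold_ids)
```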
In step 4, the model is evaluated as follows:
Step 4.1: feed the test-set features obtained in step 3.2 into the model trained in step 3.5 to obtain the corresponding summaries;
Step 4.2: put the reference summaries of the test set into one-to-one correspondence with the summaries generated in step 4.1 and compute the Rouge scores.
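Step 4.2 can be reproduced, for example, with the rouge-score package (an assumption — the patent names only the Rouge metric, not an implementation):

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

def evaluate(references, hypotheses):
    """Average F-measure over one-to-one reference/generated summary pairs."""
    totals = {"rouge1": 0.0, "rouge2": 0.0, "rougeL": 0.0}
    for ref, hyp in zip(references, hypotheses):
        scores = scorer.score(ref, hyp)
        for key in totals:
            totals[key] += scores[key].fmeasure
    return {key: total / len(references) for key, total in totals.items()}
```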
In step 5, the model is applied in the same way as in step 4.1.
Compared with the prior art, the invention has the following positive effects:
Compared with a text-only generation system, the summaries generated by the invention effectively adjust the weights of entities in the text and alleviate the out-of-vocabulary problem to a certain extent.
Drawings
FIG. 1 is a diagram of the abstractive summary model based on image-text fusion;
FIG. 2 is a VGG-16 network model diagram.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings.
This embodiment uses the multi-modal sentence summarization data set MMSS, whose samples are (X, Y, I) triples of text, summary and image. The texts and summaries come from the Gigaword data set widely used to evaluate summarization systems, and the images were retrieved with a search engine and then manually screened. The final (X, Y, I) data set contains 66000 training samples, with 2000 samples each in the validation and test sets.
Step 1.1: put the texts, summaries and images of the given original data set into one-to-one correspondence, i.e., (X, Y, I).
Step 1.2: remove special characters, emoticons and full-width characters from the texts and summaries.
Step 1.3: in the data set obtained in step 1.2, replace all hyperlink URLs with TAGURL, all dates with TAGDATA, all numbers with TAGNUM and all punctuation marks with TAGPUN.
Step 1.4: since MMSS is a sentence-level summarization data set and its texts are short, stop words are not filtered on this data set.
Step 1.5: shuffle the preprocessed (X, Y, I) triples simultaneously, keeping their one-to-one correspondence, and divide them proportionally into a training set, a validation set and a test set.
Step 1.6: construct a 5,000-word vocabulary from the data set; represent words in the texts and summaries that do not appear in it as 'UNK'; add 'BOS' at the beginning of each document and 'EOS' at the end; limit texts to at most 120 words and summaries to 30 words, cutting off surplus words and padding shorter sequences with the placeholder 'PAD'.
Step 1.7: using Gensim's word-embedding toolkit, represent each word in the text-summary data set, including the special marks of step 1.6, by a fixed 256-dimensional word vector; a sketch follows.
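The 256-dimensional vectors of step 1.7 can be trained with Gensim's Word2Vec; a sketch assuming Gensim 4.x and a toy corpus:

```python
from gensim.models import Word2Vec

# toy corpus: tokenized texts and summaries, special marks included (step 1.6)
corpus_tokens = [
    ["BOS", "japan", "bank", "losses", "TAGNUM", "billion", "yen", "EOS"],
    ["BOS", "bank", "of", "japan", "said", "TAGDATA", "EOS", "PAD"],
]
model = Word2Vec(sentences=corpus_tokens, vector_size=256, window=5,
                 min_count=1, workers=4)
vec = model.wv["bank"]  # a 256-dimensional vector for any vocabulary entry
```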
Step 2: extract the main entity features from the image I corresponding to each text in step 1 and represent them as image features with the same dimension as the text.
Step 2.1: apply the Region-based Convolutional Neural Network (RCNN) tool to each image from step 1.5 to capture its key entity features.
Step 2.2: sort the regional entity features of each image obtained in step 2.1 by region size and select the three largest regions as candidate regions.
Step 2.3: uniformly use the VGG-16 network and represent each region feature obtained in step 2.2 by the fc7 layer as a 4096-dimensional feature.
Step 3: train the abstractive summary model based on image-text fusion with the training data from steps 1 and 2.
Step 3.1: use a bilinear network to convert each region's 4096-dimensional feature obtained in step 2.3 into a 256-dimensional feature with the same dimension as the text.
Step 3.2: concatenate the image features obtained in step 3.1 with the text obtained in step 1.7, placing the image features in front of the text, directly after the BOS mark, and obtain vectorized representations of the training, validation and test sets.
Step 3.3: sample 64 samples from the new training set obtained in step 3.2 and feed them into the model in turn for training.
Step 3.4: repeat step 3.3 until the model converges on the training set and performs best on the validation set; a training-loop sketch follows.
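A sketch of the training loop of steps 3.3–3.4, with batch size 64 and checkpointing on validation loss (the data loaders, the model and loss from the earlier sketch, and all names are assumptions):

```python
import torch

def validate(model, loader, loss_fn):
    """Mean loss on the validation set (used for model selection in step 3.4)."""
    model.eval()
    total, batches = 0.0, 0
    with torch.no_grad():
        for text_ids, img_feats, tgt_ids in loader:
            logits = model(text_ids, img_feats, tgt_ids[:, :-1])
            total += loss_fn(logits, tgt_ids[:, 1:]).item()
            batches += 1
    return total / max(batches, 1)

def train(model, train_loader, val_loader, loss_fn, epochs=20, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best_val = float("inf")
    for _ in range(epochs):
        model.train()
        for text_ids, img_feats, tgt_ids in train_loader:  # batches of 64 samples
            opt.zero_grad()
            # teacher forcing: predict word i+1 from words <= i
            logits = model(text_ids, img_feats, tgt_ids[:, :-1])
            loss = loss_fn(logits, tgt_ids[:, 1:])
            loss.backward()
            opt.step()
        val = validate(model, val_loader, loss_fn)
        if val < best_val:                 # keep the checkpoint that is best
            best_val = val                 # on the validation set
            torch.save(model.state_dict(), "best.pt")
```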
Step 4: after the summary generation model is trained, test its performance on the test set with the Rouge evaluation metric.
Step 4.1: feed the test-set features obtained in step 3.2 into the model trained in step 3 to obtain the corresponding summaries;
Step 4.2: put the reference summaries of the test set into one-to-one correspondence with the summaries generated in step 4.1 and compute the Rouge scores.
To compare the image-text-fusion abstractive summarization method of the invention (abbreviated MSE) against existing text-only models, the following baselines were evaluated: Lead, which directly selects the first 8 words; Compress, which uses syntactic-structure compression; the original Seq2Seq model (Abs); Seq2Seq with an attention mechanism (Abs+A); and Multi-Source, a Seq2Seq framework that learns from multi-source data with a hierarchical attention mechanism. The F-measure of each model's Rouge scores for the summaries generated on the test set is recorded in the following table:
system for controlling a power supply | Rouge-1 | Rouge-2 | Rouge-L |
Lead | 33.46 | 13.40 | 31.84 |
Compress | 31.56 | 11.02 | 28.87 |
Abs | 35.95 | 18.21 | 31.89 |
Abs+A | 41.11 | 21.75 | 39.92 |
Multi-Source | 39.67 | 19.11 | 38.03 |
MSE | 43.94 | 23.15 | 41.56 |
The results show that with image information introduced, the image-text-fusion abstractive method improves all three Rouge scores, Rouge-2 in particular, which demonstrates the effectiveness of the image-text fusion.
In practical application, a text is input on the interactive interface; the image input may be omitted at the application stage, in which case it is filled with 'PAD', and the corresponding summary is obtained:
For example, the input text: "Japan's collapsed kizu credit union, the largest such institution in the country, had incurred losses of ## billion yen -lrb- # billion dollars -rrb-, the Bank of Japan said Wednesday".
The summary obtained: "Japan's bank losses ### billion yen".
As this example shows, the summary generated by the invention correctly produces the key entity "bank".
Although specific details, algorithms and figures of the invention are disclosed for illustrative purposes, they are intended to aid understanding of the invention and its implementation. As those skilled in the art will appreciate, various substitutions, changes and modifications are possible without departing from the spirit and scope of the invention and the appended claims. The invention should therefore not be limited to the preferred embodiments and drawings disclosed herein, but is defined only by the scope of the appended claims.
Claims (9)
1. An abstractive summary generation method based on image-text fusion, comprising the following steps:
1) dividing a given text data set into a training set, a validation set and a test set, each sample in the text data set being a triple (X, I, Y), where X is a text, I is the image corresponding to the text X, and Y is the summary of the text X, the abstractive summary model comprising a feature extraction module, a feature fusion module and a summary generation module;
2) capturing, by the feature extraction module, the entity features of each image with a region-based convolutional neural network and selecting the three entity features with the largest regions as candidate regions; then generating image features for the image as a whole and for the three candidate regions; then converting the image features into image feature vectors with the same dimension as the text;
3) training the abstractive summary model with the training set and its corresponding image feature vectors; during training, for each sample, the feature fusion module concatenates the text vector of the sample with the image feature vector of the sample, yielding vectorized representations of the training, validation and test sets; k samples are then selected from the vectorized training set and fed into the encoder in turn to obtain the joint text-image encoding h_s, and the hidden state h_t of the decoder is computed through the intermediate semantic vector c_t, thereby realizing feature fusion; the summary generation module then generates the summary from the fused features;
4) inputting a text and its corresponding image, generating the image feature vector of the image, and feeding the text and the image feature vector into the trained abstractive summary model to obtain the summary corresponding to the text.
2. The method of claim 1, wherein the image feature vectors comprise an image global feature vector and the entity vectors of the three largest regions in the image.
3. The method of claim 1, wherein the feature fusion method is: the hidden output at the current time step of the encoding stage is the joint encoding h_s, and the hidden state of the decoder at the current step is h_t; the degree of association score(h_t, h_s) = h_t W_a h_s is computed through the transfer matrix W_a and normalized to obtain a_t(s); the intermediate semantic vector c_t = Σ_s a_t(s) · h_s is then computed, together with the fused hidden state of the decoder h̃_t = tanh(W_c [c_t; h_t]).
5. The method of claim 1, wherein the image features of each candidate region are converted with a bilinear network into an image feature vector with the same dimension as the text: I_t = W_i · I_v, where I_v is the image feature, W_i is the parameter of the bilinear network, and I_t is the image feature vector with the same dimension as the text.
6. The method of claim 1, wherein the method of capturing the entity features of each image with the region-based convolutional neural network is:
21) dividing each image into a plurality of regions by an over-segmentation technique, then merging regions of the same image according to set merging rules, and taking all regions that appear after merging as preliminary candidate regions;
22) extracting features from each preliminary candidate region with a CNN;
23) feeding the features obtained from each preliminary candidate region into a support vector machine classifier and judging whether they match the corresponding entity label;
24) refining the bounding-box position of the preliminary candidate regions according to the category labels with a regression model;
25) sorting the preliminary candidate regions of the image by region size and selecting the entities corresponding to the three largest regions as the entity features of the image.
7. The method of claim 6, wherein the merging rules are merging by similar color or merging by similar texture.
8. The method of claim 1, wherein the trained abstractive summary model is tested with the test set, verified with the validation set after the test passes, and step 4) is performed after verification passes.
9. The method of claim 1, wherein L(θ) = − Σ_{n=1}^{N} log p(y_n | a_n; θ) is used as the optimization objective and the abstractive summary model is trained until it converges, where N is the total number of samples in the training set, θ are the model parameters, y_n is the n-th word of the summary, and a_n is the feature corresponding to the n-th word.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201910764261.3A (CN110704606B) | 2019-08-19 | 2019-08-19 | Abstractive summary generation method based on image-text fusion
Publications (2)

Publication Number | Publication Date
---|---
CN110704606A | 2020-01-17
CN110704606B | 2022-05-31
Family ID: 69193427
Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN201910764261.3A (patent CN110704606B) | Active | 2019-08-19 | 2019-08-19

Country Status (1)

Country | Link
---|---
CN | CN110704606B (en)
Families Citing this family (6)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN111414505B * | 2020-03-11 | 2023-10-20 | 上海爱数信息技术股份有限公司 | Quick image abstract generation method based on sequence generation model
CN111563207B * | 2020-07-14 | 2020-11-10 | 口碑(上海)信息技术有限公司 | Search result sorting method and device, storage medium and computer equipment
CN112541346A * | 2020-12-24 | 2021-03-23 | 北京百度网讯科技有限公司 | Abstract generation method and device, electronic equipment and readable storage medium
CN113076433B * | 2021-04-26 | 2022-05-17 | 支付宝(杭州)信息技术有限公司 | Retrieval method and device for retrieval object with multi-modal information
CN113609285A * | 2021-08-09 | 2021-11-05 | 福州大学 | Multi-mode text summarization system based on door control fusion mechanism
CN115309888B * | 2022-08-26 | 2023-05-30 | 百度在线网络技术(北京)有限公司 | Method and device for generating chart abstract and training method and device for generating model
Citations (5)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
WO2018149376A1 * | 2017-02-17 | 2018-08-23 | 杭州海康威视数字技术股份有限公司 | Video abstract generation method and device
CN106997387B * | 2017-03-28 | 2019-08-09 | 中国科学院自动化研究所 | Multi-modal automatic summarization based on text-image matching
CN109766432A * | 2018-07-12 | 2019-05-17 | 中国科学院信息工程研究所 | Chinese summary generation method and device based on generative adversarial networks
CN109508400A * | 2018-10-09 | 2019-03-22 | 中国科学院自动化研究所 | Image-text summary generation method
CN109543512A * | 2018-10-09 | 2019-03-29 | 中国科学院自动化研究所 | Evaluation method for image-text summaries
Non-Patent Citations (4)

- Xu H, Cao Y, Jia R, et al. Adversarial Reinforcement Learning for Chinese Text Summarization. International Conference on Computational Science, 2018. *
- Zhou L, Xu C, Koch P, et al. Image caption generation with text-conditional semantic attention. arXiv preprint arXiv:1606.04621, 2016. *
- Ross Girshick. Rich feature hierarchies for accurate object detection and semantic segmentation. 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014. *
- Xu H, Cao Y, Jia R, et al. Sequence Generative Adversarial Network for Long Text Summarization. 2018 IEEE 30th International Conference on Tools with Artificial Intelligence, 2018. *
Also Published As

Publication number | Publication date
---|---
CN110704606A (en) | 2020-01-17
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant