CN110704606B - Generation type abstract generation method based on image-text fusion - Google Patents

Generation type abstract generation method based on image-text fusion

Info

Publication number
CN110704606B
CN110704606B (application CN201910764261.3A)
Authority
CN
China
Prior art keywords
image
text
abstract
features
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910764261.3A
Other languages
Chinese (zh)
Other versions
CN110704606A (en
Inventor
曹亚男
徐灏
尚燕敏
刘燕兵
谭建龙
郭莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN201910764261.3A priority Critical patent/CN110704606B/en
Publication of CN110704606A publication Critical patent/CN110704606A/en
Application granted granted Critical
Publication of CN110704606B publication Critical patent/CN110704606B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text

Abstract

The invention discloses a generative abstract generation method based on image-text fusion, which comprises the following steps: 1) dividing a given text data set into a training set, a verification set and a test set; each sample in the text data set is a triple (X, I, Y), wherein X is a text, I is the image corresponding to the text X, and Y is the abstract of the text X; 2) extracting entity features from the images of the text data set, and expressing the extracted entity features as image feature vectors with the same dimensionality as the text; 3) training the generative abstract model by using the training set and the image feature vectors corresponding to the training set; 4) inputting a text and a corresponding image, generating the image feature vector of the image, and then inputting the text and the corresponding image feature vector into the trained generative abstract model to obtain the abstract corresponding to the text. The abstract generated by the invention can effectively adjust the weight of entities in the text and alleviate the out-of-vocabulary (unregistered word) problem to a certain extent.

Description

Generation type abstract generation method based on image-text fusion
Technical Field
The invention belongs to the technical field of artificial intelligence, and relates to a generative abstract generation method based on image-text fusion.
Background
The existing generative abstract methods are mainly implemented with the deep-learning Seq2Seq framework and an attention mechanism. The Seq2Seq framework is composed of an encoder and a decoder, both implemented by a neural network, which may be a recurrent neural network (RNN) or a convolutional neural network (CNN). The specific process is as follows: the encoder encodes the input original text into a vector (the context), which is a representation of the original text; the decoder is then responsible for extracting the important information from this vector and generating the text summary. The attention mechanism addresses the information-loss bottleneck caused by compressing a long sequence into a fixed-length vector, i.e. it focuses the decoder's attention on the corresponding part of the context at each step.
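For concreteness, the following is a minimal sketch of such an encoder-decoder (Seq2Seq) pair. PyTorch and GRU units are assumptions chosen for illustration, not details of the invention; module names and dimensions are likewise illustrative.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Encodes the input token sequence into hidden states (the "context").
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):                      # src: (batch, src_len)
        outputs, h = self.rnn(self.embed(src))   # outputs: (batch, src_len, hid_dim)
        return outputs, h

class Decoder(nn.Module):
    # Generates the summary one token at a time from the encoder context.
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, prev_token, h):            # prev_token: (batch, 1)
        output, h = self.rnn(self.embed(prev_token), h)
        return self.out(output.squeeze(1)), h    # logits over the vocabulary
```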
Although the deep-learning Seq2Seq framework with an attention mechanism has reached a certain level of performance in summary generation, it tends to generate high-frequency words, which leads to the problem of key-entity deviation. In general, key-entity deviation takes two forms: first, due to hardware-resource limitations a limited vocabulary is generally adopted, and some rare key entity words of the article do not appear in the vocabulary, so these key entities are lost in the generated abstract; second, relatively low-frequency entities are ignored.
In order to solve the problem of key entity deviation, the invention provides a generating type abstract method based on image-text fusion.
Disclosure of Invention
The method and the device can solve the problem that key entities of the existing generated abstract are lost, so that the quality and readability of the generated abstract are improved.
The technical problem is solved by the following technical scheme:
a method for generating a generating abstract based on image-text fusion comprises the following steps:
step 1, carrying out data preprocessing operations such as stop word removal, special word marking and the like on a given text data set, and dividing the data into a training set, a verification set and a test set after shuffling. Each sample in the text dataset is a triplet (X, I, Y); where X is the text, I is the corresponding image (i.e., the image that matches X), and Y is the summary of the text X.
And 2, extracting the main entity features from the images corresponding to the text data set of step 1, and expressing them as image features with the same dimension as the text. The extracted features comprise one global image representation and the image representations of the three largest key-entity regions. Taking a text A as an example: if A contains 30 words and each word vector is 128-dimensional, the text is represented by 30 128-dimensional vectors; the image contributes the global representation plus the three largest regional entities, i.e. 4 more 128-dimensional vectors, so the fused input consists of 34 128-dimensional vectors in total.
And 3, training the model by using the training set processed in step 1 and the corresponding image features obtained in step 2.
And 4, testing the performance of the model by using the test set after the abstract generation model is trained, wherein the Rouge evaluation index can be used.
And 5, in practical application, inputting a text and a corresponding image on an interactive interface, generating image characteristics of the image, and then inputting the input text and the corresponding image characteristics into the trained generative abstract model to obtain a corresponding abstract.
In step 1, the text data is preprocessed as follows:
step 1.1, the given original data set is subjected to one-to-one correspondence of texts, abstracts and images to obtain a triple (X, I, Y) of each sample.
And step 1.2, removing special characters, emoticons, full-width characters and the like from the text and the abstract.
And step 1.3, replacing all hyperlink URLs by using TAGURL, replacing all dates by using TAGDATA, replacing all numbers by using TAGNUM and replacing all punctuation marks by using TAGPUN in the data set obtained in the step 1.2.
And step 1.4, filtering stop words from the data cleaned in step 1.3 by using a stop-word list.
And step 1.5, the texts, the abstracts and the images are shuffled simultaneously in a one-to-one correspondence manner, and are proportionally divided into a training set, a verification set and a test set.
And step 1.6, constructing a word list of a certain size from the data set, representing words in the text and the abstract that do not appear in the dictionary as 'UNK', adding the mark 'BOS' at the beginning of each document and 'EOS' at the end, processing the text and the abstract to fixed lengths respectively, truncating redundant words directly, and padding sequences shorter than the fixed length with the placeholder 'PAD'.
Step 1.7, using the word-embedding toolkit of Gensim, each word in the text summary data set, including the special tokens of step 1.6, is represented by a word vector of fixed dimension k.
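A minimal sketch of the preprocessing of steps 1.3 to 1.6 follows (Python). The tag strings and special tokens follow the text above; the regular expressions, the assumed date format, and the helper names are illustrative assumptions.

```python
import re

TAG_PATTERNS = [
    (re.compile(r"https?://\S+"), "TAGURL"),        # step 1.3: hyperlinks
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "TAGDATA"),  # dates (format assumed)
    (re.compile(r"\d+"), "TAGNUM"),                 # numbers
    (re.compile(r"[^\w\s]"), "TAGPUN"),             # punctuation
]

def preprocess(text, stop_words=frozenset()):
    for pattern, tag in TAG_PATTERNS:
        text = pattern.sub(tag, text)
    # step 1.4: stop-word filtering (skipped when stop_words is empty)
    return [w for w in text.lower().split() if w not in stop_words]

def to_fixed_length(tokens, max_len, vocab):
    # Step 1.6: add BOS/EOS, truncate, pad, and map OOV words to UNK.
    # vocab must contain the special tokens "UNK", "BOS", "EOS" and "PAD".
    tokens = ["BOS"] + tokens[:max_len - 2] + ["EOS"]
    tokens += ["PAD"] * (max_len - len(tokens))
    return [vocab.get(w, vocab["UNK"]) for w in tokens]
```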
In step 2, a generating abstract model based on image-text fusion is shown in fig. 1, and includes three modules: the method comprises a feature extraction module, a feature fusion module and an abstract generation module respectively, wherein step 2 is a detailed feature extraction method, and details are as follows:
and 2.1, capturing key entity characteristics of the corresponding images by using the images in the step 1.5 one by one through a Regional Convolutional Neural Network (RCNN) tool. The regional convolutional neural network algorithm comprises four steps of candidate region generation, feature extraction, category marking and position trimming, and the detailed process is as follows:
step 2.1.1, first, an over-segmentation technique is applied to segment each image into as many independent regions as possible, typically more than 1000. Then, the areas of the same image are merged according to a certain rule, and the merging rule comprises similar color merging, similar texture merging and the like. And finally, taking all the regions which appear after combination in the process as preliminary candidate regions.
And 2.1.2, performing feature extraction on each preliminary candidate region appearing in the step 2.1.1 by using a CNN network.
And 2.1.3, inputting the feature representation obtained by each preliminary candidate region into a Support Vector Machine (SVM) classifier, judging whether the feature representation is a corresponding entity label, if so, marking the entity label as 1, performing the step 2.1.4, if not, marking the entity label as 0, and deleting the candidate region.
And 2.1.4, correcting the frame position of the preliminary candidate region according to the result of the category mark by using a Regression (Regression) model. Specifically, for each class of objects, a Linear Ridge Regressor (LRR) is used for refinement.
And 2.2, sequencing the regional entity characteristics of each image obtained in the step 2.1 according to the size of the region, and selecting the first three regional entity characteristics with the largest region as candidate regions.
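The patent follows an RCNN-style pipeline (over-segmentation, region merging, CNN features, SVM labelling, box regression). As a rough, simplified stand-in for the candidate-region generation of step 2.1.1 and the area-based top-3 selection of step 2.2, the sketch below uses graph-based over-segmentation from scikit-image; the SVM labelling and bounding-box regression of steps 2.1.3 and 2.1.4 are omitted, so this is only an assumption-laden illustration, not the inventors' implementation.

```python
import numpy as np
from skimage.segmentation import felzenszwalb
from skimage.measure import regionprops

def top_region_boxes(image, k=3):
    # Over-segment the image into many small regions (stand-in for step 2.1.1;
    # the similar-color / similar-texture merging heuristics are not reproduced).
    labels = felzenszwalb(image, scale=100, sigma=0.5, min_size=50) + 1
    # Rank regions by area and keep the k largest ones (step 2.2).
    regions = sorted(regionprops(labels), key=lambda r: r.area, reverse=True)[:k]
    return [r.bbox for r in regions]   # (min_row, min_col, max_row, max_col) boxes
```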
Step 2.3, the VGG-16 network is used uniformly: as shown in FIG. 2, each candidate-region feature obtained in step 2.2 is represented by the output of the fc7 layer as a 4096-dimensional image feature, and the global image feature is likewise represented as a 4096-dimensional image feature.
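A hedged sketch of this fc7 feature extraction using torchvision's pretrained VGG-16 follows. The exact cropping and preprocessing pipeline of the invention is not specified, so standard ImageNet preprocessing is assumed; function names are illustrative.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

vgg = models.vgg16(pretrained=True).eval()
# fc7 is the second 4096-dimensional fully connected layer of VGG-16:
# conv features -> avgpool -> flatten -> classifier[:4] (fc6, ReLU, Dropout, fc7)
fc7 = torch.nn.Sequential(vgg.features, vgg.avgpool, torch.nn.Flatten(),
                          vgg.classifier[:4])

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def image_features(full_image, region_crops):
    # region_crops: the three largest candidate regions from step 2.2 (PIL images).
    crops = [full_image] + list(region_crops)        # global image + top-3 regions
    batch = torch.stack([preprocess(img) for img in crops])
    with torch.no_grad():
        return fc7(batch)                            # shape (1 + 3, 4096)
```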
In the step 3, the detailed steps of feature fusion and abstract generation are as follows:
step 3.1, converting each 4096-dimensional image feature obtained by 2.3 into a feature with the same dimension as the text by using a bilinear network, wherein the feature can be represented as It=WiIvIn which IvRepresenting the image characteristics, W, obtained in step 2.3iIs a parameter of the bilinear network, ItRepresenting image feature vectors of the same dimension as the text.
And 3.2, for the same sample, the text vector obtained in step 1.7 and the image feature vector obtained in step 3.1 are spliced into A; A is combined with the original abstract Y to obtain a pair (A, Y), giving the vectorized training, verification and test sets.
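A minimal sketch of steps 3.1 and 3.2 under the 128-dimensional word-vector example given earlier (PyTorch assumed; class and variable names are illustrative): the 4096-dimensional image features are mapped by a bias-free linear layer W_i to the text dimension and spliced with the text vectors to form A.

```python
import torch
import torch.nn as nn

class ImageTextFusionInput(nn.Module):
    def __init__(self, img_dim=4096, txt_dim=128):
        super().__init__()
        # Step 3.1: I_t = W_i · I_v, implemented as a linear layer without bias.
        self.W_i = nn.Linear(img_dim, txt_dim, bias=False)

    def forward(self, text_vectors, image_features):
        # text_vectors:   (batch, n_words, txt_dim)  -- word vectors from step 1.7
        # image_features: (batch, 4, img_dim)        -- global + 3 regions, step 2.3
        img_vectors = self.W_i(image_features)       # (batch, 4, txt_dim)
        # Step 3.2: splice image and text vectors into one input sequence A.
        return torch.cat([img_vectors, text_vectors], dim=1)
```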
Step 3.3, k samples are drawn from the new training set obtained in step 3.2 and input into the encoder in sequence to obtain the joint encoding h_s of the text and the image; the current decoder state h_t is then computed by means of an intermediate semantic vector c_t, thereby realizing feature fusion. The detailed settings are as follows:
The summary generation module generates the summary using the fused features. An input training sample is denoted (A, Y), where A = {a_1, a_2, …, a_n} represents the n text and image features, Y = {y_1, y_2, …, y_m} is the reference summary, and the generated summary is denoted Ŷ.
In the encoding stage, the input feature vector at the current time step i is denoted a_i (a vector spliced from text and image); the hidden-layer output at the previous time step is denoted h_{s-1}, so the hidden-layer output at the current time step is h_s = f(h_{s-1}, a_i).
In the decoding stage, h_t denotes the hidden state of the decoder at the current time step.
The transfer matrix W_a is used to calculate the degree of association between the current state h_t and h_s, i.e. score(h_t, h_s) = h_t·W_a·h_s. After normalization (a softmax over the source positions), the attention weight a_t(s) = exp(score(h_t, h_s)) / Σ_{s'} exp(score(h_t, h_{s'})) is obtained, from which the intermediate semantic vector c_t = a_t(s)·h_s follows. The corresponding attention-augmented decoder hidden state h̃_t is obtained through the parameter network W_c and an activation function, whose expression is h̃_t = tanh(W_c[c_t; h_t]).
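This is the general (multiplicative) attention score named in step 3.3; the softmax normalization and the tanh combination layer are the standard formulation and are assumed here. A minimal sketch (PyTorch; names and dimensions illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionAttention(nn.Module):
    def __init__(self, hid_dim=256):
        super().__init__()
        self.W_a = nn.Linear(hid_dim, hid_dim, bias=False)      # transfer matrix W_a
        self.W_c = nn.Linear(2 * hid_dim, hid_dim, bias=False)  # parameter network W_c

    def forward(self, h_t, h_s):
        # h_t: (batch, hid)           decoder state at the current step
        # h_s: (batch, src_len, hid)  joint text + image encodings
        score = torch.bmm(self.W_a(h_s), h_t.unsqueeze(2)).squeeze(2)  # (batch, src_len)
        a_t = F.softmax(score, dim=1)                                  # normalized weights
        c_t = torch.bmm(a_t.unsqueeze(1), h_s).squeeze(1)              # intermediate semantic vector
        h_tilde = torch.tanh(self.W_c(torch.cat([c_t, h_t], dim=1)))   # attention-augmented state
        return h_tilde, a_t
```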
Step 3.4, the decoder hidden state h̃_t obtained in step 3.3 is passed through the softmax layer to obtain the generated abstract, expressed as p(y_t | y_1, …, y_{t-1}, A) = softmax(W_s·h̃_t), wherein y_t is the t-th word of the generated abstract Y, A is the spliced feature of the text vector and the image feature vector of the sample, and W_s is a parameter matrix.
Step 3.5, steps 3.3 and 3.4 are repeated to train the model with the optimization objective L(θ) = −Σ_{n=1}^{N} log p(y_n | a_n; θ) until the model converges; N is the total number of samples in the training set, θ is the model parameter, and y_n is the n-th word of the abstract.
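A hedged sketch of the output layer and training objective of steps 3.4 and 3.5 (PyTorch): the fused decoder state is projected by W_s and a softmax, and training minimizes the negative log-likelihood of the reference summary. Teacher forcing, the optimizer argument, the PAD index and the vocabulary size are illustrative assumptions not stated in the text.

```python
import torch
import torch.nn as nn

# Output projection W_s followed by softmax (step 3.4); hid_dim / vocab_size illustrative.
W_s = nn.Linear(256, 50000)
criterion = nn.CrossEntropyLoss(ignore_index=0)   # assume PAD has index 0

def training_step(model, optimizer, A, Y):
    # A: fused input features (batch, n, dim); Y: reference summary token ids (batch, m)
    optimizer.zero_grad()
    h_tilde = model(A, Y[:, :-1])                 # decoder states under teacher forcing
    logits = W_s(h_tilde)                         # (batch, m - 1, vocab)
    # Negative log-likelihood over the summary words (optimization objective of step 3.5).
    loss = criterion(logits.reshape(-1, logits.size(-1)), Y[:, 1:].reshape(-1))
    loss.backward()
    optimizer.step()
    return loss.item()
```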
In step 4, the model is evaluated as follows:
step 4.1, inputting the characteristics of the test set obtained in the step 3.2 into the model trained in the step 3.5 to obtain a corresponding abstract;
step 4.2, the human-written abstracts of the test set are paired one-to-one with the generated abstracts from step 4.1 to obtain abstract pairs (Y, Ŷ);
step 4.3, the pairs (Y, Ŷ) are fed into the Rouge toolkit and the F-measure of Rouge-1, Rouge-2 and Rouge-L is evaluated.
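The patent evaluates with the Rouge toolkit; below is a hedged sketch of an equivalent evaluation using the rouge-score Python package (an alternative implementation, not necessarily the toolkit the inventors used).

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

def evaluate(references, hypotheses):
    # Average F-measure of Rouge-1, Rouge-2 and Rouge-L over the test pairs (Y, Ŷ).
    totals = {"rouge1": 0.0, "rouge2": 0.0, "rougeL": 0.0}
    for ref, hyp in zip(references, hypotheses):
        scores = scorer.score(ref, hyp)
        for key in totals:
            totals[key] += scores[key].fmeasure
    return {key: value / len(references) for key, value in totals.items()}
```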
In step 5, the model is applied in the same manner as step 4.1.
Compared with the prior art, the invention has the following positive effects:
compared with a pure text generation system, the abstract generated by the invention can effectively adjust the weight of the entity in the text and relieve the problem of unregistered words to a certain extent.
Drawings
FIG. 1 is a diagram of a generative abstract model based on image-text fusion;
FIG. 2 is a VGG-16 network model diagram.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings.
The present embodiment uses the multi-modal sentence summarization data set MMSS, which contains (X, Y, I) triples of text, image and abstract, where the text and abstract come from the Gigaword data set widely used for evaluating summarization systems, and the images are retrieved by a search engine. After manual screening, an (X, Y, I) triple data set is obtained that comprises 66,000 training samples and 2,000 samples each for the verification set and the test set.
Step 1, preprocessing a data set.
Step 1.1, the given original data set is subjected to text, abstract and image one-to-one correspondence, namely (X, Y, I).
And step 1.2, removing special characters, emoticons and full-width characters from the text and the abstract.
And step 1.3, replacing all hyperlink URLs by using TAGURL, replacing all dates by using TAGDATA, replacing all numbers by using TAGNUM and replacing all punctuation marks by using TAGPUN in the data set obtained in the step 1.2.
Step 1.4, since MMSS is a sentence-level summarization data set and the texts are short, stop words are not filtered on this data set.
And step 1.5, the preprocessed text abstract images (X, Y, I) are shuffled simultaneously in a one-to-one correspondence manner, and are proportionally divided into a training set, a verification set and a test set.
Step 1.6, constructing a dictionary of 5,000 words from the data set, representing words in the text and the abstract that do not appear in the dictionary as 'UNK', adding the mark 'BOS' at the beginning of each document and 'EOS' at the end, limiting the text length to at most 120 words and the abstract to at most 30 words, truncating redundant words directly, and padding shorter sequences with the placeholder 'PAD'.
Step 1.7, using the WordEmbedding toolkit of Gensim, each word in the text summary data set, including the special tokens of step 1.6, is represented by a fixed 256-dimensional word vector.
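A hedged sketch of step 1.7 with Gensim follows; Word2Vec is assumed as the concrete embedding model (the text only says "word embedding toolkit"), and the keyword is vector_size in Gensim 4.x (size in older versions).

```python
from gensim.models import Word2Vec

def train_embeddings(tokenized_corpus, dim=256):
    # tokenized_corpus: list of token lists from the preprocessed texts and abstracts,
    # including the special tokens UNK/BOS/EOS/PAD of step 1.6.
    model = Word2Vec(sentences=tokenized_corpus, vector_size=dim,
                     window=5, min_count=1, workers=4)
    return model.wv        # lookup table, e.g. model.wv["TAGURL"] -> 256-d vector
```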
And 2, extracting main characteristic entities from the image I corresponding to the text data set in the step 1, and expressing the main characteristic entities into image characteristics with the same dimension as the text.
And 2.1, the key entity features of each image from step 1.5 are captured one by one using the Regional Convolutional Neural Network (RCNN) tool.
And 2.2, sorting the regional entity characteristics of each image obtained in the step 2.1 according to the size of the region, and selecting the first three regions with the largest regions as candidate regions.
Step 2.3, the VGG-16 network is used uniformly, and each region feature obtained in step 2.2 is represented by the output of the fc7 layer as a 4096-dimensional feature.
And 3, a generating abstract model based on image-text fusion is trained by using the training sets in the steps 1 and 2.
And 3.1, each 4096-dimensional region feature obtained in step 2.3 is converted by a bilinear network into a 256-dimensional feature with the same dimension as the text.
And 3.2, the image features obtained in step 3.1 are spliced with the text obtained in step 1.7, with the image features placed in front of the text after the 'BOS' mark, yielding again the vectorized training, verification and test sets.
And 3.3, sampling 64 samples of the new training set obtained in the step 3.2, and sequentially inputting the samples into the model for training.
And 3.4, repeating the step 3.3 until the model converges on the training set and is optimal on the verification set.
Step 4, after the abstract generation model is trained, the performance of the model is tested using the test set, with Rouge as the evaluation index.
Step 4.1, inputting the characteristics of the test set obtained in the step 3.2 into the model trained in the step 3 to obtain a corresponding abstract;
step 4.2, the human-written abstracts of the test set are paired one-to-one with the generated abstracts from step 4.1 to obtain abstract pairs (Y, Ŷ);
step 4.3, the pairs (Y, Ŷ) are fed into the Rouge toolkit and the F-measure of Rouge-1, Rouge-2 and Rouge-L is evaluated.
In order to compare the image-text fusion based generative summarization method of the invention (abbreviated MSE) with existing text-only models, the following baselines are used: Lead, which directly selects the first 8 words; Compress, which uses syntactic-structure compression; the original Seq2Seq model (Abs); the Seq2Seq model with an attention mechanism (Abs+A); and a Seq2Seq framework that learns multi-source data with a hierarchical attention mechanism (Multi-Source). The F-measure of the Rouge scores of the summaries each model generates on the test set is recorded, and the experimental results are shown in the following table:
system for controlling a power supply Rouge-1 Rouge-2 Rouge-L
Lead 33.46 13.40 31.84
Compress 31.56 11.02 28.87
Abs 35.95 18.21 31.89
Abs+A 41.11 21.75 39.92
Multi-Source 39.67 19.11 38.03
MSE 43.94 23.15 41.56
The experimental results show that, after image information is introduced, the image-text fusion based generative summarization method improves all three Rouge scores to a certain extent, especially Rouge-2, which further demonstrates the effectiveness of image-text fusion.
In practical application, a text is input in the interactive interface; the image input can be omitted at the application stage, in which case 'PAD' filling is used, and the corresponding abstract is obtained:
for example, the input text: "Japan's colleted kidu traction, the large sample Such infection in the country, had induced disorders of # # billion yen-lrb- # billion malls-rrb-, the bank of Japan sack sand wednesday"
Obtaining an abstract: "Japan's bank losses # # # billion yen".
As this practical case shows, the abstract generated by the invention effectively produces the entity 'bank'.
Although specific details of the invention, algorithms and figures are disclosed for illustrative purposes, these are intended to aid in the understanding of the contents of the invention and the implementation in accordance therewith, as will be appreciated by those skilled in the art: various substitutions, changes and modifications are possible without departing from the spirit and scope of the present invention and the appended claims. The invention should not be limited to the preferred embodiments and drawings disclosed herein, but rather should be defined only by the scope of the appended claims.

Claims (9)

1. A method for generating a generative abstract based on image-text fusion comprises the following steps:
1) dividing a given text data set into a training set, a verification set and a test set; each sample in the text data set is a triple (X, I, Y), wherein X is a text, I is an image corresponding to the text X, and Y is an abstract of the text X; the generative abstract model comprises a feature extraction module, a feature fusion module and an abstract generation module;
2) the feature extraction module captures the entity features of each image by using a regional convolutional neural network, and then selects the first three entity features with the largest regions as candidate regions; then generating image features of the image global features and image features of the three candidate regions; then converting the image features into image feature vectors with the same dimension as the text;
3) training the generative abstract model by using the training set and the image feature vectors corresponding to the training set; during training, for the same sample, the feature fusion module splices the text vector corresponding to the sample and the image feature vector corresponding to the sample to obtain the vectorized training set, verification set and test set; then k samples are selected from the vectorized training set and sequentially input into an encoder to obtain the joint encoding h_s of the text and the image, and the hidden state h_t of the decoder is computed by means of an intermediate semantic vector c_t, thereby realizing feature fusion; then the abstract generating module generates an abstract by using the fused features;
4) inputting a text and a corresponding image and generating an image characteristic vector of the image, and then inputting the text and the image characteristic vector corresponding to the text into a trained generative abstract model to obtain an abstract corresponding to the text.
2. The method of claim 1, wherein the image feature vectors comprise an image global feature vector and the entity vectors of the three largest regions in the image.
3. The method of claim 1, wherein the feature fusion method is: the hidden-layer output at the current time step in the encoding stage is the joint encoding h_s, and the hidden state of the decoder at the current time step is h_t; the transfer matrix W_a is used to calculate the degree of association score(h_t, h_s) between h_t and h_s, which is normalized to obtain a_t(s); then the intermediate semantic vector c_t = a_t(s)·h_s and the decoder hidden state h̃_t are computed.
4. The method of claim 3, wherein the abstract generated by the abstract generation module is p(y_t | y_1, …, y_{t-1}, A) = softmax(W_s·h̃_t), wherein y_t is the t-th word of the generated abstract Y, A is the spliced feature of the text vector and the image feature vector of the sample, and W_s is a parameter matrix.
5. The method of claim 1, wherein the image features of each candidate region are converted into image feature vectors with the same dimension as the text using a bilinear network: I_t = W_i·I_v, where I_v denotes the image feature, W_i is a parameter of the bilinear network, and I_t denotes the image feature vector with the same dimension as the text.
6. The method of claim 1, wherein the method of capturing the entity features of each image using the regional convolutional neural network is:
21) dividing each image into a plurality of regions by applying an over-segmentation technology, then merging the regions of the same image according to a set merging rule, and taking all the regions appearing after merging as preliminary candidate regions;
22) performing feature extraction on each preliminary candidate region by using a CNN network;
23) inputting the features obtained from each preliminary candidate region into a support vector machine classifier, and judging whether the features are corresponding entity labels;
24) correcting the frame position of the preliminary candidate region according to the result of the category mark by using a regression model;
25) and sequencing the preliminary candidate regions of the image according to the sizes of the regions, and selecting entities corresponding to the first three regions with the largest regions as entity features of the image.
7. The method of claim 6, wherein the merging rule is similar-color merging or similar-texture merging.
8. The method of claim 1, wherein the trained generative digest model is tested using a test set, the generative digest model is verified using a verification set after the test is passed, and the step 4) is performed after the verification is passed.
9. The method of claim 1, wherein
L(θ) = −Σ_{n=1}^{N} log p(y_n | a_n; θ)
is used as the optimization objective to train the generative abstract model until the model converges; where N is the total number of samples in the training set, θ is the generative abstract model parameter, y_n is the n-th word of the abstract, and a_n is the feature corresponding to the n-th word.
CN201910764261.3A 2019-08-19 2019-08-19 Generation type abstract generation method based on image-text fusion Active CN110704606B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910764261.3A CN110704606B (en) 2019-08-19 2019-08-19 Generation type abstract generation method based on image-text fusion

Publications (2)

Publication Number Publication Date
CN110704606A CN110704606A (en) 2020-01-17
CN110704606B true CN110704606B (en) 2022-05-31

Family

ID=69193427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910764261.3A Active CN110704606B (en) 2019-08-19 2019-08-19 Generation type abstract generation method based on image-text fusion

Country Status (1)

Country Link
CN (1) CN110704606B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414505B (en) * 2020-03-11 2023-10-20 上海爱数信息技术股份有限公司 Quick image abstract generation method based on sequence generation model
CN111563207B (en) * 2020-07-14 2020-11-10 口碑(上海)信息技术有限公司 Search result sorting method and device, storage medium and computer equipment
CN112541346A (en) * 2020-12-24 2021-03-23 北京百度网讯科技有限公司 Abstract generation method and device, electronic equipment and readable storage medium
CN113076433B (en) * 2021-04-26 2022-05-17 支付宝(杭州)信息技术有限公司 Retrieval method and device for retrieval object with multi-modal information
CN113609285A (en) * 2021-08-09 2021-11-05 福州大学 Multi-mode text summarization system based on door control fusion mechanism
CN115309888B (en) * 2022-08-26 2023-05-30 百度在线网络技术(北京)有限公司 Method and device for generating chart abstract and training method and device for generating model

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018149376A1 (en) * 2017-02-17 2018-08-23 杭州海康威视数字技术股份有限公司 Video abstract generation method and device
CN106997387B (en) * 2017-03-28 2019-08-09 中国科学院自动化研究所 Based on the multi-modal automaticabstracting of text-images match
CN109766432A (en) * 2018-07-12 2019-05-17 中国科学院信息工程研究所 A kind of Chinese abstraction generating method and device based on generation confrontation network
CN109508400A (en) * 2018-10-09 2019-03-22 中国科学院自动化研究所 Picture and text abstraction generating method
CN109543512A (en) * 2018-10-09 2019-03-29 中国科学院自动化研究所 The evaluation method of picture and text abstract

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Adversarial Reinforcement Learning for Chinese Text Summarization; Xu H, Cao Y, Jia R, et al.; 《International Conference on Computational Science》; 20181231; full text *
Image caption generation with text-conditional semantic attention; Zhou L, Xu C, Koch P, et al.; 《arXiv preprint arXiv:1606.04621》; 20160912; full text *
Rich feature hierarchies for accurate object detection and semantic segmentation; Ross Girshick; 《2014 IEEE Conference on Computer Vision and Pattern Recognition》; 20140628; full text *
Sequence Generative Adversarial Network for Long Text Summarization; Xu H, Cao Y, Jia R, et al.; 《2018 IEEE 30th International Conference on Tools with Artificial Intelligence》; 20181231; full text *

Also Published As

Publication number Publication date
CN110704606A (en) 2020-01-17

Similar Documents

Publication Publication Date Title
CN110704606B (en) Generation type abstract generation method based on image-text fusion
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN110046656B (en) Multi-mode scene recognition method based on deep learning
CN109165294B (en) Short text classification method based on Bayesian classification
CN109766432B (en) Chinese abstract generation method and device based on generation countermeasure network
CN110287320A (en) A kind of deep learning of combination attention mechanism is classified sentiment analysis model more
Shi et al. Deep adaptively-enhanced hashing with discriminative similarity guidance for unsupervised cross-modal retrieval
CN110096587B (en) Attention mechanism-based LSTM-CNN word embedded fine-grained emotion classification model
CN115203442B (en) Cross-modal deep hash retrieval method, system and medium based on joint attention
CN110781290A (en) Extraction method of structured text abstract of long chapter
CN110765264A (en) Text abstract generation method for enhancing semantic relevance
CN112800184B (en) Short text comment emotion analysis method based on Target-Aspect-Opinion joint extraction
CN111858940A (en) Multi-head attention-based legal case similarity calculation method and system
CN115438154A (en) Chinese automatic speech recognition text restoration method and system based on representation learning
CN114357120A (en) Non-supervision type retrieval method, system and medium based on FAQ
CN114281982B (en) Book propaganda abstract generation method and system adopting multi-mode fusion technology
CN115712731A (en) Multi-modal emotion analysis method based on ERNIE and multi-feature fusion
CN115600605A (en) Method, system, equipment and storage medium for jointly extracting Chinese entity relationship
CN111462752A (en) Client intention identification method based on attention mechanism, feature embedding and Bi-LSTM
CN114742047A (en) Text emotion recognition method based on maximum probability filling and multi-head attention mechanism
CN113377953B (en) Entity fusion and classification method based on PALC-DCA model
CN116932736A (en) Patent recommendation method based on combination of user requirements and inverted list
CN116701996A (en) Multi-modal emotion analysis method, system, equipment and medium based on multiple loss functions
Purba et al. A hybrid convolutional long short-term memory (CNN-LSTM) based natural language processing (NLP) model for sentiment analysis of customer product reviews in Bangla
Song et al. A lexical updating algorithm for sentiment analysis on Chinese movie reviews

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant