CN108804495B - Automatic text summarization method based on enhanced semantics - Google Patents
- Publication number: CN108804495B (application CN201810281684.5A)
- Authority
- CN
- China
- Prior art keywords
- text
- hidden layer
- abstract
- sequence
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The invention discloses an automatic text summarization method based on enhanced semantics, which comprises the following steps: preprocessing the text, arranging words from high to low word frequency, and converting each word into an id; encoding the input sequence with a single-layer bidirectional LSTM to extract text information features; decoding the text semantic vector obtained by encoding with a single-layer unidirectional LSTM to obtain the hidden layer states; calculating a context vector that extracts from the input sequence the information most useful for the current output; and, in the training stage, fusing the semantic similarity between the generated abstract and the source text into the loss calculation, thereby improving the semantic similarity between the abstract and the source text. The invention represents the text with an LSTM deep learning model, integrates the semantic relations of the context, enhances the semantic relation between the abstract and the source text, and generates abstracts that better fit the theme of the text, with wide application prospects.
Description
Technical Field
The invention relates to the technical field of natural language processing, in particular to an automatic text summarization method based on enhanced semantics.
Background
With the rapid development of science, technology and the internet, the era of big data has arrived, and the amount of information on the network grows day by day. Representative text information such as news, blogs, chats, reports and microblogs is increasing explosively, creating a heavy information burden: people spend a great deal of time browsing and reading. How to quickly extract key content from large amounts of text and relieve information overload has therefore become an urgent need, and automatic text summarization technology has arisen to meet it.
Automatic text summarization techniques are classified into extractive summarization and generative summarization according to how the summary is produced. The former ranks the sentences of the original text by some method and takes the n most important sentences as the summary; the latter describes and summarizes the central idea of the original text by mining deeper semantic information. There has been much research on extractive summarization, but it stays at surface-level vocabulary information, whereas generative summarization is closer to the process by which humans write summaries.
In recent years, with the rise of deep learning, many achievements have been made across numerous fields, and deep learning has also been introduced into automatic summarization. Generative summarization can be realized with the sequence-to-sequence (seq2seq) model, and drawing on its successful application to machine translation, seq2seq-based automatic summarization has become a research hotspot of natural language processing, though it still has problems of continuity and readability. Traditional extractive summarization generally causes great information loss, particularly on long texts, so in-depth research on generative automatic summarization is of great significance for truly solving the problem of information overload.
Disclosure of Invention
The invention aims to remedy the defects of the prior art by providing an automatic text summarization method based on enhanced semantics, which builds on a seq2seq model, introduces an attention mechanism, and trains with the semantic similarity between the generated summary and the source text, thereby improving the semantic relevance between the generated summary and the source text and improving summary quality.
The purpose of the invention can be achieved by adopting the following technical scheme:
an automatic text summarization method based on enhanced semantics, the automatic text summarization method comprising:
a text preprocessing step, namely performing word segmentation, lemmatization and coreference resolution on the text, arranging the words from high to low word frequency, and converting the words into ids;
coding, namely coding an input sequence, and obtaining a hidden layer state vector carrying text sequence information through a neural network;
a decoding step, namely initializing with the last hidden layer state obtained by the encoder and starting decoding to obtain the hidden layer state s_t of each step;
an attention distribution calculation step, namely combining the hidden layer states of the input sequence with the hidden layer state s_t obtained by decoding at the current moment to calculate the context vector, obtaining the context vector u_t at the current time t;
And an abstract generation step, namely mapping the output obtained in the decoding step into vectors of the dimension of the size of the word list through two linear layers, wherein each dimension represents the probability of the word in the word list, and selecting a candidate word by using a certain selection strategy to generate an abstract.
Further, the data of the text in the text preprocessing step is a corpus crawled by a crawler or an open-source corpus, and consists of article-abstract pairs.
Further, in the text preprocessing step, the first 200k words are taken as the basic vocabulary, the special marks [PAD], [UNK], [START] and [STOP] are added to the vocabulary, and the words of the text are converted into ids, so that each text corresponds to an id sequence.
Further, the input sequence is a word vector corresponding to an id sequence obtained by converting the text, the dimension of the word vector is 128, and the maximum length of the sequence is 700.
Further, the neural network is a single-layer bidirectional LSTM, the number of hidden layer units is 256, and the forward and reverse hidden layer states h are connected to obtain a final hidden layer state.
Further, the decoding step process is as follows:
receiving an input word vector and the hidden layer state of the previous moment, and obtaining the hidden layer state s_t of the current moment through a single-layer unidirectional LSTM neural network; the number of hidden units is 256.
Further, the context vector u_t is calculated as follows:
e_t,i = v^T tanh(W_h h_i + W_s s_t + b_att), a_t = softmax(e_t), u_t = Σ_{i=1..N} a_t,i h_i
wherein v, W_h, W_s and b_att are parameters to be learned, h_i is the hidden layer state value of the encoder, and N is the length of the input sequence.
Furthermore, the selection strategy refers to the following: in the testing stage, a beam search algorithm keeps the 4 results with the highest probability at each step until the summary sequence with the highest overall probability is finally obtained, while in the training stage only the word with the highest probability is selected at each step; after the summary is completely generated, it is compared and evaluated against the reference summary.
Further, in the abstract generation step, only one word is generated in each step, and the maximum length of the generated abstract is 100, that is, the maximum number of cycles from the encoding step to the abstract generation step is 100; generation stops when the end mark is output or the maximum length is reached, and the probability calculation formula is as follows:
p_v = softmax(V_1(V_2[s_t, u_t] + b_2) + b_1)
wherein V_1, V_2, b_1 and b_2 are all parameters to be learned, and p_v provides the basis for predicting the next word.
Further, the abstract generation step further includes: performing a semantic similarity (Rel) calculation between the finally obtained predicted abstract and the source text sequence, and penalizing abstracts with low semantic relevance during training, calculated as follows:
wherein the forward and backward hidden layer states are used respectively, G_t is the encoder hidden layer state, λ is an adjustable factor, M is the length of the generated abstract sequence, and loss_t is the loss of each step, which is combined with the semantic similarity Rel to form the total loss.
Compared with the prior art, the invention has the following advantages and effects:
the invention constructs an automatic text abstract model based on an LSTM based on a seq2seq model, introduces an attention mechanism to obtain a context vector at each moment when a decoder is used, introduces semantic similarity to enhance the semantic relevance between a generated abstract and a source text, fuses the similarity into a loss function during training, avoids model bias and improves the quality of the abstract.
Drawings
FIG. 1 is a flow chart of the steps of the enhanced semantic based automatic text summarization method of the present invention;
FIG. 2 is a diagram of a semantic similarity calculation structure in the present invention;
fig. 3 is a flowchart of an algorithm of each step when generating the abstract word in decoding according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
As shown in fig. 1, the automatic text summarization method based on enhanced semantics includes: the method comprises the steps of text preprocessing, encoding, decoding, attention and abstract generation. Wherein:
Text preprocessing step: the text data may be a corpus crawled by a crawler or an open-source corpus, for example CNN/Daily Mail, composed of article-abstract pairs, where each article has 780 words on average and each abstract has 56 words on average. The source text is segmented into words, lemmatized and coreference-resolved; the first 200k words by frequency are taken as the basic vocabulary, the special marks [PAD], [UNK], [START], [STOP] are added to the vocabulary, and the words of each text are converted into ids, so that each article corresponds to an id sequence (the abstracts are handled the same way). The training set contains 287,226 samples, the validation set 13,368 samples, and the test set 11,490 samples.
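A minimal sketch of this preprocessing, assuming a simple frequency-sorted vocabulary (function and variable names here are illustrative, not from the patent): the vocabulary is capped, the special marks are prepended, and out-of-vocabulary words map to [UNK].

```python
from collections import Counter

# Illustrative sketch: build a frequency-sorted vocabulary capped at 200k
# words, prepend the special marks, and convert words to ids.
SPECIALS = ["[PAD]", "[UNK]", "[START]", "[STOP]"]

def build_vocab(tokenized_texts, max_size=200_000):
    counts = Counter(w for text in tokenized_texts for w in text)
    words = [w for w, _ in counts.most_common(max_size)]
    return {w: i for i, w in enumerate(SPECIALS + words)}

def to_ids(words, vocab):
    unk = vocab["[UNK]"]
    return [vocab.get(w, unk) for w in words]

vocab = build_vocab([["the", "cat", "sat", "on", "the", "mat"]])
ids = to_ids(["the", "dog", "sat"], vocab)   # "dog" maps to [UNK]
```

In a real pipeline the ids for articles and abstracts would also be padded with [PAD] and bracketed with [START]/[STOP] before batching.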
And a coding step, namely performing word embedding on the input sequence to obtain a 128-dimensional vector, and obtaining a text expression vector carrying text sequence information through a neural network.
The input sequence is an id sequence obtained by converting an article, the maximum length is 700, and the minimum length is 30.
The neural network in the encoding step is composed of a single-layer bidirectional LSTM, the number of hidden layer units is 256, and the forward and reverse hidden layer states h are connected to obtain the final hidden layer state.
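The bidirectional encoding above can be sketched with a minimal numpy LSTM cell: one cell is run forward and one backward over the embeddings, and the per-step states are concatenated. This is a toy stand-in (random weights, small sizes, and a single shared weight matrix, whereas a real bidirectional encoder trains separate forward and backward weights).

```python
import numpy as np

rng = np.random.default_rng(2)
T, d_in, d_h = 6, 4, 8          # sequence length, embedding dim, hidden units

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_run(xs, W, b):
    # W maps [x_t; h_{t-1}] to the stacked gates (input, forget, output, cell).
    h = np.zeros(d_h); c = np.zeros(d_h); hs = []
    for x in xs:
        z = np.concatenate([x, h]) @ W + b
        i, f, o = sigmoid(z[:d_h]), sigmoid(z[d_h:2*d_h]), sigmoid(z[2*d_h:3*d_h])
        g = np.tanh(z[3*d_h:])
        c = f * c + i * g
        h = o * np.tanh(c)
        hs.append(h)
    return np.array(hs)

xs = rng.normal(size=(T, d_in))                 # stand-in word embeddings
W = rng.normal(size=(d_in + d_h, 4 * d_h)) * 0.1
b = np.zeros(4 * d_h)
h_fwd = lstm_run(xs, W, b)                      # forward pass
h_bwd = lstm_run(xs[::-1], W, b)[::-1]          # backward pass, re-aligned
h_enc = np.concatenate([h_fwd, h_bwd], axis=1)  # per-step states, dim 2*d_h
```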
Decoding step: receive the word vector of the input and the previous hidden layer state, and obtain the hidden layer state s_t of the current moment through a single-layer unidirectional LSTM neural network; the number of hidden units is 256.
An attention calculation step: combine the decoding state s_t obtained by the decoding step at the current moment with the hidden layer states of the input sequence from the encoding step, obtaining the context vector u_t at the current moment.
The context vector at time t is calculated as follows:
e_t,i = v^T tanh(W_h h_i + W_s s_t + b_att), a_t = softmax(e_t), u_t = Σ_{i=1..N} a_t,i h_i
wherein v, W_h, W_s and b_att are parameters to be learned, h_i is the hidden layer state value of the encoder, and N is the length of the input sequence.
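A numpy sketch of this attention computation, assuming the standard additive (Bahdanau-style) form implied by the listed parameters v, W_h, W_s and b_att; the weights here are random stand-ins, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 5, 8                      # input length, hidden size (toy values)
h = rng.normal(size=(N, d))      # encoder hidden states h_i
s_t = rng.normal(size=d)         # decoder state at step t
W_h = rng.normal(size=(d, d)); W_s = rng.normal(size=(d, d))
b_att = rng.normal(size=d); v = rng.normal(size=d)

e = np.tanh(h @ W_h + s_t @ W_s + b_att) @ v   # scores e_t,i, shape (N,)
a = np.exp(e - e.max()); a /= a.sum()          # attention distribution a_t
u_t = a @ h                                    # context vector u_t, shape (d,)
```

The softmax turns the N scores into a distribution over input positions, so u_t is a weighted average of the encoder states focused on the positions most useful for the current output.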
And an abstract generating step, namely mapping the output obtained in the decoding step into vectors of the dimension of the size of the word list through two linear layers, wherein each dimension represents the probability of the word in the word list, and selecting candidate words by using a certain selection strategy.
The selection strategy: in the testing stage, a beam search algorithm keeps the 4 results with the highest probability at each step until the summary sequence with the highest overall probability is obtained; the training stage takes only the word with the highest probability at each step. After the summary is completely generated, it is compared and evaluated against the reference summary.
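The beam-search selection strategy can be sketched as follows (beam width 4 as above; `step_probs` is a random placeholder for the trained decoder, which would really condition on the full prefix — all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
VOCAB, STOP, BEAM, MAX_LEN = 10, 0, 4, 5   # toy sizes; STOP is the end mark id

def step_probs(prefix):
    logits = rng.normal(size=VOCAB)        # placeholder for the decoder output
    p = np.exp(logits - logits.max())
    return p / p.sum()

def beam_search():
    beams = [([], 0.0)]                    # (token sequence, log-probability)
    for _ in range(MAX_LEN):
        candidates = []
        for seq, lp in beams:
            if seq and seq[-1] == STOP:    # finished hypotheses carry over
                candidates.append((seq, lp))
                continue
            p = step_probs(seq)
            for w in np.argsort(p)[-BEAM:]:            # top-4 extensions
                candidates.append((seq + [int(w)], lp + float(np.log(p[w]))))
        beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:BEAM]
    return beams[0][0]

summary_ids = beam_search()
```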
The maximum length of the generated abstract is 100, and the probability calculation formula is as follows:
p_v = softmax(V_1(V_2[s_t, u_t] + b_2) + b_1)
wherein V_1, V_2, b_1 and b_2 are all parameters to be learned, and p_v provides the basis for predicting the next word.
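The two-linear-layer projection above can be sketched directly in numpy: the decoder state and context vector are concatenated and mapped to a distribution over the vocabulary (sizes and weights here are toy stand-ins).

```python
import numpy as np

rng = np.random.default_rng(1)
d, hidden, vocab_size = 8, 16, 50         # toy sizes
s_t = rng.normal(size=d); u_t = rng.normal(size=d)
V2 = rng.normal(size=(2 * d, hidden)); b2 = rng.normal(size=hidden)
V1 = rng.normal(size=(hidden, vocab_size)); b1 = rng.normal(size=vocab_size)

z = np.concatenate([s_t, u_t]) @ V2 + b2  # inner linear layer on [s_t, u_t]
logits = z @ V1 + b1                      # outer linear layer to vocab size
p_v = np.exp(logits - logits.max()); p_v /= p_v.sum()   # softmax
next_word = int(p_v.argmax())             # greedy choice used during training
```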
The abstract generation step also includes performing a semantic similarity (Rel) calculation between the finally obtained predicted abstract and the source text sequence, penalizing abstracts with low semantic relevance during training, calculated as follows:
wherein the forward and backward hidden layer states are used respectively, G_t is the encoder hidden layer state, λ is an adjustable factor that defaults to 1, M is the length of the generated abstract sequence, and loss_t is the loss of each step, combined with the similarity to make up the total loss.
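The Rel formula itself is given only as a figure in the original. The sketch below therefore assumes a simple stand-in: a cosine similarity between mean-pooled summary and source representations, subtracted from the averaged per-step loss with weight λ — an illustrative construction, not the patented formula.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def total_loss(step_losses, summary_states, source_states, lam=1.0):
    # Assumed stand-in for Rel: cosine of mean-pooled representations.
    rel = cosine(summary_states.mean(axis=0), source_states.mean(axis=0))
    # Low semantic relevance (small rel) leaves the loss larger, i.e. is
    # penalised, matching the behaviour described in the text.
    return float(np.mean(step_losses)) - lam * rel

rng = np.random.default_rng(4)
losses = rng.uniform(0.5, 2.0, size=10)
summ = rng.normal(size=(10, 16)); src = rng.normal(size=(30, 16))
loss = total_loss(losses, summ, src)
```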
The training process uses the back-propagation algorithm with an Adagrad optimizer, a learning rate of 0.15, and an initial accumulator value of 0.1.
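The Adagrad update with these hyperparameters can be sketched on a toy objective (minimizing w²); the per-parameter accumulator of squared gradients shrinks the effective step size over time.

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.15):
    # Accumulate squared gradients, then scale the step per parameter.
    accum += grad ** 2
    return w - lr * grad / np.sqrt(accum), accum

w = np.array([5.0])
accum = np.full(1, 0.1)          # initial accumulator value 0.1
for _ in range(200):
    grad = 2 * w                 # gradient of the toy objective w^2
    w, accum = adagrad_step(w, grad, accum)
```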
The decoding step is divided into a training stage and a testing stage: the training stage takes the reference abstract as input (teacher forcing), while the testing stage takes the output of the previous moment as the input of the current moment.
The reference abstracts and predicted abstracts are evaluated with the ROUGE metrics. The experiments use a Linux operating system, run on a GPU, with Python as the programming language and TensorFlow as the platform. The model with semantic similarity introduced runs for about 4 days, about 380,000 iterations; the experimental results are shown in the table below.
TABLE 1 comparison of the results of the three models
Experimental model | ROUGE-1 | ROUGE-2 | ROUGE-L
---|---|---|---
Basic LSTM model | 0.2896 | 0.1028 | 0.2613
LSTM+Attention | 0.3116 | 0.1127 | 0.2920
LSTM+Attention+Rel | 0.3493 | 0.1390 | 0.3342
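For reference, ROUGE-1 at its core is a unigram-overlap score; a simplified F1 sketch is below (the official ROUGE toolkit adds stemming, stopword options, and multi-reference handling, none of which are modeled here).

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    # Clipped unigram overlap between candidate and reference token lists.
    c, r = Counter(candidate), Counter(reference)
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("the cat sat".split(), "the cat sat down".split())
```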
By fusing the attention mechanism, the method fully exploits the seq2seq model's ability to mine deep text semantic information, so that decoding can focus on the information in the input sequence that is useful for the current output; by fusing semantic similarity into the loss calculation, the model attends to semantic similarity with the source text while generating the abstract, yielding sentences that better match the original semantics. Compared with traditional statistics-based automatic summarization methods, the deep-learning-based model has stronger representation capability and a great advantage on the automatic text summarization task.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
Claims (7)
1. An automatic text summarization method based on enhanced semantics, characterized in that the automatic text summarization method comprises:
a text preprocessing step, namely performing word segmentation, lemmatization and coreference resolution on the text, arranging words from high to low word frequency, and converting the words into an id sequence;
coding, namely coding an input sequence, and obtaining a hidden layer state vector carrying text sequence information through a neural network;
a decoding step, namely initializing with the last hidden layer state obtained by the encoder and starting decoding to obtain the hidden layer state s_t of each step;
an attention distribution calculation step, namely combining the hidden layer states of the input sequence with the hidden layer state s_t obtained by decoding at the current moment to calculate the context vector, obtaining the context vector u_t at the current time t;
A summary generation step, namely mapping the output obtained in the decoding step into vectors of the dimension of the size of the word list through two linear layers, wherein each dimension represents the probability of a word in the word list, selecting a candidate word by using a selection strategy, and generating a summary; the selection strategy refers to that 4 results with the maximum probability are selected in each step by using a beam search algorithm in the testing stage until a summary sequence with the maximum probability is obtained finally, only the words with the maximum probability are selected in the training stage, and the summary is compared and evaluated with a reference summary after being completely generated;
the abstract generating step further comprises: and performing semantic similarity Rel calculation on the finally obtained prediction abstract and the source text sequence, and punishing the abstract with low semantic relevance in the training process, wherein the calculation is as follows:
wherein the forward and backward hidden layer states are used respectively, G_t is the encoder hidden layer state, λ is an adjustable factor, M is the length of the generated abstract sequence, and loss_t is the loss of each step, combined with the semantic similarity Rel to form the total loss;
in the abstract generating step, only one word is generated in each step, and the maximum length of the generated abstract is 100, that is, the maximum cycle number from the encoding step to the abstract generating step is 100; generation stops when the end mark is output or the maximum length is reached, and the probability calculation formula is as follows:
p_v = softmax(V_1(V_2[s_t, u_t] + b_2) + b_1)
wherein V_1, V_2, b_1 and b_2 are all parameters to be learned, and p_v provides the basis for predicting the next word.
2. The method for automatically abstracting text based on enhanced semantics as claimed in claim 1, wherein the data of the text in the text preprocessing step is a corpus crawled by a crawler or an open-source corpus, and is composed of article-abstract pairs.
3. The method for automatically summarizing text based on enhanced semantics of claim 1, wherein in the text preprocessing step the top 200k words are taken as the basic vocabulary, the special marks [PAD], [UNK], [START] and [STOP] are added to the vocabulary, and the words of the text are converted into ids, each text corresponding to an id sequence.
4. The method for automatically abstracting text based on enhanced semantics of claim 1, wherein the input sequence is a word vector corresponding to an id sequence obtained by converting a text, the dimension of the word vector is 128, and the maximum length of the sequence is 700.
5. The method according to claim 1, wherein the neural network is a single-layer bi-directional LSTM, the number of hidden layer units is 256, and forward and reverse hidden layer states h are connected to obtain a final hidden layer state.
6. The method for automatic text summarization based on enhanced semantics of claim 1 wherein the decoding step is performed as follows:
receiving an input word vector and the hidden layer state of the previous moment, and obtaining the hidden layer state s_t of the current moment through a single-layer unidirectional LSTM neural network; the number of hidden units is 256.
7. The method according to claim 1, wherein the context vector u_t is calculated as follows:
e_t,i = v^T tanh(W_h h_i + W_s s_t + b_att), a_t = softmax(e_t), u_t = Σ_{i=1..N} a_t,i h_i
wherein v, W_h, W_s and b_att are parameters to be learned, h_i is the hidden layer state value of the encoder, and N is the length of the input sequence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810281684.5A CN108804495B (en) | 2018-04-02 | 2018-04-02 | Automatic text summarization method based on enhanced semantics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810281684.5A CN108804495B (en) | 2018-04-02 | 2018-04-02 | Automatic text summarization method based on enhanced semantics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108804495A CN108804495A (en) | 2018-11-13 |
CN108804495B true CN108804495B (en) | 2021-10-22 |
Family
ID=64095279
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810281684.5A Expired - Fee Related CN108804495B (en) | 2018-04-02 | 2018-04-02 | Automatic text summarization method based on enhanced semantics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108804495B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | |
Granted publication date: 20211022 |