CN111061862A - Method for generating abstract based on attention mechanism - Google Patents


Info

Publication number
CN111061862A
Authority
CN
China
Prior art keywords
word
article
abstract
layer
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911293797.8A
Other languages
Chinese (zh)
Other versions
CN111061862B (en)
Inventor
唐卓
方小泉
李肯立
周文
阳王东
周旭
刘楚波
曹嵘晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN201911293797.8A priority Critical patent/CN111061862B/en
Publication of CN111061862A publication Critical patent/CN111061862A/en
Application granted granted Critical
Publication of CN111061862B publication Critical patent/CN111061862B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/34 Browsing; Visualisation therefor
    • G06F 16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning

Abstract

The invention discloses a method for generating a text abstract based on an attention mechanism. The method comprises two stages: the first stage is a sentence ordering process and the second stage is an abstract generation process, and the input of the abstract generation process is the N sentences most relevant to the article topic obtained in the first stage. In the first stage, a supervised ordering method is provided for articles with titles: the similarity between each sentence and the title is calculated, the sentences are sorted by similarity, and the N sentences with the highest similarity are selected. For the second stage, the invention proposes a new way of calculating the attention distribution between the encoder and the decoder, so that at different times the decoder focuses on different parts of the encoder output. Through this ordering method and abstract generation method, the invention solves the problem of losing article information that arises when a part of an overly long article is simply truncated to serve as the input of the abstract generation model.

Description

Method for generating abstract based on attention mechanism
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a method for generating an abstract based on an attention mechanism.
Background
With the development of the Internet, the amount of online information has grown explosively, and how to acquire this information quickly and effectively has become an important research problem. Text summarization developed against this background, and with the progress of information retrieval and natural language processing technologies it has become a research hotspot in recent years.
The purpose of text summarization is to convert a text or a collection of texts into a short text containing the key information. According to the type of input, text summarization can be divided into single-document summarization, which generates a summary from one given document, and multi-document summarization, which generates a summary from a given set of topic-related documents. According to how the summary is produced, there are two approaches: the extractive approach finds key sentences directly in the article and combines them into a summary in the order in which they appear, while the generative (abstractive) approach requires the computer to read and understand the content of the article and express it in a more condensed form. Compared with the extractive technique, the advanced natural language processing algorithms used in the generative approach produce a more concise summary through rephrasing, synonym substitution, sentence abbreviation and similar techniques.
At present, most abstract generation methods are implemented with an encoder and a decoder that use a recurrent neural network (RNN) or one of its variants as the basic unit. However, this way of generating abstracts has some non-negligible drawbacks: first, the RNN training process is complex and training is very slow; second, because the input length of the encoder is limited, when an article is longer than the input length the content beyond that length is simply discarded, so some important information in the article is lost.
Disclosure of Invention
In view of the above defects or improvement requirements of the prior art, the present invention provides a method for generating an abstract based on an attention mechanism. It aims to solve the technical problems of slow training and high training difficulty in existing RNN-based abstractive summarization methods, as well as the technical problem that important information in the article is lost.
To achieve the above object, according to one aspect of the present invention, there is provided a method for generating a summary based on an attention mechanism, comprising the steps of:
s1, acquiring an article from the Internet, and inputting the article into a trained sentence sequencing model to acquire a simplified article;
and S2, inputting the simplified article obtained in the step S1 into a trained abstract generation model to obtain an abstract of the article.
Preferably, the training process in the sentence ordering model is as follows:
(1) acquiring a title of an article, and inputting each word in the title into a title-level encoder of a sentence sequencing model to obtain a semantic vector of the title;
(2) obtaining sentences in the article, inputting each word in the sentences into a sentence-level encoder of a sentence sequencing model to obtain semantic vectors which correspond to the sentences and contain title information;
(3) and (3) calculating the similarity between the title and the sentence according to the semantic vector which is obtained in the step (2), corresponds to the sentence and contains the title information.
The specific step is that firstly, the maximum pooling operation is carried out on the semantic vector which is obtained in the step (2), corresponds to the sentence and contains the title information so as to obtain the final representation of the sentence, and the final representation is processed by using linear mapping and sigmoid activation function so as to obtain the similarity between the title and the sentence.
The calculation formula of the similarity between the title and the sentence is as follows:
s = sigmoid(w1 · maxpooling(h_1^s, h_2^s, …, h_n^s))
where s represents the similarity between the title and the sentence, n represents the total number of words in the sentence, w1 represents the weight of the linear mapping, maxpooling represents the maximum pooling operation (the maximum value is selected as the result), the sigmoid activation function transforms a continuous input value into an output between 0 and 1, and h_j^s represents the semantic vector which corresponds to the sentence and contains the title information obtained in step (2-4).
(4) repeating the steps (2) and (3) m times to obtain the similarity between the title and each sentence in the article, sorting all m sentences of the article in descending order of similarity, selecting the N sentences with the highest similarity from the m sentences, and forming a new article according to the order in which these N sentences appear in the original article, wherein m represents the total number of sentences in the article and N is an integer between 10 and 20;
(5) and (3) acquiring a text abstract data set, and executing the steps (1) to (4) on each article in the text abstract data set to obtain new texts, wherein all the new texts form the new data set.
Preferably, step (1) comprises the sub-steps of:
(1-1) inputting each word in the title into a word embedding layer of a title-level encoder, inputting an output result of the word embedding layer into a position coding layer of the title-level encoder as a first word vector to obtain a position coding vector of each word, and adding the position coding vector of each word and the first word vector to obtain a second word vector which corresponds to each word in the title and contains position information;
wherein, the position coding vector of the word is calculated by sine and cosine coding:
PE(pos, 2i) = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
where pos denotes the position of the word, d_model denotes the dimension of the word vector, and 2i and 2i+1 index the even and odd dimensions of the position coding vector, the dimension index ranging from 0 to (d_model − 1).
The second word vector is calculated by the following formula:
E_pos(x_j) = E_word(x_j) + PE(x_j)
where E_word(x_j) represents the first word vector of x_j, x_j represents the j-th word in title x, E_pos(x_j) represents the second word vector of the j-th word in title x, and j ranges from 0 to the length of the title.
(1-2) inputting the second word vector obtained in the step (1-1) into a multi-head self-attention layer of a header-level encoder to obtain a self-attention layer output result;
(1-3) inputting the self-attention layer output result obtained in the step (1-2) into the position embedding network layer of the title-level encoder to obtain the semantic vector of the title;
preferably, the step (1-2) is embodied in such a way that firstly, the second word vector obtained in the step (1-1) is taken as the question Q, the key K and the value V, then Q, K and V are linearly mapped and the dimension d is the dimensionmodelCut them into nheadPortions of each portion being divided into knotsAll fruits include problem QaKey KaAnd the value VaDimension of each segmentation result is dkAnd has dmodel=nhead×dkWherein n isheadRepresenting the number of heads of a multi-head self-attention layer;
then, taking each segmentation result as the input of a corresponding head in the multi-head self-attention layer, and calculating the self-attention output result of each head:
Figure BDA0002319952060000041
wherein the value range of a is 1 to the number of the head of the multi-head self-attention layer, softmax is an activation function, and the calculation formula is as follows:
Figure BDA0002319952060000042
Softi′is the i' th output value of the softmax activation function,
Figure BDA0002319952060000043
is the element of the i 'th dimension of the input, and the value range of j' is 0 to (d)model-1)。
Finally, all n are put togetherheadAnd splicing the self-attention output results of the individual heads to obtain the self-attention layer output result.
Preferably, the position embedding network layer comprises a first convolutional layer, a second convolutional layer and a Relu activation function which are connected in sequence;
wherein the input matrix of the first convolutional layer has a size of d_model × len_q, len_q indicating the length of the title, the convolution kernel size is d_model × 2048 × 1, the step size is 1, and the output matrix size is 2048 × len_q;
the input matrix of the second convolutional layer has a size of 2048 × len_q, the convolution kernel size is 2048 × d_model × 1, the step size is 1, and the output matrix size is d_model × len_q;
the Relu activation function is calculated as:
Relu(x″) = max(0, x″)
and the final output result of the position embedding network layer is:
FFN(x′) = conv2(Relu(conv1(x′)))
where x′ represents the self-attention layer output result, conv1 represents the first convolutional layer, conv2 represents the second convolutional layer, and FFN(x′) is the semantic vector of the title.
Preferably, step (2) comprises the sub-steps of:
(2-1) inputting each word of a sentence in the article into a word embedding layer of a sentence-level encoder, inputting an output result of the word embedding layer into a position encoding layer as a first word vector to obtain a position encoding vector of each word, and adding the position encoding vector of each word and the first word vector to obtain a second word vector which corresponds to each word in the sentence and contains position information;
(2-2) inputting the second word vector obtained in the step (2-1) into a multi-head self-attention layer of a sentence-level encoder to obtain a self-attention layer output result;
(2-3) inputting the semantic vector of the title obtained in the step (1) and the self-attention layer output result obtained in the step (2-2) into another multi-head self-attention layer of the sentence-level encoder together to obtain a semantic vector corresponding to the sentence and the title;
and (2-4) inputting the semantic vector corresponding to the sentence and the title obtained in the step (2-3) into the position embedding network layer of the sentence-level encoder so as to obtain a semantic vector corresponding to the sentence and containing title information.
Preferably, the training process of the abstract generation model is as follows:
(6) acquiring a sample from the new data set generated in the step (5), wherein the sample comprises an article X and an abstract Y of the article X, and inputting the article X in the sample into an abstract-level encoder of the text abstract generation model to obtain an article semantic vector containing the full-text information of the article X;
(7) inputting the 0th to (y−1)-th words in the abstract Y of the article X into a decoder of the abstract generation model to generate y−1 abstract words, wherein y represents the total number of words in the abstract;
(8) repeating the steps (6) and (7) on the new data set obtained in the step (5) to train the abstract generating model until the abstract generating model converges, thereby obtaining the trained abstract generating model.
Specifically, the condition for the abstract generation model to converge is that the loss no longer decreases, or the number of iterations reaches a preset upper limit of 800,000.
Preferably, step (6) comprises the sub-steps of:
(6-1) inputting each word in the article X into a word embedding layer of an abstract-level encoder, inputting the output result of the word embedding layer into a position coding layer of the abstract-level encoder as a first word vector to obtain the position coding vector of each word, and adding the position coding vector of each word and the first word vector to obtain a second word vector which corresponds to each word in the article X and contains position information;
(6-2) inputting the second word vector obtained in the step (6-1) into a multi-head self-attention layer to obtain a multi-head self-attention layer output result; the multi-head self-attention layer output result is then input into the position embedding network layer of the abstract-level encoder so as to obtain an article semantic vector.
Preferably, step (7) comprises the sub-steps of:
(7-1) inputting the first y−1 words in the abstract Y into a word embedding layer of a decoder, inputting the output result of the word embedding layer as a first word vector into a position coding layer to obtain the position coding vectors of all the words, and adding the position coding vectors of all the words and the first word vectors to obtain second word vectors which correspond to all the words in the abstract and contain position information;
(7-2) inputting second word vectors which correspond to all the words in the abstract obtained in the step (7-1) and contain position information into the multi-head self-attention layer to obtain an output result of the multi-head self-attention layer;
(7-3) processing the multi-head self-attention layer output result obtained in the step (7-2) by using a mask mechanism to obtain a processed multi-head self-attention layer output result;
wherein the mask matrix mask is a lower triangular matrix with a size of (y−1) × (y−1), wherein y is the total number of words in the abstract Y, and:
mask[i][j] = 1 if j ≤ i, and mask[i][j] = 0 if j > i, for i, j = 1, …, y−1.
and (7-4) inputting the article semantic vector obtained in the step (6) and the multi-head output result from the attention layer processed in the step (7-3) into a time penalty attention layer of a decoder together to obtain a context matrix containing article information and generated abstract words.
(7-5) inputting the context matrix obtained in the step (7-4) into a position feedforward network of a digest-level encoder to obtain a plurality of decoded words, mapping the decoded words onto a vocabulary through a full connection layer of a decoder to obtain the probability distribution of each decoded word in the vocabulary, and obtaining the probability that the decoded word is a real digest word according to the probability distribution;
specifically, this step first calculates the probability distribution of each decoded word in the vocabulary:
P_vocab = W_V(FFN(C))
wherein W_V is the weight of the full connection layer;
then, according to the probability distribution P_vocab, the probability that each decoded word is the real abstract word is calculated:
P(y*_t) = P_vocab(y*_t)
wherein y*_t represents the real abstract word corresponding to the t-th decoded word;
(7-6) calculating a loss value according to the probability that the decoded word obtained in the step (7-5) is the real abstract word:
loss = −(1/T) · Σ_{t=1}^{T} log P(y*_t)
wherein T represents the total number of the decoded words obtained in the step (7-5), and the value of T is y−1.
Preferably, in the step (7-4), when the t-th abstract word is generated, the attention distribution of the abstract word over the article is calculated as follows:
first, the attention distribution over the article is calculated:
attention_t = softmax(e_t)
wherein
e_t = V_v · tanh(W_h · mul_output[t] + W_e · enc_output^T)
wherein mul_output[t] represents the t-th row element of the lower triangular matrix obtained in step (7-3), enc_output represents the article semantic vector obtained in the step (6), T represents the transposition operation, W_h, W_e and V_v are all weights of the linear mapping operation, and tanh is the activation function:
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
finally, the attention distribution is multiplied by the article semantic vector obtained in the step (6) to obtain a context matrix containing the article information and the generated abstract words.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
(1) because the sentence sequencing model and the abstract generating model do not need to be circulated, all words in the sequence are processed in parallel, and the context and the far words are combined by using the self-attention mechanism (namely using a multi-head self-attention layer), the training difficulty of the model is lower than that of RNN, and the training speed is much higher than that of RNN;
(2) in the sentence ordering model, a title encoder encodes the title of the article and a sentence encoder encodes the sentences in the article, so that semantic vectors containing title information and sentence information are obtained; the similarity between the title and each sentence is then calculated from these semantic vectors, and the N sentences with the highest similarity to the title are taken as the input of the abstract generation model, which solves the technical problem in the prior art that article information is lost because part of the article is directly cut off;
(3) when the attention distribution between the decoder and the encoder is calculated, the invention provides a time-based attention mechanism through the step (7-4), and the condition that the generated abstract contains a plurality of repeated words can be relieved.
Drawings
FIG. 1 is an architectural diagram of a sentence ordering model used by the present invention.
FIG. 2 is an architectural diagram of a summary generation model used by the present invention.
FIG. 3 is a flow chart of a method for generating a summary based on an attention mechanism of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The input lengths of both the encoder and the decoder of the abstract generation model are set to fixed values. If the input length is set too long, the model becomes difficult to train and its accuracy also decreases. A common way to deal with this is to truncate the part of the text longer than the set maximum length, which may cut away part of the complete information of the text and thus prevent the model from fully summarizing it. It is therefore meaningful to study sentence ordering for the article. In this respect, for articles with a title, the invention calculates the semantic similarity between each sentence in the article and the title, and the N sentences with the highest similarity are used to summarize the whole article. For articles without a title, N sentences are extracted from the article with the unsupervised TextRank method and used as the input of the generation model.
As shown in fig. 3, the present invention provides a method for generating a summary based on an attention mechanism, comprising the following steps:
firstly, acquiring an article from the Internet, and inputting the article into a trained sentence sequencing model (shown in figure 1) to acquire a simplified article;
and secondly, inputting the simplified article obtained in the step one into a trained abstract generation model (as shown in figure 2) to obtain an abstract of the article.
The training process in the sentence sequencing model is as follows:
(1) acquiring a title of an article, and inputting each word in the title into a title-level encoder of a sentence sequencing model to obtain a semantic vector of the title;
the method comprises the following substeps:
(1-1) inputting each word in the title into a word embedding layer of a title-level encoder, inputting an output result of the word embedding layer into a position coding layer of the title-level encoder as a first word vector to obtain a position coding vector of each word, and adding the position coding vector of each word and the first word vector to obtain a second word vector which corresponds to each word in the title and contains position information;
specifically, the word embedding layer acquires a first word vector of each word in the title according to a pre-established word vector table; the word vector table is obtained by training word vectors using a wikipedia corpus.
The position coding layer adds the relative position information of the words (tokens) at input time, so that the title-level encoder can make use of the order information of the words in the title. The position coding vector of a word is calculated by sine and cosine coding:
PE(pos, 2i) = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
where pos denotes the position of the word, d_model denotes the dimension of the word vector, and 2i and 2i+1 index the even and odd dimensions of the position coding vector, the dimension index ranging from 0 to (d_model − 1).
The second word vector is calculated by the following formula:
E_pos(x_j) = E_word(x_j) + PE(x_j)
where E_word(x_j) represents the first word vector of x_j, x_j represents the j-th word in title x, E_pos(x_j) represents the second word vector of the j-th word in title x, and j ranges from 0 to the length of the title.
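As an illustration of step (1-1), the sketch below, written in Python with NumPy, computes the sine and cosine position coding vectors and adds them to the first word vectors; the vocabulary size, embedding table and title word ids are made-up example values rather than parameters fixed by the patent.

    import numpy as np

    def position_encoding(seq_len, d_model):
        """Sine/cosine position coding: even dimensions use sin, odd dimensions use cos."""
        pe = np.zeros((seq_len, d_model))
        pos = np.arange(seq_len)[:, None]              # word positions 0 .. seq_len-1
        i = np.arange(0, d_model, 2)[None, :]          # even dimension indices
        angle = pos / np.power(10000.0, i / d_model)
        pe[:, 0::2] = np.sin(angle)                    # PE(pos, 2i)
        pe[:, 1::2] = np.cos(angle)                    # PE(pos, 2i+1)
        return pe

    # Illustrative title of 6 word ids and a random word-vector table (first word vectors).
    d_model, vocab_size = 512, 10000
    title_ids = np.array([5, 42, 7, 901, 3, 17])
    embedding_table = np.random.randn(vocab_size, d_model) * 0.02

    first_word_vectors = embedding_table[title_ids]    # word embedding layer lookup
    second_word_vectors = first_word_vectors + position_encoding(len(title_ids), d_model)
    print(second_word_vectors.shape)                   # (6, 512): one position-aware vector per word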
(1-2) inputting the second word vector obtained in the step (1-1) into a Multi-head self-attention layer (Multi-head self-attention layer) of a title level encoder to obtain a self-attention layer output result;
This step is specifically as follows: firstly, the second word vector obtained in the step (1-1) is used as the question (query, hereinafter referred to as Q), the key (key, hereinafter referred to as K) and the value (value, hereinafter referred to as V); Q, K and V are then linearly mapped and split along the dimension d_model into n_head parts (where n_head represents the number of heads of the multi-head self-attention layer); each split result includes a question Q_a, a key K_a and a value V_a, the dimension of each split result is d_k, and d_model = n_head × d_k.
Then, each split result is taken as the input of the corresponding head in the multi-head self-attention layer, and the self-attention output result of each head is calculated:
head_a = softmax(Q_a · K_a^T / sqrt(d_k)) · V_a
wherein the value of a ranges from 1 to the number of heads of the multi-head self-attention layer, and softmax is an activation function whose calculation formula is:
Soft_i′ = exp(z_i′) / Σ_{j′=0}^{d_model−1} exp(z_j′)
where Soft_i′ is the i′-th output value of the softmax activation function, z_i′ is the element in the i′-th dimension of the input, and the value range of j′ is 0 to (d_model − 1).
Finally, the self-attention output results of all n_head heads are spliced to obtain the self-attention layer output result.
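The per-head computation head_a = softmax(Q_a · K_a^T / sqrt(d_k)) · V_a and the final concatenation can be sketched as follows; the projection weights are random stand-ins and the head count is only an example, so this illustrates the mechanism rather than reproducing the trained title-level encoder.

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)        # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def multi_head_self_attention(x, w_q, w_k, w_v, n_head):
        """x: (seq_len, d_model). Map to Q, K, V, split into n_head parts of size d_k,
        apply scaled dot-product attention per head, then concatenate the heads."""
        seq_len, d_model = x.shape
        d_k = d_model // n_head
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        q = q.reshape(seq_len, n_head, d_k).transpose(1, 0, 2)
        k = k.reshape(seq_len, n_head, d_k).transpose(1, 0, 2)
        v = v.reshape(seq_len, n_head, d_k).transpose(1, 0, 2)
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)          # (n_head, seq_len, seq_len)
        heads = softmax(scores) @ v                               # (n_head, seq_len, d_k)
        return heads.transpose(1, 0, 2).reshape(seq_len, d_model) # concatenate the heads

    d_model, n_head, seq_len = 512, 8, 6
    x = np.random.randn(seq_len, d_model)              # second word vectors of the title
    w_q = np.random.randn(d_model, d_model) * 0.02
    w_k = np.random.randn(d_model, d_model) * 0.02
    w_v = np.random.randn(d_model, d_model) * 0.02
    print(multi_head_self_attention(x, w_q, w_k, w_v, n_head).shape)   # (6, 512)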
(1-3) inputting the self-attention layer output result obtained in the step (1-2) into the position embedding network layer of the title-level encoder to obtain the semantic vector of the title;
specifically, the position embedding network layer comprises a first convolutional layer, a second convolutional layer and a Relu activation function which are connected in sequence;
wherein the input matrix of the first convolutional layer has a size of d_model × len_q, len_q indicating the length of the title, the convolution kernel size is d_model × 2048 × 1, the step size is 1, and the output matrix size is 2048 × len_q;
the input matrix of the second convolutional layer has a size of 2048 × len_q, the convolution kernel size is 2048 × d_model × 1, the step size is 1, and the output matrix size is d_model × len_q;
the Relu activation function is calculated as:
Relu(x″) = max(0, x″)
and the final output in this step is:
FFN(x′) = conv2(Relu(conv1(x′)))
where x′ represents the self-attention layer output result, conv1 represents the first convolutional layer, conv2 represents the second convolutional layer, and FFN(x′) is the semantic vector of the title.
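Because both convolution kernels have width 1, the position embedding network layer amounts to the same two-layer transformation applied independently at every position of the sequence; a minimal sketch with random weights standing in for the trained kernels is given below.

    import numpy as np

    def position_embedding_network(x, w1, b1, w2, b2):
        """x: (d_model, len_q). conv1 maps d_model -> 2048, conv2 maps 2048 -> d_model,
        both with kernel width 1, i.e. position-wise linear maps."""
        hidden = np.maximum(0.0, w1 @ x + b1[:, None])   # Relu(conv1(x')), shape (2048, len_q)
        return w2 @ hidden + b2[:, None]                 # conv2(...),      shape (d_model, len_q)

    d_model, len_q = 512, 6
    x = np.random.randn(d_model, len_q)                  # self-attention layer output
    w1, b1 = np.random.randn(2048, d_model) * 0.02, np.zeros(2048)
    w2, b2 = np.random.randn(d_model, 2048) * 0.02, np.zeros(d_model)
    print(position_embedding_network(x, w1, b1, w2, b2).shape)   # (512, 6)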
(2) Obtaining sentences in the article, inputting each word in the sentences into a sentence-level encoder of a sentence sequencing model to obtain semantic vectors which correspond to the sentences and contain title information;
the method comprises the following substeps:
(2-1) inputting each word of a sentence in the article into a word embedding layer of a sentence-level encoder, inputting an output result of the word embedding layer into a position encoding layer as a first word vector to obtain a position encoding vector of each word, and adding the position encoding vector of each word and the first word vector to obtain a second word vector which corresponds to each word in the sentence and contains position information;
specifically, the word embedding layer and the position coding layer in this step are completely the same as those in the step (1-1), and are not described herein again;
(2-2) inputting the second word vector obtained in the step (2-1) into a Multi-head self-attention layer (Multi-head self-attention layer) of a sentence-level encoder to obtain a self-attention layer output result;
specifically, the multi-head self-attention layer in this step is completely the same as the multi-head self-attention layer in step (1-2), and the problem, key and value are the second word vector output in step (2-1), which is not described herein again;
(2-3) inputting the semantic vector of the title obtained in the step (1) and the self-attention layer output result obtained in the step (2-2) into another multi-head self-attention layer of the sentence-level encoder together to obtain a semantic vector corresponding to the sentence and the title;
Specifically, in this step, the semantic vector of the title obtained in the step (1) is first used as the key and the value, and the self-attention layer output result obtained in the step (2-2) is used as the question; the question, the key and the value are then linearly mapped and split along their dimension d′_model into n′_head parts, each split result including a question Q_a′, a key K_a′ and a value V_a′, the dimension of each split result being d′_k, with d′_model = n′_head × d′_k.
Then, each split result is taken as the input of the corresponding head in the multi-head self-attention layer, and the self-attention output result of each head is calculated:
head_a′ = softmax(Q_a′ · K_a′^T / sqrt(d′_k)) · V_a′
wherein the value of a′ ranges from 1 to the number of heads of the multi-head self-attention layer, and softmax is the same activation function as in the step (1-2) above.
Finally, the self-attention output results of the n′_head heads are concatenated to obtain the semantic vectors corresponding to the sentence and the title.
(2-4) inputting the semantic vector corresponding to the sentence and the title obtained in the step (2-3) into the position embedding network layer of the sentence-level encoder so as to obtain a semantic vector corresponding to the sentence and containing title information;
specifically, the position embedded network layer in this step is completely the same as the position embedded network layer in the above step (1-3), and is not described herein again;
(3) and (3) calculating the similarity between the title and the sentence according to the semantic vector which is obtained in the step (2), corresponds to the sentence and contains the title information.
This step is specifically as follows: firstly, a max-pooling operation is performed on the semantic vector which is obtained in the step (2) and corresponds to the sentence and contains the title information, so as to obtain the final representation of the sentence, and the final representation is processed using a linear mapping and a sigmoid activation function to obtain the similarity between the title and the sentence.
The calculation formula of the similarity between the title and the sentence is as follows:
s = sigmoid(w1 · maxpooling(h_1^s, h_2^s, …, h_n^s))
where s represents the similarity between the title and the sentence, n represents the total number of words in the sentence, w1 represents the weight of the linear mapping, maxpooling represents the maximum pooling operation (the maximum value is selected as the result), the sigmoid activation function transforms a continuous input value into an output between 0 and 1, and h_j^s represents the semantic vector which corresponds to the sentence and contains the title information obtained in the step (2-4).
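As an illustration of this similarity computation, the sketch below max-pools the per-word semantic vectors, applies the linear mapping w1 and then a sigmoid; the dimension, number of words and weight are made-up example values, not the trained parameters of the model.

    import numpy as np

    def title_sentence_similarity(h, w1):
        """h: (n, d_model) semantic vectors containing title information, one per word.
        Max-pool over the words, apply the linear mapping, then squash with sigmoid."""
        pooled = h.max(axis=0)                  # maxpooling over the n word positions
        score = float(w1 @ pooled)              # linear mapping to a scalar
        return 1.0 / (1.0 + np.exp(-score))     # sigmoid: similarity in (0, 1)

    d_model, n_words = 512, 20
    h = np.random.randn(n_words, d_model)
    w1 = np.random.randn(d_model) * 0.02
    print(title_sentence_similarity(h, w1))     # e.g. 0.47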
(4) repeating the steps (2) and (3) m times (where m represents the total number of sentences in the article), thereby obtaining the similarity between the title and each sentence in the article, sorting all m sentences of the article in descending order of similarity, selecting the N sentences with the highest similarity from the m sentences (where N is an integer between 10 and 20, preferably 15), and forming a new article according to the order in which these N sentences appear in the original article;
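Given one similarity score per sentence, the selection in step (4) can be sketched as follows; the sentence list and scores are illustrative only. The chosen sentences are re-sorted by their original index so that the new article keeps the original order.

    def build_reduced_article(sentences, similarities, n=15):
        """Keep the n sentences most similar to the title, in their original article order."""
        ranked = sorted(range(len(sentences)), key=lambda i: similarities[i], reverse=True)
        keep = sorted(ranked[:n])               # restore original order of the chosen sentences
        return [sentences[i] for i in keep]

    sentences = ["s0", "s1", "s2", "s3", "s4"]
    similarities = [0.11, 0.93, 0.25, 0.78, 0.40]
    print(build_reduced_article(sentences, similarities, n=3))   # ['s1', 's3', 's4']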
(5) and (3) acquiring a text abstract data set, and executing the steps (1) to (4) on each article in the text abstract data set to obtain new texts, wherein all the new texts form the new data set.
The training process of the abstract generation model is as follows:
(6) acquiring a sample from the new data set generated in the step (5), wherein the sample comprises an article X and an abstract Y of the article X, and inputting the article X in the sample into an abstract-level encoder of the text abstract generation model to obtain an article semantic vector containing the full-text information of the article X;
the method comprises the following substeps:
(6-1) inputting each word in the article X into a word embedding layer of a abstract-level encoder, inputting an output result of the word embedding layer into a position coding layer of the abstract-level encoder as a first word vector to obtain a position coding vector of each word, and adding the position coding vector of each word and the first word vector to obtain a second word vector which corresponds to each word in the article X and contains position information;
specifically, the word embedding layer and the position coding layer in this step are completely the same as those in the step (1-1), and are not described herein again;
(6-2) inputting the second word vector obtained in the step (6-1) into a multi-head self-attention layer to obtain a multi-head self-attention layer output result; the multi-head self-attention layer output result is then input into the position embedding network layer of the abstract-level encoder so as to obtain an article semantic vector.
Specifically, the multi-head self-attention layer in this step is completely the same as the multi-head self-attention layer in step (1-2), the problem, key and value are the second word vector output in step (6-1), and the position embedding network layer is completely the same as the position embedding network layer in step (1-3), and is not described herein again;
(7) the 0th to (y−1)-th words in the abstract Y of the article X (where y represents the total number of words in the abstract) are input into the decoder of the abstract generation model to generate y−1 abstract words.
The method comprises the following steps:
(7-1) inputting the first Y-1 words in the abstract Y into a word embedding layer of a decoder, inputting the output result of the word embedding layer as a first word vector into a position coding layer to obtain position coding vectors of all the words, and adding the position coding vectors of all the words and the first word vector to obtain second word vectors which correspond to all the words in the abstract and contain position information;
specifically, the word embedding layer and the position encoding layer in this step are completely the same as those in the step (1-1), and are not described herein again;
(7-2) inputting a second word vector which corresponds to each word in the abstract obtained in the step (7-1) and contains position information into a Multi-head self-attention layer (Multi-head self-attention layer) to obtain a Multi-head self-attention layer output result;
specifically, the question, the key, and the value of the multi-head attention layer are all second word vectors corresponding to the words in the summary obtained in step (7-1) and containing position information, and the structure of the second word vectors is completely the same as that of the multi-head attention layer in step (1-2), and will not be described again here.
(7-3) processing the multi-head self-attention layer output result obtained in the step (7-2) by using a Mask (Mask) mechanism to obtain a processed multi-head self-attention layer output result;
Specifically, an abstract word to be generated depends only on the article and on the abstract words already generated before it, not on the abstract words generated after it. Since the first y−1 words of the abstract are used as input in the step (7-1), the multi-head self-attention layer output result of the step (7-2) contains information about all abstract words, so a mask mechanism must be applied to the multi-head self-attention layer output result of the step (7-2) in order to hide the abstract words that come after the abstract word currently being generated.
Specifically, this step is to multiply the multi-headed self-attention layer output result of step (7-2) by the following mask matrix, thereby obtaining a lower triangular matrix indicating that each word in the abstract will focus only on the abstract word generated before it.
The mask matrix mask is a lower triangular matrix of size (y−1) × (y−1), where y is the total number of words in the abstract Y:
mask[i][j] = 1 if j ≤ i, and mask[i][j] = 0 if j > i, for i, j = 1, …, y−1.
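A minimal sketch of the mask, assuming the masked quantity is the (y−1) × (y−1) decoder self-attention matrix (the size 4 is only an example): the mask is a lower triangular matrix of ones, and element-wise multiplication zeroes out every position after the word currently being generated.

    import numpy as np

    y_minus_1 = 4                                     # y - 1 decoder input positions
    mask = np.tril(np.ones((y_minus_1, y_minus_1)))   # 1 on and below the diagonal, 0 above
    print(mask)
    # [[1. 0. 0. 0.]
    #  [1. 1. 0. 0.]
    #  [1. 1. 1. 0.]
    #  [1. 1. 1. 1.]]

    attn = np.random.rand(y_minus_1, y_minus_1)       # decoder self-attention weights
    masked_attn = attn * mask                         # each position attends only to itself and earlier words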
And (7-4) inputting the article semantic vector obtained in the step (6) and the multi-head self-attention layer output result processed in the step (7-3) into a time-penalty attention layer (time-penalty attention layer) of a decoder together to obtain a context matrix containing article information and generated abstract words.
Specifically, when the t-th abstract word is generated, the attention distribution of the abstract word over the article is calculated as follows. First, an attention score over the article is computed and turned into an attention distribution:
e_t = V_v · tanh(W_h · mul_output[t] + W_e · enc_output^T)
attention_t = softmax(e_t)
wherein mul_output[t] represents the t-th row element of the lower triangular matrix obtained in the step (7-3), enc_output represents the article semantic vector obtained in the step (6), T represents the transposition operation, and W_h, W_e and V_v are the weights of the linear mapping operation (these weights are initialized at training time, are not fixed, and are learned so as to give the best results). tanh is the activation function, which is calculated by the formula:
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
In order to prevent the article information attended to when the t-th abstract word is generated from being similar to the article information attended to by the abstract words generated in the previous steps, a penalty mechanism is adopted to penalize the words that received higher attention in the previous steps; under the penalty mechanism, the attention distribution over the article is calculated as:
attention′_t[i] = exp(e_t[i]) / Σ_{t′=1}^{t−1} exp(e_t′[i]), renormalized over i so that the weights sum to 1.
Finally, the attention distribution is multiplied by the article semantic vector obtained in the step (6) to obtain the context matrix C containing the article information and the generated abstract words.
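The exact penalty formula appears only as an image in the original publication, so the sketch below assumes a temporal-attention style penalty: the exponentiated score of each article position is divided by the sum of its exponentiated scores from earlier decoding steps, which matches the stated goal of penalizing words that already received high attention. The additive score built from W_h, W_e and V_v follows the symbols listed above; all weights and sizes are random stand-ins.

    import numpy as np

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def time_penalty_context(mul_out_t, enc_out, prev_scores, w_h, w_e, v_v):
        """mul_out_t: (d_model,) t-th row of the masked decoder self-attention output.
        enc_out: (src_len, d_model) article semantic vectors.
        prev_scores: list of score vectors e_1 .. e_{t-1} from earlier decoding steps."""
        e_t = np.tanh(mul_out_t @ w_h + enc_out @ w_e) @ v_v       # additive attention scores, (src_len,)
        if prev_scores:                                            # penalize positions attended to before
            penalized = np.exp(e_t) / np.sum(np.exp(prev_scores), axis=0)
            attn = penalized / penalized.sum()
        else:
            attn = softmax(e_t)
        context = attn @ enc_out                                   # weighted sum of article vectors
        return context, e_t

    d_model, src_len = 512, 30
    w_h = np.random.randn(d_model, d_model) * 0.02
    w_e = np.random.randn(d_model, d_model) * 0.02
    v_v = np.random.randn(d_model) * 0.02
    enc_out = np.random.randn(src_len, d_model)
    prev = []
    for t in range(3):                                             # three decoding steps
        ctx, e_t = time_penalty_context(np.random.randn(d_model), enc_out, prev, w_h, w_e, v_v)
        prev.append(e_t)
    print(ctx.shape)                                               # (512,)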
(7-5) inputting the context matrix obtained in the step (7-4) into a position feedforward network of a digest-level encoder to obtain a plurality of decoded words, mapping the decoded words onto a vocabulary through a full connection layer of a decoder to obtain the probability distribution of each decoded word in the vocabulary, and obtaining the probability that the decoded word is a real digest word according to the probability distribution;
specifically, this step first calculates the probability distribution of each decoded word in the vocabulary:
P_vocab = W_V(FFN(C))
wherein W_V is the weight of the full connection layer;
then, according to the probability distribution P_vocab, the probability that each decoded word is the real abstract word is calculated:
P(y*_t) = P_vocab(y*_t)
wherein y*_t represents the real abstract word corresponding to the t-th decoded word.
(7-6) calculating a loss value according to the probability that the decoded word obtained in the step (7-5) is the real abstract word:
loss = −(1/T) · Σ_{t=1}^{T} log P(y*_t)
wherein T represents the total number of the decoded words obtained in the step (7-5), and the value of T is y−1.
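A sketch of steps (7-5) and (7-6) under simplifying assumptions: a random weight matrix stands in for the trained fully connected layer, a softmax is added so that P_vocab is a proper probability distribution (the text above only writes P_vocab = W_V(FFN(C))), and the target word ids are made up.

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    T, d_model, vocab_size = 4, 512, 50000           # T = y - 1 decoded words
    ffn_out = np.random.randn(T, d_model)            # FFN(C): position feed-forward output
    w_v = np.random.randn(d_model, vocab_size) * 0.02

    p_vocab = softmax(ffn_out @ w_v)                 # (T, vocab_size) probabilities over the vocabulary
    targets = np.array([17, 905, 3, 42])             # ids of the real abstract words y*_1 .. y*_T
    p_target = p_vocab[np.arange(T), targets]        # P(y*_t) for each decoded position
    loss = -np.mean(np.log(p_target))                # loss = -(1/T) * sum_t log P(y*_t)
    print(loss)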
(8) Repeating the steps (6) and (7) on the new data set obtained in the step (5) to train the abstract generating model until the abstract generating model converges, thereby obtaining a trained abstract generating model;
Specifically, the condition for the abstract generation model to converge is that the loss no longer decreases, or the number of iterations reaches a preset upper limit of 800,000.
In contrast to RNN-based methods, the present invention does not require recurrence; it processes all words or symbols in a sequence in parallel while using the self-attention mechanism to combine the context with more distant words. By processing all words in parallel and letting each word attend to the other words in the sentence over multiple processing steps, the invention trains much faster than an RNN, and such architectures have also produced much better results than RNNs on machine translation tasks.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for generating an abstract based on an attention mechanism is characterized by comprising the following steps:
s1, acquiring an article from the Internet, and inputting the article into a trained sentence sequencing model to acquire a simplified article;
and S2, inputting the simplified article obtained in the step S1 into a trained abstract generation model to obtain an abstract of the article.
2. The method of claim 1, wherein the training process in the sentence ordering model is as follows:
(1) acquiring a title of an article, and inputting each word in the title into a title-level encoder of a sentence sequencing model to obtain a semantic vector of the title;
(2) obtaining sentences in the article, inputting each word in the sentences into a sentence-level encoder of a sentence sequencing model to obtain semantic vectors which correspond to the sentences and contain title information;
(3) and (3) calculating the similarity between the title and the sentence according to the semantic vector which is obtained in the step (2), corresponds to the sentence and contains the title information.
The specific step is that firstly, the maximum pooling operation is carried out on the semantic vector which is obtained in the step (2), corresponds to the sentence and contains the title information so as to obtain the final representation of the sentence, and the final representation is processed by using linear mapping and sigmoid activation function so as to obtain the similarity between the title and the sentence.
The calculation formula of the similarity between the title and the sentence is as follows:
s = sigmoid(w1 · maxpooling(h_1^s, h_2^s, …, h_n^s))
where s represents the similarity between the title and the sentence, n represents the total number of words in the sentence, w1 represents the weight of the linear mapping, maxpooling represents the maximum pooling operation (the maximum value is selected as the result), the sigmoid activation function transforms a continuous input value into an output between 0 and 1, and h_j^s represents the semantic vector which corresponds to the sentence and contains the title information obtained in step (2-4).
(4) repeating the steps (2) and (3) m times to obtain the similarity between the title and each sentence in the article, sorting all m sentences of the article in descending order of similarity, selecting the N sentences with the highest similarity from the m sentences, and forming a new article according to the order in which these N sentences appear in the original article, wherein m represents the total number of sentences in the article and N is an integer between 10 and 20;
(5) and (3) acquiring a text abstract data set, and executing the steps (1) to (4) on each article in the text abstract data set to obtain new texts, wherein all the new texts form the new data set.
3. The method according to claim 1, wherein step (1) comprises the sub-steps of:
(1-1) inputting each word in the title into a word embedding layer of a title-level encoder, inputting an output result of the word embedding layer into a position coding layer of the title-level encoder as a first word vector to obtain a position coding vector of each word, and adding the position coding vector of each word and the first word vector to obtain a second word vector which corresponds to each word in the title and contains position information;
wherein, the position coding vector of the word is calculated by sine and cosine coding:
PE(pos, 2i) = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
where pos denotes the position of the word, d_model denotes the dimension of the word vector, and 2i and 2i+1 index the even and odd dimensions of the position coding vector, the dimension index ranging from 0 to (d_model − 1).
The second word vector is calculated by the following formula:
E_pos(x_j) = E_word(x_j) + PE(x_j)
where E_word(x_j) represents the first word vector of x_j, x_j represents the j-th word in title x, E_pos(x_j) represents the second word vector of the j-th word in title x, and j ranges from 0 to the length of the title.
(1-2) inputting the second word vector obtained in the step (1-1) into a multi-head self-attention layer of a header-level encoder to obtain a self-attention layer output result;
and (1-3) inputting the self-attention layer output result obtained in the step (1-2) into the position embedding network layer of the title-level encoder to obtain the semantic vector of the title.
4. The method according to claim 3, wherein the step (1-2) is implemented by first taking the second word vector obtained in the step (1-1) as the question Q, the key K and the value V, then linearly mapping Q, K and V and splitting them along the dimension d_model into n_head parts, each split result including a question Q_a, a key K_a and a value V_a, the dimension of each split result being d_k, with d_model = n_head × d_k, wherein n_head represents the number of heads of the multi-head self-attention layer;
then, each split result is taken as the input of the corresponding head in the multi-head self-attention layer, and the self-attention output result of each head is calculated:
head_a = softmax(Q_a · K_a^T / sqrt(d_k)) · V_a
wherein the value of a ranges from 1 to the number of heads of the multi-head self-attention layer, and softmax is an activation function whose calculation formula is:
Soft_i′ = exp(z_i′) / Σ_{j′=0}^{d_model−1} exp(z_j′)
where Soft_i′ is the i′-th output value of the softmax activation function, z_i′ is the element in the i′-th dimension of the input, and the value range of j′ is 0 to (d_model − 1).
Finally, the self-attention output results of all n_head heads are spliced to obtain the self-attention layer output result.
5. The method of claim 4,
the position embedding network layer comprises a first convolutional layer, a second convolutional layer and a Relu activation function which are connected in sequence;
wherein the input matrix of the first convolutional layer has a size of d_model × len_q, len_q indicating the length of the title, the convolution kernel size is d_model × 2048 × 1, the step size is 1, and the output matrix size is 2048 × len_q;
the input matrix of the second convolutional layer has a size of 2048 × len_q, the convolution kernel size is 2048 × d_model × 1, the step size is 1, and the output matrix size is d_model × len_q;
the Relu activation function is calculated as:
Relu(x″) = max(0, x″)
and the final output result of the position embedding network layer is:
FFN(x′) = conv2(Relu(conv1(x′)))
where x′ represents the self-attention layer output result, conv1 represents the first convolutional layer, conv2 represents the second convolutional layer, and FFN(x′) is the semantic vector of the title.
6. The method according to claim 5, wherein step (2) comprises the sub-steps of:
(2-1) inputting each word of a sentence in the article into a word embedding layer of a sentence-level encoder, inputting an output result of the word embedding layer into a position encoding layer as a first word vector to obtain a position encoding vector of each word, and adding the position encoding vector of each word and the first word vector to obtain a second word vector which corresponds to each word in the sentence and contains position information;
(2-2) inputting the second word vector obtained in the step (2-1) into a multi-head self-attention layer of a sentence-level encoder to obtain a self-attention layer output result;
(2-3) inputting the semantic vector of the title obtained in the step (1) and the self-attention layer output result obtained in the step (2-2) into another multi-head self-attention layer of the sentence-level encoder together to obtain a semantic vector corresponding to the sentence and the title;
and (2-4) inputting the semantic vector corresponding to the sentence and the title obtained in the step (2-3) into the position embedding network layer of the sentence-level encoder so as to obtain a semantic vector corresponding to the sentence and containing title information.
7. The method of claim 6, wherein the abstract generation model is trained as follows:
(6) acquiring a sample from the new data set generated in the step (5), wherein the sample comprises an article X and an abstract Y of the article X, and inputting the article X in the sample into an abstract-level encoder of the text abstract generation model to obtain an article semantic vector containing the full-text information of the article X;
(7) inputting the 0th to (y−1)-th words in the abstract Y of the article X into a decoder of the abstract generation model to generate y−1 abstract words, wherein y represents the total number of words in the abstract;
(8) repeating the steps (6) and (7) on the new data set obtained in the step (5) to train the abstract generating model until the abstract generating model converges, thereby obtaining the trained abstract generating model.
Specifically, the condition for the abstract generation model to converge is that the loss no longer decreases, or the number of iterations reaches a preset upper limit of 800,000.
8. The method according to claim 7, characterized in that step (6) comprises the following sub-steps:
(6-1) inputting each word in the article X into a word embedding layer of a abstract-level encoder, inputting an output result of the word embedding layer into a position coding layer of the abstract-level encoder as a first word vector to obtain a position coding vector of each word, and adding the position coding vector of each word and the first word vector to obtain a second word vector which corresponds to each word in the article X and contains position information;
(6-2) inputting the second word vector obtained in the step (6-1) into a multi-head self-attention layer to obtain a multi-head self-attention layer output result; the multi-head self-attention layer output result is then input into the position embedding network layer of the abstract-level encoder so as to obtain an article semantic vector.
9. The method according to claim 8, characterized in that step (7) comprises the following sub-steps:
(7-1) inputting the first Y-1 words in the abstract Y into a word embedding layer of a decoder, inputting the output result of the word embedding layer as a first word vector into a position coding layer to obtain position coding vectors of all the words, and adding the position coding vectors of all the words and the first word vector to obtain second word vectors which correspond to all the words in the abstract and contain position information;
(7-2) inputting second word vectors which correspond to all the words in the abstract obtained in the step (7-1) and contain position information into the multi-head self-attention layer to obtain an output result of the multi-head self-attention layer;
(7-3) processing the multi-head self-attention layer output result obtained in the step (7-2) by using a mask mechanism to obtain a processed multi-head self-attention layer output result;
wherein the mask matrix mask is a lower triangular matrix with a size of (y−1) × (y−1), wherein y is the total number of words in the abstract Y, and:
mask[i][j] = 1 if j ≤ i, and mask[i][j] = 0 if j > i, for i, j = 1, …, y−1.
and (7-4) inputting the article semantic vector obtained in the step (6) and the multi-head output result from the attention layer processed in the step (7-3) into a time penalty attention layer of a decoder together to obtain a context matrix containing article information and generated abstract words.
(7-5) inputting the context matrix obtained in the step (7-4) into a position feedforward network of a digest-level encoder to obtain a plurality of decoded words, mapping the decoded words onto a vocabulary through a full connection layer of a decoder to obtain the probability distribution of each decoded word in the vocabulary, and obtaining the probability that the decoded word is a real digest word according to the probability distribution;
specifically, this step first calculates the probability distribution of each decoded word in the vocabulary:
P_vocab = W_V(FFN(C))
wherein W_V is the weight of the full connection layer;
then, according to the probability distribution P_vocab, the probability that each decoded word is the real abstract word is calculated:
P(y*_t) = P_vocab(y*_t)
wherein y*_t represents the real abstract word corresponding to the t-th decoded word;
(7-6) calculating a loss value according to the probability that the decoded word obtained in the step (7-5) is the real abstract word:
loss = −(1/T) · Σ_{t=1}^{T} log P(y*_t)
wherein T represents the total number of the decoded words obtained in the step (7-5), and the value of T is y−1.
10. The method according to claim 9, wherein the step (7-4) is specifically that, when the t-th abstract word is generated, the attention distribution of the article by the abstract word is calculated in a manner that:
first, the attention distribution over the article is calculated:
attention_t = softmax(e_t)
wherein
e_t = V_v · tanh(W_h · mul_output[t] + W_e · enc_output^T)
wherein mul_output[t] represents the t-th row element of the lower triangular matrix obtained in step (7-3), enc_output represents the article semantic vector obtained in the step (6), T represents the transposition operation, W_h, W_e and V_v are all weights of the linear mapping operation, and tanh is the activation function:
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
finally, the attention distribution is multiplied by the article semantic vector obtained in the step (6) to obtain a context matrix containing the article information and the generated abstract words.
CN201911293797.8A 2019-12-16 2019-12-16 Method for generating abstract based on attention mechanism Active CN111061862B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911293797.8A CN111061862B (en) 2019-12-16 2019-12-16 Method for generating abstract based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911293797.8A CN111061862B (en) 2019-12-16 2019-12-16 Method for generating abstract based on attention mechanism

Publications (2)

Publication Number Publication Date
CN111061862A true CN111061862A (en) 2020-04-24
CN111061862B CN111061862B (en) 2020-12-15

Family

ID=70301924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911293797.8A Active CN111061862B (en) 2019-12-16 2019-12-16 Method for generating abstract based on attention mechanism

Country Status (1)

Country Link
CN (1) CN111061862B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9535899B2 (en) * 2013-02-20 2017-01-03 International Business Machines Corporation Automatic semantic rating and abstraction of literature
CN108319668A (en) * 2018-01-23 2018-07-24 义语智能科技(上海)有限公司 Generate the method and apparatus of text snippet
CN110472238A (en) * 2019-07-25 2019-11-19 昆明理工大学 Text snippet method based on level interaction attention

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
VASWANI A et al.: "Attention Is All You Need", 31st Conference on Neural Information Processing Systems (NIPS 2017) *
郭洪杰: "Research on Generative Automatic Summarization Technology Based on Deep Learning" (基于深度学习的生成式自动摘要技术研究), Wanfang Data (万方数据) *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111831814B (en) * 2020-06-04 2023-06-23 北京百度网讯科技有限公司 Pre-training method and device for abstract generation model, electronic equipment and storage medium
CN111831814A (en) * 2020-06-04 2020-10-27 北京百度网讯科技有限公司 Pre-training method and device of abstract generation model, electronic equipment and storage medium
CN113824624B (en) * 2020-06-19 2023-10-17 阿里巴巴集团控股有限公司 Training method of mail header generation model and mail header generation method
CN113824624A (en) * 2020-06-19 2021-12-21 阿里巴巴集团控股有限公司 Training method of mail title generation model and mail title generation method
CN111966820A (en) * 2020-07-21 2020-11-20 西北工业大学 Method and system for constructing and extracting generative abstract model
CN112417865A (en) * 2020-12-02 2021-02-26 中山大学 Abstract extraction method and system based on dynamic fusion of articles and titles
CN112668305B (en) * 2020-12-03 2024-02-09 华中科技大学 Attention mechanism-based thesis reference quantity prediction method and system
CN112668305A (en) * 2020-12-03 2021-04-16 华中科技大学 Paper quote amount prediction method and system based on attention mechanism
CN112749253B (en) * 2020-12-28 2022-04-05 湖南大学 Multi-text abstract generation method based on text relation graph
CN112749253A (en) * 2020-12-28 2021-05-04 湖南大学 Multi-text abstract generation method based on text relation graph
JP2022115160A (en) * 2021-01-28 2022-08-09 ヤフー株式会社 Information processing device, information processing system, information processing method, and program
JP7287992B2 (en) 2021-01-28 2023-06-06 ヤフー株式会社 Information processing device, information processing system, information processing method, and program
CN113221967A (en) * 2021-04-23 2021-08-06 中国农业大学 Feature extraction method, feature extraction device, electronic equipment and storage medium
CN113221967B (en) * 2021-04-23 2023-11-24 中国农业大学 Feature extraction method, device, electronic equipment and storage medium
WO2022228127A1 (en) * 2021-04-29 2022-11-03 京东科技控股股份有限公司 Element text processing method and apparatus, electronic device, and storage medium
CN113537459A (en) * 2021-06-28 2021-10-22 淮阴工学院 Method for predicting humiture of drug storage room
CN113537459B (en) * 2021-06-28 2024-04-26 淮阴工学院 Drug warehouse temperature and humidity prediction method
CN114169312A (en) * 2021-12-08 2022-03-11 湘潭大学 Two-stage hybrid automatic summarization method for judicial official documents
CN114997143A (en) * 2022-08-04 2022-09-02 北京澜舟科技有限公司 Text generation model training method and system, text generation method and storage medium

Also Published As

Publication number Publication date
CN111061862B (en) 2020-12-15

Similar Documents

Publication Publication Date Title
CN111061862B (en) Method for generating abstract based on attention mechanism
Al-Sabahi et al. A hierarchical structured self-attentive model for extractive document summarization (HSSAS)
Tunstall et al. Natural language processing with transformers
CN110413986B (en) Text clustering multi-document automatic summarization method and system for improving word vector model
CN110390103B (en) Automatic short text summarization method and system based on double encoders
CN109214003B (en) The method that Recognition with Recurrent Neural Network based on multilayer attention mechanism generates title
Dashtipour et al. Exploiting deep learning for Persian sentiment analysis
CN107273913B (en) Short text similarity calculation method based on multi-feature fusion
CN110597961B (en) Text category labeling method and device, electronic equipment and storage medium
CN112989834A (en) Named entity identification method and system based on flat grid enhanced linear converter
CN110619043A (en) Automatic text abstract generation method based on dynamic word vector
EP3732592A1 (en) Intelligent routing services and systems
Huang et al. Character-level convolutional network for text classification applied to chinese corpus
Magdum et al. A survey on deep learning-based automatic text summarization models
CN110807326A (en) Short text keyword extraction method combining GPU-DMM and text features
CN113609840B (en) Chinese law judgment abstract generation method and system
CN114281982B (en) Book propaganda abstract generation method and system adopting multi-mode fusion technology
CN116737938A (en) Fine granularity emotion detection method and device based on fine tuning large model online data network
Dong et al. Neural question generation with semantics of question type
Zhuang et al. An ensemble approach to conversation generation
Hsu et al. Prompt-learning for cross-lingual relation extraction
Rabut et al. Multi-class document classification using improved word embeddings
CN113641789B (en) Viewpoint retrieval method and system based on hierarchical fusion multi-head attention network and convolution network
Ramesh et al. Abstractive text summarization using t5 architecture
CN110275957B (en) Name disambiguation method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant