CN111723196A - Single document abstract generation model construction method and device based on multi-task learning
- Publication number: CN111723196A (application CN202010435810.5A)
- Authority: CN (China)
- Prior art keywords: word, sentence, abstract, text, single document
- Legal status: Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
Abstract
The invention discloses a method and a device for constructing a single-document abstract generation model based on multi-task learning. The model adds a further encoder and a classifier at the output of the encoder, so that the two encoders form a word-level encoder and a sentence-level encoder and multi-task training becomes possible. The attention distribution of the decoder is made more reasonable: a regularization term computed from the decoder attention weights and the sentence importance weights is added, so that sentences better suited to serve as the abstract are assigned more weight. The potential of the model is exploited through different tasks, improving the readability and the information content of the generated abstract sentences.
Description
Technical Field
The invention relates to a method and a device for constructing a single document abstract generating model, in particular to a method and a device for constructing a single document abstract generating model based on multi-task learning.
Background
With the arrival of the big data era, the amount of text information available to people has grown rapidly, and the vigorous development of the Internet has further widened the channels for obtaining novel and varied text content. News media, social networks and the like output massive amounts of text every day, which inevitably leads to information overload. Moreover, much of this text is low in novelty and high in redundancy, and with limited effort people can neither process it all nor pick out the valuable information of personal interest from such a large volume of text. Extracting concise, clear and important information from text for readers to consult is therefore an urgent need. Writing abstracts for articles by hand, however, is tedious and inefficient and wastes labor cost, which is precisely where the value of the automatic text summarization task lies. Automatic text summarization technology can efficiently and accurately condense an original document and summarize the information it contains.
Automatic text summarization techniques can be divided, according to the type of the source document taken as input, into single-document summarization and multi-document summarization; according to how the summary sentences are generated, they can be divided into extractive summarization and abstractive summarization. In recent years, as deep learning techniques have developed further, deep learning models have been widely adopted for automatic summarization tasks. Extractive and abstractive summarization models can conveniently be fused into a comprehensive deep-learning-based model, which further improves document summarization.
The "two-stage summarization model" in the prior art needs an extractive model with good ability to identify and encode key information, otherwise important information is lost at the extraction stage. In addition, the abstractive model is required to have good sentence compression capability. Moreover, the extraction operation is non-differentiable, so gradients cannot be propagated back from the output of the abstractive model to the extractive model, and the extractive and abstractive models therefore cannot be trained jointly.
Disclosure of Invention
The invention aims to provide a method and a device for constructing a single-document abstract generation model based on multi-task learning, in order to solve the problems of unreasonable attention distribution, poor model generalization, poor readability of the generated abstract sentences and low information content that affect the single-document abstract generation methods in the prior art.
In order to realize the task, the invention adopts the following technical scheme:
Compared with the prior art, the invention has the following technical effects:
1. The model structure designed in the method and device for constructing a single-document abstract generation model based on multi-task learning adopts a basic Transformer model, with an additional encoder and a classifier attached to the output of the encoder, so that the two encoders form a word-level encoder and a sentence-level encoder and multi-task training becomes possible. The attention distribution of the decoder is made more reasonable: a regularization term computed from the decoder attention weights and the sentence importance weights is added, so that sentences better suited to serve as the abstract are assigned more weight;
2. The method and device for constructing a single-document abstract generation model based on multi-task learning change the traditional training mode: multiple training objectives are used to train the same model. The final model is still an abstractive automatic summarization model; the extractive summarization task is added only to improve training, and the potential of the model is mined through the different tasks, thereby improving the readability and the information content of the generated abstract sentences.
Drawings
FIG. 1 is a schematic diagram of a conventional Transformer model in the prior art;
FIG. 2 is a structural diagram of a single document abstract generation model based on multi-task learning according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and examples, so that those skilled in the art can better understand the present invention. It is to be expressly noted that in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the subject matter of the present invention.
The following definitions or conceptual connotations relating to the present invention are provided for illustration:
Embedded representation: since text cannot be processed directly by a computer, a feature representation of the text has to be found. The feature vector corresponding to each word is looked up via a table index, i.e. words from a high-dimensional discrete space are embedded into a continuous low-dimensional vector space; this is also called word embedding. The feature representation obtained for a word is its embedded representation.
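As a minimal sketch of this table-lookup embedding (the vocabulary size, embedding dimension and example indices below are illustrative assumptions, not values taken from the patent):

```python
import torch
import torch.nn as nn

# Hypothetical sizes, for illustration only.
vocab_size, d_model = 30000, 512

# The index dictionary maps each word to a row of this table;
# looking up an index returns the word's embedded representation.
embedding = nn.Embedding(vocab_size, d_model)

word_indices = torch.tensor([[101, 2563, 2387, 102]])   # one tokenized sentence
word_embeddings = embedding(word_indices)               # shape: [1, 4, d_model]
```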
Transformer encoder: a Transformer encoder is obtained by connecting in series, in order, a multi-head self-attention module, a residual connection and layer normalization module, a position-wise feed-forward module, and another residual connection and layer normalization module.
Transformer decoder: a Transformer decoder is obtained by connecting in series, in order, a multi-head self-attention module, a residual connection and layer normalization module, a multi-head external attention module, a residual connection and layer normalization module, a position-wise feed-forward module, and another residual connection and layer normalization module.
Label smoothing loss function: label smoothing regularization is used to alleviate the overfitting that arises when the cross-entropy loss is computed against one-hot vector labels in classification problems. Label smoothing regularization smooths the one-hot label vector and then computes the cross entropy.
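A minimal sketch of such a label-smoothed cross entropy, assuming an illustrative smoothing factor (recent PyTorch versions also expose a label_smoothing argument on CrossEntropyLoss):

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, target, smoothing=0.1):
    """Cross entropy against a smoothed one-hot target distribution.

    logits: [batch, vocab_size]; target: [batch] integer word labels.
    The smoothing value 0.1 is an assumed example.
    """
    vocab_size = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    # Smooth the one-hot vector: the correct class gets 1 - smoothing,
    # the remaining mass is spread uniformly over the other classes.
    smooth_target = torch.full_like(log_probs, smoothing / (vocab_size - 1))
    smooth_target.scatter_(1, target.unsqueeze(1), 1.0 - smoothing)
    return -(smooth_target * log_probs).sum(dim=-1).mean()
```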
KL divergence: the KL divergence (Kullback-Leibler divergence), also called relative entropy, is used to quantify the difference between two probability distributions.
Focal Loss function: Focal Loss was first applied to the object detection task in computer vision, where negative examples vastly outnumber positive examples and many samples are easy to classify. Focal Loss increases the weight of hard-to-classify positive examples and reduces the weight of easily classified samples, so that model training focuses more on the hard samples.
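A minimal sketch of a binary Focal Loss over sentence labels; the focusing parameter and class-balancing weight are assumed example values:

```python
import torch

def focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Binary focal loss.

    probs:  [batch] predicted probability that a sentence belongs to the abstract.
    labels: [batch] 0/1 sentence labels (cf. "label two" in the example below).
    gamma and alpha are the usual focusing / balancing factors (assumed values).
    """
    p_t = torch.where(labels == 1, probs, 1 - probs)
    alpha_t = torch.where(labels == 1, torch.full_like(probs, alpha),
                          torch.full_like(probs, 1 - alpha))
    # Down-weight easy samples (p_t close to 1) and emphasise hard ones.
    return (-alpha_t * (1 - p_t) ** gamma * torch.log(p_t + 1e-8)).mean()
```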
Example one
The embodiment discloses a method for constructing a single document abstract generation model based on multi-task learning.
The method is executed according to the following steps:
step 1, obtaining a plurality of sections of texts to obtain a text data set; each piece of text comprises a plurality of sentences, each sentence comprising a plurality of words;
obtaining an abstract corresponding to each text segment to obtain a first label set;
obtaining the correlation between each section of text and the corresponding abstract, and obtaining a second label set;
Owing to the strong parallel capability of the model proposed in this patent, it should be able to process longer texts. After simple data preprocessing, the model can also be transferred to Chinese text summarization tasks, where it shows excellent summarization capability.
In this embodiment, the text is:
[-lrb- cnn -rrb- ahmed farouq did n't have the prestige of fellow al qaeda figure osama bin laden, the influence of anwar al-awlaki, or the notoriety of adam gadahn.
still, he was a big deal.
that's the assessment of multiple sources on a man who may not have been well-known in the west, but nonetheless had a special role in the terrorist group.
farouq -- an american -- died in a u.s. counterterrorism airstrike in january, according to the white house.
two al qaeda hostages, warren weinstein of the united states and giovanni lo porto from italy, were killed in the same strike, while gadahn died in another u.s. operation that month.
before that, farouq was the deputy emir of al qaeda in the indian subcontinent, or aqis, a branch of the islamist extremist group that formed in recent years.
the branch made its presence known in september 2014, when militants infiltrated pakistan's navy and tried to hijack one of its ships, according to the site institute, which monitors terror groups.
the group's spokesman, usama mahmoud, on twitter compared the pakistani naval officers involved in the attempted hijacking to nidal hasan, site reported.
hasan is the u.s. army psychiatrist sentenced to death for killing 13 people at fort hood, texas.
osama mehmood, a spokesman for al qaeda in the indian subcontinent, said that farouq and another top figure, qari abdullah mansur, were killed in a january 15 drone strike in pakistan's shawal valley.
they were senior al qaeda leaders, according to mehmood.
american mouthpiece for al qaeda killed.
cnn's sophia saifi contributed to this report.]
label one (abstract) is:
ahmed farouq was a leader in al qaeda's india branch. he was killed in a u.s. counterterrorism airstrike in january. like adam gadahn, farouq was american and part of al qaeda.
label two (sentence classification label) is:
[0,0,1,0,0,0,0,0,0,1,1,0,0]
step 2, preprocessing each section of text in the text data set to obtain the embedded representation of each word in each section of text;
adding the position of each word in the sentence as a position code into the embedded representation of each word, obtaining a new embedded representation of each word, and obtaining a training set;
in this embodiment, the whole segment of input text in the text data set is first divided into sentence levels to obtain a sequence of sentences { s }1,s2,…,sn}; then for each sentence SiDividing the word hierarchy to obtain a word sequence { wi1,wi2,…,wim}. Then, the coding at word level and the coding at sentence level are carried out on each sample, and then the samples are processed by a downstream classifier or decoder.
In this embodiment, S1,S2,……,SnRepresenting a divided sentence, S1Representing a first sentence, S2Representing a second sentence, SnRepresenting the nth sentence, wherein n is a positive integer, and S represents the sentence;
carrying out word hierarchy division on the ith sentence, wherein i is less than or equal to n, and wi11 st word, w, representing the ith sentencei22 nd word, w, representing the ith sentenceimThe mth word representing the ith sentence, m being a positive integer.
In order to construct a fixed vocabulary, this embodiment also counts the words appearing in the source texts of the original data set, selects a certain number of high-frequency words as the fixed vocabulary, indexes the words in the vocabulary, and builds an index dictionary; the index dictionary associates each word one-to-one with a vector.
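A sketch of how such a fixed vocabulary and index dictionary might be built; the special tokens and maximum vocabulary size are assumptions for illustration:

```python
from collections import Counter

def build_vocab(texts, max_size=30000):
    """Count word frequencies over the source texts and keep the most frequent ones."""
    counter = Counter(word for text in texts for word in text.split())
    # Reserved indices for special tokens (assumed convention, cf. the 1/2/3/101/102 ids below).
    itos = ["<pad>", "<bos>", "<eos>", "<q>", "<unk>"]
    itos += [w for w, _ in counter.most_common(max_size - len(itos))]
    stoi = {w: i for i, w in enumerate(itos)}          # the index dictionary
    return stoi, itos

def encode(text, stoi):
    """Map a text to the word indices of the fixed vocabulary."""
    return [stoi.get(w, stoi["<unk>"]) for w in text.split()]
```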
In this embodiment, each position has a fixed position code: position pos corresponds to a d_model-dimensional vector (the embedded representation of a word is also a d_model-dimensional vector, so the two vectors can be added).
The position code is computed with trigonometric functions, element by element, for the position-coding vector of position pos: the odd elements of the position-coding vector are calculated with a sine function and the even elements with a cosine function.
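A sketch of this fixed sinusoidal position code, following the standard Transformer formulation (counting elements from 1, the odd elements use sine and the even elements use cosine, matching the description above):

```python
import math
import torch

def positional_encoding(max_len, d_model):
    """Return a [max_len, d_model] matrix of fixed position codes (d_model assumed even)."""
    pe = torch.zeros(max_len, d_model)
    pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)          # positions pos
    div = torch.exp(torch.arange(0, d_model, 2).float()
                    * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(pos * div)   # 1st, 3rd, 5th, ... elements: sine
    pe[:, 1::2] = torch.cos(pos * div)   # 2nd, 4th, 6th, ... elements: cosine
    return pe

# usage: x = word_embeddings + positional_encoding(seq_len, d_model)
```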
In this embodiment, the source text is:
['england saw their champions trophy title hopes extinguished by germany after suffering a 2-0 quarter-final defeat at the tournament in india.',
'olympic gold medallists germany scored late in the second and fourth quarters, through moritz furste and christopher ruhr, to set up a semi-final against australia in bhubaneswar on saturday.',
"bobby crutchley's england side had decent second-half chances to get back on level terms, but an equaliser was beyond them, and ruhr killed off the match with a 57th-minute close-range finish.",
"german player benedikt furk dives to stop the ball as england's nick catlin watches on",
'alexandre de paeuw of germany is challenged by ashley jackson of england',
'germany took the lead on the stroke of half-time when furste struck home into the top right of the net from a penalty corner.',
"they had an opportunity to double their advantage midway through the third quarter but pilt arnold's driven cross from the left sped across goal, with mats grambusch unable to cash in.",
'england looked like levelling in the 49th minute but amid a scramble somehow germany kept the ball from crossing the line, with goalkeeper nicolas jacobi saving from barry middleton and adam dixon also lurking.',
"germany's tobias hauke dribbles past england's ashley jackson at kalinga stadium",
"ashley jackson could not drill in from england's first penalty corner, with jacobi making a solid save, which he repeated later on in the match.",
'england were pressing germany hard but in the 58th minute their hopes were finally dashed.',
'a sharp turn from grambusch bought him space to send in a cross that ruhr converted, skilfully lifting the ball high into the net.']
index of source text word:
[101,2563,2387,2037,3966,5384,2516,8069,27705,2011,2762,2044,6114,1037,1016,29624,2692,4284,29624,16294,2389,4154,2012,1996,2977,1999,2634,1012,102,101,4386,2751,28595,2015,2762,3195,2397,1999,1996,2117,1998,2959,7728,1010,2083,28461,6519,13473,1998,5696,21766,8093,1010,2000,2275,2039,1037,4100,29624,16294,2389,2114,2660,1999,1038,6979,27543,26760,2906,2006,5095,1012,102,101,6173,13675,4904,2818,3051,1005,2015,2563,2217,2018,11519,2117,29624,8865,2546,9592,2000,2131,2067,2006,2504,3408,1010,2021,2019,5020,17288,2001,3458,2068,1010,1998,21766,8093,2730,2125,1996,2674,2007,1037,28623,29624,10020,10421,2485,29624,24388,2063,3926,1012,102,101,2446,2447,3841,2098,5480,2102,6519,2243,11529,2015,2000,2644,1996,3608,2004,2563,1005,2015,4172,4937,4115,12197,2006,102,101,16971,2139,6643,13765,2860,1997,2762,2003,8315,2011,9321,4027,1997,2563,102,101,2762,2165,1996,2599,2006,1996,6909,1997,2431,29624,7292,2043,6519,13473,4930,2188,2046,1996,2327,2157,1997,1996,5658,2013,1037,6531,3420,1012,102,101,2027,2018,2019,4495,2000,3313,2037,5056,12213,2083,1996,2353,4284,2021,14255,7096,7779,1005,2015,5533,2892,2013,1996,2187,16887,2408,3125,1010,2007,22281,13250,8286,2818,4039,2000,5356,1999,1012,102,101,2563,2246,2066,2504,2989,1999,1996,25726,3371,2021,13463,1037,25740,5064,2762,2921,1996,3608,2013,5153,1996,2240,1010,2007,9653,9473,6213,2072,7494,2013,6287,17756,1998,4205,11357,2036,24261,1012,102,101,2762,1005,2015,16858,5292,15851,2852,12322,13510,2627,2563,1005,2015,9321,4027,2012,19924,13807,3346,102,101,9321,4027,2071,2025,12913,1999,2013,2563,1005,2015,2034,6531,3420,1010,2007,6213,2072,2437,1037,5024,3828,1010,2029,2002,5567,2101,2006,1999,1996,2674,1012,102,101,2563,2020,7827,2762,2524,2021,1999,1996,5388,2705,3371,2037,8069,2020,2633,18198,1012,102,101,1037,4629,2735,2013,13250,8286,2818,4149,2032,2686,2000,4604,1999,1037,2892,2008,21766,8093,4991,1010,8301,10270,18083,2100,8783,1996,3608,2152,2046,1996,5658,1012,102]
The target abstract is as follows:
"olympic gold medallists germany scored late in the second and fourth quarters <q> bobby crutchley's england side had decent second-half chance <q> moritz furste and christopher ruhr scored for germany"
index of target summary words:
[1,4386,2751,28595,2015,2762,3195,2397,1999,1996,2117,1998,2959,7728,3,6173,13675,4904,2818,3051,1005,2015,2563,2217,2018,11519,2117,29624,8865,2546,3382,3,28461,6519,13473,1998,5696,21766,8093,3195,2005,2762,2]
position of each sentence in the source text:
[0,29,73,125,150,166,196,236,276,297,330,349]
the classification labels of sentences in the source text are:
[0,1,0,0,0,0,0,0,0,0,0,0]
the embedding of the source text is represented as:
tensor([[-0.4228,-0.0648,1.7755,...,-0.0450,-0.1446,0.0397],
[-1.5119,1.5509,0.8044,...,0.1634,-0.2404,-0.0408],
[-1.3100,-0.2735,-0.6489,...,0.7900,-0.4179,0.7393],
...,
[0.0259,1.4194,1.1651,...,-0.0327,0.6284,-0.1963],
[1.3589,1.0458,1.0282,...,0.5275,-1.5156,1.5691],
[0.3824,-0.3591,-2.2065,...,0.7527,-0.3730,0.6119]],
device='cuda:0', grad_fn=<AddcmulBackward>), i.e. a tensor of shape [text sequence length, hidden vector dimension].
Step 3, taking the training set as input, taking the first label set and the second label set as reference output, and training a neural network;
the neural network comprises a word coding network, a sentence coding network, a decoding network, a full connection layer and an output layer which are arranged in sequence;
the output of the sentence coding network is also connected with a classifier;
the word coding network comprises a plurality of Transformer encoders connected in series;
the sentence coding network comprises a plurality of Transformer encoders connected in series;
the decoding network comprises a plurality of Transformer decoders connected in series;
and obtaining a single document abstract generating model.
In this embodiment, the Transformer unit is used to implement the hierarchical encoder and the decoder in the model. The Transformer proposed in the paper "Attention Is All You Need" is a complete encoder-decoder framework; the present invention improves the original Transformer so that the encoder part can encode the source text hierarchically. The specific structure of the conventional Transformer model is shown in fig. 1.
In this embodiment, FIG. 1 shows the internal details of the Transformer model.
After the Transformer model produces the embedded representation of the input text and adds the fixed position codes of the words at their different positions, the input can be sent to the Transformer encoder for encoding.
The encoder contains a multi-head self-attention module that performs the attention calculation on the input text. The residual connection adds the result of the attention calculation to the embedded representation of the input text. Layer normalization is then carried out, i.e. the data are normalized along the feature dimension; concretely, the mean over the feature dimension is subtracted and the result is divided by the standard deviation over the feature dimension.
The feed-forward fully connected network in this block performs two linear mappings: the feature dimension is first mapped to a higher dimension and then compressed back to the original dimension. Residual connection and layer normalization are applied similarly afterwards.
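A compact sketch of one such encoder block (self-attention, residual connection and layer normalization, position-wise feed-forward, another residual connection and normalization); the hyper-parameters are illustrative, and PyTorch's nn.TransformerEncoderLayer offers the same structure ready-made:

```python
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout)
        self.norm1 = nn.LayerNorm(d_model)
        # Position-wise feed-forward: expand to a higher dimension, then compress back.
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                      # x: [seq_len, batch, d_model]
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + attn_out)           # residual connection + layer normalization
        x = self.norm2(x + self.ff(x))         # residual connection + layer normalization
        return x
```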
On the decoder side, after the output text is embedded and the fixed position codes of the words at their different positions are added, it can be sent to the Transformer decoder for decoding.
Inside the decoder, a multi-head self-attention module performs the attention calculation on the output text; residual connection and layer normalization are the same as in the encoder.
Unlike the encoder, the decoder adds a multi-head external attention module in order to accept information from the encoder. Likewise, residual connection and layer normalization are performed after the multi-head external attention layer.
The feed-forward fully connected network again performs two linear mappings, i.e. the feature dimension is mapped to a higher dimension and then compressed back to the original dimension, followed by residual connection and layer normalization.
The output of the decoder is mapped to the vocabulary dimension through the fully connected layer; the activation function then yields the probability distribution over the vocabulary at the current decoding step, and sampling from this distribution gives the current decoded output.
The single document abstract generation model provided by the invention, as shown in fig. 2, comprises a hierarchical encoder (a word encoding network and a sentence encoding network), a decoding network and a classifier. The specific structure is shown in fig. 2 (the internal structure of the Transformer modules is omitted for simplicity).
In the structural diagram of the generative model provided by the present invention in fig. 2, the structures of the Transformer word encoder and the Transformer sentence encoder are the same as that of the Transformer encoder in fig. 1; the Transformer decoder in fig. 2 is likewise the same as the Transformer decoder in fig. 1.
After the fixed position code is added to the embedded representation of the input text, it is encoded by the Transformer word encoder to obtain the word codes. Attention is then computed over the word codes of each sentence and the sequence dimension is compressed to 1, yielding the sentence code. The sentence codes are encoded by the Transformer sentence encoder to obtain the feature representations of the sentences.
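A sketch of this word-to-sentence hierarchical encoding; the learned query vector, module names (Enc_word, Attn_sent, Enc_sent) and shapes are assumptions consistent with the description rather than the patent's literal implementation:

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_layers=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads)
        self.enc_word = nn.TransformerEncoder(layer, n_layers)       # word-level encoder
        self.enc_sent = nn.TransformerEncoder(layer, n_layers)       # sentence-level encoder
        self.attn_sent = nn.MultiheadAttention(d_model, n_heads)     # pools word codes into one vector
        self.query = nn.Parameter(torch.randn(1, 1, d_model))        # learned query q

    def forward(self, sent_word_embs):
        # sent_word_embs: list of [n_words_i, d_model] word embeddings (+ position codes), one per sentence
        sent_vecs = []
        for words in sent_word_embs:
            e = self.enc_word(words.unsqueeze(1))            # word codes: [n_words_i, 1, d_model]
            s, _ = self.attn_sent(self.query, e, e)          # attention-pool words into 1 sentence code
            sent_vecs.append(s)
        sents = torch.cat(sent_vecs, dim=0)                  # [n_sentences, 1, d_model]
        return self.enc_sent(sents)                          # encoded sentence representations
```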
In this embodiment, the feature representations of the sentences are, on the one hand, sent to the classifier to obtain the classification result of the extractive summarization task; on the other hand, they are passed to the Transformer decoder for the calculation of the external attention.
The classifier internally consists of a linear fully connected layer and an activation function layer: the fully connected layer compresses the feature dimension and the activation function layer produces the probability distribution over the classification labels, from which the classification result of the extractive summarization task is obtained.
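A sketch of such a classifier head over the sentence representations, with the hidden dimension assumed for illustration:

```python
import torch.nn as nn

class SentenceClassifier(nn.Module):
    """Predicts, for each sentence, the probability that it belongs to the abstract."""
    def __init__(self, d_model=512):
        super().__init__()
        self.proj = nn.Linear(d_model, 2)          # compress the hidden dimension to two classes
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, sent_repr):                  # sent_repr: [n_sentences, d_model]
        return self.softmax(self.proj(sent_repr))  # [n_sentences, 2] label distribution
```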
The structure and function of the Transformer decoder in fig. 2 are exactly the same as those of the Transformer decoder in fig. 1, so a detailed description is omitted.
In the hierarchical encoder (word encoding network and sentence encoding network), let a sample's source document be d = {s_1, s_2, ..., s_n} and sentence s_i = {w_i1, w_i2, ..., w_im}, where w_ij is the j-th word of the i-th sentence. Enc_word denotes the word encoding network and Enc_sent denotes the sentence encoding network.
The representations of the words in each sentence are obtained with the word encoding network:
(e_i1, e_i2, ..., e_im) = Enc_word(w_i1, w_i2, ..., w_im)    (1)
given a vector q as a query, EncwordThe output word representation is used as key and value, and the multi-head attention layer is sent to obtain the uncoded sentence representationThis layer is denoted as Attnsent。
The sentence code is sent into a sentence code network to obtain the vector representation of the sentence after being coded
In decoding networks, the partial sequences already generated are knownThen the decoding time at time t will be based on the partially decoded sequenceAnd encoded sentence vector representationAnd decoding is carried out.
First, by self-attention layer pairCoding is carried out to obtain a coding vector of a decoding partial sequence
Then encoded by an external attention layer toAs a query, withAs keys and values, attention vectors are calculated.
By means of a generator, encoding the vectorThe vector dimension is mapped to the word list dimension, and then the probability distribution on the word list is obtained through the softmax function(one dimension is a vector of word list sizes). According to probability distribution on word listSampling is carried out, and decoding output at the time t can be obtained.
The loss function uses the label smoothing loss, where ŷ^word is the predicted label of a word and y^word is the true label of the corresponding word in the target abstract:
L_ls = Label_Smoothing_Loss(ŷ^word, y^word)
the external attention of the decoder is recorded, the attention distribution attndit is averaged over the sentences, and the KL divergence is calculated from the importance distribution ScoreDist of the sentences.
Lkl=KL_Divergence(AttnDist,ScoreDist) (8)
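A sketch of this regularization term, reading formula (8) as the KL divergence of AttnDist with respect to ScoreDist; how the two distributions are gathered upstream is assumed here:

```python
import torch

def attention_kl_regularizer(attn_dist, score_dist, eps=1e-8):
    """L_kl = KL(AttnDist || ScoreDist) over the sentences of one document.

    attn_dist:  [n_sentences] decoder external attention mass averaged per sentence.
    score_dist: [n_sentences] importance weights of the sentences.
    Both are normalized here so that they form valid probability distributions.
    """
    attn_dist = attn_dist / (attn_dist.sum() + eps)
    score_dist = score_dist / (score_dist.sum() + eps)
    return torch.sum(attn_dist * (torch.log(attn_dist + eps) - torch.log(score_dist + eps)))
```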
The classifier is implemented as an encoder whose basic structure is a Transformer encoder, denoted here Enc_cls. This layer accepts the output of the sentence encoder and performs self-attention calculations to obtain the sentence representation.
The hidden-layer dimension is compressed by a linear layer and then mapped into a binary probability distribution through a softmax function.
The loss function uses Focal Loss, computed according to the standard Focal Loss formula, where ŷ^sent is the predicted label of a sentence and y^sent is the true sentence label:
L_fl = Focal_Loss(ŷ^sent, y^sent)
Optionally, when the neural network is trained in step 3, a multi-task learning method is adopted;
wherein the loss function is L = L_ls + γ·L_fl + λ·L_kl, where γ and λ are both weight parameters greater than 0; L_ls = Label_Smoothing_Loss(ŷ^word, y^word), where ŷ^word denotes the predicted label of a word, y^word denotes the true label of the word, and Label_Smoothing_Loss() denotes the label smoothing loss function; L_fl = Focal_Loss(ŷ^sent, y^sent), where ŷ^sent denotes the predicted label of a sentence, y^sent denotes the true label of the sentence, and Focal_Loss() denotes the Focal Loss function; L_kl = KL_Divergence(AttnDist, ScoreDist), where AttnDist denotes the average attention distribution over sentences, ScoreDist denotes the importance distribution of the sentences, and KL_Divergence() denotes the KL divergence calculation.
The automatic summarization model is mainly trained with Multi-Task Learning (MTL): an extractive summarization task is added on top of the traditional abstractive summarization model as an auxiliary training objective, so that the model gains generalization ability.
The loss function is L = L_ls + γ·L_fl + λ·L_kl, where γ and λ are two hyper-parameters that have to be set manually; they act as weights limiting the classification task and the constraint term, and both γ and λ are greater than 0.
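A sketch of how the three terms might be combined during multi-task training, reusing the loss functions sketched above; the values of γ and λ are assumed examples:

```python
def multi_task_loss(word_logits, word_targets,
                    sent_probs, sent_labels,
                    attn_dist, score_dist,
                    gamma=0.5, lam=0.1):
    """L = L_ls + gamma * L_fl + lambda * L_kl  (gamma, lam > 0 are assumed example values)."""
    l_ls = label_smoothing_loss(word_logits, word_targets)      # abstractive (word) objective
    l_fl = focal_loss(sent_probs[:, 1], sent_labels)            # extractive (sentence) objective
    l_kl = attention_kl_regularizer(attn_dist, score_dist)      # attention regularization term
    return l_ls + gamma * l_fl + lam * l_kl
```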
In the method for constructing a single-document abstract generation model based on multi-task learning in this embodiment, an extractive model is used as an Auxiliary Objective for joint training, so that the encoder of the generative model gains generalization capability and the external attention module (target-to-source attention) of the decoder assigns more attention weight to the key sentences.
Example two
The method for generating the single document abstract based on the multitask learning is characterized in that a single document to be abstracted is input into a single document abstract generating model obtained by the method for constructing the single document abstract generating model based on the multitask learning in the first embodiment, and an abstract is obtained.
In the present embodiment, the commonly used evaluation metric for the text summarization task is ROUGE (Recall-Oriented Understudy for Gisting Evaluation). The ROUGE metric is widely used in automatic document summarization and machine translation; by comparing the generated text with the target sample in the data set, it provides a measure of the similarity between the two. The ROUGE criteria are divided into two categories: ROUGE-N and ROUGE-L.
After reading the source document, linguistic experts generally summarize and compile the reference abstract of the source document manually; the abstract ŷ generated by the model is then compared with the standard abstract y, and the number of co-occurring N-grams in the two is counted as the metric of text summary quality. Comparing against the manual abstracts of multiple experts improves the reliability of the evaluation.
The N in ROUGE-N indicates that the statistical unit is the N-gram, and N can take values such as 1 and 2. ROUGE-1 calculates the overlap rate of unigrams between the target abstract and the generated abstract; ROUGE-2 calculates the overlap rate of bigrams between the target abstract and the generated abstract, i.e. the ratio of matched N-grams to the total number of N-grams in the reference abstract.
the ROUGE-L calculates the Longest Common subsequence (Longest Common Sub-Sequence) between the standard digest and the generated digest, and the calculation formula is defined as follows: the recall rate isWith a precision ratio ofF1 score of
Traditional algorithms often train a recurrent neural network with an objective function of a generative summarization task to achieve summarization. The multitask learning single document abstract generation model construction device provided by the invention is obviously different from the traditional method.
Firstly, the invention chooses to use Transformer modules to construct the model. The multi-head attention mechanism in the Transformer module can not only be computed in parallel but also capture long-distance dependencies and encode the context better. This improves the overall computational efficiency of the model as well as the accuracy with which the input text is processed.
Secondly, the invention replaces a single training objective with a multi-task learning training mode. The extractive summarization task, related to the abstractive summarization task (the main task), is used as an auxiliary task for training the model, improving its generalization capability so that the method suits a wider variety of input texts and ensures the quality of the summarization result.
Thirdly, a constraint term is added to the training objective, relating the decoder's attention distribution to the importance of the sentences. This also heuristically makes the model pay more attention to the important sentences during decoding, which can increase the amount of salient information in the summarization result.
EXAMPLE III
The embodiment provides a single document abstract generation model construction device based on multitask learning, which comprises a data acquisition module, a preprocessing module and a model construction module;
the data acquisition module is used for acquiring a plurality of sections of texts and acquiring a text data set; each piece of text comprises a plurality of sentences, each sentence comprising a plurality of words;
obtaining an abstract corresponding to each text segment to obtain a first label set;
obtaining the correlation between each section of text and the corresponding abstract, and obtaining a second label set;
the preprocessing module is used for preprocessing each section of text data in the text data set to obtain the embedded representation of each word in each section of text;
adding the position of each word in the sentence as a position code into the embedded representation of each word, obtaining a new embedded representation of each word, and obtaining a training set;
the model building module is used for taking the training set as input, taking the first label set and the second label set as reference output and training the neural network;
the neural network comprises a word coding network, a sentence coding network, a decoding network, a full connection layer and an output layer which are arranged in sequence;
the output of the sentence coding network is also connected with a classifier;
the word coding network comprises a plurality of Transformer encoders connected in series;
the sentence coding network comprises a plurality of Transformer encoders connected in series;
the decoding network comprises a plurality of Transformer decoders connected in series;
and obtaining a single document abstract generating model.
Optionally, when the model building module trains the neural network, a multi-task learning method is adopted;
wherein the loss function is L = L_ls + γ·L_fl + λ·L_kl, where γ and λ are both weight parameters greater than 0; L_ls = Label_Smoothing_Loss(ŷ^word, y^word), where ŷ^word denotes the predicted label of a word, y^word denotes the true label of the word, and Label_Smoothing_Loss() denotes the label smoothing loss function; L_fl = Focal_Loss(ŷ^sent, y^sent), where ŷ^sent denotes the predicted label of a sentence, y^sent denotes the true label of the sentence, and Focal_Loss() denotes the Focal Loss function; L_kl = KL_Divergence(AttnDist, ScoreDist), where AttnDist denotes the average attention distribution over sentences, ScoreDist denotes the importance distribution of the sentences, and KL_Divergence() denotes the KL divergence calculation.
Example four
The embodiment provides a single document abstract generating device based on multitask learning, which is characterized by comprising a data acquisition module and an abstract generating module;
the data acquisition module is used for acquiring a single document of the abstract to be extracted;
the abstract generating module is used for inputting the single document to be abstracted into the single document abstract generating model obtained by the single document abstract generating model building device based on multi-task learning in the third embodiment to obtain the abstract.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus necessary general hardware, and certainly may also be implemented by hardware, but in many cases, the former is a better embodiment. Based on such understanding, the technical solutions of the present invention may be substantially implemented or a part of the technical solutions contributing to the prior art may be embodied in the form of a software product, where the computer software product is stored in a readable storage medium, such as a floppy disk, a hard disk, or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Claims (6)
1. A method for constructing a single document abstract generation model based on multitask learning is characterized by comprising the following steps:
step 1, obtaining a plurality of sections of texts to obtain a text data set; each text segment comprises a plurality of sentences, and each sentence comprises a plurality of words;
obtaining an abstract corresponding to each text segment to obtain a first label set;
obtaining the correlation between each section of text and the corresponding abstract, and obtaining a second label set;
step 2, preprocessing each section of text in the text data set to obtain the embedded representation of each word in each section of text;
adding the position of each word in the sentence as a position code into the embedded representation of each word, obtaining a new embedded representation of each word, and obtaining a training set;
step 3, taking the training set as input, taking the first label set and the second label set as reference output, and training a neural network;
the neural network comprises a word coding network, a sentence coding network, a decoding network, a full connection layer and an output layer which are arranged in sequence;
the output of the sentence coding network is also connected with a classifier;
the word coding network comprises a plurality of Transformer encoders connected in series;
the sentence coding network comprises a plurality of Transformer encoders connected in series;
the decoding network comprises a plurality of Transformer decoders connected in series;
and obtaining a single document abstract generating model.
2. The method for constructing a single document abstract generation model based on multitask learning according to claim 1, characterized in that when the neural network is trained in step 3, a multitask learning method is adopted;
wherein the loss function L during training is L = L_ls + γ·L_fl + λ·L_kl, where γ and λ are both weight parameters greater than 0; L_ls = Label_Smoothing_Loss(ŷ^word, y^word), where ŷ^word denotes the predicted label of a word, y^word denotes the true label of the word, and Label_Smoothing_Loss denotes the label smoothing loss function; L_fl = Focal_Loss(ŷ^sent, y^sent), where ŷ^sent denotes the predicted label of a sentence, y^sent denotes the true label of the sentence, and Focal_Loss denotes the Focal Loss function; L_kl = KL_Divergence(AttnDist, ScoreDist), where AttnDist denotes the average attention distribution over sentences, ScoreDist denotes the importance distribution of the sentences, and KL_Divergence denotes the KL divergence calculation.
3. A single document abstract generating method based on multitask learning is characterized in that a single document to be abstracted is input into a single document abstract generating model obtained by the single document abstract generating model building method based on multitask learning according to any one of claims 1-2, and an abstract is obtained.
4. A single document abstract generation model construction device based on multi-task learning is characterized by comprising a data acquisition module, a preprocessing module and a model construction module;
the data acquisition module is used for acquiring a plurality of sections of texts to acquire a text data set; each text segment comprises a plurality of sentences, and each sentence comprises a plurality of words;
obtaining an abstract corresponding to each text segment to obtain a first label set;
obtaining the correlation between each section of text and the corresponding abstract, and obtaining a second label set;
the preprocessing module is used for preprocessing each section of text data in the text data set to obtain the embedded representation of each word in each section of text;
adding the position of each word in the sentence as a position code into the embedded representation of each word, obtaining a new embedded representation of each word, and obtaining a training set;
the model building module is used for taking the training set as input, taking the first label set and the second label set as reference output and training a neural network;
the neural network comprises a word coding network, a sentence coding network, a decoding network, a full connection layer and an output layer which are arranged in sequence;
the output of the sentence coding network is also connected with a classifier;
the word coding network comprises a plurality of Transformer encoders connected in series;
the sentence coding network comprises a plurality of Transformer encoders connected in series;
the decoding network comprises a plurality of Transformer decoders connected in series;
and obtaining a single document abstract generating model.
5. The device for constructing the single-document abstract generation model based on the multitask learning as claimed in claim 4, wherein when the model construction module trains the neural network, a multitask learning method is adopted;
wherein the loss function L during training is L = L_ls + γ·L_fl + λ·L_kl, where γ and λ are both weight parameters greater than 0; L_ls = Label_Smoothing_Loss(ŷ^word, y^word), where ŷ^word denotes the predicted label of a word, y^word denotes the true label of the word, and Label_Smoothing_Loss denotes the label smoothing loss function; L_fl = Focal_Loss(ŷ^sent, y^sent), where ŷ^sent denotes the predicted label of a sentence, y^sent denotes the true label of the sentence, and Focal_Loss denotes the Focal Loss function; L_kl = KL_Divergence(AttnDist, ScoreDist), where AttnDist denotes the average attention distribution over sentences, ScoreDist denotes the importance distribution of the sentences, and KL_Divergence denotes the KL divergence calculation.
6. A single document abstract generating device based on multitask learning is characterized by comprising a data acquisition module and an abstract generating module;
the data acquisition module is used for acquiring a single document of the abstract to be extracted;
the abstract generating module is used for inputting a single document to be abstracted into a single document abstract generating model obtained by the single document abstract generating model building device based on multitask learning of any claim 4-5 to obtain an abstract.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202010435810.5A | 2020-05-21 | 2020-05-21 | Single document abstract generation model construction method and device based on multi-task learning |
Publications (2)

| Publication Number | Publication Date |
| --- | --- |
| CN111723196A | 2020-09-29 |
| CN111723196B | 2023-03-24 |
Legal Events

| Code | Title |
| --- | --- |
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |