CN111723196B - Single document abstract generation model construction method and device based on multi-task learning - Google Patents

Single document abstract generation model construction method and device based on multi-task learning

Info

Publication number
CN111723196B
CN111723196B
Authority
CN
China
Prior art keywords
word
sentence
abstract
text
single document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010435810.5A
Other languages
Chinese (zh)
Other versions
CN111723196A (en)
Inventor
蔡晓妍
刘森
戴航
杨黎斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202010435810.5A priority Critical patent/CN111723196B/en
Publication of CN111723196A publication Critical patent/CN111723196A/en
Application granted granted Critical
Publication of CN111723196B publication Critical patent/CN111723196B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34Browsing; Visualisation therefor
    • G06F16/345Summarisation for human users

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method and a device for constructing a single-document abstract generation model based on multi-task learning. The model takes a Transformer as its base, with an additional encoder and a classifier attached at the output of the encoder; the two encoders are thus divided into a word-level encoder and a sentence-level encoder, which makes multi-task training possible. The attention distribution of the decoder is made more reasonable by adding a regularization term computed from the decoder attention weights and the sentence importance weights, so that sentences better suited to serve as the abstract are assigned more weight; and the potential of the model is explored through different tasks, so that the readability and the information content of the generated abstract sentences are improved.

Description

Single document abstract generation model construction method and device based on multi-task learning
Technical Field
The invention relates to a method and a device for constructing a single document abstract generating model, in particular to a method and a device for constructing a single document abstract generating model based on multi-task learning.
Background
With the arrival of the big-data era, the amount of text information available to people has grown rapidly, and the vigorous development of the internet has widened the channels for obtaining novel and diverse text content. News media, social networks and the like output massive amounts of text information every day, which inevitably causes information overload. Moreover, the novel content in this text is limited and there is a large amount of redundancy; with limited effort, people can neither process it all nor pick out the valuable information of personal interest from such a large volume of text. Extracting concise and clear important information from text for readers' reference is therefore an urgent need. However, writing abstracts for articles by hand is tedious and inefficient and wastes labor, which is exactly why the text summarization task exists. Automatic text summarization technology can summarize an original document efficiently and accurately and condense the information it contains.
Automatic text summarization techniques can be divided, according to the type of the input source document, into single-document summarization and multi-document summarization; according to the way the summary sentences are produced, they can be divided into extractive summarization and generative (abstractive) summarization. In recent years, with the further development of deep learning, deep learning models have been widely adopted for automatic summarization tasks. Based on deep learning, the extractive and generative summarization models can be conveniently fused into a comprehensive model, which further improves document summarization.
The 'two-stage summarization model' in the prior art requires an extraction model with good ability to identify and encode key information, otherwise important information is lost in the extraction stage; in addition, the generative model must have good sentence compression ability. Moreover, the extraction operation is not differentiable, so the gradient cannot be propagated back from the result of the generative model to the extraction model, and the two models therefore cannot be trained jointly.
Disclosure of Invention
The invention aims to provide a method and a device for constructing a single-document abstract generation model based on multi-task learning, in order to solve the problems of prior-art single-document abstract generation methods: unreasonable attention distribution, poor model generalization, poor readability of the generated abstract sentences and the small amount of information they contain.
In order to realize the task, the invention adopts the following technical scheme:
compared with the prior art, the invention has the following technical effects:
1. The model structure designed in the method and device for constructing a single-document abstract generation model based on multi-task learning adopts a Transformer as the basic model, with an additional encoder and a classifier attached at the output of the encoder. The two encoders are thus divided into a word-level encoder and a sentence-level encoder, which makes multi-task training possible. The attention distribution of the decoder is made more reasonable: a regularization term computed from the decoder attention weights and the sentence importance weights is added, so that sentences better suited to serve as the abstract are assigned more weight;
2. The method and device for constructing a single-document abstract generation model based on multi-task learning change the traditional training mode in that several training objectives are used to train the same model. The final model is still a generative automatic summarization model; the extractive summarization task is added only to improve the training effect, and the model's potential is mined through different tasks, improving the readability and information content of the generated abstract sentences.
Drawings
FIG. 1 is a schematic diagram of a conventional Transformer model in the prior art;
fig. 2 is a schematic structural diagram of a single-document abstract generation model based on multi-task learning according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the drawings and examples, so that those skilled in the art can better understand it. It should be expressly noted that in the following description, detailed descriptions of known functions and designs are omitted where they would obscure the main content of the present invention.
The following definitions or conceptual connotations related to the present invention are provided for illustration:
Embedded representation: since text cannot be directly processed by a computer, a feature representation of the text must be found. The feature vector corresponding to each word is obtained by a table-lookup index, i.e., words from a high-dimensional discrete space are embedded into a continuous low-dimensional vector space, which is also called word embedding. The resulting feature representation of a word is its embedded representation.
Transformer encoder: the Transformer encoder is obtained by connecting in series a multi-head self-attention module, a residual connection and layer normalization module, a position-wise feed-forward module, and a further residual connection and layer normalization module.
A Transformer decoder: the Transformer decoder is obtained by connecting in series a multi-head self-attention module, a residual connection and layer normalization module, a multi-head external attention module, a residual connection and layer normalization module, a position-wise feed-forward module, and a further residual connection and layer normalization module.
Label_Smoothing loss function: Label Smoothing Regularization is used to alleviate the over-fitting caused when the cross-entropy loss is computed with one-hot vector labels in classification problems. Label smoothing regularization smooths the one-hot label vector and then computes the cross entropy.
KL divergence: the KL Divergence (Kullback-Leibler Divergence) is used to quantify the difference between two probability distributions, also called relative entropy.
Focal Loss function: Focal Loss was first applied to the object detection task in computer vision, where negative examples vastly outnumber positive examples and many samples are easy to classify. Focal Loss increases the weight of hard-to-classify samples and reduces the weight of easily classified samples, so that model training focuses more on the samples that are difficult to classify.
Example one
The embodiment discloses a method for constructing a single document abstract generation model based on multi-task learning.
The method is executed according to the following steps:
step 1, obtaining a plurality of sections of texts to obtain a text data set; each piece of text comprises a plurality of sentences, each sentence comprising a plurality of words;
obtaining an abstract corresponding to each text segment to obtain a first label set;
obtaining the correlation between each section of text and the corresponding abstract, and obtaining a second label set;
Due to the strong parallel processing capability of the model proposed in this patent, it should be able to handle longer texts. After simple data preprocessing, the proposed model can also be transferred to Chinese text summarization tasks, where it retains excellent summarization ability.
In this embodiment, the text is:
[-lrb-cnn-rrb-ahmed farouq did n't have the prestige of fellow al qaeda figure osama bin laden,the influence of anwar al-awlaki,or the notoriety of adam gadahn.
still,he was a big deal.
that's the assessment of multiple sources on a man who may not have been well-known in the west,but nonetheless had a special role in the terrorist group.
farouq--an american--died in a u.s.counterterrorism airstrike in january,according to the white house.
two al qaeda hostages,warren weinstein of the united states and giovanni lo porto from italy,were killed in the same strike,while gadahn died in another u.s.operation that month.
before that,farouq was the deputy emir of al qaeda in the indian subcontinent,or aqis,a branch of the islamist extremist group that formed in recent years.
the branch made its presence known in september 2014,when militants infiltrated pakistan's navy and tried to hijack one of its ships,according to the site institute,which monitors terror groups.
the group's spokesman,usama mahmoud,on twitter compared the pakistani naval officers involved in the attempted hijacking to nidal hasan,site reported.
hasan is the u.s.army psychiatrist sentenced to death for killing 13people at fort hood,texas.
osama mehmood,a spokesman for al qaeda in the indian subcontinent,said that farouq and another top figure,qari abdullah mansur,were killed in a january 15drone strike in pakistan's shawal valley.
they were senior al qaeda leaders,according to mehmood.
american mouthpiece for al qaeda killed.
cnn's sophia saifi contributed to this report.]
label one (abstract) is:
ahmed farouq was a leader in al qaeda's india branch.he was killed in a u.s.counterterrorism airstrike in january.like adam gadahn,farouq was american and part of al qaeda.
label two (sentence classification label) is:
[0,0,1,0,0,0,0,0,0,1,1,0,0]
step 2, preprocessing each section of text in the text data set to obtain the embedded representation of each word in each section of text;
adding the position of each word in the sentence as a position code into the embedded representation of each word, obtaining a new embedded representation of each word, and obtaining a training set;
in this embodiment, the whole segment of input text in the text data set is first divided into sentence levels to obtain a sequence of sentences { s } 1 ,s 2 ,…,s n }; then for each sentence S i Dividing the word hierarchy to obtain a word sequence { w i1 ,w i2 ,…,w im }. Then, the coding at word level and the coding at sentence level are carried out on each sample, and then the samples are processed by a downstream classifier or decoder.
In this embodiment, S 1 ,S 2 ,……,S n Representing a divided sentence, S 1 Represents the first sentence, S 2 Representing a second sentence, S n Representing the nth sentence, wherein n is a positive integer, and S represents the sentence;
carrying out word hierarchy division on the ith sentence, wherein i is less than or equal to n, and w i1 1 st word, w, representing the ith sentence i2 2 nd word, w, representing the ith sentence im The mth word representing the ith sentence, m being a positive integer.
In order to construct a fixed vocabulary, this embodiment also counts the words appearing in the source texts of the original data set, selects a certain number of high-frequency words as the fixed vocabulary, indexes the words in the vocabulary and constructs an index dictionary; each word is then associated one-to-one with a vector through the index dictionary.
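As a rough illustration of the fixed vocabulary and index dictionary described above, the following Python sketch counts word frequencies and maps words to indices (the vocabulary size, special tokens and function names are assumptions for illustration, not details taken from the patent):

```python
from collections import Counter

def build_vocab(texts, vocab_size=30000, specials=("<pad>", "<bos>", "<eos>", "<unk>")):
    """Count word frequencies over the source texts, keep the most frequent words
    as a fixed vocabulary, and return word->index and index->word dictionaries."""
    counter = Counter()
    for text in texts:
        counter.update(text.split())
    words = [w for w, _ in counter.most_common(vocab_size - len(specials))]
    itos = list(specials) + words
    stoi = {w: i for i, w in enumerate(itos)}
    return stoi, itos

def to_indices(text, stoi):
    """Map each word to its vocabulary index; unknown words fall back to <unk>."""
    return [stoi.get(w, stoi["<unk>"]) for w in text.split()]
```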
In this embodiment, each position has a fixed position encoding: position pos corresponds to a d_model-dimensional vector (the embedded representation of a word is also a d_model-dimensional vector, so the two vectors can be added).
The position encoding is computed element by element with trigonometric functions: the dimensions of the position-encoding vector at position pos with even index 2i are computed with a sine function, and the dimensions with odd index 2i+1 with a cosine function:
PE(pos, 2i) = sin(pos / 10000^(2i / d_model))
PE(pos, 2i + 1) = cos(pos / 10000^(2i / d_model))
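A minimal NumPy sketch of this fixed sinusoidal position encoding (function and variable names are illustrative assumptions; an even d_model is assumed):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Fixed sinusoidal position encoding: even dimensions use sine, odd use cosine."""
    pe = np.zeros((max_len, d_model))
    pos = np.arange(max_len)[:, None]          # positions 0 .. max_len-1
    two_i = np.arange(0, d_model, 2)[None, :]  # even dimension indices 2i
    angle = pos / np.power(10000.0, two_i / d_model)
    pe[:, 0::2] = np.sin(angle)                # PE(pos, 2i)
    pe[:, 1::2] = np.cos(angle)                # PE(pos, 2i+1)
    return pe                                  # shape [max_len, d_model]; added to the word embeddings
```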
In this embodiment, the source text is:
['england saw their champions trophy title hopes extinguished by germany after suffering a 2-0quarter-final defeat at the tournament in india.',
'olympic gold medallists germany scored late in the second and fourth quarters,through moritz furste and christopher ruhr,to set up a semi-final against australia in bhubaneswar on saturday.',
"bobby crutchley's england side had decent second-half chances to get back on level terms,but an equaliser was beyond them,and ruhr killed off the match with a 57th-minute close-range finish.",
"german player benedikt furk dives to stop the ball as england's nick catlin watches on",
'alexandre de paeuw of germany is challenged by ashley jackson of england',
'germany took the lead on the stroke of half-time when furste struck home into the top right of the net from a penalty corner.',
"they had an opportunity to double their advantage midway through the third quarter but pilt arnold's driven cross from the left sped across goal,with mats grambusch unable to cash in.",
'england looked like levelling in the 49th minute but amid a scramble somehow germany kept the ball from crossing the line,with goalkeeper nicolas jacobi saving from barry middleton and adam dixon also lurking.',
"germany's tobias hauke dribbles past england's ashley jackson at kalinga stadium",
"ashley jackson could not drill in from england's first penalty corner,with jacobi making a solid save,which he repeated later on in the match.",
'england were pressing germany hard but in the 58th minute their hopes were finally dashed.',
'a sharp turn from grambusch bought him space to send in a cross that ruhr converted,skilfully lifting the ball high into the net.']
index of source text word:
[101,2563,2387,2037,3966,5384,2516,8069,27705,2011,2762,2044,6114,1037,1016,29624,2692,4284,29624,16294,2389,4154,2012,1996,2977,1999,2634,1012,102,101,4386,2751,28595,2015,2762,3195,2397,1999,1996,2117,1998,2959,7728,1010,2083,28461,6519,13473,1998,5696,21766,8093,1010,2000,2275,2039,1037,4100,29624,16294,2389,2114,2660,1999,1038,6979,27543,26760,2906,2006,5095,1012,102,101,6173,13675,4904,2818,3051,1005,2015,2563,2217,2018,11519,2117,29624,8865,2546,9592,2000,2131,2067,2006,2504,3408,1010,2021,2019,5020,17288,2001,3458,2068,1010,1998,21766,8093,2730,2125,1996,2674,2007,1037,28623,29624,10020,10421,2485,29624,24388,2063,3926,1012,102,101,2446,2447,3841,2098,5480,2102,6519,2243,11529,2015,2000,2644,1996,3608,2004,2563,1005,2015,4172,4937,4115,12197,2006,102,101,16971,2139,6643,13765,2860,1997,2762,2003,8315,2011,9321,4027,1997,2563,102,101,2762,2165,1996,2599,2006,1996,6909,1997,2431,29624,7292,2043,6519,13473,4930,2188,2046,1996,2327,2157,1997,1996,5658,2013,1037,6531,3420,1012,102,101,2027,2018,2019,4495,2000,3313,2037,5056,12213,2083,1996,2353,4284,2021,14255,7096,7779,1005,2015,5533,2892,2013,1996,2187,16887,2408,3125,1010,2007,22281,13250,8286,2818,4039,2000,5356,1999,1012,102,101,2563,2246,2066,2504,2989,1999,1996,25726,3371,2021,13463,1037,25740,5064,2762,2921,1996,3608,2013,5153,1996,2240,1010,2007,9653,9473,6213,2072,7494,2013,6287,17756,1998,4205,11357,2036,24261,1012,102,101,2762,1005,2015,16858,5292,15851,2852,12322,13510,2627,2563,1005,2015,9321,4027,2012,19924,13807,3346,102,101,9321,4027,2071,2025,12913,1999,2013,2563,1005,2015,2034,6531,3420,1010,2007,6213,2072,2437,1037,5024,3828,1010,2029,2002,5567,2101,2006,1999,1996,2674,1012,102,101,2563,2020,7827,2762,2524,2021,1999,1996,5388,2705,3371,2037,8069,2020,2633,18198,1012,102,101,1037,4629,2735,2013,13250,8286,2818,4149,2032,2686,2000,4604,1999,1037,2892,2008,21766,8093,4991,1010,8301,10270,18083,2100,8783,1996,3608,2152,2046,1996,5658,1012,102]
the target abstract is as follows:
"olympic gold medallists germany scored late in the second and fourth quarters<q>bobby crutchley's england side had decent second-half chance<q>moritz furste and christopher ruhr scored for germany"
index of target summary words:
[1,4386,2751,28595,2015,2762,3195,2397,1999,1996,2117,1998,2959,7728,3,6173,13675,4904,2818,3051,1005,2015,2563,2217,2018,11519,2117,29624,8865,2546,3382,3,28461,6519,13473,1998,5696,21766,8093,3195,2005,2762,2]
position of each sentence in the source text:
[0,29,73,125,150,166,196,236,276,297,330,349]
the classification labels of sentences in the source text are:
[0,1,0,0,0,0,0,0,0,0,0,0]
the embedding of the source text is represented as:
tensor([[-0.4228,-0.0648,1.7755,...,-0.0450,-0.1446,0.0397],
[-1.5119,1.5509,0.8044,...,0.1634,-0.2404,-0.0408],
[-1.3100,-0.2735,-0.6489,...,0.7900,-0.4179,0.7393],
[0.0259,1.4194,1.1651,...,-0.0327,0.6284,-0.1963],
[1.3589,1.0458,1.0282,...,0.5275,-1.5156,1.5691],
[0.3824,-0.3591,-2.2065,...,0.7527,-0.3730,0.6119]],
device='cuda:0', grad_fn=<AddcmulBackward>), i.e. a tensor of shape [text sequence length, hidden vector dimension].
Step 3, taking the training set as input, taking the first label set and the second label set as reference output, and training a neural network;
the neural network comprises a word coding network, a sentence coding network, a decoding network, a full connection layer and an output layer which are arranged in sequence;
the output of the sentence coding network is also connected with a classifier;
the word coding network comprises a plurality of Transformer encoders connected in series;
the sentence coding network comprises a plurality of Transformer encoders connected in series;
the decoding network comprises a plurality of Transformer decoders connected in series;
and obtaining a single document abstract generating model.
In the present embodiment, Transformer blocks are used to implement the layer-sequence encoder and the decoder in the model. The Transformer proposed in the paper Attention Is All You Need is a complete encoder-decoder framework; the present invention improves the original Transformer so that the encoder part can encode the source text hierarchically. The specific structure of the conventional Transformer model is shown in fig. 1.
In this embodiment, fig. 1 shows the internal details of the Transformer model.
After the input text has been converted to its embedded representation and the fixed position encodings of the words at different positions have been added, it can be sent to the Transformer encoder for encoding.
The encoder contains a multi-head self-attention module that performs the attention computation over the input text. Residual connection means adding the result of the self-attention computation to the embedded representation of the input text. Layer normalization is then performed, i.e. the data are normalized along the feature dimension; concretely, the mean over the feature dimension is subtracted and the result is divided by the standard deviation over the feature dimension.
The position-wise feed-forward network then performs two linear mappings: the feature dimension is first mapped to a higher dimension and then compressed back to the original dimension. Residual connection and layer normalization are applied afterwards in the same way.
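The encoder block just described can be sketched in PyTorch roughly as follows (hyper-parameter values and the class name are illustrative assumptions, not the patent's exact implementation):

```python
import torch.nn as nn

class TransformerEncoderBlock(nn.Module):
    """One encoder block: multi-head self-attention, residual connection and
    layer normalization, position-wise feed-forward, residual connection and
    layer normalization."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),   # map the feature dimension to a higher dimension
            nn.ReLU(),
            nn.Linear(d_ff, d_model),   # compress back to the original dimension
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, key_padding_mask=None):
        # x: [seq_len, batch, d_model] (default layout of nn.MultiheadAttention)
        attn_out, _ = self.self_attn(x, x, x, key_padding_mask=key_padding_mask)
        x = self.norm1(x + attn_out)      # residual connection + layer normalization
        x = self.norm2(x + self.ffn(x))   # feed-forward, residual, layer normalization
        return x
```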
At the decoder side, after the output text has been converted to its embedded representation and the fixed position encodings of the words at different positions have been added, it can be sent to the Transformer decoder for decoding.
The decoder also contains a multi-head self-attention module that performs the attention computation over the output text; residual connection and layer normalization are the same as in the encoder.
Unlike the encoder, the decoder adds a multi-head external attention module in order to receive information from the encoder. Likewise, residual connection and layer normalization are performed after the multi-head external attention layer.
The position-wise feed-forward network then performs two linear mappings, i.e. the feature dimension is mapped to a higher dimension and then compressed back to the original dimension, followed again by residual connection and layer normalization.
The output of the decoder is mapped to the vocabulary dimension through a fully connected layer; an activation function then yields the probability distribution over the vocabulary at the current decoding step, and the current decoding output is obtained by sampling this distribution.
The single-document abstract generation model provided by the invention, as shown in fig. 2, comprises a layer-sequence encoder (a word encoding network and a sentence encoding network), a decoding network and a classifier. The specific structure is shown in fig. 2 (for simplicity, the internal structure of the Transformer modules is omitted).
In the structure diagram of the generative model in fig. 2, the Transformer word encoder and the Transformer sentence encoder have the same structure as the Transformer encoder in fig. 1, and the Transformer decoder in fig. 2 is the same as the Transformer decoder in fig. 1.
The embedded representation of the input text, after a fixed position encoding is added, is encoded by the Transformer word encoder to obtain the word encodings. Attention is then computed over the word encodings of each sentence and that dimension is compressed to 1, giving a sentence encoding. The sentence encodings are passed through the Transformer sentence encoder to obtain the feature representations of the sentences.
In this embodiment, the feature representations of the sentences are, on the one hand, sent to the classifier to obtain the classification result of the extractive summarization task; on the other hand, they are passed to the Transformer decoder for the computation of the external attention.
The classifier consists of a linear fully connected layer and an activation function layer: the fully connected layer compresses the feature dimension and the activation function layer yields the probability distribution over the classification labels, from which the classification result of the extractive summarization task is obtained.
The structure and function of the Transformer decoder in fig. 2 are exactly the same as those of the Transformer decoder in fig. 1 and are therefore not described again.
In the layer-sequence encoder (the word encoding network and the sentence encoding network), let a document be d = {s_1, s_2, ..., s_n} and a sentence be s_i = {w_i1, w_i2, ..., w_im}, where w_ij is the j-th word of the i-th sentence in the source document of a sample. Here Enc_word denotes the word encoding network and Enc_sent denotes the sentence encoding network.
The representations of the words in each sentence are obtained with the word encoding network:
(e_i1, e_i2, ..., e_im) = Enc_word(w_i1, w_i2, ..., w_im)    (1)
Given a vector q as the query, the word representations output by Enc_word are used as keys and values and fed into a multi-head attention layer, denoted Attn_sent, to obtain the un-encoded sentence representation s̃_i:
s̃_i = Attn_sent(q, (e_i1, e_i2, ..., e_im))    (2)
The sentence representations are fed into the sentence encoding network to obtain the encoded sentence vector representations (h_1, h_2, ..., h_n):
(h_1, h_2, ..., h_n) = Enc_sent(s̃_1, s̃_2, ..., s̃_n)    (3)
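Under the notation of equations (1)-(3), a rough PyTorch sketch of the layer-sequence encoder might look as follows (module names, layer counts and tensor layouts are illustrative assumptions):

```python
import torch
import torch.nn as nn

class LayerSequenceEncoder(nn.Module):
    """Sketch of equations (1)-(3): a word-level Transformer encoder, a learned
    query vector q that pools the word representations of each sentence into one
    vector, and a sentence-level Transformer encoder over the pooled vectors."""
    def __init__(self, d_model=512, n_heads=8, n_layers=2):
        super().__init__()
        self.enc_word = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads), n_layers)   # Enc_word
        self.enc_sent = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads), n_layers)   # Enc_sent
        self.attn_sent = nn.MultiheadAttention(d_model, n_heads)      # Attn_sent
        self.q = nn.Parameter(torch.randn(1, 1, d_model))             # query vector q

    def forward(self, words):
        # words: [n_sentences, m_words, d_model], embedded words of one document
        n, m, d = words.shape
        e = self.enc_word(words.transpose(0, 1))        # (1): [m, n, d], sentences as batch
        q = self.q.expand(1, n, d)
        s_tilde, _ = self.attn_sent(q, e, e)            # (2): [1, n, d], one vector per sentence
        h = self.enc_sent(s_tilde.transpose(0, 1))      # (3): [n, 1, d], encoded sentence vectors
        return h
```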
In the decoding network, the partial sequence already generated, ŷ_<t = (ŷ_1, ..., ŷ_{t-1}), is known; the decoding step at time t is performed on the basis of the partial decoding sequence ŷ_<t and the encoded sentence vector representations (h_1, h_2, ..., h_n).
First, the self-attention layer encodes ŷ_<t to obtain the encoding vector g_t of the decoded partial sequence:
g_t = Attn_self(ŷ_<t)    (4)
Then the external attention layer takes g_t as the query and (h_1, h_2, ..., h_n) as keys and values to compute the attention vector c_t:
c_t = Attn_ext(g_t, (h_1, h_2, ..., h_n))    (5)
Through the generator, the dimension of the vector c_t is mapped to the vocabulary dimension, and the softmax function yields the probability distribution P_t^vocab over the vocabulary (a vector whose dimension is the vocabulary size). Sampling from P_t^vocab gives the decoding output ŷ_t at time t:
P_t^vocab = softmax(Generator(c_t)),    ŷ_t ~ P_t^vocab    (6)
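A rough PyTorch sketch of one decoding step in the spirit of equations (4)-(6) (the class name, tensor layouts and the sampling step are illustrative assumptions; a practical implementation would stack several full Transformer decoder layers):

```python
import torch
import torch.nn as nn

class AbstractDecoderStep(nn.Module):
    """Sketch of equations (4)-(6): self-attention over the decoded prefix,
    external attention that queries the encoded sentence vectors, and a
    generator that maps to the vocabulary and samples the next token."""
    def __init__(self, d_model=512, n_heads=8, vocab_size=30000):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads)
        self.ext_attn = nn.MultiheadAttention(d_model, n_heads)
        self.generator = nn.Linear(d_model, vocab_size)

    def forward(self, y_prefix, h_sent):
        # y_prefix: [t-1, batch, d_model] embeddings of the decoded prefix
        # h_sent:   [n_sent, batch, d_model] encoded sentence vectors from Enc_sent
        g, _ = self.self_attn(y_prefix, y_prefix, y_prefix)      # (4)
        c, attn_dist = self.ext_attn(g[-1:], h_sent, h_sent)     # (5); attn_dist is the external attention over sentences
        p_vocab = torch.softmax(self.generator(c[-1]), dim=-1)   # (6): distribution over the vocabulary
        y_t = torch.multinomial(p_vocab, num_samples=1)          # sample the next token id
        return y_t, p_vocab, attn_dist
```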
The loss function uses the Label_Smoothing loss function, where ŷ^word is the predicted label of a word and y^word is the true label of the corresponding word in the target abstract:
L_ls = Label_Smoothing_Loss(ŷ^word, y^word)    (7)
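A minimal sketch of such a label-smoothed cross entropy (the smoothing factor and the function name are illustrative assumptions):

```python
import torch

def label_smoothing_loss(log_probs, target, smoothing=0.1):
    """Label-smoothed cross entropy in the spirit of equation (7): the one-hot
    target is smoothed before the cross entropy against the predicted
    log-probabilities is taken.
    log_probs: [N, vocab_size] log-softmax outputs; target: [N] true word indices."""
    vocab_size = log_probs.size(-1)
    smooth_target = torch.full_like(log_probs, smoothing / (vocab_size - 1))
    smooth_target.scatter_(1, target.unsqueeze(1), 1.0 - smoothing)
    return -(smooth_target * log_probs).sum(dim=-1).mean()
```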
the external attention of the decoder is recorded, the attention distribution attndit is averaged over the sentences, and the KL divergence is calculated from the importance distribution ScoreDist of the sentences.
L kl =KL_Divergence(AttnDist,ScoreDist) (8)
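Equation (8) can be sketched as follows, assuming both distributions are given as probability vectors over the sentences:

```python
import torch

def attention_kl_regularizer(attn_dist, score_dist, eps=1e-12):
    """KL divergence between the decoder's average external-attention distribution
    over sentences (AttnDist) and the sentence importance distribution (ScoreDist).
    Both inputs are 1-D tensors of length n_sentences that sum to 1."""
    attn_dist = attn_dist.clamp_min(eps)
    score_dist = score_dist.clamp_min(eps)
    return (attn_dist * (attn_dist.log() - score_dist.log())).sum()
```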
The classifier is implemented as an encoder whose basic structure is a Transformer encoder, denoted here as Enc_cls. This layer accepts the output of the sentence encoder and performs a self-attention calculation to obtain the sentence representations (u_1, ..., u_n):
(u_1, u_2, ..., u_n) = Enc_cls(h_1, h_2, ..., h_n)    (9)
The hidden-layer dimension is compressed by a linear layer and then mapped into a binary probability distribution through the softmax function:
ŷ^sent = softmax(Linear(u_1, u_2, ..., u_n))    (10)
The loss function uses Focal Loss, where ŷ^sent is the predicted label of a sentence and y^sent is the true label of the sentence:
L_fl = Focal_Loss(ŷ^sent, y^sent)    (11)
The Focal Loss is calculated as:
FL(p_t) = -α_t (1 - p_t)^γ' log(p_t)    (12)
where p_t is the predicted probability of the true class, α_t is a class balancing weight and γ' is the focusing parameter (distinct from the task weight γ used below).
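A minimal sketch of this binary Focal Loss (the alpha and gamma values here are common defaults, not values specified by the patent):

```python
import torch

def focal_loss(probs, target, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary Focal Loss over the sentence classifier's predictions, in the spirit of
    equations (11)-(12). probs: [n_sentences] predicted probability of the positive
    class; target: [n_sentences] 0/1 labels."""
    p_t = torch.where(target == 1, probs, 1.0 - probs)      # probability of the true class
    alpha_t = torch.where(target == 1,
                          torch.full_like(probs, alpha),
                          torch.full_like(probs, 1.0 - alpha))
    loss = -alpha_t * (1.0 - p_t) ** gamma * torch.log(p_t.clamp_min(eps))
    return loss.mean()
```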
Optionally, when the neural network is trained in step 3, a multi-task learning method is adopted;
wherein the loss function is L = L_ls + γ·L_fl + λ·L_kl, where γ and λ are both weight parameters and both greater than 0; L_ls = Label_Smoothing_Loss(ŷ^word, y^word), where ŷ^word is the predicted label of a word, y^word is the true label of the word and Label_Smoothing_Loss() is the label smoothing loss function; L_fl = Focal_Loss(ŷ^sent, y^sent), where ŷ^sent is the predicted label of a sentence, y^sent is the true label of the sentence and Focal_Loss() is the Focal Loss function; L_kl = KL_Divergence(AttnDist, ScoreDist), where AttnDist is the average attention distribution over the sentences, ScoreDist is the importance distribution of the sentences and KL_Divergence() is the KL divergence calculation.
The automatic summarization model is trained mainly with Multi-Task Learning (MTL): an extractive summarization task is added on top of the traditional generative summarization model as an auxiliary training objective, which gives the model better generalization.
In the loss function L = L_ls + γ·L_fl + λ·L_kl, γ and λ are two hyper-parameters that have to be set manually and serve as the weights of the classification task and of the attention constraint; both γ and λ are greater than 0.
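Assuming the loss sketches given above, the combined objective can be written as follows (gamma_w and lambda_w stand for the manually chosen weights γ and λ; the values are placeholders):

```python
def multi_task_loss(log_probs, word_targets, sent_probs, sent_targets,
                    attn_dist, score_dist, gamma_w=1.0, lambda_w=1.0):
    """L = L_ls + gamma * L_fl + lambda * L_kl, combining the generative term,
    the auxiliary extractive term and the attention regularization term."""
    l_ls = label_smoothing_loss(log_probs, word_targets)      # generative summarization term
    l_fl = focal_loss(sent_probs, sent_targets)               # extractive (auxiliary) term
    l_kl = attention_kl_regularizer(attn_dist, score_dist)    # attention constraint term
    return l_ls + gamma_w * l_fl + lambda_w * l_kl
```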
In the method for constructing a single-document abstract generation model based on multi-task learning in this embodiment, the extractive model is used as an Auxiliary Objective for joint training, so that the encoder of the generative model gains generalization capability and the external attention module (target-to-source attention) of the decoder assigns more attention weight to the key sentences.
Example two
The method for generating the single document abstract based on the multitask learning is characterized in that a single document to be abstracted is input into a single document abstract generating model obtained by the method for constructing the single document abstract generating model based on the multitask learning in the first embodiment, and an abstract is obtained.
In the present embodiment, a commonly used evaluation metric for the text summarization task is ROUGE (Recall-Oriented Understudy for Gisting Evaluation). The ROUGE metric is commonly used in automatic document summarization and machine translation; it measures the similarity between the generated text and the target samples in the data set by comparing the two. The ROUGE evaluation criteria used here fall into two categories: ROUGE-N and ROUGE-L.
After reading the source document, linguistic experts generally summarize it and write the reference abstract manually; the abstract ŷ generated by the model is then compared with the standard abstract y, and the number of N-grams co-occurring in the two is counted as the evaluation metric of abstract quality. Comparing against abstracts written by several experts improves the reliability of the evaluation.
N in ROUGE-N indicates that the statistical unit is the N-gram, where N can take the value 1, 2 and so on. ROUGE-1 computes the repetition rate of the unigrams of the target abstract in the generated abstract; ROUGE-2 computes the repetition rate of the bigrams of the target abstract in the generated abstract. The calculation formula is:
ROUGE-N = ( Σ_{S∈References} Σ_{gram_N∈S} Count_match(gram_N) ) / ( Σ_{S∈References} Σ_{gram_N∈S} Count(gram_N) )
ROUGE-L computes the Longest Common Subsequence (LCS) between the standard abstract X (of length m) and the generated abstract Y (of length n); its calculation formula is defined as follows: the recall is R_lcs = LCS(X, Y) / m, the precision is P_lcs = LCS(X, Y) / n, and the F1 score is F_lcs = (1 + β²)·R_lcs·P_lcs / (R_lcs + β²·P_lcs).
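A simple whitespace-tokenized sketch of ROUGE-N recall and ROUGE-L (real ROUGE toolkits additionally handle stemming, multiple references and confidence intervals):

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """ROUGE-N recall: the fraction of reference n-grams that also occur in the candidate."""
    cand, ref = candidate.split(), reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    return overlap / max(sum(ref_ngrams.values()), 1)

def rouge_l(candidate, reference, beta=1.2):
    """ROUGE-L F-score based on the longest common subsequence of the two word sequences."""
    x, y = reference.split(), candidate.split()
    # dynamic-programming LCS length
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x[i - 1] == y[j - 1] else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[len(x)][len(y)]
    r = lcs / max(len(x), 1)   # recall
    p = lcs / max(len(y), 1)   # precision
    return (1 + beta ** 2) * r * p / (r + beta ** 2 * p) if r + p > 0 else 0.0
```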
Traditional algorithms often train a recurrent neural network with the objective function of a generative summarization task to achieve summarization. The single-document abstract generation model construction device based on multi-task learning provided by the invention differs from the traditional method in the following respects.
1. The invention uses Transformer modules to construct the model. The multi-head attention mechanism in the Transformer module can not only compute in parallel but also capture long-distance dependencies and encode the context better. This improves both the overall computational efficiency of the model and the accuracy with which the input text information is processed.
2. The present invention uses a multi-task learning training scheme instead of a single training objective. The extractive summarization task, which is related to the generative summarization task (the main task), is used as an auxiliary task to train the model, improving its generalization so that it adapts to a wider variety of input texts while ensuring the quality of the summarization results.
3. The invention adds a constraint term to the training objective that exploits the relationship between the decoder attention distribution and the importance of the sentences. This heuristically makes the model pay more attention to the important sentences during decoding, so the amount of salient information in the summarization result can be increased.
EXAMPLE III
The embodiment provides a single document abstract generation model construction device based on multitask learning, which comprises a data acquisition module, a preprocessing module and a model construction module;
the data acquisition module is used for acquiring a plurality of sections of texts and acquiring a text data set; each piece of text comprises a plurality of sentences, and each sentence comprises a plurality of words;
obtaining an abstract corresponding to each text segment to obtain a first label set;
obtaining the correlation between each text and the corresponding abstract to obtain a second label set;
the preprocessing module is used for preprocessing each section of text data in the text data set to obtain the embedded representation of each word in each section of text;
adding the position of each word in the sentence as a position code into the embedded representation of each word, obtaining a new embedded representation of each word, and obtaining a training set;
the model building module is used for taking the training set as input, taking the first label set and the second label set as reference output and training the neural network;
the neural network comprises a word coding network, a sentence coding network, a decoding network, a full connection layer and an output layer which are arranged in sequence;
the output of the sentence coding network is also connected with a classifier;
the word coding network comprises a plurality of Transformer encoders connected in series;
the sentence coding network comprises a plurality of Transformer encoders connected in series;
the decoding network comprises a plurality of Transformer decoders connected in series;
and obtaining a single document abstract generating model.
Optionally, when the model building module trains the neural network, a multi-task learning method is adopted;
wherein the loss function is L = L_ls + γ·L_fl + λ·L_kl, where γ and λ are both weight parameters and both greater than 0; L_ls = Label_Smoothing_Loss(ŷ^word, y^word), where ŷ^word is the predicted label of a word, y^word is the true label of the word and Label_Smoothing_Loss() is the label smoothing loss function; L_fl = Focal_Loss(ŷ^sent, y^sent), where ŷ^sent is the predicted label of a sentence, y^sent is the true label of the sentence and Focal_Loss() is the Focal Loss function; L_kl = KL_Divergence(AttnDist, ScoreDist), where AttnDist is the average attention distribution over the sentences, ScoreDist is the importance distribution of the sentences and KL_Divergence() is the KL divergence calculation.
Example four
The embodiment provides a single document abstract generating device based on multitask learning, which is characterized by comprising a data acquisition module and an abstract generating module;
the data acquisition module is used for acquiring a single document of the abstract to be extracted;
the abstract generating module is used for inputting the single document of which the abstract is to be extracted into the single document abstract generating model obtained by the single document abstract generating model establishing device based on multi-task learning in the third embodiment to obtain the abstract.
Through the description of the above embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus necessary general hardware, and certainly may also be implemented by hardware, but in many cases, the former is a better embodiment. Based on such understanding, the technical solutions of the present invention may be substantially implemented or a part of the technical solutions contributing to the prior art may be embodied in the form of a software product, where the computer software product is stored in a readable storage medium, such as a floppy disk, a hard disk, or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Claims (4)

1. A method for constructing a single document abstract generation model based on multitask learning is characterized by comprising the following steps:
step 1, obtaining a plurality of sections of texts to obtain a text data set; each text segment comprises a plurality of sentences, and each sentence comprises a plurality of words;
obtaining an abstract corresponding to each text segment to obtain a first label set;
obtaining the correlation between each section of text and the corresponding abstract, and obtaining a second label set;
step 2, preprocessing each section of text in the text data set to obtain the embedded representation of each word in each section of text;
adding the position of each word in the sentence as a position code into the embedded representation of each word, obtaining a new embedded representation of each word, and obtaining a training set;
step 3, taking the training set as input, taking the first label set and the second label set as reference output, and training a neural network;
the neural network comprises a word coding network, a sentence coding network, a decoding network, a full connection layer and an output layer which are arranged in sequence;
the output of the sentence coding network is also connected with a classifier;
the word coding network comprises a plurality of Transformer encoders connected in series;
the sentence coding network comprises a plurality of Transformer encoders connected in series;
the decoding network comprises a plurality of Transformer decoders connected in series;
obtaining a single document abstract generating model;
when the neural network is trained in the step 3, a multi-task learning method is adopted;
wherein the loss function during training is L = L_ls + γ·L_fl + λ·L_kl, wherein γ and λ are both weight parameters and both greater than 0; L_ls = Label_Smoothing_Loss(ŷ^word, y^word), wherein ŷ^word represents the predicted label of a word, y^word represents the true label of the word and Label_Smoothing_Loss represents the label smoothing loss function; L_fl = Focal_Loss(ŷ^sent, y^sent), wherein ŷ^sent represents the predicted label of a sentence, y^sent represents the true label of the sentence and Focal_Loss represents the Focal Loss function; L_kl = KL_Divergence(AttnDist, ScoreDist), wherein AttnDist represents the average attention distribution of the sentences, ScoreDist represents the importance distribution of the sentences and KL_Divergence represents the KL divergence calculation.
2. A single document abstract generating method based on multitask learning is characterized in that a single document to be abstracted is input into a single document abstract generating model obtained by the single document abstract generating model building method based on the multitask learning according to claim 1, and an abstract is obtained.
3. A single document abstract generation model construction device based on multi-task learning is characterized by comprising a data acquisition module, a preprocessing module and a model construction module;
the data acquisition module is used for acquiring a plurality of sections of texts to acquire a text data set; each text segment comprises a plurality of sentences, and each sentence comprises a plurality of words;
obtaining an abstract corresponding to each text segment to obtain a first label set;
obtaining the correlation between each section of text and the corresponding abstract, and obtaining a second label set;
the preprocessing module is used for preprocessing each section of text data in the text data set to obtain the embedded representation of each word in each section of text;
adding the position of each word in the sentence as a position code into the embedded representation of each word, obtaining a new embedded representation of each word, and obtaining a training set;
the model building module is used for taking the training set as input, taking the first label set and the second label set as reference output and training a neural network;
the neural network comprises a word coding network, a sentence coding network, a decoding network, a full connection layer and an output layer which are arranged in sequence;
the output of the sentence coding network is also connected with a classifier;
the word coding network comprises a plurality of Transformer encoders connected in series;
the sentence coding network comprises a plurality of Transformer encoders connected in series;
the decoding network comprises a plurality of Transformer decoders connected in series;
obtaining a single document abstract generating model;
when the model building module trains the neural network, a multi-task learning method is adopted;
wherein the loss function during training is L = L_ls + γ·L_fl + λ·L_kl, wherein γ and λ are both weight parameters and both greater than 0; L_ls = Label_Smoothing_Loss(ŷ^word, y^word), wherein ŷ^word represents the predicted label of a word, y^word represents the true label of the word and Label_Smoothing_Loss represents the label smoothing loss function; L_fl = Focal_Loss(ŷ^sent, y^sent), wherein ŷ^sent represents the predicted label of a sentence, y^sent represents the true label of the sentence and Focal_Loss represents the Focal Loss function; L_kl = KL_Divergence(AttnDist, ScoreDist), wherein AttnDist represents the average attention distribution of the sentences, ScoreDist represents the importance distribution of the sentences and KL_Divergence represents the KL divergence calculation.
4. A single document abstract generating device based on multitask learning is characterized by comprising a data acquisition module and an abstract generating module;
the data acquisition module is used for acquiring a single document of the abstract to be extracted;
the abstract generating module is used for inputting a single document to be abstracted into a single document abstract generating model obtained by the single document abstract generating model building device based on multitask learning according to claim 3, so as to obtain an abstract.
CN202010435810.5A 2020-05-21 2020-05-21 Single document abstract generation model construction method and device based on multi-task learning Active CN111723196B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010435810.5A CN111723196B (en) 2020-05-21 2020-05-21 Single document abstract generation model construction method and device based on multi-task learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010435810.5A CN111723196B (en) 2020-05-21 2020-05-21 Single document abstract generation model construction method and device based on multi-task learning

Publications (2)

Publication Number Publication Date
CN111723196A CN111723196A (en) 2020-09-29
CN111723196B true CN111723196B (en) 2023-03-24

Family

ID=72564888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010435810.5A Active CN111723196B (en) 2020-05-21 2020-05-21 Single document abstract generation model construction method and device based on multi-task learning

Country Status (1)

Country Link
CN (1) CN111723196B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112463956B (en) * 2020-11-26 2022-08-23 重庆邮电大学 Text abstract generation system and method based on antagonistic learning and hierarchical neural network
CN113762459A (en) * 2021-01-26 2021-12-07 北京沃东天骏信息技术有限公司 Model training method, text generation method, device, medium and equipment
CN113761197B (en) * 2021-07-29 2022-07-26 中国科学院计算机网络信息中心 Application form multi-label hierarchical classification method capable of utilizing expert knowledge
CN113808075B (en) * 2021-08-04 2024-06-18 上海大学 Two-stage tongue picture identification method based on deep learning
CN113569049B (en) * 2021-08-10 2024-03-29 燕山大学 Multi-label text classification method based on hierarchical Trans-CNN
CN114091429A (en) * 2021-10-15 2022-02-25 山东师范大学 Text abstract generation method and system based on heterogeneous graph neural network
CN117313704B (en) * 2023-11-28 2024-02-23 江西师范大学 Mixed readability evaluation method and system based on public and private feature decomposition

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018067199A (en) * 2016-10-20 2018-04-26 日本電信電話株式会社 Abstract generating device, text converting device, and methods and programs therefor
CN110413986A (en) * 2019-04-12 2019-11-05 上海晏鼠计算机技术股份有限公司 A kind of text cluster multi-document auto-abstracting method and system improving term vector model
CN110737769A (en) * 2019-10-21 2020-01-31 南京信息工程大学 pre-training text abstract generation method based on neural topic memory
CN111177366A (en) * 2019-12-30 2020-05-19 北京航空航天大学 Method, device and system for automatically generating extraction type document abstract based on query mechanism

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10474709B2 (en) * 2017-04-14 2019-11-12 Salesforce.Com, Inc. Deep reinforced model for abstractive summarization
CN109062937B (en) * 2018-06-15 2019-11-26 北京百度网讯科技有限公司 The method of training description text generation model, the method and device for generating description text
CN110162778B (en) * 2019-04-02 2023-05-26 创新先进技术有限公司 Text abstract generation method and device
CN110297885B (en) * 2019-05-27 2021-08-17 中国科学院深圳先进技术研究院 Method, device and equipment for generating real-time event abstract and storage medium
CN110334334B (en) * 2019-06-19 2024-05-14 腾讯科技(深圳)有限公司 Digest generation method and device and computer equipment
CN110472238B (en) * 2019-07-25 2022-11-18 昆明理工大学 Text summarization method based on hierarchical interaction attention
CN110532554B (en) * 2019-08-26 2023-05-05 南京信息职业技术学院 Chinese abstract generation method, system and storage medium
CN110825870B (en) * 2019-10-31 2023-07-14 腾讯科技(深圳)有限公司 Method and device for acquiring document abstract, storage medium and electronic device
CN110929024B (en) * 2019-12-10 2021-07-02 哈尔滨工业大学 Extraction type text abstract generation method based on multi-model fusion

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018067199A (en) * 2016-10-20 2018-04-26 日本電信電話株式会社 Abstract generating device, text converting device, and methods and programs therefor
CN110413986A (en) * 2019-04-12 2019-11-05 上海晏鼠计算机技术股份有限公司 A kind of text cluster multi-document auto-abstracting method and system improving term vector model
CN110737769A (en) * 2019-10-21 2020-01-31 南京信息工程大学 pre-training text abstract generation method based on neural topic memory
CN111177366A (en) * 2019-12-30 2020-05-19 北京航空航天大学 Method, device and system for automatically generating extraction type document abstract based on query mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Multi-Task Deep Learning for Legal Document Translation, Summarization and Multi-Label Classification; Ahmed Elnaggar et al.; ACM; 2018-12-21; pp. 9-15 *
一种基于BERT的自动文本摘要模型构建方法 (A BERT-based automatic text summarization model construction method); 岳一峰 et al.; 《计算机与现代化》 (Computer and Modernization); 2020-01-15 (No. 01); pp. 63-68 *

Also Published As

Publication number Publication date
CN111723196A (en) 2020-09-29

Similar Documents

Publication Publication Date Title
CN111723196B (en) Single document abstract generation model construction method and device based on multi-task learning
CN110119765B (en) Keyword extraction method based on Seq2Seq framework
CN109472031B (en) Aspect level emotion classification model and method based on double memory attention
CN110348016B (en) Text abstract generation method based on sentence correlation attention mechanism
Oord et al. Representation learning with contrastive predictive coding
CN109214003B (en) The method that Recognition with Recurrent Neural Network based on multilayer attention mechanism generates title
CN111026869B (en) Method for predicting multi-guilty names by using sequence generation network based on multilayer attention
CN111897908A (en) Event extraction method and system fusing dependency information and pre-training language model
CN112926303B (en) Malicious URL detection method based on BERT-BiGRU
CN111858932A (en) Multiple-feature Chinese and English emotion classification method and system based on Transformer
CN109190131A (en) A kind of English word and its capital and small letter unified prediction based on neural machine translation
CN109062897A (en) Sentence alignment method based on deep neural network
CN109062910A (en) Sentence alignment method based on deep neural network
CN112559730B (en) Text abstract automatic generation method and system based on global feature extraction
CN113239663B (en) Multi-meaning word Chinese entity relation identification method based on Hopkinson
Zhang et al. Exploring deep recurrent convolution neural networks for subjectivity classification
Yu et al. Neural network language model compression with product quantization and soft binarization
CN114757183A (en) Cross-domain emotion classification method based on contrast alignment network
Zhang et al. A hierarchical attention seq2seq model with copynet for text summarization
TWI724644B (en) Spoken or text documents summarization system and method based on neural network
Zhao et al. Generating summary using sequence to sequence model
CN113255344B (en) Keyword generation method integrating theme information
CN114048749B (en) Chinese named entity recognition method suitable for multiple fields
CN109992774A (en) The key phrase recognition methods of word-based attribute attention mechanism
CN112613316B (en) Method and system for generating ancient Chinese labeling model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant