CN111723196B - Single document abstract generation model construction method and device based on multi-task learning - Google Patents

Single document abstract generation model construction method and device based on multi-task learning

Info

Publication number
CN111723196B
CN111723196B
Authority
CN
China
Prior art keywords
word
sentence
abstract
text
single document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010435810.5A
Other languages
Chinese (zh)
Other versions
CN111723196A (en)
Inventor
蔡晓妍
刘森
戴航
杨黎斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202010435810.5A priority Critical patent/CN111723196B/en
Publication of CN111723196A publication Critical patent/CN111723196A/en
Application granted granted Critical
Publication of CN111723196B publication Critical patent/CN111723196B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34Browsing; Visualisation therefor
    • G06F16/345Summarisation for human users

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method and a device for constructing a single-document abstract generation model based on multi-task learning. The model takes a Transformer as its base, with an additional encoder and a classifier attached at the output of the encoder; the two encoders are thus divided into a word-level encoder and a sentence-level encoder, which makes multi-task training possible. The attention distribution of the decoder is made more reasonable by adding a regularization term computed from the decoder attention weights and the sentence importance weights, so that sentences better suited to serve as the abstract are assigned more weight; and the potential of the model is explored through different tasks, so that the readability and the information content of the generated abstract sentences are improved.

Description

Single document abstract generation model construction method and device based on multi-task learning
Technical Field
The invention relates to a method and a device for constructing a single document abstract generating model, in particular to a method and a device for constructing a single document abstract generating model based on multi-task learning.
Background
With the arrival of the big-data era, the amount of text information available to people has grown rapidly, and the vigorous development of the internet has widened the channels for obtaining novel and diverse text content. News media, social networks and the like output massive amounts of text information every day, which inevitably causes information overload. Moreover, the novel content in this text is limited and there is a large amount of redundancy; with limited effort, people can neither process it all nor pick out the valuable information of personal interest from such a large volume of text. Extracting concise and clear important information from text for readers' reference is therefore an urgent need. However, writing abstracts for articles by hand is tedious and inefficient and wastes labor, which is exactly why the text summarization task exists. Automatic text summarization technology can summarize an original document efficiently and accurately and condense the information it contains.
Automatic text summarization techniques can be divided, according to the type of the input source document, into single-document summarization and multi-document summarization; according to the way the summary sentences are produced, they can be divided into extractive summarization and generative (abstractive) summarization. In recent years, with the further development of deep learning, deep learning models have been widely adopted for automatic summarization tasks. Based on deep learning, the extractive and generative summarization models can be conveniently fused into a comprehensive model, which further improves document summarization.
The 'two-stage summarization model' in the prior art requires an extraction model with good ability to identify and encode key information, otherwise important information is lost in the extraction stage; in addition, the generative model must have good sentence compression ability. Moreover, the extraction operation is not differentiable, so the gradient cannot be propagated back from the result of the generative model to the extraction model, and the two models therefore cannot be trained jointly.
Disclosure of Invention
The invention aims to provide a method and a device for constructing a single-document abstract generation model based on multi-task learning, in order to solve the problems of prior-art single-document abstract generation methods: unreasonable attention distribution, poor model generalization, poor readability of the generated abstract sentences and the small amount of information they contain.
In order to realize the task, the invention adopts the following technical scheme:
compared with the prior art, the invention has the following technical effects:
1. The model structure designed in the method and device for constructing a single-document abstract generation model based on multi-task learning adopts a Transformer as the basic model, with an additional encoder and a classifier attached at the output of the encoder. The two encoders are thus divided into a word-level encoder and a sentence-level encoder, which makes multi-task training possible. The attention distribution of the decoder is made more reasonable: a regularization term computed from the decoder attention weights and the sentence importance weights is added, so that sentences better suited to serve as the abstract are assigned more weight;
2. The method and device for constructing a single-document abstract generation model based on multi-task learning change the traditional training mode in that several training objectives are used to train the same model. The final model is still a generative automatic summarization model; the extractive summarization task is added only to improve the training effect, and the model's potential is mined through different tasks, improving the readability and information content of the generated abstract sentences.
Drawings
FIG. 1 is a schematic diagram of a conventional Transformer model in the prior art;
fig. 2 is a schematic structural diagram of a single-document abstract generation model based on multi-task learning according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the drawings and examples, so that those skilled in the art can better understand it. It should be expressly noted that in the following description, detailed descriptions of known functions and designs are omitted where they would obscure the main content of the present invention.
The following definitions or conceptual connotations related to the present invention are provided for illustration:
Embedded representation: since text cannot be directly processed by a computer, a feature representation of the text must be found. The feature vector corresponding to each word is obtained by a table-lookup index, i.e., words from a high-dimensional discrete space are embedded into a continuous low-dimensional vector space, which is also called word embedding. The resulting feature representation of a word is its embedded representation.
Transformer encoder: the Transformer encoder is obtained by connecting in series a multi-head self-attention module, a residual connection and layer normalization module, a position-wise feed-forward module, and a further residual connection and layer normalization module.
A Transformer decoder: the Transformer decoder is obtained by connecting in series a multi-head self-attention module, a residual connection and layer normalization module, a multi-head external attention module, a residual connection and layer normalization module, a position-wise feed-forward module, and a further residual connection and layer normalization module.
Label_Smoothing loss function: Label Smoothing Regularization is used to alleviate the over-fitting caused when the cross-entropy loss is computed with one-hot vector labels in classification problems. Label smoothing regularization smooths the one-hot label vector and then computes the cross entropy.
KL divergence: the KL Divergence (Kullback-Leibler Divergence) is used to quantify the difference between two probability distributions, also called relative entropy.
Focal Loss function: Focal Loss was first applied to the object detection task in computer vision, where negative examples vastly outnumber positive examples and many samples are easy to classify. Focal Loss increases the weight of hard-to-classify samples and reduces the weight of easily classified samples, so that model training focuses more on the samples that are difficult to classify.
Example one
The embodiment discloses a method for constructing a single document abstract generation model based on multi-task learning.
The method is executed according to the following steps:
step 1, obtaining a plurality of sections of texts to obtain a text data set; each piece of text comprises a plurality of sentences, each sentence comprising a plurality of words;
obtaining an abstract corresponding to each text segment to obtain a first label set;
obtaining the correlation between each section of text and the corresponding abstract, and obtaining a second label set;
Due to the strong parallel processing capability of the model proposed in this patent, it should be able to handle longer texts. After simple data preprocessing, the proposed model can also be transferred to Chinese text summarization tasks, where it retains excellent summarization ability.
In this embodiment, the text is:
[-lrb-cnn-rrb-ahmed farouq did n't have the prestige of fellow al qaeda figure osama bin laden,the influence of anwar al-awlaki,or the notoriety of adam gadahn.
still,he was a big deal.
that's the assessment of multiple sources on a man who may not have been well-known in the west,but nonetheless had a special role in the terrorist group.
farouq--an american--died in a u.s.counterterrorism airstrike in january,according to the white house.
two al qaeda hostages,warren weinstein of the united states and giovanni lo porto from italy,were killed in the same strike,while gadahn died in another u.s.operation that month.
before that,farouq was the deputy emir of al qaeda in the indian subcontinent,or aqis,a branch of the islamist extremist group that formed in recent years.
the branch made its presence known in september 2014,when militants infiltrated pakistan's navy and tried to hijack one of its ships,according to the site institute,which monitors terror groups.
the group's spokesman,usama mahmoud,on twitter compared the pakistani naval officers involved in the attempted hijacking to nidal hasan,site reported.
hasan is the u.s.army psychiatrist sentenced to death for killing 13people at fort hood,texas.
osama mehmood,a spokesman for al qaeda in the indian subcontinent,said that farouq and another top figure,qari abdullah mansur,were killed in a january 15drone strike in pakistan's shawal valley.
they were senior al qaeda leaders,according to mehmood.
american mouthpiece for al qaeda killed.
cnn's sophia saifi contributed to this report.]
label one (abstract) is:
ahmed farouq was a leader in al qaeda's india branch.he was killed in a u.s.counterterrorism airstrike in january.like adam gadahn,farouq was american and part of al qaeda.
label two (sentence classification label) is:
[0,0,1,0,0,0,0,0,0,1,1,0,0]
step 2, preprocessing each section of text in the text data set to obtain the embedded representation of each word in each section of text;
adding the position of each word in the sentence as a position code into the embedded representation of each word, obtaining a new embedded representation of each word, and obtaining a training set;
in this embodiment, the whole segment of input text in the text data set is first divided into sentence levels to obtain a sequence of sentences { s } 1 ,s 2 ,…,s n }; then for each sentence S i Dividing the word hierarchy to obtain a word sequence { w i1 ,w i2 ,…,w im }. Then, the coding at word level and the coding at sentence level are carried out on each sample, and then the samples are processed by a downstream classifier or decoder.
In this embodiment, S 1 ,S 2 ,……,S n Representing a divided sentence, S 1 Represents the first sentence, S 2 Representing a second sentence, S n Representing the nth sentence, wherein n is a positive integer, and S represents the sentence;
carrying out word hierarchy division on the ith sentence, wherein i is less than or equal to n, and w i1 1 st word, w, representing the ith sentence i2 2 nd word, w, representing the ith sentence im The mth word representing the ith sentence, m being a positive integer.
In order to construct a fixed vocabulary, this embodiment also counts the words appearing in the source texts of the original data set, selects a certain number of high-frequency words as the fixed vocabulary, indexes the words in the vocabulary and constructs an index dictionary; each word is then associated one-to-one with a vector through the index dictionary.
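As a rough illustration of the fixed vocabulary and index dictionary described above, the following Python sketch counts word frequencies and maps words to indices (the vocabulary size, special tokens and function names are assumptions for illustration, not details taken from the patent):

```python
from collections import Counter

def build_vocab(texts, vocab_size=30000, specials=("<pad>", "<bos>", "<eos>", "<unk>")):
    """Count word frequencies over the source texts, keep the most frequent words
    as a fixed vocabulary, and return word->index and index->word dictionaries."""
    counter = Counter()
    for text in texts:
        counter.update(text.split())
    words = [w for w, _ in counter.most_common(vocab_size - len(specials))]
    itos = list(specials) + words
    stoi = {w: i for i, w in enumerate(itos)}
    return stoi, itos

def to_indices(text, stoi):
    """Map each word to its vocabulary index; unknown words fall back to <unk>."""
    return [stoi.get(w, stoi["<unk>"]) for w in text.split()]
```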
In this embodiment, each position has a fixed position encoding: position pos corresponds to a d_model-dimensional vector (the embedded representation of a word is also a d_model-dimensional vector, so the two vectors can be added).
The position encoding is computed element by element with trigonometric functions: the dimensions of the position-encoding vector at position pos with even index 2i are computed with a sine function, and the dimensions with odd index 2i+1 with a cosine function:
PE(pos, 2i) = sin(pos / 10000^(2i / d_model))
PE(pos, 2i + 1) = cos(pos / 10000^(2i / d_model))
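A minimal NumPy sketch of this fixed sinusoidal position encoding (function and variable names are illustrative assumptions; an even d_model is assumed):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Fixed sinusoidal position encoding: even dimensions use sine, odd use cosine."""
    pe = np.zeros((max_len, d_model))
    pos = np.arange(max_len)[:, None]          # positions 0 .. max_len-1
    two_i = np.arange(0, d_model, 2)[None, :]  # even dimension indices 2i
    angle = pos / np.power(10000.0, two_i / d_model)
    pe[:, 0::2] = np.sin(angle)                # PE(pos, 2i)
    pe[:, 1::2] = np.cos(angle)                # PE(pos, 2i+1)
    return pe                                  # shape [max_len, d_model]; added to the word embeddings
```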
In this embodiment, the source text is:
['england saw their champions trophy title hopes extinguished by germany after suffering a 2-0quarter-final defeat at the tournament in india.',
'olympic gold medallists germany scored late in the second and fourth quarters,through moritz furste and christopher ruhr,to set up a semi-final against australia in bhubaneswar on saturday.',
"bobby crutchley's england side had decent second-half chances to get back on level terms,but an equaliser was beyond them,and ruhr killed off the match with a 57th-minute close-range finish.",
"german player benedikt furk dives to stop the ball as england's nick catlin watches on",
'alexandre de paeuw of germany is challenged by ashley jackson of england',
'germany took the lead on the stroke of half-time when furste struck home into the top right of the net from a penalty corner.',
"they had an opportunity to double their advantage midway through the third quarter but pilt arnold's driven cross from the left sped across goal,with mats grambusch unable to cash in.",
'england looked like levelling in the 49th minute but amid a scramble somehow germany kept the ball from crossing the line,with goalkeeper nicolas jacobi saving from barry middleton and adam dixon also lurking.',
"germany's tobias hauke dribbles past england's ashley jackson at kalinga stadium",
"ashley jackson could not drill in from england's first penalty corner,with jacobi making a solid save,which he repeated later on in the match.",
'england were pressing germany hard but in the 58th minute their hopes were finally dashed.',
'a sharp turn from grambusch bought him space to send in a cross that ruhr converted,skilfully lifting the ball high into the net.']
index of source text word:
[101,2563,2387,2037,3966,5384,2516,8069,27705,2011,2762,2044,6114,1037,1016,29624,2692,4284,29624,16294,2389,4154,2012,1996,2977,1999,2634,1012,102,101,4386,2751,28595,2015,2762,3195,2397,1999,1996,2117,1998,2959,7728,1010,2083,28461,6519,13473,1998,5696,21766,8093,1010,2000,2275,2039,1037,4100,29624,16294,2389,2114,2660,1999,1038,6979,27543,26760,2906,2006,5095,1012,102,101,6173,13675,4904,2818,3051,1005,2015,2563,2217,2018,11519,2117,29624,8865,2546,9592,2000,2131,2067,2006,2504,3408,1010,2021,2019,5020,17288,2001,3458,2068,1010,1998,21766,8093,2730,2125,1996,2674,2007,1037,28623,29624,10020,10421,2485,29624,24388,2063,3926,1012,102,101,2446,2447,3841,2098,5480,2102,6519,2243,11529,2015,2000,2644,1996,3608,2004,2563,1005,2015,4172,4937,4115,12197,2006,102,101,16971,2139,6643,13765,2860,1997,2762,2003,8315,2011,9321,4027,1997,2563,102,101,2762,2165,1996,2599,2006,1996,6909,1997,2431,29624,7292,2043,6519,13473,4930,2188,2046,1996,2327,2157,1997,1996,5658,2013,1037,6531,3420,1012,102,101,2027,2018,2019,4495,2000,3313,2037,5056,12213,2083,1996,2353,4284,2021,14255,7096,7779,1005,2015,5533,2892,2013,1996,2187,16887,2408,3125,1010,2007,22281,13250,8286,2818,4039,2000,5356,1999,1012,102,101,2563,2246,2066,2504,2989,1999,1996,25726,3371,2021,13463,1037,25740,5064,2762,2921,1996,3608,2013,5153,1996,2240,1010,2007,9653,9473,6213,2072,7494,2013,6287,17756,1998,4205,11357,2036,24261,1012,102,101,2762,1005,2015,16858,5292,15851,2852,12322,13510,2627,2563,1005,2015,9321,4027,2012,19924,13807,3346,102,101,9321,4027,2071,2025,12913,1999,2013,2563,1005,2015,2034,6531,3420,1010,2007,6213,2072,2437,1037,5024,3828,1010,2029,2002,5567,2101,2006,1999,1996,2674,1012,102,101,2563,2020,7827,2762,2524,2021,1999,1996,5388,2705,3371,2037,8069,2020,2633,18198,1012,102,101,1037,4629,2735,2013,13250,8286,2818,4149,2032,2686,2000,4604,1999,1037,2892,2008,21766,8093,4991,1010,8301,10270,18083,2100,8783,1996,3608,2152,2046,1996,5658,1012,102]
the target abstract is as follows:
"olympic gold medallists germany scored late in the second and fourth quarters<q>bobby crutchley's england side had decent second-half chance<q>moritz furste and christopher ruhr scored for germany"
index of target summary words:
[1,4386,2751,28595,2015,2762,3195,2397,1999,1996,2117,1998,2959,7728,3,6173,13675,4904,2818,3051,1005,2015,2563,2217,2018,11519,2117,29624,8865,2546,3382,3,28461,6519,13473,1998,5696,21766,8093,3195,2005,2762,2]
position of each sentence in the source text:
[0,29,73,125,150,166,196,236,276,297,330,349]
the classification labels of sentences in the source text are:
[0,1,0,0,0,0,0,0,0,0,0,0]
the embedding of the source text is represented as:
tensor([[-0.4228,-0.0648,1.7755,...,-0.0450,-0.1446,0.0397],
[-1.5119,1.5509,0.8044,...,0.1634,-0.2404,-0.0408],
[-1.3100,-0.2735,-0.6489,...,0.7900,-0.4179,0.7393],
[0.0259,1.4194,1.1651,...,-0.0327,0.6284,-0.1963],
[1.3589,1.0458,1.0282,...,0.5275,-1.5156,1.5691],
[0.3824,-0.3591,-2.2065,...,0.7527,-0.3730,0.6119]],
device='cuda:0', grad_fn=<AddcmulBackward>), i.e. a tensor of shape [text sequence length, hidden vector dimension].
Step 3, taking the training set as input, taking the first label set and the second label set as reference output, and training a neural network;
the neural network comprises a word coding network, a sentence coding network, a decoding network, a full connection layer and an output layer which are arranged in sequence;
the output of the sentence coding network is also connected with a classifier;
the word coding network comprises a plurality of Transformer encoders connected in series;
the sentence coding network comprises a plurality of Transformer encoders connected in series;
the decoding network comprises a plurality of Transformer decoders connected in series;
and obtaining a single document abstract generating model.
In the present embodiment, Transformer blocks are used to implement the layer-sequence encoder and the decoder in the model. The Transformer proposed in the paper Attention Is All You Need is a complete encoder-decoder framework; the present invention improves the original Transformer so that the encoder part can encode the source text hierarchically. The specific structure of the conventional Transformer model is shown in fig. 1.
In this embodiment, fig. 1 shows the internal details of the Transformer model.
After the input text has been converted to its embedded representation and the fixed position encodings of the words at different positions have been added, it can be sent to the Transformer encoder for encoding.
The encoder contains a multi-head self-attention module that performs the attention computation over the input text. Residual connection means adding the result of the self-attention computation to the embedded representation of the input text. Layer normalization is then performed, i.e. the data are normalized along the feature dimension; concretely, the mean over the feature dimension is subtracted and the result is divided by the standard deviation over the feature dimension.
The position-wise feed-forward network then performs two linear mappings: the feature dimension is first mapped to a higher dimension and then compressed back to the original dimension. Residual connection and layer normalization are applied afterwards in the same way.
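The encoder block just described can be sketched in PyTorch roughly as follows (hyper-parameter values and the class name are illustrative assumptions, not the patent's exact implementation):

```python
import torch.nn as nn

class TransformerEncoderBlock(nn.Module):
    """One encoder block: multi-head self-attention, residual connection and
    layer normalization, position-wise feed-forward, residual connection and
    layer normalization."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),   # map the feature dimension to a higher dimension
            nn.ReLU(),
            nn.Linear(d_ff, d_model),   # compress back to the original dimension
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, key_padding_mask=None):
        # x: [seq_len, batch, d_model] (default layout of nn.MultiheadAttention)
        attn_out, _ = self.self_attn(x, x, x, key_padding_mask=key_padding_mask)
        x = self.norm1(x + attn_out)      # residual connection + layer normalization
        x = self.norm2(x + self.ffn(x))   # feed-forward, residual, layer normalization
        return x
```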
At the decoder side, after the output text has been converted to its embedded representation and the fixed position encodings of the words at different positions have been added, it can be sent to the Transformer decoder for decoding.
The decoder also contains a multi-head self-attention module that performs the attention computation over the output text; residual connection and layer normalization are the same as in the encoder.
Unlike the encoder, the decoder adds a multi-head external attention module in order to receive information from the encoder. Likewise, residual connection and layer normalization are performed after the multi-head external attention layer.
The position-wise feed-forward network then performs two linear mappings, i.e. the feature dimension is mapped to a higher dimension and then compressed back to the original dimension, followed again by residual connection and layer normalization.
The output of the decoder is mapped to the vocabulary dimension through a fully connected layer; an activation function then yields the probability distribution over the vocabulary at the current decoding step, and the current decoding output is obtained by sampling this distribution.
The single-document abstract generation model provided by the invention, as shown in fig. 2, comprises a layer-sequence encoder (a word encoding network and a sentence encoding network), a decoding network and a classifier. The specific structure is shown in fig. 2 (for simplicity, the internal structure of the Transformer modules is omitted).
In the structure diagram of the generative model in fig. 2, the Transformer word encoder and the Transformer sentence encoder have the same structure as the Transformer encoder in fig. 1, and the Transformer decoder in fig. 2 is the same as the Transformer decoder in fig. 1.
The embedded representation of the input text, after a fixed position encoding is added, is encoded by the Transformer word encoder to obtain the word encodings. Attention is then computed over the word encodings of each sentence and that dimension is compressed to 1, giving a sentence encoding. The sentence encodings are passed through the Transformer sentence encoder to obtain the feature representations of the sentences.
In this embodiment, the feature representations of the sentences are, on the one hand, sent to the classifier to obtain the classification result of the extractive summarization task; on the other hand, they are passed to the Transformer decoder for the computation of the external attention.
The classifier consists of a linear fully connected layer and an activation function layer: the fully connected layer compresses the feature dimension and the activation function layer yields the probability distribution over the classification labels, from which the classification result of the extractive summarization task is obtained.
The structure and function of the Transformer decoder in fig. 2 are exactly the same as those of the Transformer decoder in fig. 1 and are therefore not described again.
In the layer-sequence encoder (the word encoding network and the sentence encoding network), let a document be d = {s_1, s_2, ..., s_n} and a sentence be s_i = {w_i1, w_i2, ..., w_im}, where w_ij is the j-th word of the i-th sentence in the source document of a sample. Here Enc_word denotes the word encoding network and Enc_sent denotes the sentence encoding network.
The representations of the words in each sentence are obtained with the word encoding network:
(e_i1, e_i2, ..., e_im) = Enc_word(w_i1, w_i2, ..., w_im)    (1)
Given a vector q as the query, the word representations output by Enc_word are used as keys and values and fed into a multi-head attention layer, denoted Attn_sent, to obtain the un-encoded sentence representation s̃_i:
s̃_i = Attn_sent(q, (e_i1, e_i2, ..., e_im))    (2)
The sentence representations are fed into the sentence encoding network to obtain the encoded sentence vector representations (h_1, h_2, ..., h_n):
(h_1, h_2, ..., h_n) = Enc_sent(s̃_1, s̃_2, ..., s̃_n)    (3)
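Under the notation of equations (1)-(3), a rough PyTorch sketch of the layer-sequence encoder might look as follows (module names, layer counts and tensor layouts are illustrative assumptions):

```python
import torch
import torch.nn as nn

class LayerSequenceEncoder(nn.Module):
    """Sketch of equations (1)-(3): a word-level Transformer encoder, a learned
    query vector q that pools the word representations of each sentence into one
    vector, and a sentence-level Transformer encoder over the pooled vectors."""
    def __init__(self, d_model=512, n_heads=8, n_layers=2):
        super().__init__()
        self.enc_word = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads), n_layers)   # Enc_word
        self.enc_sent = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads), n_layers)   # Enc_sent
        self.attn_sent = nn.MultiheadAttention(d_model, n_heads)      # Attn_sent
        self.q = nn.Parameter(torch.randn(1, 1, d_model))             # query vector q

    def forward(self, words):
        # words: [n_sentences, m_words, d_model], embedded words of one document
        n, m, d = words.shape
        e = self.enc_word(words.transpose(0, 1))        # (1): [m, n, d], sentences as batch
        q = self.q.expand(1, n, d)
        s_tilde, _ = self.attn_sent(q, e, e)            # (2): [1, n, d], one vector per sentence
        h = self.enc_sent(s_tilde.transpose(0, 1))      # (3): [n, 1, d], encoded sentence vectors
        return h
```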
In the decoding network, the partial sequence already generated, ŷ_<t = (ŷ_1, ..., ŷ_{t-1}), is known; the decoding step at time t is performed on the basis of the partial decoding sequence ŷ_<t and the encoded sentence vector representations (h_1, h_2, ..., h_n).
First, the self-attention layer encodes ŷ_<t to obtain the encoding vector g_t of the decoded partial sequence:
g_t = Attn_self(ŷ_<t)    (4)
Then the external attention layer takes g_t as the query and (h_1, h_2, ..., h_n) as keys and values to compute the attention vector c_t:
c_t = Attn_ext(g_t, (h_1, h_2, ..., h_n))    (5)
Through the generator, the dimension of the vector c_t is mapped to the vocabulary dimension, and the softmax function yields the probability distribution P_t^vocab over the vocabulary (a vector whose dimension is the vocabulary size). Sampling from P_t^vocab gives the decoding output ŷ_t at time t:
P_t^vocab = softmax(Generator(c_t)),    ŷ_t ~ P_t^vocab    (6)
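A rough PyTorch sketch of one decoding step in the spirit of equations (4)-(6) (the class name, tensor layouts and the sampling step are illustrative assumptions; a practical implementation would stack several full Transformer decoder layers):

```python
import torch
import torch.nn as nn

class AbstractDecoderStep(nn.Module):
    """Sketch of equations (4)-(6): self-attention over the decoded prefix,
    external attention that queries the encoded sentence vectors, and a
    generator that maps to the vocabulary and samples the next token."""
    def __init__(self, d_model=512, n_heads=8, vocab_size=30000):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads)
        self.ext_attn = nn.MultiheadAttention(d_model, n_heads)
        self.generator = nn.Linear(d_model, vocab_size)

    def forward(self, y_prefix, h_sent):
        # y_prefix: [t-1, batch, d_model] embeddings of the decoded prefix
        # h_sent:   [n_sent, batch, d_model] encoded sentence vectors from Enc_sent
        g, _ = self.self_attn(y_prefix, y_prefix, y_prefix)      # (4)
        c, attn_dist = self.ext_attn(g[-1:], h_sent, h_sent)     # (5); attn_dist is the external attention over sentences
        p_vocab = torch.softmax(self.generator(c[-1]), dim=-1)   # (6): distribution over the vocabulary
        y_t = torch.multinomial(p_vocab, num_samples=1)          # sample the next token id
        return y_t, p_vocab, attn_dist
```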
The loss function uses the Label_Smoothing loss function, where ŷ^word is the predicted label of a word and y^word is the true label of the corresponding word in the target abstract:
L_ls = Label_Smoothing_Loss(ŷ^word, y^word)    (7)
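A minimal sketch of such a label-smoothed cross entropy (the smoothing factor and the function name are illustrative assumptions):

```python
import torch

def label_smoothing_loss(log_probs, target, smoothing=0.1):
    """Label-smoothed cross entropy in the spirit of equation (7): the one-hot
    target is smoothed before the cross entropy against the predicted
    log-probabilities is taken.
    log_probs: [N, vocab_size] log-softmax outputs; target: [N] true word indices."""
    vocab_size = log_probs.size(-1)
    smooth_target = torch.full_like(log_probs, smoothing / (vocab_size - 1))
    smooth_target.scatter_(1, target.unsqueeze(1), 1.0 - smoothing)
    return -(smooth_target * log_probs).sum(dim=-1).mean()
```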
the external attention of the decoder is recorded, the attention distribution attndit is averaged over the sentences, and the KL divergence is calculated from the importance distribution ScoreDist of the sentences.
L kl =KL_Divergence(AttnDist,ScoreDist) (8)
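Equation (8) can be sketched as follows, assuming both distributions are given as probability vectors over the sentences:

```python
import torch

def attention_kl_regularizer(attn_dist, score_dist, eps=1e-12):
    """KL divergence between the decoder's average external-attention distribution
    over sentences (AttnDist) and the sentence importance distribution (ScoreDist).
    Both inputs are 1-D tensors of length n_sentences that sum to 1."""
    attn_dist = attn_dist.clamp_min(eps)
    score_dist = score_dist.clamp_min(eps)
    return (attn_dist * (attn_dist.log() - score_dist.log())).sum()
```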
The classifier is implemented as an encoder whose basic structure is a Transformer encoder, denoted here as Enc_cls. This layer accepts the output of the sentence encoder and performs a self-attention calculation to obtain the sentence representations (u_1, ..., u_n):
(u_1, u_2, ..., u_n) = Enc_cls(h_1, h_2, ..., h_n)    (9)
The hidden-layer dimension is compressed by a linear layer and then mapped into a binary probability distribution through the softmax function:
ŷ^sent = softmax(Linear(u_1, u_2, ..., u_n))    (10)
The loss function uses Focal Loss, where ŷ^sent is the predicted label of a sentence and y^sent is the true label of the sentence:
L_fl = Focal_Loss(ŷ^sent, y^sent)    (11)
The Focal Loss is calculated as:
FL(p_t) = -α_t (1 - p_t)^γ' log(p_t)    (12)
where p_t is the predicted probability of the true class, α_t is a class balancing weight and γ' is the focusing parameter (distinct from the task weight γ used below).
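A minimal sketch of this binary Focal Loss (the alpha and gamma values here are common defaults, not values specified by the patent):

```python
import torch

def focal_loss(probs, target, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary Focal Loss over the sentence classifier's predictions, in the spirit of
    equations (11)-(12). probs: [n_sentences] predicted probability of the positive
    class; target: [n_sentences] 0/1 labels."""
    p_t = torch.where(target == 1, probs, 1.0 - probs)      # probability of the true class
    alpha_t = torch.where(target == 1,
                          torch.full_like(probs, alpha),
                          torch.full_like(probs, 1.0 - alpha))
    loss = -alpha_t * (1.0 - p_t) ** gamma * torch.log(p_t.clamp_min(eps))
    return loss.mean()
```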
Optionally, when the neural network is trained in step 3, a multi-task learning method is adopted;
wherein the loss function is L = L_ls + γ·L_fl + λ·L_kl, where γ and λ are both weight parameters and both greater than 0; L_ls = Label_Smoothing_Loss(ŷ^word, y^word), where ŷ^word is the predicted label of a word, y^word is the true label of the word and Label_Smoothing_Loss() is the label smoothing loss function; L_fl = Focal_Loss(ŷ^sent, y^sent), where ŷ^sent is the predicted label of a sentence, y^sent is the true label of the sentence and Focal_Loss() is the Focal Loss function; L_kl = KL_Divergence(AttnDist, ScoreDist), where AttnDist is the average attention distribution over the sentences, ScoreDist is the importance distribution of the sentences and KL_Divergence() is the KL divergence calculation.
The automatic summarization model is trained mainly with Multi-Task Learning (MTL): an extractive summarization task is added on top of the traditional generative summarization model as an auxiliary training objective, which gives the model better generalization.
In the loss function L = L_ls + γ·L_fl + λ·L_kl, γ and λ are two hyper-parameters that have to be set manually and serve as the weights of the classification task and of the attention constraint; both γ and λ are greater than 0.
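Assuming the loss sketches given above, the combined objective can be written as follows (gamma_w and lambda_w stand for the manually chosen weights γ and λ; the values are placeholders):

```python
def multi_task_loss(log_probs, word_targets, sent_probs, sent_targets,
                    attn_dist, score_dist, gamma_w=1.0, lambda_w=1.0):
    """L = L_ls + gamma * L_fl + lambda * L_kl, combining the generative term,
    the auxiliary extractive term and the attention regularization term."""
    l_ls = label_smoothing_loss(log_probs, word_targets)      # generative summarization term
    l_fl = focal_loss(sent_probs, sent_targets)               # extractive (auxiliary) term
    l_kl = attention_kl_regularizer(attn_dist, score_dist)    # attention constraint term
    return l_ls + gamma_w * l_fl + lambda_w * l_kl
```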
In the method for constructing a single-document abstract generation model based on multi-task learning in this embodiment, the extractive model is used as an Auxiliary Objective for joint training, so that the encoder of the generative model gains generalization capability and the external attention module (target-to-source attention) of the decoder assigns more attention weight to the key sentences.
Example two
The method for generating the single document abstract based on the multitask learning is characterized in that a single document to be abstracted is input into a single document abstract generating model obtained by the method for constructing the single document abstract generating model based on the multitask learning in the first embodiment, and an abstract is obtained.
In the present embodiment, a commonly used evaluation metric for the text summarization task is ROUGE (Recall-Oriented Understudy for Gisting Evaluation). The ROUGE metric is commonly used in automatic document summarization and machine translation; it measures the similarity between the generated text and the target samples in the data set by comparing the two. The ROUGE evaluation criteria used here fall into two categories: ROUGE-N and ROUGE-L.
After reading the source document, linguistic experts generally summarize it and write the reference abstract manually; the abstract ŷ generated by the model is then compared with the standard abstract y, and the number of N-grams co-occurring in the two is counted as the evaluation metric of abstract quality. Comparing against abstracts written by several experts improves the reliability of the evaluation.
N in ROUGE-N indicates that the statistical unit is the N-gram, where N can take the value 1, 2 and so on. ROUGE-1 computes the repetition rate of the unigrams of the target abstract in the generated abstract; ROUGE-2 computes the repetition rate of the bigrams of the target abstract in the generated abstract. The calculation formula is:
ROUGE-N = ( Σ_{S∈References} Σ_{gram_N∈S} Count_match(gram_N) ) / ( Σ_{S∈References} Σ_{gram_N∈S} Count(gram_N) )
ROUGE-L computes the Longest Common Subsequence (LCS) between the standard abstract X (of length m) and the generated abstract Y (of length n); its calculation formula is defined as follows: the recall is R_lcs = LCS(X, Y) / m, the precision is P_lcs = LCS(X, Y) / n, and the F1 score is F_lcs = (1 + β²)·R_lcs·P_lcs / (R_lcs + β²·P_lcs).
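A simple whitespace-tokenized sketch of ROUGE-N recall and ROUGE-L (real ROUGE toolkits additionally handle stemming, multiple references and confidence intervals):

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """ROUGE-N recall: the fraction of reference n-grams that also occur in the candidate."""
    cand, ref = candidate.split(), reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    return overlap / max(sum(ref_ngrams.values()), 1)

def rouge_l(candidate, reference, beta=1.2):
    """ROUGE-L F-score based on the longest common subsequence of the two word sequences."""
    x, y = reference.split(), candidate.split()
    # dynamic-programming LCS length
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x[i - 1] == y[j - 1] else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[len(x)][len(y)]
    r = lcs / max(len(x), 1)   # recall
    p = lcs / max(len(y), 1)   # precision
    return (1 + beta ** 2) * r * p / (r + beta ** 2 * p) if r + p > 0 else 0.0
```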
Traditional algorithms often train a recurrent neural network with the objective function of a generative summarization task to achieve summarization. The single-document abstract generation model construction device based on multi-task learning provided by the invention differs from the traditional method in the following respects.
1. The invention uses Transformer modules to construct the model. The multi-head attention mechanism in the Transformer module can not only compute in parallel but also capture long-distance dependencies and encode the context better. This improves both the overall computational efficiency of the model and the accuracy with which the input text information is processed.
2. The present invention uses a multi-task learning training scheme instead of a single training objective. The extractive summarization task, which is related to the generative summarization task (the main task), is used as an auxiliary task to train the model, improving its generalization so that it adapts to a wider variety of input texts while ensuring the quality of the summarization results.
3. The invention adds a constraint term to the training objective that exploits the relationship between the decoder attention distribution and the importance of the sentences. This heuristically makes the model pay more attention to the important sentences during decoding, so the amount of salient information in the summarization result can be increased.
EXAMPLE III
The embodiment provides a single document abstract generation model construction device based on multitask learning, which comprises a data acquisition module, a preprocessing module and a model construction module;
the data acquisition module is used for acquiring a plurality of sections of texts and acquiring a text data set; each piece of text comprises a plurality of sentences, and each sentence comprises a plurality of words;
obtaining an abstract corresponding to each text segment to obtain a first label set;
obtaining the correlation between each text and the corresponding abstract to obtain a second label set;
the preprocessing module is used for preprocessing each section of text data in the text data set to obtain the embedded representation of each word in each section of text;
adding the position of each word in the sentence as a position code into the embedded representation of each word, obtaining a new embedded representation of each word, and obtaining a training set;
the model building module is used for taking the training set as input, taking the first label set and the second label set as reference output and training the neural network;
the neural network comprises a word coding network, a sentence coding network, a decoding network, a full connection layer and an output layer which are arranged in sequence;
the output of the sentence coding network is also connected with a classifier;
the word coding network comprises a plurality of Transformer encoders connected in series;
the sentence coding network comprises a plurality of Transformer encoders connected in series;
the decoding network comprises a plurality of Transformer decoders connected in series;
and obtaining a single document abstract generating model.
Optionally, when the model building module trains the neural network, a multi-task learning method is adopted;
wherein the loss function is L = L_ls + γ·L_fl + λ·L_kl, where γ and λ are both weight parameters and both greater than 0; L_ls = Label_Smoothing_Loss(ŷ^word, y^word), where ŷ^word is the predicted label of a word, y^word is the true label of the word and Label_Smoothing_Loss() is the label smoothing loss function; L_fl = Focal_Loss(ŷ^sent, y^sent), where ŷ^sent is the predicted label of a sentence, y^sent is the true label of the sentence and Focal_Loss() is the Focal Loss function; L_kl = KL_Divergence(AttnDist, ScoreDist), where AttnDist is the average attention distribution over the sentences, ScoreDist is the importance distribution of the sentences and KL_Divergence() is the KL divergence calculation.
Example four
The embodiment provides a single document abstract generating device based on multitask learning, which is characterized by comprising a data acquisition module and an abstract generating module;
the data acquisition module is used for acquiring a single document of the abstract to be extracted;
the abstract generating module is used for inputting the single document of which the abstract is to be extracted into the single document abstract generating model obtained by the single document abstract generating model establishing device based on multi-task learning in the third embodiment to obtain the abstract.
Through the description of the above embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus necessary general hardware, and certainly may also be implemented by hardware, but in many cases, the former is a better embodiment. Based on such understanding, the technical solutions of the present invention may be substantially implemented or a part of the technical solutions contributing to the prior art may be embodied in the form of a software product, where the computer software product is stored in a readable storage medium, such as a floppy disk, a hard disk, or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Claims (4)

1. A method for constructing a single document abstract generation model based on multitask learning is characterized by comprising the following steps:
step 1, obtaining a plurality of sections of texts to obtain a text data set; each text segment comprises a plurality of sentences, and each sentence comprises a plurality of words;
obtaining an abstract corresponding to each text segment to obtain a first label set;
obtaining the correlation between each section of text and the corresponding abstract, and obtaining a second label set;
step 2, preprocessing each section of text in the text data set to obtain the embedded representation of each word in each section of text;
adding the position of each word in the sentence as a position code into the embedded representation of each word, obtaining a new embedded representation of each word, and obtaining a training set;
step 3, taking the training set as input, taking the first label set and the second label set as reference output, and training a neural network;
the neural network comprises a word coding network, a sentence coding network, a decoding network, a full connection layer and an output layer which are arranged in sequence;
the output of the sentence coding network is also connected with a classifier;
the word coding network comprises a plurality of Transformer encoders connected in series;
the sentence coding network comprises a plurality of Transformer encoders connected in series;
the decoding network comprises a plurality of Transformer decoders connected in series;
obtaining a single document abstract generating model;
when the neural network is trained in the step 3, a multi-task learning method is adopted;
wherein the loss function during training is L = L_ls + γ·L_fl + λ·L_kl, wherein γ and λ are both weight parameters and both greater than 0; L_ls = Label_Smoothing_Loss(ŷ^word, y^word), wherein ŷ^word represents the predicted label of a word, y^word represents the true label of the word and Label_Smoothing_Loss represents the label smoothing loss function; L_fl = Focal_Loss(ŷ^sent, y^sent), wherein ŷ^sent represents the predicted label of a sentence, y^sent represents the true label of the sentence and Focal_Loss represents the Focal Loss function; L_kl = KL_Divergence(AttnDist, ScoreDist), wherein AttnDist represents the average attention distribution of the sentences, ScoreDist represents the importance distribution of the sentences and KL_Divergence represents the KL divergence calculation.
2. A single document abstract generating method based on multitask learning is characterized in that a single document to be abstracted is input into a single document abstract generating model obtained by the single document abstract generating model building method based on the multitask learning according to claim 1, and an abstract is obtained.
3. A single document abstract generation model construction device based on multi-task learning is characterized by comprising a data acquisition module, a preprocessing module and a model construction module;
the data acquisition module is used for acquiring a plurality of sections of texts to acquire a text data set; each text segment comprises a plurality of sentences, and each sentence comprises a plurality of words;
obtaining an abstract corresponding to each text segment to obtain a first label set;
obtaining the correlation between each section of text and the corresponding abstract, and obtaining a second label set;
the preprocessing module is used for preprocessing each section of text data in the text data set to obtain the embedded representation of each word in each section of text;
adding the position of each word in the sentence as a position code into the embedded representation of each word, obtaining a new embedded representation of each word, and obtaining a training set;
the model building module is used for taking the training set as input, taking the first label set and the second label set as reference output and training a neural network;
the neural network comprises a word coding network, a sentence coding network, a decoding network, a full connection layer and an output layer which are arranged in sequence;
the output of the sentence coding network is also connected with a classifier;
the word coding network comprises a plurality of Transformer encoders connected in series;
the sentence coding network comprises a plurality of Transformer encoders connected in series;
the decoding network comprises a plurality of Transformer decoders connected in series;
obtaining a single document abstract generating model;
when the model building module trains the neural network, a multi-task learning method is adopted;
wherein the loss function during training is L = L_ls + γ·L_fl + λ·L_kl, wherein γ and λ are both weight parameters and both greater than 0; L_ls = Label_Smoothing_Loss(ŷ^word, y^word), wherein ŷ^word represents the predicted label of a word, y^word represents the true label of the word and Label_Smoothing_Loss represents the label smoothing loss function; L_fl = Focal_Loss(ŷ^sent, y^sent), wherein ŷ^sent represents the predicted label of a sentence, y^sent represents the true label of the sentence and Focal_Loss represents the Focal Loss function; L_kl = KL_Divergence(AttnDist, ScoreDist), wherein AttnDist represents the average attention distribution of the sentences, ScoreDist represents the importance distribution of the sentences and KL_Divergence represents the KL divergence calculation.
4. A single document abstract generating device based on multitask learning is characterized by comprising a data acquisition module and an abstract generating module;
the data acquisition module is used for acquiring a single document of the abstract to be extracted;
the abstract generating module is used for inputting a single document to be abstracted into a single document abstract generating model obtained by the single document abstract generating model building device based on multitask learning according to claim 3, so as to obtain an abstract.
CN202010435810.5A 2020-05-21 2020-05-21 Single document abstract generation model construction method and device based on multi-task learning Active CN111723196B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010435810.5A CN111723196B (en) 2020-05-21 2020-05-21 Single document abstract generation model construction method and device based on multi-task learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010435810.5A CN111723196B (en) 2020-05-21 2020-05-21 Single document abstract generation model construction method and device based on multi-task learning

Publications (2)

Publication Number Publication Date
CN111723196A CN111723196A (en) 2020-09-29
CN111723196B true CN111723196B (en) 2023-03-24

Family

ID=72564888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010435810.5A Active CN111723196B (en) 2020-05-21 2020-05-21 Single document abstract generation model construction method and device based on multi-task learning

Country Status (1)

Country Link
CN (1) CN111723196B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112463956B (en) * 2020-11-26 2022-08-23 重庆邮电大学 Text abstract generation system and method based on antagonistic learning and hierarchical neural network
CN113762459A (en) * 2021-01-26 2021-12-07 北京沃东天骏信息技术有限公司 Model training method, text generation method, device, medium and equipment
CN113761197B (en) * 2021-07-29 2022-07-26 中国科学院计算机网络信息中心 Application form multi-label hierarchical classification method capable of utilizing expert knowledge
CN113808075B (en) * 2021-08-04 2024-06-18 上海大学 Two-stage tongue picture identification method based on deep learning
CN113569049B (en) * 2021-08-10 2024-03-29 燕山大学 Multi-label text classification method based on hierarchical Trans-CNN
CN114091429A (en) * 2021-10-15 2022-02-25 山东师范大学 Text abstract generation method and system based on heterogeneous graph neural network
CN117313704B (en) * 2023-11-28 2024-02-23 江西师范大学 Mixed readability evaluation method and system based on public and private feature decomposition

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018067199A (en) * 2016-10-20 2018-04-26 日本電信電話株式会社 Abstract generating device, text converting device, and methods and programs therefor
CN110413986A (en) * 2019-04-12 2019-11-05 上海晏鼠计算机技术股份有限公司 A kind of text cluster multi-document auto-abstracting method and system improving term vector model
CN110737769A (en) * 2019-10-21 2020-01-31 南京信息工程大学 pre-training text abstract generation method based on neural topic memory
CN111177366A (en) * 2019-12-30 2020-05-19 北京航空航天大学 Method, device and system for automatically generating extraction type document abstract based on query mechanism

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10474709B2 (en) * 2017-04-14 2019-11-12 Salesforce.Com, Inc. Deep reinforced model for abstractive summarization
CN109062937B (en) * 2018-06-15 2019-11-26 北京百度网讯科技有限公司 The method of training description text generation model, the method and device for generating description text
CN110162778B (en) * 2019-04-02 2023-05-26 创新先进技术有限公司 Text abstract generation method and device
CN110297885B (en) * 2019-05-27 2021-08-17 中国科学院深圳先进技术研究院 Method, device and equipment for generating real-time event abstract and storage medium
CN110334334B (en) * 2019-06-19 2024-05-14 腾讯科技(深圳)有限公司 Digest generation method and device and computer equipment
CN110472238B (en) * 2019-07-25 2022-11-18 昆明理工大学 Text summarization method based on hierarchical interaction attention
CN110532554B (en) * 2019-08-26 2023-05-05 南京信息职业技术学院 Chinese abstract generation method, system and storage medium
CN110825870B (en) * 2019-10-31 2023-07-14 腾讯科技(深圳)有限公司 Method and device for acquiring document abstract, storage medium and electronic device
CN110929024B (en) * 2019-12-10 2021-07-02 哈尔滨工业大学 Extraction type text abstract generation method based on multi-model fusion

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018067199A (en) * 2016-10-20 2018-04-26 日本電信電話株式会社 Abstract generating device, text converting device, and methods and programs therefor
CN110413986A (en) * 2019-04-12 2019-11-05 上海晏鼠计算机技术股份有限公司 A kind of text cluster multi-document auto-abstracting method and system improving term vector model
CN110737769A (en) * 2019-10-21 2020-01-31 南京信息工程大学 pre-training text abstract generation method based on neural topic memory
CN111177366A (en) * 2019-12-30 2020-05-19 北京航空航天大学 Method, device and system for automatically generating extraction type document abstract based on query mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Multi-Task Deep Learning for Legal Document Translation, Summarization and Multi-Label Classification; Ahmed Elnaggar et al.; ACM; 2018-12-21; pp. 9-15 *
一种基于BERT的自动文本摘要模型构建方法 (A BERT-based automatic text summarization model construction method); 岳一峰 et al.; 《计算机与现代化》 (Computer and Modernization); 2020-01-15 (No. 01); pp. 63-68 *

Also Published As

Publication number Publication date
CN111723196A (en) 2020-09-29

Similar Documents

Publication Publication Date Title
CN111723196B (en) Single document abstract generation model construction method and device based on multi-task learning
CN110119765B (en) Keyword extraction method based on Seq2Seq framework
CN109472031B (en) Aspect level emotion classification model and method based on double memory attention
CN110348016B (en) Text abstract generation method based on sentence correlation attention mechanism
Oord et al. Representation learning with contrastive predictive coding
CN109214003B (en) The method that Recognition with Recurrent Neural Network based on multilayer attention mechanism generates title
CN111026869B (en) Method for predicting multi-guilty names by using sequence generation network based on multilayer attention
CN111897908A (en) Event extraction method and system fusing dependency information and pre-training language model
CN112926303B (en) Malicious URL detection method based on BERT-BiGRU
CN111858932A (en) Multiple-feature Chinese and English emotion classification method and system based on Transformer
CN109190131A (en) A kind of English word and its capital and small letter unified prediction based on neural machine translation
CN109062897A (en) Sentence alignment method based on deep neural network
CN109062910A (en) Sentence alignment method based on deep neural network
CN112559730B (en) Text abstract automatic generation method and system based on global feature extraction
CN113239663B (en) Multi-meaning word Chinese entity relation identification method based on Hopkinson
Zhang et al. Exploring deep recurrent convolution neural networks for subjectivity classification
Yu et al. Neural network language model compression with product quantization and soft binarization
CN114757183A (en) Cross-domain emotion classification method based on contrast alignment network
Zhang et al. A hierarchical attention seq2seq model with copynet for text summarization
TWI724644B (en) Spoken or text documents summarization system and method based on neural network
Zhao et al. Generating summary using sequence to sequence model
CN113255344B (en) Keyword generation method integrating theme information
CN114048749B (en) Chinese named entity recognition method suitable for multiple fields
CN109992774A (en) The key phrase recognition methods of word-based attribute attention mechanism
CN112613316B (en) Method and system for generating ancient Chinese labeling model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant