CN113434664A - Text abstract generation method, device, medium and electronic equipment
- Publication number: CN113434664A (application number CN202110741468.6A)
- Authority: CN (China)
- Prior art keywords: text, word, abstract, data, training
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/345—Summarisation for human users
- G06F16/337—Profile generation, learning or modification
- G06F16/353—Clustering; Classification into predefined classes
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The application relates to the field of natural language processing and discloses a text abstract generation method, apparatus, medium and electronic device. The method comprises the following steps: acquiring a text data set; training a text abstract generation model based on the text data set; inputting a target text for which an abstract is to be generated into the text abstract generation model; outputting attention information of each word in the target text through an encoding unit in the text abstract generation model, wherein the attention information of each word in the target text is calculated according to the key-value vectors corresponding to partial words (i.e., only a subset of the words) in the target text; outputting, by a decoding unit in the text abstract generation model, decoding feature information according to the attention information from the encoding unit; and acquiring an abstract text corresponding to the target text, which is generated and output by an output unit in the text abstract generation model according to the decoding feature information. The method enables efficient automatic abstract generation for texts such as papers and novels under limited computing power and memory.
Description
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a text summary generation method, apparatus, medium, and electronic device.
Background
In the age of information explosion, information processing technologies that help people quickly screen and exploit useful information are essential in the face of information overload. Conventional algorithms such as TextRank and BertSum can already perform this task, but when the text is long, simply selecting sentences from the original is not enough to summarize the full text. Generative (abstractive) summarization produces new summary sentences in a more general way, capturing the semantic information of the text to generate a short and accurate summary. A sequence-to-sequence model with an attention mechanism (seq2seq + attention) can summarize text well, but such a model has to be combined with an RNN, and the output of each time step depends on the output of the previous time step, so the model cannot be parallelized and is inefficient. The multi-head attention mechanism and the bidirectional encoder in the Transformer-based BERT model make pre-training more effective, but impose a strong limitation on the length of the input text: the computational complexity of the global attention matrix and its memory requirement are quadratic in the length of the input text, so the computing power and memory requirements of that scheme rise sharply.
Disclosure of Invention
In order to solve the above technical problems in the technical field of natural language processing, the present application aims to provide a text summary generation method, apparatus, medium and electronic device.
According to an aspect of the present application, there is provided a text summary generation method, including:
acquiring a text data set, wherein the text data set comprises a plurality of pre-training texts and a standard abstract text corresponding to each pre-training text;
training a text abstract generation model based on the text data set, wherein the text abstract generation model comprises an encoding unit, a decoding unit and an output unit;
inputting a target text for which an abstract is to be generated into the text abstract generation model;
outputting attention information of each word in the target text through an encoding unit in the text abstract generation model, wherein the attention information of each word in the target text is calculated according to key-value vectors corresponding to partial words in the target text;
outputting, by a decoding unit in the text abstract generation model, decoding feature information according to the attention information from the encoding unit;
and acquiring an abstract text corresponding to the target text, which is generated and output by an output unit in the text abstract generation model according to the decoding feature information.
According to another aspect of the present application, there is provided a text summary generation apparatus, the apparatus including:
a first acquisition module configured to acquire a text data set, wherein the text data set comprises a plurality of pre-training texts and a standard abstract text corresponding to each pre-training text;
a training module configured to train a text abstract generation model based on the text data set, wherein the text abstract generation model comprises an encoding unit, a decoding unit and an output unit;
an input module configured to input a target text for which an abstract is to be generated into the text abstract generation model;
an encoding module configured to output attention information of each word in the target text through an encoding unit in the text abstract generation model, wherein the attention information of each word in the target text is calculated according to key-value vectors corresponding to partial words in the target text;
a decoding module configured to output, by a decoding unit in the text abstract generation model, decoding feature information according to the attention information from the encoding unit;
and a second acquisition module configured to acquire an abstract text corresponding to the target text, which is generated and output by the output unit in the text abstract generation model according to the decoding feature information.
According to another aspect of the present application, there is provided a computer readable program medium storing computer program instructions which, when executed by a computer, cause the computer to perform the method as previously described.
According to another aspect of the present application, there is provided an electronic device including:
a processor;
a memory having computer readable instructions stored thereon which, when executed by the processor, implement the method as previously described.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
In the text abstract generation method, apparatus, medium and electronic device provided by the present application, the text abstract generation method comprises the following steps: acquiring a text data set, wherein the text data set comprises a plurality of pre-training texts and a standard abstract text corresponding to each pre-training text; training a text abstract generation model based on the text data set, wherein the text abstract generation model comprises an encoding unit, a decoding unit and an output unit; inputting a target text for which an abstract is to be generated into the text abstract generation model; outputting attention information of each word in the target text through an encoding unit in the text abstract generation model, wherein the attention information of each word in the target text is calculated according to key-value vectors corresponding to partial words in the target text; outputting, by a decoding unit in the text abstract generation model, decoding feature information according to the attention information from the encoding unit; and acquiring an abstract text corresponding to the target text, which is generated and output by an output unit in the text abstract generation model according to the decoding feature information.
With this method, the encoding unit in the text abstract generation model calculates the attention information of each word in the target text only according to the key-value vectors corresponding to partial words in the target text, and does not need to calculate the attention information of each word according to the key-value vectors corresponding to all of the words. This reduces the amount of attention computation and the memory consumption, so the limitation of the text summarization task on text length and its heavy dependence on computing power are reduced, and efficient automatic abstract generation for texts such as papers and novels can be achieved under limited computing power and memory.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a system architecture diagram illustrating a text summary generation method in accordance with an exemplary embodiment;
FIG. 2 is a flow diagram illustrating a text summary generation method in accordance with an exemplary embodiment;
FIG. 3 is a model architecture diagram of a text abstract generation model shown in accordance with an exemplary embodiment;
FIGS. 4A-4D are diagrams illustrating inter-word attention calculation relationships in accordance with an exemplary embodiment;
FIG. 5 is a schematic workflow diagram illustrating a text summary generation method in accordance with an exemplary embodiment;
FIG. 6 is a block diagram illustrating a text summary generation apparatus in accordance with an exemplary embodiment;
FIG. 7 is a block diagram illustrating an example of an electronic device implementing the text summary generation method described above, according to an example embodiment;
FIG. 8 is a diagram of a program product for implementing the text summary generation method according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
Furthermore, the drawings are merely schematic illustrations of the present application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities.
The present application first provides a text abstract generation method. The text abstract generation method is a method capable of generating a corresponding abstract from a text sequence, where the text sequence may take various forms, such as a sentence, a paragraph, an article, or even a book. The abstract finally generated by the method is also a text sequence. The abstract is produced by summarizing the text sequence, and therefore the length of the generated abstract is typically shorter than the length of the text sequence from which it is generated. The text abstract generation method can achieve efficient automatic abstract generation for texts such as papers and novels.
The implementation terminal of the present application may be any device having computing, processing and communication functions. The device may be connected to an external device for receiving or sending data, and may specifically be a portable mobile device, such as a smart phone, a tablet computer, a notebook computer or a PDA (Personal Digital Assistant), or may be a fixed device, such as a computer device, a field terminal, a desktop computer, a server or a workstation, or may be a set of multiple devices, such as the physical infrastructure of a cloud computing service or a server cluster.
Optionally, the implementation terminal of the present application may be a server or a physical infrastructure of cloud computing.
Fig. 1 is a system architecture diagram illustrating a text summary generation method according to an exemplary embodiment. As shown in fig. 1, the system architecture includes a personal computer 110, a server 120 and a database 130; the personal computer 110, the server 120 and the database 130 are connected via communication links, which can be used to send or receive data. The server 120 is the implementation terminal in this embodiment, on which an initialization model is deployed, and the database 130 stores a text data set. When the text abstract generation method provided by the present application is applied to the system architecture shown in fig. 1, one process may be as follows: first, the server 120 obtains the text data set from the database 130; then, the server 120 trains the initialization model using the text data set to obtain a text abstract generation model, wherein the text abstract generation model comprises a multi-layer encoder and a multi-layer decoder, and each encoder layer comprises a sparse attention module for calculating attention information of a current word in the current text according to the key-value vectors corresponding to partial words in the current text; then, the personal computer 110 submits the target text for which an abstract is to be generated to the server 120; finally, after obtaining the target text, the server 120 inputs the target text into the trained text abstract generation model and obtains the abstract text output by the model, and the server 120 may also return the abstract text to the personal computer 110.
It is worth mentioning that fig. 1 is only one embodiment of the present application. Although the implementing terminal is a server and the source terminal of the target text is a personal computer in the present embodiment, in other embodiments, the implementing terminal and the source terminal of the target text may be various terminals or devices as described above; although in the present embodiment, the text data sets for the target text and the training model are both from a terminal device other than the implementing terminal, in other embodiments or specific applications, the text data sets for the target text or the training model may be pre-stored locally in the implementing terminal. The present application is not limited in this respect, and the scope of protection of the present application should not be limited thereby.
Fig. 2 is a flow diagram illustrating a text summary generation method in accordance with an exemplary embodiment. The text summary generation method provided in this embodiment may be executed by a server and, as shown in fig. 2, includes the following steps.
Step 210, acquiring a text data set.
The text data set comprises a plurality of pre-training texts and a standard abstract text corresponding to each pre-training text.
The pre-training texts can be texts of various lengths, and the standard abstract text is a text obtained by summarizing the corresponding pre-training text.
In one embodiment of the present application, the standard abstract text is text that is manually written by an expert based on the pre-training text.
In an embodiment of the application, the acquiring the text data set includes:
acquiring a plurality of webpage data;
performing data cleaning on the webpage data to obtain cleaned text data;
and establishing a text data set based on the cleaned text data.
Data cleaning of the web page data is a process of removing invalid information from the web page data.
The web page data may be, for example, encyclopedia page data in a web page; or novel page data on a novel website; or news page data on a news website.
In an embodiment of the application, performing data cleaning on the web page data to obtain cleaned web page data includes: removing the URLs and tags from the web page data to obtain the cleaned web page data.
For example, the web addresses in the web page data may include URLs and hyperlinks, and the tags in the web page data may be HTML tags. These are symbols that are not needed for training the text abstract generation model and cannot be used for training the model, and therefore they need to be removed to achieve data cleaning.
In other embodiments of the present application, other irrelevant symbolic contents of the web page data may also be removed.
In an embodiment of the application, performing data cleaning on the web page data to obtain cleaned web page data includes: acquiring the subject content of the web page data; and, according to the subject content of the web page data, removing the symbolic content that is irrelevant to the subject content from the web page data to obtain the cleaned web page data.
For example, the subject content of a piece of web page data is social news, but the web page data also includes advertisement information, which is not related to the subject content and needs to be removed.
Through the data cleaning process described above, the cleaned text data it produces becomes suitable for training the model.
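For illustration only, a minimal Python sketch of this kind of cleaning is given below; it assumes the web page data is available as HTML strings, and the function name and regular expressions are illustrative rather than part of the claimed implementation:

```python
import re

def clean_web_page(html: str) -> str:
    """Remove HTML tags, URLs/hyperlinks and redundant whitespace from raw web page data."""
    text = re.sub(r"<[^>]+>", " ", html)                 # strip HTML tags (and their attributes)
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # strip bare URLs and hyperlink targets
    text = re.sub(r"\s+", " ", text)                     # collapse whitespace left behind
    return text.strip()

cleaned = clean_web_page('<p>Example body, see <a href="https://example.com">this link</a>.</p>')
print(cleaned)  # -> "Example body, see this link ."
```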
In an embodiment of the application, creating a text data set based on the cleaned text data includes:
taking the body text in the cleaned text data as a pre-training text, and taking the title in the cleaned text data as the standard abstract text corresponding to that pre-training text;
and obtaining a text data set from the pre-training texts and corresponding standard abstract texts derived from the cleaned text data.
For example, a piece of cleaned text data may be a news item that includes a news title and a news body; the news body is used as the pre-training text, and the news title is used as the standard abstract text.
In this embodiment of the application, the pre-training texts and their corresponding standard abstract texts are generated automatically through data cleaning, which greatly improves the efficiency of building the text data set and also reduces the labor cost of manually writing standard abstract texts.
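A simple sketch of how such title/body pairs might be assembled into the text data set (the dictionary keys and structure are assumptions made for illustration):

```python
from typing import Dict, List

def build_text_dataset(cleaned_pages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Pair each cleaned body with its title: the body becomes the pre-training text,
    the title becomes the corresponding standard abstract text."""
    dataset = []
    for page in cleaned_pages:                      # page assumed to look like {"title": ..., "body": ...}
        if page.get("title") and page.get("body"):  # skip pages missing either field
            dataset.append({
                "pretrain_text": page["body"],        # e.g. the news body
                "standard_abstract": page["title"],   # e.g. the news title
            })
    return dataset
```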
Step 220, training based on the text data set to obtain a text abstract generation model.
The text abstract generating model comprises an encoding unit, a decoding unit and an output unit.
Specifically, the encoding unit includes multiple encoder layers stacked in sequence, and the decoding unit includes multiple decoder layers stacked in sequence. Each encoder layer includes a sparse attention module and a first fully connected forward network module; each decoder layer includes an intentionally masked multi-head full attention module, a multi-head full attention module and a second fully connected forward network module, and the multi-head full attention module in each decoder layer is connected to the output of the encoding unit. Every sub-module in the encoder and decoder is followed by a residual connection and a normalization layer. The sparse attention module calculates the attention information of the current word in the current text according to the key-value vectors corresponding to partial words in the current text; the multi-head full attention module calculates the attention information of the current word in the generated abstract text according to the key-value vectors corresponding to the encoded feature vectors of all words in the current text; and the intentionally masked multi-head full attention module calculates the attention information of the current word in the generated abstract text according to the key-value vectors corresponding to the words already generated in the abstract text. Here, the current text is the text being processed by the model; it may be the text input into the model during the model training phase or during the model use phase.
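As a rough PyTorch sketch of the layer composition described above (not the patent's actual implementation: the sparse attention pattern is realized here simply as a boolean mask over a standard multi-head attention, and all module names and dimensions are illustrative):

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """Encoder layer: sparse self-attention + first fully connected forward network,
    each sub-module followed by a residual connection and layer normalization."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, sparse_mask=None):
        # sparse_mask (bool, True = blocked) encodes the sparse attention pattern; a
        # memory-efficient sparse module would avoid materializing the full n x n matrix.
        a, _ = self.self_attn(x, x, x, attn_mask=sparse_mask)
        x = self.norm1(x + a)                       # residual connection + normalization
        return self.norm2(x + self.ffn(x))

class DecoderLayer(nn.Module):
    """Decoder layer: intentionally masked (causal) multi-head full attention over the
    generated abstract, multi-head full attention over the encoder output, then the
    second fully connected forward network."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.masked_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, y, enc_out):
        t = y.size(1)
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=y.device), diagonal=1)
        a, _ = self.masked_attn(y, y, y, attn_mask=causal)   # only already-generated words are visible
        y = self.norm1(y + a)
        a, _ = self.cross_attn(y, enc_out, enc_out)          # keys/values come from the encoding unit
        y = self.norm2(y + a)
        return self.norm3(y + self.ffn(y))
```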
In an embodiment of the present application, the training to obtain a text summary generation model based on the text data set includes:
iteratively executing a training step until a predetermined condition is met, so as to obtain the trained text abstract generation model, wherein the training step includes:
inputting the pre-training texts in the text data set into an initial text abstract generation model to obtain an output result of the initial text abstract generation model;
calculating a loss value according to the output result and a standard abstract text corresponding to the pre-training text;
and adjusting parameters of the initial text abstract generation model according to the loss value.
The predetermined condition may be that the loss value is less than a predetermined loss value threshold or that the training step is iteratively performed a predetermined number of times.
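A minimal training-step sketch, assuming a model with the above structure that takes source token ids and the (teacher-forced) abstract token ids and returns vocabulary logits; the interface and hyper-parameters are illustrative only:

```python
import torch
import torch.nn as nn

def train_summary_model(model, dataloader, pad_id, epochs=10, lr=1e-4):
    """Iterate the training step: forward pass on the pre-training text, loss against the
    standard abstract text, back-propagation to adjust the model parameters. A loss
    threshold could equally serve as the stopping condition instead of a fixed epoch count."""
    criterion = nn.CrossEntropyLoss(ignore_index=pad_id)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for src_ids, tgt_ids in dataloader:              # pre-training text / standard abstract ids
            logits = model(src_ids, tgt_ids[:, :-1])     # teacher forcing on the abstract
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             tgt_ids[:, 1:].reshape(-1))
            optimizer.zero_grad()
            loss.backward()                              # back-propagate the loss value
            optimizer.step()
```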
In one embodiment of the present application, the text abstract generation model further includes a word embedding vector generation module, and a softmax layer located in the output unit; the word embedding vector generation module is connected to the encoding unit and is used for generating the corresponding word embedding vectors according to the input text, and the softmax layer is used for outputting the probability corresponding to each word.
In an embodiment of the present application, the text abstract generating model further includes an abstract word embedding vector generating module, where the abstract word embedding vector generating module is connected to the decoding unit and configured to generate a corresponding word embedding vector according to an input abstract.
Fig. 3 is a model architecture diagram of the text abstract generation model according to an exemplary embodiment. Referring to fig. 3, the text abstract generation model includes N encoder layers, which stacked together form the encoding unit, and N decoder layers, which stacked together form the decoding unit. It will be readily appreciated that only one encoder layer and one decoder layer are shown in fig. 3, to simplify the representation of the model architecture.
Next, the training process of the text abstract generation model will be described based on fig. 3.
First, each word-block sequence w_i of the pre-training text and of the standard abstract text is passed through a word-embedding BERT model and mapped to word vectors x_i and y_i respectively, giving the numerical text sequences X = {x_1, x_2, x_3, ..., x_n} and Y = {y_1, y_2, y_3, ..., y_n}. In fig. 3, the text abstract generation model is trained using a long text as the pre-training text. The word-embedding BERT model that receives the pre-training text is the word embedding vector generation module, which generates the corresponding word embedding vectors according to the input text, and the word-embedding BERT model that receives the standard abstract text is the abstract word embedding vector generation module, which generates the corresponding word embedding vectors according to the input abstract.
The word-embedding BERT model is built on the BERT model, where BERT (Bidirectional Encoder Representations from Transformers) is a language representation model whose main structure is a multi-layer Transformer with a multi-head attention mechanism integrated inside. The BERT model is a pre-trained model trained in advance on a large amount of corpus data, and therefore has high accuracy.
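A sketch of this word-embedding step using the Hugging Face transformers library; the checkpoint name is an assumption for illustration (the patent does not name one), and only BERT's embedding lookup is used here so that texts longer than BERT's positional limit can still be mapped:

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # illustrative checkpoint
bert = BertModel.from_pretrained("bert-base-chinese")

def embed_word_blocks(text: str) -> torch.Tensor:
    """Map a word-block sequence w_i to word vectors x_i, i.e. the numerical sequence
    X = {x_1, x_2, ..., x_n}."""
    ids = tokenizer(text, add_special_tokens=False, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        return bert.embeddings.word_embeddings(ids).squeeze(0)   # shape: (n, hidden_size)

X = embed_word_blocks("pre-training text ...")       # X = {x_1, ..., x_n}
Y = embed_word_blocks("standard abstract text ...")  # Y = {y_1, ..., y_n}
```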
Next, X and Y are input into the Encoder and the Decoder respectively for processing. X first undergoes attention calculation through the sparse attention mechanism, followed by a residual connection and a normalization calculation; the normalized value is then fed into a fully connected forward network, namely the first fully connected forward network module, after which the residual connection and normalization calculation is repeated, and the resulting representations of the input words and the words they attend to are passed to the Decoder.
The residual connection computes the sum of a sub-module's input and output using a residual network; for example, the residual connection after the sparse attention mechanism computes the sum of X and the output obtained after X passes through the sparse attention mechanism. Using residual connections avoids network degradation, and the normalization calculation can perform layer normalization.
Then, on the Decoder side, Y first passes through the intentionally masked multi-head full attention mechanism, followed by a first round of residual connection and normalization calculation; the result, together with the input from the Encoder, then passes through the multi-head full attention mechanism, and the output of the multi-head full attention mechanism and the output of the first round of residual connection and normalization undergo a second round of residual connection and normalization calculation; the result of the second round is then fed into the fully connected forward network, followed by a third round of residual connection and normalization calculation.
In the above process, the intentionally masked multi-head full attention mechanism corresponds to the intentionally masked multi-head full attention module, and the multi-head full attention mechanism corresponds to the multi-head full attention module; both are similar to the Transformer attention in the BERT model.
The output of the Decoder is passed through a linear function and a softmax function, both located in the output unit, to obtain the predicted probability of the first word of the abstract, i.e., the word looked up from the lexicon.
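The output unit can be sketched as a linear projection followed by softmax (a hypothetical, simplified module, not the patent's implementation):

```python
import torch
import torch.nn as nn

class OutputUnit(nn.Module):
    """Linear function followed by softmax: maps decoding feature information to the
    probability of each word in the lexicon."""
    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, dec_features: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.proj(dec_features), dim=-1)   # P(w) over the lexicon
```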
Then, a loss value is determined based on the output result and Y, and back propagation is performed according to the loss value to adjust parameters of the model.
Finally, when the next word of the abstract is to be predicted, the word already predicted is fed into the decoding unit, and the decoding unit predicts the next word of the abstract according to that word and the input from the encoding unit.
It is worth mentioning that although the number of encoder layers and decoder layers in the text abstract generation model shown in fig. 3 is the same, in other embodiments of the present application the number of encoder layers and the number of decoder layers may differ.
In the following, various attention mechanisms in the encoder and decoder will be described.
It can be seen that the architecture of the decoder is similar to that of the encoder, except that the decoder uses the intentionally masked multi-head full attention mechanism and the multi-head full attention mechanism. The mathematical expression of the attention underlying the multi-head full attention mechanism is as follows:
Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
wherein Q, K and V are the Query, Key and Value parameter matrices, and d_k is the dimensionality of the Key parameter matrix.
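A short sketch of the scaled dot-product attention given by the formula above (illustrative only; an optional boolean mask turns it into the intentionally masked variant):

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V; positions where mask is True
    are blocked, which yields the intentionally masked (decoder-side) variant."""
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ V

# one attention head over a toy sequence of 5 words with d_k = 8
Q = K = V = torch.randn(1, 5, 8)
full_out = scaled_dot_product_attention(Q, K, V)                       # full attention
causal = torch.triu(torch.ones(5, 5, dtype=torch.bool), diagonal=1)
masked_out = scaled_dot_product_attention(Q, K, V, causal)             # masked attention
```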
The multi-head full attention mechanism is in fact an integration of multiple self-attention mechanisms, and the intentionally masked multi-head full attention mechanism differs from it in that its input contains only the words that have already been generated by prediction. Therefore, during model training, the Decoder performs its calculation using the Keys and Values of the words attended by the Encoder together with the Query of the Decoder's input words as inputs. The final output value is processed in turn by a linear function and the softmax scoring function to obtain the predicted probability that word w in the lexicon is the t-th word of the abstract.
Over the whole training process, when the loss function reaches its minimum or the number of iterations is reached, the parameters of the model have been trained and the trained text abstract generation model is obtained.
The sparse attention mechanism is described below.
In one embodiment of the present application, the partial words are determined according to at least one of the following ways:
randomly selecting a preset number of words, from the words that belong to the current text and are other than the current word, as partial words corresponding to the current word;
determining the words covered by the sliding window corresponding to the current word in an adjacency matrix as partial words corresponding to the current word, wherein the elements of the adjacency matrix indicate whether the key-value vector corresponding to one of two words participates in the calculation of the attention information of the other word;
and taking pre-designated words as the partial words, wherein the key-value vectors corresponding to the pre-designated words participate in the calculation of the attention information of all words in the current text, and the key-value vectors corresponding to all words in the current text participate in the calculation of the attention information of the pre-designated words.
In one embodiment of the present application, the sparse attention module determines the partial words according to all of the above ways.
That is, the partial words that participate in the attention calculation of the current word are determined jointly by the three ways described above; in this way, the comprehensiveness of the words attended to by the sparse attention module can be improved, and thus the performance of the model can be improved.
The distribution of attention in the sparse attention module is represented by a directed graph D. D is described by an attention matrix A ∈ [0,1]^{n×n}: the attention matrix A is the adjacency matrix of the directed graph D, the nodes of D are words, and A describes the attention-calculation relationships between words. If input word i attends to word j in the sentence, that is, if q_i and k_j participate in the attention calculation, then A(i, j) = 1; otherwise A(i, j) = 0. The three ways of determining partial words described above are random attention, sliding-window attention and global attention respectively, and they are described below in conjunction with figs. 4A-4D.
Figs. 4A-4D are diagrams illustrating inter-word attention calculation relationships according to an exemplary embodiment. They show the attention matrix A, with rows indexed by i and columns indexed by j: if q_i and k_j participate in the attention calculation, the corresponding cell is marked, while a shaded cell indicates that the corresponding two words do not participate in the attention calculation.
Fig. 4A illustrates random attention, i.e., each word attends to r randomly selected words. In fig. 4A, r is 2 and each row marks two words, meaning that, for each word, random attention attends to two other randomly chosen words in the sentence sequence containing the current word. This simplest random-graph construction makes the shortest path between any two nodes logarithmic in the number of nodes, so information can flow quickly between any pair of nodes.
Fig. 4B illustrates sliding-window attention, which can be expressed as A(i, i-w/2 : i+w/2) = 1, where w is the defined window width. The word at node i attends to all words in an interval around it: w/2 positions above node i and w/2 positions below it. As shown in fig. 4B, when w is 3, sliding-window attention attends, for each word, to all words in the interval one position above and one position below the node, i.e., all words in the interval [i-1, i+1]. This gives sliding-window attention a good balance between the average shortest path and the notion of locality.
Fig. 4C illustrates global attention, i.e., defining some global tokens that all tokens in the sequence attend to, and that themselves attend to all tokens in the sequence.
The global token may be defined in two ways:
(1) In the internal transformer construction, some existing tokens are made global over the entire sequence. That is, an index subset G is defined such that, for i ∈ G, A(i, :) = 1 and A(:, i) = 1.
(2) In the extended transformer construction, additional tokens (such as CLS) are appended and made global tokens.
When g global tokens are added, a new matrix B ∈ [0,1]^{(N+g)×(N+g)} is created by adding g rows and columns to the matrix A; mathematically, for all i ∈ {1, 2, ..., g}, it satisfies:
B(i, :) = 1,
B(:, i) = 1.
Here, the global tokens are the pre-designated words. Fig. 4C shows global attention with g equal to 2, which means that every word in the sentence attends to the two words at the beginning of the sentence; that is, the key-value vectors corresponding to the two sentence-initial words participate in the attention calculation of all words in the current text.
Fig. 4D shows an attention mechanism integrating the three attention patterns of figs. 4A-4C; the sparse attention module may use the sparse attention mechanism shown in fig. 4D to improve the comprehensiveness of the words it attends to.
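The following sketch illustrates how an attention matrix A combining the three patterns of figs. 4A-4D might be built; it materializes the full boolean matrix for clarity, whereas a practical sparse-attention implementation would exploit the block structure instead (parameter values are illustrative):

```python
import torch

def sparse_attention_matrix(n: int, r: int = 2, w: int = 3, g: int = 2, seed: int = 0) -> torch.Tensor:
    """Build A in {0,1}^{n x n} with A[i, j] = 1 if word i attends to word j, combining
    random attention (r random words per row), sliding-window attention (window width w)
    and global attention (the first g tokens attend to, and are attended by, every word)."""
    gen = torch.Generator().manual_seed(seed)
    A = torch.zeros(n, n, dtype=torch.bool)
    half = w // 2
    for i in range(n):
        A[i, max(0, i - half): min(n, i + half + 1)] = True     # sliding window around word i
        others = torch.tensor([j for j in range(n) if j != i])  # candidate words other than word i
        pick = others[torch.randperm(len(others), generator=gen)[:r]]
        A[i, pick] = True                                       # r randomly selected words
    A[:g, :] = True                                             # global tokens attend to everything
    A[:, :g] = True                                             # everything attends to global tokens
    return A

A = sparse_attention_matrix(8)
print(A.int())
```

With PyTorch's nn.MultiheadAttention, the complement ~A could be passed as attn_mask, since that argument blocks the positions marked True.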
Step 230, inputting the target text for which an abstract is to be generated into the text abstract generation model.
The target text for which an abstract is to be generated may be a long text, for which the scheme of the present application is particularly well suited, or a text of conventional length.
In one embodiment of the present application, the number of words in the target text exceeds 512.
In this embodiment of the application, the text abstract generation model can generate an abstract for a target text whose number of words exceeds 512, so that abstracts can be generated efficiently for texts such as novels and papers.
Step 240, outputting the attention information of each word in the target text through the encoding unit in the text abstract generation model.
The attention information of each word in the target text is calculated according to the key-value vectors corresponding to partial words in the target text.
As described above, the encoding unit includes multiple encoder layers stacked in sequence, and each encoder layer includes a sparse attention module, by which the attention information is calculated.
Step 250, outputting, by the decoding unit in the text abstract generation model, decoding feature information according to the attention information from the encoding unit.
The decoding feature information is implicit feature information output by the decoding unit and used to generate the output result.
In one embodiment of the present application, the decoding unit also outputs the decoding feature information according to the word embedding vectors, provided by the abstract word embedding vector generation module, corresponding to the abstract text already generated.
Step 260, acquiring the abstract text corresponding to the target text, which is generated and output by the output unit in the text abstract generation model according to the decoding feature information.
After the target text to be summarized is input into the text abstract generation model, the sparse attention mechanism provided by the sparse attention module in the model attends only partially to the other words in the text sequence when each word of the target text is processed, so that the words in the target text can be better encoded.
Specifically, after a target text T is input into the text abstract generation model, the word-embedding BERT model maps T into the word embedding space, and the mapping result is passed through the Encoder to obtain the Keys K_h and Values V_h of the attended words. Then all word blocks pass through the Decoder in parallel at the same time to obtain the probability distribution P(w) of the predicted word, determining the word at the t-th time step of the predicted abstract, until the words at all time steps of the abstract have been predicted, finally giving the abstract text of the target text T.
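The prediction loop sketched below chooses, at each time step, the most probable word and feeds it back into the decoding unit; model.encode / model.decode and the special token ids are assumptions of the sketch, not the patent's interface:

```python
import torch

def generate_summary(model, src_ids, bos_id, eos_id, max_len=64):
    """Predict the abstract word by word: at step t, feed the words generated so far,
    together with the encoder output, into the decoding unit and take argmax of P(w)."""
    model.eval()
    generated = torch.tensor([[bos_id]])
    with torch.no_grad():
        enc_out = model.encode(src_ids)                   # attention information from the encoder
        for _ in range(max_len):
            logits = model.decode(generated, enc_out)     # shape (1, t, vocab): P(w) at each step
            next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            generated = torch.cat([generated, next_id], dim=-1)
            if next_id.item() == eos_id:                  # stop once the end-of-abstract token appears
                break
    return generated.squeeze(0)
```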
In summary, in the text abstract generation method provided by this embodiment, a sparse attention module is arranged in the text abstract generation model, and the sparse attention module calculates the attention information of the current word according to the key-value vectors corresponding to partial words in the current text, without having to calculate it from the key-value vectors corresponding to all words. This reduces the amount of attention computation, lowers the computational complexity of the attention mechanism with respect to sequence length from quadratic to linear, and reduces memory consumption, so the limitation of the text summarization task on text length and its heavy dependence on computing power are reduced, and efficient automatic abstract generation for texts such as papers and novels can be achieved under limited computing power and memory. Moreover, because the decoder layers of the decoding unit use the multi-head full attention module and the output sequence is short, the advantage of multi-head global attention can be exploited to the greatest extent while improving the efficiency of processing long texts, retaining the main information and achieving semantic capture and abstract formation.
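As a rough back-of-the-envelope illustration of the quadratic-to-linear reduction (the values of r, w and g are illustrative):

```python
n, r, w, g = 4096, 2, 3, 2                     # sequence length and sparse-attention parameters
full_attention_pairs = n * n                   # global attention: ~16.8 million query-key pairs
sparse_attention_pairs = n * (r + w + g)       # roughly n * (r + w + g): ~28.7 thousand, linear in n
print(full_attention_pairs, sparse_attention_pairs)
```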
The following describes an overall process of the text summary generation method provided in the embodiment of the present application with reference to fig. 5.
Fig. 5 is a workflow diagram illustrating a text summary generation method according to an exemplary embodiment. Referring to fig. 5, the work flow is: firstly, performing text preprocessing on a pre-training text and a standard abstract text; secondly, pre-training a summary model by utilizing a pre-processing result to obtain a summary generation model; and then, inputting the long text to be summarized into the abstract generation model obtained by training, and performing abstract generation.
The application also provides a text abstract generating device, and the following is an embodiment of the device.
Fig. 6 is a block diagram illustrating a text summary generation apparatus according to an example embodiment. As shown in fig. 6, the apparatus 600 includes:
a first obtaining module 610 configured to obtain a text data set, wherein the text data set includes a plurality of pre-training texts and a standard abstract text corresponding to each pre-training text;
a training module 620 configured to train a text abstract generation model based on the text data set, wherein the text abstract generation model includes an encoding unit, a decoding unit, and an output unit;
an input module 630 configured to input a target text of a summary to be generated to the text summary generation model;
an encoding module 640 configured to output attention information of each word in the target text through an encoding unit in the text abstract generation model, wherein the attention information of each word in the target text is calculated according to key-value vectors corresponding to partial words in the target text;
a decoding module 650 configured to output, by a decoding unit in the text abstract generation model, decoding feature information according to the attention information from the encoding unit;
a second obtaining module 660 configured to obtain an abstract text corresponding to the target text, which is generated and output by the output unit in the text abstract generation model according to the decoding feature information.
According to a third aspect of the present application, there is also provided an electronic device capable of implementing the above method.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module" or "system."
An electronic device 700 according to this embodiment of the invention is described below with reference to fig. 7. The electronic device 700 shown in fig. 7 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 7, electronic device 700 is embodied in the form of a general purpose computing device. The components of the electronic device 700 may include, but are not limited to: the at least one processing unit 710, the at least one memory unit 720, and a bus 730 that couples various system components including the memory unit 720 and the processing unit 710.
Wherein the storage unit stores program code that can be executed by the processing unit 710 such that the processing unit 710 performs the steps according to various exemplary embodiments of the present invention described in the section "example methods" above in this specification.
The storage unit 720 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 721 and/or a cache memory unit 722, and may further include a read-only memory unit (ROM) 723.
The memory unit 720 may also include programs/utilities 724 having a set (at least one) of program modules 725, such program modules 725 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 700 may also communicate with one or more external devices 900 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 700, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 700 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 750, such as with display unit 740. Also, the electronic device 700 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet) via the network adapter 760. As shown, the network adapter 760 communicates with the other modules of the electronic device 700 via the bus 730. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the embodiments of the present application.
According to a fourth aspect of the present application, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
Referring to fig. 8, a program product 800 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
Claims (10)
1. A text summary generation method, the method comprising:
acquiring a text data set, wherein the text data set comprises a plurality of pre-training texts and a standard abstract text corresponding to each pre-training text;
training a text abstract generation model based on the text data set, wherein the text abstract generation model comprises an encoding unit, a decoding unit and an output unit;
inputting a target text for which an abstract is to be generated into the text abstract generation model;
outputting attention information of each word in the target text through an encoding unit in the text abstract generation model, wherein the attention information of each word in the target text is calculated according to key-value vectors corresponding to partial words in the target text;
outputting, by a decoding unit in the text abstract generation model, decoding feature information according to the attention information from the encoding unit;
and acquiring an abstract text corresponding to the target text, which is generated and output by an output unit in the text abstract generation model according to the decoding feature information.
2. The method of claim 1, wherein obtaining the text data set comprises:
acquiring a plurality of webpage data;
performing data cleaning on the webpage data to obtain cleaned text data;
and establishing a text data set based on the cleaned text data.
3. The method of claim 2, wherein the data cleaning the web page data to obtain cleaned text data comprises:
and removing the URLs and tags from the web page data to obtain the cleaned text data.
4. The method of claim 2, wherein performing data cleaning on the web page data to obtain the cleaned text data comprises:
acquiring the subject content of the web page data;
and cleaning out, according to the subject content of the web page data, the symbol content in the web page data that is irrelevant to the subject content, to obtain the cleaned web page data.
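A minimal sketch of this step, assuming that "symbol content irrelevant to the subject content" refers to decorative symbols (bullets, stars, emoji and the like) that carry no topical meaning; the set of characters kept below is an assumption rather than the patent's rule.

```python
import re

# Hypothetical rule: keep word characters (including CJK), whitespace and common
# sentence punctuation; treat everything else as subject-irrelevant symbol content.
DROP = re.compile(r"[^\w\s，。！？、,.!?:;：；（）()\-]")

def clean_symbols(text: str) -> str:
    return re.sub(r"\s+", " ", DROP.sub(" ", text)).strip()

print(clean_symbols("摘要生成 ▶▶ ★关注我们★ Transformer 模型!!!"))
# -> "摘要生成 关注我们 Transformer 模型!!!"
```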
5. The method of claim 2, wherein establishing the text data set based on the cleaned text data comprises:
taking the text body in the cleaned text data as a pre-training text, and taking the title in the cleaned text data as the standard abstract text corresponding to the pre-training text;
and obtaining the text data set from the pre-training texts and the corresponding standard abstract texts derived from the cleaned text data.
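The pairing described in claim 5 can be sketched as follows, assuming the cleaned data arrives as records with hypothetical `title` and `body` fields; the field names and the filtering rule are illustrative only.

```python
from typing import Dict, List, Tuple

def build_text_dataset(cleaned_pages: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Pair each cleaned page's body (pre-training text) with its title (standard abstract)."""
    dataset = []
    for page in cleaned_pages:
        body, title = page.get("body", ""), page.get("title", "")
        if body and title:                  # skip pages missing either field
            dataset.append((body, title))
    return dataset

pairs = build_text_dataset([
    {"title": "Text abstract generation with partial-word attention",
     "body": "This page describes a model that summarizes long web articles ..."},
])
print(len(pairs), "->", pairs[0][1])
```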
6. The method according to any one of claims 1 to 5, wherein the text abstract generation model further comprises a word embedding vector generation module and a softmax layer located in the output unit; the word embedding vector generation module is connected to the encoding unit and is configured to generate corresponding word embedding vectors according to the input text; and the softmax layer is configured to output the probability corresponding to each word.
7. The method according to any one of claims 1 to 5, wherein the partial words are determined according to at least one of the following:
randomly selecting a preset number of words, from the words that belong to the current text and are located outside the current word, as the partial words corresponding to the current word;
determining the words covered by a sliding window corresponding to the current word in an adjacency matrix as the partial words corresponding to the current word, wherein the elements in the adjacency matrix are used for indicating whether the key-value vector corresponding to one of two corresponding words participates in the calculation of the attention information of the other word;
and taking a pre-designated word as a partial word, wherein the key-value vector corresponding to the pre-designated word participates in the calculation of the attention information of all words in the current text, and the key-value vectors corresponding to all words in the current text participate in the calculation of the attention information of the pre-designated word.
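For illustration, the three selection rules in claim 7 resemble the random, sliding-window, and global patterns used in sparse self-attention. The sketch below builds an adjacency (mask) matrix under that reading; the window size, the number of random words, and the choice of global word are arbitrary assumptions, not values from the patent.

```python
import numpy as np

def build_partial_word_mask(n_words: int, window: int = 2,
                            n_random: int = 2, global_ids=(0,), seed: int = 0):
    """Adjacency matrix: mask[i, j] = 1 means word j's key-value vector
    participates in the attention calculation of word i."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n_words, n_words), dtype=np.int8)
    for i in range(n_words):
        # Sliding-window rule: neighbouring words covered by the window.
        lo, hi = max(0, i - window), min(n_words, i + window + 1)
        mask[i, lo:hi] = 1
        # Random rule: a preset number of words other than the current word.
        candidates = np.setdiff1d(np.arange(n_words), [i])
        picked = rng.choice(candidates, size=min(n_random, len(candidates)),
                            replace=False)
        mask[i, picked] = 1
    # Global rule: pre-designated words attend to all words, and all words attend to them.
    for g in global_ids:
        mask[g, :] = 1
        mask[:, g] = 1
    return mask

print(build_partial_word_mask(6))
```

Such a mask would typically be applied to the attention score matrix (for example, by setting masked positions to a large negative value) before the softmax, so that each word's attention information is computed only from the key-value vectors of its partial words, which is what keeps the attention calculation sparse.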
8. A text abstract generation apparatus, the apparatus comprising:
a first acquisition module, configured to acquire a text data set, wherein the text data set comprises a plurality of pre-training texts and a standard abstract text corresponding to each pre-training text;
a training module, configured to train a text abstract generation model based on the text data set, wherein the text abstract generation model comprises an encoding unit, a decoding unit and an output unit;
an input module, configured to input a target text for which an abstract is to be generated into the text abstract generation model;
an encoding module, configured to output attention information of each word in the target text through the encoding unit in the text abstract generation model, wherein the attention information of each word in the target text is calculated according to key-value vectors corresponding to partial words in the target text;
a decoding module, configured to output, through the decoding unit in the text abstract generation model, decoding feature information according to the attention information from the encoding unit;
and a second acquisition module, configured to acquire an abstract text corresponding to the target text, the abstract text being generated and output by the output unit in the text abstract generation model according to the decoding feature information.
9. A computer-readable program medium, characterized in that it stores computer program instructions which, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 7.
10. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory having stored thereon computer readable instructions which, when executed by the processor, implement the method of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110741468.6A (CN113434664B) | 2021-06-30 | 2021-06-30 | Text abstract generation method, device, medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113434664A (en) | 2021-09-24 |
CN113434664B (en) | 2024-07-16 |
Family
ID=77758319
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110741468.6A (CN113434664B, Active) | Text abstract generation method, device, medium and electronic equipment | 2021-06-30 | 2021-06-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113434664B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110209801A (en) * | 2019-05-15 | 2019-09-06 | 华南理工大学 | A kind of text snippet automatic generation method based on from attention network |
CN111061862A (en) * | 2019-12-16 | 2020-04-24 | 湖南大学 | Method for generating abstract based on attention mechanism |
CN111859978A (en) * | 2020-06-11 | 2020-10-30 | 南京邮电大学 | Emotion text generation method based on deep learning |
CN111897949A (en) * | 2020-07-28 | 2020-11-06 | 北京工业大学 | Guided text abstract generation method based on Transformer |
Non-Patent Citations (1)
Title |
---|
丰冲; 潘志强; 撒红; 陈洪辉: "Text summarization method based on a self-interaction attention mechanism" (基于自交互注意力机制的文本摘要方法), 指挥信息系统与技术 (Command Information System and Technology), no. 05, 7 November 2018 (2018-11-07) *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113869205A (en) * | 2021-09-27 | 2021-12-31 | 北京百度网讯科技有限公司 | Object detection method and device, electronic equipment and storage medium |
CN113918706A (en) * | 2021-10-15 | 2022-01-11 | 山东大学 | Information extraction method for administrative punishment decision book |
CN113918706B (en) * | 2021-10-15 | 2024-05-28 | 山东大学 | Information extraction method of administrative punishment decision book |
CN114398478A (en) * | 2022-01-17 | 2022-04-26 | 重庆邮电大学 | Generating type automatic abstracting method based on BERT and external knowledge |
CN115309888A (en) * | 2022-08-26 | 2022-11-08 | 百度在线网络技术(北京)有限公司 | Method and device for generating chart abstract and method and device for training generated model |
CN115438654A (en) * | 2022-11-07 | 2022-12-06 | 华东交通大学 | Article title generation method and device, storage medium and electronic equipment |
CN115879515A (en) * | 2023-02-20 | 2023-03-31 | 江西财经大学 | Document network theme modeling method, variation neighborhood encoder, terminal and medium |
Also Published As
Publication number | Publication date |
---|---|
CN113434664B (en) | 2024-07-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113434664B (en) | Text abstract generation method, device, medium and electronic equipment | |
US20220292269A1 (en) | Method and apparatus for acquiring pre-trained model | |
CN108959246B (en) | Answer selection method and device based on improved attention mechanism and electronic equipment | |
CN112085565B (en) | Deep learning-based information recommendation method, device, equipment and storage medium | |
CN112131350B (en) | Text label determining method, device, terminal and readable storage medium | |
JP7301922B2 (en) | Semantic retrieval method, device, electronic device, storage medium and computer program | |
CN112307168B (en) | Artificial intelligence-based inquiry session processing method and device and computer equipment | |
CN111353303B (en) | Word vector construction method and device, electronic equipment and storage medium | |
CN113987169A (en) | Text abstract generation method, device and equipment based on semantic block and storage medium | |
WO2023045605A9 (en) | Data processing method and apparatus, computer device, and storage medium | |
CN111414561A (en) | Method and apparatus for presenting information | |
CN112348592A (en) | Advertisement recommendation method and device, electronic equipment and medium | |
CN112541125A (en) | Sequence labeling model training method and device and electronic equipment | |
Osipov et al. | Neural network forecasting of news feeds | |
CN117216535A (en) | Training method, device, equipment and medium for recommended text generation model | |
CN111753029A (en) | Entity relationship extraction method and device | |
CN114358736A (en) | Customer service work order generation method and device, storage medium and electronic equipment | |
CN117874234A (en) | Text classification method and device based on semantics, computer equipment and storage medium | |
CN113127604A (en) | Comment text-based fine-grained item recommendation method and system | |
CN114691836B (en) | Text emotion tendentiousness analysis method, device, equipment and medium | |
CN116340641A (en) | Intelligent news recommendation method and system based on explicit and implicit interest characteristics | |
CN111949765B (en) | Semantic-based similar text searching method, system, device and storage medium | |
CN115438164A (en) | Question answering method, system, equipment and storage medium | |
CN115062136A (en) | Event disambiguation method based on graph neural network and related equipment thereof | |
CN113033192B (en) | Training method and device for sequence annotation and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |