CN111723194A - Abstract generation method, device and equipment

Publication number
CN111723194A
Authority
CN
China
Prior art keywords
hidden layer
sentence
word
text
moment
Legal status
Pending
Application number
CN201910203859.5A
Other languages
Chinese (zh)
Inventor
Gui Min (桂敏)
Wang Rui (王睿)
Tian Junfeng (田俊峰)
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201910203859.5A
Publication of CN111723194A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/34: Browsing; Visualisation therefor
    • G06F 16/345: Summarisation for human users

Abstract

An embodiment of the invention provides an abstract generation method, apparatus and device. The method includes: obtaining a plurality of first sentences composing a text; performing word encoding on the plurality of first sentences through a word encoder to obtain first sentence representation vectors; performing sentence encoding on the plurality of first sentence representation vectors through a sentence encoder to obtain a first text representation vector; decoding the first text representation vector through a sentence decoder to obtain a plurality of first sentence indication vectors; and decoding the plurality of first sentence indication vectors respectively through a word decoder to obtain a plurality of first abstract sentences, which together compose the abstract of the text. Because the scheme exploits the sentence structure of the text when generating the abstract, both the quality and the speed of abstract generation are ensured.

Description

Abstract generation method, device and equipment
Technical Field
The invention relates to the field of internet technologies, and in particular to an abstract generation method, apparatus and device.
Background
People are exposed to a great deal of textual information every day, such as news, reports, papers and blogs. In a typical scenario, an author has written a text (such as a novel) and submitted it to a network platform for publication. To attract more readers, concise abstract information needs to be generated for the text, so that readers can first grasp its main content and then decide whether to open the text for a full read.
In the past, a text abstract was written by the author of the text, which is time-consuming. Some authors simply copy one or a few sentences from the text as the abstract, which yields poor results.
Disclosure of Invention
The embodiment of the invention provides a method, a device and equipment for generating an abstract, which are used for realizing automatic generation of the abstract.
In a first aspect, an embodiment of the present invention provides an abstract generation method, the method including:
obtaining a plurality of first sentences composing a text;
performing word encoding on the plurality of first sentences respectively through a word encoder to obtain a plurality of first sentence representation vectors corresponding to the plurality of first sentences;
performing sentence encoding on the plurality of first sentence representation vectors through a sentence encoder to obtain a first text representation vector corresponding to the text;
decoding the first text representation vector through a sentence decoder to obtain a plurality of first sentence indication vectors indicating the text content to be decoded;
and decoding the plurality of first sentence indication vectors respectively through a word decoder to obtain a plurality of first abstract sentences, the plurality of first abstract sentences composing the abstract of the text.
In a second aspect, an embodiment of the present invention provides an abstract generation apparatus, the apparatus including:
an obtaining module, configured to obtain a plurality of first sentences composing a text;
a word encoder, configured to perform word encoding on the plurality of first sentences respectively to obtain a plurality of first sentence representation vectors corresponding to the plurality of first sentences;
a sentence encoder, configured to perform sentence encoding on the plurality of first sentence representation vectors to obtain a first text representation vector corresponding to the text;
a sentence decoder, configured to decode the first text representation vector to obtain a plurality of first sentence indication vectors indicating the text content to be decoded;
and a word decoder, configured to decode the plurality of first sentence indication vectors respectively to obtain a plurality of first abstract sentences, the plurality of first abstract sentences composing the abstract of the text.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the memory stores executable code which, when executed by the processor, causes the processor to implement at least the abstract generation method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory machine-readable storage medium having executable code stored thereon which, when executed by a processor of an electronic device, enables the processor to implement at least the abstract generation method of the first aspect.
In the embodiment of the invention, a Sequence-to-Sequence (Seq2Seq) architecture composed of an encoder and a decoder is used to generate an abstract for a text automatically. The text is first preprocessed using its sentence structure so as to divide it into several sentences; the encoder comprises a word encoder and a sentence encoder, and the decoder correspondingly comprises a sentence decoder and a word decoder. Specifically, for a text whose abstract is to be generated, the text is split into sentences according to its sentence structure; the word encoder then performs word encoding on the words contained in each sentence to obtain a sentence representation vector (which may also be called a semantic representation vector) for each sentence; the obtained sentence representation vectors are input in sequence to the sentence encoder, which encodes them into a text representation vector representing the text and containing the text's context information. The text representation vector is passed to the sentence decoder, which decodes it into a plurality of sentence indication vectors indicating the text content to be decoded: each sentence indication vector indicates which main content of the text should be summarized by the abstract sentence at the current moment. The sentence indication vectors are then input one by one into the word decoder, which decodes the words corresponding to each sentence indication vector; these words compose one abstract sentence, and the abstract sentences obtained in sequence are combined to form the abstract of the text.
In this scheme, the sentence structure of the text is used in abstract generation, so that at the encoding end the context information inside each sentence can be captured by the word encoder, and the context information between sentences, i.e., the context information of the text, can be captured by the sentence encoder. At the decoding end, the coarse-to-fine, sentence-then-word decoding first grasps the main content of the whole abstract at the sentence level and then concretely decodes each abstract sentence, so the generation quality of the abstract is ensured.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flowchart of an abstract generation method provided in an exemplary embodiment;
FIG. 2 is a schematic diagram of an abstract generation process provided in an exemplary embodiment;
FIG. 3 is a flowchart of a model training method provided in an exemplary embodiment;
FIG. 4 is a flowchart of a model optimization method provided in an exemplary embodiment;
FIG. 5 is a schematic diagram of an implementation of the model optimization method of FIG. 4;
FIG. 6 is a flowchart of another model optimization method provided in an exemplary embodiment;
FIG. 7 is a schematic diagram of one implementation of the model optimization method of FIG. 6;
FIG. 8 is a schematic diagram of another implementation of the model optimization method of FIG. 6;
FIG. 9 is a schematic structural diagram of an abstract generation apparatus provided in an exemplary embodiment;
FIG. 10 is a schematic structural diagram of an electronic device corresponding to the abstract generation apparatus of the embodiment shown in FIG. 9.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well. "plurality" generally includes at least two unless the context clearly dictates otherwise.
The words "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
It is also noted that the terms "comprise", "comprising" and any variants thereof are intended to cover a non-exclusive inclusion, such that an article or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such an article or system. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other like elements in the article or system that comprises the element.
In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
Before the abstract generation method provided by the embodiments of the present invention is described in detail, existing schemes for automatically generating an abstract are described. Automatic abstract generation aims to have a machine automatically output, for a text, a brief abstract that retains the key information of that text.
Automatic text summarization falls into two categories: extractive and generative. An extractive abstract takes a few sentences that express the main content of the text and composes them into the abstract. A generative abstract restates the main content of the text on the basis of understanding it; the generated sentences, and even individual words, may not appear in the text at all, much as when a person writes an abstract by hand.
With the development of neural network technology, generative text summarization based on neural networks has developed rapidly. A generative neural network model may adopt the Seq2Seq architecture, which consists of an encoder and a decoder: the encoder is responsible for encoding the input text into a vector representation (a context vector) containing context information, which represents the text, and the decoder generates the abstract of the text from that vector representation. At present, the Seq2Seq architecture generally encodes and decodes word by word at the encoding and decoding ends; as a result, encoding and decoding are slow, and the chapter structure of the text is not fully utilized.
The following describes the implementation of the abstract generation method provided herein with reference to the following embodiments. The abstract generation method may be executed by an electronic device, which may be a terminal device, such as a PC or a notebook computer, or a server. The server may be a physical server with an independent host, a virtual server carried by a host cluster, or a cloud server.
FIG. 1 is a flowchart of an abstract generation method according to an exemplary embodiment. As shown in FIG. 1, the method includes the following steps:
101. A plurality of first sentences composing a text are obtained.
102. Word encoding is performed on the plurality of first sentences respectively through a word encoder to obtain a plurality of first sentence representation vectors corresponding to the plurality of first sentences.
103. Sentence encoding is performed on the plurality of first sentence representation vectors through a sentence encoder to obtain a first text representation vector corresponding to the text.
104. The first text representation vector is decoded through a sentence decoder to obtain a plurality of first sentence indication vectors indicating the text content to be decoded.
105. The plurality of first sentence indication vectors are decoded respectively through a word decoder to obtain a plurality of first abstract sentences, the plurality of first abstract sentences composing the abstract of the text.
The abstract generation method provided by this embodiment generates a corresponding abstract for a text submitted by a user; a model trained to convergence is needed for this purpose.
The model still adopts the Seq2Seq architecture, except that its encoding end comprises two layers of encoders (also called encoding networks), namely a word encoder and a sentence encoder, and its decoding end correspondingly comprises two layers of decoders (also called decoding networks), namely a sentence decoder and a word decoder.
The model is structured this way because, in the embodiment of the present invention, the sentence structure, a hierarchical structural feature of the text, is to be used in generating the text's abstract. Specifically, abstract generation is performed at two granularities: words and sentences.
Assuming that the model is trained, the present embodiment describes the process of actually using the model.
Text submitted by a user (such as a novel, a paper or a blog) is received, and the text can then be preprocessed to meet the usage requirements of the model. The preprocessing may include: adding a tag to the beginning and end of each sentence, so that the model can conveniently read the data sentence by sentence, i.e., the model can identify each sentence contained in the text based on the tags. Furthermore, word segmentation can be performed on each sentence, and each resulting word converted into a corresponding word vector using the word vector matrix trained with the model. It should be understood that when a sentence is read into the model for conversion into word vectors, the input sentence does not include the added beginning and end tags.
Specifically, the training samples used for training the abstract generation model may serve as the corpus objects for word vector matrix training, although the corpus is not limited to them. Word segmentation is performed on each corpus object to obtain a number of words, which can then be sorted by frequency of occurrence, for example from most to least frequent. Assuming there are N words in total, a word list of the N sequentially sorted words can be generated. In addition, each word can be converted into a word vector by an existing word vector algorithm; assuming each word is represented as an M-dimensional row vector, an N × M word vector matrix is finally obtained, in which the i-th row corresponds to the i-th word in the word list. On this basis, any word in an input text can be converted into its corresponding word vector through the correspondence between the word vector matrix and the word list.
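As an illustration of the word list and word vector matrix just described, the following Python sketch builds a frequency-sorted vocabulary and an N × M matrix. All function and variable names are hypothetical, and the random initialization merely stands in for whatever word vector training algorithm is actually used:

```python
from collections import Counter

import numpy as np

def build_vocab_and_vectors(corpus_sentences, dim_m=128):
    """corpus_sentences: an iterable of already-segmented sentences (lists of words)."""
    counts = Counter(word for sent in corpus_sentences for word in sent)
    # Sort words from most to least frequent to form the word list of N entries.
    word_list = [word for word, _ in counts.most_common()]
    word_to_id = {word: i for i, word in enumerate(word_list)}
    # N x M word vector matrix; row i corresponds to word i in the word list.
    vectors = np.random.randn(len(word_list), dim_m).astype("float32")
    return word_to_id, vectors

def sentence_to_vectors(sentence, word_to_id, vectors):
    """Convert one segmented sentence into a (length, M) array of word vectors."""
    return np.stack([vectors[word_to_id[w]] for w in sentence if w in word_to_id])
```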
After the sentences contained in the text are obtained and their words converted into word vectors, word encoding can be performed on each sentence. The word encoder may be implemented as any one of several neural networks, such as a Recurrent Neural Network (RNN), a Bidirectional Recurrent Neural Network (Bi-RNN), a Long Short-Term Memory (LSTM) network or a Bidirectional Long Short-Term Memory (Bi-LSTM) network. To capture the context of each word in a sentence and thus understand the sentence's semantics more accurately, the word encoder may preferably employ a Bi-RNN or a Bi-LSTM.
The output of the word encoder can be regarded as the semantic representation of each sentence, referred to here as a sentence representation vector. The sentence representation vector carries the context information of the words in the sentence.
Each sentence representation vector output by the word encoder is input to the sentence encoder, which encodes the input sentence representation vectors to obtain a text representation vector, namely the first text representation vector. That is, the final output of the sentence encoder serves as the representation of the input text; the text representation vector carries the context information of the text, i.e., both the context information between the sentences of the text and the context information between the words within each sentence.
The sentence encoder may likewise be implemented as any one of the RNN, Bi-RNN, LSTM or Bi-LSTM networks. To capture the context of each sentence within the text, the sentence encoder may preferably employ a Bi-RNN or a Bi-LSTM.
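The two-layer encoding just described can be sketched as follows in PyTorch-style Python. The class name, dimensions and single-text batching are illustrative assumptions; the word encoder and sentence encoder are both Bi-LSTMs, and each final forward/backward hidden-state pair is concatenated, as in the description above:

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Word encoder + sentence encoder, both Bi-LSTM (a sketch, not the patent's exact model)."""
    def __init__(self, emb_dim=128, hid=256):
        super().__init__()
        self.word_enc = nn.LSTM(emb_dim, hid, bidirectional=True, batch_first=True)
        self.sent_enc = nn.LSTM(2 * hid, hid, bidirectional=True, batch_first=True)

    def forward(self, sents):
        # sents: (num_sentences, words_per_sentence, emb_dim) word vectors of one text
        _, (h, _) = self.word_enc(sents)
        # Concatenate the final forward and backward hidden states of each sentence,
        # giving one sentence representation vector per sentence.
        sent_vecs = torch.cat([h[0], h[1]], dim=-1)           # (num_sentences, 2*hid)
        # Feed the sentence representation vectors, in order, into the sentence encoder.
        _, (h2, _) = self.sent_enc(sent_vecs.unsqueeze(0))
        text_vec = torch.cat([h2[0], h2[1]], dim=-1)          # text representation, (1, 2*hid)
        return sent_vecs, text_vec
```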
The text representation vector can then be decoded by the decoding end of the model. Specifically, the text representation vector is first decoded by the sentence decoder. The sentence decoder captures the main content of the text at the sentence level and generates a semantic representation expressing which content should be summarized in the abstract sentence the decoder should currently output. Since this semantic representation indicates the text content that should be decoded into the currently output abstract sentence, it is referred to as a first sentence indication vector.
Each sentence indication vector is then input into the word decoder for decoding. The word decoder decodes word by word at the word level according to the semantic representation, and the words obtained compose the abstract sentence output at the current moment.
The sentence decoder and the word decoder may each be implemented as an RNN, an LSTM network or a Convolutional Neural Network (CNN).
To facilitate understanding of the above summary generation process, the summary generation process is schematically described below in conjunction with the model structure illustrated in fig. 2.
In fig. 2, the word encoder and the sentence encoder at the encoding end of the abstract generation model are both composed of a Bi-LSTM network. Taking the word encoder as an example, the arrows between adjacent LSTM networks in the figure respectively illustrate the hidden layer transfer directions of the forward LSTM network and the backward LSTM network.
Suppose a sentence A = {x1, x2, x3} in the input text contains the three words x1, x2 and x3, which are input to the word encoder after word vector conversion. Suppose that as the forward LSTM network encodes the three words in turn its hidden states are successively updated to Haf1, Haf2 and Haf3, and that as the backward LSTM network encodes them its hidden states are successively updated to Had1, Had2 and Had3. Then the hidden state Haf3 at the last moment of the forward LSTM network and the hidden state Had3 at the last moment of the backward LSTM network can be concatenated as the semantic representation of sentence A, because Haf3 and Had3 together contain all the information in the forward and backward directions. That is, the sentence representation vector of sentence A is Ha = {Haf3 ⊕ Had3}, where ⊕ denotes concatenation.
Similarly, for a sentence B = {y1, y2, y3, y4}, after the word encoder processing shown in FIG. 2, the sentence representation vector corresponding to sentence B is assumed to be Hb = {Hbf4 ⊕ Hbd4}.
Since the output of the word encoder is the input of the sentence encoder, the sentence representation vectors of sentences A and B output by the word encoder are input to the sentence encoder, as shown in FIG. 2.
The sentence encoder processes the input sentence representation vectors much as the word encoder processes input words, which is not repeated here. In FIG. 2, it is assumed that the sentence encoder encodes the plurality of sentence representation vectors into a text representation vector Hdocument = {Hfm ⊕ Hdm} containing all the information of the input text, where m is the number of input sentence representation vectors, and Hfm and Hdm are the hidden states at the last moment of the forward LSTM network and of the backward LSTM network respectively, the last moment corresponding to the input of the last sentence representation vector.
Thereafter, the text representation vector Hdocument is input to the decoding end of the model. The sentence decoder and the word decoder at the decoding end illustrated in fig. 2 are both formed of LSTM networks. Only the hidden states of the sentence decoder and the word decoder at two different time instants are illustrated in fig. 2.
As shown in FIG. 2, the text representation vector Hdocument is input to the sentence decoder for sentence-level decoding. The hidden states output by the sentence decoder at different moments express what main content the abstract sentence output at the corresponding moment should contain; that is, the sentence decoder captures the main content of the input text at the sentence level and generates semantic representations indicating which content of the text should be summarized in the abstract sentences to be output at different moments. Two hidden layer states of the sentence decoder are illustrated in FIG. 2: Hguidance1 and Hguidance2, which serve as the sentence indication vectors at the corresponding moments.
Thereafter, Hguidance1 and Hguidance2 are each decoded in the word decoder, so as to decode the words that each corresponding abstract sentence should contain.
The hidden states of the word decoder at different moments are illustrated in FIG. 2. For each sentence indication vector input to the word decoder, a corresponding word may be generated from the word decoder's hidden state at each moment, so that the words generated from these hidden states compose the abstract sentence corresponding to that sentence indication vector. As illustrated in FIG. 2, the sentence indication vector Hguidance1 corresponds to the abstract sentence S1 = {s1a, s1b, s1c}, and Hguidance2 corresponds to S2 = {s2a, s2b, s2c, s2d}; finally, the abstract corresponding to the input text is composed of S1 and S2 and is represented as {S1, S2}.
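A coarse-to-fine decoding loop matching FIG. 2 could look like the following sketch. Greedy word-by-word decoding, the fixed sentence count and the special token ids are all assumptions; the sentence decoder's hidden state at each step plays the role of Hguidance:

```python
import torch
import torch.nn as nn

class HierarchicalDecoder(nn.Module):
    """Sentence-level then word-level decoding (a sketch of the scheme in FIG. 2)."""
    def __init__(self, hid=512, vocab_size=50000, emb_dim=128):
        super().__init__()
        self.sent_dec = nn.LSTMCell(hid, hid)      # produces sentence indication vectors
        self.word_dec = nn.LSTMCell(emb_dim, hid)  # decodes one word at a time
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.out = nn.Linear(hid, vocab_size)

    def forward(self, text_vec, num_sents=2, max_words=20, bos_id=1, eos_id=2):
        summary = []
        hs, cs = torch.zeros_like(text_vec), torch.zeros_like(text_vec)
        for _ in range(num_sents):
            # Each sentence-decoder hidden state is one sentence indication vector
            # (Hguidance), summarizing what the current abstract sentence should say.
            hs, cs = self.sent_dec(text_vec, (hs, cs))
            hw, cw = hs, torch.zeros_like(hs)
            word, sent = torch.tensor([bos_id]), []
            for _ in range(max_words):
                hw, cw = self.word_dec(self.emb(word), (hw, cw))
                word = self.out(hw).argmax(dim=-1)  # greedy word-by-word decoding
                if word.item() == eos_id:
                    break
                sent.append(word.item())
            summary.append(sent)                     # one abstract sentence, e.g. S1
        return summary                               # {S1, S2, ...}
```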
In addition, it should be noted that in the scenario of generating an abstract, the input text may be relatively long, and at this time, in order to further improve the quality of the generated abstract, an Attention (Attention) mechanism may be introduced into the abstract generation model, and the implementation of the Attention mechanism may be completed with reference to the related art, which is not described herein.
In summary, in the above scheme the sentence structure of the text is used in abstract generation, so that at the encoding end the context information inside each sentence is captured by the word encoder, and the context information between sentences, i.e., the context information of the text, is captured by the sentence encoder; at the decoding end, the coarse-to-fine, sentence-then-word decoding first grasps the main content of the whole abstract at the sentence level and then concretely decodes each abstract sentence, ensuring the generation quality of the abstract.
The above embodiment describes how the trained model is used; the generation of the model, i.e., the training process, is described below with reference to the following embodiment. Since the model includes a word encoder, a sentence encoder, a sentence decoder and a word decoder, the model training process can be regarded as training these encoders and decoders.
Fig. 3 is a flowchart of a model training method provided in an exemplary embodiment, and as shown in fig. 3, the method may include the following steps:
301. A plurality of second sentences contained in a labeled training sample are obtained.
For model training, a large number of training samples, which may be various long texts collected from the network, may be collected in advance. In addition, model training is performed in a supervised manner; therefore, preprocessing such as labeling needs to be performed in advance on each collected training sample.
Here, labeling can be understood as giving a standard answer for each training sample: an abstract composed of a plurality of reference abstract sentences.
In addition, optionally, sentence splitting processing may be performed on each training sample, so that each sentence constituting one training sample is obtained.
In addition, for the convenience of training, two parameters can be set: a sentence length L and a sentence number C. The sentence length L constrains the maximum number of words a sentence may contain, and the sentence number C constrains the maximum number of sentences a training sample may contain.
Based on these two parameters, optionally, for any training sample: on the one hand, if the number of sentences in the training sample exceeds C, the first C sentences can be extracted for training, i.e., the training sample is regarded as consisting of only those C sentences; if the number of sentences is less than C, place-holder sentences can be appended to make the training sample consist of C sentences. On the other hand, for a given sentence, if the number of words it contains is greater than L, the sentence is likewise truncated to its first L words; conversely, if the number of words is less than L, a preset place-holder can be appended after the last word so that the sentence contains L words, as in the sketch below. It should be understood that place-holder sentences and place-holder words have no practical meaning; they only occupy space and play no role in training the model parameters.
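The truncation and padding rules above amount to the following small Python sketch; the place-holder tokens and the example values of C and L are illustrative only:

```python
PAD_WORD = "<pad>"       # place-holder word with no practical meaning
PAD_SENT = [PAD_WORD]    # place-holder sentence

def preprocess_sample(sentences, C=30, L=50):
    """Fix a training sample to exactly C sentences of exactly L words each."""
    sents = sentences[:C]                      # keep at most the first C sentences
    sents += [PAD_SENT] * (C - len(sents))     # pad short samples with place-holder sentences
    fixed = []
    for sent in sents:
        sent = sent[:L]                        # keep at most the first L words
        fixed.append(sent + [PAD_WORD] * (L - len(sent)))
    return fixed
```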
After each training sample has been preprocessed in this way, training samples can be selected one after another, at random, from the large sample set to train the model.
In this embodiment, only the training process of a certain training sample is taken as an example for explanation, and in practice, the model is trained to be convergent only by performing multiple rounds of training on the model through a large number of training samples.
It should be understood that when the sentence number C constraint is applied, the second sentences in step 301 refer to the first C sentences of the training sample.
302. Word encoding is performed on the plurality of second sentences respectively through a word encoder to obtain a plurality of second sentence representation vectors corresponding to the plurality of second sentences.
In practice, word segmentation may be performed on each second sentence, and each resulting word is then converted into a corresponding word vector, according to a word vector matrix obtained by pre-training or initialized at random, before being input into the word encoder. Training and use of the word vector matrix are described above and not repeated.
303. Sentence encoding is performed on the plurality of second sentence representation vectors through a sentence encoder to obtain a second text representation vector corresponding to the training sample.
304. The second text representation vector is decoded through a sentence decoder to obtain a plurality of second sentence indication vectors indicating the text content to be decoded.
305. The plurality of second sentence indication vectors are decoded respectively through a word decoder to obtain a plurality of second abstract sentences.
The processing of the word encoder, the sentence encoder, the sentence decoder and the word decoder may refer to the processing of the input text in the foregoing embodiments and is not described in detail.
306. And determining a first loss function according to the plurality of second abstract sentences and the plurality of labeled reference abstract sentences.
307. Parameters of the model are determined according to the first loss function.
After the first loss function is obtained, the parameters of the model may be adjusted by a back-propagation process.
It can be understood that if the first loss function of the model is in a relatively stable state, the model is considered to have been trained to convergence. Alternatively, training may be considered complete when the number of training iterations reaches a set number.
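One supervised update over steps 302 to 307 might look like the sketch below. The model interface (a callable returning word logits under teacher forcing) and the use of cross-entropy as the first loss function are assumptions for illustration:

```python
import torch.nn.functional as F

def train_step(model, optimizer, sample_sentences, reference_ids):
    """One round of supervised training on a single labeled sample (a sketch)."""
    optimizer.zero_grad()
    # Encode the second sentences and decode abstract sentences under teacher forcing;
    # `model` is assumed to return logits aligned with the reference words.
    logits = model(sample_sentences, teacher_ids=reference_ids)  # (T, vocab_size)
    # First loss function: compare the decoded words against the labeled
    # reference abstract sentences.
    loss = F.cross_entropy(logits, reference_ids)
    loss.backward()       # adjust model parameters through back-propagation
    optimizer.step()
    return loss.item()
```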
It should be noted that because the above model training is supervised, similar to the traditional Seq2Seq architecture, during the execution of the word decoder the word input to the word decoder (i.e., its hidden layer LSTM) at the current moment is the labeled reference word (the reference answer), rather than the word predicted and output at the previous moment; thus even if the prediction at the previous moment is wrong, the error is not carried onward. In the actual use stage of the model, however, i.e., when an abstract is actually generated for some input text, there is no reference answer, so the output of the word decoder at the previous moment is used as its input at the next moment. If the output at the previous moment is wrong, the error is carried forward, and as the length of the generated sequence increases the error keeps being amplified. This is the exposure bias problem.
To overcome this exposure bias problem, a model optimization scheme is also provided herein, which is described below in conjunction with the following embodiments. The model optimization scheme is performed after training the model to converge based on the first loss function. That is, the model training process can be divided into two stages, in the first stage, the model is trained to converge through the training process shown in fig. 3, and then the model is optimized in the second stage.
Fig. 4 is a flowchart of a model optimization method provided in an exemplary embodiment, and as shown in fig. 4, the method may include the following steps:
401. A reconstructed hidden layer state of the word decoder at moment t-n is reversely deduced from the actual hidden layer state of the word decoder at moment t, where n is greater than or equal to 1.
In practical application, n may be 1, 2, or 3, for example.
402. A second loss function is determined according to the actual hidden layer state of the word decoder at moment t-n and the reconstructed hidden layer state of the word decoder at moment t-n.
403. The parameters of the model are adjusted according to the second loss function.
To overcome the exposure bias problem, reconstruction networks are introduced to reconstruct or reverse-infer the hidden states of the word decoder. For the sake of distinction, the inversely inferred hidden state is referred to as a reconstructed hidden state, and the hidden state actually output by the word decoder is referred to as an actual hidden state.
In this embodiment, the hidden layer state at a moment before the current one can be reconstructed based on the reconstruction network. In this case, as shown in FIG. 5, the reconstruction network includes one hidden layer and one fully connected layer connected to the hidden layer. In the figure, the hidden layer is assumed to be implemented as an LSTM network, and the fully connected layer is denoted FC. Of course, the hidden layer can also be implemented as another network structure such as an RNN or a CNN.
Under this reconstruction network structure, reversely deducing the reconstructed hidden layer state of the word decoder at moment t-n from its actual hidden layer state at moment t can be implemented as follows:
the actual hidden layer state of the word decoder at moment t and the hidden layer state of the reconstruction network at moment t-1 are input into the hidden layer of the reconstruction network to update its state, yielding the hidden layer state of the reconstruction network at moment t; the hidden layer state of the reconstruction network at moment t is then input into the fully connected layer, which outputs the reconstructed hidden layer state of the word decoder at moment t-n.
This implementation is described with reference to FIG. 5; for ease of understanding, the figure shows the reconstruction network unrolled in time. Suppose the actual hidden layer states of the word decoder at the four moments t-3, t-2, t-1 and t are h_{t-3}, h_{t-2}, h_{t-1} and h_t. These hidden layer states of the word decoder serve as the input of the reconstruction network, so the four hidden layer states are input in sequence into the hidden layer of the reconstruction network. As shown in FIG. 5, in chronological order, h_{t-3} is first input to the hidden layer of the reconstruction network, whose state is updated to h'_{t-3}. Then h_{t-2} is input to the hidden layer, and from h_{t-2} and h'_{t-3} the hidden layer state h'_{t-2} of the reconstruction network at moment t-2 is obtained. By analogy, the hidden layer states h'_{t-1} and h'_t of the reconstruction network are obtained in turn.
FIG. 5 illustrates the case where n = 2, taking only moment t as an example; the other moments are handled similarly and are not repeated. After h'_t is input to the fully connected layer, the output of the fully connected layer is the reconstructed hidden layer state ĥ_{t-2} of the word decoder at moment t-2. The reconstruction of the hidden layer state of the word decoder at moment t-2 is thus achieved. At this point, the error between the actual hidden layer state h_{t-2} of the word decoder at moment t-2 and the reconstructed hidden layer state ĥ_{t-2} is calculated to obtain the second loss function. In FIG. 5, the second loss function is denoted L_AR; that is, since n = 2, at moment t L_AR is calculated from the actual hidden layer state h_{t-2} of the word decoder at moment t-2 and the reconstructed hidden layer state ĥ_{t-2}.
Similarly, if the current moment is t-1, the reconstructed hidden layer state of the word decoder at moment t-3 can be reversely deduced by the above process, and so on.
After the second loss function value is obtained, the parameters of the model are fine-tuned during back-propagation according to the second loss function value, thereby optimizing the model.
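The single reconstruction network of FIG. 5 can be sketched as below. Using mean squared error as the error between the actual and reconstructed hidden layer states is an assumption; the patent text only speaks of an error calculation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Reconstructor(nn.Module):
    """One hidden layer (LSTMCell) plus one fully connected layer (FC), as in FIG. 5."""
    def __init__(self, hid=512):
        super().__init__()
        self.cell = nn.LSTMCell(hid, hid)
        self.fc = nn.Linear(hid, hid)

    def forward(self, decoder_states, n=2):
        """decoder_states: actual word-decoder hidden states over time, each (1, hid)."""
        h = c = torch.zeros_like(decoder_states[0])
        loss = torch.zeros(())
        for t, h_t in enumerate(decoder_states):
            h, c = self.cell(h_t, (h, c))       # update the reconstruction hidden state
            if t >= n:
                h_rec = self.fc(h)              # reconstructed state for moment t-n
                # Second loss function: error between actual and reconstructed states.
                loss = loss + F.mse_loss(h_rec, decoder_states[t - n])
        return loss
```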
The above model optimization scheme reconstructs the hidden layer state at a single moment before the current one. To further improve the quality of the model, a scheme capable of reconstructing the hidden layer states at several moments before the current one is provided below.
Fig. 6 is a flowchart of another model optimization method provided in an exemplary embodiment, and as shown in fig. 6, the method may include the following steps:
601. Reconstructed hidden layer states corresponding to moments t-1 to t-m of the word decoder are reversely deduced from the actual hidden layer state of the word decoder at moment t, where m is greater than 1.
In practical application, m may be, for example, 2 or 3.
602. A plurality of third loss functions are determined according to the actual hidden layer states of the word decoder at moments t-1 to t-m and the corresponding reconstructed hidden layer states.
603. The parameters of the model are adjusted according to a plurality of third loss functions.
To enable reconstruction of the hidden layer states at several moments before the current one, two reconstruction network architectures are provided herein, as shown in FIG. 7 and FIG. 8.
FIG. 7 provides a stacked reconstruction network architecture. In this case, the reconstruction network includes m sequentially connected hidden layers and m fully connected layers, one connected to each hidden layer; that is, the reconstruction network consists of m hidden layers and m fully connected layers in one-to-one correspondence. Under this reconstruction network structure, step 601 can be implemented as follows.
The actual hidden layer state of the word decoder at moment t is input to the first hidden layer of the reconstruction network to update that hidden layer's state. The reconstructed hidden layer state corresponding to moment t-1 of the word decoder is then obtained as follows: the updated state of the first hidden layer is input into the first fully connected layer, which outputs the reconstructed hidden layer state of the word decoder at moment t-1. The reconstructed hidden layer state corresponding to moment t-i of the word decoder, where 1 < i <= m, is obtained as follows: the state of the (i-1)-th hidden layer is input into the i-th hidden layer to update its state, and the updated state of the i-th hidden layer is input into the fully connected layer connected to it, which outputs the reconstructed hidden layer state corresponding to moment t-i of the word decoder.
Specifically, as shown in FIG. 7, assuming m = 3, the reconstruction network includes three hidden layers LSTM1, LSTM2 and LSTM3, and three fully connected layers respectively connected to them: FC1, FC2 and FC3. The loss functions calculated at the outputs of the fully connected layers are denoted L_AR1, L_AR2 and L_AR3. The parameters of LSTM1, LSTM2 and LSTM3 differ from one another.
Assuming the current moment is t, the hidden layer states at the three moments t-3, t-2 and t-1 before the word decoder can be reconstructed with the reconstruction network shown in FIG. 7. Suppose the actual hidden layer states of the word decoder at the four moments t-3, t-2, t-1 and t are h_{t-3}, h_{t-2}, h_{t-1} and h_t.
To reconstruct the hidden layer states of the word decoder at moments t-3, t-2 and t-1, the actual hidden layer state h_t of the word decoder at moment t is input to the first hidden layer LSTM1 as the input of the reconstruction network, updating the hidden layer state of LSTM1 to h'_t. FC1 then processes h'_t to output the reconstructed hidden layer state ĥ_{t-1} of the word decoder at moment t-1. L_AR1 is calculated from the actual hidden layer state h_{t-1} of the word decoder at moment t-1 and ĥ_{t-1}; suppose the resulting loss function value is loss1.
Besides being input to FC1, the hidden layer state h'_t of LSTM1 is passed to the second hidden layer LSTM2 to update its hidden layer state; suppose the updated hidden layer state of LSTM2 is h'_{t-1}. h'_{t-1} is input to FC2, which processes it to output the reconstructed hidden layer state ĥ_{t-2} of the word decoder at moment t-2. L_AR2 is calculated from the actual hidden layer state h_{t-2} and ĥ_{t-2}; suppose the resulting loss function value is loss2.
Similarly, the hidden layer state h'_{t-1} of LSTM2 is passed to the third hidden layer LSTM3 to update its hidden layer state; suppose the updated hidden layer state of LSTM3 is h'_{t-2}. h'_{t-2} is input to FC3, which processes it to output the reconstructed hidden layer state ĥ_{t-3} of the word decoder at moment t-3. L_AR3 is calculated from the actual hidden layer state h_{t-3} and ĥ_{t-3}; suppose the resulting loss function value is loss3.
Thereby, a reconstruction of the hidden layer state of the word decoder at a plurality of moments before the current moment t is achieved.
After the plurality of loss function values are obtained, they can be added together to adjust and optimize the model parameters.
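A sketch of the stacked architecture of FIG. 7, shown for a single moment t; resetting the LSTM cells' states at each moment, and the MSE error, are simplifying assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StackedReconstructor(nn.Module):
    """m chained hidden layers, each with its own FC head; layer i reconstructs moment t-i."""
    def __init__(self, hid=512, m=3):
        super().__init__()
        self.cells = nn.ModuleList(nn.LSTMCell(hid, hid) for _ in range(m))
        self.fcs = nn.ModuleList(nn.Linear(hid, hid) for _ in range(m))

    def forward(self, h_t, past_states):
        """h_t: actual state at moment t; past_states: [h_{t-1}, ..., h_{t-m}]."""
        zeros = torch.zeros_like(h_t)
        losses, x = [], h_t
        for cell, fc, target in zip(self.cells, self.fcs, past_states):
            h, _ = cell(x, (zeros, zeros))   # hidden state is passed down the stack
            losses.append(F.mse_loss(fc(h), target))
            x = h                            # input to the next hidden layer
        return sum(losses)                   # loss1 + loss2 + loss3 added together
```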
In addition to the stacked reconstruction network structure shown in FIG. 7, FIG. 8 provides a side-by-side reconstruction network structure. In this case, the reconstruction network includes m independent hidden layers and m fully connected layers, one connected to each hidden layer. Under this side-by-side structure, step 601 can be implemented as follows.
The actual hidden layer state of the word decoder at moment t is input into each of the m hidden layers to update their states, and the states of the m hidden layers are input into the corresponding fully connected layers to obtain the reconstructed hidden layer states, output by the m fully connected layers, corresponding to moments t-1 to t-m of the word decoder.
Specifically, as shown in FIG. 8, assuming m = 3, the reconstruction network includes three hidden layers LSTM1, LSTM2 and LSTM3, and three fully connected layers respectively connected to them: FC1, FC2 and FC3. A loss function may be calculated at the output of each fully connected layer, denoted L_AR1, L_AR2 and L_AR3. The correspondence between the actual hidden layer states h_{t-3}, h_{t-2} and h_{t-1} of the word decoder at moments t-3, t-2 and t-1 and L_AR1, L_AR2 and L_AR3 is shown in FIG. 8.
Unlike the embodiment shown in FIG. 7, in this embodiment the three hidden layers are independent of one another, and no information is passed between them.
Assuming the current moment is t, to reconstruct the hidden layer states at the three moments t-3, t-2 and t-1 before the word decoder, and unlike the embodiment shown in FIG. 7, the actual hidden layer state h_t of the word decoder at moment t is input to each hidden layer of the reconstruction network: LSTM1, LSTM2 and LSTM3. The parameters of LSTM1, LSTM2 and LSTM3 differ.
As shown in FIG. 8, after the actual hidden layer state h_t of the word decoder at moment t is input to LSTM1, LSTM2 and LSTM3, their hidden layer states are updated to h'_t, h'_{t-1} and h'_{t-2} respectively. FC1, FC2 and FC3 then process the hidden layer states input to them, outputting the reconstructed hidden layer states ĥ_{t-1}, ĥ_{t-2} and ĥ_{t-3} of the word decoder at moments t-1 to t-3. The loss functions L_AR1, L_AR2 and L_AR3 then output the corresponding loss function values loss1, loss2 and loss3, which are added together to optimize the model parameters.
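The side-by-side variant of FIG. 8 differs only in that every hidden layer receives h_t directly and no state is passed between layers; again the MSE error is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelReconstructor(nn.Module):
    """m independent hidden layers, each fed h_t directly, as in FIG. 8."""
    def __init__(self, hid=512, m=3):
        super().__init__()
        self.cells = nn.ModuleList(nn.LSTMCell(hid, hid) for _ in range(m))
        self.fcs = nn.ModuleList(nn.Linear(hid, hid) for _ in range(m))

    def forward(self, h_t, past_states):
        """past_states: [h_{t-1}, ..., h_{t-m}], the actual decoder states, each (1, hid)."""
        zeros = torch.zeros_like(h_t)
        losses = []
        for cell, fc, target in zip(self.cells, self.fcs, past_states):
            h, _ = cell(h_t, (zeros, zeros))  # each layer sees h_t independently
            losses.append(F.mse_loss(fc(h), target))
        return sum(losses)                    # loss1 + loss2 + loss3
```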
The abstract generation apparatus of one or more embodiments of the present invention is described in detail below. Those skilled in the art will appreciate that the abstract generation apparatus can be constructed from commercially available hardware components configured through the steps taught in this solution.
FIG. 9 is a schematic structural diagram of an abstract generation apparatus according to an embodiment of the present invention. As shown in FIG. 9, the apparatus includes: an obtaining module 11, a word encoder 12, a sentence encoder 13, a sentence decoder 14 and a word decoder 15.
The obtaining module 11 is configured to obtain a plurality of first sentences composing a text.
The word encoder 12 is configured to perform word encoding on the plurality of first sentences respectively, through the word encoder of the trained model, to obtain a plurality of first sentence representation vectors corresponding to the plurality of first sentences.
The sentence encoder 13 is configured to perform sentence encoding on the plurality of first sentence representation vectors, through the sentence encoder of the model, to obtain a first text representation vector corresponding to the text.
The sentence decoder 14 is configured to decode the first text representation vector, through the sentence decoder of the model, to obtain a plurality of first sentence indication vectors indicating the text content to be decoded.
The word decoder 15 is configured to decode the plurality of first sentence indication vectors respectively, through the word decoder of the model, to obtain a plurality of first abstract sentences, the plurality of first abstract sentences composing the abstract of the text.
Optionally, the word encoder and the sentence encoder each comprise any one of the following neural networks: a bidirectional recurrent neural network or a bidirectional long short-term memory network.
Optionally, the word decoder and the sentence decoder each comprise any one of the following neural networks: a recurrent neural network or a long short-term memory network.
Further, the apparatus may further include a model training module, configured to: obtain a plurality of second sentences contained in a labeled training sample; perform word encoding on the plurality of second sentences respectively through the word encoder to obtain a plurality of second sentence representation vectors corresponding to the plurality of second sentences; perform sentence encoding on the plurality of second sentence representation vectors through the sentence encoder to obtain a second text representation vector corresponding to the training sample; decode the second text representation vector through the sentence decoder to obtain a plurality of second sentence indication vectors indicating the text content to be decoded; decode the plurality of second sentence indication vectors respectively through the word decoder to obtain a plurality of second abstract sentences; determine a first loss function according to the plurality of second abstract sentences and the plurality of labeled reference abstract sentences; and determine the parameters of the model according to the first loss function.
Optionally, in obtaining the plurality of second abstract sentences, the model training module may be specifically configured to: for any one of the plurality of second sentence indication vectors, decode that sentence indication vector through the word decoder to obtain the actual hidden layer state of the word decoder at each moment; and determine the word corresponding to each moment according to the actual hidden layer state at that moment, the second abstract sentence corresponding to that sentence indication vector being composed of the words corresponding to the moments.
Further optionally, the apparatus may further include a first model optimization module, configured to: reversely deduce a reconstructed hidden layer state of the word decoder at moment t-n from the actual hidden layer state of the word decoder at moment t, where n is greater than or equal to 1 and t is one of the moments; determine a second loss function according to the actual hidden layer state of the word decoder at moment t-n and the reconstructed hidden layer state of the word decoder at moment t-n; and adjust the parameters of the model according to the second loss function.
Optionally, the reconstruction network includes a hidden layer and a fully connected layer connected to the hidden layer. The first model optimization module may thus be specifically configured to: input the actual hidden layer state of the word decoder at moment t and the hidden layer state of the reconstruction network at moment t-1 into the hidden layer of the reconstruction network to update its state, obtaining the hidden layer state of the reconstruction network at moment t; and input the hidden layer state of the reconstruction network at moment t into the fully connected layer to obtain the reconstructed hidden layer state of the word decoder at moment t-n output by the fully connected layer.
Further optionally, the apparatus may further include a second model optimization module, configured to: reversely deduce reconstructed hidden layer states corresponding to moments t-1 to t-m of the word decoder from the actual hidden layer state of the word decoder at moment t, where m is greater than 1 and t is one of the moments; determine a plurality of third loss functions according to the actual hidden layer states of the word decoder at moments t-1 to t-m and the corresponding reconstructed hidden layer states; and adjust the parameters of the model according to the plurality of third loss functions.
In a specific optional embodiment, the reconstruction network includes m sequentially connected hidden layers and m fully connected layers respectively connected to the m hidden layers. In this case, the second model optimization module may be configured to: input the actual hidden layer state of the word decoder at moment t to the first hidden layer of the reconstruction network to update its state; obtain the reconstructed hidden layer state corresponding to moment t-1 of the word decoder by inputting the updated state of the first hidden layer into the first fully connected layer connected to it, which outputs the reconstructed hidden layer state of the word decoder at moment t-1; and obtain the reconstructed hidden layer state corresponding to moment t-i of the word decoder, where 1 < i <= m, by inputting the state of the (i-1)-th hidden layer into the i-th hidden layer to update its state, and inputting the updated state of the i-th hidden layer into the fully connected layer connected to it, which outputs the reconstructed hidden layer state corresponding to moment t-i of the word decoder.
In yet another specific optional embodiment, the reconstruction network includes m independent hidden layers and m fully connected layers respectively connected to the m hidden layers. In this case, the second model optimization module may be configured to: input the actual hidden layer state of the word decoder at moment t into each of the m hidden layers to update their states; and input the states of the m hidden layers into the corresponding fully connected layers to obtain the reconstructed hidden layer states, output by the m fully connected layers, corresponding to moments t-1 to t-m of the word decoder.
The apparatus shown in fig. 9 can perform the methods provided in the foregoing embodiments, and details of the portions of this embodiment that are not described in detail can refer to the related descriptions of the foregoing embodiments, which are not described herein again.
In a possible design, the structure of the abstract generation apparatus shown in fig. 9 may be implemented as an electronic device, which may be a terminal device or a server. As shown in fig. 10, the electronic device may include a processor 21 and a memory 22, where the memory 22 stores executable code which, when executed by the processor 21, enables the processor 21 to execute the abstract generation method provided in the foregoing embodiments.
In practice, the electronic device may also include a communication interface 23 for communicating with other devices.
In addition, an embodiment of the present invention provides a non-transitory machine-readable storage medium having executable code stored thereon which, when executed by a processor of an electronic device, causes the processor to perform the abstract generation method provided in the foregoing embodiments.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, and one of ordinary skill in the art can understand and implement this without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by adding a necessary general hardware platform, or by a combination of hardware and software. Based on this understanding, the above technical solutions, or the portions of them that contribute to the prior art, may be embodied in the form of a computer program product carried on one or more computer-usable storage media (including, without limitation, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (11)

1. An abstract generation method, comprising:
obtaining a plurality of first sentences composing a text;
performing word encoding on each of the plurality of first sentences through a word encoder to obtain a plurality of first sentence representation vectors corresponding to the plurality of first sentences;
performing sentence encoding on the plurality of first sentence representation vectors through a sentence encoder to obtain a first text representation vector corresponding to the text;
decoding the first text representation vector through a sentence decoder to obtain a plurality of first sentence indication vectors indicating the text content to be decoded;
and decoding the plurality of first sentence indication vectors respectively through a word decoder to obtain a plurality of first abstract sentences, the plurality of first abstract sentences composing an abstract of the text.
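Purely as an editorial illustration of the encode-decode flow in claim 1 (not part of the claims), the following sketch assumes PyTorch, GRU-based networks, concatenated final bidirectional states as sentence representation vectors, greedy word selection, and fixed output lengths; none of these choices is required by the claim.

```python
import torch
import torch.nn as nn

class HierarchicalSummarizer(nn.Module):
    def __init__(self, vocab: int, emb: int, hid: int,
                 max_sents: int = 3, max_words: int = 20):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.word_encoder = nn.GRU(emb, hid, bidirectional=True, batch_first=True)
        self.sent_encoder = nn.GRU(2 * hid, hid, batch_first=True)
        self.sent_decoder = nn.GRUCell(hid, hid)
        self.word_decoder = nn.GRUCell(emb, hid)
        self.out = nn.Linear(hid, vocab)
        self.max_sents, self.max_words = max_sents, max_words

    def forward(self, sentences: torch.Tensor):  # (num_sents, num_words) word ids
        # Word encoding: one representation vector per first sentence
        # (assumed: the concatenated final bidirectional states).
        _, h = self.word_encoder(self.embed(sentences))           # h: (2, S, hid)
        sent_vecs = torch.cat([h[0], h[1]], dim=-1).unsqueeze(0)  # (1, S, 2*hid)
        # Sentence encoding: one text representation vector for the whole text.
        _, text_vec = self.sent_encoder(sent_vecs)                # (1, 1, hid)
        dec_h = text_vec[0]
        step_in = torch.zeros_like(dec_h)                         # assumed start input
        summary = []
        for _ in range(self.max_sents):
            # Sentence decoding: each step yields one sentence indication vector.
            dec_h = self.sent_decoder(step_in, dec_h)
            step_in = dec_h
            # Word decoding: expand the indication vector into an abstract sentence.
            w_h = dec_h
            tok = torch.zeros(1, dtype=torch.long)                # assumed <bos> id
            words = []
            for _ in range(self.max_words):
                w_h = self.word_decoder(self.embed(tok), w_h)
                tok = self.out(w_h).argmax(dim=-1)                # greedy choice
                words.append(tok.item())
            summary.append(words)
        return summary  # one word-id list per first abstract sentence
```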
2. The method of claim 1, wherein the word encoder and the sentence encoder each comprise any one of the following neural networks: a bidirectional recurrent neural network, a bidirectional long short-term memory network;
and the word decoder and the sentence decoder each comprise any one of the following neural networks: a recurrent neural network, a long short-term memory network.
3. The method according to claim 1 or 2, further comprising a model training step of:
obtaining a plurality of second sentences contained in a labeled training sample;
performing word encoding on each of the plurality of second sentences through the word encoder to obtain a plurality of second sentence representation vectors corresponding to the plurality of second sentences;
performing sentence encoding on the plurality of second sentence representation vectors through the sentence encoder to obtain a second text representation vector corresponding to the training sample;
decoding the second text representation vector through the sentence decoder to obtain a plurality of second sentence indication vectors indicating the text content to be decoded;
decoding the plurality of second sentence indication vectors respectively through the word decoder to obtain a plurality of second abstract sentences;
determining a first loss function according to the plurality of second abstract sentences and a plurality of labeled reference abstract sentences;
and determining parameters of the model according to the first loss function, the model comprising the word encoder, the sentence encoder, the sentence decoder, and the word decoder.
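As an illustration of this training step (again not part of the claims), the sketch below assumes the first loss function is token-level cross-entropy against the labeled reference abstract sentences, assumes an optimizer supplied by the caller, and invents a hypothetical forward_logits helper that returns vocabulary scores rather than word ids; all three are assumptions, not details fixed by the claim.

```python
import torch
import torch.nn as nn

def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               second_sentences: torch.Tensor, reference_ids: torch.Tensor):
    """second_sentences: word ids of one training sample's second sentences;
    reference_ids: (num_ref_sents, num_words) labeled reference abstract ids."""
    # Hypothetical helper: like forward(), but returns raw scores of shape
    # (num_ref_sents, num_words, vocab) so a differentiable loss can be taken.
    logits = model.forward_logits(second_sentences)
    loss1 = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), reference_ids.reshape(-1))
    optimizer.zero_grad()
    loss1.backward()   # determine model parameters from the first loss
    optimizer.step()
    return loss1.item()
```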
4. The method of claim 3, wherein obtaining the plurality of second abstract sentences comprises:
for any sentence indication vector of the plurality of second sentence indication vectors, decoding the sentence indication vector through the word decoder to obtain an actual hidden layer state of the word decoder at each moment;
and determining the word corresponding to each moment according to the actual hidden layer state at that moment, the second abstract sentence corresponding to the sentence indication vector being composed of the words corresponding to the respective moments.
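As an illustrative sketch of this word selection, assuming the actual hidden layer states are projected to vocabulary scores and the word at each moment is taken greedily (beam search or sampling would fit the claim equally well):

```python
import torch
import torch.nn as nn

def words_from_states(states: torch.Tensor, out_proj: nn.Linear, vocab: list):
    """states: (T, hid) actual hidden layer states, one per moment, produced by
    decoding a single sentence indication vector; vocab: id-to-word lookup."""
    ids = out_proj(states).argmax(dim=-1)     # one word id per moment
    return [vocab[i] for i in ids.tolist()]   # the second abstract sentence
```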
5. The method of claim 4, further comprising:
inferring, in reverse, a reconstructed hidden layer state of the word decoder at time t-n from the actual hidden layer state of the word decoder at time t, wherein n is greater than or equal to 1 and t is one of the moments;
determining a second loss function according to the actual hidden layer state of the word decoder at time t-n and the reconstructed hidden layer state of the word decoder at time t-n;
adjusting parameters of the model according to the second loss function.
6. The method of claim 5, wherein a reconstruction network comprises a hidden layer and a fully connected layer connected to the hidden layer;
and the inferring, in reverse, of the reconstructed hidden layer state of the word decoder at time t-n from the actual hidden layer state of the word decoder at time t comprises:
inputting the actual hidden layer state of the word decoder at time t and the hidden layer state of the reconstruction network at time t-1 into the hidden layer of the reconstruction network to update its state, thereby obtaining the hidden layer state of the reconstruction network at time t;
and inputting the hidden layer state of the reconstruction network at time t into the fully connected layer to obtain the reconstructed hidden layer state of the word decoder at time t-n output by the fully connected layer.
7. The method of claim 4, further comprising:
inferring, in reverse, reconstructed hidden layer states of the word decoder for times t-1 through t-m from the actual hidden layer state of the word decoder at time t, wherein m is greater than 1 and t is one of the moments;
determining a plurality of third loss functions according to the actual hidden layer states of the word decoder at times t-1 through t-m and the reconstructed hidden layer states for times t-1 through t-m;
adjusting parameters of the model according to the plurality of third loss functions.
8. The method of claim 7, wherein the reconstruction network comprises m sequentially connected hidden layers and m fully connected layers, one connected to each of the m hidden layers;
and the inferring, in reverse, of the reconstructed hidden layer states of the word decoder for times t-1 through t-m from the actual hidden layer state of the word decoder at time t comprises:
inputting the actual hidden layer state of the word decoder at time t into the first hidden layer of the reconstruction network to update its state;
obtaining the reconstructed hidden layer state for time t-1 as follows: inputting the updated state of the first hidden layer into the first fully connected layer connected to the first hidden layer to obtain the reconstructed hidden layer state of the word decoder at time t-1 output by the first fully connected layer;
and obtaining the reconstructed hidden layer state for time t-i, wherein 1 < i ≤ m, as follows: inputting the state of the (i-1)-th hidden layer into the i-th hidden layer to update its state, and inputting the updated state of the i-th hidden layer into the fully connected layer connected to the i-th hidden layer to obtain the reconstructed hidden layer state of the word decoder at time t-i output by the i-th fully connected layer.
9. The method of claim 7, wherein the reconstruction network comprises m independent hidden layers and m fully connected layers, one connected to each of the m hidden layers;
and the inferring, in reverse, of the reconstructed hidden layer states of the word decoder for times t-1 through t-m from the actual hidden layer state of the word decoder at time t comprises:
inputting the actual hidden layer state of the word decoder at time t into each of the m hidden layers to update the states of the m hidden layers;
and inputting the states of the m hidden layers into their corresponding fully connected layers to obtain the reconstructed hidden layer states of the word decoder for times t-1 through t-m output by the m fully connected layers.
10. An abstract generation apparatus, comprising:
an obtaining module, configured to obtain a plurality of first sentences composing a text;
a word encoder, configured to perform word encoding on each of the plurality of first sentences to obtain a plurality of first sentence representation vectors corresponding to the plurality of first sentences;
a sentence encoder, configured to perform sentence encoding on the plurality of first sentence representation vectors to obtain a first text representation vector corresponding to the text;
a sentence decoder, configured to decode the first text representation vector to obtain a plurality of first sentence indication vectors indicating the text content to be decoded;
and a word decoder, configured to decode the plurality of first sentence indication vectors respectively to obtain a plurality of first abstract sentences, the plurality of first abstract sentences composing an abstract of the text.
11. An electronic device, comprising: a memory and a processor, wherein the memory has executable code stored thereon which, when executed by the processor, causes the processor to perform the abstract generation method of any one of claims 1 to 9.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination