CN111241267B - Abstract extraction and abstract extraction model training method, related device and storage medium - Google Patents

Abstract extraction and abstract extraction model training method, related device and storage medium

Info

Publication number
CN111241267B
Authority
CN
China
Prior art keywords
sentence
abstract
window
text
length
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010025465.8A
Other languages
Chinese (zh)
Other versions
CN111241267A (en)
Inventor
叶忠义
吴飞
方四安
徐承
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN202010025465.8A priority Critical patent/CN111241267B/en
Publication of CN111241267A publication Critical patent/CN111241267A/en
Application granted granted Critical
Publication of CN111241267B publication Critical patent/CN111241267B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users

Abstract

The application discloses an abstract extraction method, an abstract extraction model training method, a related device and a storage medium. The abstract extraction method includes: dividing a text into sentences to obtain a sentence list; sequentially dividing the sentence list into a plurality of windows according to a preset length, where the preset length is the maximum sequence length supported by an abstract extraction model, each window contains a plurality of consecutive sentences, the length of each window is less than or equal to the preset length, and two adjacent windows contain at least one identical sentence; predicting each window with the abstract extraction model to obtain an importance score for each sentence in each window; determining the importance score of each sentence in the text from its importance scores in the windows; and selecting at least one sentence ranked highest by importance score as the abstract of the text. The scheme can improve the quality of abstract extraction.

Description

Abstract extraction and abstract extraction model training method, related device and storage medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular to an abstract extraction method, an abstract extraction model training method, a related apparatus, and a storage medium.
Background
With the development of information technology, processing natural language by machine learning has become increasingly common. Taking deep learning as an example, neural-network-based models can be applied to natural language processing tasks such as machine translation and text summarization.
Due to the development of Internet technology, people may receive a large amount of news, articles and other information every day. Extracting abstracts from this information can therefore improve the efficiency with which people acquire it. In practical applications, however, especially when extracting abstracts from long texts such as work reports and meeting minutes, problems that degrade the abstract quality, such as information loss or information redundancy, are likely to occur. In view of this, how to improve the quality of abstract extraction is an urgent problem to be solved.
Disclosure of Invention
The technical problem mainly solved by the application is to provide an abstract extraction method, an abstract extraction model training method, a related device and a storage medium that can improve the quality of abstract extraction.
In order to solve the above problem, a first aspect of the present application provides an abstract extraction method, including: dividing a text into sentences to obtain a sentence list; sequentially dividing the sentence list into a plurality of windows according to a preset length, where the preset length is the maximum sequence length supported by an abstract extraction model, each window contains a plurality of consecutive sentences, the length of each window is less than or equal to the preset length, and two adjacent windows contain at least one identical sentence; predicting each window with the abstract extraction model to obtain an importance score for each sentence in each window; determining the importance score of each sentence in the text from the importance scores of that sentence in the windows; and selecting at least one sentence ranked highest by importance score as the abstract of the text.
In order to solve the above problem, a second aspect of the present application provides a method for training an abstract extraction model, including: training a bidirectional conversion-based encoder model suitable for abstract extraction; constructing an abstract extraction model using the bidirectional conversion-based encoder model suitable for abstract extraction; and training the abstract extraction model with a text abstract training set.
In order to solve the above problem, a third aspect of the present application provides an abstract extraction method, including: predicting a text with an abstract extraction model to obtain an importance score for each sentence in the text; acquiring the features of a plurality of sentences ranked highest by importance score; combining at least some of these sentences according to a maximum abstract length to obtain a plurality of sentence combinations, where the length of each sentence combination is less than or equal to the maximum abstract length; predicting a quality score for each sentence combination with a trained scoring regressor; and selecting the sentence combination with the highest predicted quality score as the abstract of the text.
In order to solve the above problem, a fourth aspect of the present application provides an abstract extraction apparatus, including a memory and a processor coupled to each other, the memory storing program instructions and the processor being configured to execute the program instructions to implement the abstract extraction method of the first aspect or the third aspect.
In order to solve the above problem, a fifth aspect of the present application provides an abstract extraction model training apparatus, including a memory and a processor coupled to each other, the memory storing program instructions and the processor being configured to execute the program instructions to implement the abstract extraction model training method of the second aspect.
In order to solve the above problem, a sixth aspect of the present application provides a computer-readable storage medium storing program instructions which, when executed by a processor, implement the abstract extraction method of the first aspect, the abstract extraction model training method of the second aspect, or the abstract extraction method of the third aspect.
According to the above scheme, the text is divided into sentences to obtain a sentence list, and the sentence list is divided into windows according to a preset length, where the preset length is the maximum sequence length supported by the abstract extraction model, each window contains a plurality of consecutive sentences, the length of each window is less than or equal to the preset length, and two adjacent windows contain at least one identical sentence. Each window is predicted with the abstract extraction model to obtain the importance score of each sentence in each window, the importance score of each sentence in the text is determined from its importance scores in the windows, and at least one sentence ranked highest by importance score is selected as the abstract of the text. Since, for long texts as well as short texts, the sentences ranked highest by importance score across the whole text can be selected as the abstract, the probability of information loss or information redundancy can be reduced and the quality of abstract extraction can be improved.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a method for abstracting a summary of the present application;
FIG. 2 is a schematic diagram of an embodiment of windowing text;
FIG. 3 is a block diagram of an embodiment of a summarization extraction model;
FIG. 4 is a schematic flowchart of an embodiment of step S13 in FIG. 1;
FIG. 5 is a schematic flow chart diagram illustrating another embodiment of a method for abstracting a summary of the present application;
FIG. 6 is a flowchart illustrating an embodiment of a method for abstract extraction model training according to the present application;
FIG. 7 is a block diagram of an embodiment of an apparatus for abstract extraction according to the present application;
FIG. 8 is a block diagram of an embodiment of an abstract extraction model training apparatus according to the present application;
FIG. 9 is a block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation rather than limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship. Further, "plurality" herein means two or more.
Referring to fig. 1, fig. 1 is a schematic flow chart of an embodiment of a method for extracting a summary of the present application. Specifically, the following steps may be included:
Step S11: carry out sentence division on the text to obtain a sentence list.
In one implementation scenario, the text may be divided at sentence-end marks to obtain the sentence list. For example, for the text "Today is XX month XX day. Today's weather is sunny.", the text may be split at the periods into the sentences "Today is XX month XX day" and "Today's weather is sunny", so that the obtained sentence list contains these two sentences. Other texts can be handled by analogy and are not illustrated one by one here. In a specific implementation scenario, in order not to affect the context between sentences after division, the sentences in the sentence list may be sorted according to their order in the text. Still taking the above text as an example, in the sentence list obtained by division, the position of the sentence "Today is XX month XX day" is consistent with its position in the original text, i.e. the first position in the list, and the position of the sentence "Today's weather is sunny" is consistent with its position in the original text, i.e. the second position in the list. Other texts can be handled by analogy and are not illustrated one by one here.
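The sentence division step can be illustrated with a minimal Python sketch; the end-mark set and the function name are assumptions for illustration only, not part of the disclosed embodiment:

```python
import re

def split_into_sentences(text: str) -> list:
    """Split text into a sentence list at sentence-end marks,
    keeping the sentences in their original order."""
    # Assumed end marks: Chinese/Western periods, question and exclamation marks.
    parts = re.split(r'(?<=[。？！.?!])', text)
    return [p.strip() for p in parts if p.strip()]

# Example: two sentences are returned in document order.
sentences = split_into_sentences("Today is XX month XX day. Today's weather is sunny.")
# -> ["Today is XX month XX day.", "Today's weather is sunny."]
```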
Step S12: and carrying out window division on the sentence list according to a preset length to obtain a plurality of windows.
In this embodiment, the preset length is the maximum sequence length supported by the abstract extraction model. In a specific implementation scenario, the abstract extraction model may be a BERT (Bidirectional Encoder Representations from Transformers) model, i.e. a Transformer-based bidirectional encoder model; when processing a word, it can take into account the words before and after that word, so the semantics of the context can be captured. The preset length may be set according to the abstract extraction model employed; for example, the maximum sequence length supported by the BERT model is 512, so the preset length may be set to 512. When the abstract extraction model is another model whose maximum supported sequence length is a different value, the preset length can be set analogously, and the examples are not repeated here.
In this embodiment, each window includes a plurality of consecutive sentences and has a length less than or equal to the preset length. Still taking the BERT model as an example, the total length of the consecutive sentences included in each window should be less than or equal to 512, for example 512, 500 or 489, which is not further illustrated here. In a specific implementation scenario, in order to make the abstract extraction model process as many sentences as possible in each window, thereby reducing the number of times the model is called and hence the processing load, each window should include a plurality of consecutive sentences such that not only is the window length less than or equal to the preset length, but adding the next sentence to the window would make the window length exceed the preset length, where the next sentence is the sentence immediately following the last sentence in the window. For example, assuming the sentences already in the window have a total length of 500 and the preset length is 512: if the length of the next sentence is 12, it should be placed in the window in which those consecutive sentences are located; conversely, if the length of the next sentence is 13, it should be placed in another window.
In this embodiment, two adjacent windows include at least one identical sentence. Referring to fig. 2, fig. 2 is a schematic diagram of an embodiment of window division of a text. As shown in fig. 2, the sentence list obtained by dividing the text contains "sentence 1" through "sentence 8". After the sentence list is sequentially divided into windows according to the preset length, a plurality of windows are obtained, where "sentence 1", "sentence 2" and "sentence 3" are divided into one window; "sentence 2", "sentence 3" and "sentence 4" into the next window; "sentence 3", "sentence 4", "sentence 5" and "sentence 6" into the next window; and "sentence 6", "sentence 7" and "sentence 8" into the last window. In a specific implementation scenario, the starting sentence of the next window may be determined from the number of sentences in the current window: if the current window contains more than 4 sentences, the starting sentence of the next window is the 4th sentence of the current window; if the current window contains 4 sentences or fewer, the starting sentence of the next window is the second-to-last sentence of the current window.
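The window division logic described above (greedy filling up to the preset length, with the next window starting from the sentence given by the stated overlap rule) can be sketched roughly as follows. This is a non-authoritative illustration that measures length in characters for simplicity; the function name and the progress guard are assumptions:

```python
def divide_into_windows(sentences, preset_length=512):
    """Greedily pack consecutive sentences into windows whose total length does
    not exceed preset_length; adjacent windows share at least one sentence."""
    windows, start = [], 0
    while start < len(sentences):
        end, length = start, 0
        # Add sentences while the window still fits within preset_length.
        while end < len(sentences) and length + len(sentences[end]) <= preset_length:
            length += len(sentences[end])
            end += 1
        end = max(end, start + 1)          # keep at least one sentence per window
        window = sentences[start:end]
        windows.append(window)
        if end >= len(sentences):
            break
        # Overlap rule as stated above: more than 4 sentences -> next window starts
        # at the 4th sentence of this window; otherwise at its second-to-last
        # sentence. The max() guard simply ensures forward progress.
        next_start = start + 3 if len(window) > 4 else start + len(window) - 2
        start = max(next_start, start + 1)
    return windows
```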
Step S13: predict each window with the abstract extraction model to obtain the importance score of each sentence in each window.
In an implementation scenario, in order to make each window fit the input format of the abstract extraction model, each window may be preprocessed before being predicted by the abstract extraction model.
In this embodiment, the importance score of each sentence represents the importance of the sentence, and the higher the importance score is, the higher the importance degree of the sentence is. By predicting each window, an importance score can be derived for each sentence in each window.
Step S14: an importance score for each sentence in the text is determined using the importance score for each sentence in each window.
In one implementation scenario, to determine the importance score of each sentence in the text, the windows containing each sentence may be determined. If a sentence of the text exists in only one window, the importance score of the sentence in that window is taken as its importance score in the text; if a sentence of the text exists in at least two windows, the average of its importance scores in those windows is taken as its importance score in the text. In a specific implementation scenario, this average may be obtained by average pooling, which is not limited here.
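A minimal sketch of this score aggregation, assuming each sentence is identified by its index in the sentence list; the names are illustrative only:

```python
from collections import defaultdict

def aggregate_scores(window_scores):
    """window_scores: list of dicts mapping sentence index -> importance score
    within one window. A sentence appearing in several windows receives the
    average of its per-window scores (average pooling)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for scores in window_scores:
        for idx, score in scores.items():
            sums[idx] += score
            counts[idx] += 1
    return {idx: sums[idx] / counts[idx] for idx in sums}

# Example: sentence 2 appears in both windows, so its final score is the mean.
text_scores = aggregate_scores([{0: 0.9, 1: 0.4, 2: 0.7}, {2: 0.5, 3: 0.8}])
# -> {0: 0.9, 1: 0.4, 2: 0.6, 3: 0.8}
```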
Step S15: select at least one sentence ranked highest by importance score as the abstract of the text.
In this embodiment, after the importance scores of the sentences in the text are obtained, at least one of the highest-scoring sentences may be selected as the abstract of the text. For example, the top one, two or three sentences are selected as the abstract, which is not limited here.
In an implementation scenario, in order to limit the length of the abstract while degrading the quality of the length-limited abstract as little as possible, when the selected sentences exceed the maximum abstract length, the selected sentences may be freely combined to obtain a plurality of sentence combinations, the length of each sentence combination being less than or equal to the maximum abstract length. Quality prediction is then performed on each sentence combination to obtain its predicted quality score, and the sentence combination with the highest predicted quality score is selected as the abstract of the text. The detailed process of limiting the abstract length is described later and is not repeated here.
According to the above scheme, the text is divided into sentences to obtain a sentence list, and the sentence list is sequentially divided into a plurality of windows according to a preset length, where the preset length is the maximum sequence length supported by the abstract extraction model, each window contains a plurality of consecutive sentences, the length of each window is less than or equal to the preset length, and two adjacent windows contain at least one identical sentence. Each window is predicted with the abstract extraction model to obtain the importance score of each sentence in each window, the importance score of each sentence in the text is determined from its importance scores in the windows, and at least one sentence ranked highest by importance score is selected as the abstract of the text. Therefore, for both long and short texts, the sentences ranked highest by importance score across the whole text can be selected as the abstract, so the probability of information loss or information redundancy can be reduced and the quality of abstract extraction can be improved.
Referring to fig. 4, fig. 4 is a schematic flowchart illustrating an embodiment of step S13 in fig. 1. Specifically, the following steps may be included:
Step S131: preprocess the window.
In this embodiment, preprocessing the window may specifically include adding an interval flag and a summary flag to each sentence in the window, where the interval flag is used to separate different sentences and the summary flag is used to summarize the semantic information of a sentence. Referring to fig. 3, fig. 3 is a block diagram illustrating an embodiment of predicting a window with the abstract extraction model. As shown in fig. 3, an interval flag [SEP] and a summary flag [CLS] may be added to the sentence "sent one" and the sentence "sent again". Specifically, a summary flag [CLS] may be added at the start of each sentence, so that the semantic information of each sentence can be summarized.
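A rough illustration of this preprocessing, assuming a tokenizer object that exposes a tokenize() method (e.g. a WordPiece-style tokenizer); the function name is an assumption:

```python
def preprocess_window(window_sentences, tokenizer):
    """Add a summary flag [CLS] before and an interval flag [SEP] after every
    sentence in the window, then flatten into one token sequence."""
    tokens = []
    cls_positions = []                 # where each sentence's [CLS] token sits
    for sentence in window_sentences:
        cls_positions.append(len(tokens))
        tokens.append("[CLS]")
        tokens.extend(tokenizer.tokenize(sentence))
        tokens.append("[SEP]")
    return tokens, cls_positions
```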
Step S132: and inputting the preprocessed window into a abstract extraction model to obtain chapter semantic information of the window and sentence semantic information of each sentence in the window.
In this embodiment, the abstract extraction model may be an abstract extraction model based on a bidirectional conversion-based encoder. In an implementation scenario, before the preprocessed window is processed with the abstract extraction model, a pre-prepared text abstract training set may be used to train the original abstract extraction model so as to obtain the abstract extraction model; the training process of the abstract extraction model is not repeated here.
In this embodiment, the abstract extraction model may include an embedding layer and a conversion layer. Specifically, referring to fig. 3, by inputting the preprocessed window into the embedding layer, the word semantic features, sentence semantic features and sentence position features of all the content in the window can be obtained. These features are then fed into the conversion layer to obtain the chapter semantic information of the window and the sentence semantic information of each sentence in the window, where the chapter semantic information of the window is obtained from the semantic relationships between the sentences in the window, and the semantic relationships between sentences can be obtained from the sentence semantic features and the sentence position features.
With continuing reference to fig. 3, after preprocessing the sentence "sent one" and the sentence "sent again", the tokens contained in the preprocessed window are: [CLS], sent, one, [SEP], [CLS], sent, again, [SEP]. Inputting these tokens into the embedding layer yields the word semantic features related to word embedding: E_[CLS], E_sent, E_one, E_[SEP], E_[CLS], E_sent, E_again, E_[SEP]; the sentence semantic features related to segment embedding: E_A, E_A, E_A, E_A, E_B, E_B, E_B, E_B; and the sentence position features related to position embedding: E_1, E_2, E_3, E_4, E_5, E_6, E_7, E_8. The word semantic features, sentence semantic features and sentence position features are then input into the conversion layer, from which the semantic relationships among the sentences, and hence the chapter semantic information of the window, can be obtained. When the window contains other sentences, the same process applies by analogy, and the examples are not repeated here.
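Purely as an illustration of the embedding layer described above (not code from the patent), the word, segment and position embeddings can be summed as follows, assuming PyTorch; the class and parameter names are assumptions:

```python
import torch
import torch.nn as nn

class WindowEmbedding(nn.Module):
    """Sum of word (token), segment (sentence A/B) and position embeddings."""
    def __init__(self, vocab_size, hidden_size=768, max_position=512):
        super().__init__()
        self.word = nn.Embedding(vocab_size, hidden_size)
        self.segment = nn.Embedding(2, hidden_size)          # E_A / E_B
        self.position = nn.Embedding(max_position, hidden_size)

    def forward(self, token_ids, segment_ids):
        # token_ids, segment_ids: (sequence_length,)
        positions = torch.arange(token_ids.size(0))
        return self.word(token_ids) + self.segment(segment_ids) + self.position(positions)
```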
Step S133: and utilizing the chapter semantic information to correct the sentence semantic information of each sentence to obtain chapter-level sentence semantic information of each sentence.
In this embodiment, after obtaining the chapter semantic information of the window, the sentence semantic information of each sentence may be modified, so as to obtain chapter-level sentence semantic information of each sentence.
Step S134: and respectively carrying out probabilistic processing on the semantic information of the sentence at the chapter level to obtain the importance score of each sentence in the window.
In this embodiment, a sigmoid function may be used to perform probabilistic processing on the chapter-level sentence semantic information, so as to obtain the importance score of each sentence in the window. Specifically, the importance score obtained after probabilistic processing may be a probability value indicating the importance of the sentence; the higher the probability value, the higher the importance. The sigmoid function, σ(x) = 1 / (1 + e^(-x)), is commonly used in machine learning: it maps any real input smoothly into the range 0 to 1, approaching 0 as the input tends to negative infinity and 1 as it tends to positive infinity.
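As a rough sketch only, assuming PyTorch and that the chapter-level sentence semantic information is available as one vector per sentence, the probabilistic processing can be implemented as a linear layer followed by a sigmoid; this illustrates the idea rather than the patented model itself:

```python
import torch
import torch.nn as nn

class SentenceScorer(nn.Module):
    """Maps each chapter-level sentence vector to an importance score in (0, 1)."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, sentence_vectors: torch.Tensor) -> torch.Tensor:
        # sentence_vectors: (num_sentences, hidden_size)
        logits = self.linear(sentence_vectors).squeeze(-1)
        return torch.sigmoid(logits)    # sigmoid maps any real value into (0, 1)
```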
Different from the foregoing embodiment, the window is preprocessed, the preprocessed window is input into the abstract extraction model to obtain chapter semantic information of the window and sentence semantic information of each sentence in the window, and the sentence semantic information of each sentence is corrected with the chapter semantic information to obtain chapter-level sentence semantic information of each sentence, on which probabilistic processing is then performed to obtain the importance score of each sentence in the window. Compared with the uncorrected sentence semantic information, the corrected chapter-level sentence semantic information incorporates the chapter semantic information, so the accuracy of abstract extraction can be improved.
Referring to fig. 5, fig. 5 is a flowchart illustrating another embodiment of the abstract extraction method of the present application. In this embodiment, in order to limit the length of the extracted summary, so that the extracted summary can be suitable for the needs of a specific service scenario, the method specifically includes the following steps:
Step S51: predict the text with the abstract extraction model to obtain the importance score of each sentence in the text.
In an implementation scenario, the abstract extraction model may be the abstract extraction model based on the bidirectional conversion-based encoder in the foregoing embodiments; for its specific structure, reference may be made to the foregoing embodiments, which are not repeated here.
The specific steps of predicting the text by using the abstract extraction model to obtain the importance score of each sentence in the text may refer to the steps in the foregoing embodiments, and are not described herein again.
Step S52: the characteristics of a plurality of sentences ranked from high to low in importance score are acquired.
In this embodiment, for each of these sentences at least one of several features may be obtained, such as its importance score, its sentence length and its rank in the score ordering. The number of sentences obtained may be 1, 2, 3 or more, which is not limited here; for example, the top 2, the top 5 or the top 8 sentences ranked by importance score from high to low may be selected, which may be set according to the actual application and is not illustrated one by one here.
Step S53: and combining at least part of the sentences according to the maximum abstract length to obtain a plurality of sentence combinations, wherein the length of each sentence combination is less than or equal to the maximum abstract length.
In this embodiment, the maximum abstract length may be preset by the user, for example 200, 250 or 300, which is not limited here. At least some of the sentences are combined so that the length of each resulting sentence combination is less than or equal to the maximum abstract length. Still taking "sentence 1" to "sentence 8" of the foregoing embodiment as an example, in a specific implementation scenario the sentences are ordered by importance score as follows: "sentence 2", "sentence 1", "sentence 3", "sentence 5", "sentence 4", "sentence 8", "sentence 6", "sentence 7". A plurality of top-ranked sentences may be selected, for example "sentence 2", "sentence 1", "sentence 3" and "sentence 5", and the selected sentences are combined to obtain a plurality of sentence combinations, the length of each combination being less than or equal to the maximum abstract length; for example, the obtainable sentence combinations include, but are not limited to: ["sentence 2", "sentence 1"], ["sentence 2", "sentence 1", "sentence 3"], ["sentence 3", "sentence 5"], ["sentence 1", "sentence 5"].
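A minimal sketch of enumerating sentence combinations under the maximum abstract length, again measuring length in characters for simplicity; the function name is an assumption:

```python
from itertools import combinations

def enumerate_combinations(top_sentences, max_summary_length):
    """top_sentences: the sentences ranked highest by importance score.
    Returns every non-empty combination whose total length stays within
    max_summary_length."""
    valid = []
    for size in range(1, len(top_sentences) + 1):
        for combo in combinations(top_sentences, size):
            if sum(len(s) for s in combo) <= max_summary_length:
                valid.append(list(combo))
    return valid
```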
Step S54: and predicting each sentence combination by utilizing a trained scoring regressor to obtain a prediction quality score.
In one implementation scenario, the prediction quality score may be a recall-oriented abstract evaluation score. Specifically, the quality score may be a ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score, which evaluates an abstract based on the co-occurrence of n-grams: the quality of a system-generated abstract is assessed by comparing it with a manually written reference abstract and counting the number of overlapping basic units (n-grams, word sequences and word pairs) between the two. The ROUGE criterion comprises a series of evaluation methods, including ROUGE-N (where N is the n of the n-gram and takes the value 1, 2, 3 or 4), ROUGE-L, ROUGE-S, ROUGE-W, ROUGE-SU and the like. Taking ROUGE-N as an example, the ROUGE score can be calculated by the following formula:
$$\mathrm{ROUGE\text{-}N} = \frac{\sum_{S \in \{\mathrm{Reference\ Summaries}\}} \sum_{\mathrm{gram}_n \in S} \mathrm{Count}_{\mathrm{match}}(\mathrm{gram}_n)}{\sum_{S \in \{\mathrm{Reference\ Summaries}\}} \sum_{\mathrm{gram}_n \in S} \mathrm{Count}(\mathrm{gram}_n)}$$
In the above formula, the denominator is the number of n-grams in the reference (standard) abstract, and the numerator is the number of n-grams that co-occur in (are shared by) the reference abstract and the machine-generated abstract. For example, taking ROUGE-1, if the reference abstract and the machine-generated abstract share 6 overlapping word units and the reference abstract contains 7 word units, the ROUGE-1 score is 6/7. Other implementation scenarios can be treated by analogy and are not illustrated one by one here.
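The ROUGE-N recall described above can be sketched as follows, treating each word (or character) as a basic unit; this is the generic formula rather than code from the patent:

```python
from collections import Counter

def rouge_n_recall(reference_units, candidate_units, n=1):
    """ROUGE-N recall: overlapping n-grams between candidate and reference,
    divided by the number of n-grams in the reference."""
    def ngrams(units):
        return Counter(tuple(units[i:i + n]) for i in range(len(units) - n + 1))
    ref, cand = ngrams(reference_units), ngrams(candidate_units)
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

# If 6 of the reference's 7 unigram units also appear in the candidate,
# the returned score is 6/7, as in the example above.
```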
In one implementation scenario, the scoring regressor may include an Xgboost (eXtreme Gradient Boosting) regressor.
In an implementation scenario, the scoring regressor may be trained as follows: the abstract extraction model is used to predict texts that have reference abstracts, so as to obtain a predicted abstract of each text; the reference abstract is used to calculate the prediction quality score of the predicted abstract of the text; and the features and the prediction quality score of each sentence in the predicted abstract are used to train the scoring regressor, performing loss calculation and parameter adjustment and repeating these training steps until a preset condition is met (for example, the loss value is less than a preset threshold and no longer decreases), thereby obtaining the trained scoring regressor. In a specific implementation scenario, calculating the prediction quality score of the predicted abstract with the reference abstract during training may refer to the ROUGE score calculation described above, and the details are not repeated here.
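A sketch of training such a scoring regressor with the xgboost package; the feature layout and the numeric values below are purely hypothetical placeholders used only to show the interface:

```python
import numpy as np
from xgboost import XGBRegressor

# Hypothetical feature layout per sentence combination:
# [sum of importance scores, total length, best (lowest) rank, number of sentences]
X_train = np.array([
    [1.7, 180, 1, 2],
    [2.3, 260, 1, 3],
    [1.2, 150, 3, 2],
])
y_train = np.array([0.61, 0.74, 0.48])   # ROUGE-style quality scores vs. reference

regressor = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
regressor.fit(X_train, y_train)

# Predict the quality score of a new sentence combination.
predicted_quality = regressor.predict(np.array([[2.0, 240, 1, 3]]))
```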
Step S55: and selecting one sentence combination with the highest prediction quality score as the abstract of the text.
In this embodiment, the sentence combination with the highest prediction quality score among the obtained sentence combinations is used as the abstract of the text. For example, if among the sentence combinations listed above the combination with the highest prediction quality score is ["sentence 2", "sentence 1", "sentence 3"], that combination can be used as the abstract of the text. Other situations can be treated by analogy and are not limited here.
According to the above scheme, the text is predicted with the abstract extraction model to obtain the importance score of each sentence in the text, the features of a plurality of sentences ranked highest by importance score are obtained, and at least some of these sentences are combined according to the maximum abstract length to obtain a plurality of sentence combinations, the length of each sentence combination being less than or equal to the maximum abstract length. Each sentence combination is then predicted with a trained scoring regressor to obtain a prediction quality score, and the sentence combination with the highest prediction quality score is selected as the abstract of the text, so that the quality of the abstract can be improved even when the abstract length of the text is limited.
Referring to fig. 6, fig. 6 is a flowchart illustrating an embodiment of the abstract extraction model training method of the present application, which may specifically include the following steps:
Step S61: train a bidirectional conversion-based encoder model suitable for abstract extraction.
In one implementation scenario, a bidirectional conversion-based encoder model suitable for abstract extraction may be trained as follows. The batch size may be set to be greater than a first preset value (the first preset value may be 256); specifically, the batch size may be set to 3072. In order to enhance the model's semantic understanding of chapters, the loss function may include a loss function for predicting whether two sentences belong to the same chapter. In order to enable the model to adapt to long texts, the sequence length of each training pass may be set to be greater than a second preset value (the second preset value may be 128); specifically, the sequence length may be set to 512. In order to make better use of the corpus information and improve the generalization ability of the model, masking words may be allocated dynamically in each training pass (for example, a different 15% of the words are selected as masking words in each pass), and the processing of masking words may include discarding the masking words. In order to incorporate more prior knowledge, phrases and/or named entities may also be used as masking words; specifically, named entities are special objects recognized in texts whose semantic categories are usually predefined before recognition, such as persons, addresses and organizations, which is not limited here. In addition, in a specific implementation scenario, the loss function may also include a cross-entropy loss function over the masking words.
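The training settings listed above can be gathered into a configuration sketch; the field names are assumptions introduced only for illustration, while the values follow the description:

```python
# Assumed configuration field names; the values follow the settings described above.
pretraining_config = {
    "batch_size": 3072,                 # greater than the first preset value (256)
    "max_sequence_length": 512,         # greater than the second preset value (128)
    "dynamic_masking": True,            # a different 15% of words masked each pass
    "mask_ratio": 0.15,
    "mask_handling": ["mask", "drop_masked_word"],        # includes discarding
    "mask_units": ["word", "phrase", "named_entity"],     # inject prior knowledge
    "loss_terms": [
        "masked_word_cross_entropy",
        "same_chapter_sentence_pair_prediction",
    ],
}
```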
In one implementation scenario, in order to enable the model to better learn domain-related semantic representations, a text abstract training set may also be used to perform domain fine-tuning on the bidirectional conversion-based encoder model suitable for abstract extraction. In a specific implementation scenario, to prevent overfitting, a lower learning rate may be employed and the training time controlled during the domain fine-tuning process.
Step S62: construct an abstract extraction model using the bidirectional conversion-based encoder model suitable for abstract extraction.
In an implementation scenario, an output layer may be constructed and spliced onto the bidirectional conversion-based encoder model suitable for abstract extraction, so as to obtain the abstract extraction model.
Step S63: train the abstract extraction model with a text abstract training set.
In this embodiment, the text abstract training set may be collected in advance from print media such as magazines and newspapers, from network media such as blogs and news websites, or from work reports published by departments and organizations, which is not limited here.
According to the above scheme, a bidirectional conversion-based encoder model suitable for abstract extraction is trained, the abstract extraction model is constructed from this encoder model, and the abstract extraction model is then trained with a text abstract training set, so that a model for abstract extraction can be obtained.
Referring to fig. 7, fig. 7 is a block diagram of an embodiment of a device 70 for extracting abstract according to the present application. The digest extracting apparatus 70 includes a memory 71 and a processor 72 coupled to each other, the memory 71 stores program instructions, and the processor 72 is configured to execute the program instructions to implement the steps in any of the digest extracting method embodiments described above.
In particular, the processor 72 is configured to control itself and the memory 71 to implement the steps of any of the above-described embodiments of the digest extraction method. The processor 72 may also be referred to as a CPU (Central Processing Unit). The processor 72 may be an integrated circuit chip having signal processing capabilities. The Processor 72 may also be a general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. Additionally, processor 72 may be commonly implemented by a plurality of integrated circuit chips.
In some embodiments, the processor 72 is configured to perform sentence segmentation on the text to obtain a sentence list; the processor 72 is further configured to perform window division on the sentence list in sequence according to a preset length to obtain a plurality of windows, where the preset length is a maximum sequence length supported by the abstract extraction model, each window includes a plurality of consecutive sentences, and the length of each window is less than or equal to the preset length, and two adjacent windows include at least one same sentence; the processor 72 is further configured to predict each window by using the abstract extraction model, so as to obtain an importance score of each sentence in each window; the processor 72 is further configured to determine an importance score for each sentence in the text using the importance score for each sentence in each window; the processor 72 is further arranged to select at least one sentence ranked top by importance score from high to low as a summary of the text.
According to the above scheme, the text is divided into sentences to obtain a sentence list, and the sentence list is sequentially divided into a plurality of windows according to a preset length, where the preset length is the maximum sequence length supported by the abstract extraction model, each window contains a plurality of consecutive sentences, the length of each window is less than or equal to the preset length, and two adjacent windows contain at least one identical sentence. Each window is predicted with the abstract extraction model to obtain the importance score of each sentence in each window, the importance score of each sentence in the text is determined from its importance scores in the windows, and at least one sentence ranked highest by importance score is selected as the abstract of the text. Therefore, for both long and short texts, the sentences ranked highest by importance score across the whole text can be selected as the abstract, so the probability of information loss or information redundancy can be reduced and the quality of abstract extraction can be improved.
In some embodiments, the processor 72 is further configured to take the importance score of the sentence in the window as the importance score of the sentence when the sentence exists in only one window; the processor 72 is further configured to take an average of the importance scores of the sentences in all the windows as the importance score of the sentence when the sentence exists in at least two windows.
Different from the foregoing embodiment, it is determined whether a sentence exists in only one window; if so, the importance score of the sentence in that window is taken as the importance score of the sentence, and otherwise the average of the importance scores of the sentence in all the windows in which it appears is taken as its importance score. In this way an importance score can be obtained whether a sentence appears in one window or in several, which is beneficial to improving the accuracy of abstract extraction.
In some embodiments, the window length is less than or equal to the preset length, and if a next sentence is added to the window, the window length is greater than the preset length, the next sentence being an adjacent sentence after the end sentence of the window.
Different from the foregoing embodiment, the window length is set to be less than or equal to the preset length, and if the next sentence is added to the window, the window length is greater than the preset length, where the next sentence is an adjacent sentence after the last sentence of the window, the number of windows can be reduced as much as possible, so that the frequency of the abstract extraction model being called is reduced, and the processing load is reduced.
In some embodiments, processor 72 is also used to pre-process windows; the processor 72 is further configured to input the preprocessed window into the abstract extraction model, so as to obtain chapter semantic information of the window and sentence semantic information of each sentence in the window; the processor 72 is further configured to modify the sentence semantic information of each sentence by using the chapter semantic information to obtain chapter-level sentence semantic information of each sentence; the processor 72 is further configured to perform probabilistic processing on the semantic information of the sentence at chapter level to obtain an importance score of each sentence in the window.
Different from the foregoing embodiment, the window is preprocessed, the preprocessed window is input into the abstract extraction model to obtain chapter semantic information of the window and sentence semantic information of each sentence in the window, and the sentence semantic information of each sentence is corrected with the chapter semantic information to obtain chapter-level sentence semantic information of each sentence, which is then subjected to probabilistic processing to obtain the importance score of each sentence in the window. Compared with the uncorrected sentence semantic information, the corrected chapter-level sentence semantic information incorporates the chapter semantic information, so the accuracy of abstract extraction can be improved.
In some embodiments, the summarization extraction model is a two-way conversion based summarization extraction model of an encoder, the summarization extraction model includes an embedding layer and a conversion layer, the processor 72 is further configured to input the preprocessed window into the embedding layer to obtain all of the word semantic features, the sentence semantic features, and the sentence position features in the window, and the processor 72 is further configured to input the word semantic features, the sentence semantic features, and the sentence position features into the conversion layer to obtain chapter semantic information of the window and sentence semantic information of each sentence in the window.
Different from the embodiment, the word semantic features, the sentence semantic features and the sentence position features in the window are obtained by inputting the preprocessed window into the embedding layer, so that the word semantic features, the sentence semantic features and the sentence position features are sent into the conversion layer to obtain the chapter semantic information of the window and the sentence semantic information of each sentence in the window, the sentence semantic information is corrected by subsequently adopting the chapter semantic information of the window, and the accuracy of abstract extraction is improved.
In some embodiments, processor 72 is also configured to train a bi-directional transform-based encoder model suitable for summarization; the processor 72 is further configured to construct a summarization model using a bi-directional transform-based encoder model adapted for summarization; the processor 72 is also configured to train the summarization extraction model using a text summarization training set.
Different from the foregoing embodiment, the bidirectional conversion-based encoder model suitable for abstract extraction is trained, so that the abstract extraction model is constructed by using the bidirectional conversion-based encoder model suitable for abstract extraction, and then the abstract extraction model is trained by using text abstract, so that the extraction model for abstract extraction can be trained.
In some embodiments, when training the bidirectional conversion-based encoder model suitable for abstract extraction: the batch size is greater than a first preset value; the loss function includes a loss function for predicting whether two sentences belong to the same chapter; the sequence length of each training pass is greater than a second preset value; masking words are allocated dynamically for each training pass; the processing of masking words includes discarding the masking words; and phrases and/or named entities are employed as masking words.
Different from the previous embodiment, the batch processing size is set to be larger than the first preset value, so that the model can be trained more fully; setting the loss function to include a loss function for predicting whether two sentences are in the same chapter, so that semantic understanding of the model to the chapter can be enhanced; the length of the sequence trained each time is set to be larger than a second preset value, so that the model can adapt to long texts; a processing mode of dynamically distributing the shielding words in each training is set, so that the material information can be better utilized, and the generalization capability of the model is improved; by employing phrases and/or named entities as masking words, more a priori knowledge can be incorporated.
In some embodiments, processor 72 is also configured to perform domain refinement on the abstracted bi-directional transform-based encoder model using a text abstraction training set.
Different from the foregoing embodiment, the field fine tuning is performed on the abstracted bidirectional conversion-based encoder model by using the text abstract training set, so that the model can better learn the semantic representation related to the field.
In some embodiments, the processor 72 is further configured to obtain features of a plurality of sentences ranked top by importance score from high to low; the processor 72 is further configured to combine at least some of the sentences according to the maximum abstract length to obtain a plurality of sentence combinations, where the length of each sentence combination is smaller than or equal to the maximum abstract length; the processor 72 is further configured to predict each sentence combination by using the trained scoring regressor to obtain a prediction quality score; the processor 72 is also arranged to select as the summary of the text one of the sentence combinations with the highest predictive quality score.
Different from the foregoing embodiment, the text is predicted with the abstract extraction model to obtain the importance score of each sentence in the text, the features of a plurality of sentences ranked highest by importance score are obtained, and at least some of these sentences are combined according to the maximum abstract length to obtain a plurality of sentence combinations, the length of each sentence combination being less than or equal to the maximum abstract length. Each sentence combination is then predicted with a trained scoring regressor to obtain a prediction quality score, and the sentence combination with the highest prediction quality score is selected as the abstract of the text, so that the quality of the abstract can be improved even when the abstract length of the text is limited.
In some embodiments, processor 72 is further configured to predict the text with the reference abstract using an abstract extraction model to obtain a predicted abstract of the text; the processor 72 is also for calculating a predicted quality score for the predicted digest of the text using the reference digest; the processor 72 is also configured to train a scoring regressor using the features and predicted quality scores of each sentence in the prediction summary.
Different from the foregoing embodiment, texts with reference abstracts are predicted with the abstract extraction model to obtain predicted abstracts, and the reference abstract is used to calculate the prediction quality score of the predicted abstract of each text, so that the scoring regressor is trained with the features and the prediction quality score of each sentence in the predicted abstract. The trained regressor can then accurately predict the quality score of each sentence combination, which helps improve the quality of abstract extraction.
Referring to fig. 8, fig. 8 is a schematic diagram of a training apparatus 80 for abstract extraction model according to an embodiment of the present application. The abstract extraction model training device 80 comprises a memory 81 and a processor 82 which are coupled to each other, the memory 81 stores program instructions, and the processor 82 is used for executing the program instructions to implement the steps in any of the above-mentioned embodiments of the abstract extraction model training method.
In particular, the processor 82 is configured to control itself and the memory 81 to implement the steps in any of the above-described embodiments of the digest extraction method. Processor 82 may also be referred to as a CPU (Central Processing Unit). The processor 82 may be an integrated circuit chip having signal processing capabilities. The Processor 82 may also be a general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. In addition, the processor 82 may be commonly implemented by a plurality of integrated circuit chips.
In this embodiment, the processor 82 is configured to train a bi-directional transform-based encoder model suitable for summarization; the processor 82 is further configured to construct a summarization model using the bi-directional transform-based encoder model adapted for summarization; the processor 82 is also configured to train the summarization extraction model using a text summarization training set.
According to the above scheme, a bidirectional conversion-based encoder model suitable for abstract extraction is trained, the abstract extraction model is constructed from this encoder model, and the abstract extraction model is then trained with a text abstract training set, so that a model for abstract extraction can be obtained.
In some embodiments, the training of the bidirectional conversion-based encoder model suitable for abstract extraction satisfies at least one of the following: the batch size is greater than a first preset value; the loss function includes a loss function for predicting whether two sentences belong to the same chapter; the sequence length of each training pass is greater than a second preset value; masking words are allocated dynamically for each training pass; the processing of masking words includes discarding the masking words; and phrases and/or named entities are employed as masking words.
Different from the previous embodiment, the batch size is set to be larger than the first preset value, so that the model can be trained more fully; setting the loss function to include a loss function for predicting whether two sentences are in the same chapter, so that semantic understanding of the model to the chapter can be enhanced; the length of the sequence trained each time is set to be larger than a second preset value, so that the model can adapt to long texts; a processing mode of dynamically distributing the shielding words in each training is set, so that the material information can be better utilized, and the generalization capability of the model is improved; by employing phrases and/or named entities as masking words, more a priori knowledge can be incorporated.
In some embodiments, the processor 82 is also configured to perform domain refinement on a bi-directional transform-based encoder model suitable for summarization using a text summarization training set.
Unlike the foregoing embodiments, by performing domain fine-tuning on a bi-directional conversion-based encoder model suitable for abstract extraction using a text abstract training set, the model can better learn domain-related semantic representations.
Referring to fig. 9, fig. 9 is a block diagram illustrating an embodiment of a computer-readable storage medium 90 according to the present application. The computer readable storage medium 90 stores program instructions 91, and the program instructions 91 when executed by the processor implement the steps in any of the above-described summarization extraction method embodiments, or implement the steps in any of the above-described summarization extraction model training method embodiments.
According to the scheme, the quality of abstract extraction can be improved.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is merely a logical division, and an actual implementation may use another division; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections between devices or units through interfaces, and may be electrical, mechanical, or in another form.
Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on such an understanding, the part of the technical solution of the present application that in essence contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Claims (18)

1. A method for abstract extraction, characterized by comprising:
sentence dividing is carried out on the text to obtain a sentence list;
performing window division on the sentence list sequentially according to a preset length to obtain a plurality of windows, wherein the preset length is the maximum sequence length supported by an abstract extraction model, each window comprises a plurality of consecutive sentences, the length of each window is less than or equal to the preset length, and two adjacent windows comprise at least one identical sentence;
predicting each window by utilizing the abstract extraction model to obtain the importance score of each sentence in each window;
determining an importance score for each sentence in the text using the importance score for each sentence in each of the windows;
selecting at least one of the sentences with the highest importance scores as an abstract of the text.
2. The method of claim 1,
said determining an importance score for each sentence in said text using said importance score for each sentence in each said window comprises:
if the sentence only exists in one window, taking the importance score of the sentence in the window as the importance score of the sentence;
and if the sentence exists in at least two windows, taking the average value of the importance scores of the sentence in all the windows as the importance score of the sentence.
3. The method of claim 1,
the length of the window is less than or equal to the preset length, and the length of the window would be greater than the preset length if a next sentence were added to the window, wherein the next sentence is the sentence immediately following the last sentence of the window.
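As a concrete, non-limiting illustration of the window division in claims 1 to 3 and the score merging in claim 2, the following Python sketch splits a sentence list into overlapping windows that respect a maximum sequence length and then averages each sentence's scores over all windows containing it; the one-sentence overlap and the greedy fill rule are illustrative choices, not requirements of the claims.

```python
def split_into_windows(sentences, lengths, max_len, overlap=1):
    """Slide a window over the sentence list.

    Each window holds consecutive whole sentences, its total length stays
    within max_len (the longest sequence the model supports), and adjacent
    windows share `overlap` sentences.
    """
    windows, start = [], 0
    while start < len(sentences):
        end, total = start, 0
        while end < len(sentences) and total + lengths[end] <= max_len:
            total += lengths[end]
            end += 1                      # adding the next sentence would overflow
        if end == start:                  # single over-long sentence: take it anyway
            end = start + 1               # (it would need truncation downstream)
        windows.append(list(range(start, end)))
        if end >= len(sentences):
            break
        start = max(end - overlap, start + 1)   # adjacent windows share sentences
    return windows

def merge_scores(num_sentences, windows, window_scores):
    """Average a sentence's scores over every window it appears in."""
    sums, counts = [0.0] * num_sentences, [0] * num_sentences
    for idxs, scores in zip(windows, window_scores):
        for i, s in zip(idxs, scores):
            sums[i] += s
            counts[i] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

With lengths measured in tokens and max_len set to the model's supported sequence length, merge_scores yields one importance score per sentence of the original text, regardless of how many windows the sentence appeared in.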
4. The method of claim 1,
the predicting each window by using the abstract extraction model comprises:
preprocessing the window;
inputting the preprocessed window into the abstract extraction model to obtain chapter semantic information of the window and sentence semantic information of each sentence in the window;
utilizing the chapter semantic information to correct the sentence semantic information of each sentence to obtain chapter-level sentence semantic information of each sentence;
and respectively carrying out probabilistic processing on the chapter-level sentence semantic information to obtain the importance score of each sentence in the window.
5. The method of claim 4,
the abstract extraction model is an abstract extraction model of a bidirectional-transformer-based encoder, the abstract extraction model comprises an embedding layer and a transformer layer, and the inputting the preprocessed window into the abstract extraction model to obtain the chapter semantic information of the window and the sentence semantic information of each sentence in the window comprises:
inputting the preprocessed window into the embedding layer to obtain word semantic features, sentence semantic features and sentence position features of all words in the window;
and sending the word semantic features, the sentence semantic features and the sentence position features into the transformer layer to obtain the chapter semantic information of the window and the sentence semantic information of each sentence in the window.
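A minimal PyTorch sketch of the window scoring described in claims 4 and 5 is given below: word, sentence (segment), and position embeddings are summed, passed through a transformer encoder, each sentence vector is corrected with the window-level (chapter) vector, and the result is squashed into a per-sentence importance score. The layer sizes, the use of the first position as the chapter vector, and the concatenation-based fusion are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class WindowScorer(nn.Module):
    """Sketch of a window-level sentence scorer (illustrative dimensions)."""

    def __init__(self, vocab_size, dim=256, max_pos=512, n_layers=2, n_heads=4):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)   # word semantic feature
        self.seg_emb = nn.Embedding(2, dim)             # sentence (segment) feature
        self.pos_emb = nn.Embedding(max_pos, dim)       # position feature
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.fuse = nn.Linear(2 * dim, dim)             # chapter-level correction
        self.score = nn.Linear(dim, 1)

    def forward(self, token_ids, seg_ids, sent_starts):
        # token_ids / seg_ids: [batch, seq]; sent_starts: [batch, n_sent] indices
        # of the token marking the start of each sentence in the window.
        pos = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.word_emb(token_ids) + self.seg_emb(seg_ids) + self.pos_emb(pos)
        h = self.encoder(x)                              # [batch, seq, dim]
        chapter = h[:, 0]                                # window-level semantics
        sent = h.gather(1, sent_starts.unsqueeze(-1).expand(-1, -1, h.size(-1)))
        fused = torch.tanh(self.fuse(
            torch.cat([sent, chapter.unsqueeze(1).expand_as(sent)], dim=-1)))
        return torch.sigmoid(self.score(fused)).squeeze(-1)  # score per sentence
```

The final sigmoid corresponds to the "probabilistic processing" step: each chapter-level sentence vector is mapped to a value between 0 and 1 that serves as the sentence's importance score within the window.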
6. The method according to any one of claims 1 to 5,
before the predicting each of the windows by using the abstract extraction model, the method further includes:
training a bidirectional-transformer-based encoder model suitable for abstract extraction;
constructing the abstract extraction model by using the bidirectional-transformer-based encoder model suitable for abstract extraction;
and training the abstract extraction model by using a text abstract training set.
7. The method of claim 6,
the training setup of the bidirectional-transformer-based encoder model suitable for abstract extraction comprises at least one of:
the batch size is larger than a first preset value;
the loss function comprises a loss function for predicting whether two sentences are in the same chapter;
the sequence length of each training pass is larger than a second preset value;
masked words are dynamically assigned in each training pass;
the processing of the masked words comprises discarding the masked words;
phrases and/or named entities are used as the masked words.
8. The method of claim 6,
the constructing of the abstract extraction model by using the bidirectional-transformer-based encoder model suitable for abstract extraction further comprises:
performing domain fine-tuning on the bidirectional-transformer-based encoder model suitable for abstract extraction by using the text abstract training set.
9. The method according to any one of claims 1 to 5,
the selecting at least one of the sentences with the highest importance scores as the abstract of the text comprises:
acquiring features of a plurality of sentences that are ranked highest when ordered from high to low by importance score;
combining at least some of the sentences according to a maximum abstract length to obtain a plurality of sentence combinations, wherein the length of each sentence combination is less than or equal to the maximum abstract length;
predicting each sentence combination by using a trained scoring regressor to obtain a prediction quality score;
selecting one of the sentence combinations with the highest prediction quality score as the abstract of the text.
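For illustration, the combination-and-selection step of claim 9 can be sketched as an exhaustive search over the top-ranked sentences, keeping only combinations that fit the maximum abstract length and letting a scoring regressor pick the best one. The exhaustive enumeration and the regressor's predict interface are assumptions of this sketch; they are only practical for a small candidate pool.

```python
from itertools import combinations

def best_combination(top_sentences, lengths, features, max_summary_len, regressor):
    """Return the sentence combination the scoring regressor rates highest.

    top_sentences -- the top-ranked sentences by importance score
    lengths       -- length of each candidate sentence
    features      -- feature vector of each candidate sentence
    regressor     -- any object whose predict() maps a combination's
                     per-sentence features to a predicted quality score
    """
    best, best_score = None, float("-inf")
    for r in range(1, len(top_sentences) + 1):
        for combo in combinations(range(len(top_sentences)), r):
            if sum(lengths[i] for i in combo) > max_summary_len:
                continue                               # over the length budget
            score = regressor.predict([features[i] for i in combo])
            if score > best_score:
                best, best_score = combo, score
    return [top_sentences[i] for i in best] if best else []
```

For larger candidate pools, a beam search or greedy selection over the same length budget would be the natural replacement for the exhaustive loop.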
10. The method of claim 9,
before the predicting each sentence combination by using the trained scoring regressor to obtain the prediction quality score, the method further comprises:
predicting a text having a reference abstract by using the abstract extraction model to obtain a predicted abstract of the text;
calculating a prediction quality score of the predicted abstract of the text by using the reference abstract;
and training the scoring regressor by using features of each sentence in the predicted abstract and the prediction quality score.
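The regressor training of claim 10 can be illustrated by the following sketch, which turns texts that already have reference abstracts into (features, quality score) pairs; extract_summary, sentence_features, and quality (for example, a ROUGE-style overlap measure) are placeholders for components described elsewhere in this application, not fixed interfaces.

```python
def build_regressor_training_set(corpus, extract_summary, sentence_features, quality):
    """Build (features, score) training pairs for the scoring regressor.

    corpus            -- iterable of (text, reference_abstract) pairs
    extract_summary   -- the abstract extraction model's prediction function
    sentence_features -- maps one sentence to its feature vector
    quality           -- compares a predicted abstract against the reference
    """
    feature_sets, scores = [], []
    for text, reference in corpus:
        predicted = extract_summary(text)             # predicted abstract sentences
        feature_sets.append([sentence_features(s) for s in predicted])
        scores.append(quality(predicted, reference))  # prediction quality score
    return feature_sets, scores
```

A simple realization could pool the per-sentence feature vectors, for example by averaging, and fit any off-the-shelf regression model on the pooled vectors and the quality scores.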
11. A method for training an abstract extraction model, characterized by comprising:
training a bidirectional-transformer-based encoder model suitable for abstract extraction;
constructing an abstract extraction model by using the bidirectional-transformer-based encoder model suitable for abstract extraction;
and training the abstract extraction model by using a text abstract training set.
12. The method of claim 11,
the training setup of the bidirectional-transformer-based encoder model suitable for abstract extraction comprises at least one of:
the batch size is larger than a first preset value;
the loss function comprises a loss function for predicting whether two sentences are in the same chapter;
the sequence length of each training pass is larger than a second preset value;
masked words are dynamically assigned in each training pass;
the processing of the masked words comprises discarding the masked words;
phrases and/or named entities are used as the masked words.
13. The method of claim 11,
the constructing of the abstract extraction model by using the bidirectional-transformer-based encoder model suitable for abstract extraction further comprises:
performing domain fine-tuning on the bidirectional-transformer-based encoder model suitable for abstract extraction by using the text abstract training set.
14. A method for abstract extraction, characterized by comprising:
predicting a text by using an abstract extraction model to obtain an importance score of each sentence in the text;
acquiring features of a plurality of sentences that are ranked highest when ordered from high to low by importance score;
combining at least some of the sentences according to a maximum abstract length to obtain a plurality of sentence combinations, wherein the length of each sentence combination is less than or equal to the maximum abstract length;
predicting each sentence combination by using a trained scoring regressor to obtain a prediction quality score;
selecting one of the sentence combinations with the highest prediction quality score as the abstract of the text.
15. The method of claim 14,
before the predicting each sentence combination by using the trained scoring regressor to obtain the prediction quality score, the method further comprises:
predicting a text having a reference abstract by using the abstract extraction model to obtain a predicted abstract of the text;
calculating a prediction quality score of the predicted abstract of the text by using the reference abstract;
and training the scoring regressor by using features of each sentence in the predicted abstract and the prediction quality score.
16. An abstract extraction apparatus comprising a memory and a processor coupled to each other, the memory storing program instructions,
the processor is configured to execute the program instructions to implement the method of any of claims 1-10, 14-15.
17. An abstract extraction model training apparatus, comprising a memory and a processor coupled to each other, the memory storing program instructions,
the processor is configured to execute the program instructions to implement the method of any of claims 11-13.
18. A computer-readable storage medium storing program instructions, which when executed by a processor implement the method of any one of claims 1-15.
CN202010025465.8A 2020-01-10 2020-01-10 Abstract extraction and abstract extraction model training method, related device and storage medium Active CN111241267B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010025465.8A CN111241267B (en) 2020-01-10 2020-01-10 Abstract extraction and abstract extraction model training method, related device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010025465.8A CN111241267B (en) 2020-01-10 2020-01-10 Abstract extraction and abstract extraction model training method, related device and storage medium

Publications (2)

Publication Number Publication Date
CN111241267A CN111241267A (en) 2020-06-05
CN111241267B true CN111241267B (en) 2022-12-06

Family

ID=70873075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010025465.8A Active CN111241267B (en) 2020-01-10 2020-01-10 Abstract extraction and abstract extraction model training method, related device and storage medium

Country Status (1)

Country Link
CN (1) CN111241267B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753043B (en) * 2020-06-22 2024-04-16 北京百度网讯科技有限公司 Document data processing method, device and storage medium
CN112417854A (en) * 2020-12-15 2021-02-26 北京信息科技大学 Chinese document abstraction type abstract method
CN113326866B (en) * 2021-04-16 2022-05-31 山西大学 Automatic abstract generation method and system fusing semantic scenes
CN113515592B (en) * 2021-04-23 2024-01-09 平安科技(深圳)有限公司 Text prediction method, device, electronic equipment and storage medium
CN113282742B (en) * 2021-04-30 2022-08-12 合肥讯飞数码科技有限公司 Abstract acquisition method, electronic equipment and storage device
CN113139050B (en) * 2021-05-10 2022-07-19 桂林电子科技大学 Text abstract generation method based on named entity identification additional label and priori knowledge
US11630958B2 (en) 2021-06-02 2023-04-18 Microsoft Technology Licensing, Llc Determining topic labels for communication transcripts based on a trained generative summarization model
CN113407710A (en) * 2021-06-07 2021-09-17 维沃移动通信有限公司 Information display method and device, electronic equipment and readable storage medium
CN113626582B (en) * 2021-07-08 2023-07-28 中国人民解放军战略支援部队信息工程大学 Two-stage abstract generation method and system based on content selection and fusion
CN113743121B (en) * 2021-09-08 2023-11-21 平安科技(深圳)有限公司 Long text entity relation extraction method, device, computer equipment and storage medium
CN115080729B (en) * 2022-07-20 2022-12-27 北京搜狐新媒体信息技术有限公司 Text abstract extraction method and device
CN116501861B (en) * 2023-06-25 2023-09-22 知呱呱(天津)大数据技术有限公司 Long text abstract generation method based on hierarchical BERT model and label migration

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3597697B2 (en) * 1998-03-20 2004-12-08 富士通株式会社 Document summarizing apparatus and method
US8706730B2 (en) * 2005-12-29 2014-04-22 International Business Machines Corporation System and method for extraction of factoids from textual repositories

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6537325B1 (en) * 1998-03-13 2003-03-25 Fujitsu Limited Apparatus and method for generating a summarized text from an original text
WO2005041068A1 (en) * 2003-10-24 2005-05-06 Toshiba Solutions Corporation System and method for question-reply type document search
JP2014225158A (en) * 2013-05-16 2014-12-04 日本電信電話株式会社 Document summarizing device, method, and program
CN104915335A (en) * 2015-06-12 2015-09-16 百度在线网络技术(北京)有限公司 Method for generating abstracts for subject document sets and device
CN105868175A (en) * 2015-12-03 2016-08-17 乐视网信息技术(北京)股份有限公司 Abstract generation method and device
CN105512335A (en) * 2015-12-29 2016-04-20 腾讯科技(深圳)有限公司 Abstract searching method and device
CN106919646A (en) * 2017-01-18 2017-07-04 南京云思创智信息科技有限公司 Chinese text summarization generation system and method
CN109783794A (en) * 2017-11-14 2019-05-21 北大方正集团有限公司 File classification method and device
CN109325109A (en) * 2018-08-27 2019-02-12 中国人民解放军国防科技大学 Attention encoder-based extraction type news abstract generating device
CN109344403A (en) * 2018-09-20 2019-02-15 中南大学 A kind of document representation method of enhancing semantic feature insertion
CN109657051A (en) * 2018-11-30 2019-04-19 平安科技(深圳)有限公司 Text snippet generation method, device, computer equipment and storage medium
CN110334192A (en) * 2019-07-15 2019-10-15 河北科技师范学院 Text snippet generation method and system, electronic equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Multi-Document Extractive Summarization Using Window-Based Sentence Representation";Yong Zhang;《2015 IEEE Symposium Series on Computational Intelligence》;20151231;第404-410页 *
"Summarization Method for Multiple Sliding Window Aggregate Queries";Sung-Ha Baek;《2009 Software Technologies for Future Dependable Distributed Systems》;20091231;第205-209页 *
"文本摘要研究进展与趋势";明拓思宇;《网络与信息安全学报》;20180630;第1-10页 *
基于滑动窗口的微博时间线摘要算法;徐伟等;《数据采集与处理》;20170515(第03期);第97-106页 *

Also Published As

Publication number Publication date
CN111241267A (en) 2020-06-05

Similar Documents

Publication Publication Date Title
CN111241267B (en) Abstract extraction and abstract extraction model training method, related device and storage medium
CN110892399B (en) System and method for automatically generating summary of subject matter
AU2019260600B2 (en) Machine learning to identify opinions in documents
WO2018049960A1 (en) Method and apparatus for matching resource for text information
WO2020228732A1 (en) Method for training dialog state tracker, and computer device
US7272594B1 (en) Method and apparatus to link to a related document
US10515125B1 (en) Structured text segment indexing techniques
CN111125484B (en) Topic discovery method, topic discovery system and electronic equipment
CN110909122B (en) Information processing method and related equipment
CN113407679B (en) Text topic mining method and device, electronic equipment and storage medium
CN113221545B (en) Text processing method, device, equipment, medium and program product
CN112765344B (en) Method, device and storage medium for generating meeting abstract based on meeting record
CN109062895B (en) Intelligent semantic processing method
WO2024036840A1 (en) Open-domain dialogue reply method and system based on topic enhancement
CN112507711A (en) Text abstract extraction method and system
CN115827819A (en) Intelligent question and answer processing method and device, electronic equipment and storage medium
CN111861201A (en) Intelligent government affair order dispatching method based on big data classification algorithm
CN112380866A (en) Text topic label generation method, terminal device and storage medium
US11361759B2 (en) Methods and systems for automatic generation and convergence of keywords and/or keyphrases from a media
CN108763202A (en) Method, apparatus, equipment and the readable storage medium storing program for executing of the sensitive text of identification
Lin et al. Enhanced BERT-based ranking models for spoken document retrieval
CN111639189B (en) Text graph construction method based on text content features
CN110717316B (en) Topic segmentation method and device for subtitle dialog flow
CN110874408A (en) Model training method, text recognition device and computing equipment
CN110162615A (en) A kind of intelligent answer method, apparatus, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant