CN112860881A - Abstract generation method and device, electronic equipment and storage medium - Google Patents

Abstract generation method and device, electronic equipment and storage medium

Info

Publication number
CN112860881A
CN112860881A (Application CN201911182819.3A)
Authority
CN
China
Prior art keywords
candidate, abstract, text, sentence, calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911182819.3A
Other languages
Chinese (zh)
Inventor
刘龑龙
佟津乐
谢海华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pku Founder Information Industry Group Co ltd
Peking University Founder Group Co Ltd
Original Assignee
Pku Founder Information Industry Group Co ltd
Peking University Founder Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pku Founder Information Industry Group Co ltd, Peking University Founder Group Co Ltd filed Critical Pku Founder Information Industry Group Co ltd
Priority to CN201911182819.3A priority Critical patent/CN112860881A/en
Publication of CN112860881A publication Critical patent/CN112860881A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an abstract generation method and apparatus, an electronic device, and a storage medium. The abstract generation method includes: extracting at least one abstract sentence from a plurality of text documents, fully permuting the at least one abstract sentence to generate a first candidate abstract set, calculating the fluency of each first candidate abstract in the set with a language model, and outputting the first candidate abstract with the highest fluency as the abstract. Because the candidate abstracts are scored for fluency by the language model and the most fluent candidate is output as the abstract, the generated abstract has good logical order, high fluency, and good readability.

Description

Abstract generation method and device, electronic equipment and storage medium
Technical Field
The invention relates to the field of automatic summarization in natural language processing, and in particular to an abstract generation method and apparatus, an electronic device, and a storage medium.
Background
Multi-document automatic summarization refers to compressing the main information of multiple texts on the same topic into a single summary.
The existing multi-document automatic summarization methods mainly use extractive techniques: a number of topic words are extracted from the documents, the topic words appearing at different positions in the documents are concatenated into a paragraph, and the summary is generated from that paragraph.
However, because the topic words from different positions are dropped directly into a paragraph without any ordering step, the generated summary lacks logical order and has poor readability.
Disclosure of Invention
The invention provides an abstract generation method and apparatus, an electronic device, and a storage medium, to solve the technical problem that existing abstract generation methods produce abstracts lacking logical order because no ordering is performed.
In a first aspect, the present invention provides an abstract generation method, including:
extracting at least one abstract sentence from a plurality of text documents;
fully permuting the at least one abstract sentence to generate a first candidate abstract set, wherein the first candidate abstract set comprises at least one first candidate abstract;
and calculating the fluency of each first candidate abstract in the first candidate abstract set by using a language model, so as to obtain the first candidate abstract with the highest fluency.
Optionally, calculating the fluency of each first candidate abstract in the first candidate abstract set by using a language model to obtain the first candidate abstract with the highest fluency specifically includes:
for each first candidate abstract, extracting some of its abstract sentences to generate a second candidate abstract;
processing the second candidate abstract with the language model to obtain the fluency of the second candidate abstract;
selecting some of the second candidate abstracts according to their fluency;
and calculating, with the language model, the fluency of the first candidate abstracts corresponding to the selected second candidate abstracts, to obtain the first candidate abstract with the highest fluency.
Optionally, extracting some abstract sentences from the first candidate abstract to generate a second candidate abstract specifically includes:
taking the first M abstract sentences of the first candidate abstract to generate the second candidate abstract, wherein M is a positive integer, M ≤ N, and N is the number of abstract sentences in the first candidate abstract.
Optionally, calculating the fluency of each first candidate abstract in the first candidate abstract set by using the language model specifically includes:
processing the first candidate abstract with the language model to obtain N probabilities;
summing the N probabilities to obtain the fluency of the first candidate abstract;
wherein the i-th of the N probabilities represents the probability that the i-th position is abstract sentence x_i, N represents the number of abstract sentences, and 1 ≤ i ≤ N.
Optionally, summing the N probabilities to obtain the fluency of the first candidate abstract includes calculating the fluency of the first candidate abstract according to the following formula:
P(S) = Σ_{i=1}^{N} P(x_i | x_1, x_2, …, x_{i-1}, x_{i+1}, …, x_N)
wherein P(x_i | x_1, x_2, …, x_{i-1}, x_{i+1}, …, x_N) represents the probability that the i-th position is abstract sentence x_i.
Optionally, extracting at least one abstract sentence from a plurality of text documents specifically includes:
extracting at least one topic word from the plurality of text documents;
calculating the relevance between the topic words and the text documents;
and selecting some of the text documents according to the relevance, and extracting at least one abstract sentence from the selected documents.
Optionally, calculating the relevance between the topic words and the text documents specifically includes:
calculating, for each topic word and each text document, the relevance of that topic word to that document;
and summing, for each text document, all the relevance values corresponding to that document to obtain the overall relevance between the topic words and that document.
In a second aspect, the present invention provides an abstract generation apparatus, including:
an extraction module, configured to extract at least one abstract sentence from a plurality of text documents;
a permutation module, configured to fully permute the at least one abstract sentence to generate a first candidate abstract set, wherein the first candidate abstract set comprises at least one first candidate abstract;
and a calculation module, configured to calculate the fluency of each first candidate abstract in the first candidate abstract set by using a language model, so as to obtain the first candidate abstract with the highest fluency.
Optionally, the calculation module is specifically configured to:
for each first candidate abstract, extracting some of its abstract sentences to generate a second candidate abstract;
processing the second candidate abstract with the language model to obtain the fluency of the second candidate abstract;
and selecting some of the second candidate abstracts according to their fluency, and calculating, with the language model, the fluency of the first candidate abstracts corresponding to the selected second candidate abstracts, so as to output the first candidate abstract with the highest fluency as the abstract.
Optionally, the calculation module is specifically configured to:
taking the first M abstract sentences of the first candidate abstract to generate the second candidate abstract, wherein M is a positive integer, M ≤ N, and N is the number of abstract sentences in the first candidate abstract.
Optionally, the calculation module is specifically configured to:
processing the first candidate abstract with the language model to obtain N probabilities;
summing the N probabilities to obtain the fluency of the first candidate abstract;
wherein the i-th of the N probabilities represents the probability that the i-th position is abstract sentence x_i, and 1 ≤ i ≤ N.
Optionally, the calculation module is specifically configured to:
calculating the fluency of the first candidate abstract according to the following formula:
P(S) = Σ_{i=1}^{N} P(x_i | x_1, x_2, …, x_{i-1}, x_{i+1}, …, x_N)
wherein P(x_i | x_1, x_2, …, x_{i-1}, x_{i+1}, …, x_N) represents the probability that the i-th position is abstract sentence x_i.
Optionally, the extraction module is specifically configured to:
extracting at least one topic word from a plurality of text documents;
calculating the relevance between the topic words and the text documents;
and selecting some of the text documents according to the relevance, and extracting at least one abstract sentence from the selected documents.
Optionally, the extraction module is specifically configured to:
calculating, for each topic word and each text document, the relevance of that topic word to that document;
and summing, for each text document, all the relevance values corresponding to that document to obtain the overall relevance between the topic words and that document.
In a third aspect, the present invention provides an electronic device, comprising:
a memory for storing a program;
a processor for executing the program stored in the memory, the processor being configured, when the program is executed, to perform the abstract generation method according to the first aspect and its optional implementations.
In a fourth aspect, the present invention provides a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the abstract generation method according to the first aspect and its optional implementations.
In the abstract generation method provided by the invention, the language model scores each candidate abstract for fluency, and the most fluent candidate is output as the abstract, so the generated abstract has good logical order, high fluency, and good readability. In addition, second candidate abstracts containing fewer abstract sentences are formed by truncating the first candidate abstracts; because many first candidate abstracts share the same truncated prefix, the number of distinct second candidate abstracts is far smaller than the number of first candidate abstracts. The language model therefore scores the second candidate abstracts first, and full fluency is calculated only for the first candidate abstracts corresponding to the most fluent second candidate abstracts, which greatly reduces the computation scale.
Drawings
FIG. 1 is a flowchart of an abstract generation method according to an exemplary embodiment of the present invention;
FIG. 2 is a flowchart of an abstract generation method according to another exemplary embodiment of the present invention;
FIG. 3 is a schematic diagram of the abstract generation method of the embodiment shown in FIG. 2;
FIG. 4 is a schematic structural diagram of an abstract generation apparatus according to an exemplary embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Existing multi-document automatic summarization methods use extractive techniques: topic words are extracted from the texts, and the topic words at different positions in the texts are concatenated into paragraphs. Because no ordering is performed, the generated paragraphs have no logical order and poor readability.
The inventive concept of the abstract generation method provided by the invention is as follows. First, the candidate abstracts are processed with a language model to obtain the fluency of each candidate abstract, and the most fluent candidate is output as the abstract, so the generated abstract has good logical order, high fluency, and good readability. Second, if fluency were calculated over the full permutation of the abstract sentences directly, the computation scale would be N! (the number of permutations of N abstract sentences). The method therefore first calculates the fluency of a partial prefix of each first candidate abstract, then selects only the first candidate abstracts whose prefixes are highly fluent and calculates their full fluency, reducing the computation scale.
Fig. 1 is a flowchart of an abstract generation method according to an exemplary embodiment of the present invention. As shown in fig. 1, the present invention provides an abstract generation method, including:
s101, extracting at least one abstract sentence from a plurality of text chapters.
Here, the plurality of text documents are multiple documents, composed of natural-language sentences, whose contents are related, for example a number of news reports on the same event. Extracting at least one abstract sentence from the plurality of text documents specifically includes: for each text document, extracting abstract sentences from it according to the importance of each sentence.
S102, fully permuting the at least one abstract sentence to generate a first candidate abstract set.
Fully permuting the at least one abstract sentence to generate a first candidate abstract set specifically includes: fully permuting all the abstract sentences extracted in step S101 to generate a first candidate abstract set, where the first candidate abstract set comprises at least one first candidate abstract. For example, 4 abstract sentences, fully permuted, generate 4! = 24 first candidate abstracts.
S103, calculating the fluency of each first candidate abstract by using a language model, and outputting the first candidate abstract with the highest fluency as the abstract.
The language model may be a neural-network language model: the model is trained on a large number of sentences and paragraphs, and the trained model is then used to calculate the fluency of each first candidate abstract. The invention is not limited to calculating fluency with a neural-network language model; another kind of language model may also be used. Once the fluency values are obtained, the first candidate abstract with the highest fluency is output as the abstract.
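The patent does not fix a particular language model. As one hedged illustration, a pretrained causal language model (GPT-2 via the Hugging Face transformers library is an arbitrary assumption here; a Chinese model would be substituted for Chinese news) can score a candidate by its negative average token loss, so that more fluent orderings score higher:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any pretrained causal LM can serve as the fluency scorer; GPT-2 is
# only an illustrative choice, not the model used by the invention.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def fluency(candidate_sentences):
    """Higher (less negative) return value means a more fluent candidate."""
    text = " ".join(candidate_sentences)
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # average per-token cross-entropy
    return -loss.item()

# `first_candidates` comes from the permutation sketch under step S102.
best = max(first_candidates, key=fluency)
```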
In the abstract generation method provided by the invention, the at least one abstract sentence is fully permuted to obtain a first candidate abstract set, the fluency of each first candidate abstract in the set is calculated with a language model, and the most fluent first candidate abstract is output, so the generated abstract has good logical order and high readability.
Fig. 2 is a flowchart of an abstract generation method according to another exemplary embodiment of the present invention. As shown in fig. 2, the abstract generation method provided by the present invention includes the following steps:
s201, extracting at least one subject term from a plurality of text chapters.
First, a plurality of text documents are acquired and each document is denoised, for example by removing titles, remarks, references, pictures, garbled characters, and other non-body parts, keeping only the body text. The documents are then deduplicated to remove repeated documents. Finally, topic words are extracted from the documents. In this embodiment, the topic words may be extracted with the LDA, TF-IDF, or TextRank algorithms; the type of algorithm is not limited here. After the topic words are extracted, the most important topic words may be selected according to their importance, and the relevance between the topic words and the text documents is then calculated.
For example: 33 hot news articles on the topic "Marvel's father Stan Lee dies" are collected. Each of the 33 articles is segmented into words, stop words are removed, the articles are concatenated into strings, topic words are extracted, and the top 5 are selected, in this example "Marvel", "Stan Lee", "died", "comics", and "movie".
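As a hedged sketch of this topic-word step, the jieba library's TF-IDF keyword extractor (one of the algorithm choices the embodiment allows) could be used; the topK value and the example outputs are assumptions:

```python
import jieba.analyse

def extract_topic_words(documents, top_k=5):
    """Concatenate the cleaned documents and extract the top_k topic words
    by TF-IDF weight; jieba.analyse.textrank is the TextRank alternative."""
    corpus = "".join(documents)
    return jieba.analyse.extract_tags(corpus, topK=top_k)

# For the 33 Stan Lee articles this might return words such as
# "漫威" (Marvel), "去世" (died), "漫画" (comics), "电影" (movie).
```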
S202, calculating the relevance between the topic words and the text documents.
Calculating the relevance between the topic words and the text documents specifically includes: for each topic word and each text document, calculating the relevance of that topic word to that document; then, for each text document, summing all the relevance values corresponding to that document to obtain the overall relevance between the topic words and that document. In this embodiment, the relevance may be calculated with the LDA, TF-IDF, or TextRank algorithms; the type of algorithm is not limited here.
For example: an article is split into sentences at periods (。), exclamation marks (!), and question marks (?). Sentences with fewer than 10 characters, and sentences ending with a question mark, are removed. Stop words (for example, common function words such as 也 and 的) are removed. Sentence vectors are then computed with the TF-IDF algorithm, and the similarity between the article and the topic words is computed with the cosine-similarity method.
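A minimal sketch of the relevance calculation of S202, assuming TF-IDF vectors and cosine similarity as in the example above (scikit-learn and jieba are illustrative choices; the per-topic-word relevances are summed per document, as described):

```python
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def topic_relevance(documents, topic_words, stop_words):
    """For each document, sum its cosine similarities to every topic word
    in a shared TF-IDF vector space."""
    def tokenize(text):
        return [w for w in jieba.cut(text) if w not in stop_words]

    vectorizer = TfidfVectorizer(tokenizer=tokenize)
    matrix = vectorizer.fit_transform(list(documents) + list(topic_words))
    doc_vecs = matrix[: len(documents)]
    word_vecs = matrix[len(documents):]
    # Shape (num_documents, num_topic_words); sum across topic words.
    return cosine_similarity(doc_vecs, word_vecs).sum(axis=1)
```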
S203, selecting some of the text documents according to the relevance, and extracting at least one abstract sentence from the selected documents.
Selecting some of the text documents according to the relevance specifically includes: sorting the text documents by their relevance to the topic words, with more relevant documents first, and selecting the top L documents to merge into a merged document. If there are fewer than L documents under the topic words, all the documents are merged.
Extracting at least one abstract sentence from the selected documents specifically includes the following steps. First, the merged document is split into sentences, the sentences are filtered by length (sentences that are too short or too long are deleted), and sentences unsuitable as abstract sentences, such as interrogative sentences, are deleted. Second, the sentences are segmented into words and stop words are removed. Finally, the importance of each sentence is calculated.
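A sketch of the splitting and filtering step, under the assumptions that Chinese end-of-sentence punctuation delimits sentences and that the length bounds (10 and 100 characters here) are illustrative rather than prescribed:

```python
import re

def split_and_filter(merged_document, min_len=10, max_len=100):
    """Split at 。!? (keeping the delimiter attached), then drop sentences
    that are too short, too long, or interrogative."""
    kept = []
    for s in re.split(r"(?<=[。!?!?])", merged_document):
        s = s.strip()
        if not (min_len <= len(s) <= max_len):
            continue
        if s.endswith("?") or s.endswith("?"):  # interrogative sentence
            continue
        kept.append(s)
    return kept
```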
In this embodiment, sentence importance is calculated with the TextRank algorithm. Specifically, a graph model is constructed: the TextRank model can be represented as a directed weighted graph G = (V, E), where the vertices represent sentences, V = {v_1, v_2, …, v_v} is the set of vertices, and E = {e_1, e_2, …, e_e} is the set of edges. E is a subset of F, where F is the set of edges between any two vertices, containing v × v elements. The weight of the edge between any two vertices v_i and v_j is w_ij. For a given vertex v_i, In(v_i) is the set of vertices pointing to v_i, and Out(v_i) is the set of vertices that v_i points to. The score of vertex v_i is defined as follows:
WS(v_i) = (1 − d) + d · Σ_{v_j ∈ In(v_i)} ( w_{ji} / Σ_{v_k ∈ Out(v_j)} w_{jk} ) · WS(v_j)
where d is a damping coefficient in the range 0 to 1, representing the probability of jumping from a given vertex to any other vertex in the graph; its value is typically 0.85.
According to this formula, the score of each sentence is computed by iteratively propagating the weights, giving the importance of each sentence; the sentences with high importance are selected as abstract sentences.
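The iterative propagation can be sketched as follows, assuming the sentence graph is supplied as a weight matrix (in practice the edge weights would come from, e.g., sentence similarity, which the embodiment leaves open):

```python
import numpy as np

def textrank_scores(weights, d=0.85, max_iter=100, tol=1e-6):
    """Iterate WS(v_i) = (1 - d) + d * sum_{j in In(i)} w_ji / sum_k w_jk * WS(v_j).
    weights[j, i] is the weight of the edge from sentence j to sentence i."""
    n = weights.shape[0]
    out_sums = weights.sum(axis=1)           # sum_k w_jk for each source j
    out_sums[out_sums == 0] = 1.0            # guard against isolated vertices
    transition = weights / out_sums[:, None]
    scores = np.ones(n)
    for _ in range(max_iter):
        new_scores = (1 - d) + d * (transition.T @ scores)
        if np.abs(new_scores - scores).max() < tol:
            return new_scores
        scores = new_scores
    return scores
```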
In this embodiment, after the importance of each sentence is obtained, redundancy removal may be applied to the sentences. Specifically, the sentences are segmented into words, stop words are removed, and the post-segmentation word repetition rate between sentences is compared; if the repetition rate reaches a preset value, one of the two sentences is deleted. The sentences with high importance are then selected as the abstract sentences.
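A hedged sketch of the redundancy removal: the patent does not define "word repetition rate" precisely, so Jaccard overlap over the segmented word sets is assumed here, with the 50% threshold taken from the example below:

```python
def remove_redundant(sentences, tokenize, threshold=0.5):
    """Walk the sentences in descending importance and drop any sentence
    whose word overlap with an already-kept sentence reaches the threshold."""
    kept, kept_token_sets = [], []
    for s in sentences:
        tokens = set(tokenize(s))
        def overlap(other):
            return len(tokens & other) / max(len(tokens | other), 1)
        if all(overlap(t) < threshold for t in kept_token_sets):
            kept.append(s)
            kept_token_sets.append(tokens)
    return kept
```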
For example: the documents are sorted by the sum of their similarities to the topic words, the 3 most relevant documents are selected, and these are merged as the abstract article set. The selected news items are three reports on Stan Lee's death, for example '"Father of Marvel" Stan Lee dies', 'Stan Lee, father of Marvel: an immortal legend', and 'Stan Lee, father of Marvel: half angel, half devil'. The sentences in the 3 documents are scored and sorted with the TextRank algorithm, and 10 sentences are selected as the abstract sentence set. The 10 abstract sentences are segmented into words and the post-segmentation word repetition rates are compared; when the repetition rate between two sentences exceeds 50%, one of them is deleted. The abstract sentences ranked in the top 4 by importance are taken as the final abstract.
S204, fully permuting the at least one abstract sentence to generate a first candidate abstract set.
This step is the same as S102 in the embodiment shown in fig. 1, and is not repeated here.
S205, calculating the fluency of each first candidate abstract by using a language model, and outputting the first candidate abstract with the highest fluency as the abstract.
Because the abstract sentences come from different articles, they must be ordered to guarantee the coherence and readability of the final abstract. Using a language model turns the sentence-ordering problem into: calculating the fluency of the candidate abstracts formed by the different orderings of the abstract sentences. The candidate abstract with the highest fluency is the final abstract.
Calculating the fluency of each candidate abstract with the language model and outputting the first candidate abstract with the highest fluency as the abstract specifically includes the following steps:
s3001, for each first candidate summary, extracting partial summary sentences from the first candidate summary to generate a second candidate summary.
Extracting some abstract sentences from the first candidate abstract to generate a second candidate abstract specifically includes truncating part of the first candidate abstract. Optionally, the first M abstract sentences of the first candidate abstract are taken to generate the second candidate abstract, where M is a positive integer, M ≤ N, and N is the number of abstract sentences in the first candidate abstract.
For example: with 4 abstract sentences there are 24 first candidate abstracts in the first candidate abstract set, and for each first candidate abstract the first 2 abstract sentences are taken to form a second candidate abstract. The distinct second candidate abstracts are S1+S2, S2+S1, S2+S3, S2+S4, S1+S3, S1+S4, S3+S1, S3+S2, S3+S4, S4+S1, S4+S2, and S4+S3, where S1 to S4 denote the first to fourth abstract sentences.
S3002, processing the second candidate abstract with the language model to obtain the fluency of the second candidate abstract.
The language model is the same as in S103 of the embodiment shown in fig. 1 and is not described again here.
S3003, selecting some of the second candidate abstracts according to their fluency, and calculating, with the language model, the fluency of the first candidate abstracts corresponding to the selected second candidate abstracts, so as to output the first candidate abstract with the highest fluency as the abstract.
The second candidate abstracts are ranked by fluency from high to low, the top-ranked ones are kept, and the first candidate abstracts corresponding to the kept second candidate abstracts are determined. For example: first, the fluency of the 12 second candidate abstracts formed by the two-sentence combinations above is calculated and sorted from high to low, and the Z least fluent combinations are deleted. Assuming Z = 9, three orderings remain, say S1+S2, S2+S1, and S2+S3. The full orderings of all abstract sentences corresponding to these three prefixes are then determined; there are 6 of them: S1+S2 corresponds to S1+S2+S3+S4 and S1+S2+S4+S3; S2+S1 corresponds to S2+S1+S3+S4 and S2+S1+S4+S3; and S2+S3 corresponds to S2+S3+S1+S4 and S2+S3+S4+S1. The fluency of these 6 orderings is calculated, and the most fluent ordering is selected as the final abstract.
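The whole pruning scheme of S3001–S3003 can be sketched as follows; the prefix length (2) and the number of surviving prefixes (3) mirror the example above and are otherwise free parameters:

```python
from itertools import permutations

def best_abstract(sentences, fluency, prefix_len=2, keep=3):
    """Two-stage search: score every prefix_len-sentence prefix, keep the
    `keep` most fluent, then fully score only the permutations that begin
    with a surviving prefix."""
    prefixes = sorted(permutations(sentences, prefix_len),
                      key=lambda p: fluency(list(p)), reverse=True)
    survivors = set(prefixes[:keep])
    finalists = [list(p) for p in permutations(sentences)
                 if p[:prefix_len] in survivors]
    return max(finalists, key=fluency)
```

With 4 sentences this scores 12 two-sentence prefixes, keeps 3, and fully scores only 6 of the 24 permutations, matching the worked example.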
Calculating the fluency of each first candidate abstract with the language model specifically includes the following steps.
The first candidate abstract is processed with the language model to obtain N probabilities, and the N probabilities are summed to obtain the fluency of the first candidate abstract.
The fluency of the first candidate abstract is calculated according to the following formula:
P(S) = Σ_{i=1}^{N} P(x_i | x_1, x_2, …, x_{i-1}, x_{i+1}, …, x_N)
where P(x_i | x_1, x_2, …, x_{i-1}, x_{i+1}, …, x_N) is the probability that the i-th position holds abstract sentence x_i, and x_i denotes the i-th abstract sentence.
According to this formula, the larger the value of P(S), the more fluent the candidate abstract, and the better its logic and fluency.
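Abstracting over the concrete language model, the formula translates directly into code; `sentence_prob` is an assumed hook that returns P(x_i | all other sentences) (in practice one would sum log-probabilities for numerical stability):

```python
def candidate_fluency(sentences, sentence_prob):
    """P(S): sum over positions i of the probability that position i
    holds sentence x_i given all the other sentences."""
    return sum(
        sentence_prob(s, sentences[:i] + sentences[i + 1:])
        for i, s in enumerate(sentences)
    )
```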
Fig. 3 is a schematic diagram of the abstract generation method shown in fig. 2: a plurality of text documents first undergo text preprocessing, then topic-word relevance calculation is performed to screen out the documents related to the topic words. Abstract sentences are extracted from these documents to obtain candidate abstract sentences, redundancy removal is applied to obtain the selected candidate abstract sentences, the candidate abstract sentences are permuted, and the permuted results are fed into the language model for scoring to generate the abstract.
In the abstract generation method provided by this embodiment, second candidate abstracts with fewer abstract sentences are formed by truncating the first candidate abstracts. Because many first candidate abstracts share the same truncated prefix, the number of distinct second candidate abstracts is far smaller than the number of first candidate abstracts; the language model therefore scores the second candidate abstracts first, and full fluency is calculated only for the first candidate abstracts corresponding to the most fluent second candidate abstracts, greatly reducing the computation scale.
Using the abstract generation method provided by the invention, abstracts were extracted for groups of news articles on more than 40 topics, such as "Chow Yun-fat donates his fortune", "attention to depression", "League of Legends championship", "doctor-patient conflicts", and "college graduate employment". The automatically generated short-text abstracts were evaluated manually, using the fluency of the abstract and its coverage of the key points of the news group as criteria, and the results were compared with a TextRank-based algorithm (hereinafter "algorithm 1", whose core is abstract-sentence extraction plus abstract-sentence redundancy removal). For more than 85% of the news groups, the results were better than those of algorithm 1. One example is described in detail below.
A total of 33 news articles in the "father of Marvel" series were retrieved from the web. The abstract generated by algorithm 1 is: "In the hit film Venom, Stan Lee once again did not miss his cameo; at the end of the film he plays a passer-by walking a dog, warning the male lead not to give up. Stan Lee is a legendary figure in the comics field and has become one of the important symbols of American popular culture; he created 80% of Marvel's well-known characters and gave countless people their dream of heroes. Early yesterday, sad news came from the film world: Stan Lee, who single-handedly created the Marvel Universe and superhero comics such as Spider-Man, the X-Men, Thor, Iron Man, the Fantastic Four, the Hulk, and Black Panther, died at a medical center in Hollywood on Monday local time, at the age of 95."
The abstract generated by the abstract generation method provided by the invention is: "Stan Lee, the father of Marvel, has passed away, and the Marvel comics and film world has lost a familiar figure: the old man who made a cameo in every Marvel superhero movie, becoming a beloved Easter egg. Stan Lee is a legendary figure in the comics field and has become one of the important symbols of American popular culture; he created 80% of Marvel's well-known characters and gave countless people their dream of heroes. Years ago, Stan Lee launched the Marvel Universe: in this parallel universe, Iron Man, Captain America, and the Hulk became members of the Avengers, later joined by Thor, Ant-Man, and others; perhaps the Stan Lee of that time did not realize that this series of Marvel characters would generate enormous commercial value over the following decades."
Comparing the abstracts generated from multiple documents, the abstract generation method provided by the invention is better than algorithm 1 in content coherence and in generalization; it fits the scenario of writing short-text abstracts, is reasonable and effective, and has a prominent effect, good practical value, and good application prospects.
Fig. 4 is a schematic structural diagram of an abstract generation apparatus according to an exemplary embodiment of the present invention. As shown in fig. 4, the present invention provides an abstract generation apparatus 400, which includes:
an extraction module 401, configured to extract at least one abstract sentence from a plurality of text documents;
a permutation module 402, configured to fully permute the at least one abstract sentence to generate a first candidate abstract set, wherein the first candidate abstract set comprises at least one first candidate abstract;
and a calculation module 403, configured to calculate the fluency of each first candidate abstract in the first candidate abstract set by using a language model, so as to obtain the first candidate abstract with the highest fluency.
Optionally, the calculation module 403 is specifically configured to:
for each first candidate abstract, extract some of its abstract sentences to generate a second candidate abstract;
process the second candidate abstract with the language model to obtain the fluency of the second candidate abstract;
and select some of the second candidate abstracts according to their fluency, and calculate, with the language model, the fluency of the first candidate abstracts corresponding to the selected second candidate abstracts, so as to output the first candidate abstract with the highest fluency as the abstract.
Optionally, the calculation module 403 is specifically configured to:
take some of the abstract sentences of the first candidate abstract to generate the second candidate abstract.
Optionally, the calculation module 403 is specifically configured to:
process the first candidate abstract with the language model to obtain N probabilities;
sum the N probabilities to obtain the fluency of the first candidate abstract;
wherein the i-th of the N probabilities represents the probability that the i-th position is abstract sentence x_i, and 1 ≤ i ≤ N.
Optionally, the calculation module 403 is specifically configured to:
calculate the fluency of the first candidate abstract according to the following formula:
P(S) = Σ_{i=1}^{N} P(x_i | x_1, x_2, …, x_{i-1}, x_{i+1}, …, x_N)
wherein P(x_i | x_1, x_2, …, x_{i-1}, x_{i+1}, …, x_N) represents the probability that the i-th position is abstract sentence x_i.
Optionally, the extraction module 401 is specifically configured to:
extract at least one topic word from a plurality of text documents;
calculate the relevance between the topic words and the text documents;
and select some of the text documents according to the relevance, and extract at least one abstract sentence from the selected documents.
Optionally, the extraction module 401 is specifically configured to:
calculate, for each topic word and each text document, the relevance of that topic word to that document;
and sum, for each text document, all the relevance values corresponding to that document to obtain the overall relevance between the topic words and that document.
Fig. 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present invention. As shown in fig. 5, the electronic device 500 of the present embodiment includes: a processor 501 and a memory 502.
a memory 502 for storing computer-executable instructions;
a processor 501 for executing the computer-executable instructions stored in the memory, so as to implement the abstract generation method of the above embodiments; reference may be made to the description of the method embodiments above.
Optionally, the memory 502 may be separate from, or integrated with, the processor 501.
When the memory 502 is separately provided, the electronic device 500 further includes a bus 503 for connecting the memory 502 and the processor 501.
An embodiment of the present invention further provides a computer-readable storage medium in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the abstract generation method described above is implemented.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or substitutions do not depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An abstract generation method, comprising:
extracting at least one abstract sentence from a plurality of text documents;
fully permuting the at least one abstract sentence to generate a first candidate abstract set, wherein the first candidate abstract set comprises at least one first candidate abstract;
and calculating the fluency of each first candidate abstract in the first candidate abstract set by using a language model, to obtain the first candidate abstract with the highest fluency.
2. The method according to claim 1, wherein calculating the fluency of each first candidate abstract in the first candidate abstract set by using a language model to obtain the first candidate abstract with the highest fluency specifically comprises:
for each first candidate abstract, extracting some of its abstract sentences to generate a second candidate abstract;
processing the second candidate abstract with the language model to obtain the fluency of the second candidate abstract;
selecting some of the second candidate abstracts according to their fluency;
and calculating, with the language model, the fluency of the first candidate abstracts corresponding to the selected second candidate abstracts, to obtain the first candidate abstract with the highest fluency.
3. The method according to claim 2, wherein extracting some abstract sentences from the first candidate abstract to generate a second candidate abstract specifically comprises:
taking the first M abstract sentences of the first candidate abstract to generate the second candidate abstract, wherein M is a positive integer, M ≤ N, and N is the number of abstract sentences in the first candidate abstract.
4. The method according to any one of claims 1 to 3, wherein calculating the fluency of each first candidate abstract in the first candidate abstract set by using a language model specifically comprises:
processing the first candidate abstract with the language model to obtain N probabilities;
summing the N probabilities to obtain the fluency of the first candidate abstract;
wherein the i-th of the N probabilities represents the probability that the i-th position is abstract sentence x_i, N represents the number of abstract sentences, and 1 ≤ i ≤ N.
5. The method according to claim 4, wherein summing the N probabilities to obtain the fluency of the first candidate abstract specifically comprises calculating the fluency of the first candidate abstract according to the following formula:
P(S) = Σ_{i=1}^{N} P(x_i | x_1, x_2, …, x_{i-1}, x_{i+1}, …, x_N)
wherein P(x_i | x_1, x_2, …, x_{i-1}, x_{i+1}, …, x_N) represents the probability that the i-th position is abstract sentence x_i.
6. The method according to any one of claims 1 to 3, wherein extracting at least one abstract sentence from a plurality of text documents comprises:
extracting at least one topic word from the plurality of text documents;
calculating the relevance between the topic words and the text documents;
and selecting some of the text documents according to the relevance, and extracting at least one abstract sentence from the selected documents.
7. The method according to claim 6, wherein calculating the relevance between the topic words and the text documents comprises:
calculating, for each topic word and each text document, the relevance of that topic word to that document;
and summing, for each text document, all the relevance values corresponding to that document to obtain the overall relevance between the topic words and that document.
8. An abstract generation apparatus, comprising:
an extraction module, configured to extract at least one abstract sentence from a plurality of text documents;
a permutation module, configured to fully permute the at least one abstract sentence to generate a first candidate abstract set, wherein the first candidate abstract set comprises at least one first candidate abstract;
and a calculation module, configured to calculate the fluency of each first candidate abstract in the first candidate abstract set by using a language model, so as to obtain the first candidate abstract with the highest fluency.
9. An electronic device, comprising:
a memory for storing a program;
a processor for executing the program stored in the memory, the processor being configured, when the program is executed, to perform the abstract generation method according to any one of claims 1 to 7.
10. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the abstract generation method according to any one of claims 1 to 7.
CN201911182819.3A 2019-11-27 2019-11-27 Abstract generation method and device, electronic equipment and storage medium Pending CN112860881A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911182819.3A CN112860881A (en) 2019-11-27 2019-11-27 Abstract generation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911182819.3A CN112860881A (en) 2019-11-27 2019-11-27 Abstract generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112860881A true CN112860881A (en) 2021-05-28

Family

ID=75984674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911182819.3A Pending CN112860881A (en) 2019-11-27 2019-11-27 Abstract generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112860881A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664598A (en) * 2018-05-09 2018-10-16 北京理工大学 A kind of extraction-type abstract method based on integral linear programming with comprehensive advantage
CN109241536A (en) * 2018-09-21 2019-01-18 浙江大学 It is a kind of based on deep learning from the sentence sort method of attention mechanism

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114490976A (en) * 2021-12-30 2022-05-13 北京百度网讯科技有限公司 Method, device and equipment for generating dialogue abstract training data and storage medium

Similar Documents

Publication Publication Date Title
CN111177365B (en) Unsupervised automatic abstract extraction method based on graph model
CN108197111B (en) Text automatic summarization method based on fusion semantic clustering
Nicosia et al. QCRI: Answer selection for community question answering-experiments for Arabic and English
Li et al. Markuplm: Pre-training of text and markup language for visually-rich document understanding
CN113553429B (en) Normalized label system construction and text automatic labeling method
Erdmann et al. Improving the extraction of bilingual terminology from Wikipedia
CN111680488A (en) Cross-language entity alignment method based on knowledge graph multi-view information
Tiwari et al. Ensemble approach for twitter sentiment analysis
Lins et al. The CNN-corpus: A large textual corpus for single-document extractive summarization
Qi et al. DuReadervis: A Chinese dataset for open-domain document visual question answering
Agarwal et al. Authorship clustering using tf-idf weighted word-embeddings
JP2007157006A (en) Question-answer device, question-answer method and question-answer program
Lee et al. Alignment of bilingual named entities in parallel corpora using statistical models and multiple knowledge sources
CN112860881A (en) Abstract generation method and device, electronic equipment and storage medium
CN110929022A (en) Text abstract generation method and system
Tahrat et al. Text2geo: from textual data to geospatial information
Liu et al. An Efficient Machine-Generated Data Modeling Approach Based on Domain-Aware Knowledge for Intelligent Consumer Electronics
Huang et al. An effective method for constructing knowledge graph of online course
Yang et al. Exploring word similarity to improve chinese personal name disambiguation
Maylawati et al. Feature-based approach and sequential pattern mining to enhance quality of Indonesian automatic text summarization
Schneider et al. Golden retriever: A real-time multi-modal text-image retrieval system with the ability to focus
Sindhu et al. Plagiarism detection in Malayalam language text using a composition of similarity measures
Chen et al. The Chinese Persons Name Diambiguation Evaluation: Exploration of Personal Name Disambiguation in Chinese News
Song et al. A Two-stage User Intent Detection Model on Complicated Utterances with Multi-task Learning
Mohtaj et al. PerPaDa: A Persian Paraphrase Dataset based on Implicit Crowdsourcing Data Collection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination