
CN115221863A - Text abstract evaluation method and device and storage medium

Info

Publication number: CN115221863A
Application number: CN202210844106.4A
Authority: CN
Inventors: 蔡晓东; 蒋鹏
Original assignee: Guilin University of Electronic Technology
Current assignee: Guilin University of Electronic Technology
Priority date: 2022-07-18
Filing date: 2022-07-18
Publication date: 2022-10-21
Anticipated expiration: 2042-07-18
Also published as: CN115221863B

Abstract

The invention provides a text abstract evaluation method, a text abstract evaluation device and a storage medium, which belong to the field of language processing. The method comprises the following steps: preprocessing an original Chinese text to obtain a processed text; analyzing the abstract key information coverage rate of all original Chinese texts and all processed texts to obtain the abstract key information coverage rate, the original text abstract probability distribution and the processed text abstract probability distribution; and calculating an evaluation score from the abstract key information coverage rate, the original text abstract probability distribution and the processed text abstract probability distribution to obtain a text abstract evaluation result. The invention can evaluate a generated abstract more reasonably, and the evaluation result is closer to manual evaluation, which makes the evaluation more flexible and reasonable.

Description

Text abstract evaluation method and device and storage medium

Technical Field

The invention mainly relates to the technical field of language processing, in particular to a text abstract evaluation method, a text abstract evaluation device and a storage medium.

Background

With the development of internet technology, text information on the network is increasing rapidly. To let users grasp the key content of that text, text generation technology is applied, for example abstract generation, which produces an abstract from an original text. Whether the generated abstract expresses the meaning of the original text must be determined by an evaluation method. Evaluation methods are mainly divided into manual evaluation and automatic evaluation. Manual evaluation is more flexible and reasonable than automatic evaluation, but it is time-consuming and laborious. Automatic evaluation methods such as ROUGE and BLEU are therefore applied. However, these methods use only the co-occurrence information between the generated abstract and the reference abstract and do not consider the semantic information between them, and producing the reference abstract itself takes time and labor. For these reasons, such methods are not well suited to evaluating generated text.

Disclosure of Invention

The invention provides a text abstract evaluation method, a text abstract evaluation device and a storage medium to address the shortcomings of the prior art.

The technical scheme for solving the technical problems is as follows: a text abstract evaluation method comprises the following steps:

importing a plurality of original Chinese texts, and respectively preprocessing each original Chinese text to obtain a processed text corresponding to each Chinese text;

analyzing the abstract key information coverage rate of all original Chinese texts and all processed texts to obtain the abstract key information coverage rate corresponding to each original Chinese text, the abstract probability distribution of the original text corresponding to each original Chinese text and the abstract probability distribution of the processed text corresponding to each processed text;

and respectively calculating evaluation scores of the summary key information coverage rate, the probability distribution of the summary of the original text corresponding to each original Chinese text and the probability distribution of the summary of the processed text corresponding to each processed text to obtain the evaluation score corresponding to each original Chinese text, and respectively taking each evaluation score as the text summary evaluation result of each original Chinese text.

Another technical solution of the present invention for solving the above technical problems is as follows: a text summary evaluation apparatus comprising:

the preprocessing module is used for importing a plurality of original Chinese texts and respectively preprocessing each original Chinese text to obtain a processed text corresponding to each Chinese text;

the coverage rate analysis module is used for analyzing the summary key information coverage rate of all the original Chinese texts and all the processed texts to obtain the summary key information coverage rate corresponding to each original Chinese text, the probability distribution of the summary of the original text corresponding to each original Chinese text and the probability distribution of the summary of the processed text corresponding to each processed text;

and the abstract evaluation result obtaining module is used for respectively calculating the evaluation scores of the abstract key information coverage rate, the original text abstract probability distribution corresponding to each original Chinese text and the processed text abstract probability distribution corresponding to each processed text to obtain the evaluation score corresponding to each Chinese text, and respectively taking each evaluation score as the text abstract evaluation result of each original Chinese text.

Another technical solution of the present invention for solving the above technical problems is as follows: a text summary evaluation device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, when executing the computer program, implementing the text summary evaluation method as described above.

Another technical solution of the present invention for solving the above technical problems is as follows: a computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out a text summary evaluation method as described above.

The beneficial effects of the invention are as follows: the original Chinese text is preprocessed to obtain a processed text; the abstract key information coverage rate of the original Chinese text and the processed text is analyzed to obtain the abstract key information coverage rate, the original text abstract probability distribution and the processed text abstract probability distribution; and an evaluation score is calculated from these three quantities to obtain a text abstract evaluation result, so that the generated abstract can be evaluated more reasonably and the evaluation result is closer to manual evaluation, making the evaluation more flexible and reasonable.

Drawings

Fig. 1 is a schematic flow chart of a text abstract evaluation method according to an embodiment of the present invention;

Fig. 2 is a block diagram of a text abstract evaluation apparatus according to an embodiment of the present invention.

Detailed Description

The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.

Fig. 1 is a schematic flow chart of a text abstract evaluation method according to an embodiment of the present invention.

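As shown in Fig. 1, a text abstract evaluation method includes the following steps:

importing a plurality of original Chinese texts, and respectively preprocessing each original Chinese text to obtain a processed text corresponding to each Chinese text;

analyzing the abstract key information coverage rate of all original Chinese texts and all processed texts to obtain the abstract key information coverage rate corresponding to each original Chinese text, the original text abstract probability distribution corresponding to each original Chinese text and the processed text abstract probability distribution corresponding to each processed text;

and respectively calculating the evaluation scores of the abstract key information coverage rate, the original text abstract probability distribution corresponding to each original Chinese text and the processed text abstract probability distribution corresponding to each processed text to obtain the evaluation score corresponding to each original Chinese text, and respectively taking each evaluation score as the text abstract evaluation result of each original Chinese text.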

It should be understood that 360 million Chinese article fragments (i.e., the plurality of original Chinese texts) are crawled from the internet.

In the embodiment, the processed text is obtained by preprocessing the original Chinese text, the abstract key information coverage rate, the abstract probability distribution of the original text and the abstract probability distribution of the processed text are obtained by analyzing the abstract key information coverage rate of the original Chinese text and the processed text, and the text abstract evaluation result is obtained by calculating the evaluation scores of the abstract key information coverage rate, the abstract probability distribution of the original text and the abstract probability distribution of the processed text, so that the generated text can be evaluated more reasonably, and the evaluation result is closer to manual evaluation, thereby enabling the evaluation content to be more flexible and reasonable.

Optionally, as an embodiment of the present invention, the step of respectively preprocessing each original Chinese text to obtain a processed text corresponding to each Chinese text includes:

respectively deleting words in each original Chinese text at random to obtain deleted texts corresponding to each Chinese text;

and respectively filling a blank word in each deleted text based on a BERT language model to obtain a processed text corresponding to each Chinese text.

It should be appreciated that the BERT language model can mask some words in an article and then fill them in, and can judge the contextual relationship between two pieces of text.

It should be understood that the article pairs (i.e., each original Chinese text and its corresponding processed text) are generated by perturbation.

It should be appreciated that the pre-trained language model (i.e., the BERT language model) is trained for abstract evaluation.

It should be understood that the word fill-in is the filling-in of the vacant word positions.

Specifically, the perturbation proceeds as follows: words in the original article (i.e., the original Chinese text) are randomly masked, and the blanks are filled in by a pre-trained model (i.e., the BERT language model), so that an article whose semantics are close to the original but whose details differ is obtained. In other words, words in an article (i.e., the original Chinese text) are randomly discarded to generate an article pair, and abstract and document pairs are generated using an existing text generation model.

In the embodiment, the deleted text is obtained by randomly deleting words in the original Chinese text, and the processed text is obtained by filling in the blank words of the deleted text based on the BERT language model, so that articles with close semantics but different details can be obtained, and the contextual relationship between the two articles can be judged.
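The perturbation step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `delete_words`, `fill_blanks`, `make_article_pair` and the `[MASK]` placeholder are hypothetical, and `toy_fill` is a stand-in for a BERT masked language model so that the example runs without any model download.

```python
import random
from typing import Callable, List, Tuple

MASK = "[MASK]"  # hypothetical placeholder for a deleted word


def delete_words(tokens: List[str], drop_prob: float, rng: random.Random) -> List[str]:
    """Randomly replace tokens with MASK to produce the 'deleted text'."""
    return [MASK if rng.random() < drop_prob else tok for tok in tokens]


def fill_blanks(masked: List[str], fill_fn: Callable[[List[str], int], str]) -> List[str]:
    """Fill each masked position, left to right, using fill_fn(context, position)."""
    filled = list(masked)
    for i, tok in enumerate(masked):
        if tok == MASK:
            filled[i] = fill_fn(filled, i)
    return filled


def make_article_pair(
    tokens: List[str],
    drop_prob: float,
    fill_fn: Callable[[List[str], int], str],
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Return (original text, processed text) as one article pair."""
    rng = random.Random(seed)
    masked = delete_words(tokens, drop_prob, rng)
    return tokens, fill_blanks(masked, fill_fn)


def toy_fill(context: List[str], position: int) -> str:
    """Assumed stand-in for a masked language model: copy the left neighbour."""
    return context[position - 1] if position > 0 else "<unk>"


original, processed = make_article_pair(["a", "b", "c", "d"], 0.5, toy_fill, seed=1)
```

In practice `fill_fn` would wrap a masked language model's prediction for the blank position; the callable interface keeps that choice outside the perturbation logic.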

Optionally, as an embodiment of the present invention, the analyzing the abstract key information coverage rate of all the original Chinese texts and all the processed texts to obtain the abstract key information coverage rate corresponding to each of the original Chinese texts, the original text abstract probability distribution corresponding to each of the original Chinese texts, and the processed text abstract probability distribution corresponding to each of the processed texts includes:

collecting all original Chinese texts to obtain a document set, and predicting the document set based on a BERT language model to obtain document probability distribution;

predicting on each original Chinese text based on a BART abstract model to obtain the original text abstract probability distribution corresponding to each original Chinese text;

predicting on each processed text based on the BART abstract model to obtain the processed text abstract probability distribution corresponding to each processed text;

respectively calculating a first co-occurrence text for each original text abstract probability distribution according to the document probability distribution to obtain a first co-occurrence text corresponding to each original Chinese text;

respectively calculating second co-occurrence texts for the probability distribution of the processed text abstracts according to the document probability distribution to obtain second co-occurrence texts corresponding to the original Chinese texts;

calculating the document coverage rate of all the first co-occurrence texts and all the second co-occurrence texts to obtain the document coverage rate;

and calculating the abstract key information coverage rate of each original text abstract probability distribution and the processed text abstract probability distribution corresponding to each original Chinese text according to the document coverage rate to obtain the abstract key information coverage rate corresponding to each original Chinese text.

It should be appreciated that the BART abstract model is a denoising autoencoder for pre-training sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text.

It should be understood that the extracted co-occurrence segments of the probability distributions of the generated text (i.e., the processed text) and the original document (i.e., the document set) are computed first; the document coverage rate is then calculated; and finally, the probability distribution area ratio of the generated abstract and the reference abstract is calculated.

In the embodiment, the summary key information coverage rate, the summary probability distribution of the original text and the summary probability distribution of the processed text are obtained by analyzing the summary key information coverage rate of the original Chinese text and the processed text, so that the generated text can be evaluated more reasonably, and the evaluation result is closer to manual evaluation, thereby enabling the evaluation content to be more flexible and reasonable.

Optionally, as an embodiment of the present invention, the calculating, according to the document probability distribution, a first co-occurrence text for each original text abstract probability distribution, to obtain a first co-occurrence text corresponding to each original Chinese text includes:

calculating, through a first formula, a first co-occurrence text from the document probability distribution and the original text abstract probability distribution corresponding to each original Chinese text, to obtain the first co-occurrence text corresponding to each original Chinese text, wherein the first formula is as follows:

where $c \sim N(0, 1)$,

and where $\mathrm{coverage}(p_{\phi|T}(\cdot \mid x_i; \omega; L),\, p_{\phi|K}(\cdot \mid D; \omega; W))$ is the first co-occurrence text corresponding to the $i$-th original text abstract, $c$ is the key co-occurrence segment distribution, $p_{\phi|K}(\cdot \mid D; \omega; W)$ is the document probability distribution, $p_{\phi|T}(\cdot \mid x_i; \omega; L)$ is the original text abstract probability distribution corresponding to the $i$-th original Chinese text, $|\cdot|$ is the co-occurrence segment length, $L$ is an adjustment parameter, the remaining symbol of the formula denotes the vocabulary size, and $N(0, 1)$ is a normal distribution with a mean equal to 0 and a variance equal to 1.

Specifically, the percentage of key information in the summary that belongs to "extracted segments" in the input document is calculated, which are multiword key information shared between the input document and the summary. It is a simple precision measure and can predict how much key information in the input document is contained in the abstract, and the calculation expression is as follows:

where $|\cdot|$ denotes the co-occurrence segment length; $p_{\phi|K}(\cdot \mid D; \omega; W)$ is the document probability distribution obtained from the masked text of document $D$ (i.e., the document set), that is, after each word is masked, the model predicts the probability of that word in document $D$; $p_{\phi|T}(\cdot \mid x; \omega; L)$ is the abstract distribution (i.e., the original text abstract probability distribution) generated by the abstract model BART from the input article $x$ (i.e., the original Chinese text); and $c$ is the distribution of the extracted key co-occurrence segments, which follows a normal distribution. $x$ represents the original article (i.e., the original Chinese text), $D$ represents the document (i.e., the document set), and $D$ contains a plurality of $x$. The symbol $\cdot$ represents any element in the distribution. $\omega$ is a set of parameters of the probabilistic masked language models BERT and BART, and $L$ is an adjusting parameter thereof.

The remaining symbol of the formula indicates the vocabulary size. $N(0, 1)$ represents a normal distribution with a mean of 0 and a variance of 1.

In the embodiment, the first co-occurrence text is obtained by calculating, through the first formula, the first co-occurrence text from the document probability distribution and the original text abstract probability distribution, so that the key information in the abstract can be accurately predicted, and the measurement is simple.
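The idea of measuring how much of an abstract consists of multiword segments shared with the input document can be illustrated at token level. The sketch below is an assumption-laden analogue, not the first formula itself: the patent computes coverage over probability distributions, whereas this code works directly on token lists in the spirit of extractive fragment coverage. The function names and the `min_len` threshold (set to 2 to reflect "multiword") are hypothetical.

```python
from typing import List, Sequence


def shared_fragments(
    article: Sequence[str], summary: Sequence[str], min_len: int = 2
) -> List[List[str]]:
    """Greedily collect the longest token runs of the summary that also occur in the article."""
    fragments: List[List[str]] = []
    i = 0
    while i < len(summary):
        best = 0
        for j in range(len(article)):
            k = 0
            while (
                i + k < len(summary)
                and j + k < len(article)
                and summary[i + k] == article[j + k]
            ):
                k += 1
            best = max(best, k)
        if best >= min_len:
            fragments.append(list(summary[i : i + best]))
        i += max(best, 1)  # always advance, even when nothing matched
    return fragments


def fragment_coverage(
    article: Sequence[str], summary: Sequence[str], min_len: int = 2
) -> float:
    """Fraction of summary tokens that lie inside a shared fragment."""
    if not summary:
        return 0.0
    covered = sum(len(f) for f in shared_fragments(article, summary, min_len))
    return covered / len(summary)


article = "the cat sat on the mat".split()
summary = "the cat lay on the mat".split()
score = fragment_coverage(article, summary)  # shared runs: "the cat", "on the mat"
```

Because the measure is normalized by the abstract length, it behaves like a precision: an abstract copied entirely from the document scores 1.0, and one with no shared multiword run scores 0.0.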

Optionally, as an embodiment of the present invention, the calculating, according to the document probability distribution, a second co-occurrence text for each processed text abstract probability distribution, to obtain a second co-occurrence text corresponding to each original Chinese text includes:

respectively calculating the document probability distribution and the processed text abstract probability distribution corresponding to each processed text by a second formula to obtain a second co-occurrence text corresponding to each original Chinese text, wherein the second formula is as follows:

where $c \sim N(0, 1)$,

and where the left-hand side of the second formula is the second co-occurrence text corresponding to the $i$-th processed text, $c$ is the key co-occurrence segment distribution, $p_{\phi|K}(\cdot \mid D; \omega; W)$ is the document probability distribution, the distribution that the BART abstract model produces for the $i$-th processed text is the processed text abstract probability distribution corresponding to the $i$-th processed text, $|\cdot|$ is the co-occurrence segment length, and $L$ is an adjusting parameter.

It should be understood that this distribution is the one generated by the abstract model BART from the new article (i.e., the processed text).

In the embodiment, the second co-occurrence texts are obtained by respectively calculating the second co-occurrence texts of the document probability distribution and the processed text abstract probability distribution through the second formula, so that the key information in the abstract can be accurately predicted, the measurement is simple, and the generated text can be more reasonably evaluated.

Optionally, as an embodiment of the present invention, the calculating of the document coverage rate for all the first co-occurrence texts and all the second co-occurrence texts includes:

calculating the document coverage rate of all the first co-occurrence texts and all the second co-occurrence texts through a third formula to obtain the document coverage rate, wherein the third formula is as follows:

where $r^{cov}$ is the document coverage rate; $\mathrm{coverage}(p_{\phi|T}(\cdot \mid x_i; \omega; L),\, p_{\phi|K}(\cdot \mid D; \omega; W))$ is the first co-occurrence text corresponding to the $i$-th original Chinese text, and its counterpart computed on the $i$-th processed text is the second co-occurrence text corresponding to the $i$-th processed text; the two coefficients of variation in the formula are the original Chinese text coefficient of variation and the processed text coefficient of variation; $n$ is the number of original Chinese texts or the number of processed texts; $\sigma(\mathrm{coverage}(p_{\phi|T}(\cdot \mid x; \omega; L),\, p_{\phi|K}(\cdot \mid D; \omega; W)))$ is the standard deviation of the original Chinese text, and its counterpart is the standard deviation of the processed text; $\mu(\mathrm{coverage}(p_{\phi|T}(\cdot \mid x; \omega; L),\, p_{\phi|K}(\cdot \mid D; \omega; W)))$ is the average value of the original Chinese text, and its counterpart is the processed text average.

It should be understood that, in order to improve the coverage rate of the key information uniformly, the invention proposes a multi-document extension: once the key information coverage rate of the input documents reaches a certain level, the EFC reaches its maximum value when the key information coverage rate across the documents is uniformly distributed.

Specifically, the expression of calculating the document coverage is as follows:

where the key information coverage rate vector of the document set is $\mathrm{cov}(\mathrm{cov}(p_{\phi|T}(\cdot \mid x; \omega; L),\, p_{\phi|K}(\cdot \mid D; \omega; W)))$, the sample mean of the vector $x$ is $\mu(x)$, the sample standard deviation is $\sigma(x)$, and the coefficient of variation is the ratio $\sigma(x)/\mu(x)$ of the standard deviation to the mean. For the "normalized" coverage score, the larger its value, the larger the score and the more uniform the coverage over the entire document set. $n$ represents the number of samples $x$ (i.e., the number of original Chinese texts).

In the embodiment, the document coverage rate is calculated by the third formula for all the first co-occurrence texts and all the second co-occurrence texts, so that the coverage rate of the key information can be uniformly improved, the generated texts can be evaluated more reasonably, and the evaluation result is closer to the manual evaluation, so that the evaluation content is more flexible and reasonable.
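The coefficient of variation used above, the ratio of the sample standard deviation to the sample mean of the per-document coverage values, can be computed as in the following sketch. It covers only that ratio; how the third formula turns it into the "normalized" coverage score is not reproduced here, and the function name is hypothetical.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(coverages: Sequence[float]) -> float:
    """Return sigma(x) / mu(x) for a vector of per-document coverage values."""
    mu = statistics.mean(coverages)
    if mu == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    sigma = statistics.stdev(coverages)  # sample standard deviation, needs >= 2 values
    return sigma / mu


uniform = coefficient_of_variation([0.5, 0.5, 0.5])  # identical coverage everywhere
uneven = coefficient_of_variation([0.2, 0.4, 0.6])   # coverage varies across documents
```

A document set whose coverage values are all equal has a coefficient of variation of zero, so smaller ratios correspond to more evenly distributed key information.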

Optionally, as an embodiment of the present invention, the calculating, according to the document coverage rate, the abstract key information coverage rate from each original text abstract probability distribution and the processed text abstract probability distribution corresponding to each original Chinese text, to obtain the abstract key information coverage rate corresponding to each original Chinese text includes:

calculating the summary key information coverage rate of the document coverage rate, the probability distribution of the summary of the original text corresponding to each original Chinese text and the probability distribution of the summary of the processed text corresponding to each original Chinese text respectively through a fourth formula, wherein the fourth formula is as follows:

where the left-hand side of the fourth formula is the abstract key information coverage rate corresponding to the $i$-th original Chinese text, $r^{cov}$ is the document coverage rate, $p_{\phi|T}(\cdot \mid x_i; \omega; L)$ is the original text abstract probability distribution corresponding to the $i$-th original Chinese text, and its counterpart for the $i$-th processed text is the processed text abstract probability distribution corresponding to the $i$-th processed text.

Specifically, in order to ensure that short abstracts are not unfairly rewarded with high coverage scores, the invention regularizes the reward by the length ratio of the prediction to the original document, and the specific expression is as follows:

where the left-hand side is the key information coverage rate, $|p_{\phi|T}(\cdot \mid x; \omega; L)|$ denotes the size of the probability distribution of the generated text (i.e., the original text abstract probability distribution), and its counterpart denotes the size of the reference text probability distribution (i.e., the processed text abstract probability distribution).

In the embodiment, the summary key information coverage rate is calculated by the fourth formula for the document coverage rate, the probability distribution of the original text summary and the summary key information coverage rate of the processed text summary probability distribution, so that short summaries are ensured not to be unfairly rewarded as high coverage rate scores, the reward is normalized, and more accurate evaluation scores are obtained.
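The length regularization can be illustrated with a small sketch. The exact fourth formula is not reproduced in the text, so the multiplicative form below (document coverage rate times the ratio of generated size to reference size) is an assumption chosen only to show the effect; `length_adjusted_coverage` and its parameters are hypothetical names.

```python
def length_adjusted_coverage(
    doc_coverage: float, generated_size: float, reference_size: float
) -> float:
    """Scale a coverage reward by the generated-to-reference size ratio (assumed form)."""
    if reference_size <= 0:
        raise ValueError("reference_size must be positive")
    return doc_coverage * (generated_size / reference_size)


short_abstract = length_adjusted_coverage(0.8, 10, 100)  # very short prediction
long_abstract = length_adjusted_coverage(0.8, 90, 100)   # near reference length
```

Under this assumed form, two abstracts with the same raw coverage are separated by their length, so a very short abstract no longer receives the full reward.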

Optionally, as an embodiment of the present invention, the process of respectively calculating the evaluation scores of the abstract key information coverage rate, the original text abstract probability distribution corresponding to each original Chinese text, and the processed text abstract probability distribution corresponding to each processed text, to obtain the evaluation score corresponding to each Chinese text includes:

calculating the evaluation scores of the summary key information coverage rate, the probability distribution of the summary of the original text corresponding to the original Chinese text and the probability distribution of the summary of the processed text corresponding to the processed text respectively through a fifth formula, and obtaining the evaluation scores corresponding to the Chinese texts, wherein the fifth formula is as follows:

$y_i = \mathrm{sigmoid}(r_i W + b)$,

where $y_i$ is the evaluation score corresponding to the $i$-th original Chinese text, $W$ is the weight, $b$ is the bias, $\mathrm{sigmoid}(\cdot)$ is the sigmoid function, $\omega$ is the parameter of the distribution, $r_i$ is the convex combination corresponding to the $i$-th original Chinese text, the distance term is the distance between the original text abstract probability distribution corresponding to the $i$-th original Chinese text and the processed text abstract probability distribution corresponding to the $i$-th processed text, $\alpha$ is a proportionality coefficient, the coverage term is the abstract key information coverage rate corresponding to the $i$-th original Chinese text, $p_{\phi|T}(\cdot \mid x_i; \omega; L)$ is the original text abstract probability distribution corresponding to the $i$-th original Chinese text, $D$ is the document set, and $\log(\cdot)$ is the logarithm function.

It will be appreciated that the distance of the probability distribution is fused with the coverage reward, and the final evaluation score (i.e., the evaluation score) is then calculated by the sigmoid function.

It should be understood that the generated text is considered good if it covers a high proportion of the key information in the original text, that coverage is uniformly distributed, and the generated text has an appropriate length. The original text is taken as the baseline: if the prediction result is better than the original text, the reward is added; otherwise, it is subtracted.

Specifically, since the distance of the probability distribution and the coverage award are not necessarily in proportion, in order to obtain the final evaluation score (i.e., the evaluation score), convex combination is performed by using a proportionality coefficient, and a specific expression is as follows:

$y = \mathrm{sigmoid}(r W + b)$

where $\alpha$ is a scaling factor; the coverage term is the document key information coverage rate (i.e., the abstract key information coverage rate); $r$ is the convex combination; $\mathrm{sigmoid}(\cdot)$ is the sigmoid function; $W$ is the weight; $b$ is the bias; $y$ is the final evaluation score (i.e., the evaluation score); and $\omega$ is the parameter of the distribution.

The distance term is the distance between $p_{\phi|T}(\cdot \mid x; \omega; L)$ and the processed text abstract probability distribution, and is measured by the KL divergence of the two distributions as follows:

$\log(\cdot)$ represents the logarithm function. $D$ represents a document (i.e., the document set) and $x$ represents an article in the document (i.e., the original Chinese text).

In the embodiment, the evaluation scores are calculated by the fifth formula for the summary key information coverage rate, the probability distribution of the original text summary and the evaluation scores of the processed text summary probability distribution, so that the generated text can be evaluated more reasonably, and the evaluation result is closer to manual evaluation, so that the evaluation content is more flexible and reasonable.
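The scoring step can be sketched end to end as below. Several parts are assumptions made for illustration: the KL divergence is the standard discrete definition $\sum_k p_k \log(p_k / q_k)$, the convex combination is taken as $r = \alpha \cdot \text{distance} + (1 - \alpha) \cdot \text{coverage}$, and $W$ and $b$ are treated as plain scalars. Only $y = \mathrm{sigmoid}(rW + b)$ is taken directly from the text; all function names are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """Standard discrete KL(p || q); eps guards against division by zero."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def evaluation_score(
    p_original: Sequence[float],
    p_processed: Sequence[float],
    coverage: float,
    alpha: float,
    weight: float,
    bias: float,
) -> float:
    """y = sigmoid(r * W + b), with r an assumed convex combination of distance and coverage."""
    distance = kl_divergence(p_original, p_processed)
    r = alpha * distance + (1.0 - alpha) * coverage  # assumed form of the combination
    return sigmoid(r * weight + bias)


neutral = evaluation_score([0.5, 0.5], [0.5, 0.5], coverage=0.0, alpha=0.5, weight=1.0, bias=0.0)
better = evaluation_score([0.5, 0.5], [0.5, 0.5], coverage=0.9, alpha=0.5, weight=1.0, bias=0.0)
```

The coefficient `alpha` puts the two terms on a common footing before the sigmoid maps the result into the interval (0, 1); in a trained system `weight` and `bias` would be learned rather than fixed.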

Optionally, as another embodiment of the present invention, with the development of the BERT pre-trained language model, evaluation methods have also made progress. Like manual evaluation, such a method can handle the case where the generated text and the original text have the same semantics but different wording. Generated text can therefore be evaluated more reasonably by applying an evaluation method based on a pre-trained language model.

Optionally, as another embodiment of the present invention, the document probability distribution, the original Chinese text probability distribution corresponding to each original Chinese text, and the processed text probability distribution corresponding to each processed text may also be obtained by another method, where the calculation formula is as follows:

$p(\cdot, D; \omega; W)$ represents the probability of the joint distribution consisting of the masked text and document $D$ (i.e., the document set). $p(D; \omega; W)$ represents the probability value of document $D$ (i.e., the document set). $p(\cdot, x; \omega; L)$ represents the probability of the joint distribution consisting of the masked words and article $x$ (i.e., the original Chinese text). $p(x; \omega; L)$ represents the probability value of article $x$ (i.e., the original Chinese text). The corresponding joint term for the processed text represents the probability of the joint distribution consisting of the masked text and the processed text, and the corresponding marginal term represents the probability value of the processed text.
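Read together, these definitions pair each joint probability with the probability of the conditioning text, which suggests the standard definition of conditional probability. The relation below is a sketch under that assumption, written in the notation used for the document and original text distributions; it is not a transcription of the original formula.

```latex
p_{\phi|K}(\cdot \mid D; \omega; W) = \frac{p(\cdot, D; \omega; W)}{p(D; \omega; W)},
\qquad
p_{\phi|T}(\cdot \mid x; \omega; L) = \frac{p(\cdot, x; \omega; L)}{p(x; \omega; L)}
```

The processed text distribution would follow the same pattern, with the processed text in place of $x$.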

Optionally, as another embodiment of the present invention, the invention mainly includes: training a pre-trained language model; calculating, from a semantic perspective, the extracted co-occurrence segments of the generated abstract and the original text, the document coverage rate, and the probability distribution area ratio of the generated abstract and the reference abstract; and fusing the distance of the probability distribution with the coverage reward. First, a pre-trained language model is trained for abstract evaluation. Second, the extracted co-occurrence segments of the generated abstract and the original text are calculated, which yields the key information in the original text; the document coverage rate and the probability distribution area ratio of the generated abstract and the reference abstract are then calculated. Finally, the distance of the probability distribution is fused with the coverage reward, and the final evaluation score is calculated through a sigmoid function. The beneficial effects of the invention are that, from a semantic perspective, the scheme calculates the extracted co-occurrence segments of the generated abstract and the original text, the document coverage rate, and the probability distribution area ratio of the generated abstract and the reference abstract, fuses the distance of the probability distribution with the coverage reward, and then calculates the final evaluation score through a sigmoid function.

Optionally, as another embodiment of the present invention, the beneficial effects of the present invention are: from a semantic perspective, the method calculates the extracted co-occurrence segments of the generated abstract and the original text, the document coverage rate, and the probability distribution area ratio of the generated abstract and the reference abstract, fuses the distance of the probability distribution with the coverage reward, and then calculates the final evaluation score through a sigmoid function.

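Alternatively, as another embodiment of the present invention, as shown in Fig. 2, a text abstract evaluation apparatus includes:

the preprocessing module, used for importing a plurality of original Chinese texts and respectively preprocessing each original Chinese text to obtain a processed text corresponding to each Chinese text;

the coverage rate analysis module, used for analyzing the abstract key information coverage rate of all the original Chinese texts and all the processed texts to obtain the abstract key information coverage rate corresponding to each original Chinese text, the original text abstract probability distribution corresponding to each original Chinese text and the processed text abstract probability distribution corresponding to each processed text;

and the abstract evaluation result obtaining module, used for respectively calculating the evaluation scores of the abstract key information coverage rate, the original text abstract probability distribution corresponding to each original Chinese text and the processed text abstract probability distribution corresponding to each processed text to obtain the evaluation score corresponding to each Chinese text, and respectively taking each evaluation score as the text abstract evaluation result of each original Chinese text.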

Optionally, another embodiment of the present invention provides a text abstract evaluation device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the computer program is executed by the processor, the text abstract evaluation method described above is implemented. The device may be a computer or the like.

Optionally, another embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the text abstract evaluation method described above.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division into units is merely a logical division, and an actual implementation may use another division; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed.

Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part thereof that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other various media capable of storing program code.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A text abstract evaluation method is characterized by comprising the following steps:

2. The text abstract evaluation method of claim 1, wherein the step of preprocessing each of the original Chinese texts to obtain a processed text corresponding to each of the original Chinese texts comprises:

3. The text abstract evaluation method of claim 1, wherein the step of analyzing the abstract key information coverage rate of all the original Chinese texts and all the processed texts to obtain the abstract key information coverage rate corresponding to each of the original Chinese texts, the original text abstract probability distribution corresponding to each of the original Chinese texts, and the processed text abstract probability distribution corresponding to each of the processed texts comprises:

predicting a probability distribution for each original Chinese text based on a BART abstract model to obtain the original text abstract probability distribution corresponding to each original Chinese text;

predicting a probability distribution for each processed text based on the BART abstract model to obtain the processed text abstract probability distribution corresponding to each processed text;

calculating a first co-occurrence text for each original text abstract probability distribution according to the document probability distribution to obtain a first co-occurrence text corresponding to each original Chinese text;

calculating a second co-occurrence text for each processed text abstract probability distribution according to the document probability distribution to obtain a second co-occurrence text corresponding to each original Chinese text;

4. The text abstract evaluation method of claim 3, wherein the step of calculating a first co-occurrence text for each original text abstract probability distribution according to the document probability distribution to obtain the first co-occurrence text corresponding to each original Chinese text comprises:

calculating the probability distribution of the document and the probability distribution of the abstract of the original text corresponding to each original Chinese text by a first formula to obtain a first co-occurrence text corresponding to each original Chinese text, wherein the first formula is as follows:

wherein $c \sim N(0, 1)$,

wherein $\mathrm{coverage}(p_{\phi|T}(\cdot \mid x_i; \omega; L),\ p_{\phi|K}(\cdot \mid D; \omega; W))$ is the first co-occurrence text corresponding to the i-th original text abstract, $c$ is the key co-occurrence segment distribution, $p_{\phi|K}(\cdot \mid D; \omega; W)$ is the document probability distribution, $p_{\phi|T}(\cdot \mid x_i; \omega; L)$ is the original text abstract probability distribution corresponding to the i-th original Chinese text, $|\cdot|$ is the co-occurrence segment length, and $L$ is an adjustment parameter,

5. The text abstract evaluation method of claim 3, wherein the step of calculating a second co-occurrence text for each processed text abstract probability distribution according to the document probability distribution to obtain a second co-occurrence text corresponding to each of the original Chinese texts comprises:

wherein $c \sim N(0, 1)$,

wherein ,

is the second co-occurrence text corresponding to the i-th processed text, $c$ is the key co-occurrence segment distribution, $p_{\phi|K}(\cdot \mid D; \omega; W)$ is the document probability distribution,

is the processed text abstract probability distribution corresponding to the i-th processed text, $|\cdot|$ is the co-occurrence segment length, and $L$ is an adjustment parameter,

6. The method according to claim 3, wherein the calculating of the document coverage rate for all the first co-occurrence texts and all the second co-occurrence texts comprises:

wherein ,

wherein ,

wherein ,

wherein $r^{cov}$ is the document coverage rate, $\mathrm{coverage}(p_{\phi|T}(\cdot \mid x_i; \omega; L),\ p_{\phi|K}(\cdot \mid D; \omega; W))$ is the first co-occurrence text corresponding to the i-th original Chinese text,

is the second co-occurrence text corresponding to the i-th processed text,

is the coefficient of variation of the original Chinese text,

is the coefficient of variation of the processed text, $n$ is the number of original Chinese texts or the number of processed texts, $\sigma(\mathrm{coverage}(p_{\phi|T}(\cdot \mid x; \omega; L),\ p_{\phi|K}(\cdot \mid D; \omega; W)))$ is the standard deviation of the original Chinese text,

is the standard deviation of the processed text, $\mu(\mathrm{coverage}(p_{\phi|T}(\cdot \mid x; \omega; L),\ p_{\phi|K}(\cdot \mid D; \omega; W)))$ is the average value of the original Chinese text,

is the average value of the processed text.
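For orientation, the coefficient of variation named above is conventionally the standard deviation divided by the mean. The sketch below computes it for a list of per-text coverage values. It assumes the population standard deviation and uses a hypothetical function name, `coefficient_of_variation`; how the claim combines the two coefficients into the document coverage rate is given by its own formulas and is not reproduced here.

```python
import statistics


def coefficient_of_variation(values):
    """Return sigma / mu for a non-empty list of coverage values.

    Assumption: the population standard deviation is used; the source
    does not state whether a sample or population estimate is intended.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean


if __name__ == "__main__":
    # Hypothetical coverage values for n = 4 texts.
    original_coverages = [0.62, 0.58, 0.71, 0.66]
    processed_coverages = [0.40, 0.35, 0.52, 0.47]
    print(coefficient_of_variation(original_coverages))
    print(coefficient_of_variation(processed_coverages))
```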

7. The text abstract evaluation method of claim 3, wherein the step of calculating, according to the document coverage rate, the key information coverage rate for each original text abstract probability distribution and for the processed text abstract probability distribution corresponding to each of the original Chinese texts, to obtain the abstract key information coverage rate corresponding to each of the original Chinese texts, comprises:

wherein ,

8. The text abstract evaluation method of claim 1, wherein the step of calculating the evaluation score from the abstract key information coverage rate, the original text abstract probability distribution corresponding to the original Chinese text, and the processed text abstract probability distribution corresponding to the processed text, to obtain the evaluation score corresponding to the original Chinese text, comprises:

calculating, through a fifth formula, the evaluation score from the abstract key information coverage rate, the original text abstract probability distribution corresponding to the original Chinese text, and the processed text abstract probability distribution corresponding to the processed text, to obtain the evaluation score corresponding to the original Chinese text, wherein the fifth formula is as follows:

$y_i = \mathrm{sigmoid}(r_i W + b)$,

wherein ,

wherein ,

is the abstract key information coverage rate corresponding to the i-th original Chinese text, $p_{\phi|T}(\cdot \mid x_i; \omega; L)$ is the original text abstract probability distribution corresponding to the i-th original Chinese text,

$D$ is the document set, and $\log(\cdot)$ is the logarithm function.
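The fifth formula, $y_i = \mathrm{sigmoid}(r_i W + b)$, can be sketched as below. The sketch assumes that $r_i$ is already available as a small feature vector and that $W$ and $b$ are learned parameters; the numeric values and the function name `evaluation_score` are hypothetical, and how $r_i$ is built from the coverage rate and the probability distributions is left to the claim's own definitions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def evaluation_score(r_i, weights, bias):
    """Compute y_i = sigmoid(r_i W + b) for one text.

    r_i and weights are equal-length sequences of floats; bias is a float.
    """
    if len(r_i) != len(weights):
        raise ValueError("r_i and weights must have the same length")
    z = sum(r * w for r, w in zip(r_i, weights)) + bias
    return sigmoid(z)


if __name__ == "__main__":
    # Hypothetical fused features and parameters, for illustration only.
    r_i = [0.8, -0.3]
    weights = [1.5, 2.0]
    bias = 0.1
    print(evaluation_score(r_i, weights, bias))
```

Because the sigmoid is monotonic and bounded, a larger fused value always yields a higher score, and scores for different texts stay on a common 0 to 1 scale.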

9. A text abstract evaluation apparatus, comprising:

10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the text abstract evaluation method according to any one of claims 1 to 8.