CN116776868B - Evaluation method of model generation text and computer equipment - Google Patents



Publication number
CN116776868B
CN116776868B (application CN202311075044.6A)
Authority
CN
China
Prior art keywords
text
fingerprint
index
model
gene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311075044.6A
Other languages
Chinese (zh)
Other versions
CN116776868A (en)
Inventor
冯好国
徐青伟
严长春
裴非
范娥媚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhiguagua Tianjin Big Data Technology Co ltd
Beijing Zhiguagua Technology Co ltd
Original Assignee
Zhiguagua Tianjin Big Data Technology Co ltd
Beijing Zhiguagua Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhiguagua Tianjin Big Data Technology Co ltd, Beijing Zhiguagua Technology Co ltd filed Critical Zhiguagua Tianjin Big Data Technology Co ltd
Priority to CN202311075044.6A
Publication of CN116776868A
Application granted
Publication of CN116776868B
Active legal status
Anticipated expiration legal-status Critical


Abstract

The application discloses an evaluation method for model-generated text, and computer equipment, which do not depend on labels and are suitable for a production environment. The evaluation method performs a comprehensive evaluation after the three indexes, gene, readability and fingerprint, are evaluated separately. The gene index measures the semantic relevance and homology between the model-generated text and the input text; the readability index measures how readable the model-generated text is, based on the average length of sentences segmented at punctuation and on text repetition degradation; and the fingerprint index measures how consistent the semantic feature distribution of the model-generated text is with the training-set labels. Specifically, a fingerprint extraction network model is trained in advance based on a triplet twin network while a fingerprint library is generated; the model-generated text is then input into the fingerprint extraction network model to obtain its fingerprint, and a distance measurement against the fingerprint library determines the value of the fingerprint index.

Description

Evaluation method of model generation text and computer equipment
Technical Field
The application belongs to the technical field of document data deep processing, and particularly relates to an evaluation method and computer equipment for model generation text.
Background
Patent deep processing is a patent rewriting technology that uses text generation to obtain high added value from the characteristics of patent literature. At present, patent deep processing mainly comprises title deep processing, abstract deep processing, keyword indexing, IPC classification and the like; patent abstract deep processing inputs a long text into a trained model to generate an abstract text (the model-generated text). It is therefore necessary to perform a proper evaluation to provide an objective, uniform, quantitative measure for model-generated text.
Early model-generated text evaluations employed information retrieval metrics such as recall, precision and F-value, comparing the model-generated text with manually written text and measuring the commonality between them. The main problem with this traditional evaluation method is that comparing the model-generated text with a single manually written text is too subjective.
Currently, the standard metrics against which summarization systems are developed are typically ROUGE and BLEU. The main task of BLEU (Bilingual Evaluation Understudy) is to compare the n-gram units of the model translation with the n-gram units of the reference translation (i.e. the manually written translation) and count the number of matches, which are position-independent; the more matches, the better the model translation. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) compares the model-generated text with a tag, i.e. a manually composed text, and calculates the number of overlapping units, such as n-grams, word sequences and word pairs, between the model-generated text to be evaluated and the ideal human-composed text.
AutoSummENG, presented by Giannakopoulos et al., is based on an n-gram graph and considers the co-occurrence of word n-grams or character n-grams within a window. In the AutoSummENG method, all the different forms of a word are always converted into its lemma.
ROUGE and BLEU, while effective for overall system ranking, are still lexical in nature and, because they depend on tags, are not suitable for production environments lacking tags. AutoSummENG is based on n-grams and considers co-occurrence of word n-grams or character n-grams within a window; it has a higher correlation with human judgment than ROUGE.
Disclosure of Invention
Based on the above, aiming at the technical problems, a new evaluation method and computer equipment for model generation text are provided, which are not dependent on labels and are suitable for production environments.
The application provides an evaluation method of a model generation text, which comprises the following steps:
generating a text for the model to be evaluated, and respectively calculating a gene index, a readability index and a fingerprint index of the text;
the gene index measures the semantic relevance and homology between the model-generated text and the input text, and comprises three factors: text length, relevance and mutual information; the relevance is designed and realized based on word frequency and the Pearson correlation coefficient, and the mutual information characterizes the degree of dependence between the model-generated text and the input text;
the readability index measures how readable the model-generated text is, based on the average length of sentences segmented at punctuation and on text repetition degradation;
the fingerprint index measures how consistent the semantic feature distribution of the model-generated text is with the training-set labels; specifically, a fingerprint extraction network model is trained in advance based on a triplet twin network while a fingerprint library is generated, the model-generated text is then input into the fingerprint extraction network model to obtain its fingerprint, and a distance measurement against the fingerprint library determines the value of the fingerprint index;
and carrying out comprehensive evaluation based on the calculated gene index, the calculated readability index and the calculated fingerprint index value to obtain a final evaluation result.
Optionally, the gene index is formulated as follows:
V_gene = log2(len_candidate) × corr × log2(mi)
wherein: V_gene represents the gene index value; len_candidate represents the number of characters of the model-generated text; corr represents the relevance of the model-generated text to the input text; mi denotes the mutual information between the model-generated text and the input text.
Optionally, the readability index is defined in terms of the following quantities:
V_read, the value of the readability index; total_sent, the number of sentences of the model-generated text, where a sentence is a character string obtained by punctuation segmentation; total_char, the number of characters of the model-generated text, punctuation excluded; total_dup, the number of repetitions that occur; len_generate, the length of the text.
Optionally, the fingerprint extraction network model is trained based on the triplet twin network while the fingerprint library is generated, specifically comprising:
firstly, preprocessing a data set to construct three data sets of an original label, a copy label and an irrelevant label;
secondly, constructing three networks sharing weight parameters, and respectively inputting three data sets of the original label, the copy label and the irrelevant label;
and updating network parameters according to the triplet loss and the classification loss, and obtaining a training set tag fingerprint library.
Further, the triplet loss function L_tr is:
L_tr = (1/N) · Σ_{i=1}^{N} max( d(x_f, x_f+) − d(x_f, x_f−) + margin, 0 )
wherein: N represents the total number of triplets; max(·, 0) takes the maximum with zero; d(·,·) denotes the Euclidean distance; x_f, x_f+ and x_f− respectively denote the features obtained by mapping x, x+ and x− through the network; margin represents the interval of the triplet loss function, used to control the distance between positive and negative samples.
Further, the classification loss adopts a cross-entropy loss function L_ce, specifically:
L_ce = −(1/N) · Σ_{i=1}^{N} Σ_{j=1}^{M} p(x_ij) · log q(x_ij)
wherein: N represents the total number of triplets; M represents the total number of label categories and takes the value 2; p represents the true classification probability; q represents the predicted classification probability; x_ij represents the probability that the original text x in the i-th triplet belongs to (in the expression for the true classification probability p) or is predicted as (in the expression for the predicted classification q) label class j.
Further, the triplet loss and the cross-entropy loss are combined, and the optimized fingerprint extraction network model loss function L is:
L = λ · L_tr + (1 − λ) · L_ce
wherein: λ represents a variable weight parameter; L_tr represents the triplet loss function; L_ce represents the cross-entropy loss function.
Optionally, the distance measure uses the Hamming distance H, defined as follows:
H(b_q, b) = Σ_{n=1}^{l} ( b_q(n) ⊕ b(n) ),  b ∈ B
wherein: l represents the fingerprint length; b_q represents the fingerprint of the text to be evaluated; B represents the training-set label fingerprint library; n indexes the bits of the fingerprint; ⊕ represents the exclusive-or operation;
the fingerprint index function V_fingerprint is determined based on the minimum Hamming distance, defined as follows:
V_fingerprint = 1 − min_{b∈B} H(b_q, b) / l
optionally, the comprehensive evaluation is performed to obtain a final evaluation result, and one of the following comprehensive evaluation methods is selected to be executed:
the first comprehensive evaluation method defines the following formula:
wherein: v (V) eval Representing a text evaluation value; v (V) gene Is the value of the gene index, V read Is the value of the readability index, V fingerprint Is the value of the fingerprint index; n is equal to the number of the factors in the root number;
the second comprehensive evaluation method defines the formula as follows:
wherein: w (W) gene Weight of gene index, V gene A value representing a gene index; w (W) read Weights representing readability indicators, V read A value representing a readability indicator; w (W) fingerprint Weights representing fingerprint indicators, V fingerprint A value representing a fingerprint index, and satisfies:
if the gene index, the readability index and the fingerprint index do not need to be respectively set with weights or the weights can not be calculated, a first comprehensive evaluation method is selected; if the weights need to be set or given respectively, a second comprehensive evaluation method is selected.
The application provides computer equipment comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, realizes the steps of the above evaluation method of model-generated text.
The application has at least the following beneficial effects:
the evaluation method of the model generated text is independent of labels, is suitable for production environment, and is used for flexible and selectable comprehensive evaluation after three aspects of homology correlation degree with the original patent literature, human reading degree, semantic feature distribution consistency of training set labels and the like are respectively evaluated through three indexes such as genes, readability, fingerprints and the like.
The relevance in the gene index is designed and realized based on word frequency and the Pearson correlation coefficient; it is simple to implement and fast to compute, and the returned P value can be checked against a chosen significance level, giving it statistical-theory support. The mutual information in the gene index reflects the degree of dependence between the model-generated text and the input text: if the mutual information is much greater than 0, the model-generated text is highly related to the input text; if the mutual information equals 0, the model-generated text and the input text are independent of each other.
The readability index measures how readable the model-generated text is; it is defined from two aspects, the average length of sentences segmented at punctuation and text repetition degradation, and its design is simple to implement.
The fingerprint index is based on a fingerprint extraction network model trained with a triplet twin network, alongside which a fingerprint library is generated. The model-generated text is input into the fingerprint extraction network model to obtain its fingerprint, the minimum Hamming distance between this fingerprint and the fingerprints in the library is computed, and the fingerprint index value is then calculated, measuring how consistent the semantic feature distribution of the model-generated text is with the training-set labels.
Drawings
FIG. 1 is a schematic diagram of an evaluation method of model generation text according to an embodiment of the present application;
FIG. 2 is a text fingerprint extraction block diagram of a ternary twin network in one embodiment of the application;
FIG. 3 is a schematic diagram of a triplet network architecture in accordance with one embodiment of the present application;
FIG. 4 illustrates three views of the integrated evaluation in one embodiment of the application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In one embodiment, as shown in fig. 1, there is provided an evaluation method of model generation text, including: generating a text for the model to be evaluated, and respectively calculating a gene index, a readability index and a fingerprint index of the text; and carrying out comprehensive evaluation based on the calculated gene index, the calculated readability index and the calculated fingerprint index value to obtain a final evaluation result.
1) Gene index
The gene index measures the semantic relevance and homology between the model-generated text and the input text, and comprises three factors: text length, relevance and mutual information. The text length refers to the number of generated text characters; the relevance refers to the relevance of the generated text to the input text, and the calculation formula of the relevance corr is designed based on word frequency and the Pearson correlation coefficient:
corr = ( N·Σxy − Σx·Σy ) / sqrt( [ N·Σx² − (Σx)² ] · [ N·Σy² − (Σy)² ] )   (1)
wherein: n represents the size of the dictionary constructed after inputting text and all generated text word segmentation, namely the vocabulary number; x represents a word frequency list of the input text in dictionary order; y represents a word frequency list of the generated text of which the correlation degree is to be calculated, in dictionary order. Considering the performance improvement, the following equivalent and better-performance correlation calculation formula is given:
corr = Σ (x − m_x)(y − m_y) / sqrt( Σ (x − m_x)² · Σ (y − m_y)² )   (2)
wherein: x represents a word frequency list of the input text in dictionary order; m is m x Representing word frequency average values of input texts; y represents a word frequency list of a model generation text of the correlation degree to be calculated, and the word frequency list is in dictionary order; m is m y The model representing the degree of correlation to be calculated generates a word frequency mean of the text. The scipy.stats.pearsonr packet is implemented based on equation (2) and returns the P value for verification, so the application directly calls scipy.stats.pearsonr to calculate the correlation.
The mutual information refers to the average mutual information between the model-generated text and the input text, i.e. the statistical average over their joint probability space. Average mutual information has the properties of non-negativity, reciprocity (symmetry) and extremum. Non-negativity means that, given the generated text, part of the uncertainty about the input text is in general always eliminated. Reciprocity (symmetry) means that the amount of information obtained from the model-generated text about the input text equals the amount of information obtained from the input text about the model-generated text. Extremum means that the amount of information obtained from one event about another can at most be the average self-information of that other event and does not exceed the amount of information the other event itself contains. The application calculates the mutual information to reflect the degree of dependence between the model-generated text and the input text: if the mutual information is much greater than 0, the model-generated text is highly related to the input text; if the mutual information equals 0, the model-generated text and the input text are independent of each other. The calculation formula of the mutual information is defined as follows:
mi = max( 0, Σ_x Σ_y p(x, y) · log2( p(x, y) / ( p(x) · p(y) ) ) )   (3)
wherein: mi represents the mutual information between the model-generated text and the input text; max takes the maximum of the comma-separated values in the brackets, ensuring a non-negative result; x ranges over the input text word list; y ranges over the model-generated text word list; p(x, y) represents the joint probability distribution of the input text and the model-generated text; p(x) represents the probability distribution of the input text; p(y) represents the probability distribution of the model-generated text.
A smoothed value calculation formula used in calculating the probability distribution of the input text:
(4)
wherein the formula involves the accumulated word-frequency sum of the input text and the number of vocabulary items that appear in the generated text but not in the input text.
The smoothed value calculation formula used when calculating the probability distribution of the generated text:
(5)
wherein the formula involves the accumulated word-frequency sum of the model-generated text and the number of words that appear in the input text but not in the model-generated text.
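The average mutual information of equation (3) can be sketched as follows. How the patent builds the joint distribution of input and generated words is not spelled out (its smoothing formulas are image-only), so this generic version assumes precomputed joint counts as input:

```python
import math

def mutual_information(joint_counts):
    # mi = max(0, sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) * p(y)))),
    # computed from a dict {(x, y): count}; marginals derive from the joint.
    total = sum(joint_counts.values())
    px, py = {}, {}
    for (x, y), c in joint_counts.items():
        px[x] = px.get(x, 0) + c
        py[y] = py.get(y, 0) + c
    mi = sum((c / total) *
             math.log2((c / total) / ((px[x] / total) * (py[y] / total)))
             for (x, y), c in joint_counts.items())
    return max(0.0, mi)  # clamp guarantees the non-negativity the text requires
```

An independent joint distribution yields mi = 0, and a perfectly dependent binary one yields mi = 1 bit, matching the non-negativity and dependence interpretation given above.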
The gene index is designed based on the factors of length, relevance and mutual information; the defining formula is:
V_gene = log2(len_candidate) × corr × log2(mi)   (6)
wherein: v (V) gene The larger the index value of the expressed gene is, the better; len (len) candidate Representing the number of characters of the text generated by the model; corr represents the relevance of model-generated text to the input text, see equation (1) (2); mi represents the mutual information of the model generation text and the input text, see formula (3).
2) Readability index
The readability index measures how readable the model-generated text is; the readability index V_read is defined from the average length of sentences segmented at punctuation and from text repetition degradation, as follows:
(7)
wherein: total sent Representing the number of sentences of the text generated by the model, wherein the sentences refer to character strings obtained by punctuation segmentation; total char Representing the number of characters of the text generated by the model, except for punctuation; total dup Indicating the number of times of repetition, and suggesting a value not less than 3; len (len) generate Representing the length of the text. In the middle ofModeling the influence of the length of sentences segmented by text punctuation on the readability, wherein the longer the readability is, the worse the readability is; />Modeling the influence of the repetition number on the readability, wherein the readability is worse as the repetition number is larger; />Modeling the influence of the average length of the repeated character strings on the readability, wherein the shorter the average length of the repeated character strings is, the worse the readability is; />Modeling is the effect of text repetition degradation on readability.
3) Fingerprint index
The fingerprint index measures how consistent the semantic feature distribution of the model-generated text is with the training-set labels. In the triplet twin network training stage, the data set is first preprocessed to construct three data sets: original label, copy label and irrelevant label; next, three networks sharing weight parameters are constructed, into which the original-label, copy-label and irrelevant-label data sets are respectively input; network parameters are then updated according to the triplet loss and the classification loss, and the training-set label fingerprint library is obtained. In the model application stage, the trained fingerprint extraction network is applied to obtain the fingerprint of the text to be evaluated, the Hamming distances between it and the fingerprints in the training-set label fingerprint library are calculated, and the fingerprint index value is calculated from the minimum Hamming distance. The triplet twin network text fingerprint extraction framework is shown in fig. 2.
(1) Convolution kernel
Since a text character sequence is one-dimensional, the text becomes a two-dimensional numerical matrix after character vectorization. The convolution kernel size is dim×win, where dim represents the character vector dimension and win represents the window size.
(2) Triplet network structure
The triplet network takes a group of triplet samples as input and can simultaneously learn the distance relations between the original sample and the positive and negative samples; a triplet loss function constrains these distances so that the features of the original text and of its copy text become closer, while the feature distance between the original text and other irrelevant texts is enlarged. The whole framework is therefore constructed with a triplet network sharing weight parameters, as shown in fig. 3: the network weight parameters of the three branches are shared, and the inputs are respectively the original text x, the copy text x+ and the irrelevant text x−, whose text features are extracted through the network. The copy text x+ is obtained by synonym replacement on the original text x. The triplet loss function measures the feature distances and drives the network parameter optimization. The triplet loss function L_tr is expressed as:
L_tr = (1/N) · Σ_{i=1}^{N} max( d(x_f, x_f+) − d(x_f, x_f−) + margin, 0 )   (8)
wherein: n represents the total number of triplet groups; max (d) represents a maximum value; d (,) represents the calculated Euclidean distance; x is x fAnd->Respectively indicate->、/>And->The characteristics obtained through network mapping; margin represents the interval of the triplet loss function for controlling the distance of positive and negative samples.
Minimizing the triplet loss reduces the feature distance between the original text and the copy text and increases the feature distance between the original text and irrelevant text. To make the fingerprint sensitive to the overall distribution information of sample features and to enhance its uniqueness, the application adopts a cross-entropy loss function as a supplementary classification loss; the cross-entropy loss function L_ce is defined as follows:
L_ce = −(1/N) · Σ_{i=1}^{N} Σ_{j=1}^{M} p(x_ij) · log q(x_ij)   (9)
wherein: n represents the total number of triplet groups; m represents the total number of label categories, and M takes a value of 2 because the classification is used for judging whether the labels are the same type of the original text or the duplicate text; p represents the true classification probability; q represents the prediction classification probability, x ij Representing that the original text x in the ith triplet belongs to (applies to the expression calculated by the true classification probability p) or predicts(the expression applicable to the prediction class q) is the probability of the tag j class.
Combining the triplet loss and the cross-entropy loss, the optimized network model loss function L is defined as:
L = λ · L_tr + (1 − λ) · L_ce   (10)
wherein:representing variable weight parameters; l (L) tr Representing a triplet loss function; l (L) ce Representing a cross entropy loss function.
(3) Training phase
The backbone network contains five 2D convolutional layers, one pooling layer and two fully connected layers, trained as a triplet twin network sharing weight parameters. The activation function of the last fully connected layer is set to the hyperbolic tangent tanh, and the other fully connected layers use the rectified linear unit ReLU; the tanh activation makes each input's output after the whole network structure a continuous real value in (−1, 1). The hash function then quantizes it to 0 or 1, which reduces the storage space occupied by the text fingerprint and improves the efficiency of Hamming distance calculation. Model training and fingerprint extraction can be formalized as:
b_i = sign( f_w(x_i) )   (11)
wherein: given a text x i Order-makingTo learn the proper parameter w after passing through the 2D network, the feature is quantized into a binary text fingerprint b through a function sign i The fingerprint length is set to be 16-80 bits, and the application takes 64bits.
(4) Application phase
The trained 2D model of one branch is taken as the fingerprint extraction network. First, fingerprints are extracted from all training-set labels to obtain the fingerprint library; then a fingerprint is extracted from each text to be evaluated through the trained fingerprint extraction network; next, the distance measurement and the fingerprint index value calculation are performed against the fingerprint library. The fingerprint metric uses the Hamming distance H, defined as follows:
H(b_q, b) = Σ_{n=1}^{l} ( b_q(n) ⊕ b(n) ),  b ∈ B   (12)
wherein: l represents the fingerprint length; b_q represents the fingerprint of the text to be evaluated; B represents the training-set label fingerprint library; n indexes the bits of the fingerprint; ⊕ represents the exclusive-or operation;
fingerprint index V designed based on minimum Hamming distance fingerprint The definition is as follows:
(13)
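The Hamming distance of equation (12) over bit lists is straightforward; the normalization 1 − H_min/l used in `fingerprint_index` is an assumed reading of the image-only equation (13), chosen so that a perfect library match scores 1:

```python
def hamming(b_q, b):
    # Number of differing bits (bitwise XOR) between equal-length fingerprints.
    return sum(x ^ y for x, y in zip(b_q, b))

def fingerprint_index(b_q, library, l=64):
    # V_fingerprint from the minimum Hamming distance between the fingerprint
    # of the text to be evaluated and the training-set label fingerprint
    # library; 1 - H_min/l is an assumed normalization of equation (13).
    h_min = min(hamming(b_q, b) for b in library)
    return 1.0 - h_min / l
```

With 4-bit toy fingerprints, an exact library match scores 1.0, and a single differing bit scores 0.75.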
comprehensive evaluation is performed based on the gene index, the readability index and the fingerprint index, and three evaluation perspectives shown in fig. 4 are embodied. The present embodiment provides two comprehensive evaluation methods:
the first comprehensive evaluation method defines the following formula:
V_eval = ( V_gene × V_read × V_fingerprint )^(1/n)   (14)
wherein: V_eval represents the text evaluation value; V_gene is the value of the gene index, V_read is the value of the readability index, and V_fingerprint is the value of the fingerprint index; n represents the degree of the root and equals the number of factors under the root sign, here 3.
The second comprehensive evaluation method defines the formula as follows:
V_eval = W_gene · V_gene + W_read · V_read + W_fingerprint · V_fingerprint   (15)
wherein: W_gene represents the weight of the gene index and V_gene the value of the gene index; W_read represents the weight of the readability index and V_read its value; W_fingerprint represents the weight of the fingerprint index and V_fingerprint its value; and the weights satisfy: W_gene + W_read + W_fingerprint = 1.
if the gene index, the readability index and the fingerprint index do not need to be respectively set with weights or cannot be calculated, suggesting to select a first evaluation method; if weights need to be set separately or weights can be given, a second evaluation method is recommended.
In one embodiment, computer equipment is also provided, which may be a server or a client device; all or part of the flow of the method in the foregoing embodiments is realized by a computer program running on it.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features are described; however, as long as there is no contradiction among the combinations, they should be considered within the scope of this description.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (8)

1. A method of evaluating model-generated text, comprising:
generating a text for the model to be evaluated, and respectively calculating a gene index, a readability index and a fingerprint index of the text;
the gene index measures the semantic relevance and homology between the model-generated text and the input text, and comprises three factors: text length, relevance and mutual information; the relevance is designed and realized based on word frequency and the Pearson correlation coefficient, and the mutual information characterizes the degree of dependence between the model-generated text and the input text;
the readability index measures how readable the model-generated text is, based on the average length of sentences segmented at punctuation and on text repetition degradation;
the fingerprint index measures how consistent the semantic feature distribution of the model-generated text is with the training-set labels; specifically, a fingerprint extraction network model is trained in advance based on a triplet twin network while a fingerprint library is generated, the model-generated text is then input into the fingerprint extraction network model to obtain its fingerprint, and a distance measurement against the fingerprint library determines the value of the fingerprint index;
based on the calculated gene index, readability index and fingerprint index values, carrying out comprehensive evaluation to obtain a final evaluation result;
the gene index is expressed as follows:
V_gene = log2(len_candidate) × corr × log2(mi)
wherein: V_gene represents the gene index value; len_candidate represents the number of characters of the model-generated text; corr represents the relevance of the model-generated text to the input text; mi represents the mutual information between the model-generated text and the input text;
the formula of the readability index is as follows:
wherein: v (V) read Is the value of the readability index; total sent Representing the number of sentences of the text generated by the model, wherein the sentences refer to character strings obtained by punctuation segmentation; total char Representing the number of characters of the text generated by the model, except for punctuation; total dup Indicating the number of repetitions that occur; len (len) generate Representing the length of the text.
2. The method for evaluating model-generated text according to claim 1, wherein training the fingerprint extraction network model based on the triplet Siamese network while generating the fingerprint library specifically comprises:
first, preprocessing the data set to construct three data sets: original labels, copied labels and irrelevant labels;
second, constructing three networks sharing weight parameters, into which the original-label, copied-label and irrelevant-label data sets are respectively input;
and updating the network parameters according to a triplet loss and a classification loss, thereby obtaining a training-set label fingerprint library.
3. The method for evaluating model-generated text according to claim 2, wherein the triplet loss function L_tr is:
L_tr = (1/N) Σ_{i=1}^{N} max(d(x_f, x_f+) − d(x_f, x_f−) + margin, 0)
wherein: N represents the total number of triplets; max(·) takes the maximum value; d(·,·) computes the Euclidean distance; x_f, x_f+ and x_f− respectively represent the features obtained by mapping x, x+ and x− through the network; margin represents the interval of the triplet loss function, used to control the distance between positive and negative samples.
4. The method for evaluating model-generated text according to claim 3, wherein the classification loss employs a cross-entropy loss function L_ce, specifically:
L_ce = −(1/N) Σ_{i=1}^{N} Σ_{j=1}^{M} p(x_ij) log q(x_ij)
wherein: N represents the total number of triplets; M represents the total number of label categories, taking the value 2; p represents the true classification probability; q represents the predicted classification probability; x_ij represents the probability that the original text x in the i-th triplet belongs to, or is predicted as, the label-j category.
5. The method for evaluating model-generated text according to claim 4, wherein the triplet loss and the cross-entropy loss are combined, and the loss function L of the optimized fingerprint extraction network model is:
L = γ·L_tr + (1 − γ)·L_ce
wherein: γ represents a variable weight parameter; L_tr represents the triplet loss function; L_ce represents the cross-entropy loss function.
6. The method for evaluating model-generated text according to claim 2, wherein the distance measure uses the Hamming distance H, defined as follows:
H(b_q, b) = Σ_{i=1}^{n} (b_q,i ⊕ b_i)
wherein: L represents the fingerprint length; b_q represents the fingerprint of the text to be evaluated; b represents a fingerprint in the training-set label fingerprint library; n represents the number of bits of the fingerprint; ⊕ represents the exclusive-or operation;
the fingerprint index function V_fingerprint is determined based on the minimum Hamming distance.
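A sketch of the distance measure and one plausible reading of the fingerprint index (the normalisation by the fingerprint length L is an assumption, since the exact definition of V_fingerprint is not reproduced in this text):

```python
def hamming_distance(b_q: str, b: str) -> int:
    # Bit-by-bit XOR count between two equal-length binary fingerprints.
    return sum(c1 != c2 for c1, c2 in zip(b_q, b))

def fingerprint_index(b_q: str, library: list) -> float:
    # Assumed form: normalise the minimum Hamming distance over the
    # training-set label fingerprint library by the fingerprint length L,
    # so that 1.0 indicates an exact match in the library.
    L = len(b_q)
    return 1.0 - min(hamming_distance(b_q, b) for b in library) / L
```

Under this reading, the index rises toward 1 as the generated text's fingerprint approaches some training-set label fingerprint.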
7. The method for evaluating model-generated text according to claim 1, wherein the comprehensive evaluation yielding the final evaluation result is performed by selecting one of the following two methods:
the first comprehensive evaluation method is defined by the formula:
V_eval = (V_gene × V_read × V_fingerprint)^(1/n)
wherein: V_eval represents the text evaluation value; V_gene is the value of the gene index, V_read is the value of the readability index, and V_fingerprint is the value of the fingerprint index; n equals the number of factors under the root sign;
the second comprehensive evaluation method is defined by the formula:
V_eval = W_gene × V_gene + W_read × V_read + W_fingerprint × V_fingerprint
wherein: W_gene represents the weight of the gene index and V_gene the value of the gene index; W_read represents the weight of the readability index and V_read its value; W_fingerprint represents the weight of the fingerprint index and V_fingerprint its value, satisfying: W_fingerprint + W_gene + W_read = 1;
if no separate weights need to be set for the gene, readability and fingerprint indexes, or the weights cannot be determined, the first comprehensive evaluation method is selected; if weights are to be set or are given for each index, the second comprehensive evaluation method is selected.
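The two comprehensive evaluation methods of claim 7 can be sketched directly (illustrative; the geometric-mean form follows the n-th-root-of-the-product reading of the first method, with n = 3):

```python
def eval_geometric(v_gene, v_read, v_fingerprint):
    # First method: n-th root of the product of the n index values (n = 3).
    return (v_gene * v_read * v_fingerprint) ** (1.0 / 3)

def eval_weighted(v_gene, v_read, v_fingerprint,
                  w_gene, w_read, w_fingerprint):
    # Second method: weighted sum; the three weights must add up to 1.
    assert abs(w_gene + w_read + w_fingerprint - 1.0) < 1e-9
    return (w_gene * v_gene + w_read * v_read
            + w_fingerprint * v_fingerprint)
```

The geometric form needs no weights, matching the claim's guidance to use it when weights cannot be determined.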
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
CN202311075044.6A 2023-08-25 2023-08-25 Evaluation method of model generation text and computer equipment Active CN116776868B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311075044.6A CN116776868B (en) 2023-08-25 2023-08-25 Evaluation method of model generation text and computer equipment


Publications (2)

Publication Number Publication Date
CN116776868A CN116776868A (en) 2023-09-19
CN116776868B true CN116776868B (en) 2023-11-03

Family

ID=87989926


Country Status (1)

Country Link
CN (1) CN116776868B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103377451A (en) * 2012-04-20 2013-10-30 马越 Patent assessment system and method
CN109271629A (en) * 2018-09-07 2019-01-25 中山大学 Production text snippet method based on intensified learning
KR102019207B1 (en) * 2018-11-12 2019-09-06 주식회사 와이즈넛 Apparatus and method for assessing data quality for text analysis
CN111914084A (en) * 2020-01-09 2020-11-10 北京航空航天大学 Deep learning-based emotion label text generation and evaluation system
CN114579720A (en) * 2022-02-17 2022-06-03 中国长江三峡集团有限公司 Hydropower project progress intelligent assessment method based on text mining
CN115983877A (en) * 2023-01-09 2023-04-18 大连理工大学 Patent value evaluation method based on depth map and semantic learning

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8145639B2 (en) * 2004-08-11 2012-03-27 Allan Williams System and methods for patent evaluation
TWI608367B (en) * 2012-01-11 2017-12-11 國立臺灣師範大學 Text readability measuring system and method thereof
US9720901B2 (en) * 2015-11-19 2017-08-01 King Abdulaziz City For Science And Technology Automated text-evaluation of user generated text


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BERTScore: Evaluating text generation with BERT; Tianyi Zhang et al.; arXiv:1904.09675v3 [cs.CL]; pp. 1-43 *
Automatic generation of semi-structured text for bidding documents; Liu Jinshuo et al.; Computer Engineering; Vol. 49, No. 3; pp. 67-72 *


Similar Documents

Publication Publication Date Title
JP7169369B2 (en) Method, system for generating data for machine learning algorithms
CN109960724B (en) Text summarization method based on TF-IDF
JP5338238B2 (en) Automatic ontology generation using word similarity
JP4774073B2 (en) Methods for document clustering or categorization
JP5216063B2 (en) Method and apparatus for determining categories of unregistered words
JP6838092B2 (en) Text paraphrase method, device, server, and storage medium
CN108519971B (en) Cross-language news topic similarity comparison method based on parallel corpus
JP2008084064A (en) Text classification processing method, text classification processing device and text classification processing program
CN110874536B (en) Corpus quality evaluation model generation method and double-sentence pair inter-translation quality evaluation method
CN111985228A (en) Text keyword extraction method and device, computer equipment and storage medium
CN110222192A (en) Corpus method for building up and device
CN110674301A (en) Emotional tendency prediction method, device and system and storage medium
CN111753082A (en) Text classification method and device based on comment data, equipment and medium
CN111325018A (en) Domain dictionary construction method based on web retrieval and new word discovery
CN114611491A (en) Intelligent government affair public opinion analysis research method based on text mining technology
CN113239697B (en) Entity recognition model training method and device, computer equipment and storage medium
Zaware et al. Text summarization using tf-idf and textrank algorithm
CN112016294B (en) Text-based news importance evaluation method and device and electronic equipment
CN114222000A (en) Information pushing method and device, computer equipment and storage medium
CN111581377B (en) Text classification method and device, storage medium and computer equipment
CN116776868B (en) Evaluation method of model generation text and computer equipment
JP6112536B2 (en) Bilingual expression extraction apparatus, bilingual expression extraction method, and computer program for bilingual expression extraction
JP2005078240A (en) Method for extracting knowledge by data mining
Bozorgi et al. Cancer survivability with logistic regression
Long et al. Multi-document summarization by information distance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant