Detailed Description
So that the manner in which the features and elements of the disclosed embodiments can be understood in detail, a more particular description of the disclosed embodiments, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. In the following description of the technology, for purposes of explanation, numerous details are set forth in order to provide a thorough understanding of the disclosed embodiments. However, one or more embodiments may be practiced without these details. In other instances, well-known structures and devices may be shown in simplified form for simplicity of illustration.
The terms "first," "second," and the like in the description, in the claims, and in the above-described drawings of embodiments of the present disclosure are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that terms so used are interchangeable under appropriate circumstances, such that the embodiments of the present disclosure described herein can be implemented. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions.
The term "plurality" means two or more unless otherwise specified.
In the embodiments of the present disclosure, the character "/" indicates that the preceding and following objects are in an "or" relationship. For example, A/B represents: A or B.
The term "and/or" describes an associative relationship between objects and indicates that three relationships may exist. For example, "A and/or B" represents: A alone, B alone, or both A and B.
With reference to Fig. 1, an embodiment of the present disclosure provides a method for text generation quality evaluation, including:
step S101, acquiring a reference text and a generated text, wherein the generated text is acquired according to the reference text;
step S102, inputting the reference text and the generated text into a preset evaluation model to obtain an evaluation index, wherein the evaluation model is obtained according to a sample text with a topic similarity label and a generated sentence identification label;
and step S103, evaluating the text generation quality according to the evaluation index.
By adopting the method for evaluating the text generation quality, the evaluation model is trained through the sample text with the topic similarity label and the generated sentence identification label, so that the trained evaluation model can comprehensively evaluate the text generation quality through two aspects of the topic similarity and the generated sentence identification.
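The overall flow of steps S101 to S103 can be sketched as follows. The function and model names here are illustrative and not from the disclosure; the stand-in model simply returns fixed index values for demonstration.

```python
# A minimal sketch of steps S101-S103, assuming a hypothetical evaluation
# model that returns a topic similarity index and a generated sentence
# identification index for a (reference text, generated text) pair.
def evaluation_index(model, reference_text, generated_text):
    # Step S102: the evaluation model yields the two indices for the pair.
    topic_similarity_index, generated_sentence_index = model(reference_text, generated_text)
    # The two indices are combined by multiplication into one evaluation index.
    return topic_similarity_index * generated_sentence_index

# Stand-in for a trained evaluation model (fixed outputs, demonstration only).
fake_model = lambda ref, gen: (0.852, 0.831)
index = evaluation_index(fake_model, "reference text", "generated text")
```

The combination by multiplication follows the later paragraphs of this section; a real evaluation model would compute both indices from the text pair rather than return constants.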
Optionally, obtaining the evaluation model according to the sample text with the topic similarity label and the generated sentence identification label includes: obtaining a sample text; obtaining a topic similarity label of the sample text; obtaining a generated sentence identification label of the sample text; and training a preset neural network model by using the sample text with the topic similarity label and the generated sentence identification label to obtain the evaluation model.
Therefore, the evaluation model is obtained through collaborative training on the sample text with the topic similarity label and the generated sentence identification label, so that the trained evaluation model can comprehensively evaluate the text generation quality through topic similarity and generated sentence identification, which improves the reliability of text generation quality evaluation.
Optionally, the neural network model comprises a BERT (Bidirectional Encoder Representations from Transformers) pre-training model. In some embodiments, the neural network model is trained by using the sample text with the topic similarity label and the generated sentence identification label to obtain the evaluation model. Compared with a traditional statistical evaluation model, the evaluation model trained from the neural network model can analyze the topic features and the structural features of texts, and evaluating the topic similarity of texts through these topic and structural features makes the evaluation of topic similarity between texts more reliable.
Optionally, the sample text comprises a first text pair and a second text pair; obtaining the sample text includes: acquiring a reference sample text and a generated sample text, and generating the generated sample text according to the reference sample text; combining the reference sample text with a generated sample text corresponding to the reference sample text to obtain a first text pair; and combining the two different reference sample texts to obtain a second text pair.
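The construction of the two kinds of text pairs above can be sketched as follows. `generate_fn` is a hypothetical stand-in for the text generation model that produces a generated sample text from a reference sample text; all names are illustrative.

```python
from itertools import combinations

# A sketch of building the training sample pairs described above.
def build_sample_pairs(reference_texts, generate_fn):
    # First text pairs: each reference sample text combined with the
    # generated sample text produced from it.
    first_pairs = [(ref, generate_fn(ref)) for ref in reference_texts]
    # Second text pairs: combinations of two different reference sample texts.
    second_pairs = list(combinations(reference_texts, 2))
    return first_pairs, second_pairs

refs = ["review A", "review B", "review C"]
first, second = build_sample_pairs(refs, lambda ref: "generated from " + ref)
```

With three reference texts this yields three first pairs and three second pairs; the second pairs contain only distinct reference texts, as the disclosure requires.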
Optionally, the obtaining of the topic similarity label of the sample text includes: acquiring a first similarity of a reference sample text and a generated sample text in a first text pair, and acquiring a topic similarity label of the first text pair according to the first similarity; and acquiring a second similarity between the reference sample texts in the second text pair, and acquiring the topic similarity label of the second text pair according to the second similarity. Therefore, the sample texts have the topic similarity labels, the evaluation model trained on the sample texts with the topic similarity labels can generate topic similarity indexes based on the topic similarity labels, the topic similarity degree between the texts is evaluated according to the topic similarity indexes, and the accuracy of the evaluation model for evaluating the text generation quality is improved.
Optionally, obtaining the first similarity between the reference sample text and the generated sample text in the first text pair includes: inputting the reference sample text and the generated sample text into a preset topic model for topic analysis to obtain a first probability distribution vector and a second probability distribution vector of the reference sample text and the generated sample text, respectively, over a plurality of preset topics; acquiring a first KL distance (Kullback-Leibler divergence) between the first probability distribution vector and the second probability distribution vector; and determining the first KL distance as the first similarity.
Optionally, the first KL distance between the first probability distribution vector and the second probability distribution vector is obtained by calculating

D_KL(p‖q) = Σ_{i=1}^{n} p(x_i) · log( p(x_i) / q(x_i) )

wherein p(x_i) is the first probability distribution vector, q(x_i) is the second probability distribution vector, D_KL(p‖q) is the first KL distance between the first probability distribution vector p(x_i) and the second probability distribution vector q(x_i), n is the dimension of the first probability distribution vector p(x_i) and the second probability distribution vector q(x_i), and n is a positive integer.
Optionally, the smaller the numerical value of the first KL distance D_KL(p‖q) is, the higher the similarity between the first probability distribution vector p(x_i) and the second probability distribution vector q(x_i) is.
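The first KL distance above can be transcribed directly; `p` and `q` below are the first and second probability distribution vectors over the preset topics.

```python
import math

# The first KL distance D_KL(p‖q) over topic probability distribution
# vectors of equal dimension n.
def kl_distance(p, q):
    assert len(p) == len(q)  # both vectors have the same dimension n
    # Terms with p(x_i) = 0 contribute nothing to the sum.
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)

# Identical topic distributions give a distance of 0 (highest similarity);
# diverging distributions give a larger distance (lower similarity).
d_same = kl_distance([0.5, 0.3, 0.2], [0.5, 0.3, 0.2])
d_diff = kl_distance([0.9, 0.05, 0.05], [0.1, 0.45, 0.45])
```

This illustrates the relation stated above: the smaller the KL distance, the higher the topic similarity between the two vectors.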
Optionally, obtaining the second similarity between the reference sample texts in the second text pair includes: inputting the two reference sample texts into the preset topic model for topic analysis to obtain a third probability distribution vector and a fourth probability distribution vector of the two reference sample texts, respectively, over the plurality of preset topics; acquiring a second KL distance between the third probability distribution vector and the fourth probability distribution vector; and determining the second KL distance as the second similarity.
Optionally, the second KL distance between the third probability distribution vector and the fourth probability distribution vector is obtained by calculating

D′_KL(p′‖q′) = Σ_{i=1}^{n′} p′(x_i) · log( p′(x_i) / q′(x_i) )

wherein p′(x_i) is the third probability distribution vector, q′(x_i) is the fourth probability distribution vector, D′_KL(p′‖q′) is the second KL distance between the third probability distribution vector p′(x_i) and the fourth probability distribution vector q′(x_i), n′ is the dimension of the third probability distribution vector p′(x_i) and the fourth probability distribution vector q′(x_i), and n′ is a positive integer.
Optionally, the smaller the numerical value of the second KL distance D′_KL(p′‖q′) is, the higher the similarity between the third probability distribution vector p′(x_i) and the fourth probability distribution vector q′(x_i) is.
Optionally, obtaining the topic similarity label of the first text pair according to the first similarity includes: determining the topic similarity label of the first text pair as topic-similar in a case where the first similarity satisfies a first preset condition; and determining the topic similarity label of the first text pair as topic-dissimilar in a case where the first similarity does not satisfy the first preset condition.
Optionally, in a case where the first similarity satisfies the first preset condition, determining the topic similarity label of the first text pair as topic-similar includes: in a case where the first similarity of the first text pair is smaller than or equal to a preset first threshold, determining that the topics of the reference sample text and the generated sample text in the first text pair are similar, and determining the topic similarity label of the first text pair as topic-similar.
Optionally, in a case where the first similarity does not satisfy the first preset condition, determining the topic similarity label of the first text pair as topic-dissimilar includes: in a case where the first similarity of the first text pair is greater than a preset second threshold, determining that the topics of the reference sample text and the generated sample text in the first text pair are not similar, and determining the topic similarity label of the first text pair as topic-dissimilar.
Optionally, obtaining the topic similarity label of the second text pair according to the second similarity includes: determining the topic similarity label of the second text pair as topic-similar in a case where the second similarity satisfies a second preset condition; and determining the topic similarity label of the second text pair as topic-dissimilar in a case where the second similarity does not satisfy the second preset condition.
Optionally, in a case where the second similarity satisfies the second preset condition, determining the topic similarity label of the second text pair as topic-similar includes: in a case where the second similarity of the second text pair is smaller than or equal to a preset third threshold, determining that the topics of the reference sample texts in the second text pair are similar, and determining the topic similarity label of the second text pair as topic-similar.
Optionally, in a case where the second similarity does not satisfy the second preset condition, determining the topic similarity label of the second text pair as topic-dissimilar includes: in a case where the second similarity of the second text pair is greater than a preset fourth threshold, determining that the topics of the reference sample texts in the second text pair are not similar, and determining the topic similarity label of the second text pair as topic-dissimilar.
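The threshold rule above can be sketched as a single labeling function. The threshold value 0.5 below is illustrative only; the disclosure requires some preset threshold but does not fix its value.

```python
# A sketch of assigning a topic similarity label from a KL distance; the
# threshold is a hypothetical value, not one specified by the disclosure.
def topic_similarity_label(kl_distance_value, threshold=0.5):
    # A smaller KL distance means the two topic distributions are closer.
    if kl_distance_value <= threshold:
        return "topic-similar"
    return "topic-dissimilar"

label_close = topic_similarity_label(0.12)
label_far = topic_similarity_label(1.80)
```

The same rule applies to both the first text pair (first similarity against the first and second thresholds) and the second text pair (second similarity against the third and fourth thresholds).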
Optionally, obtaining the generated sentence identification label of the sample text includes: determining the generated sentence identification label of the sample text as having a generated sentence in a case where the sample text includes a generated sample text; and determining the generated sentence identification label of the sample text as not having a generated sentence in a case where the sample text does not include a generated sample text. Therefore, the sample text has the generated sentence identification label, the evaluation model trained on the sample text with the generated sentence identification label can output a generated sentence identification index, the possibility that a text is identified as text generated by a model is evaluated according to the generated sentence identification index, and the accuracy of the evaluation model in evaluating text generation quality is improved.
Optionally, training a preset neural network model by using the sample text with the topic similarity label and the generated sentence identification label to obtain the evaluation model includes: obtaining a first loss value corresponding to the topic similarity label in the trained neural network model; obtaining a second loss value corresponding to the generated sentence identification label in the trained neural network model; and obtaining an overall loss value by calculating

L_all = L_dist + λ · L_topic

wherein L_all is the overall loss value, L_topic is the first loss value, L_dist is the second loss value, and λ is a hyperparameter with λ less than or equal to 1. Optionally, the larger λ is, the greater the influence of the topic similarity label on the trained evaluation model, and thus on the evaluation index when evaluating the text generation quality.
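The overall loss combination can be transcribed directly; `lam` stands for the hyperparameter λ, and the value 0.5 used below is illustrative only.

```python
# The overall loss L_all = L_dist + λ · L_topic, with λ ≤ 1. Larger λ
# weights the topic similarity objective more heavily during training.
def overall_loss(l_dist, l_topic, lam):
    assert lam <= 1  # the disclosure bounds the hyperparameter by 1
    return l_dist + lam * l_topic

l_all = overall_loss(0.4, 0.6, lam=0.5)
```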
Optionally, obtaining the first loss value corresponding to the topic similarity label in the trained neural network model includes: passing the reference sample text in the first text pair through the trained neural network model to obtain a first sample hidden layer output vector; passing the generated sample text in the first text pair through the trained neural network model to obtain a second sample hidden layer output vector; passing the two different reference sample texts in the second text pair through the trained neural network model to obtain a third sample hidden layer output vector and a fourth sample hidden layer output vector, respectively; taking an arithmetic mean of the combination of the first sample hidden layer output vector and the third sample hidden layer output vector to obtain a first sample arithmetic mean; taking an arithmetic mean of the combination of the second sample hidden layer output vector and the fourth sample hidden layer output vector to obtain a second sample arithmetic mean; transforming the first sample arithmetic mean to the output dimension corresponding to the topic similarity label through a multilayer perceptron (MLP) to obtain a fifth sample hidden layer output vector; transforming the second sample arithmetic mean to the output dimension corresponding to the topic similarity label through the MLP to obtain a sixth sample hidden layer output vector; splicing the fifth sample hidden layer output vector and the sixth sample hidden layer output vector to obtain a sample splicing vector; performing a linear transformation on the sample splicing vector through a fully connected layer (FC) to obtain a sample topic similarity feature corresponding to the sample splicing vector; performing binary classification on the sample topic similarity feature through a sigmoid function to obtain a sample topic similarity function corresponding to the topic similarity label; and calculating the loss of the sample topic similarity function according to a cross entropy loss function to obtain the first loss value.
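The first-loss pipeline above can be sketched with toy vectors. This is a heavily simplified sketch: the MLP and the fully connected layer are stood in for by a single fixed linear map over the spliced mean vectors, whereas a real model would learn these weights; all values are illustrative.

```python
import math

# Arithmetic mean of a list of equal-length hidden layer output vectors.
def mean_vector(vectors):
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Cross entropy loss for the binary topic similarity label (1 = similar).
def binary_cross_entropy(prediction, label):
    eps = 1e-12
    return -(label * math.log(prediction + eps)
             + (1 - label) * math.log(1.0 - prediction + eps))

def first_loss(hidden_vectors_a, hidden_vectors_b, label, weights, bias=0.0):
    # Mean-pool each group of hidden vectors, splice the two means, apply a
    # linear map (the toy FC layer), squash with sigmoid, and score with the
    # cross entropy loss against the topic similarity label.
    spliced = mean_vector(hidden_vectors_a) + mean_vector(hidden_vectors_b)
    logit = sum(w * f for w, f in zip(weights, spliced)) + bias
    return binary_cross_entropy(sigmoid(logit), label)

# Toy hidden vectors and weights, for demonstration only.
loss = first_loss(
    hidden_vectors_a=[[0.2, 0.4], [0.6, 0.8]],
    hidden_vectors_b=[[0.1, 0.3], [0.5, 0.7]],
    label=1,
    weights=[1.0, 1.0, 1.0, 1.0],
)
```

With these toy values the logit is positive and the label is 1, so the loss is small; a mismatched prediction and label would yield a larger loss, which is what drives training.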
Optionally, obtaining the second loss value corresponding to the generated sentence identification label in the trained neural network model includes: acquiring a sample semantic expression symbol generated by the trained neural network model at the input layer; passing the sample semantic expression symbol through the trained neural network model to obtain a sample semantic output vector; transforming the sample semantic output vector to the output dimension corresponding to the generated sentence identification label through the multilayer perceptron (MLP) to obtain a sample semantic vector; performing binary classification on the sample semantic vector through a sigmoid function to obtain a sample generated sentence identification function corresponding to the generated sentence identification label; and calculating the loss of the sample generated sentence identification function according to the cross entropy loss function to obtain the second loss value.
Optionally, training a preset neural network model by using the sample text with the topic similarity label and the generated sentence identification label to obtain the evaluation model includes: inputting the sample text with the topic similarity label and the generated sentence identification label into the preset neural network model for training, and recording the overall loss value of the model being trained in each preset period; acquiring the lowest value among the recorded overall loss values; in a case where the overall loss value of the model being trained is not lower than that lowest value for M consecutive preset periods, determining that the accuracy of the neural network model is no longer improving; and stopping the model training and determining the trained model as the evaluation model, wherein M is a positive integer. Optionally, M is greater than or equal to 10.
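The stopping rule above can be sketched as follows; the function name and the loss sequence are illustrative, and a real training loop would compute each period's overall loss rather than read it from a list.

```python
# A sketch of the early stopping rule: training stops once the overall loss
# has not fallen below its recorded minimum for M consecutive preset periods.
def stopping_period(overall_losses, m):
    lowest = float("inf")
    periods_without_improvement = 0
    for period, loss in enumerate(overall_losses):
        if loss < lowest:
            lowest = loss  # a new lowest overall loss value is recorded
            periods_without_improvement = 0
        else:
            periods_without_improvement += 1
            if periods_without_improvement >= m:
                return period  # accuracy no longer improves; stop here
    return len(overall_losses) - 1  # loss record ended before the rule fired

# With M = 3, the loss stops improving after the second period, so training
# stops three periods later.
stop = stopping_period([1.0, 0.8, 0.9, 0.9, 0.9], m=3)
```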
Optionally, inputting the reference text and the generated text into the preset evaluation model to obtain the evaluation index includes: passing the reference text through the evaluation model to obtain a first hidden layer output vector; passing the generated text through the evaluation model to obtain a second hidden layer output vector; obtaining a semantic expression symbol generated by the evaluation model at the input layer; passing the semantic expression symbol through the evaluation model to obtain a semantic output vector; taking an arithmetic mean of the first hidden layer output vector to obtain a first arithmetic mean; taking an arithmetic mean of the second hidden layer output vector to obtain a second arithmetic mean; transforming the first arithmetic mean to the output dimension corresponding to the topic similarity label through the multilayer perceptron (MLP) to obtain a third hidden layer output vector; transforming the second arithmetic mean to the output dimension corresponding to the topic similarity label through the MLP to obtain a fourth hidden layer output vector; splicing the third hidden layer output vector and the fourth hidden layer output vector to obtain a splicing vector; performing a linear transformation on the splicing vector through the fully connected layer (FC) to obtain a topic similarity feature corresponding to the splicing vector; performing binary classification on the topic similarity feature through a sigmoid function to obtain a topic similarity function corresponding to the topic similarity label; acquiring a first confidence of the topic similarity function on the positive example corresponding to the topic similarity label; determining the first confidence as a topic similarity index; transforming the semantic output vector to the output dimension corresponding to the generated sentence identification label through the MLP to obtain a semantic vector; performing binary classification on the semantic vector through a sigmoid function to obtain a generated sentence identification function corresponding to the generated sentence identification label; acquiring a second confidence of the generated sentence identification function on the positive example corresponding to the generated sentence identification label; determining the second confidence as a generated sentence identification index; and obtaining the evaluation index according to the topic similarity index and the generated sentence identification index.
In one embodiment, the reference text and the generated text are input to an evaluation model based on a BERT pre-training model, the evaluation model generating input data at the input layer, the input data comprising a [CLS] symbol, the reference text, and the generated text, wherein the [CLS] symbol represents the semantic features of the reference text and the generated text; a [CLS] output vector of the [CLS] symbol after passing through the evaluation model is acquired, and the [CLS] output vector is determined as the semantic output vector.
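The input sequence for the BERT-based evaluation model can be sketched as below. The use of [SEP] separators follows BERT's usual sentence-pair convention and is an assumption; the disclosure only states that the input data comprise the [CLS] symbol, the reference text, and the generated text.

```python
# A sketch of the input sequence for a BERT-style evaluation model. The
# [SEP] separators are assumed from BERT's standard sentence-pair format.
def build_input_sequence(reference_text, generated_text):
    return "[CLS] " + reference_text + " [SEP] " + generated_text + " [SEP]"

sequence = build_input_sequence("reference text", "generated text")
```

In a real BERT model the sequence would be tokenized, and the hidden state at the [CLS] position after the final layer would serve as the semantic output vector described above.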
Optionally, obtaining an evaluation index according to the topic similarity index and the generated sentence identification index includes: and obtaining an evaluation index by multiplying the topic similarity index and the generated sentence identification index.
Optionally, the evaluating the text generation quality according to the evaluation index includes: determining the text generation quality to be excellent in the case where the evaluation index is greater than or equal to a first set threshold; determining the text generation quality to be good under the condition that the evaluation index is greater than or equal to a second set threshold and smaller than a first set threshold; determining the text generation quality as poor in the case where the evaluation index is smaller than a second set threshold; wherein the second set threshold is less than the first set threshold. Optionally, the first set threshold is 0.7. Optionally, the second set threshold is 0.4.
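The grading rule of step S103 can be transcribed directly, using the optional threshold values 0.7 and 0.4 given above.

```python
# Grading the evaluation index per step S103. The default thresholds follow
# the optional values stated above; other preset values could be used.
def grade_quality(evaluation_index, first_threshold=0.7, second_threshold=0.4):
    assert second_threshold < first_threshold
    if evaluation_index >= first_threshold:
        return "excellent"
    if evaluation_index >= second_threshold:
        return "good"
    return "poor"

grades = [grade_quality(0.85), grade_quality(0.55), grade_quality(0.2)]
```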
In one embodiment, the reference text is: "This is a very cost-effective mobile phone. The price is affordable, the cost performance is extremely high, and it is suitable for everyone, especially for the elderly: the pages are clear and simple without being dazzling. The screen is clear and smooth, sliding and unlocking are convenient, and the sound quality is excellent. The colors are vivid. The pictures are outstanding; among thousand-yuan phones it ranks first." A generated text is obtained from the reference text through the text generation model to be evaluated, the generated text being: "This mobile phone has high cost performance. It is simple, affordable, and extremely cost-effective, suitable for all scenarios, and particularly suitable for the elderly; the screen pages are displayed clearly, sliding is smooth, unlocking is quick, and the sound quality is good. It takes the best pictures; among thousand-yuan phones it ranks first."
The reference text and the generated text are input into the evaluation model. The "topic-similar" value of the topic similarity label is determined as the positive example of the topic similarity function, so the higher the topic similarity between the reference text and the generated text, the higher the numerical value of the topic similarity index; the topic similarity index output by the evaluation model is 0.852. The "not having a generated sentence" value of the generated sentence identification label is determined as the positive example of the generated sentence identification function, so the lower the possibility that the generated text is identified as text generated by a model, the higher the numerical value of the generated sentence identification index; the generated sentence identification index output by the evaluation model is 0.831. The evaluation index obtained by multiplying the topic similarity index and the generated sentence identification index is 0.708, and the text generation quality of the text generation model to be evaluated is determined to be excellent.
As shown in Fig. 2, an apparatus for text generation quality evaluation according to an embodiment of the present disclosure includes a processor (processor) 100 and a memory (memory) 101 storing program instructions. Optionally, the apparatus may further include a communication interface (Communication Interface) 102 and a bus 103. The processor 100, the communication interface 102, and the memory 101 may communicate with each other via the bus 103. The communication interface 102 may be used for information transfer. The processor 100 may call the program instructions in the memory 101 to perform the method for text generation quality evaluation of the above-described embodiments.
Further, the program instructions in the memory 101 may be implemented in the form of software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product.
The memory 101, which is a computer-readable storage medium, may be used for storing software programs, computer-executable programs, such as program instructions/modules corresponding to the methods in the embodiments of the present disclosure. The processor 100 executes functional applications and data processing, i.e., implements the method for text generation quality evaluation in the above-described embodiments, by executing program instructions/modules stored in the memory 101.
The memory 101 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal device, and the like. In addition, the memory 101 may include a high-speed random access memory, and may also include a nonvolatile memory.
By adopting the device for text generation quality evaluation provided by the embodiment of the disclosure, the evaluation model is trained through the sample text with the topic similarity label and the generated sentence identification label, so that the trained evaluation model can comprehensively evaluate the text generation quality through two aspects of the topic similarity and the generated sentence identification.
Compared with the conventional evaluation method, the evaluation method not only evaluates the topic similarity of the reference text and the corresponding generated text, but also considers whether the generated text is easy to be identified as the text generated by the model, and improves the reliability of text generation quality evaluation.
Optionally, the device comprises a computer, smartphone, tablet, or the like.
Embodiments of the present disclosure provide a computer-readable storage medium storing computer-executable instructions configured to perform the above-described method for text generation quality assessment.
The disclosed embodiments provide a computer program product comprising a computer program stored on a computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the above-described method for text generation quality assessment.
The computer-readable storage medium described above may be a transitory computer-readable storage medium or a non-transitory computer-readable storage medium.
The technical solution of the embodiments of the present disclosure may be embodied in the form of a software product, where the computer software product is stored in a storage medium and includes one or more instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the present disclosure. The aforementioned storage medium may be a non-transitory storage medium, comprising: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes; it may also be a transient storage medium.
The above description and drawings sufficiently illustrate embodiments of the disclosure to enable those skilled in the art to practice them. Other embodiments may incorporate structural, logical, electrical, process, and other changes. The examples merely typify possible variations. Individual components and functions are optional unless explicitly required, and the sequence of operations may vary. Portions and features of some embodiments may be included in or substituted for those of others. Furthermore, the words used in the specification are words of description only and are not intended to limit the claims. As used in the description of the embodiments and the claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Similarly, the term "and/or" as used in this application is meant to encompass any and all possible combinations of one or more of the associated listed. Furthermore, the terms "comprises" and/or "comprising," when used in this application, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Without further limitation, an element defined by the phrase "comprising an …" does not exclude the presence of other like elements in a process, method or apparatus that comprises the element. In this document, each embodiment may be described with emphasis on differences from other embodiments, and the same and similar parts between the respective embodiments may be referred to each other. For methods, products, etc. of the embodiment disclosures, reference may be made to the description of the method section for relevance if it corresponds to the method section of the embodiment disclosure.
Those of skill in the art would appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software may depend upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments. It can be clearly understood by the skilled person that, for convenience and brevity of description, the specific working processes of the system, the apparatus and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments disclosed herein, the disclosed methods, products (including but not limited to devices, apparatuses, etc.) may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units may be merely a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to implement the present embodiment. In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. In the description corresponding to the flowcharts and block diagrams in the figures, operations or steps corresponding to different blocks may also occur in different orders than disclosed in the description, and sometimes there is no specific order between the different operations or steps. For example, two sequential operations or steps may in fact be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.