CN118278543A - Answer evaluation model training method, evaluation method, device, equipment and medium - Google Patents


Info

Publication number: CN118278543A
Authority: CN (China)
Prior art keywords: answer, model, score, training, text
Legal status: Pending (assumed; Google has not performed a legal analysis)
Application number: CN202410465501.0A
Other languages: Chinese (zh)
Inventors: 陈政宗, 万峻辰
Current assignee: Beijing Dajia Internet Information Technology Co Ltd (listed assignee may be inaccurate)
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202410465501.0A
Publication of CN118278543A

Classifications

    • G06N 20/00 Machine learning
    • G06F 40/205 Parsing
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F 40/30 Semantic analysis
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

Embodiments of the present disclosure provide an answer evaluation model training method, an answer evaluation method, a device, equipment, a medium, and a program product. The method comprises the following steps: acquiring a training data set comprising training samples, where each training sample includes a question text, an answer text, a correct/incorrect label, and a sample score; inputting the training data set into an evaluation model to be trained to obtain a corresponding model score; generating a scoring loss function from the model score and the sample score, and a correctness loss function from the model score and the correct/incorrect label; and training the evaluation model according to these loss functions to obtain an answer evaluation model. The training method introduces sample scores that conform to human preference into the training data set and trains the model with two loss functions: a correctness loss and a scoring loss. The trained answer evaluation model can not only judge whether an answer's conclusion is correct, but also evaluate, from the perspective of human preference, the completeness of the solution steps the answer provides.

Description

Answer evaluation model training method, evaluation method, device, equipment and medium
Technical Field
The disclosure relates to the field of computer technology, and in particular to an answer evaluation model training method, an answer evaluation method, an answer evaluation device, electronic equipment, a storage medium, and a computer program product.
Background
With the continuous development of artificial intelligence technology, large language models have achieved remarkable results in the field of natural language processing. However, because of deviations in training data, model training objectives, and other factors during model training, the quality of generated content varies; mathematical answers generated by large language models are especially prone to problems such as irrelevant answers, wrong answers, missing steps, and flawed reasoning. The experience ultimately delivered to the user therefore differs significantly across questions. How to simulate human preference and comprehensively score answers generated by a large language model across multiple dimensions (result, steps, format, and so on), so as to better evaluate the model's answers and improve the quality of answer generation, is an urgent technical problem in this field.
Disclosure of Invention
Embodiments of the present disclosure provide an answer evaluation model training method, an answer evaluation method, an answer evaluation device, electronic equipment, a storage medium, and a computer program product.
According to a first aspect of embodiments of the present disclosure, there is provided an answer evaluation model training method, including: acquiring a training data set comprising a plurality of training samples, where each training sample includes at least a question text, an answer text, a correct/incorrect label, and a sample score; the answer text is an answer corresponding to the question text; the correct/incorrect label indicates whether the conclusion of the answer text is correct; the sample score characterizes the quality of the answer text; inputting the training data set into an evaluation model to be trained, where the evaluation model to be trained obtains a model score corresponding to each training sample from its question text and answer text, and the model score characterizes the quality of the answer text relative to the question text; generating a scoring loss function from the model score and the sample score of the training sample, where the scoring loss function represents the degree of difference between the model score and the sample score; generating a correctness loss function from the model score and the correct/incorrect label of the training sample, where the correctness loss function represents the degree of difference between the model score and the correct/incorrect label; and training the evaluation model to be trained according to the scoring loss function and the correctness loss function to obtain an answer evaluation model.
In some exemplary embodiments of the present disclosure, a plurality of training sample pairs are formed from the plurality of training samples in the training data set. Each training sample pair comprises a first training sample and a second training sample. The first training sample includes at least a first question text, a first answer text, a first correct/incorrect label, and a first sample score; the second training sample includes at least a second question text, a second answer text, a second correct/incorrect label, and a second sample score. The first question text and the second question text are the same question text, while the first answer text and the second answer text are different answer texts. Obtaining the model scores then further includes: obtaining, through the evaluation model to be trained, a first model score corresponding to the first training sample and a second model score corresponding to the second training sample.
In some exemplary embodiments of the present disclosure, each training sample is scored along multiple dimensions, and the average of the dimension scores is taken as the sample score. The plurality of dimensions includes at least two of: step completeness, reasoning correctness, process relevance, and format aesthetics.
In some exemplary embodiments of the present disclosure, obtaining the model score corresponding to a training sample from its question text and answer text includes: segmenting the question text and answer text of the training sample with a tokenizer to obtain a word sequence corresponding to the training sample; converting each word in the word sequence into a corresponding word vector through an embedding layer; generating a hidden vector for each word through a decoding layer according to the word vectors and the positions of the words in the word sequence; passing the hidden vector of each word through the neural network of an output layer to obtain a word score for each word, where the word score characterizes the quality of that word relative to the question text; and obtaining the model score from the word scores of the individual words.
In some exemplary embodiments of the disclosure, obtaining the model score from the word scores includes either: taking the word score of the last word in the word sequence as the model score; or computing a weighted average of the word scores to obtain the model score.
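The two aggregation strategies can be sketched in plain Python as follows. This is a minimal illustration; the patent does not specify the weighting scheme, so uniform weights are assumed here.

```python
def last_word_score(word_scores):
    """Option 1: use the score of the last word in the sequence as the model score."""
    return word_scores[-1]

def weighted_average_score(word_scores, weights=None):
    """Option 2: weighted average of per-word scores.
    Uniform weights are assumed when none are given."""
    if weights is None:
        weights = [1.0] * len(word_scores)
    total = sum(w * s for w, s in zip(weights, word_scores))
    return total / sum(weights)

scores = [0.2, 0.5, 0.9, 0.8]
model_score_a = last_word_score(scores)        # 0.8
model_score_b = weighted_average_score(scores)  # approximately 0.6
```

The last-word variant is cheap and matches how decoder-only reward models are often read out; the weighted average lets every position in the answer contribute to the final score.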
In some exemplary embodiments of the disclosure, generating the scoring loss function from the model score and sample score of the training sample includes: determining the magnitude relation between the first sample score and the second sample score in the training sample pair; calculating a model score difference from the first model score and the second model score based on that magnitude relation; and calculating the scoring loss function from the model score difference.
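The pairwise scoring loss described above can be sketched as follows. The patent only states that the loss is computed from the model score difference oriented by the sample-score ordering; the logistic form used here (as in standard pairwise ranking losses) is an assumption.

```python
import math

def scoring_loss(model_score_1, model_score_2, sample_score_1, sample_score_2):
    """Orient the model-score difference by the sample-score ordering, then
    penalise the model when it disagrees with that ordering.
    The -log(sigmoid(diff)) form is an assumption, not from the patent."""
    if sample_score_1 >= sample_score_2:
        diff = model_score_1 - model_score_2   # sample 1 is human-preferred
    else:
        diff = model_score_2 - model_score_1   # sample 2 is human-preferred
    # loss is near 0 when the preferred answer out-scores the other by a margin
    return -math.log(1.0 / (1.0 + math.exp(-diff)))
```

With this orientation, the loss shrinks as the model learns to give the human-preferred answer the higher score.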
In some exemplary embodiments of the disclosure, generating the scoring loss function from the model score and sample score of the training sample includes: determining the magnitude relation between the first sample score and the second sample score in the training sample pair; acquiring the word scores of each word of the first and second training samples; calculating, based on that magnitude relation, the word-score difference between the corresponding words of the first and second training samples according to their positions in the word sequence; computing a weighted average of the word differences to obtain the model score difference; and calculating the scoring loss function from the model score difference.
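The token-level variant can be sketched as follows. Again, the logistic loss applied to the averaged difference is an assumption; the patent specifies only the oriented per-position differences and their weighted average.

```python
import math

def tokenwise_scoring_loss(word_scores_1, word_scores_2,
                           sample_score_1, sample_score_2, weights=None):
    """Orient per-position word-score differences by the sample-score ordering,
    average them into one model score difference, and apply a logistic loss
    (the logistic form is an assumption). Sequences must already be padded
    to equal length."""
    assert len(word_scores_1) == len(word_scores_2)
    if weights is None:
        weights = [1.0] * len(word_scores_1)
    sign = 1.0 if sample_score_1 >= sample_score_2 else -1.0
    diffs = [sign * (a - b) for a, b in zip(word_scores_1, word_scores_2)]
    avg_diff = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)
    return -math.log(1.0 / (1.0 + math.exp(-avg_diff)))
```

Compared with the sequence-level version, this spreads the ranking signal over every token position rather than a single aggregate score.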
In some exemplary embodiments of the present disclosure, acquiring the word scores of each word of the first and second training samples further includes: acquiring the word sequence lengths of the first and second training samples; when the word sequence of the first training sample is shorter than that of the second training sample, padding the word sequence of the first training sample with filler words so that, after padding, the two word sequences have the same length. The word score corresponding to each filler word is a preset score.
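The padding step can be sketched as follows; the preset score for filler words is not specified in the patent, so 0.0 is assumed here.

```python
def pad_word_scores(short_scores, target_length, pad_score=0.0):
    """Pad the shorter word-score sequence with a preset score (0.0 assumed)
    so both sequences in a training-sample pair have equal length."""
    return short_scores + [pad_score] * (target_length - len(short_scores))

padded = pad_word_scores([0.4, 0.7], 5)
# padded == [0.4, 0.7, 0.0, 0.0, 0.0]
```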
In some exemplary embodiments of the disclosure, generating the correctness loss function from the model score and correct/incorrect label of the training sample includes: converting the model score into a prediction probability through an activation function; and calculating the correctness loss function via cross-entropy from the prediction probability and the correct/incorrect label.
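A minimal sketch of this step, assuming the activation function is a sigmoid (the patent says only "activation function") and the label is encoded as 1 for correct and 0 for incorrect:

```python
import math

def correctness_loss(model_score, label):
    """Map the model score to a probability with a sigmoid, then apply binary
    cross-entropy against the correct/incorrect label (1 = correct, 0 = incorrect).
    The sigmoid choice is an assumption."""
    p = 1.0 / (1.0 + math.exp(-model_score))
    eps = 1e-12  # numerical guard against log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))
```

A high model score on a correct answer (label 1) yields a small loss; a high score on an incorrect answer yields a large one, tying the score scale to conclusion correctness.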
In some exemplary embodiments of the present disclosure, training the evaluation model to be trained according to the scoring loss function and the correctness loss function includes: acquiring the set of training samples in the training data set that share the same question text; generating a first loss function from the scoring loss function and the correctness loss function corresponding to each training sample in that set; and training the evaluation model to be trained according to the first loss function.
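The combination into a first loss can be sketched as below. The patent does not state how the two losses are combined, so a weighted sum with hypothetical coefficients `alpha` and `beta` is assumed.

```python
def first_loss(samples, alpha=1.0, beta=1.0):
    """Combine per-sample scoring and correctness losses over all training
    samples that share one question text. The weighted-sum form and the
    alpha/beta coefficients are assumptions."""
    total = 0.0
    for s in samples:  # each s holds precomputed losses for one training sample
        total += alpha * s["scoring_loss"] + beta * s["correctness_loss"]
    return total

group = [
    {"scoring_loss": 0.3, "correctness_loss": 0.1},
    {"scoring_loss": 0.5, "correctness_loss": 0.2},
]
loss = first_loss(group)  # 1.1 with default coefficients
```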
According to a second aspect of embodiments of the present disclosure, there is provided an answer evaluation method, including: receiving an answer to be evaluated, which comprises at least a question text and an answer text, the answer text being an answer corresponding to the question text; inputting the answer to be evaluated into an answer evaluation model, which obtains an answer score for the answer to be evaluated from its question text and answer text, the answer evaluation model being trained by any one of the answer evaluation model training methods described above; and determining correctness information for the answer to be evaluated according to the answer score, the correctness information indicating whether the conclusion of the answer text is correct.
According to a third aspect of embodiments of the present disclosure, there is provided an answer evaluation model training apparatus, including: a sample acquisition module configured to acquire a training data set comprising a plurality of training samples, each including at least a question text, an answer text, a correct/incorrect label, and a sample score, where the answer text is an answer corresponding to the question text, the correct/incorrect label indicates whether the conclusion of the answer text is correct, and the sample score characterizes the quality of the answer text; a model scoring module configured to input the training data set into an evaluation model to be trained, which obtains a model score for each training sample from its question text and answer text, the model score characterizing the quality of the answer text relative to the question text; a scoring loss function module configured to generate a scoring loss function from the model score and the sample score of the training sample, the scoring loss function representing the degree of difference between the model score and the sample score; a correctness loss function module configured to generate a correctness loss function from the model score and the correct/incorrect label of the training sample, the correctness loss function representing the degree of difference between the model score and the correct/incorrect label; and a model training module configured to train the evaluation model to be trained according to the scoring loss function and the correctness loss function to obtain an answer evaluation model.
According to a fourth aspect of embodiments of the present disclosure, there is provided an answer evaluation apparatus, including: a receiving module configured to receive an answer to be evaluated, which comprises at least a question text and an answer text, the answer text being an answer corresponding to the question text; an answer scoring module configured to input the answer to be evaluated into an answer evaluation model, which obtains an answer score from the question text and answer text of the answer to be evaluated, the answer evaluation model being trained by any one of the answer evaluation model training methods described above; and a correctness judgment module configured to determine correctness information for the answer to be evaluated according to the answer score, the correctness information indicating whether the conclusion of the answer text is correct.
According to a fifth aspect of the embodiments of the present disclosure, there is provided an electronic device, including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the executable instructions to implement any one of the answer evaluation model training methods or any one of the answer evaluation methods.
According to a sixth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions which, when executed by a processor of an electronic device, cause the electronic device to perform any one of the answer evaluation model training methods or any one of the answer evaluation methods.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program/instruction, characterized in that the computer program/instruction, when executed by a processor, implements any one of the answer evaluation model training methods or any one of the answer evaluation methods.
According to the answer evaluation model training method provided by embodiments of the present disclosure, sample scores conforming to human preference are introduced into the training data set, and the model is trained with two loss functions: a correctness loss and a scoring loss. The trained answer evaluation model can not only judge whether the conclusion of an answer is correct, but also evaluate, from the perspective of human preference, the completeness of the solution steps the answer provides. The answer evaluation model can automatically evaluate the relative quality of different answers, saving substantial manual effort. It can also serve as an evaluation standard for answers generated by different large language models.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
FIG. 1 shows a schematic diagram of an exemplary system architecture to which the methods of embodiments of the present disclosure may be applied.
FIG. 2 is a flowchart illustrating a method of training an answer evaluation model according to an example embodiment.
Fig. 3 is a schematic diagram showing a structure of an evaluation model to be trained according to an example.
FIG. 4 is a flowchart illustrating scoring loss function calculation according to one example.
FIG. 5 is another scoring loss function calculation flow diagram according to one example.
FIG. 6 is a flow chart illustrating an answer evaluation model training process according to one example.
Fig. 7 is a flowchart illustrating a method of answer evaluation according to an example embodiment.
FIG. 8 is a block diagram illustrating an answer evaluation model training device, according to an example embodiment.
Fig. 9 is a block diagram illustrating an answer evaluation device according to an exemplary embodiment.
Fig. 10 is a schematic diagram illustrating a structure of an electronic device suitable for use in implementing exemplary embodiments of the present disclosure, according to an exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted.
The described features, structures, or characteristics of the disclosure may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. However, those skilled in the art will recognize that the aspects of the present disclosure may be practiced with one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.
The drawings are merely schematic illustrations of the present disclosure, in which like reference numerals denote like or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in at least one hardware module or integrated circuit or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and not necessarily all of the elements or steps are included or performed in the order described. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
In the present specification, the terms "a," "an," "the," "said" and "at least one" are used to indicate the presence of at least one element/component/etc.; the terms "comprising," "including," and "having" are intended to be inclusive and mean that there may be additional elements/components/etc., in addition to the listed elements/components/etc.; the terms "first," "second," and "third," etc. are used merely as labels, and do not limit the number of their objects.
FIG. 1 shows a schematic diagram of an exemplary system architecture to which the methods of embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture may include a server 101, a network 102, a terminal device 103, a terminal device 104, and a terminal device 105. Network 102 is the medium used to provide communication links between terminal device 103, terminal device 104, or terminal device 105 and server 101. Network 102 may include various connection types such as wired, wireless communication links, or fiber optic cables, among others.
The server 101 may be a server providing various services, such as a background management server providing support for devices operated by a user with the terminal device 103, the terminal device 104, or the terminal device 105. The background management server may perform analysis and other processing on the received data such as the request, and feed back the processing result to the terminal device 103, the terminal device 104, or the terminal device 105.
The terminal device 103, the terminal device 104, and the terminal device 105 may be, but are not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a wearable smart device, a virtual reality device, an augmented reality device, and the like.
It should be understood that the numbers of terminal devices 103, 104, and 105, the network 102, and the server 101 in fig. 1 are only illustrative; the server 101 may be a single physical server, a server cluster composed of multiple servers, or a cloud server, and there may be any number of terminal devices, networks, and servers according to actual needs.
The steps of the method in the exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings and examples.
FIG. 2 is a flowchart illustrating a method of training an answer evaluation model according to an example embodiment. The method provided by the embodiment of fig. 2 may be performed by any electronic device, for example, the terminal device in fig. 1, or the server in fig. 1, or a combination of the terminal device and the server in fig. 1, which is not limited in this disclosure.
In step S210, a training data set including a plurality of training samples is acquired. Each training sample includes at least: a question text, an answer text, a correct/incorrect label, and a sample score. The answer text is an answer corresponding to the question text; the correct/incorrect label indicates whether the conclusion of the answer text is correct; and the sample score characterizes the quality of the answer text.
In the embodiments of the present disclosure, the answer evaluation model training method aims to train an answer evaluation model for the field of knowledge question answering. The answer evaluation model can simulate human preference to evaluate an answer, in particular along dimensions such as step completeness, reasoning correctness, process relevance, and format aesthetics. The answer evaluation model can also subsequently serve as a standard for evaluating the quality of answers generated by different large language models.
On this basis, a training data set including a plurality of training samples is obtained in embodiments of the present disclosure. Each training sample comprises a question text and a corresponding answer text. The training samples can be obtained from existing questions and answers in published exercise books (for example, extracting 100,000 mathematics questions and their answers from primary-school mathematics exercise books), or the answers can be generated automatically by a large language model from the questions, further enriching the variety of answers. The present disclosure does not limit the question types; for mathematics questions these may include, for example: solving equations, word problems, arithmetic operations, statistics, number sequences, and the like.
For the same question, the answers provided by different data sources are not identical. In this disclosure, five levels of answer variants are generally constructed to fully cover the different dimensions of answer quality:
1. answers containing only an incorrect result;
2. answers containing only the correct result, with no solution process;
3. answers containing the correct result together with detailed solution steps;
4. answers whose solution steps include an erroneous step;
5. answers generated by a large language model.
The objective of the answer evaluation model produced by training is to evaluate the answer quality of the training samples and, in particular, to distinguish the quality of different answers to the same question.
On this basis, the training samples in the training data set further include correct/incorrect labels and sample scores in addition to the question text and answer text. The correct/incorrect label characterizes whether the answer is correct: it judges the correctness of the final result of the answer, i.e., correctness identification. The sample score characterizes the quality of the answer text and measures the quality of the solution steps, i.e., quality identification. Both the correct/incorrect label and the sample score may be manually annotated in the training sample.
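A training sample of this shape can be sketched as a simple data structure. The field names below are illustrative, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class TrainingSample:
    """One training sample: a question, an answer, a correct/incorrect label
    for the final result, and a human-preference score for solution quality."""
    question_text: str
    answer_text: str
    is_correct: bool     # correct/incorrect label for the answer's conclusion
    sample_score: float  # quality score of the solution steps

sample = TrainingSample(
    question_text="Solve: 2x + 3 = 7",
    answer_text="2x = 4, so x = 2",
    is_correct=True,
    sample_score=4.5,
)
```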
The sample score is an important index measuring the quality of the solution steps and whether the answer conforms to human preference. In an exemplary embodiment, in order to fully reflect the strengths and weaknesses of the answer text along dimensions such as step completeness, reasoning correctness, process relevance, and format aesthetics, the answer may be scored separately along multiple dimensions and the sample score generated from those dimension scores. The plurality of dimensions includes at least two of: step completeness, reasoning correctness, process relevance, and format aesthetics.
For example, for a certain question there are training samples 1, 2, and 3, corresponding to answer texts 1, 2, and 3 respectively, each answer being manually scored along three dimensions: step completeness, reasoning correctness, and format aesthetics. Training sample 1 scores 5 for step completeness, 5 for reasoning correctness, and 5 for format aesthetics; training sample 2 scores 5, 2, and 3 respectively; training sample 3 scores 3, 5, and 4 respectively. Taking the average of the dimension scores as the sample score, the sample score of training sample 1 is 5, that of training sample 2 is about 3.3, and that of training sample 3 is 4. In this way, the sample score comprehensively reflects the quality evaluation along the different dimensions.
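The averaging in the example above amounts to the following one-line computation:

```python
def sample_score(dimension_scores):
    """Average per-dimension scores (e.g. step completeness, reasoning
    correctness, format aesthetics) into one overall sample score."""
    return sum(dimension_scores) / len(dimension_scores)

score_1 = sample_score([5, 5, 5])  # training sample 1: 5.0
score_2 = sample_score([5, 2, 3])  # training sample 2: about 3.3
score_3 = sample_score([3, 5, 4])  # training sample 3: 4.0
```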
Measuring the quality of solution steps by comparing different answers relies on the comparability of answers to the same question. Thus, in an exemplary embodiment, training samples in the training data set that share the same question text are further grouped into training sample pairs. Each training sample pair comprises two training samples: a first training sample and a second training sample. As described above, the first training sample includes a first question text, a first answer text, a first correct/incorrect label, and a first sample score; the second training sample includes a second question text, a second answer text, a second correct/incorrect label, and a second sample score. The question texts of the two samples are the same, while the answer texts differ. Through such training sample pairs, different answers to the same question can be compared side by side, and from the different answer texts with their corresponding correct/incorrect labels and sample scores, the model can learn the quality comparison between different answers.
Since multiple different answer texts can be obtained from different data sources for the same question, multiple different training samples can be constructed. In an exemplary embodiment, the training samples covering different answers to the same question are combined pairwise, forming a plurality of different training sample pairs for quality comparison between answers. For example, for the same question there are training samples 1, 2, and 3, corresponding to answer texts 1, 2, and 3 respectively. From these, three training sample pairs can be formed: {training sample 1, training sample 2}, {training sample 1, training sample 3}, and {training sample 2, training sample 3}.
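The pairwise combination above corresponds to enumerating all unordered pairs; a minimal sketch using Python's standard library, with the sample names taken from the running example:

```python
from itertools import combinations

# Hypothetical training samples for one question, following the running
# example (three different answers to the same question text).
samples = ["training sample 1", "training sample 2", "training sample 3"]

# Pairwise combination: every unordered pair of samples for the same question.
sample_pairs = list(combinations(samples, 2))
for pair in sample_pairs:
    print(pair)
# ('training sample 1', 'training sample 2')
# ('training sample 1', 'training sample 3')
# ('training sample 2', 'training sample 3')
```

For n answers to one question this yields n·(n-1)/2 pairs, matching the three pairs in the example.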
By combining the training samples of the same question pairwise into a plurality of different training sample pairs, the model only needs to compare two training samples at a time, which reduces the complexity of the model design.
In step S220, the training data set is input into an evaluation model to be trained; and the evaluation model to be trained obtains a model score corresponding to the training sample according to the question text and the answer text of the training sample. The model score is used for representing the quality score of the answer text relative to the question text.
In the embodiment of the disclosure, the obtained training data set is input into a pre-constructed evaluation model to be trained. The evaluation model to be trained is an answer evaluation model to be trained, which is constructed based on a large language model technology, and model scores corresponding to the training samples can be obtained through natural language processing such as word segmentation, encoding and decoding, context understanding and the like on the question text and the answer text in the training samples. The model score is used for representing the quality score of the answer text relative to the question text based on the question text after the analysis processing of the evaluation model to be trained. Through continuous iteration and training of the to-be-trained evaluation model, the model score is expected to be increasingly close to the sample score of the training sample.
In step S230, a scoring loss function is generated from the model score and sample score of the training sample. The scoring loss function is used to represent the degree of difference between the model score and the sample score.
In the embodiment of the disclosure, in order to measure the relative quality of different answers, a scoring loss function is generated from the model score and the sample score of the training sample. The scoring loss function represents the degree of difference between the model score and the sample score. Training the evaluation model against this loss drives the model score closer and closer to the sample score of the training sample.
It should be noted that the scoring loss function can be designed in various ways; those skilled in the art may adopt any design provided by the prior art according to actual requirements, and all such designs fall within the protection scope of the present disclosure.
In step S240, a correctness loss function is generated from the model score and the correctness label of the training sample. The correctness loss function is used to represent the degree of difference between the model score and the correctness label.
In the embodiment of the disclosure, in order to judge whether an answer is correct or incorrect, a correctness loss function is generated from the model score and the correctness label of the training sample. The correctness loss function represents the degree of difference between the model score and the correctness label. Through this loss function, the correctness judgment derived from the model score is driven to agree with the correctness label of the training sample.
It should be noted that the correctness loss function can likewise be designed in various ways; those skilled in the art may adopt any design provided by the prior art according to actual requirements, and all such designs fall within the protection scope of the present disclosure.
In step S250, the evaluation model to be trained is trained according to the scoring loss function and the correctness loss function to obtain the answer evaluation model.
In the embodiment of the present disclosure, the evaluation model to be trained is trained according to the scoring loss function and the correctness loss function generated in the foregoing steps S230 and S240, so that the model score output by the model comes closer and closer to the sample score of the training sample, and the correctness judgment based on the model score agrees with the correctness label of the training sample. When the evaluation model to be trained meets a preset convergence condition, the parameter-adjusted model is the answer evaluation model.
In an exemplary embodiment, training the model on the scoring loss function and correctness loss function of a single training sample may cause large fluctuations in the parameter updates. To optimize the training process, in this embodiment the training samples sharing the same question text are combined into a training sample set, and the scoring loss functions and correctness loss functions of all training samples in the set are integrated into a first loss function that reflects every training sample in the set. Training the evaluation model with this integrated first loss function stabilizes the training process. A specific process may include the following steps.
And acquiring a training sample set with the same problem text in the training data set.
And generating a first loss function according to the scoring loss function and the correctness loss function corresponding to each training sample in the training sample set.
And training the evaluation model to be trained according to the first loss function.
There are many ways to integrate the scoring loss functions and correctness loss functions generated by the individual training samples, and the specific integration method is not limited herein.
According to the answer evaluation model training method provided by the embodiment of the disclosure, sample scores conforming to human preferences are introduced into the training data set, and the model is trained on both a correctness loss and a scoring loss. The trained answer evaluation model can not only judge whether an answer to a question is correct, but also evaluate, from the perspective of human preference, how complete the solution steps provided by the answer are. The answer evaluation model can automatically evaluate the relative merits of different answers, saving a great deal of manual effort. It can also provide a common evaluation standard for answers generated by different large language models.
Fig. 3 is a schematic diagram showing the structure of an evaluation model to be trained according to an example. As shown, in the embodiment of the present disclosure, the evaluation model to be trained comprises at least: a word segmenter (Tokenizer), an embedding layer (Embedding Layer), a decoding layer (Layers of Decoder), and an output layer (Head Layer). The question text and answer text of each training sample in the training data set are processed in turn by the word segmenter, the embedding layer, the decoding layer, and the output layer, and the output layer finally outputs the model score corresponding to the training sample.
Word segmentation device
The word segmentation device is used for carrying out word segmentation processing on the question text and the answer text of the training sample to obtain a word sequence corresponding to the training sample.
In the embodiment of the disclosure, the word segmentation device is used as a data entry of a model and is mainly responsible for carrying out word segmentation processing on input question text and answer text, and segmenting continuous text into independent words with semantic meaning. Each segmented word is further converted and mapped into a preset dictionary, namely word mapping (Tokenization). This is the process of converting words from their original character form into numerical identifiers that the model can understand and process. The preset dictionary may be acquired by an open source resource, and the dictionary data specifically used is not limited herein.
Embedding layer
And the embedding layer is used for converting each word in the word sequence into a corresponding word vector.
In the embodiment of the disclosure, the embedding layer is the key link between words in natural language and the understanding capability of the neural network. It converts the discrete words produced by the word segmenter into dense, low-dimensional word vector representations. These word vectors not only preserve the inherent semantic properties of the words, but also allow words, which could not otherwise be operated on directly, to be compared for similarity in a mathematical space. In addition, the word vectors learned by the embedding layer contain rich contextual information and semantic relationships: for example, synonyms tend to cluster together in the embedding space while antonyms tend to lie apart. By comparing the distances between word vectors, the semantic relationships between the corresponding words can be determined.
Decoding layer
And the decoding layer is used for generating hidden vectors corresponding to the words according to the word vectors of the words and the position relations of the words in the word sequence.
In the disclosed embodiment, the decoding layer is a stack of neural network layers responsible for generating hidden vectors from word vectors. Following the word sequence, it recursively performs deep abstraction and transformation on the embedded word vectors layer by layer, at each step jointly considering the current word vector and the word vectors preceding it, and outputs the hidden vector of the current word on that basis. Specifically, in autoregressive decoding, each decoder layer uses the prediction result of the previous time step to update its internal state and, via the attention mechanism, focuses on the relevant parts of the source sequence to extract the key information that helps the prediction at the current time step. This stepwise decoding allows the model to maintain conditional dependencies while generating the target sequence and ensures the coherence and logical consistency of the generated word sequence.
Wherein, hidden vector (Hidden Vectors) is a corresponding Hidden vector generated by each input word vector through a series of Hidden layer calculations. The hidden vector contains rich information in the context of the word, including but not limited to grammatical features, semantic features, sentence structure information, etc., and is an understanding and prediction of the current word by the model.
Output layer
The output layer is used for enabling the hidden vectors of the words to pass through a neural network to obtain word scores corresponding to the words; and obtaining the model scores according to the word scores of the words.
In the embodiment of the disclosure, the output layer is used for converting the hidden vectors of the words into final output scoring results. In the output layer, hidden vectors of the words can be input into a pre-designed neural network, and mapped to a new vector space through a series of matrix operations of the neural network. In this new vector space, each dimension typically corresponds to a score on a particular linguistic property or probability distribution, and based thereon, a word score corresponding to each word in the training sample is derived, respectively. Based on the word scores corresponding to the respective words, model scores corresponding to the training samples may be further obtained.
In an exemplary embodiment, the word score corresponding to the last word in the word sequence of the training sample may be used as the model score of the training sample. As previously described, each word is processed in the decoding layer together with the word vectors preceding it. The last word in the word sequence can therefore be regarded as carrying a word score obtained after a full understanding of the semantic meaning of all the preceding content, and its word score may be used as the model score of the training sample.
In an exemplary embodiment, a weighted average of the word scores of the words in the word sequence may be computed as the model score of the training sample. As previously described, each word is processed in the decoding layer together with the word vectors preceding it, so the later a word appears in the sequence, the richer the semantic meaning it characterizes. A model score can therefore be obtained by a weighted average of the word scores. The weights of the weighted average can be adjusted according to actual needs and are not further limited herein.
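The two aggregation strategies above can be sketched as follows; the per-word scores and the linearly increasing weights are illustrative assumptions, since the disclosure leaves the weighting adjustable:

```python
# Hypothetical per-word scores already produced by the output layer.
word_scores = [0.2, 0.5, 1.1, 3.8]

# Strategy 1: use the score of the last word, which has seen all prior context.
last_word_score = word_scores[-1]

# Strategy 2: weighted average with later words weighted more heavily
# (linearly increasing weights are an illustrative choice).
weights = list(range(1, len(word_scores) + 1))  # 1, 2, 3, 4
weighted_avg = sum(w * s for w, s in zip(weights, word_scores)) / sum(weights)

print(last_word_score)         # 3.8
print(round(weighted_avg, 2))  # 1.97
```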
The evaluation model to be trained provided by the embodiment of the disclosure can score the question and answer of each training sample in the training data set. The model score is generated from the question text and answer text with a full understanding of the contextual meaning between the words, providing a data basis for the subsequent model training.
FIG. 4 is a flowchart illustrating scoring loss function calculation according to one example. As shown in fig. 4, in the embodiment of the present disclosure, the score loss function calculation process of the aforementioned step S230 may include the following steps.
In step S410, a magnitude relationship between the first and second sample scores in the training sample pair is determined.
As previously described, the first training sample and the second training sample in a training sample pair are two training samples having different answer texts for the same question text. According to the earlier annotation of each training sample, the first and second training samples carry a first sample score and a second sample score, respectively.
In the embodiment of the disclosure, the magnitude relation between the first sample score and the second sample score determines which sample was annotated as providing the answer with the better solution steps. In the calculation of the scoring loss function, the model is trained so that its model scores favor the training sample with the higher sample score.
By default, a higher sample score indicates a more preferred answer. Different scoring conventions in practical applications also fall within the protection scope of the present disclosure.
In step S420, a model score difference is calculated from the first model score and the second model score based on the magnitude relation.
In the embodiment of the disclosure, the ordering of the model scores is not necessarily consistent with that of the sample scores. Calculating the model score difference based solely on the magnitude relation between the model scores could therefore cause the model to train in the wrong direction. When the model score difference is calculated in this step, the minuend and subtrahend among the first model score and the second model score must be determined from the magnitude relation between the first sample score and the second sample score established in the foregoing step S410.
In an exemplary embodiment, the first sample score of the first training sample is 4 and the corresponding first model score is 3.6; the second sample score of the second training sample is 3.3 and the corresponding second model score is 4. Since the first sample score is greater than the second sample score, the second model score is subtracted from the first model score, giving a model score difference of -0.4.
In step S430, the scoring loss function is calculated from the model scoring differences.
In the embodiment of the disclosure, the scoring loss function is calculated from the model score difference in a cross-entropy manner. The corresponding formula is as follows:
L1 = -log(σ(S_a - S_b))
where L1 represents the scoring loss function, σ denotes the sigmoid function, S_a represents the first model score, and S_b represents the second model score.
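A minimal numeric sketch of this pairwise loss, assuming the sigmoid-based cross-entropy form and reusing the example values (preferred answer scored 3.6 by the model, the other answer 4.0):

```python
import math

def scoring_loss(s_a: float, s_b: float) -> float:
    # Pairwise cross-entropy loss: small when the model scores the
    # human-preferred answer (score s_a) above the other answer (s_b).
    return -math.log(1.0 / (1.0 + math.exp(-(s_a - s_b))))

# The model score difference is 3.6 - 4.0 = -0.4, so the loss is large,
# pushing the model to raise its score for the preferred answer.
print(round(scoring_loss(3.6, 4.0), 3))                  # 0.913
print(scoring_loss(4.0, 3.6) < scoring_loss(3.6, 4.0))   # True
```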
FIG. 5 is another scoring loss function calculation flow diagram according to one example. As shown in fig. 5, in the embodiment of the present disclosure, the score loss function calculation process of the aforementioned step S230 may include the following steps.
In step S510, a magnitude relationship between the first and second sample scores in the training sample pair is determined.
As previously described, the first training sample and the second training sample in a training sample pair are two training samples having different answer texts for the same question text. According to the earlier annotation of each training sample, the first and second training samples carry a first sample score and a second sample score, respectively.
In the embodiment of the disclosure, the magnitude relation between the first sample score and the second sample score determines which sample was annotated as providing the answer with the better solution steps. In the calculation of the scoring loss function, the model is trained so that its model scores favor the training sample with the higher sample score.
By default, a higher sample score indicates a more preferred answer. Different scoring conventions in practical applications also fall within the protection scope of the present disclosure.
In step S520, the word scores of the respective words of the first training sample and the second training sample are obtained.
Unlike the embodiment shown in fig. 4, in this embodiment of the disclosure the model score difference is not calculated directly from the first model score and the second model score. Instead, as described for the output layer shown in fig. 3, the word scores corresponding to the individual words in the word sequences of the first and second training samples are obtained. In the scheme provided by this embodiment, word differences are calculated from the word scores of corresponding words in the two training samples, and the model score difference is then obtained from those word differences.
In an exemplary embodiment, the word sequences of the two training samples are not necessarily of equal length. To allow the two word sequences to be compared word by word, the shorter word sequence must be padded with filler words. As shown in fig. 5, the following steps may be included.
In step S520a, a word sequence length of the first training sample and the second training sample is obtained; the word sequence length of the first training sample is shorter than the word sequence length of the second training sample.
In an embodiment of the disclosure, a word sequence length of a first training sample and a second training sample is obtained. It is assumed that the word sequence length of the first training sample is shorter than the word sequence length of the second training sample. Here, "first" and "second" are used as labels only, and do not represent the order of two training samples. In practice, the word sequence length of the second training sample may be shorter than the word sequence length of the first training sample.
In step S520b, filling words into the word sequence of the first training sample, so that the word sequence length of the first training sample after filling is the same as the word sequence length of the second training sample; and the word score corresponding to the filling word is a preset score.
In the embodiment of the disclosure, filler words are appended to the (shorter) word sequence of the first training sample so that its padded length equals the word sequence length of the second training sample. The filler words exist only to satisfy the subsequent scoring loss calculation and carry no semantic meaning; the word score corresponding to a filler word is a preset score. With the padding in place, the word sequences of the two training samples can be processed word by word in the subsequent calculation.
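The padding step can be sketched as follows; the pad score of 0.0 is an assumed preset value, since the disclosure does not fix it:

```python
# Minimal sketch of the padding step: the shorter word-score sequence is
# extended with filler entries whose score is a preset value, so the two
# sequences can later be compared word by word.
PAD_SCORE = 0.0  # assumed preset score for filler words

def pad_to_match(scores_a, scores_b, pad=PAD_SCORE):
    n = max(len(scores_a), len(scores_b))
    return (scores_a + [pad] * (n - len(scores_a)),
            scores_b + [pad] * (n - len(scores_b)))

a, b = pad_to_match([1.0, 2.0], [0.5, 0.5, 0.5, 0.5])
print(a)               # [1.0, 2.0, 0.0, 0.0]
print(len(a), len(b))  # 4 4
```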
In step S530, based on the magnitude relation, a word difference between the words corresponding to the first training sample and the second training sample is calculated according to the positional relation of the respective words in the word sequence.
As described in the foregoing step S420, the minuend and subtrahend among the word scores of the first and second training samples are determined from the established magnitude relation, which is not repeated here.
Based on the determined reduction and the reduced relation, word differences between corresponding words are calculated in sequence word by word according to the positional relation of the words in the word sequence.
In an exemplary embodiment, the word scores of the word sequence of the first training sample are {S_a1, S_a2, S_a3, S_a4, ..., S_an}, and those of the second training sample are {S_b1, S_b2, S_b3, S_b4, ..., S_bn}. The word differences calculated sequentially word by word are {S_1 = S_a1 - S_b1, S_2 = S_a2 - S_b2, S_3 = S_a3 - S_b3, S_4 = S_a4 - S_b4, ..., S_n = S_an - S_bn}.
In step S540, a weighted average calculation is performed on the word differences of the words to obtain model score differences.
In the embodiment of the disclosure, the model score difference may be obtained by a weighted average of the word differences of the individual words. As previously described, each word is processed in the decoding layer together with the word vectors preceding it, so the later a word appears in the sequence, the richer the semantic meaning it characterizes. The model score difference can therefore be obtained by a weighted average of the word differences. The weights of the weighted average can be adjusted according to actual needs and are not further limited herein.
In step S550, the scoring loss function is calculated from the model scoring differences.
In the embodiment of the disclosure, the scoring loss function is calculated from the model score difference in a cross-entropy manner. With uniform weights over the words, the corresponding formula is as follows:
L1 = -log(σ((1/N) Σ_{i=1..N} (S_ai - S_bi)))
where L1 represents the scoring loss function, σ denotes the sigmoid function, S_ai represents the word score of each word in the first training sample, S_bi represents the word score of each word in the second training sample, and N represents the number of words in the word sequence.
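A minimal sketch of this word-level variant, assuming uniform weights and the sigmoid-based cross-entropy form (the disclosure leaves the weighting adjustable):

```python
import math

def tokenwise_scoring_loss(scores_a, scores_b):
    # Word-level variant: average the per-word score differences, then
    # apply the same sigmoid cross-entropy as the pairwise version.
    # Uniform weights are assumed here.
    n = len(scores_a)
    assert n == len(scores_b), "sequences must be padded to equal length"
    diff = sum(sa - sb for sa, sb in zip(scores_a, scores_b)) / n
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# Average per-word difference is (1 + 2 + 3) / 3 = 2, so the loss is small.
loss = tokenwise_scoring_loss([2.0, 3.0, 4.0], [1.0, 1.0, 1.0])
print(round(loss, 4))  # 0.1269
```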
The embodiment of the disclosure thus provides two technical schemes for generating the scoring loss function. In both, guided by the sample scores, the model score difference between two training samples providing different answers to the same question is compared, and the scoring loss function is generated in a cross-entropy manner. Training the evaluation model with this scoring loss function drives the resulting model score ever closer to the sample score of the training sample.
In an embodiment of the present disclosure, the correctness loss function calculation process of the aforementioned step S240 may include the following steps.
S1, converting the model scores into prediction probabilities through an activation function.
An activation function is a function in a neural network that maps a neuron's input to its output; depending on the function chosen, its output range can be confined to (0, 1). A common choice is the sigmoid function.
In an embodiment of the present disclosure, the model score is mapped by an activation function into the range (0, 1), yielding the prediction probability corresponding to the model score. The prediction probability can be regarded as the probability, judged from the model score, that the answer result is correct.
S2, calculating the correctness loss function by cross entropy from the prediction probability and the correctness label.
In the embodiment of the disclosure, the correctness loss function is obtained by cross-entropy calculation from the prediction probability and the correctness label of the training sample. The corresponding formula is as follows:
L2 = -[y_c·log(p_c) + (1-y_c)·log(1-p_c)]
where L2 represents the correctness loss function; y_c represents the value of the correctness label, i.e., 0 or 1; and p_c represents the prediction probability.
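The two steps S1 and S2 can be sketched together as follows, assuming a sigmoid activation for the probability conversion:

```python
import math

def correctness_loss(model_score: float, label: int) -> float:
    # S1: sigmoid activation turns the score into a prediction probability.
    p = 1.0 / (1.0 + math.exp(-model_score))
    # S2: binary cross-entropy against the correctness label (0 or 1).
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

# A high model score is penalized little when the label is 1 (correct)
# and heavily when the label is 0 (incorrect).
print(correctness_loss(3.0, 1) < correctness_loss(3.0, 0))  # True
```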
The embodiment of the disclosure thus provides a technical scheme for the correctness loss function: the prediction probability computed from the model score is compared with the correctness label of the training sample, and the correctness loss function is generated in a cross-entropy manner. Training the evaluation model with this loss enables the probability that an answer result is correct or incorrect to be judged from the model score.
FIG. 6 is a flow chart illustrating an answer evaluation model training process according to one example. As shown in fig. 6, in the embodiment of the present disclosure, the answer evaluation model training process of the foregoing step S250 may include the following steps.
In step S610, a second loss function is generated from the scoring loss function and the correctness loss function.
In the embodiment of the disclosure, the scoring loss function and the correctness loss function obtained in the foregoing steps S230 and S240 are integrated into a second loss function that embodies both losses. In an exemplary embodiment, the integration may be a weighted sum of the individual loss functions. Through the integrated second loss function, the loss represented by each constituent loss function is reflected in the subsequent model training.
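The weighted-sum integration can be sketched as follows; the weights alpha and beta are hypothetical hyperparameters, since the text does not fix a specific integration method:

```python
# Minimal sketch of integrating the two losses into the second loss
# function by a weighted sum.
def second_loss(scoring_loss: float, correctness_loss: float,
                alpha: float = 1.0, beta: float = 1.0) -> float:
    return alpha * scoring_loss + beta * correctness_loss

print(second_loss(1.0, 0.5))             # 1.5
print(second_loss(1.0, 0.5, alpha=2.0))  # 2.5
```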
In step S620, the gradient of the second loss function is calculated by stochastic gradient descent.
In an embodiment of the present disclosure, the gradient of the second loss function is calculated using stochastic gradient descent (SGD). Stochastic gradient descent is an iterative optimization algorithm used in machine learning and deep learning that, at each iteration, calculates the gradient of the loss function with respect to the model parameters (e.g., weights and biases). The gradient reflects the sensitivity of the loss function to small changes in the model parameters; its negative direction is the direction in which the loss function decreases fastest.
In step S630, parameters of the neural network are adjusted by the gradient.
In an embodiment of the disclosure, parameters in the neural network are adjusted according to back-propagation (Backpropagation) of the neural network in the aforementioned output layer based on the gradient calculated based on the second loss function. The back propagation backtracks the neural network layer by layer, and calculates the contribution of each layer of neurons to the gradient of the loss function. After the gradient of each parameter in the neural network is obtained through back propagation, the parameters are adjusted according to a preset learning rate, so that the parameters are updated towards the direction of reducing loss. And repeating the iteration until the loss of the model on the training set reaches a preset convergence condition.
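A toy illustration of one such update step; the parameter values, gradients, and learning rate are purely hypothetical:

```python
# One SGD update: each parameter moves against its gradient, scaled by
# the learning rate, so the loss decreases.
def sgd_step(params, grads, lr):
    return [p - lr * g for p, g in zip(params, grads)]

params = [0.5, -1.0]
grads = [2.0, -4.0]  # hypothetical gradients of the second loss function
print(sgd_step(params, grads, lr=0.5))  # [-0.5, 1.0]
```

Iterating such updates (with gradients recomputed by backpropagation each time) continues until the convergence condition is met.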
In step S640, in response to the to-be-trained evaluation model meeting a preset convergence condition, the answer evaluation model is obtained.
In the embodiment of the disclosure, when the to-be-trained evaluation model is trained by the steps to meet the preset convergence condition, the to-be-trained evaluation model after the parameters are adjusted is a target answer evaluation model.
The embodiment of the disclosure provides a training process for the answer evaluation model. In this process, the parameters of the evaluation model to be trained are adjusted under the integrated scoring loss function and correctness loss function, so that on the one hand the model score output by the model comes ever closer to the sample score of the training sample, and on the other hand the probability that an answer result is correct or incorrect can be judged from the model score. The iteration is repeated until the loss of the model on the training set reaches the preset convergence condition, yielding the target answer evaluation model.
Fig. 7 is a flowchart illustrating a method of answer evaluation according to an example embodiment. The method provided by the embodiment of fig. 7 may be performed by any electronic device, for example, the terminal device in fig. 1, or the server in fig. 1, or a combination of the terminal device and the server in fig. 1, which is not limited in this disclosure.
In step S710, receiving an answer to be evaluated; the answer to be evaluated at least comprises: question text, answer text.
In the embodiment of the disclosure, an answer to be evaluated is received. The answer to be evaluated at least comprises: question text, answer text. The answer text is an answer corresponding to the question text. The relevant question text and answer text are similar to those of the training sample in step S210, and will not be described here again.
In step S720, inputting the answer to be evaluated into the answer evaluation model; and the answer evaluation model obtains an answer score corresponding to the answer to be evaluated according to the question text and the answer text of the answer to be evaluated.
In the embodiment of the disclosure, the answer to be evaluated is input into the answer evaluation model, which obtains an answer score corresponding to the answer to be evaluated from its question text and answer text. The answer evaluation model is obtained by the answer evaluation model training method of any of the foregoing embodiments. The answer score is used to measure the answer quality of the answer text.
In step S730, the correct and incorrect information of the answer to be evaluated is determined according to the answer score.
In the embodiment of the disclosure, the correct and incorrect information of the answer to be evaluated is determined according to the answer score, that is, the answer result is judged to be correct or incorrect.
In an exemplary embodiment, the answer score is converted into a prediction probability by an activation function. The prediction probability represents the probability that the answer conclusion is correct or incorrect. Based on the prediction probability, the correct and incorrect information is determined according to a preset judgment threshold.
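For illustration only, the score-to-decision conversion described above can be sketched as follows. The sigmoid activation and the 0.5 judgment threshold are assumptions, not details fixed by this embodiment:

```python
import math

def sigmoid(x):
    # Activation function mapping an unbounded answer score to (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def judge(answer_score, threshold=0.5):
    # Convert the model's answer score into a correctness probability, then
    # compare it against a preset judgment threshold.
    prob_correct = sigmoid(answer_score)
    verdict = "correct" if prob_correct >= threshold else "incorrect"
    return verdict, prob_correct

verdict, prob = judge(1.2)
```

Both the verdict and the probability could then be fed back to the caller, matching the feedback step described below.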
In an exemplary embodiment, the correct and incorrect information and the answer score are fed back.
The answer evaluation model provided by the embodiment of the disclosure can not only judge whether the answer to a question is correct, but also evaluate, from the perspective of human preference, the completeness of the solution steps provided in the answer. The answer evaluation model can automatically evaluate the relative quality of different answers, thereby saving a great deal of manual effort. Meanwhile, the answer evaluation model can also serve as an evaluation standard for answers generated by different large language models.
The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.
FIG. 8 is a block diagram illustrating an answer evaluation model training device, according to an example embodiment. Referring to fig. 8, the apparatus 800 may include: a sample acquisition module 810, a model scoring module 820, a scoring loss function module 830, an error loss function module 840, and a model training module 850.
A sample acquisition module 810 configured to acquire a training data set comprising a plurality of training samples; each of the training samples includes at least: question text, answer text, a correct and incorrect label, and a sample score; the answer text is an answer corresponding to the question text; the correct and incorrect label is used for indicating whether the conclusion of the answer text is correct or incorrect; and the sample score is used for representing the quality of the answer text.
A model scoring module 820 configured to input the training data set into the evaluation model to be trained; the evaluation model to be trained obtains a model score corresponding to each training sample according to the question text and the answer text of the training sample.
A scoring loss function module 830 configured to generate a scoring loss function from the model score and sample score of the training sample; the scoring loss function is used to represent the degree of difference between the model score and the sample score.
A positive-negative loss function module 840 configured to generate a positive-negative loss function from the model scores and positive-negative labels of the training samples; the positive-negative loss function is used for representing the degree of difference between the model score and the positive-negative label.
The model training module 850 is configured to train the evaluation model to be trained according to the scoring loss function and the positive-negative loss function, so as to obtain an answer evaluation model.
In some exemplary embodiments of the present disclosure, the sample acquisition module 810 is further configured to form a plurality of training sample pairs from the plurality of training samples in the training data set; the training sample pair comprises: a first training sample and a second training sample; the first training sample comprises at least: a first question text, a first answer text, a first positive-negative label, and a first sample score; the second training sample comprises at least: a second question text, a second answer text, a second positive-negative label, and a second sample score; the first question text and the second question text are the same question text; the first answer text and the second answer text are different answer texts; and obtaining a first model score corresponding to the first training sample and a second model score corresponding to the second training sample through the evaluation model to be trained.
In some exemplary embodiments of the present disclosure, the sample acquisition module 810 is further configured to score the training samples from multiple dimensions and take the average of the multi-dimension scores as the sample score; the plurality of dimensions includes at least two of: step completeness, inference correctness, process relevance, and format aesthetics.
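The multi-dimension sample scoring can be sketched as follows; the dimension names and the equal-weight average are illustrative assumptions:

```python
def sample_score(dimension_scores):
    # dimension_scores: mapping from dimension name to a score, e.g. scores
    # for step completeness, inference correctness, process relevance, and
    # format aesthetics. The sample score is their average.
    return sum(dimension_scores.values()) / len(dimension_scores)

s = sample_score({"step_completeness": 4, "inference_correctness": 5,
                  "process_relevance": 3, "format_aesthetics": 4})
# s == 4.0
```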
In some exemplary embodiments of the present disclosure, the evaluation model to be trained includes: the word segmentation device is used for carrying out word segmentation processing on the question text and the answer text of the training sample through the word segmentation device to obtain a word sequence corresponding to the training sample; the embedding layer is used for converting each word in the word sequence into a corresponding word vector through the embedding layer; the decoding layer is used for generating hidden vectors corresponding to the words according to the word vectors of the words and the position relation of the words in the word sequence; the output layer is used for enabling the hidden vectors of the words to pass through a neural network of the output layer to obtain word scores corresponding to the words; the term score is used for representing the quality score of each term relative to the question text; and obtaining the model scores according to the word scores of the words.
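The data flow through the four components above (word segmentation device, embedding layer, decoding layer, output layer) can be sketched in highly simplified form. The whitespace tokenizer, random embeddings, running-mean "decoder", and linear output head below are stand-ins chosen only to make the flow concrete; they are not the disclosed architecture:

```python
import random

random.seed(0)
DIM = 8
vocab = {}

def embed(word):
    # Embedding layer: map each word to a fixed vector (randomly initialised here).
    if word not in vocab:
        vocab[word] = [random.uniform(-1, 1) for _ in range(DIM)]
    return vocab[word]

def decode(vectors):
    # Stand-in for the decoding layer: each hidden vector is the running mean
    # of the word vectors up to that position, so it depends on the positional
    # relation of the words that precede it.
    hidden, acc = [], [0.0] * DIM
    for i, v in enumerate(vectors, 1):
        acc = [a + x for a, x in zip(acc, v)]
        hidden.append([a / i for a in acc])
    return hidden

w_out = [random.uniform(-1, 1) for _ in range(DIM)]

def word_scores(question, answer):
    # Word segmentation: naive whitespace split of question + answer text.
    words = (question + " " + answer).split()
    hidden = decode([embed(w) for w in words])
    # Output layer: a linear head producing one score per word.
    return [sum(h * w for h, w in zip(hv, w_out)) for hv in hidden]

scores = word_scores("What is 2+2?", "2+2 equals 4")
```

The per-word scores would then be aggregated into the model score, e.g. by taking the last word's score or a weighted average, as described next.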
In some exemplary embodiments of the present disclosure, the model scoring module 820 is further configured to take the word score of the last word in the word sequence as the model score, or to calculate a weighted average of the word scores of the respective words to obtain the model score.
In some exemplary embodiments of the present disclosure, the scoring loss function module 830 is further configured to determine a magnitude relationship between the first and second sample scores in the training sample pair; calculating a model score difference from the first model score and the second model score based on the magnitude relation; and calculating the scoring loss function according to the model scoring difference.
In some exemplary embodiments of the present disclosure, the scoring loss function module 830 is further configured to determine a magnitude relationship between the first and second sample scores in the training sample pair; acquire the word scores of the respective words of the first training sample and the second training sample; calculate, based on the magnitude relationship and according to the position of each word in the word sequence, word differences between the word scores of corresponding words of the first training sample and the second training sample; calculate a weighted average of the word differences of the respective words to obtain a model score difference; and calculate the scoring loss function according to the model score difference.
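One possible form of this pairwise scoring loss is sketched below. The equal-weight average and the `-log(sigmoid(...))` ranking loss are assumptions, since the embodiment does not fix an exact formula:

```python
import math

def word_diff_score(scores_a, sample_a, scores_b, sample_b):
    # scores_*: per-word scores of two answers to the same question, assumed
    # already padded to equal length. Order the difference so that the answer
    # with the higher sample score comes first.
    if sample_a < sample_b:
        scores_a, scores_b = scores_b, scores_a
    diffs = [a - b for a, b in zip(scores_a, scores_b)]  # position-wise word differences
    return sum(diffs) / len(diffs)  # equal-weight average -> model score difference

def scoring_loss(scores_a, sample_a, scores_b, sample_b):
    diff = word_diff_score(scores_a, sample_a, scores_b, sample_b)
    # A common pairwise ranking loss: small when the model ranks the pair the
    # same way the sample scores do, large when it ranks them the other way.
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

loss_agree = scoring_loss([1.0, 1.0], 5, [0.0, 0.0], 2)   # model agrees with labels
loss_disagree = scoring_loss([0.0, 0.0], 5, [1.0, 1.0], 2)  # model disagrees
```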
In some exemplary embodiments of the present disclosure, the scoring loss function module 830 is further configured to acquire the word sequence lengths of the first training sample and the second training sample, the word sequence length of the first training sample being shorter than that of the second training sample; and fill padding words into the word sequence of the first training sample so that the padded word sequence of the first training sample has the same length as the word sequence of the second training sample; the word score corresponding to each padding word is a preset score.
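The padding step can be sketched as follows; the preset score of 0.0 for padding words is an assumed value:

```python
PAD_SCORE = 0.0  # preset score assigned to padding words (an assumed value)

def pad_scores(short_scores, long_scores):
    # Pad the shorter word-score sequence so both sequences have equal length,
    # enabling position-wise word difference computation.
    return short_scores + [PAD_SCORE] * (len(long_scores) - len(short_scores))

p = pad_scores([0.2, 0.4], [0.1, 0.3, 0.5, 0.7])
# p now has the same length as the longer sequence
```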
In some exemplary embodiments of the present disclosure, the positive-negative loss function module 840 is further configured to convert the model score into a prediction probability by an activation function, and calculate the positive-negative loss function through cross entropy according to the prediction probability and the positive-negative label.
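A sketch of this loss, assuming a sigmoid activation and binary cross entropy:

```python
import math

def correctness_loss(model_score, label):
    # label: 1 if the answer's conclusion is correct, 0 if incorrect.
    p = 1.0 / (1.0 + math.exp(-model_score))  # activation -> prediction probability
    # Binary cross entropy between the prediction probability and the label.
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))
```

A confidently correct score on a correct sample yields a small loss, while a confidently wrong score yields a large one, which is what drives the model score to carry correctness information.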
In some exemplary embodiments of the present disclosure, the model training module 850 is further configured to obtain a training sample set having the same question text in the training data set; generate a first loss function according to the scoring loss function and the positive-negative loss function corresponding to each training sample in the training sample set; and train the evaluation model to be trained according to the first loss function.
In some exemplary embodiments of the present disclosure, the model training module 850 is further configured to generate a second loss function from the scoring loss function and the positive-negative loss function; calculate the gradient of the second loss function by stochastic gradient descent; adjust parameters of the neural network according to the gradient; and obtain the answer evaluation model in response to the evaluation model to be trained satisfying a preset convergence condition.
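A minimal sketch of combining the two losses into a second loss function and performing a stochastic-gradient-descent update; the one-parameter model, the equal weighting `alpha`, and the numerical gradient are all illustrative assumptions:

```python
import math

def total_loss(w, x, sample_score, label, alpha=0.5):
    # Toy one-parameter model: model score = w * x.
    s = w * x
    scoring = (s - sample_score) ** 2                      # scoring loss term
    p = 1.0 / (1.0 + math.exp(-s))                         # activation
    correctness = -(label * math.log(p) + (1 - label) * math.log(1 - p))
    return alpha * scoring + (1 - alpha) * correctness     # second loss function

def sgd_step(w, x, sample_score, label, lr=0.01, eps=1e-6):
    # Numerical gradient of the combined loss, then one SGD parameter update.
    g = (total_loss(w + eps, x, sample_score, label) -
         total_loss(w - eps, x, sample_score, label)) / (2 * eps)
    return w - lr * g

w = 0.0
for _ in range(200):  # iterate until, in practice, a convergence condition is met
    w = sgd_step(w, 1.0, 2.0, 1)
```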
Fig. 9 is a block diagram illustrating an answer evaluation device according to an exemplary embodiment. Referring to fig. 9, the apparatus 900 may include: an answer to be evaluated receiving module 910, an answer scoring module 920 and an error judging module 930.
The answer to be evaluated receiving module 910 is configured to receive an answer to be evaluated; the answer to be evaluated at least comprises: question text, answer text.
An answer scoring module 920 configured to input the answer to be evaluated into the answer evaluation model; and the answer evaluation model obtains an answer score corresponding to the answer to be evaluated according to the question text and the answer text of the answer to be evaluated.
And an error judging module 930 configured to determine error information of the answer to be evaluated according to the answer score.
In some exemplary embodiments of the present disclosure, the answer evaluation model is trained by the answer evaluation model training method described above.
The specific manner in which the various modules perform their operations in the apparatus of the above embodiments has been described in detail in connection with the embodiments of the method, and will not be described in detail herein.
An electronic device 1000 according to such an embodiment of the present disclosure is described below with reference to fig. 10. The electronic device 1000 shown in fig. 10 is merely an example and should not be construed as limiting the functionality and scope of use of the disclosed embodiments.
As shown in fig. 10, the electronic device 1000 is embodied in the form of a general purpose computing device. Components of electronic device 1000 may include, but are not limited to: the at least one processing unit 1010, the at least one memory unit 1020, a bus 1030 connecting the various system components (including the memory unit 1020 and the processing unit 1010), and a display unit 1040.
Wherein the storage unit stores program code that is executable by the processing unit 1010 such that the processing unit 1010 performs steps according to various exemplary embodiments of the present disclosure described in the above section of the present specification. For example, the processing unit 1010 may perform the various steps shown in fig. 2.
The memory unit 1020 may include readable media in the form of volatile memory units such as Random Access Memory (RAM) 1021 and/or cache memory unit 1022, and may further include Read Only Memory (ROM) 1023.
Storage unit 1020 may also include a program/utility 1024 having a set (at least one) of program modules 1025, such program modules 1025 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 1030 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 1000 can also communicate with one or more external devices 1070 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 1000 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 1050. Also, the electronic device 1000 can communicate with one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet, through a network adapter 1060. As shown, the network adapter 1060 communicates with the other modules of the electronic device 1000 over the bus 1030. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with the electronic device 1000, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
In an exemplary embodiment, a computer-readable storage medium is also provided, e.g., a memory, comprising instructions executable by a processor of an apparatus to perform the above method. Alternatively, the computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided, comprising a computer program/instruction which, when executed by a processor, implements the method in the above embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (16)

1. An answer evaluation model training method, comprising the steps of:
acquiring a training data set comprising a plurality of training samples; each of the training samples includes at least: question text, answer text, a positive-negative label, and a sample score; the answer text is an answer corresponding to the question text; the positive-negative label is used for indicating whether the conclusion of the answer text is correct or incorrect; the sample score is used for representing the quality of the answer text;
inputting the training data set into an evaluation model to be trained; the to-be-trained evaluation model obtains a model score corresponding to the training sample according to the question text and the answer text of the training sample; the model score is used for representing the quality score of the answer text relative to the question text;
generating a scoring loss function according to the model score and the sample score of the training sample; the scoring loss function is used to represent the degree of difference between the model score and the sample score;
generating a positive-negative loss function according to the model score and the positive-negative label of the training sample; the positive-negative loss function is used for representing the degree of difference between the model score and the positive-negative label;
and training the evaluation model to be trained according to the scoring loss function and the positive-negative loss function to obtain an answer evaluation model.
2. The method of claim 1, further comprising:
Forming a plurality of training sample pairs from the plurality of training samples in the training data set; the training sample pair comprises: a first training sample and a second training sample; the first training sample comprises at least: a first question text, a first answer text, a first positive-negative label, and a first sample score; the second training sample comprises at least: a second question text, a second answer text, a second positive-negative label, and a second sample score; the first question text and the second question text are the same question text; the first answer text and the second answer text are different answer texts;
the evaluation model to be trained obtains a model score corresponding to the training sample according to the question text and the answer text of the training sample, and the evaluation model to be trained further comprises:
and obtaining a first model score corresponding to the first training sample and a second model score corresponding to the second training sample through the evaluation model to be trained.
3. The method as recited in claim 1, further comprising:
scoring the training samples from a plurality of dimensions; taking the average of the multi-dimension scores as the sample score; the plurality of dimensions includes at least two of: step completeness, inference correctness, process relevance, and format aesthetics.
4. The method of claim 1, wherein the model to be trained to evaluate obtains a model score corresponding to the training sample from the question text and answer text of the training sample, comprising:
Word segmentation processing is carried out on the question text and the answer text of the training sample through a word segmentation device, so that a word sequence corresponding to the training sample is obtained;
converting each word in the word sequence into a corresponding word vector through an embedding layer;
Generating hidden vectors corresponding to the words through a decoding layer according to the word vectors of the words and the position relation of the words in the word sequence;
The hidden vectors of the words pass through a neural network of an output layer to obtain word scores corresponding to the words; the term score is used for representing the quality score of each term relative to the question text; and obtaining the model scores according to the word scores of the words.
5. The method of claim 4, wherein the deriving the model score from the term scores for the respective terms comprises:
taking the word score of the last word in the word sequence as the model score; or
And carrying out weighted average calculation on the word scores of the words to obtain the model scores.
6. The method of claim 2, wherein the generating a scoring loss function from the model score and sample score of the training sample comprises:
determining a magnitude relationship between the first and second sample scores in the training sample pair;
Calculating a model score difference from the first model score and the second model score based on the magnitude relation;
and calculating the scoring loss function according to the model scoring difference.
7. The method of claim 2 or 4, wherein the generating a scoring loss function from the model score and sample score of the training sample comprises:
determining a magnitude relationship between the first and second sample scores in the training sample pair;
acquiring the word scores of the words of the first training sample and the second training sample;
calculating, based on the magnitude relationship and according to the position of each word in the word sequence, word differences between the word scores of corresponding words of the first training sample and the second training sample;
Carrying out weighted average calculation on the word difference values of the words to obtain model scoring differences;
and calculating the scoring loss function according to the model scoring difference.
8. The method of claim 7, wherein the obtaining the word score for the respective word of the first training sample and the second training sample further comprises:
Acquiring word sequence lengths of the first training sample and the second training sample; the word sequence length of the first training sample is shorter than the word sequence length of the second training sample;
Filling words into the word sequence of the first training sample, so that the length of the word sequence of the first training sample after filling is the same as that of the word sequence of the second training sample; and the word score corresponding to the filling word is a preset score.
9. The method of claim 1, wherein the generating a positive-negative loss function from the model score and positive-negative labels of the training samples comprises:
Converting the model scores into prediction probabilities through an activation function;
and calculating the positive-negative loss function through cross entropy according to the prediction probability and the positive-negative label.
10. The method according to claim 1, wherein training the evaluation model to be trained according to the scoring loss function and the positive-negative loss function comprises:
Acquiring a training sample set with the same problem text in the training data set;
Generating a first loss function according to the scoring loss function and the positive-error loss function corresponding to each training sample in the training sample set;
and training the evaluation model to be trained according to the first loss function.
11. An answer evaluation method, comprising:
receiving an answer to be evaluated; the answer to be evaluated at least comprises: question text and answer text; the answer text is an answer corresponding to the question text;
inputting the answer to be evaluated into the answer evaluation model; the answer evaluation model obtains an answer score corresponding to the answer to be evaluated according to the question text and the answer text of the answer to be evaluated; the answer evaluation model being trained by the answer evaluation model training method according to any one of claims 1 to 10;
determining the correct and incorrect information of the answer to be evaluated according to the answer score; the correct and incorrect information is used for indicating whether the conclusion of the answer text is correct or incorrect.
12. An answer evaluation model training device, comprising:
A sample acquisition module configured to acquire a training data set comprising a plurality of training samples; each of the training samples includes at least: question text, answer text, a positive-negative label, and a sample score; the answer text is an answer corresponding to the question text; the positive-negative label is used for indicating whether the conclusion of the answer text is correct or incorrect; the sample score is used for representing the quality of the answer text;
A model scoring module configured to input the training dataset into a model to be trained for evaluation; the to-be-trained evaluation model obtains a model score corresponding to the training sample according to the question text and the answer text of the training sample; the model score is used for representing the quality score of the answer text relative to the question text;
A scoring loss function module configured to generate a scoring loss function from the model score and a sample score of the training sample; the scoring loss function is used to represent the degree of difference between the model score and the sample score;
a positive-negative loss function module configured to generate a positive-negative loss function according to the model score and the positive-negative label of the training sample; the positive-negative loss function is used for representing the degree of difference between the model score and the positive-negative label;
and the model training module is configured to train the evaluation model to be trained according to the scoring loss function and the positive and negative loss function to obtain an answer evaluation model.
13. An answer evaluation device, comprising:
The answer to be evaluated receiving module is configured to receive an answer to be evaluated; the answer to be evaluated at least comprises: question text and answer text; the answer text is an answer corresponding to the question text;
An answer scoring module configured to input the answer to be evaluated into the answer evaluation model; the answer evaluation model obtains an answer score corresponding to the answer to be evaluated according to the question text and the answer text of the answer to be evaluated; the answer evaluation model being trained by the answer evaluation model training method according to any one of claims 1 to 10;
the correct and incorrect judgment module is configured to determine correct and incorrect information of the answer to be evaluated according to the answer score; the positive and negative information is used for indicating whether the answer text conclusion is correct or incorrect;
and the feedback module is configured to output the correct and incorrect information and the answer score.
14. An electronic device, comprising:
A processor;
a memory for storing the processor-executable instructions;
Wherein the processor is configured to execute the executable instructions to implement the answer evaluation model training method of any one of claims 1 to 10 or the answer evaluation method of claim 11.
15. A computer-readable storage medium having instructions stored thereon which, when executed by a processor of an electronic device, cause the electronic device to perform the answer evaluation model training method of any one of claims 1 to 10 or the answer evaluation method of claim 11.
16. A computer program product comprising computer program/instructions which, when executed by a processor, implements the answer evaluation model training method of any one of claims 1 to 10 or the answer evaluation method of claim 11.
CN202410465501.0A 2024-04-17 2024-04-17 Answer evaluation model training method, evaluation method, device, equipment and medium Pending CN118278543A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410465501.0A CN118278543A (en) 2024-04-17 2024-04-17 Answer evaluation model training method, evaluation method, device, equipment and medium


Publications (1)

Publication Number Publication Date
CN118278543A true CN118278543A (en) 2024-07-02

Family

ID=91649538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410465501.0A Pending CN118278543A (en) 2024-04-17 2024-04-17 Answer evaluation model training method, evaluation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN118278543A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118643807A (en) * 2024-08-13 2024-09-13 北京中数睿智科技有限公司 A method for evaluating the quality of information synthesis in large models



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination