CN116561538A - Question-answer scoring method, question-answer scoring device, electronic equipment and storage medium

Info

Publication number
CN116561538A
CN116561538A (application number CN202310350746.4A)
Authority
CN
China
Prior art keywords
answer
question
target
data
scoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310350746.4A
Other languages
Chinese (zh)
Inventor
李良知
项彤
陈方毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meishao Co ltd
Original Assignee
Xiamen Meishao Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meishao Co ltd
Priority to CN202310350746.4A
Publication of CN116561538A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G06F 16/35 Clustering; Classification
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri

Abstract

The embodiment of the application provides a question-answer scoring method, a question-answer scoring device, electronic equipment and a storage medium, belonging to the technical field of artificial intelligence. The method comprises the following steps: obtaining target corpus data, wherein the target corpus data comprises at least one of structured data and unstructured data; constructing question-answer pair data based on the target corpus data, wherein the question-answer pair data comprises a plurality of question-answer pairs, and each question-answer pair comprises a target question and a target answer corresponding to the target question; inputting the target question into a question-answer model to be scored for answer generation, so as to obtain a predicted answer; and scoring the question-answer model based on the predicted answer and the target answer to obtain target scoring data of the question-answer model, wherein the target scoring data is used for characterizing the question-answering performance of the question-answer model. On the premise of staying close to the level of human annotation, the method and the device can more conveniently and efficiently score the question-answering performance of the question-answer model.

Description

Question-answer scoring method, question-answer scoring device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a question and answer scoring method, a question and answer scoring device, electronic equipment and a storage medium.
Background
The question-answer pairs used by current question-answer models are often generated from manually constructed templates and rules or from structured prior knowledge bases, which tends to limit the generated question-answer pairs, so that the answer prediction capability of a question-answer model cannot be comprehensively determined from them. In addition, evaluating the answer prediction capability of a question-answer model often relies on manual scoring, which makes scoring inefficient; mechanical indexes based on the similarity between the model's answer and a standard answer (such as the BLEU score or the ROUGE score), although more efficient, cannot fully reflect the real quality of the question-answer model in scenarios such as open natural-language question answering, and therefore cannot be widely applied to the evaluation of question-answer systems.
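To illustrate why such mechanical indexes fall short, the following minimal sketch (not part of the application) computes a BLEU/ROUGE-style token-overlap score; a correct paraphrase with little word overlap scores poorly even though a human grader would accept it:

```python
from collections import Counter

def unigram_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    A simplified stand-in for BLEU/ROUGE-style mechanical metrics:
    it rewards surface word overlap only, so a fluent, correct
    paraphrase can still receive a low score.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Overlap = multiset intersection of the two token bags.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For instance, `unigram_f1("the capital of France is Paris", "Paris is France's capital city")` is well below 1.0 despite the answer being correct, which is exactly the weakness of purely mechanical scoring in open question answering.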
Disclosure of Invention
The main purpose of the embodiments of the application is to provide a question-answer scoring method, a question-answer scoring device, electronic equipment and a storage medium, with the aim of improving the efficiency of scoring a question-answer model while ensuring that the question-answer score stays close to the level of human evaluation.
To achieve the above object, a first aspect of an embodiment of the present application provides a question-answer scoring method, where the method includes:
Obtaining target corpus data, wherein the target corpus data comprises at least one of structured data and unstructured data;
constructing question-answer pair data based on the target corpus data, wherein the question-answer pair data comprises a plurality of question-answer pairs, and each question-answer pair comprises a target question and a target answer corresponding to the target question;
inputting the target questions to a question-answer model to be scored to generate answers, and obtaining predicted answers;
and scoring the question-answer model based on the predicted answer and the target answer to obtain target scoring data of the question-answer model, wherein the target scoring data is used for representing the question-answer performance of the question-answer model.
In some embodiments, the obtaining the target corpus data includes:
extracting original corpus data from a preset database, wherein the original corpus data is structured data;
and performing data cleaning on the original corpus data to obtain the target corpus data.
In some embodiments, the constructing question-answer pair data based on the target corpus data includes:
acquiring a preset question-answering template;
constructing the target questions and target answers corresponding to the target questions based on the question-answer templates and the target corpus data;
Generating the question-answer pair based on the target question and the target answer;
and integrating all the question-answer pairs to obtain the question-answer pair data.
In some embodiments, the obtaining the target corpus data includes:
obtaining unstructured data determined by a target object;
acquiring sample data corresponding to a target task;
and obtaining the target corpus data based on the sample data and the unstructured data.
In some embodiments, the constructing question-answer pair data based on the target corpus data includes:
inputting the target corpus data into a pre-trained large language model;
performing feature extraction on the target corpus data based on the large language model to obtain the target question and the target answer;
generating the question-answer pair based on the target question and the target answer;
and integrating all the question-answer pairs to obtain the question-answer pair data.
In some embodiments, the scoring the question-answer model based on the predicted answer and the target answer to obtain target scoring data of the question-answer model includes:
acquiring a preset formula corresponding to each scoring index in the target scoring data;
Calculating preliminary scoring data corresponding to each scoring index based on the preset formula, the predicted answer and the target answer;
and obtaining the target scoring data based on the preliminary scoring data.
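A minimal sketch of this formula-based scoring path: one preset formula is applied per scoring index to every predicted/target answer pair to obtain the preliminary scoring data, which is then aggregated into the target scoring data. The concrete formula (exact match) and the averaging step are illustrative assumptions, since the embodiment does not fix particular indexes:

```python
def exact_match(predicted: str, target: str) -> float:
    """One preset formula for one scoring index (an illustrative assumption)."""
    return 1.0 if predicted.strip().lower() == target.strip().lower() else 0.0

def score_model(qa_pairs, predictions, index_formulas):
    """Apply one preset formula per scoring index to each
    predicted/target answer pair (preliminary scoring data), then
    aggregate each index by averaging (target scoring data).
    """
    preliminary = {name: [] for name in index_formulas}
    for (_, target), predicted in zip(qa_pairs, predictions):
        for name, formula in index_formulas.items():
            preliminary[name].append(formula(predicted, target))
    # Aggregate the preliminary scores of each index over all pairs.
    return {name: sum(vals) / len(vals) for name, vals in preliminary.items()}
```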
In some embodiments, the scoring the question-answer model based on the predicted answer and the target answer to obtain target scoring data of the question-answer model includes:
obtaining grading case data determined by a target object;
inputting the predicted answer, the scoring case data and the target answer into a preset scoring model, so that the scoring model performs in-context learning from the scoring case data and the target answer;
and scoring the question-answer model based on the scoring model after the in-context learning, so as to obtain the target scoring data of the question-answer model.
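This scoring-model path can be sketched as few-shot prompt assembly, so that the scoring model learns the grading rubric in context from the object-supplied scoring cases. The rubric wording and the 0-10 scale are illustrative assumptions, and the call to the actual scoring model is omitted:

```python
def build_scoring_prompt(question, predicted, target, scoring_cases):
    """Assemble a few-shot prompt for a preset scoring model.

    `scoring_cases` is a list of (question, answer, reference, score)
    tuples determined by the target object; the model grades the final,
    unscored entry by analogy with the in-context examples.
    """
    lines = ["Score each predicted answer against the reference (0-10)."]
    for ex_q, ex_a, ex_ref, ex_score in scoring_cases:
        lines.append(f"Question: {ex_q}\nPredicted: {ex_a}\n"
                     f"Reference: {ex_ref}\nScore: {ex_score}")
    # The entry to be graded ends with an open "Score:" slot.
    lines.append(f"Question: {question}\nPredicted: {predicted}\n"
                 f"Reference: {target}\nScore:")
    return "\n\n".join(lines)
```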
To achieve the above object, a second aspect of the embodiments of the present application proposes a question-answer scoring apparatus, the apparatus comprising:
the data acquisition module is used for acquiring target corpus data, wherein the target corpus data comprises at least one of structured data and unstructured data;
The question-answer pair generation module is used for constructing question-answer pair data based on the target corpus data, wherein the question-answer pair data comprises a plurality of question-answer pairs, and each question-answer pair comprises a target question and a target answer corresponding to the target question;
the answer generation module is used for inputting the target questions into a question-answer model to be scored to generate answers so as to obtain predicted answers;
and the scoring module is used for scoring the question-answer model based on the predicted answer and the target answer to obtain target scoring data of the question-answer model, wherein the target scoring data is used for representing the question-answer performance of the question-answer model.
To achieve the above object, a third aspect of the embodiments of the present application proposes an electronic device, which includes a memory, a processor, where the memory stores a computer program, and the processor implements the method described in the first aspect when executing the computer program.
To achieve the above object, a fourth aspect of the embodiments of the present application proposes a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method of the first aspect.
The question and answer scoring method, the question and answer scoring device, the electronic equipment and the storage medium are used for obtaining target corpus data, wherein the target corpus data comprises at least one of structured data and unstructured data; question-answer pair data are constructed based on target corpus data, wherein the question-answer pair data comprise a plurality of question-answer pairs, each question-answer pair comprises a target question and a target answer corresponding to the target question, the structured data and the unstructured data can be used for generating the question-answer pairs, the quantity richness and the category richness of the question-answer pairs are improved, and the scoring of the question-answer capability of a question-answer model can be realized by using the target questions and the target answers. Further, inputting the target questions into a question-answer model to be scored to generate answers, and obtaining predicted answers; and scoring the question-answer model based on the predicted answer and the target answer to obtain target scoring data of the question-answer model, wherein the target scoring data is used for characterizing the question-answer performance of the question-answer model, so that the question-answer capability of the question-answer model can be conveniently determined according to the difference condition between the predicted answer and the target answer, and the efficiency of scoring the question-answer performance of the question-answer model can be improved while the question-answer scoring is ensured to be close to the human evaluation level.
Drawings
FIG. 1 is a flowchart of a question-answer scoring method provided by an embodiment of the present application;
FIG. 2 is a flowchart of step S101 in FIG. 1;
FIG. 3 is another flowchart of step S101 in FIG. 1;
FIG. 4 is a flowchart of step S102 in FIG. 1;
FIG. 5 is another flowchart of step S102 in FIG. 1;
FIG. 6 is a flowchart of step S104 in FIG. 1;
FIG. 7 is another flowchart of step S104 in FIG. 1;
FIG. 8 is a schematic structural diagram of a question-answer scoring apparatus provided in an embodiment of the present application;
FIG. 9 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It should be noted that although a functional block division is shown in the device diagrams and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the block division in the device or from the order in the flowchart. The terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish between similar elements and are not necessarily intended to describe a particular sequential or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
First, several nouns referred to in this application are parsed:
Artificial intelligence (AI): a new technical science that studies and develops theories, methods, techniques and application systems for simulating, extending and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing and expert systems. Artificial intelligence can simulate the information processes of human consciousness and thinking. It is also a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.
Natural language processing (NLP): NLP is a branch of artificial intelligence and an interdisciplinary field of computer science and linguistics, often referred to as computational linguistics, which processes, understands and applies human languages (e.g., Chinese, English). Natural language processing includes parsing, semantic analysis, discourse understanding and the like. It is commonly used in technical fields such as machine translation, recognition of handwritten and printed characters, speech recognition and text-to-speech conversion, information intent recognition, information extraction and filtering, text classification and clustering, and public opinion analysis and opinion mining, and involves data mining, machine learning, knowledge acquisition, knowledge engineering, artificial intelligence research, and linguistic research related to language computation.
Information extraction (IE): a text processing technique that extracts factual information of specified types, such as entities, relations and events, from natural language text and outputs it as structured data. Information extraction is a technique for extracting specific information from text data. Text data is composed of specific units such as sentences, paragraphs and chapters, and text information is composed of smaller specific units such as words, phrases, sentences and paragraphs, or combinations of these units. Extracting noun phrases, person names, place names and the like from text data are all forms of text information extraction, and the information extracted by such techniques can of course be of various types.
The question-answer pairs used by current question-answer models are often generated from manually constructed templates and rules or from structured prior knowledge bases, which tends to limit the generated question-answer pairs, so that the answer prediction capability of a question-answer model cannot be comprehensively determined from them. In addition, evaluating the answer prediction capability of the question-answer model often relies on manual scoring methods, so the efficiency of scoring the question-answer model may be low.
Based on the above, the embodiment of the application provides a question and answer scoring method, a question and answer scoring device, electronic equipment and a storage medium, which aim to improve the efficiency of scoring a question and answer model.
The method and device for scoring questions and answers, the electronic device and the storage medium provided by the embodiment of the application are specifically described through the following embodiments, and the method for scoring questions and answers in the embodiment of the application is described first.
The embodiments of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics and the like. Artificial intelligence software technology mainly comprises computer vision, robotics, biometric recognition, speech processing, natural language processing, machine learning/deep learning and other directions.
The embodiments of the application provide a question-answer scoring method, which relates to the technical field of artificial intelligence. The question-answer scoring method provided by the embodiments of the application can be applied to a terminal, to a server side, or to software running in a terminal or server side. In some embodiments, the terminal may be a smartphone, tablet, notebook computer, desktop computer or the like; the server side may be configured as an independent physical server, as a server cluster or distributed system composed of a plurality of physical servers, or as a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, big data and artificial intelligence platforms; the software may be an application implementing the question-answer scoring method, but is not limited to the above forms.
The subject application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Fig. 1 is an optional flowchart of a question-answer scoring method provided in an embodiment of the present application, where the method in fig. 1 may include, but is not limited to, steps S101 to S104.
Step S101, obtaining target corpus data, wherein the target corpus data comprises at least one of structured data and unstructured data;
step S102, question-answer pair data are constructed based on target corpus data, wherein the question-answer data comprise a plurality of question-answer pairs, and each question-answer pair comprises a target question and a target answer corresponding to the target question;
step S103, inputting the target questions into a question-answer model to be scored to generate answers, and obtaining predicted answers;
and step S104, scoring the question-answer model based on the predicted answer and the target answer to obtain target scoring data of the question-answer model, wherein the target scoring data is used for characterizing the question-answer performance of the question-answer model.
Step S101 to step S104 illustrated in the embodiments of the present application, the target corpus data is obtained, where the target corpus data includes at least one of structured data and unstructured data; question-answer pair data are constructed based on target corpus data, wherein the question-answer pair data comprise a plurality of question-answer pairs, each question-answer pair comprises a target question and a target answer corresponding to the target question, the structured data and the unstructured data can be used for generating the question-answer pairs, the quantity richness and the category richness of the question-answer pairs are improved, and the scoring of the question-answer capability of a question-answer model can be realized by using the target questions and the target answers. Further, inputting the target questions into a question-answer model to be scored to generate answers, and obtaining predicted answers; and scoring the question-answer model based on the predicted answer and the target answer to obtain target scoring data of the question-answer model, wherein the target scoring data is used for characterizing the question-answer performance of the question-answer model, so that the question-answer capability of the question-answer model can be conveniently determined according to the difference condition between the predicted answer and the target answer, and the efficiency of scoring the question-answer model can be improved while the question-answer score is ensured to be close to the human evaluation level.
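Steps S101 to S104 can be sketched as a single pipeline. The function below is an illustrative outline only: the question-answer pair builder, the question-answer model under test and the scorer are passed in as placeholders, and their signatures are assumptions rather than part of the application:

```python
def score_question_answer_model(corpus, build_qa_pairs, qa_model, scorer):
    """Outline of steps S101-S104: target corpus data -> question-answer
    pairs -> predicted answers -> target scoring data.

    All three callables are placeholders for the template-based or
    LLM-based components described in the embodiments.
    """
    qa_pairs = build_qa_pairs(corpus)                    # step S102
    predictions = [qa_model(q) for q, _ in qa_pairs]     # step S103
    targets = [a for _, a in qa_pairs]
    return scorer(predictions, targets)                  # step S104
```

For example, with a trivial one-pair builder and an accuracy-style scorer, the returned target scoring data is simply the fraction of predicted answers that match the target answers.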
Referring to fig. 2, in some embodiments, step S101 may include, but is not limited to, steps S201 to S202:
step S201, extracting original corpus data from a preset database, wherein the original corpus data is structured data;
step S202, data cleaning is carried out on the original corpus data, and target corpus data are obtained.
In step S201 of some embodiments, the preset database may be, without limitation, a knowledge graph library or a common knowledge base. Taking a knowledge graph library as an example, the database can contain two-tuples, triples and the like; these tuples are structured data and can serve as the original corpus data. Specifically, according to the requirements of different fields, a predetermined number of two-tuples and triples can be extracted directly from the preset database as the original corpus data, where a triple may take an expression form such as <head entity, relation, tail entity> or <entity, attribute name, attribute value>, without limitation.
In step S202 of some embodiments, when the original corpus data is subjected to data cleaning, duplicate original corpus data may be subjected to deduplication, original corpus data with insufficient semantics may be subjected to completion, original corpus data with missing data may be removed, and the cleaned original corpus data may be used as target corpus data, so that the data quality of the target corpus data may be improved.
Through the above steps S201 to S202, a plurality of pieces of structured data can be conveniently extracted from an existing database and, after data cleaning, used as the target corpus data, so that question-answer pairs can be constructed from structured data. This approach directly uses the known entities in the preset database as the target corpus data for constructing question-answer pairs, needs no entity extraction from additional corpora, and omits the processes of knowledge alignment and entity information retrieval, which can effectively improve both the efficiency of generating question-answer pairs and the efficiency of question-answer scoring.
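The data-cleaning pass of step S202 can be sketched as follows. The concrete rules (de-duplication and dropping entries with missing fields) follow the description above, while the representation of triples as Python tuples is an assumption for illustration:

```python
def clean_corpus(triples):
    """Data cleaning in the style of step S202: de-duplicate repeated
    corpus entries and remove entries with missing data. The exact
    cleaning rules are illustrative assumptions.
    """
    seen = set()
    cleaned = []
    for triple in triples:
        if len(triple) != 3 or any(not field for field in triple):
            continue  # remove original corpus data with missing fields
        if triple in seen:
            continue  # de-duplicate repeated original corpus data
        seen.add(triple)
        cleaned.append(triple)
    return cleaned
```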
Referring to fig. 4, in some embodiments, step S102 may include, but is not limited to, steps S401 to S404:
step S401, obtaining a preset question-answer template;
step S402, constructing a target question and a target answer corresponding to the target question based on the question-answer template and the target corpus data;
step S403, generating a question-answer pair based on the target question and the target answer;
and step S404, carrying out integration processing on all question-answer pairs to obtain question-answer pair data.
In step S401 of some embodiments, the preset question-answer templates may be manually designed templates, which may set question forms and answer forms matching the domain for different domains according to the questions and answers forms set by the relevant person for the domain. When a question-answer pair needs to be constructed, a corresponding question-answer template can be directly called from a preset template library.
In step S402 of some embodiments, entities in the target corpus data are combined based on the question-answer template to form a plurality of positive statement sentences and a plurality of negative statement sentences. According to the question-answer template and a preset rule, the entities or relations in a positive statement sentence can be replaced or pruned to form a corresponding negative statement sentence, and the entities or relations in a negative statement sentence can be replaced or pruned to form a corresponding positive statement sentence, which effectively increases the number of statement sentences. Further, the target questions and the target answers corresponding to the target questions are constructed based on the statement sentences and the question form and answer form in the question-answer template.
For example, suppose the target corpus data is the triple <object A, gender, male>, where object A is an entity, gender is an attribute name and male is an attribute value, a classical triple relation. The positive statement sentence corresponding to this triple is "the gender of object A is male", with the label True (positive); by modifying the entity or the relation, negative statement sentences such as "the gender of object A is female" and "the gender of object A is not male" can be obtained, both labelled False (negative). Further, based on these statement sentences and the question form and answer form in the question-answer template, a target question Q1 is constructed as "Judge whether the following statement is correct: 'the gender of object A is male'", with target answer "yes"; a target question Q2 is constructed as "Judge whether the following statement is correct: 'the gender of object A is not male'", with target answer "no".
In step S403 of some embodiments, each target question and the corresponding target answer are paired one by one to form a plurality of question-answer pairs. Thus, each question-answer pair includes a target question and a target answer corresponding to the target question.
In step S404 of some embodiments, all question-answer pairs are integrated, and all question-answer pairs are incorporated into the same set, and this set is used as question-answer pair data.
Through the above steps S401 to S404, sentence construction can be conveniently performed on the structured data based on manually designed question templates to form various statement sentences, and true/false judgement question-answer pairs or multiple-choice question-answer pairs can be generated from those statement sentences, which effectively improves the efficiency and accuracy of question-answer pair generation. Meanwhile, because the question-answer pairs are generated from the question templates and the structured data and conform to a fixed question-answer form, they avoid the lack of standardisation and the quality problems caused by unconstrained generation; they therefore have the characteristic of high quality, which improves the accuracy of the question-answer scores.
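The template-based construction of steps S401 to S404 can be sketched for the <object A, gender, male> example above. The English template wording stands in for the preset question-answer template and is an illustrative assumption:

```python
def build_judgement_pairs(triple):
    """Build true/false judgement question-answer pairs from one
    <entity, attribute name, attribute value> triple: one positive
    statement sentence plus a negated variant, each wrapped in a
    judgement-question template.
    """
    entity, attribute, value = triple
    statements = [
        (f"The {attribute} of {entity} is {value}", "yes"),      # positive statement
        (f"The {attribute} of {entity} is not {value}", "no"),   # negative statement
    ]
    template = "Judge whether the following statement is correct: '{}'"
    return [(template.format(s), answer) for s, answer in statements]
```

Integrating the pairs produced for every triple into one set then yields the question-answer pair data of step S404.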
Referring to fig. 3, in some embodiments, step S101 may further include, but is not limited to, steps S301 to S303:
Step S301, unstructured data determined by a target object is acquired;
step S302, sample data corresponding to a target task is obtained;
step S303, obtaining target corpus data based on the sample data and the unstructured data.
In step S301 of some embodiments, because there may be a difference in the requirements of the target object in different application scenarios, in a specific implementation process, unstructured data determined by the target object may be acquired, where the unstructured data is given by the target object according to the actual requirements, and the specific content of the generated question-answer pair may be better controlled, so that the question-answer pair better meets the current actual requirements. For example, for the medical field, medical knowledge can be extracted from unstructured text given by a target object, and knowledge sources can be better expanded as required.
In step S302 of some embodiments, the task content and task purpose of the question-answer generation task may differ across application scenarios. Therefore, in the specific implementation process, sample data corresponding to the target task can be obtained; the sample data may be unstructured data and is used as part of the target corpus data. The sample data standardizes the question-answer pair generation format of the large-scale language model, so that the generation result is more standardized and better conforms to the task target requirements.
In step S303 of some embodiments, the sample data and the unstructured data are integrated, and the integrated data set is used as target corpus data.
Through the steps S301 to S303, the specific content of the generated question-answer pair can be controlled by using the unstructured data given by the target object and the sample data corresponding to the target task, so that the generating process of the question-answer pair and the quality of the question-answer pair can be better optimized.
Referring to fig. 5, in some embodiments, step S102 may further include, but is not limited to, steps S501 to S504:
step S501, inputting target corpus data into a pre-trained large language model;
step S502, extracting features of the target corpus data based on a large language model to obtain a target question and a target answer;
step S503, generating a question-answer pair based on the target question and the target answer;
and step S504, carrying out integration processing on all question-answer pairs to obtain question-answer pair data.
In step S501 of some embodiments, a pre-trained large language model may be built based on a GPT-3 architecture or LLaMA architecture. When the large-scale language model is trained, the unstructured large-scale corpus (for example, a Common Crawl corpus and the like) is utilized to pretrain the large-scale language model, unstructured knowledge contained in the large-scale corpus is encoded into model parameters of the large-scale language model, so that the large-scale language model has the capability of generating positive and negative statement question-answer pairs, multiple choice question-answer pairs and open natural language question-answer pairs.
Furthermore, because the language model can also utilize the input unstructured information, target corpus data can also be input into the language model, and the specific content of the question-answer pair generated by the language model is controlled by utilizing the target corpus data, wherein the target corpus data comprises unstructured data provided by a target object and sample data corresponding to a target task.
In step S502 of some embodiments, since the pre-trained large language model has a certain degree of logical reasoning capability and context analysis capability, feature extraction can be performed on the target corpus data based on the large language model, and information in the target corpus data can be analyzed, so as to obtain a target question and a target answer.
In addition, in order to further improve the effect of the large-scale language model on a specific task, and because fine-tuning a large-scale language model with a large parameter count (such as GPT-3) is often too costly, the embodiments of the present application adopt a context learning (in-context learning) mode to further improve the model effect. A context learning task typically requires a certain amount of task-specific prompt data, while the target object can also adjust the generated content of the model by providing unstructured text data. In this way, the model effect can be improved and further control of the model generation result by the target object can be realized. For example, given unstructured data input by the target object, a small amount of task-related sample data is enough to control the content of the question-answer pairs generated by the model, and the quality of question-answer pair generation can be improved accordingly.
It should be noted that context learning requires very few samples, typically ranging from a few to a few hundred, and a significant output-control effect and improvement in model performance can be achieved with as few as a single sample.
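As an illustrative sketch (the prompt layout, field names, and function name are assumptions, not part of the embodiments), a few-shot prompt for such context learning might be assembled from the task instruction, a handful of demonstration question-answer pairs, and the target object's unstructured text:

```python
def build_fewshot_prompt(examples, unstructured_text, instruction):
    """Assemble an in-context-learning prompt: a task instruction,
    demonstration question-answer pairs, and the target object's
    unstructured text from which a new pair should be generated."""
    parts = [instruction]
    for ex in examples:
        parts.append(f"Text: {ex['text']}\nQ: {ex['question']}\nA: {ex['answer']}")
    # Trailing "Q:" leaves the model to continue with a new question-answer pair.
    parts.append(f"Text: {unstructured_text}\nQ:")
    return "\n\n".join(parts)
```

The resulting string would then be fed to the pre-trained large language model, whose continuation supplies the new target question and target answer.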
In step S503 of some embodiments, the large language model pairs each target question with a corresponding target answer one by one to form a plurality of question-answer pairs. Thus, each question-answer pair includes a target question and a target answer corresponding to the target question.
In step S504 of some embodiments, all question-answer pairs are integrated, and all question-answer pairs are incorporated into the same set, and this set is used as question-answer pair data.
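A minimal sketch of this integration step (the dictionary layout and the exact deduplication policy are assumptions, not specified by the embodiments) might merge the pairs from several generation runs into one dataset:

```python
def integrate_qa_pairs(*batches):
    """Merge question-answer pairs from several generation runs into a
    single dataset, dropping exact duplicates while preserving order."""
    seen = set()
    dataset = []
    for batch in batches:
        for pair in batch:
            key = (pair["question"], pair["answer"])
            if key not in seen:
                seen.add(key)
                dataset.append(pair)
    return dataset
```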
In some embodiments, in order to improve the model performance of the large language model, the question-answer pairs generated by the large language model are not limited to positive/negative statements or multiple choices; the large language model can thus generate open question-answers, and the context learning process of the model can be further optimized. In the specific implementation of this embodiment, the question form of the target question generated by the large language model is not limited to positive/negative statements or multiple-choice questions, so the question-answer form of the target question and the target answer is freer; however, the specific content of the target question and the target answer is still constrained to a large extent by the field and by the unstructured data given by the target object, and the target answer is not a positive/negative judgment but a natural-language description. Specifically, in the context learning task, the large-scale language model performs context learning using one or more question-answer examples of unrestricted pattern annotated by the target object, or unstructured text in a certain field given by the target object, in combination with the unstructured data given by the target object, so as to control the content of the target questions and target answers it generates. In this way, the question-answer forms and contents of the question-answer pairs generated by the large-scale language model are richer, and the quality of the generated question-answer pairs is improved.
Through the above steps S501 to S504, the knowledge contained in unstructured text can be encoded into the model parameters of the large-scale language model by pre-training it on large-scale corpus data; furthermore, the unstructured data given by the target object and the sample data corresponding to the target task can be used to control the question-answer content generated by the large-scale language model, improving both the quality of the generated question-answer pairs and the efficiency of their generation. In addition, this method covers knowledge in different fields more comprehensively: the large-scale language model can extract and utilize knowledge without relying on a cleaned database, the knowledge contained in a large-scale unstructured corpus generally far exceeds that in a structured knowledge base, and the large-scale language model has better logical reasoning and context analysis capabilities, so more types of question-answer pairs can be generated, which helps improve the variety richness, semantic integrity, and language fluency of the question-answer pairs.
In step S103 of some embodiments, a question-answer model to be scored may be called from an existing model library, and the target question is directly input into the question-answer model to be scored for answer generation; the question-answer model performs content recognition and answer reasoning on the target question to obtain a predicted answer. The question-answer model is an already-trained model, and the embodiments of the present application aim to score such question-answer models. The specific process by which the question-answer model generates a predicted answer from the target question is basically consistent with the question-answer process in the related art and is not described in detail in the embodiments of the present application.
Referring to fig. 6, in some embodiments, step S104 includes, but is not limited to, steps S601 to S603:
step S601, obtaining a preset formula corresponding to each scoring index in target scoring data;
step S602, calculating preliminary scoring data corresponding to each scoring index based on a preset formula, a predicted answer and a target answer;
step S603, obtaining target scoring data based on the preliminary scoring data.
In step S601 of some embodiments, for a scenario in which the target problem is positive and negative judgment or in a multi-choice form, in order to improve the scoring efficiency, an existing automatic scoring index may be directly introduced to perform model scoring. Specifically, the target scoring data may include one or more scoring indexes, where each scoring index is provided with a corresponding preset formula.
In step S602 of some embodiments, the predicted answer of each target question is compared with the target answer, the comparison situation is counted, and preliminary scoring data corresponding to each scoring index is calculated according to the counting situation and a preset formula.
Specifically, the target scoring data may include the accuracy, precision, recall, and F1 value of the question-answer model, among others. Taking accuracy as an example, the target answer of each target question is taken as the reference and the predicted answer as the actual value, and the two are compared for consistency. Suppose the number of target questions whose predicted answer is consistent with the target answer is 77, the number of target questions whose predicted answer is inconsistent with the target answer is 23, and the total number of target questions input into the question-answer model is 100; then, from these statistics, the accuracy of the question-answer model is conveniently obtained as 77/100 = 0.77.
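The worked example above (77 consistent answers out of 100 giving accuracy 0.77) generalizes to the standard index formulas. A minimal sketch for positive/negative-judgment answers (the function name and the "Yes"/"No" answer encoding are assumptions for illustration):

```python
def score_binary_qa(predicted, target, positive="Yes"):
    """Compute accuracy, precision, recall, and F1 for a question-answer
    model on positive/negative-judgment questions by comparing the
    predicted answers against the target answers."""
    assert len(predicted) == len(target)
    pairs = list(zip(predicted, target))
    tp = sum(p == positive and t == positive for p, t in pairs)  # true positives
    fp = sum(p == positive and t != positive for p, t in pairs)  # false positives
    fn = sum(p != positive and t == positive for p, t in pairs)  # false negatives
    correct = sum(p == t for p, t in pairs)
    accuracy = correct / len(target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For instance, 77 consistent answers out of 100 yield an accuracy of 0.77, matching the example above.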
In step S603 of some embodiments, the preliminary scoring data is statistically summarized to obtain target scoring data.
Through the above steps S601 to S603, the preset formula corresponding to each scoring index can be conveniently called, and the target scoring data of each question-answer model can be directly calculated from the comparison of the predicted answers and the target answers, which effectively simplifies the scoring process and improves the scoring efficiency.
Referring to fig. 7, in some embodiments, step S104 may further include, but is not limited to, steps S701 to S703:
step S701, obtaining grading case data determined by a target object;
step S702, inputting the predicted answer, the scoring case data and the target answer into a preset scoring model so that the scoring model performs context learning according to the scoring case data and the target answer;
and step S703, scoring the question-answer model based on the scoring model subjected to the context learning to obtain target scoring data of the question-answer model.
In step S701 of some embodiments, for the scenario in which the target question is not in positive/negative-judgment or multiple-choice form and the target answer is a statement in natural language, a scoring model needs to be introduced to score the question-answering performance of the question-answer model in order to improve the scoring accuracy. Specifically, to score questions and answers in different fields and different application scenarios, scoring case data given by the target object according to its actual requirements needs to be obtained. The scoring case data may be scoring detail data previously generated in the same or a similar field, and may have been generated at the request of the target object or of other objects, which is not limited herein. The scoring case data are often stored in the same database, and a plurality of scoring case data can be directly called from the database when needed, which effectively reduces the difficulty of obtaining the scoring case data and improves the data acquisition efficiency.
In step S702 of some embodiments, after the predicted answer, the scoring case data, and the target answer are input to the predetermined scoring model, the scoring model performs a context learning according to the scoring case data and the target answer, and the context learning process is similar to that of the large language model in step S502, which is omitted for brevity.
In step S703 of some embodiments, when the question-answer model is scored based on the scoring model subjected to the context learning, the scoring model compares the answer information it has learned in context from the scoring case data and the target answer with the predicted answer, and determines the scoring result of the question-answer model according to the comparison; this scoring result is the target scoring data of the question-answer model. When comparing the answer information with the predicted answer, the comparison can be carried out in a plurality of different dimensions, and the scoring model outputs a final score according to the comparison in each dimension. For example, the scoring of the question-answer model may include three dimensions: (1) judging whether the predicted answer output by the question-answer model addresses the question, which determines the context analysis capability of the question-answer model; (2) judging whether the predicted answer output by the question-answer model is correct, which determines the knowledge accuracy of the question-answer model; (3) judging whether the predicted answer output by the question-answer model is fluent, which determines the language fluency of the question-answer model.
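As an illustrative sketch of how the scoring case data, target answer, and predicted answer might be presented to the scoring model for the three-dimension comparison (the prompt wording, field names, and 0-10 scale are assumptions, not specified by the embodiments):

```python
def build_scoring_prompt(cases, question, target_answer, predicted_answer):
    """Assemble a prompt asking the scoring model to grade a predicted
    answer on three dimensions: relevance to the question, correctness
    against the target (reference) answer, and language fluency.
    `cases` are past scoring examples used for context learning."""
    parts = ["Grade each answer from 0-10 on: (1) relevance to the question, "
             "(2) correctness against the reference answer, (3) language fluency."]
    for c in cases:
        parts.append(f"Question: {c['question']}\nReference: {c['reference']}\n"
                     f"Answer: {c['answer']}\nScores: {c['scores']}")
    # Trailing "Scores:" leaves the scoring model to emit the final grades.
    parts.append(f"Question: {question}\nReference: {target_answer}\n"
                 f"Answer: {predicted_answer}\nScores:")
    return "\n\n".join(parts)
```

The scoring model's continuation would then supply per-dimension scores, which can be aggregated into the target scoring data.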
Through the above steps S701 to S703, the target questions and target answers that can be scored are not limited to positive/negative-judgment and multiple-choice forms, and the target answer may be a natural-language answer given by a large language model. For such question-answer pairs, a scoring model is introduced to score the question-answering performance of the question-answer model; the scoring model can perform context learning using the scoring case data specified by the target object, thereby realizing automatic scoring of the question-answer model to be scored, accurately assessing the prediction capability of the question-answer model on question-answer pairs in natural-language form, and improving the scoring accuracy and scoring efficiency of the question-answer model.
In some embodiments, for the case in which the question-answer pairs are in positive/negative-judgment or multiple-choice form, original corpus data is extracted from a preset database, where the original corpus data is structured data, and data cleaning is performed on the original corpus data to obtain target corpus data. Further, a preset question-answer template is obtained, and a target question and a target answer corresponding to the target question are constructed based on the question-answer template and the target corpus data; a question-answer pair is generated based on the target question and the target answer; and all question-answer pairs are integrated to obtain question-answer pair data. Further, a question-answer model to be scored is called from an existing model library, the target questions are directly input into the question-answer model to be scored for answer generation, and the question-answer model performs content recognition and answer reasoning on the target questions to obtain predicted answers. Finally, a preset formula corresponding to each scoring index in the target scoring data is obtained; preliminary scoring data corresponding to each scoring index is calculated based on the preset formula, the predicted answers, and the target answers; and the target scoring data is obtained based on the preliminary scoring data. This embodiment can generate question-answer pair data based on structured data and question-answer templates, conveniently call the preset formula corresponding to each scoring index, and directly calculate the target scoring data of each question-answer model from the comparison of the predicted answers and the target answers, which effectively simplifies the scoring process and improves the scoring efficiency.
In some embodiments, for the case in which the question-answer pairs are in positive/negative-judgment or multiple-choice form, unstructured data determined by the target object is obtained; sample data corresponding to the target task is obtained, and target corpus data is obtained based on the sample data and the unstructured data. Further, the target corpus data is input into a pre-trained large language model; feature extraction is performed on the target corpus data based on the large language model to obtain target questions and target answers; question-answer pairs are generated based on the target questions and target answers, and all question-answer pairs are integrated to obtain question-answer pair data. Further, a question-answer model to be scored is called from an existing model library, the target questions are directly input into the question-answer model to be scored for answer generation, and the question-answer model performs content recognition and answer reasoning on the target questions to obtain predicted answers. Finally, a preset formula corresponding to each scoring index in the target scoring data is obtained; preliminary scoring data corresponding to each scoring index is calculated based on the preset formula, the predicted answers, and the target answers; and the target scoring data is obtained based on the preliminary scoring data. This embodiment can generate question-answer pair data based on unstructured data and a large language model, conveniently call the preset formula corresponding to each scoring index, and directly calculate the target scoring data of each question-answer model from the comparison of the predicted answers and the target answers, which effectively simplifies the scoring process and improves the scoring efficiency.
In some embodiments, for the case in which the question-answer pairs are not limited to positive/negative-judgment or multiple-choice form, unstructured data determined by the target object is obtained; sample data corresponding to the target task is obtained, and target corpus data is obtained based on the sample data and the unstructured data. Further, the target corpus data is input into a pre-trained large language model; feature extraction is performed on the target corpus data based on the large language model to obtain target questions and target answers; question-answer pairs are generated based on the target questions and target answers, and all question-answer pairs are integrated to obtain question-answer pair data. Further, a question-answer model to be scored is called from an existing model library, the target questions are directly input into the question-answer model to be scored for answer generation, and the question-answer model performs content recognition and answer reasoning on the target questions to obtain predicted answers. Finally, scoring case data determined by the target object is obtained; the predicted answers, the scoring case data, and the target answers are input into a preset scoring model so that the scoring model performs context learning according to the scoring case data and the target answers; and the question-answer model is scored based on the scoring model subjected to the context learning to obtain the target scoring data of the question-answer model. This embodiment can generate question-answer pair data based on unstructured data and a large language model, and conveniently score the question-answering capability of the question-answer model by utilizing the logical reasoning and context learning capabilities of the scoring model, so that the question-answer score approaches the human evaluation level while the scoring efficiency and scoring accuracy are improved.
According to the question-answer scoring method, target corpus data is obtained, wherein the target corpus data comprises at least one of structured data and unstructured data; question-answer pair data are constructed based on target corpus data, wherein the question-answer pair data comprise a plurality of question-answer pairs, each question-answer pair comprises a target question and a target answer corresponding to the target question, the structured data and the unstructured data can be used for generating the question-answer pairs, the quantity richness and the category richness of the question-answer pairs are improved, and the scoring of the question-answer capability of a question-answer model can be realized by using the target questions and the target answers. Further, inputting the target questions into a question-answer model to be scored to generate answers, and obtaining predicted answers; and scoring the question-answer model based on the predicted answer and the target answer to obtain target scoring data of the question-answer model, wherein the target scoring data is used for characterizing the question-answer performance of the question-answer model, so that the question-answer capability of the question-answer model can be conveniently determined according to the difference condition between the predicted answer and the target answer, and the scoring efficiency of the question-answer model can be improved.
Referring to fig. 8, the embodiment of the present application further provides a question-answer scoring device, which may implement the question-answer scoring method, where the device includes:
A data obtaining module 801, configured to obtain target corpus data, where the target corpus data includes at least one of structured data and unstructured data;
a question-answer pair generation module 802, configured to construct question-answer pair data based on the target corpus data, where the question-answer pair data includes a plurality of question-answer pairs, each question-answer pair including a target question and a target answer corresponding to the target question;
the answer generation module 803 is configured to input a target question to a question-answer model to be scored to generate an answer, so as to obtain a predicted answer;
the scoring module 804 is configured to score the question-answer model based on the predicted answer and the target answer, and obtain target scoring data of the question-answer model, where the target scoring data is used to characterize a question-answer performance of the question-answer model.
The specific implementation of the question-answer scoring device is basically the same as the specific embodiment of the question-answer scoring method, and will not be described herein.
The embodiment of the application also provides electronic equipment, which comprises: the system comprises a memory, a processor, a program stored on the memory and capable of running on the processor, and a data bus for realizing connection communication between the processor and the memory, wherein the program realizes the question-answer scoring method when being executed by the processor. The electronic equipment can be any intelligent terminal including a tablet personal computer, a vehicle-mounted computer and the like.
Referring to fig. 9, fig. 9 illustrates a hardware structure of an electronic device according to another embodiment, the electronic device includes:
the processor 901 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, etc., for executing related programs to implement the technical solutions provided by the embodiments of the present application;
the memory 902 may be implemented in the form of Read-Only Memory (ROM), static storage, dynamic storage, or Random Access Memory (RAM). The memory 902 may store an operating system and other application programs; when the technical solutions provided in the embodiments of the present application are implemented by software or firmware, the relevant program codes are stored in the memory 902, and the processor 901 invokes and executes the question-answer scoring method of the embodiments of the present application;
an input/output interface 903 for inputting and outputting information;
the communication interface 904 is configured to implement communication interaction between the device and other devices, and may implement communication in a wired manner (e.g. USB, network cable, etc.), or may implement communication in a wireless manner (e.g. mobile network, WIFI, bluetooth, etc.);
A bus 905 that transfers information between the various components of the device (e.g., the processor 901, the memory 902, the input/output interface 903, and the communication interface 904);
wherein the processor 901, the memory 902, the input/output interface 903 and the communication interface 904 are communicatively coupled to each other within the device via a bus 905.
The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to realize the question-answer scoring method.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The question and answer scoring method, the question and answer scoring device, the electronic equipment and the computer readable storage medium provided by the embodiment of the application are used for obtaining target corpus data, wherein the target corpus data comprises at least one of structured data and unstructured data; question-answer pair data are constructed based on target corpus data, wherein the question-answer pair data comprise a plurality of question-answer pairs, each question-answer pair comprises a target question and a target answer corresponding to the target question, the structured data and the unstructured data can be used for generating the question-answer pairs, the quantity richness and the category richness of the question-answer pairs are improved, and the scoring of the question-answer capability of a question-answer model can be realized by using the target questions and the target answers. Further, inputting the target questions into a question-answer model to be scored to generate answers, and obtaining predicted answers; and scoring the question-answer model based on the predicted answer and the target answer to obtain target scoring data of the question-answer model, wherein the target scoring data is used for characterizing the question-answer performance of the question-answer model, so that the question-answer capability of the question-answer model can be conveniently determined according to the difference condition between the predicted answer and the target answer, and the scoring efficiency of the question-answer model can be improved.
The embodiments described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and as those skilled in the art can know that, with the evolution of technology and the appearance of new application scenarios, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by those skilled in the art that the solutions shown in fig. 1-7 do not limit the embodiments of the present application, which may include more or fewer steps than shown, may combine certain steps, or may include different steps.
The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the present application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in this application, "at least one" means one or more, and "a plurality" means two or more. "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent: only A, only B, and both A and B, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "at least one of" and the like means any combination of the listed items, including any combination of single or plural items. For example, at least one (one) of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may be single or plural.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the above-described division of units is merely a logical function division, and there may be another division manner in actual implementation, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, the software product including multiple instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned storage medium includes various media capable of storing a program, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Preferred embodiments of the present application are described above with reference to the accompanying drawings, which does not thereby limit the scope of the claims of the embodiments of the present application. Any modification, equivalent substitution, or improvement made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims (10)

1. A question-answer scoring method, the method comprising:
obtaining target corpus data, wherein the target corpus data comprises at least one of structured data and unstructured data;
constructing question-answer pair data based on the target corpus data, wherein the question-answer pair data comprises a plurality of question-answer pairs, and each question-answer pair comprises a target question and a target answer corresponding to the target question;
inputting the target question into a question-answer model to be scored to generate an answer, so as to obtain a predicted answer;
and scoring the question-answer model based on the predicted answer and the target answer to obtain target scoring data of the question-answer model, wherein the target scoring data is used for representing the question-answer performance of the question-answer model.
2. The question-answer scoring method according to claim 1, wherein the obtaining target corpus data includes:
extracting original corpus data from a preset database, wherein the original corpus data is structured data;
and performing data cleaning on the original corpus data to obtain the target corpus data.
3. The question-answer scoring method according to claim 2, wherein the constructing question-answer pair data based on the target corpus data includes:
acquiring a preset question-answer template;
constructing the target questions and target answers corresponding to the target questions based on the question-answer templates and the target corpus data;
generating the question-answer pair based on the target question and the target answer;
and integrating all the question-answer pairs to obtain the question-answer pair data.
4. The question-answer scoring method according to claim 1, wherein the obtaining target corpus data includes:
obtaining unstructured data determined by a target object;
acquiring sample data corresponding to a target task;
and obtaining the target corpus data based on the sample data and the unstructured data.
5. The question-answer scoring method of claim 4, wherein said constructing question-answer pair data based on said target corpus data comprises:
inputting the target corpus data into a pre-trained large language model;
performing feature extraction on the target corpus data based on the large language model to obtain the target question and the target answer;
generating the question-answer pair based on the target question and the target answer;
and integrating all the question-answer pairs to obtain the question-answer pair data.
6. The question-answer scoring method according to any one of claims 1 to 5, wherein the scoring the question-answer model based on the predicted answer and the target answer to obtain target scoring data of the question-answer model includes:
acquiring a preset formula corresponding to each scoring index in the target scoring data;
calculating preliminary scoring data corresponding to each scoring index based on the preset formula, the predicted answer and the target answer;
and obtaining the target scoring data based on the preliminary scoring data.
7. The question-answer scoring method according to any one of claims 1 to 5, wherein the scoring the question-answer model based on the predicted answer and the target answer to obtain target scoring data of the question-answer model includes:
obtaining grading case data determined by a target object;
inputting the predicted answer, the scoring case data and the target answer into a preset scoring model so that the scoring model carries out context learning according to the scoring case data and the target answer;
and scoring the question-answer model based on the scoring model subjected to the context learning, so as to obtain target scoring data of the question-answer model.
8. A question-answer scoring apparatus, the apparatus comprising:
the data acquisition module is used for acquiring target corpus data, wherein the target corpus data comprises at least one of structured data and unstructured data;
the question-answer pair generation module is used for constructing question-answer pair data based on the target corpus data, wherein the question-answer pair data comprises a plurality of question-answer pairs, and each question-answer pair comprises a target question and a target answer corresponding to the target question;
the answer generation module is used for inputting the target question into a question-answer model to be scored to generate an answer, so as to obtain a predicted answer;
and the scoring module is used for scoring the question-answer model based on the predicted answer and the target answer to obtain target scoring data of the question-answer model, wherein the target scoring data is used for representing the question-answer performance of the question-answer model.
9. An electronic device comprising a memory storing a computer program and a processor that when executing the computer program implements the question-answer scoring method of any one of claims 1 to 7.
10. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the question-answer scoring method of any one of claims 1 to 7.
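The overall method of claim 1 (construct question-answer pairs from corpus data, run each target question through the model under test, then score predicted answers against target answers) can be sketched as follows. The patent does not disclose an implementation, so the model interface and the exact-match scoring rule here are illustrative assumptions, not the patented method.

```python
# Minimal sketch of the claim-1 pipeline. `qa_model` is any callable
# mapping a question string to an answer string; the scoring rule
# (exact match accuracy) is one simple stand-in for "target scoring data".

def build_qa_pairs(corpus):
    """Construct question-answer pair data from target corpus data."""
    return [(item["question"], item["answer"]) for item in corpus]

def score_qa_model(qa_model, corpus):
    """Score a question-answer model against the target answers."""
    pairs = build_qa_pairs(corpus)
    correct = 0
    for question, target_answer in pairs:
        predicted = qa_model(question)  # predicted answer from the model
        if predicted.strip() == target_answer.strip():
            correct += 1
    return correct / len(pairs)  # target scoring data (accuracy)

corpus = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "What is the capital of France?", "answer": "Paris"},
]
toy_model = lambda q: "4" if "2 + 2" in q else "Lyon"
print(score_qa_model(toy_model, corpus))  # → 0.5
```

A real evaluation would replace `toy_model` with the question-answer model to be scored and the exact-match rule with the scoring of claims 6 or 7.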
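Claim 3 builds question-answer pairs by filling a preset question-answer template with fields from structured corpus data. One plausible realization, with hypothetical template placeholders and record fields, is:

```python
# Sketch of template-based construction (claim 3). The template strings
# and the record field names (entity/attribute/value) are assumptions;
# the claim only requires a preset question-answer template plus corpus data.

def build_from_template(template_q, template_a, records):
    """Fill question and answer templates from structured records."""
    pairs = []
    for rec in records:
        question = template_q.format(**rec)
        answer = template_a.format(**rec)
        pairs.append((question, answer))
    return pairs  # integrated question-answer pair data

records = [{"entity": "Everest", "attribute": "height", "value": "8849 m"}]
pairs = build_from_template(
    "What is the {attribute} of {entity}?", "{value}", records)
print(pairs[0])  # ('What is the height of Everest?', '8849 m')
```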
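Claim 6 scores each index with a "preset formula" over the predicted and target answers but does not name the formulas. Token-level F1 is one common choice for such an index, shown here purely as an example, not as the patented formula:

```python
# Illustrative per-index scoring formula (claim 6): token-level F1
# between a predicted answer and a target answer.

from collections import Counter

def token_f1(predicted, target):
    """F1 overlap between whitespace tokens of two answer strings."""
    pred_tokens = predicted.split()
    tgt_tokens = target.split()
    common = Counter(pred_tokens) & Counter(tgt_tokens)  # multiset overlap
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(tgt_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the cat sat", "the cat sat down"))  # ≈ 0.857
```

Averaging such preliminary per-pair scores across all question-answer pairs would yield the target scoring data for that index.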
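Claim 7 has a separate scoring model grade the predicted answer via in-context learning from scoring case data. A minimal sketch is the prompt-assembly step: graded example cases are concatenated ahead of the pair to be scored, and the resulting prompt would be sent to the scoring model. The prompt wording and case fields are assumptions; the call to an actual scoring model is omitted.

```python
# Sketch of claim 7: build an in-context-learning prompt from scoring
# case data, the target answer, and the predicted answer. A real system
# would pass the returned prompt to a preset scoring model (an LLM).

def build_scoring_prompt(cases, target_answer, predicted_answer):
    """Assemble graded examples plus the pair to score into one prompt."""
    lines = ["Score the predicted answer against the reference (0-10)."]
    for c in cases:  # scoring case data determined by the target object
        lines.append(
            f"Reference: {c['target']}\nPredicted: {c['predicted']}\n"
            f"Score: {c['score']}")
    lines.append(
        f"Reference: {target_answer}\nPredicted: {predicted_answer}\nScore:")
    return "\n\n".join(lines)

cases = [
    {"target": "Paris", "predicted": "Paris", "score": 10},
    {"target": "Paris", "predicted": "Lyon", "score": 0},
]
prompt = build_scoring_prompt(cases, "4", "four")
print(prompt.endswith("Score:"))  # True: the model completes the score
```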
CN202310350746.4A 2023-04-04 2023-04-04 Question-answer scoring method, question-answer scoring device, electronic equipment and storage medium Pending CN116561538A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310350746.4A CN116561538A (en) 2023-04-04 2023-04-04 Question-answer scoring method, question-answer scoring device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310350746.4A CN116561538A (en) 2023-04-04 2023-04-04 Question-answer scoring method, question-answer scoring device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116561538A true CN116561538A (en) 2023-08-08

Family

ID=87499061

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310350746.4A Pending CN116561538A (en) 2023-04-04 2023-04-04 Question-answer scoring method, question-answer scoring device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116561538A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116882450A (en) * 2023-09-07 2023-10-13 腾讯科技(深圳)有限公司 Question-answering model editing method and device, electronic equipment and storage medium
CN116882450B (en) * 2023-09-07 2023-12-26 腾讯科技(深圳)有限公司 Question-answering model editing method and device, electronic equipment and storage medium
CN117291184A (en) * 2023-11-16 2023-12-26 浙江口碑网络技术有限公司 Assessment method and device for large language model, storage medium and computer equipment
CN117574286A (en) * 2024-01-11 2024-02-20 阿里健康科技(杭州)有限公司 Method, device, equipment and storage medium for determining tag value

Similar Documents

Publication Publication Date Title
CN107798140B (en) Dialog system construction method, semantic controlled response method and device
CN112270196B (en) Entity relationship identification method and device and electronic equipment
CN111708873A (en) Intelligent question answering method and device, computer equipment and storage medium
KR102491172B1 (en) Natural language question-answering system and learning method
CN116561538A (en) Question-answer scoring method, question-answer scoring device, electronic equipment and storage medium
US10824816B2 (en) Semantic parsing method and apparatus
CN114519356B (en) Target word detection method and device, electronic equipment and storage medium
CN112131881A (en) Information extraction method and device, electronic equipment and storage medium
CN113704428A (en) Intelligent inquiry method, device, electronic equipment and storage medium
CN111125295A (en) Method and system for obtaining food safety question answers based on LSTM
CN116541493A (en) Interactive response method, device, equipment and storage medium based on intention recognition
CN115497477A (en) Voice interaction method, voice interaction device, electronic equipment and storage medium
CN116050352A (en) Text encoding method and device, computer equipment and storage medium
CN114003682A (en) Text classification method, device, equipment and storage medium
CN117271736A (en) Question-answer pair generation method and system, electronic equipment and storage medium
CN114722774B (en) Data compression method, device, electronic equipment and storage medium
CN114398903B (en) Intention recognition method, device, electronic equipment and storage medium
CN115795007A (en) Intelligent question-answering method, intelligent question-answering device, electronic equipment and storage medium
CN114492437B (en) Keyword recognition method and device, electronic equipment and storage medium
CN115270746A (en) Question sample generation method and device, electronic equipment and storage medium
CN114722174A (en) Word extraction method and device, electronic equipment and storage medium
CN114936274A (en) Model training method, dialogue generating device, dialogue training equipment and storage medium
CN114360715A (en) Constitution identification method and device, electronic equipment and storage medium
CN114840685A (en) Emergency plan knowledge graph construction method
CN114925185B (en) Interaction method, model training method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination