CN111090742A - Question and answer pair evaluation method and device, storage medium and equipment - Google Patents


Info

Publication number
CN111090742A
CN111090742A
Authority
CN
China
Prior art keywords
question, answer, answer pair, evaluation index, evaluated
Prior art date
Legal status
Pending
Application number
CN201911320757.8A
Other languages
Chinese (zh)
Inventor
陈建华
崔朝辉
赵立军
张霞
Current Assignee
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date
Filing date
Publication date
Application filed by Neusoft Corp
Priority to CN201911320757.8A
Publication of CN111090742A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/332: Query formulation
    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/3331: Query processing
    • G06F 16/3332: Query translation
    • G06F 16/3335: Syntactic pre-processing, e.g. stopword elimination, stemming

Abstract

The application discloses a question-answer pair evaluation method, apparatus, storage medium and device. The method first generates a first evaluation index from the number of word segments of the question in the question-answer pair to be evaluated, a second evaluation index from the correlation between the topic and the answer in the question-answer pair, and a third evaluation index from the number of words in the answer. A pre-constructed question-answer pair evaluation model then evaluates the first, second and third evaluation indices to obtain the quality of the question-answer pair to be evaluated. The quality can thus be obtained quickly and accurately, and the evaluation result is free of the influence of the subjectivity of manual evaluation. In addition, because the evaluation considers the number of word segments in the question, the correlation between the topic and the answer, and the number of words in the answer, the quality of the question-answer pair can be evaluated more accurately.

Description

Question and answer pair evaluation method and device, storage medium and equipment
Technical Field
The present application relates to the field of natural language understanding, and in particular, to a method, an apparatus, a storage medium, and a device for evaluating question-answer pairs.
Background
Compared with a traditional manual customer service system, an intelligent question-answering system offers advantages such as high efficiency and low cost, so more and more enterprises use intelligent question-answering systems to provide conversation services for users and improve user satisfaction.
In practical applications, when the intelligent question-answering system receives a question from a user, it automatically queries a pre-constructed knowledge base for the answer corresponding to the question and returns that answer to the user. The pre-constructed knowledge base stores a large number of question-answer pairs, each consisting of a question and its corresponding answer. The more accurately the answer in a question-answer pair replies to its question, the higher the quality of the question-answer pair, and the higher the quality of the replies the intelligent question-answering system gives based on it.
Therefore, to improve the response accuracy of the intelligent question-answering system, the number of question-answer pairs in the knowledge base and the range of fields they cover need to be continuously expanded. At present, a knowledge base is usually expanded either by multi-person collaboration, in which several people add question-answer pairs to the knowledge base at the same time, or by automatically extracting question-answer pairs from documents. When several people add question-answer pairs simultaneously, personal subjective factors easily lead to inconsistent quality standards among the added pairs; and automatic extraction from documents cannot guarantee the quality of the extracted pairs. As a result, the question-answer pairs added to each field of the knowledge base are usually evaluated and screened manually by professionals in that field.
Disclosure of Invention
The embodiment of the present application mainly aims to provide a method, an apparatus, a storage medium, and a device for evaluating a question-answer pair, which can evaluate the quality of the question-answer pair more quickly and accurately.
The embodiment of the application provides a question-answer pair evaluation method, which comprises the following steps:
generating a first evaluation index according to the word segmentation quantity of the questions in the question-answer pair to be evaluated; generating a second evaluation index according to the correlation between the subject and the answer in the question-answer pair to be evaluated; generating a third evaluation index according to the number of words of the answer in the question-answer pair to be evaluated; the question-answer pairs to be evaluated comprise questions and answers; the subject in the question-answer pair to be evaluated is extracted from the question;
and evaluating the first evaluation index, the second evaluation index and the third evaluation index of the question-answer pair to be evaluated by utilizing a pre-constructed question-answer pair evaluation model to obtain the quality of the question-answer pair to be evaluated.
In a possible implementation manner, the generating a first evaluation index according to the number of the word segments of the question in the question-answer pair to be evaluated includes:
performing word segmentation on the questions in the question-answer pair to be evaluated by using a conditional random field CRF word segmentation model to obtain a first word segmentation result;
calculating mutual information values between every two adjacent word segments; performing word segmentation on the first word segmentation result according to the mutual information values to obtain a second word segmentation result;
and obtaining the number of the participles in the second participle result as a first evaluation index of the question-answer pair to be evaluated.
In a possible implementation manner, the generating a second evaluation index according to the correlation between the subject and the answer in the question-answer pair to be evaluated includes:
and obtaining the cosine similarity between the subject and the answer in the question-answer pair to be evaluated as a second evaluation index of the question-answer pair to be evaluated.
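As an illustration of this implementation, the following is a minimal sketch of computing cosine similarity between a topic and an answer. The application does not specify how the two texts are vectorized, so character-count vectors are assumed here purely for illustration:

```python
from collections import Counter
from math import sqrt

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts, using character-count
    vectors as an assumed (illustrative) vectorization."""
    va, vb = Counter(text_a), Counter(text_b)
    dot = sum(va[ch] * vb[ch] for ch in va)
    norm_a = sqrt(sum(c * c for c in va.values()))
    norm_b = sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# A topic that overlaps heavily with the answer scores close to 1;
# completely disjoint texts score 0.
print(round(cosine_similarity("insurance", "insurance contract"), 3))  # → 0.854
```

In practice the vectors could just as well come from TF-IDF weights or word embeddings; only the cosine itself is fixed by the text above.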
In one possible implementation, the method further includes:
obtaining a sample question-answer pair in the field to which the question-answer pair to be evaluated belongs;
and training a pre-constructed initial question-answer pair evaluation model by using the sample question-answer pairs to obtain the question-answer pair evaluation model.
In one possible implementation, the method further includes:
generating a first evaluation index according to the word segmentation quantity of the question in the sample question-answer pair; generating a second evaluation index according to the correlation between the subject and the answer in the sample question-answer pair; generating a third evaluation index according to the number of words of the answer in the sample question-answer pair;
classifying the first evaluation index, the second evaluation index and the third evaluation index of the sample question-answer pair respectively to obtain a classification result corresponding to each evaluation index;
and constructing a corresponding decision tree model as an initial question-answer pair evaluation model according to the classification result corresponding to each evaluation index.
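As a sketch of what such a decision tree evaluation model might look like once built, the fragment below hand-rolls a tiny tree over the three indices. All thresholds and class labels are illustrative assumptions, not values from the application:

```python
def classify_index(value: float, low: float, high: float) -> str:
    """Bucket one evaluation index into a three-way class."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "mid"

def evaluate_qa_pair(index1: float, index2: float, index3: float) -> str:
    """Tiny hand-rolled decision tree over the three indices.

    index1: number of word segments in the question (fewer is better)
    index2: topic/answer relevance (higher is better)
    index3: number of words in the answer (shorter is better)
    The thresholds below are illustrative placeholders.
    """
    if classify_index(index1, 4, 10) == "high":    # too many segments
        return "low quality"
    if index2 < 0.3:                               # topic barely related
        return "low quality"
    if classify_index(index3, 10, 200) == "high":  # rambling answer
        return "low quality"
    return "high quality"

print(evaluate_qa_pair(3, 0.8, 50))  # → high quality
```

In practice the tree would be induced from the classified sample indices (for example with an ID3/CART-style learner) rather than written by hand, which is what training the initial model amounts to.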
In one possible implementation, the method further includes:
obtaining a verification question-answer pair belonging to the field of the question-answer pair to be evaluated;
generating a first evaluation index according to the number of word segments of the question in the verification question-answer pair; generating a second evaluation index according to the correlation between the topic and the answer in the verification question-answer pair; generating a third evaluation index according to the number of words of the answer in the verification question-answer pair;
inputting the first evaluation index, the second evaluation index and the third evaluation index of the verification question-answer pair into the question-answer pair evaluation model to obtain a quality evaluation result of the verification question-answer pair;
and when the quality evaluation result of the verification question-answer pair is inconsistent with the quality marking result corresponding to the verification question-answer pair, the verification question-answer pair is used as the sample question-answer pair again, and the parameter updating is carried out on the question-answer pair evaluation model.
The embodiment of the present application further provides an evaluation device of a question-answer pair, the device includes:
the first generation unit is used for generating a first evaluation index according to the word segmentation quantity of the question in the question-answer pair to be evaluated; generating a second evaluation index according to the correlation between the subject and the answer in the question-answer pair to be evaluated; generating a third evaluation index according to the number of words of the answer in the question-answer pair to be evaluated; the question-answer pairs to be evaluated comprise questions and answers; the subject in the question-answer pair to be evaluated is extracted from the question;
and the evaluation unit is used for evaluating the first evaluation index, the second evaluation index and the third evaluation index of the question-answer pair to be evaluated by utilizing a pre-constructed question-answer pair evaluation model to obtain the quality of the question-answer pair to be evaluated.
In one possible implementation manner, the first generating unit includes:
the first word segmentation subunit is used for segmenting words of the questions in the question-answer pair to be evaluated by using a conditional random field CRF word segmentation model to obtain a first word segmentation result;
the second word segmentation subunit is used for calculating mutual information values between every two adjacent word segments, and performing word segmentation on the first word segmentation result according to the mutual information values to obtain a second word segmentation result;
and the obtaining subunit is used for obtaining the number of the participles in the second participle result as a first evaluation index of the question-answer pair to be evaluated.
In a possible implementation manner, the first generating unit is specifically configured to:
and obtaining the cosine similarity between the subject and the answer in the question-answer pair to be evaluated as a second evaluation index of the question-answer pair to be evaluated.
In one possible implementation, the apparatus further includes:
the first acquisition unit is used for acquiring sample question-answer pairs in the field to which the question-answer pairs to be evaluated belong;
and the training unit is used for training a pre-constructed initial question-answer pair evaluation model by utilizing the sample question-answer pairs to obtain the question-answer pair evaluation model.
In one possible implementation, the apparatus further includes:
the second generation unit is used for generating a first evaluation index according to the word segmentation quantity of the questions in the sample question-answer pairs; generating a second evaluation index according to the correlation between the subject and the answer in the sample question-answer pair; generating a third evaluation index according to the number of words of the answer in the sample question-answer pair;
the classification unit is used for classifying the first evaluation index, the second evaluation index and the third evaluation index of the sample question-answer pair respectively to obtain a classification result corresponding to each evaluation index;
and the construction unit is used for constructing a corresponding decision tree model as an initial question-answer pair evaluation model according to the classification result corresponding to each evaluation index.
In one possible implementation, the apparatus further includes:
the second acquisition unit is used for acquiring a verification question-answer pair in the field to which the question-answer pair to be evaluated belongs;
the third generation unit is used for generating a first evaluation index according to the number of word segments of the question in the verification question-answer pair, generating a second evaluation index according to the correlation between the topic and the answer in the verification question-answer pair, and generating a third evaluation index according to the number of words of the answer in the verification question-answer pair;
the obtaining unit is used for inputting a first evaluation index, a second evaluation index and a third evaluation index of the verification question-answer pair into the question-answer pair evaluation model to obtain a quality evaluation result of the verification question-answer pair;
and the updating unit is used for taking the verification question-answer pair as the sample question-answer pair again and updating parameters of the question-answer pair evaluation model when the quality evaluation result of the verification question-answer pair is inconsistent with the quality marking result corresponding to the verification question-answer pair.
According to the technical scheme, the embodiment of the application has the following advantages:
when evaluating a question-answer pair to be evaluated, a first evaluation index is first generated according to the number of word segments of the question in the pair, a second evaluation index is generated according to the correlation between the topic and the answer, and a third evaluation index is generated according to the number of words of the answer, where the question-answer pair to be evaluated comprises a question and an answer and the topic is extracted from the question. The first, second and third evaluation indices are then evaluated with a pre-constructed question-answer pair evaluation model to obtain the quality of the question-answer pair to be evaluated. Compared with manual evaluation, the quality of the question-answer pair can be obtained quickly and accurately, and the evaluation result is free of the influence of the subjectivity of manual evaluation. In addition, because the number of word segments of the question, the correlation between the topic and the answer, and the number of words of the answer are all considered, the quality of the question-answer pair can be evaluated more accurately.
Drawings
To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments described in the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a question-answer pair evaluation method provided in the present application;
Fig. 2 is a schematic flow chart of constructing a question-answer pair evaluation model provided in the present application;
Fig. 3 is a schematic diagram of a decision tree model structure provided in the present application;
Fig. 4 is a structural block diagram of a question-answer pair evaluation apparatus provided in the present application.
Detailed Description
In some methods for evaluating question-answer pairs, professionals in each field are required to manually evaluate and screen the question-answer pairs in a knowledge base in order to judge whether the answer in each pair answers its question accurately. Taking the insurance field as an example, after a large number of insurance-field question-answer pairs are added to a knowledge base by multi-person collaboration or document extraction, accurately evaluating their quality in the conventional way requires insurance professionals to evaluate them manually. Similarly, in other fields, such as finance or medicine, professionals of the corresponding field must manually evaluate the question-answer pairs of that field. However, the quality judgments obtained by manual evaluation are easily affected by subjective human factors, which introduces random deviations into the evaluation results. Not only are the evaluation efficiency and accuracy low, but a large amount of human resources is also needed.
To overcome the above drawbacks, when evaluating a question-answer pair to be evaluated, a first evaluation index is first generated according to the number of word segments of the question in the pair, a second evaluation index is generated according to the correlation between the topic and the answer, and a third evaluation index is generated according to the number of words of the answer; a pre-constructed question-answer pair evaluation model then evaluates the generated first, second and third evaluation indices to obtain the quality of the question-answer pair to be evaluated. Because the number of word segments of the question, the correlation between the topic and the answer, and the number of words of the answer are all taken into account, a more accurate evaluation result can be obtained.
Further, after the evaluation result of the question-answer pair to be evaluated is obtained, if the evaluation result of the question-answer pair is better, that is, the quality of the question-answer pair is higher, the question-answer pair can be directly added into the knowledge base to expand the knowledge base so as to improve the response accuracy of the intelligent question-answer system. However, if the evaluation result of the question-answer pair is poor, that is, the quality of the question-answer pair is low, the question-answer pair can be modified manually to improve the quality of the question-answer pair, and the modified question-answer pair is added into the knowledge base to expand the knowledge base so as to improve the response accuracy of the intelligent question-answer system, thereby further improving the service satisfaction of the user.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Example one
Referring to fig. 1, a schematic flow chart of an evaluation method of a question-answer pair provided in this embodiment is shown, where the method includes the following steps:
s101: generating a first evaluation index according to the word segmentation quantity of the questions in the question-answer pair to be evaluated; generating a second evaluation index according to the correlation between the subject and the answer in the question-answer pair to be evaluated; generating a third evaluation index according to the number of words of the answer in the question-answer pair to be evaluated, wherein the question-answer pair to be evaluated comprises a question and an answer; the topics in the question-answer pair to be evaluated are extracted from the questions.
In this embodiment, any question-answer pair for implementing quality evaluation by using this embodiment is defined as a question-answer pair to be evaluated. And each question-answer pair to be evaluated comprises a question and an answer. The question-answer pairs to be evaluated can be question-answer pairs in various fields added into the knowledge base in a multi-person cooperation or document extraction mode. The question-answer pairs can be used as the answer basis of the intelligent question-answer system, namely, after the intelligent question-answer system identifies the questions proposed by the user, the intelligent question-answer system can find the questions corresponding to the questions of the user from a large number of question-answer pairs stored in the knowledge base, then find the answers matched with the questions, and return the answers to the user. It can be seen that the quality of the question-answer pairs is crucial to the reply quality of the intelligent question-answer system.
Therefore, in order to improve the response quality of the intelligent question-answering system, the quality of the question-answer pairs must be evaluated accurately. In addition, to make the questions in question-answer pairs easy to recognize and to avoid the influence of too many topic-irrelevant word segments, most questions are chosen as conceptual questions containing concept words rather than questions with many word segments, and the concept definition of the concept word serves as the answer in the pair. The number of word segments in a question can therefore be used as an index for evaluating the quality of a question-answer pair: the fewer word segments a question contains, the fewer topic-irrelevant segments it has and the higher the quality of the corresponding pair; conversely, the more word segments it contains, the greater the influence of topic-irrelevant segments and the lower the quality of the pair. Moreover, since the answer is a conceptual explanation of a conceptual question, an answer with too many words often fails to address the question directly, so the number of words in the answer can also be used as an evaluation index: the fewer words the answer contains, the higher the quality of the corresponding pair; conversely, the more words it contains, the lower the quality of the pair.
In the present embodiment, in order to evaluate the quality of question-answer pairs more quickly and accurately and to eliminate the influence of the subjectivity of manual evaluation, a first evaluation index is first generated according to the number of word segments of the question in the question-answer pair to be evaluated, a second evaluation index is generated according to the correlation between the topic and the answer, and a third evaluation index is generated according to the number of words of the answer; the subsequent step S102 is then executed with these three evaluation indices to obtain the quality of the question-answer pair to be evaluated.
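The index generation described above can be sketched as follows. The topic is assumed to have already been extracted from the question, and the relevance measure is passed in as a function (the application uses cosine similarity for it); the toy relevance function in the usage example is a stand-in, not the actual measure:

```python
def generate_indices(question_segments, topic, answer, similarity_fn):
    """Compute the three evaluation indices for one question-answer pair.

    question_segments: word segments of the question; their count is index 1.
    similarity_fn:     topic/answer relevance measure; its value is index 2.
    The answer's character count is index 3.
    """
    index1 = len(question_segments)
    index2 = similarity_fn(topic, answer)
    index3 = len(answer)
    return index1, index2, index3

# Toy relevance stand-in: 1.0 if the topic literally occurs in the answer.
i1, i2, i3 = generate_indices(
    ["what", "is", "insurance"], "insurance",
    "Insurance is a contract that transfers risk.",
    lambda t, a: 1.0 if t in a.lower() else 0.0)
print(i1, i2, i3)  # → 3 1.0 44
```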
Next, specific processes of generating the first evaluation index, the second evaluation index, and the third evaluation index will be described in order.
In this embodiment, in an optional implementation, the specific implementation of "generating a first evaluation index according to the number of word segments of the question in the question-answer pair to be evaluated" in step S101 may include the following steps A1 to A3:
step A1: and performing word segmentation on the questions in the question-answer pair to be evaluated by using a conditional random field CRF word segmentation model to obtain a first word segmentation result.
In this implementation, in order to generate the first evaluation index, a general word segmentation method with good segmentation performance in the current Chinese word segmentation field is selected first to segment the question in the question-answer pair to be evaluated. For example, a Conditional Random Field (CRF) word segmentation model may be used to segment the question and obtain each word segment it contains; this segmentation result is defined as the first word segmentation result. The CRF word segmentation formula is as follows:
$$P(Y|X)=\frac{1}{Z(x)}\exp\Big(\sum_{i,k}\lambda_k\,t_k(y_{i-1},y_i,x,i)+\sum_{i,l}\mu_l\,s_l(y_i,x,i)\Big)\tag{1}$$

where $Z(x)=\sum_{y}\exp\big(\sum_{i,k}\lambda_k\,t_k(y_{i-1},y_i,x,i)+\sum_{i,l}\mu_l\,s_l(y_i,x,i)\big)$ is the normalization factor; $t$ denotes a transition feature and $t_k$ the $k$-th transition feature function; $s$ denotes a state feature and $s_l$ the $l$-th state feature function; $\lambda_k$ and $\mu_l$ are the weights of features $t_k$ and $s_l$ respectively, whose specific values can be set according to actual conditions and empirical values; $x$ denotes the question in the question-answer pair to be evaluated; $Y$ denotes the first word segmentation result; and $y_i$ denotes the $i$-th word segment in the first word segmentation result.
To illustrate: assuming that the question x in the question-answer pair to be evaluated is "how is the weather today", the first word segmentation result Y obtained by segmenting the question with formula (1) is: "today", "weather", "how".
It can be seen that the essence of the CRF word segmentation model adopted in this embodiment is to find the most probable word segmentation sequence for the question in the question-answer pair to be evaluated; the specific implementation process is consistent with existing methods and is not described further herein.
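CRF segmenters are typically trained as sequence labelers over a B/M/E/S tag set (Begin, Middle, End, Single character of a word). Training a CRF is out of scope here, but the decoding step that turns a tagged character sequence into the first word segmentation result can be sketched as follows; the tag sequence in the usage example is what a trained model would hypothetically output:

```python
def decode_bmes(chars, tags):
    """Convert characters plus B/M/E/S tags into word segments."""
    segments, buf = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":            # single-character segment
            if buf:
                segments.append(buf)
                buf = ""
            segments.append(ch)
        elif tag == "B":          # begin a new multi-character segment
            if buf:
                segments.append(buf)
            buf = ch
        elif tag == "M":          # continue the current segment
            buf += ch
        else:                     # "E": close the current segment
            segments.append(buf + ch)
            buf = ""
    if buf:                       # flush a dangling segment
        segments.append(buf)
    return segments

# "今天天气怎么样" ("how is the weather today") with hypothetical CRF tags:
print(decode_bmes("今天天气怎么样", ["B", "E", "B", "E", "B", "M", "E"]))
# → ['今天', '天气', '怎么样'], i.e. "today" / "weather" / "how"
```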
Step A2: calculating mutual information values between every two adjacent word segments, and re-segmenting the first word segmentation result according to the mutual information values to obtain a second word segmentation result.
In this implementation, the CRF word segmentation model used to obtain the first word segmentation result in step A1 is generated by training on a general corpus. Such a model segments well in general domains but poorly on question-answer pairs that contain professional terms from specialized fields. For example, for questions in the medical field that contain specific disease names and drug names, segmenting with the CRF word segmentation model alone cannot accurately identify medical terms such as the disease and drug names they contain.
To illustrate: assuming the question in the question-answer pair to be evaluated is "what effect does paracetamol have", after the CRF word segmentation model segments it, the first word segmentation result is: "pounding", "heat", "rest", "pain", "having", "what", "effect" (the Chinese drug name for paracetamol, 扑热息痛, has been split into its four single characters). Because the specific drug name paracetamol is not kept as one word segment, segmenting questions with the CRF word segmentation model alone cannot produce the correct segmentation for question-answer pairs in every field.
On this basis, after the CRF word segmentation model segments the question in the question-answer pair to be evaluated into the first word segmentation result, the accuracy of the segmentation can be further improved: starting from the first word segment of the first word segmentation result, the mutual information value between every two adjacent word segments is calculated in turn to judge how tightly the characters of the two adjacent segments are related; the first word segmentation result is then re-segmented according to the obtained mutual information values, and the re-segmented result is taken as the second word segmentation result. The formula for calculating the mutual information value between two adjacent word segments is as follows:
$$I(A,B)=\log\frac{p(A,B)}{p(A)\,p(B)}\tag{2}$$

where $A$ and $B$ denote two adjacent word segments in the first word segmentation result; $I(A,B)$ denotes the mutual information value between segment $A$ and segment $B$; $p(A,B)$ denotes the probability that $A$ and $B$ appear together in the question-answer pairs of the pre-constructed knowledge base; $p(A)$ denotes the probability that $A$ appears in those question-answer pairs; and $p(B)$ denotes the probability that $B$ appears in them.
After statistical analysis is performed on the question-answer pairs in the pre-constructed knowledge base, the above formula (2) can be expressed as the following formula (3):
I(A, B) = (n(A, B) × N) / (n(A) × n(B))  (3)
wherein n (A, B) represents the co-occurrence times of the participle A and the participle B in the question-answer pair of the pre-constructed knowledge base; n (A) represents the number of times of occurrence of the participle A in the question-answer pair of the pre-constructed knowledge base; n (B) represents the number of times of occurrence of the participle B in the question-answer pair of the pre-constructed knowledge base; and N represents the total number of question-answer pairs in the pre-constructed knowledge base.
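Formula (3) reduces to simple counting over the knowledge base. A minimal sketch (the function name and the counts in the example call are illustrative, not from the patent):

```python
def mutual_info(n_ab, n_a, n_b, n_total):
    """Mutual information value of formula (3): the co-occurrence count of
    adjacent participles A and B, scaled by the total number N of
    question-answer pairs, relative to their individual occurrence counts."""
    return (n_ab * n_total) / (n_a * n_b)

# Hypothetical counts: A and B co-occur in 20 of 1000 question-answer pairs,
# and occur individually in 50 and 40 pairs respectively.
print(mutual_info(20, 50, 40, 1000))  # -> 10.0
```

With these counts the value sits exactly at the example threshold of 10, so A and B would just qualify for merging.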
It should be noted that the larger the mutual information value between two participles, the closer their textual relationship, and hence the more likely they are to constitute one independent participle; conversely, the smaller the mutual information value, the looser their textual relationship, and the less likely they are to constitute one independent participle.
Specifically, if the mutual information value I(A, B) between adjacent participles A and B calculated by formula (3) exceeds a preset threshold, the textual relationship between participle A and participle B is close and the two can form one independent participle. The preset threshold is the critical value for deciding whether two adjacent participles form an independent participle: if their mutual information value is smaller than the threshold, the two remain mutually independent participles; if it is not smaller than the threshold, the two form one independent participle representing a specific meaning. The value of the preset threshold may be set according to the actual situation, which is not limited in this embodiment; for example, it may be 10.
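The threshold-based merging described above can be sketched as a single greedy pass over the first word segmentation result (a simplified illustration; the mutual-information lookup `mi` is assumed to be backed by the knowledge-base counts of formula (3), and the toy tokens and values below are invented):

```python
def resegment(tokens, mi, threshold=10):
    """Re-segment a token list: merge each pair of adjacent tokens whose
    mutual information value is not smaller than the threshold."""
    if not tokens:
        return []
    merged = [tokens[0]]
    for nxt in tokens[1:]:
        if mi(merged[-1], nxt) >= threshold:
            merged[-1] += nxt          # fuse into one independent participle
        else:
            merged.append(nxt)
    return merged

# Toy lookup table standing in for formula (3); unlisted pairs score 0.
table = {("para", "ceta"): 15.81, ("paraceta", "mol"): 16.67}
mi = lambda a, b: table.get((a, b), 0)
print(resegment(["para", "ceta", "mol", "has"], mi))  # -> ['paracetamol', 'has']
```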
For example, following the above example, after the CRF word segmentation model segments the question "what effect does paracetamol have", the first word segmentation result is: "扑" ("pounce"), "热" ("heat"), "息" ("rest"), "痛" ("pain"), "has", "what", "effect". Suppose the total number of question-answer pairs in the pre-constructed knowledge base is 1000, "扑" occurs 32 times, "热" occurs 40 times, and the preset threshold is 10. The mutual information value between "扑" and "热" can then be calculated as 15.81 using formula (3), that is,
I(A, B) = (n(A, B) × 1000) / (32 × 40) = 15.81, where A and B are the adjacent participles "pounce" (扑) and "heat" (热).
Since 15.81 exceeds the preset threshold 10, "扑" ("pounce") and "热" ("heat") can be merged into the independent participle "扑热". Then, given that "息" ("rest") occurs 36 times in the pre-constructed knowledge base, the mutual information value between "扑热" and "息" can be calculated as 16.67, that is,
I(A, B) = (n(A, B) × 1000) / (n(A) × 36) = 16.67, where A is the merged participle "扑热" and B is "息" ("rest").
Since 16.67 also exceeds the preset threshold 10, "扑热" and "息" can be merged into the independent participle "扑热息". Similarly, if the mutual information value between "扑热息" and "痛" ("pain") still exceeds the threshold 10, the two can be merged into the independent participle "扑热息痛", i.e. "paracetamol". Further, if the mutual information value between "扑热息痛" and "has" does not exceed the threshold 10, the two remain mutually independent participles and cannot be merged. In the same manner, the subsequent participles "has", "what", and "effect" can all be determined to be mutually independent. Therefore, after re-segmenting the first word segmentation result according to the calculated mutual information values, the obtained second word segmentation result is: "paracetamol", "has", "what", "effect".
Therefore, the word segmentation accuracy of the second word segmentation result is higher than that of the first word segmentation result.
It should be noted that, to further improve word segmentation accuracy, a character-count threshold for a single participle may be preset according to the properties of the professional terms in each field. Taking the medical field as an example: since a single professional term such as a disease name or drug name generally contains no more than 6 characters, the character-count threshold may be set to 6 in this field. Then, when formula (3) is used to calculate the mutual information of adjacent participles in the first word segmentation result and participles whose mutual information value exceeds the preset threshold are merged into an independent participle, it must be ensured that the merged participle contains no more than 6 characters; that is, once an independent participle reaches 6 characters, there is no need to calculate its mutual information with the following participle. The character-count threshold preset for each field may be set according to the actual situation: for the medical field it may be 6, while other fields may use other values, which is not limited in this embodiment.
Step A3: and obtaining the number of the participles in the second participle result, and using the number of the participles as a first evaluation index of the question-answer pair to be evaluated.
In this implementation manner, after the second word segmentation result corresponding to the question in the question-answer pair to be evaluated is obtained through step A2, the number of participles contained in the second word segmentation result can be counted, and this number is used as the first evaluation index of the question-answer pair to be evaluated.
For example, following the above example, the second word segmentation result of the question "what effect does paracetamol have" obtained through step A2 is: "paracetamol", "has", "what", "effect". The number of participles contained in this result is 4, and this number can be used as the first evaluation index of the question-answer pair to be evaluated.
In this embodiment, an optional implementation manner is that, in step S101, a specific implementation process of "generating a second evaluation index according to a correlation between a topic and an answer in a question-answer pair to be evaluated" may include: and obtaining the cosine similarity between the subject and the answer in the question-answer pair to be evaluated as a second evaluation index of the question-answer pair to be evaluated.
In this implementation, to generate the second evaluation index, a topic representing the core content of the question is first extracted from the question of the question-answer pair to be evaluated; for example, for the question "what effect does paracetamol have", the extracted topic is "paracetamol". Then, the words with the highest importance in the answer can be identified, for example by computing each word's term frequency (TF) and inverse document frequency (IDF) and selecting the words with the highest TF-IDF weights (i.e. the highest importance). The cosine similarity between these words and the topic is calculated and used as the second evaluation index of the question-answer pair to be evaluated, representing the correlation between the topic and the answer.
The larger the calculated cosine similarity value, the higher the correlation between the topic and the answer in the question-answer pair to be evaluated; conversely, the smaller the value, the lower the correlation. It should be noted that cosine similarity is calculated in the conventional way, which is not described here again.
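As an illustration of the second evaluation index, cosine similarity between the topic and an answer word can be computed over character-count vectors (a simplified stand-in, since the patent does not fix the vector representation; in practice the TF-IDF weighting described above would first select the high-weight answer words):

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two strings, each represented as a
    character-count vector."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[ch] * vb[ch] for ch in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * \
           math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

print(cosine_similarity("paracetamol", "paracetamol"))  # identical: similarity of 1
print(cosine_similarity("abc", "xyz"))                  # disjoint: similarity of 0
```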
In this embodiment, an optional implementation manner is that, in step S101, a specific implementation process of "generating a third evaluation index according to the number of words of the answer in the question-answer pair to be evaluated" may include: and counting the number of words of the answers in the question-answer pair to be evaluated, and taking the number of words as a third evaluation index of the question-answer pair to be evaluated.
For example, suppose the answer in the question-answer pair to be evaluated is "paracetamol is used for the treatment of cold and fever". The number of words of the answer (12 characters in the original Chinese answer) can then be counted as 12, which is the third evaluation index of the question-answer pair to be evaluated.
S102: and evaluating the first evaluation index, the second evaluation index and the third evaluation index of the question-answer pair to be evaluated by utilizing a pre-constructed question-answer pair evaluation model to obtain the quality of the question-answer pair to be evaluated.
In this embodiment, after the first evaluation index, the second evaluation index, and the third evaluation index of the question-answer pair to be evaluated are generated in step S101, data processing may be further performed on the evaluation indexes, and the quality of the question-answer pair to be evaluated is determined according to the processing result. Specifically, the first evaluation index, the second evaluation index and the third evaluation index of the question-answer pair to be evaluated may be used as input data, and the input data is input into a question-answer pair evaluation model constructed in advance to obtain the quality of the question-answer pair to be evaluated. It should be noted that, in order to implement step S102, a question-answer pair evaluation model needs to be constructed in advance, and the specific construction process can be referred to in the related description of the second embodiment.
Specifically, after the first evaluation index, the second evaluation index, and the third evaluation index of the question-answer pair to be evaluated are generated in step S101, they may be fed into the input of the question-answer pair evaluation model, and the model outputs an evaluation score in the interval [0, 100] representing the quality of the question-answer pair to be evaluated. For example, an output score of 90 points indicates that the quality of the question-answer pair to be evaluated is high.
Alternatively, an evaluation threshold can be preset as the critical value distinguishing high-quality question-answer pairs: if the output evaluation score is greater than the threshold, the quality of the corresponding question-answer pair to be evaluated is high; if it is not greater than the threshold, the quality is low. The value of the evaluation threshold may be set according to actual conditions, which is not limited in this embodiment; for example, it may be set to 75 points.
It should be further noted that, in a possible implementation of this embodiment, a question-answer pair usually contains concise, conceptual questions rather than questions with too many participles, which facilitates question identification and avoids the influence of too many topic-irrelevant participles, thereby improving the quality of the question-answer pair. Therefore, if the first evaluation index generated through steps A1-A3 is found to be too large, that is, the number of participles in the question exceeds a preset participle-count threshold, the subsequent quality evaluation steps can be skipped and the quality of the question-answer pair to be evaluated can be directly judged as too low. The participle-count threshold may be set according to the actual situation, for example to 10, which is not limited in this embodiment.
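Combining the participle-count short circuit with the model scoring, the overall decision of steps S101-S102 can be sketched as follows (function and parameter names are illustrative; `score_fn` stands in for the pre-constructed question-answer pair evaluation model):

```python
def evaluate_qa(first_idx, second_idx, third_idx, score_fn,
                max_participles=10, pass_score=75):
    """Return 'high' or 'low' quality for a question-answer pair.
    If the question's participle count exceeds the preset threshold,
    the pair is rejected without invoking the model."""
    if first_idx > max_participles:
        return "low"
    score = score_fn(first_idx, second_idx, third_idx)  # value in [0, 100]
    return "high" if score > pass_score else "low"

toy_model = lambda a, b, c: 90              # stand-in for the trained model
print(evaluate_qa(4, 0.85, 120, toy_model))   # -> high
print(evaluate_qa(12, 0.85, 120, toy_model))  # -> low (too many participles)
```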
In summary, in the question-answer pair evaluation method provided by this embodiment, when evaluating a question-answer pair to be evaluated (which consists of a question and an answer), a first evaluation index is generated according to the number of participles of the question, a second evaluation index is generated according to the correlation between the topic (extracted from the question) and the answer, and a third evaluation index is generated according to the number of words of the answer. The three evaluation indexes are then evaluated by a pre-constructed question-answer pair evaluation model to obtain the quality of the question-answer pair to be evaluated. Compared with manual evaluation, the quality of a question-answer pair can be obtained quickly and accurately, and the influence of the subjectivity of manual evaluation on the result is eliminated. Because the participle count of the question, the correlation between the topic and the answer, and the word count of the answer are all taken into account, the quality of the question-answer pair can be evaluated more accurately.
Example two
The present embodiment will describe a specific process for constructing the question-answer pair evaluation model mentioned in the first embodiment. By utilizing the pre-constructed question-answer pair evaluation model, the quality of the question-answer pair can be evaluated more quickly and accurately.
Referring to fig. 2, a schematic flow chart of constructing a question-answer pair evaluation model provided in this embodiment is shown, where the flow chart includes the following steps:
S201: and obtaining a sample question-answer pair in the field to which the question-answer pair to be evaluated belongs.
In this embodiment, constructing a question-answer pair evaluation model requires a large amount of preparation in advance. First, sample question-answer pairs in the field to which the question-answer pair to be evaluated belongs need to be collected. For example, assuming the question-answer pair to be evaluated belongs to the medical field, 1000 question-answer pairs in the medical field may be collected in advance, each serving as a sample question-answer pair, and professionals in the medical field manually mark an evaluation score or grade for each sample question-answer pair in advance to represent its actual quality.
S202: and training a pre-constructed initial question-answer pair evaluation model by using the sample question-answer pairs to obtain a question-answer pair evaluation model.
In this embodiment, after the sample question-answer pairs in the field to which the question-answer pairs to be evaluated belong are obtained in step S201, the sample question-answer pairs may be further used as training data to train to obtain a question-answer pair evaluation model.
Specifically, after obtaining each sample question-answer pair, a method similar to that used in step S101 of the embodiment to generate the first evaluation index, the second evaluation index, and the third evaluation index of the question-answer pair to be evaluated may be adopted, and the question-answer pair to be evaluated is replaced by the sample question-answer pair, so that the first evaluation index, the second evaluation index, and the third evaluation index of each sample question-answer pair may be generated. Furthermore, a pre-constructed initial question-answer pair evaluation model can be trained by using a first evaluation index, a second evaluation index and a third evaluation index of the sample question-answer pairs, and relevant model parameters in the initial question-answer pair evaluation model are adjusted to obtain a question-answer pair evaluation model.
Next, the present embodiment introduces how to construct an initial question-answer evaluation model through the following steps B1-B3:
step B1: generating a first evaluation index according to the number of word segmentation of the question in the sample question-answer pair; generating a second evaluation index according to the correlation between the subject and the answer in the sample question-answer pair; and generating a third evaluation index according to the number of words of the answer in the sample question-answering pair.
In this embodiment, to construct an initial question-answer pair evaluation model for training the final question-answer pair evaluation model and to improve the efficiency and accuracy of quality evaluation, an optional implementation is to randomly select, from the large number of sample question-answer pairs obtained, a subset to serve as the initial training data for constructing the initial model. Specifically, a method similar to that of step S101 of the first embodiment may be adopted, with the question-answer pair to be evaluated replaced by the sample question-answer pair: the first evaluation index is generated according to the number of participles of the question in the sample question-answer pair, the second evaluation index according to the correlation between the topic and the answer, and the third evaluation index according to the number of words of the answer. For related details, refer to the description of the first embodiment, which is not repeated here.
Step B2: and classifying the first evaluation index, the second evaluation index and the third evaluation index of the sample question-answer pair respectively to obtain a classification result corresponding to each evaluation index.
In this embodiment, after the first evaluation index, the second evaluation index and the third evaluation index of the sample question-answer pairs are generated in step B1, the first evaluation index, the second evaluation index and the third evaluation index may be further classified respectively to obtain classification results corresponding to the evaluation indexes. For example, the first evaluation index may be classified according to the number of words in the question, the second evaluation index may be classified according to the cosine similarity between the subject and the answer, and the third evaluation index may be classified according to the number of words in the answer.
For example, the following steps are carried out: assuming that after a part of randomly selected sample question-answer pairs are subjected to data processing, the first evaluation index of each sample question-answer pair is as follows: the number of the participles in the second participle result corresponding to the question is 1, 2, 3 and 4 respectively. The second evaluation index of each sample question-answer pair is as follows: the cosine similarity values between the subject and the answer fall into the following four ranges respectively: 0.2 or less, 0.2 to 0.5, 0.5 to 0.8, 0.8 to 1. The third evaluation index of each sample question-answer pair is as follows: the number of words of the answer falls into the following four ranges: 100 characters or less, 100 to 300 characters, 300 to 500 characters, and 500 characters or more.
The first evaluation index, the second evaluation index, and the third evaluation index may be further classified into four categories, respectively. Wherein, the four classification results of the first evaluation index are respectively: the number of the participles is 1, the number of the participles is 2, the number of the participles is 3 and the number of the participles is 4. The four classification results of the second evaluation index are respectively: the correlation is 0.2 or less, the correlation is 0.2 to 0.5, the correlation is 0.5 to 0.8, and the correlation is 0.8 to 1. The four classification results of the third evaluation index are respectively: the answer is 100 characters below, the answer is 100 characters to 300 characters, the answer is 300 characters to 500 characters, and the answer is 500 characters above.
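The four-way classification of each index amounts to binning its raw value by fixed edges. A minimal sketch (the edge values are taken from the example above; the function name is illustrative):

```python
def bin_index(value, edges):
    """Map a raw evaluation-index value to a class id in 0..len(edges),
    given ascending bin edges."""
    for i, edge in enumerate(edges):
        if value <= edge:
            return i
    return len(edges)

similarity_class = bin_index(0.65, [0.2, 0.5, 0.8])  # class 2: correlation 0.5 to 0.8
length_class = bin_index(250, [100, 300, 500])       # class 1: 100 to 300 characters
```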
Step B3: and constructing a corresponding decision tree model as an initial question-answer pair evaluation model according to the classification result corresponding to each evaluation index.
In this embodiment, after the first evaluation index, the second evaluation index, and the third evaluation index of the sample question-answer pair are classified respectively in step B2 to obtain the classification result corresponding to each evaluation index, a corresponding decision tree model may be further constructed according to the classification result corresponding to each evaluation index, and the decision tree model is used as the initial question-answer pair evaluation model.
For example, the following steps are carried out: based on the above example, after four classification results of the first evaluation index, the second evaluation index, and the third evaluation index of a part of randomly selected sample question-answer pairs are obtained, a decision tree model including an entry and an exit of the decision tree model may be constructed based on the classification results, and parameters of the decision tree model are initialized to serve as an initial question-answer pair evaluation model, as shown in fig. 3.
It should be understood that the network structure of the initial question-answer pair evaluation model is not unique in the present application; the decision tree structure shown in fig. 3 is only an example, and other structures can be adopted. As the classification results of the first, second and third evaluation indexes of the sample question-answer pairs differ, the structures of the constructed decision tree models also differ, and the specific structural parameters can be initialized according to the actual situation, which is not limited in this embodiment.
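As an illustration of how such a decision tree evaluates the classified indexes, a toy tree can be walked as follows (the tree structure and leaf scores here are invented for illustration and are not the tree of fig. 3):

```python
def predict(tree, features):
    """Walk a nested-dict decision tree: each internal node names the
    feature (class id) it splits on; leaves hold evaluation scores."""
    while isinstance(tree, dict):
        tree = tree["children"][features[tree["split"]]]
    return tree

toy_tree = {
    "split": "similarity_class",
    "children": {
        0: 30,  # low topic-answer relevance: low score regardless of length
        1: {"split": "length_class",
            "children": {0: 50, 1: 80, 2: 85, 3: 70}},
        2: 85,
        3: 90,
    },
}
print(predict(toy_tree, {"similarity_class": 1, "length_class": 1}))  # -> 80
```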
After the initial question-answer pair evaluation model is constructed through the steps B1-B3, one sample question-answer pair may be sequentially extracted from a large number of sample question-answer pairs obtained through the step S201, and multiple rounds of model training may be performed until the training end condition is satisfied, at which time, the question-answer pair evaluation model is generated.
Specifically, during the current round of training, the question-answer pair to be evaluated in the first embodiment is replaced by the sample question-answer pair extracted for this round, and its quality is obtained through the current initial question-answer pair evaluation model following the procedure of the first embodiment: according to steps S101 to S102, after the first, second and third evaluation indexes of the sample question-answer pair are generated, the initial model outputs an evaluation score in the interval [0, 100]. This score is then compared with the corresponding manually marked evaluation score, and the model parameters are updated according to the difference between the two until a preset condition is met, for example the difference changes only slightly between rounds; at that point the parameter updating stops, the training of the question-answer pair evaluation model is complete, and the trained question-answer pair evaluation model is generated.
By the embodiment, the question-answer pair evaluation model can be generated by utilizing the sample question-answer pair training, and further, the generated question-answer pair evaluation model can be verified by utilizing the verification question-answer pair. The specific verification process may include the following steps C1-C4:
step C1: and obtaining the verification question-answer pair belonging to the field of the question-answer pair to be evaluated.
In this embodiment, in order to verify the question-answer pair evaluation model, a verification question-answer pair in the field to which the question-answer pair to be evaluated belongs needs to be obtained first, where the verification question-answer pair refers to a question-answer pair that can be used to verify the question-answer pair evaluation model, and after the verification question-answer pair is obtained, the subsequent step C2 may be continuously performed.
Step C2: generating a first evaluation index according to the word segmentation quantity of the question in the verification question-answer pair; generating a second evaluation index according to the correlation between the subject and the answer in the verification question-answer pair; and generating a third evaluation index according to the number of words of the answer in the question-answer verification pair.
After the verification question-answer pair is obtained through step C1, it cannot be used directly to verify the question-answer pair evaluation model. First, a first evaluation index needs to be generated according to the number of participles of the question in the verification question-answer pair, a second evaluation index according to the correlation between the topic and the answer, and a third evaluation index according to the number of words of the answer; the generated indexes of the verification question-answer pair are then used to verify the question-answer pair evaluation model.
Step C3: and inputting the first evaluation index, the second evaluation index and the third evaluation index of the verification question-answer pair into the question-answer pair evaluation model to obtain a quality evaluation result of the verification question-answer pair.
After the first evaluation index, the second evaluation index, and the third evaluation index of the verification question-answer pair are generated through the step C2, the first evaluation index, the second evaluation index, and the third evaluation index of the verification question-answer pair may be further input into the question-answer pair evaluation model to obtain a quality evaluation result of the verification question-answer pair, and then the subsequent step C4 may be further continuously performed.
Step C4: and when the quality evaluation result of the verification question-answer pair is inconsistent with the quality marking result corresponding to the verification question-answer pair, the verification question-answer pair is re-used as a sample question-answer pair, and the parameter updating is carried out on the question-answer pair evaluation model.
After the quality evaluation result of the verification question-answer pair is obtained through the step C3, if the quality evaluation result of the verification question-answer pair is inconsistent with the manual labeling result corresponding to the verification question-answer pair, the verification question-answer pair may be used as the sample question-answer pair again, and the parameter update is performed on the evaluation model of the question-answer pair.
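Steps C1-C4 can be sketched as a single validation pass (the model, labels, and index values here are placeholders; a pair the model misjudges is handed back for retraining):

```python
def verify(model, val_pairs):
    """val_pairs: list of (index_triple, labeled_quality). Returns the
    misjudged pairs, which are re-used as sample question-answer pairs."""
    return [(x, y) for x, y in val_pairs if model(*x) != y]

# Stand-in model: 'high' only for few participles and strong relevance.
toy_model = lambda a, b, c: "high" if a <= 10 and b >= 0.5 else "low"
val = [((4, 0.9, 120), "high"), ((4, 0.1, 120), "high")]
hard = verify(toy_model, val)
print(hard)  # -> [((4, 0.1, 120), 'high')]  goes back into the training set
```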
Through this embodiment, the question-answer pair evaluation model can be effectively verified with the verification question-answer pairs, and when the quality evaluation result of a verification question-answer pair is inconsistent with its corresponding manual marking result, the question-answer pair evaluation model can be adjusted and updated in time, thereby improving the precision and accuracy of the evaluation model.
In summary, the question-answer pair evaluation model trained by the embodiment can quickly and accurately evaluate the quality of the question-answer pair to be evaluated by using the first evaluation index, the second evaluation index and the third evaluation index of the question-answer pair to be evaluated, so that the efficiency and the accuracy of quality evaluation of the question-answer pair to be evaluated are effectively improved, and the waste of human resources is avoided.
EXAMPLE III
In this embodiment, a question-answer pair evaluation device will be described, and for related contents, please refer to the above method embodiments.
Referring to fig. 4, a block diagram of an evaluation apparatus for question-answer pairs provided in this embodiment is shown, where the apparatus includes:
the first generating unit 401 is configured to generate a first evaluation index according to the number of word segments of the question in the question-answer pair to be evaluated; generating a second evaluation index according to the correlation between the subject and the answer in the question-answer pair to be evaluated; generating a third evaluation index according to the number of words of the answer in the question-answer pair to be evaluated; the question-answer pairs to be evaluated comprise questions and answers; the subject in the question-answer pair to be evaluated is extracted from the question;
the evaluation unit 402 is configured to evaluate the first evaluation index, the second evaluation index, and the third evaluation index of the question-answer pair to be evaluated by using a pre-constructed question-answer pair evaluation model, so as to obtain the quality of the question-answer pair to be evaluated.
In one possible implementation manner, the first generating unit 401 includes:
the first word segmentation subunit is used for segmenting the question in the question-answer pair to be evaluated by using a conditional random field (CRF) word segmentation model to obtain a first word segmentation result;
the second word segmentation subunit is used for calculating mutual information values between every two adjacent word segments, and re-segmenting the first word segmentation result according to the mutual information values to obtain a second word segmentation result;
and the obtaining subunit is used for obtaining the number of word segments in the second word segmentation result as the first evaluation index of the question-answer pair to be evaluated.
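The mutual-information step can be sketched as below. The pointwise-mutual-information formula and the merge-above-threshold rule are assumptions for illustration; the patent states only that adjacent segments are re-segmented according to their mutual information values.

```python
import math
from typing import Dict, List, Tuple

def merge_by_mutual_information(
    tokens: List[str],
    unigram_counts: Dict[str, int],
    bigram_counts: Dict[Tuple[str, str], int],
    total: int,
    threshold: float = 3.0,
) -> List[str]:
    """Merge adjacent tokens whose pointwise mutual information (PMI)
    exceeds a threshold, yielding a coarser second segmentation."""
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            a, b = tokens[i], tokens[i + 1]
            p_a = unigram_counts.get(a, 0) / total
            p_b = unigram_counts.get(b, 0) / total
            p_ab = bigram_counts.get((a, b), 0) / total
            if p_a > 0 and p_b > 0 and p_ab > 0:
                pmi = math.log2(p_ab / (p_a * p_b))
                if pmi >= threshold:
                    # High PMI: the two segments behave as one word
                    merged.append(a + b)
                    i += 2
                    continue
        merged.append(tokens[i])
        i += 1
    return merged
```

The first evaluation index would then be `len(merge_by_mutual_information(...))`, the number of segments after merging.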
In a possible implementation manner, the first generating unit 401 is specifically configured to:
and obtaining the cosine similarity between the subject and the answer in the question-answer pair to be evaluated as a second evaluation index of the question-answer pair to be evaluated.
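A minimal sketch of the cosine-similarity index. Representing the subject and the answer as bag-of-words term-frequency vectors is an assumption made here for illustration; the patent does not fix the vectorization.

```python
import math
from collections import Counter
from typing import List

def cosine_similarity(subject_tokens: List[str], answer_tokens: List[str]) -> float:
    """Cosine similarity between bag-of-words vectors of the subject
    and the answer; returns 0.0 when either vector is empty."""
    a, b = Counter(subject_tokens), Counter(answer_tokens)
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```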
In one possible implementation, the apparatus further includes:
the first acquisition unit is used for acquiring sample question-answer pairs in the field to which the question-answer pairs to be evaluated belong;
and the training unit is used for training a pre-constructed initial question-answer pair evaluation model by utilizing the sample question-answer pairs to obtain the question-answer pair evaluation model.
In one possible implementation, the apparatus further includes:
the second generation unit is used for generating a first evaluation index according to the number of word segments of the question in the sample question-answer pair; generating a second evaluation index according to the correlation between the subject and the answer in the sample question-answer pair; and generating a third evaluation index according to the number of words of the answer in the sample question-answer pair;
the classification unit is used for classifying the first evaluation index, the second evaluation index and the third evaluation index of the sample question-answer pair respectively to obtain a classification result corresponding to each evaluation index;
and the construction unit is used for constructing a corresponding decision tree model as an initial question-answer pair evaluation model according to the classification result corresponding to each evaluation index.
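The shape of such a decision tree can be sketched in pure Python. The split features, thresholds, and labels below are invented for illustration; the patent specifies only that a decision tree over the three classified indices serves as the initial evaluation model.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    feature: int = 0              # which of the three indices to test (0, 1, 2)
    threshold: float = 0.0
    label: Optional[int] = None   # leaf quality label when set
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def predict(node: Node, indices: List[float]) -> int:
    """Walk the tree over [first, second, third] evaluation indices."""
    while node.label is None:
        node = node.left if indices[node.feature] <= node.threshold else node.right
    return node.label

# Hand-built illustration: relevance (second index) dominates,
# then answer length (third index) refines the decision.
tree = Node(
    feature=1, threshold=0.3,
    left=Node(label=0),                        # low relevance -> poor quality
    right=Node(feature=2, threshold=5,
               left=Node(label=0),             # relevant but answer too short
               right=Node(label=1)),           # relevant and substantive
)
```

In practice the splits would be learned from the classified sample indices rather than written by hand.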
In one possible implementation, the apparatus further includes:
the second acquisition unit is used for acquiring a verification question-answer pair in the field to which the question-answer pair to be evaluated belongs;
the third generation unit is used for generating a first evaluation index according to the number of word segments of the question in the verification question-answer pair; generating a second evaluation index according to the correlation between the subject and the answer in the verification question-answer pair; and generating a third evaluation index according to the number of words of the answer in the verification question-answer pair;
the obtaining unit is used for inputting a first evaluation index, a second evaluation index and a third evaluation index of the verification question-answer pair into the question-answer pair evaluation model to obtain a quality evaluation result of the verification question-answer pair;
and the updating unit is used for, when the quality evaluation result of the verification question-answer pair is inconsistent with the quality labeling result corresponding to the verification question-answer pair, taking the verification question-answer pair as a sample question-answer pair again and updating the parameters of the question-answer pair evaluation model.
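The verify-then-update cycle can be sketched as follows; `predict_fn` and `retrain_fn` are hypothetical callables standing in for the evaluation model's prediction and retraining steps.

```python
from typing import Callable, List, Tuple

def validate_and_update(
    predict_fn: Callable[[List[float]], int],
    retrain_fn: Callable[[List[Tuple[List[float], int]]], None],
    samples: List[Tuple[List[float], int]],
    verification: List[Tuple[List[float], int]],
) -> bool:
    """For each verification pair (its three indices plus a human quality
    label), compare the model's prediction with the label; misjudged pairs
    are folded back into the training samples and the model is retrained.
    Returns True when an update happened."""
    updated = False
    for indices, human_label in verification:
        if predict_fn(indices) != human_label:
            samples.append((indices, human_label))
            updated = True
    if updated:
        retrain_fn(samples)
    return updated
```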
When a question-answer pair to be evaluated is evaluated, a first evaluation index is generated according to the number of word segments of the question in the pair, a second evaluation index is generated according to the correlation between the subject and the answer, and a third evaluation index is generated according to the number of words of the answer, wherein the question-answer pair to be evaluated comprises the question and the answer, and the subject is extracted from the question. The first evaluation index, the second evaluation index and the third evaluation index are then evaluated using a pre-constructed question-answer pair evaluation model to obtain the quality of the question-answer pair to be evaluated. Compared with manual evaluation, this approach obtains the quality of the question-answer pair quickly and accurately and eliminates the influence of the subjectivity of manual evaluation on the result; moreover, because it considers the number of word segments of the question, the correlation between the subject and the answer, and the number of words of the answer, the quality of the question-answer pair can be evaluated more accurately.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is caused to execute the above-mentioned method for evaluating a question-answer pair.
The embodiment of the present application further provides an evaluation device for question-answer pairs, including: the system comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein when the processor executes the computer program, the evaluation method of the question-answer pair is realized.
The embodiment of the application also provides a computer program product, and when the computer program product runs on the terminal equipment, the terminal equipment executes the question-answer pair evaluation method.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the part of the technical solution of the present application that in essence contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A question-answer pair evaluation method is characterized by comprising the following steps:
generating a first evaluation index according to the number of word segments of the question in a question-answer pair to be evaluated; generating a second evaluation index according to the correlation between the subject and the answer in the question-answer pair to be evaluated; and generating a third evaluation index according to the number of words of the answer in the question-answer pair to be evaluated; wherein the question-answer pair to be evaluated comprises a question and an answer, and the subject in the question-answer pair to be evaluated is extracted from the question;
and evaluating the first evaluation index, the second evaluation index and the third evaluation index of the question-answer pair to be evaluated by utilizing a pre-constructed question-answer pair evaluation model to obtain the quality of the question-answer pair to be evaluated.
2. The method according to claim 1, wherein the generating a first evaluation index according to the number of word segments of the question in the question-answer pair to be evaluated comprises:
performing word segmentation on the questions in the question-answer pair to be evaluated by using a conditional random field CRF word segmentation model to obtain a first word segmentation result;
calculating mutual information values between every two adjacent word segments; and re-segmenting the first word segmentation result according to the mutual information values to obtain a second word segmentation result;
and obtaining the number of word segments in the second word segmentation result as the first evaluation index of the question-answer pair to be evaluated.
3. The method according to claim 1, wherein the generating a second evaluation index according to the correlation between the subject and the answer in the question-answer pair to be evaluated comprises:
and obtaining the cosine similarity between the subject and the answer in the question-answer pair to be evaluated as a second evaluation index of the question-answer pair to be evaluated.
4. The method according to any one of claims 1 to 3, further comprising:
obtaining a sample question-answer pair in the field to which the question-answer pair to be evaluated belongs;
and training a pre-constructed initial question-answer pair evaluation model by using the sample question-answer pairs to obtain the question-answer pair evaluation model.
5. The method of claim 4, further comprising:
generating a first evaluation index according to the number of word segments of the question in the sample question-answer pair; generating a second evaluation index according to the correlation between the subject and the answer in the sample question-answer pair; and generating a third evaluation index according to the number of words of the answer in the sample question-answer pair;
classifying the first evaluation index, the second evaluation index and the third evaluation index of the sample question-answer pair respectively to obtain a classification result corresponding to each evaluation index;
and constructing a corresponding decision tree model as an initial question-answer pair evaluation model according to the classification result corresponding to each evaluation index.
6. The method of claim 4, further comprising:
obtaining a verification question-answer pair belonging to the field of the question-answer pair to be evaluated;
generating a first evaluation index according to the number of word segments of the question in the verification question-answer pair; generating a second evaluation index according to the correlation between the subject and the answer in the verification question-answer pair; and generating a third evaluation index according to the number of words of the answer in the verification question-answer pair;
inputting the first evaluation index, the second evaluation index and the third evaluation index of the verification question-answer pair into the question-answer pair evaluation model to obtain a quality evaluation result of the verification question-answer pair;
and when the quality evaluation result of the verification question-answer pair is inconsistent with the quality labeling result corresponding to the verification question-answer pair, taking the verification question-answer pair as the sample question-answer pair again and updating the parameters of the question-answer pair evaluation model.
7. An apparatus for evaluating a question-answer pair, the apparatus comprising:
the first generation unit is used for generating a first evaluation index according to the word segmentation quantity of the question in the question-answer pair to be evaluated; generating a second evaluation index according to the correlation between the subject and the answer in the question-answer pair to be evaluated; generating a third evaluation index according to the number of words of the answer in the question-answer pair to be evaluated; the question-answer pairs to be evaluated comprise questions and answers; the subject in the question-answer pair to be evaluated is extracted from the question;
and the evaluation unit is used for evaluating the first evaluation index, the second evaluation index and the third evaluation index of the question-answer pair to be evaluated by utilizing a pre-constructed question-answer pair evaluation model to obtain the quality of the question-answer pair to be evaluated.
8. The apparatus of claim 7, wherein the first generating unit comprises:
the first word segmentation subunit is used for segmenting words of the questions in the question-answer pair to be evaluated by using a conditional random field CRF word segmentation model to obtain a first word segmentation result;
the second word segmentation subunit is used for calculating mutual information values between every two adjacent word segments, and re-segmenting the first word segmentation result according to the mutual information values to obtain a second word segmentation result;
and the obtaining subunit is used for obtaining the number of word segments in the second word segmentation result as the first evaluation index of the question-answer pair to be evaluated.
9. A computer-readable storage medium, having stored therein instructions that, when run on a terminal device, cause the terminal device to execute the method of evaluating a question-and-answer pair according to any one of claims 1 to 6.
10. A question-answer pair evaluation device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for evaluating a question-answer pair according to any one of claims 1 to 6.
CN201911320757.8A 2019-12-19 2019-12-19 Question and answer pair evaluation method and device, storage medium and equipment Pending CN111090742A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911320757.8A CN111090742A (en) 2019-12-19 2019-12-19 Question and answer pair evaluation method and device, storage medium and equipment


Publications (1)

Publication Number Publication Date
CN111090742A true CN111090742A (en) 2020-05-01

Family

ID=70395830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911320757.8A Pending CN111090742A (en) 2019-12-19 2019-12-19 Question and answer pair evaluation method and device, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN111090742A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049501A (en) * 2012-12-11 2013-04-17 上海大学 Chinese domain term recognition method based on mutual information and conditional random field model
CN105183923A (en) * 2015-10-27 2015-12-23 上海智臻智能网络科技股份有限公司 New word discovery method and device
CN108595433A (en) * 2018-05-02 2018-09-28 北京中电普华信息技术有限公司 A kind of new word discovery method and device
CN109472305A (en) * 2018-10-31 2019-03-15 国信优易数据有限公司 Answer quality determines model training method, answer quality determination method and device


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113515932A (en) * 2021-07-28 2021-10-19 北京百度网讯科技有限公司 Method, device, equipment and storage medium for processing question and answer information
CN113515932B (en) * 2021-07-28 2023-11-10 北京百度网讯科技有限公司 Method, device, equipment and storage medium for processing question and answer information

Similar Documents

Publication Publication Date Title
CN110377804B (en) Training course data pushing method, device and system and storage medium
CN108628833B (en) Method and device for determining summary of original content and method and device for recommending original content
US20070196804A1 (en) Question-answering system, question-answering method, and question-answering program
CN111445200A (en) Interviewing method and device based on artificial intelligence, computer equipment and storage medium
CN106294744A (en) Interest recognition methods and system
RU2680746C2 (en) Method and device for developing web page quality model
CN110321421B (en) Expert recommendation method for website knowledge community system and computer storage medium
CN111159404A (en) Text classification method and device
CN108363699A (en) A kind of netizen's school work mood analysis method based on Baidu's mhkc
CN111368096A (en) Knowledge graph-based information analysis method, device, equipment and storage medium
CN113435627A (en) Work order track information-based electric power customer complaint prediction method and device
CN112613321A (en) Method and system for extracting entity attribute information in text
CN111160034B (en) Entity word labeling method, device, storage medium and equipment
CN115050457A (en) Method, device, equipment, medium and product for evaluating quality of on-line inquiry service
CN114416929A (en) Sample generation method, device, equipment and storage medium of entity recall model
CN111090742A (en) Question and answer pair evaluation method and device, storage medium and equipment
CN111639485A (en) Course recommendation method based on text similarity and related equipment
CN116541711A (en) Model training method, course recommendation method, device, equipment and medium
CN112732908B (en) Test question novelty evaluation method and device, electronic equipment and storage medium
CN112860983B (en) Method, system, equipment and readable storage medium for pushing learning content
CN114077873A (en) Method, system, storage medium and equipment for determining difficulty type of mathematic test question
CN111858863B (en) Reply recommendation method, reply recommendation device and electronic equipment
CN113255324B (en) Method for disambiguating inventor names in patent data
CN113505213B (en) Key sentence extraction method, system and computer readable storage medium
CN115033668B (en) Story venation construction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination