CN111090742B - Question-answer pair evaluation method, question-answer pair evaluation device, storage medium and equipment - Google Patents

Question-answer pair evaluation method, question-answer pair evaluation device, storage medium and equipment

Info

Publication number
CN111090742B
CN111090742B CN201911320757.8A CN201911320757A
Authority
CN
China
Prior art keywords
question
answer pair
answer
evaluated
evaluation index
Prior art date
Legal status
Active
Application number
CN201911320757.8A
Other languages
Chinese (zh)
Other versions
CN111090742A (en)
Inventor
陈建华
崔朝辉
赵立军
张霞
Current Assignee
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date
Filing date
Publication date
Application filed by Neusoft Corp filed Critical Neusoft Corp
Priority to CN201911320757.8A priority Critical patent/CN111090742B/en
Publication of CN111090742A publication Critical patent/CN111090742A/en
Application granted granted Critical
Publication of CN111090742B publication Critical patent/CN111090742B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3335Syntactic pre-processing, e.g. stopword elimination, stemming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a question-answer pair evaluation method, a question-answer pair evaluation device, a storage medium and a device. The method first generates a first evaluation index from the number of word segments of the question in a question-answer pair to be evaluated, a second evaluation index from the correlation between the subject and the answer in the question-answer pair, and a third evaluation index from the number of words of the answer. The three evaluation indexes are then evaluated with a pre-constructed question-answer pair evaluation model to obtain the quality of the question-answer pair to be evaluated.

Description

Question-answer pair evaluation method, question-answer pair evaluation device, storage medium and equipment
Technical Field
The present application relates to the field of natural language understanding, and in particular, to a method, an apparatus, a storage medium, and a device for evaluating question and answer pairs.
Background
Compared with a traditional manual customer service system, an intelligent question-answering system offers advantages such as higher efficiency and lower cost. More and more enterprises therefore use intelligent question-answering systems to provide dialogue services for users and improve user service satisfaction.
In practical applications, when the intelligent question-answering system receives a question raised by a user, it automatically queries a pre-constructed knowledge base for the answer corresponding to the question and returns that answer to the user. The pre-constructed knowledge base stores a plurality of question-answer pairs, each formed by a question and its corresponding answer. The more accurately the answer in a question-answer pair answers its question, the higher the quality of that question-answer pair, and the higher the quality of the reply when the intelligent question-answering system replies based on it.
Therefore, to improve the reply accuracy of the intelligent question-answering system, the number of question-answer pairs stored in the knowledge base and the range of fields they cover need to be continuously expanded. At present, the knowledge base is usually expanded through multi-person collaboration or document extraction, that is, several people add question-answer pairs to the knowledge base, or question-answer pairs are automatically extracted from documents. However, when several people add question-answer pairs at the same time, the results are easily affected by their subjective judgment, so the quality standards of the added pairs are inconsistent; and automatic extraction from documents cannot guarantee the quality of the extracted pairs. Consequently, professionals in each field are usually required to manually evaluate and screen the question-answer pairs added to the knowledge base, but manual evaluation is highly subjective, hard to quantify, inefficient, and costly in human resources.
Disclosure of Invention
The embodiment of the application mainly aims to provide a method, a device, a storage medium and equipment for evaluating question-answer pairs, which can evaluate the quality of the question-answer pairs more rapidly and accurately.
The embodiment of the application provides a method for evaluating question-answer pairs, which comprises the following steps:
generating a first evaluation index according to the word segmentation quantity of the question in the question-answer pair to be evaluated; generating a second evaluation index according to the correlation between the subject and the answer in the question-answer pair to be evaluated; generating a third evaluation index according to the number of words of the answers in the question-answer pair to be evaluated; the question-answer pair to be evaluated comprises questions and answers; the subject in the question-answer pair to be evaluated is extracted from the question;
And evaluating the first evaluation index, the second evaluation index and the third evaluation index of the question-answer pair to be evaluated by utilizing a pre-constructed question-answer pair evaluation model to obtain the quality of the question-answer pair to be evaluated.
In one possible implementation manner, the generating a first evaluation index according to the number of the word segments of the questions in the question-answer pair to be evaluated includes:
performing word segmentation on the questions in the question-answer pair to be evaluated by using a conditional random field CRF word segmentation model to obtain a first word segmentation result;
calculating mutual information values between every two adjacent word segments; and performing word segmentation again on the first word segmentation result according to the mutual information values to obtain a second word segmentation result;
and obtaining the number of the word segmentation in the second word segmentation result as a first evaluation index of the question-answer pair to be evaluated.
In a possible implementation manner, the generating a second evaluation index according to the correlation between the subject and the answer in the question-answer pair to be evaluated includes:
and obtaining cosine similarity between the subject and the answer in the question-answer pair to be evaluated as a second evaluation index of the question-answer pair to be evaluated.
In one possible implementation, the method further includes:
Acquiring a sample question-answer pair in the field to which the question-answer pair to be evaluated belongs;
Training a pre-constructed initial question-answer pair evaluation model by using the sample question-answer pair to obtain the question-answer pair evaluation model.
In one possible implementation, the method further includes:
generating a first evaluation index according to the word segmentation quantity of the questions in the sample question-answer pair; generating a second evaluation index according to the correlation between the topic and the answer in the sample question-answer pair; generating a third evaluation index according to the number of words of the answers in the sample question-answer pair;
Classifying the first evaluation index, the second evaluation index and the third evaluation index of the sample question-answer pair respectively to obtain classification results corresponding to the evaluation indexes;
And constructing a corresponding decision tree model according to the classification results corresponding to the evaluation indexes, and taking the decision tree model as an initial question-answer pair evaluation model.
In one possible implementation, the method further includes:
Acquiring verification question-answer pairs of the field to which the question-answer pairs to be evaluated belong;
Generating a first evaluation index according to the word segmentation quantity of the question in the verification question-answer pair; generating a second evaluation index according to the correlation between the topic and the answer in the verification question-answer pair; generating a third evaluation index according to the number of words of the answers in the verification question-answer pair;
Inputting the first evaluation index, the second evaluation index and the third evaluation index of the verification question-answer pair into the question-answer pair evaluation model to obtain a quality evaluation result of the verification question-answer pair;
And when the quality evaluation result of the verification question-answer pair is inconsistent with the quality marking result corresponding to the verification question-answer pair, the verification question-answer pair is taken as the sample question-answer pair again, and the question-answer pair evaluation model is updated with parameters.
The embodiment of the application also provides a device for evaluating the question-answer pairs, which comprises:
The first generation unit is used for generating a first evaluation index according to the word segmentation quantity of the question in the question-answer pair to be evaluated; generating a second evaluation index according to the correlation between the subject and the answer in the question-answer pair to be evaluated; generating a third evaluation index according to the number of words of the answers in the question-answer pair to be evaluated; the question-answer pair to be evaluated comprises questions and answers; the subject in the question-answer pair to be evaluated is extracted from the question;
and the evaluation unit is used for evaluating the first evaluation index, the second evaluation index and the third evaluation index of the question-answer pair to be evaluated by utilizing a pre-constructed question-answer pair evaluation model to obtain the quality of the question-answer pair to be evaluated.
In one possible implementation manner, the first generating unit includes:
the first word segmentation subunit is used for segmenting the questions in the question-answer pair to be evaluated by utilizing a conditional random field CRF word segmentation model to obtain a first word segmentation result;
The second word segmentation subunit is used for calculating mutual information values between every two adjacent word segments, and for performing word segmentation again on the first word segmentation result according to the mutual information values to obtain a second word segmentation result;
the obtaining subunit is used for obtaining the number of the words in the second word segmentation result and taking the number of the words as a first evaluation index of the question-answer pair to be evaluated.
In one possible implementation manner, the first generating unit is specifically configured to:
and obtaining cosine similarity between the subject and the answer in the question-answer pair to be evaluated as a second evaluation index of the question-answer pair to be evaluated.
In one possible implementation, the apparatus further includes:
the first acquisition unit is used for acquiring the sample question-answer pair in the field to which the question-answer pair to be evaluated belongs;
And the training unit is used for training the pre-constructed initial question-answer pair evaluation model by utilizing the sample question-answer pair to obtain the question-answer pair evaluation model.
In one possible implementation, the apparatus further includes:
the second generation unit is used for generating a first evaluation index according to the word segmentation quantity of the questions in the sample question-answer pair; generating a second evaluation index according to the correlation between the topic and the answer in the sample question-answer pair; generating a third evaluation index according to the number of words of the answers in the sample question-answer pair;
the classification unit is used for classifying the first evaluation index, the second evaluation index and the third evaluation index of the sample question-answer pair respectively to obtain classification results corresponding to the evaluation indexes;
and the construction unit is used for constructing a corresponding decision tree model according to the classification results corresponding to the evaluation indexes and taking the decision tree model as an initial question-answer pair evaluation model.
In one possible implementation, the apparatus further includes:
the second acquisition unit is used for acquiring the verification question-answer pair of the field to which the question-answer pair to be evaluated belongs;
The third generation unit is used for generating a first evaluation index according to the word segmentation quantity of the question in the verification question-answer pair; generating a second evaluation index according to the correlation between the topic and the answer in the verification question-answer pair; generating a third evaluation index according to the number of words of the answers in the verification question-answer pair;
The obtaining unit is used for inputting the first evaluation index, the second evaluation index and the third evaluation index of the verification question-answer pair into the question-answer pair evaluation model to obtain a quality evaluation result of the verification question-answer pair;
And the updating unit is used for re-using the verification question-answer pair as the sample question-answer pair when the quality evaluation result of the verification question-answer pair is inconsistent with the quality marking result corresponding to the verification question-answer pair, and updating parameters of the question-answer pair evaluation model.
From the above technical solutions, the embodiment of the present application has the following advantages:
When a question-answer pair is evaluated, a first evaluation index is first generated according to the number of word segments of the question in the question-answer pair to be evaluated, a second evaluation index is generated according to the correlation between the subject and the answer, and a third evaluation index is generated according to the number of words of the answer, wherein the question-answer pair to be evaluated comprises a question and an answer, and the subject of the question-answer pair to be evaluated is extracted from the question. The first, second and third evaluation indexes are then evaluated with a pre-constructed question-answer pair evaluation model to obtain the quality of the question-answer pair to be evaluated. In this way, the quality of the question-answer pair can be evaluated more rapidly and accurately without relying on manual evaluation.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for evaluating question and answer pairs provided by the application;
FIG. 2 is a schematic flow chart of the construction of the question-answer pair evaluation model provided by the application;
FIG. 3 is a schematic diagram of a decision tree model according to the present application;
Fig. 4 is a block diagram of a question-answer pair evaluation device according to the present application.
Detailed Description
In some question-answer pair evaluation methods, professionals in each field are required to manually evaluate and screen the question-answer pairs in a knowledge base to judge whether the answer in each pair answers its question accurately. Taking the insurance field as an example, after a number of insurance-field question-answer pairs have been added to the knowledge base through multi-person collaboration or document extraction, accurately evaluating their quality conventionally requires insurance professionals to evaluate them manually; likewise, other fields such as finance and medicine require professionals of the corresponding field to manually evaluate the question-answer pairs of those fields. However, quality results obtained through manual evaluation by professionals are easily affected by subjective human factors, which introduces random deviation into the evaluation results. Evaluation efficiency is therefore low and accuracy limited, and a great deal of human resources must be spent.
To overcome these drawbacks, an embodiment of the present application provides a question-answer pair evaluation method. When a question-answer pair is to be evaluated, a first evaluation index is generated according to the number of word segments of its question, a second evaluation index is generated according to the correlation between its subject and its answer, and a third evaluation index is generated according to the number of words of its answer. The generated first, second and third evaluation indexes are then evaluated with a pre-constructed question-answer pair evaluation model to obtain the quality of the question-answer pair to be evaluated.
Further, after the evaluation result of the question-answer pair to be evaluated is obtained, if the result is good, that is, the quality of the pair is high, the pair can be added directly to the knowledge base to expand it and improve the reply accuracy of the intelligent question-answering system. If the result is poor, that is, the quality of the pair is low, the pair can be corrected manually to improve its quality, and the corrected pair can then be added to the knowledge base, likewise expanding it, improving the reply accuracy of the intelligent question-answering system and thereby further improving user service satisfaction.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Example One
Referring to fig. 1, a flow chart of a method for evaluating question-answer pairs according to the present embodiment is provided, and the method includes the following steps:
S101: generating a first evaluation index according to the word segmentation quantity of the question in the question-answer pair to be evaluated; generating a second evaluation index according to the correlation between the subject and the answer in the question-answer pair to be evaluated; generating a third evaluation index according to the word number of the answers in the question-answer pair to be evaluated, wherein the question-answer pair to be evaluated comprises questions and answers; the topics in the question-answer pair to be evaluated are extracted from the questions.
In this embodiment, any question-answer pair for which quality evaluation is achieved by using this embodiment is defined as a question-answer pair to be evaluated. And each question-answer pair to be evaluated includes a question and an answer. The question-answer pairs to be evaluated can be added to the knowledge base in various fields by means of multi-person collaboration or document extraction. The question-answer pairs can be used as answer basis of the intelligent question-answer system, namely, after the intelligent question-answer system recognizes the questions posed by the user, the question corresponding to the user question can be found out from a large number of question-answer pairs stored in the knowledge base, and then the answer matched with the question is found out and returned to the user. It can be seen that the quality of the question-answer pair is critical to the answer quality of the intelligent question-answer system.
Therefore, to improve the reply quality of the intelligent question-answering system, the quality of question-answer pairs must be evaluated accurately. To make questions easy to recognize and to avoid the influence of too many word segments that are irrelevant to the subject, and thereby improve the quality of question-answer pairs, conceptual questions containing concept words are preferred over questions with a large number of word segments when question-answer pairs are extracted, and the concept definition of the concept word serves as the answer. The number of word segments contained in the question can therefore be used as an index for evaluating the quality of a question-answer pair: the fewer word segments the question contains, the higher the quality of the pair; conversely, the more word segments it contains, the greater the influence of word segments irrelevant to the subject, and the lower the quality of the pair. Similarly, since the answer is the conceptual definition of the concept in the question, an answer with an excessive number of words usually indicates that the answer does not directly address the question, so the number of words in the answer can also serve as an evaluation index: the fewer words the answer contains, the higher the quality of the corresponding pair; conversely, the more words it contains, the lower the quality.
In this embodiment, to evaluate the quality of question-answer pairs more quickly and accurately and to eliminate the influence of the subjectivity of manual evaluation, a first evaluation index is first generated according to the number of word segments of the question in the question-answer pair to be evaluated, a second evaluation index is generated according to the correlation between the subject and the answer, and a third evaluation index is generated according to the number of words of the answer. The subsequent step S102 is then executed with these three evaluation indexes to obtain the quality of the question-answer pair to be evaluated.
Next, specific generation processes of the first evaluation index, the second evaluation index, and the third evaluation index are sequentially described.
In this embodiment, an optional implementation manner, the specific implementation process of "generating the first evaluation index according to the number of word segments of the question in the question-answer pair to be evaluated" in step S101 may include the following steps A1-A3:
step A1: and performing word segmentation on the questions in the question-answer pair to be evaluated by using a conditional random field CRF word segmentation model to obtain a first word segmentation result.
In this implementation, to generate the first evaluation index, a general-purpose word segmentation method with good performance in the Chinese word segmentation field is selected first. For example, a conditional random field (Conditional Random Field, CRF) word segmentation model may be used to segment the question in the question-answer pair to be evaluated to obtain the word segments it contains, and this segmentation result is defined as the first word segmentation result. The CRF word segmentation formula is as follows:
P(Y \mid X) = \frac{1}{Z(X)} \exp\Big( \sum_{i,k} \lambda_k \, t_k(y_{i-1}, y_i, X, i) + \sum_{i,l} \mu_l \, s_l(y_i, X, i) \Big)    (1)

where Z(X) denotes the normalization factor; t denotes a transition feature and t_k denotes the k-th transition feature function; s denotes a state feature and s_l denotes the l-th state feature function; λ_k and μ_l denote the weights corresponding to the features t_k and s_l respectively, and their specific values can be set according to the actual situation and empirical values; X denotes the question in the question-answer pair to be evaluated; Y denotes the first word segmentation result; and y_i denotes the i-th word segment in the first word segmentation result.
Illustrating: assuming that the question x in the question-answer pair to be evaluated is "how today is weather", the first word segmentation result Y obtained by using the above formula (1) to segment the question x is: "today", "weather", "how".
It can be seen that, the essence of the CRF word segmentation model adopted in this embodiment is to find a word segmentation result sequence most likely to occur (i.e., with the highest probability) according to the problem in the question-answer pair to be evaluated, and the specific implementation process is consistent with the existing method, and will not be described in detail herein.
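To make the role of the CRF segmenter concrete, the following Python sketch shows how a per-character label sequence produced by a CRF-style tagger can be turned into a first word segmentation result. The B/M/E/S tagging scheme, the hard-coded tag sequence standing in for real CRF output, and the function name are illustrative assumptions, not details specified by the patent.

```python
def tags_to_segments(chars, tags):
    """Convert per-character B/M/E/S tags into word segments.

    B = beginning of a word, M = middle, E = end, S = single-character word.
    """
    segments, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":
            if current:
                segments.append(current)
                current = ""
            segments.append(ch)
        elif tag == "B":
            if current:
                segments.append(current)
            current = ch
        elif tag == "M":
            current += ch
        else:  # "E": close the current word
            segments.append(current + ch)
            current = ""
    if current:
        segments.append(current)
    return segments


# Hypothetical tagger output for the question "How is the weather today"
# (今天天气怎么样), one tag per character.
chars = list("今天天气怎么样")
tags = ["B", "E", "B", "E", "B", "M", "E"]
print(tags_to_segments(chars, tags))  # ['今天', '天气', '怎么样']
```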
Step A2: calculating mutual information values between all two adjacent segmentation words; and according to the mutual information value, the first word segmentation result is segmented to obtain a second word segmentation result.
In this implementation, the CRF word segmentation model used in step A1 to segment the question and obtain the first word segmentation result is trained on a general-purpose corpus. Such a model segments text in general domains well, but performs poorly on questions of question-answer pairs that contain technical terms of a professional field. For example, for a medical-field question containing a specific disease name or drug name, segmenting with the CRF word segmentation model alone cannot accurately identify the medical terms, such as the disease name or drug name, contained in it.
Illustrating: assuming that the problem in the question-answer pair to be evaluated is what action is caused by the paracetamol, after the CRF word segmentation model is utilized to segment the problem of what action is caused by the paracetamol, the first word segmentation result is obtained as follows: "puff", "heat", "breath", "pain", "have", "what", "effect". It can be seen that, the paracetamol is taken as a specific medicine name and is not divided into a word segmentation, so that the problem in question-answer pairs in each field is segmented only by using the CRF word segmentation model, and an accurate word segmentation result cannot be obtained.
Based on this, after the CRF word segmentation model has produced the first word segmentation result for the question in the question-answer pair to be evaluated, the accuracy of the segmentation can be further improved as follows. Starting from the first word segment of the first word segmentation result, the mutual information value between every two adjacent word segments is calculated in turn to judge how tightly the two adjacent segments are bound to each other; the first word segmentation result is then re-segmented according to the obtained mutual information values, and the re-segmented result is taken as the second word segmentation result. The mutual information value between two adjacent word segments is calculated as follows:

I(A, B) = \frac{p(A, B)}{p(A) \, p(B)}    (2)

where A and B denote two adjacent word segments in the first word segmentation result; I(A, B) denotes the mutual information value between word segment A and word segment B; p(A, B) denotes the probability that word segments A and B occur together in the question-answer pairs of the pre-constructed knowledge base; p(A) denotes the probability that word segment A occurs in the question-answer pairs of the pre-constructed knowledge base; and p(B) denotes the probability that word segment B occurs in the question-answer pairs of the pre-constructed knowledge base.
After statistical analysis of the question-answer pairs in the pre-constructed knowledge base, formula (2) can be expressed as the following formula (3):

I(A, B) = \frac{n(A, B) \times N}{n(A) \times n(B)}    (3)

where n(A, B) denotes the number of times word segments A and B occur together in the question-answer pairs of the pre-constructed knowledge base; n(A) denotes the number of occurrences of word segment A in the question-answer pairs of the pre-constructed knowledge base; n(B) denotes the number of occurrences of word segment B in the question-answer pairs of the pre-constructed knowledge base; and N denotes the total number of question-answer pairs in the pre-constructed knowledge base.
It should be noted that the larger the mutual information value between two word segments, the more tightly they are bound to each other and the more likely they are to form a single independent word segment; conversely, the smaller the mutual information value, the more loosely they are bound and the less likely they are to form a single independent word segment.

Specifically, if the mutual information value I(A, B) between adjacent word segments A and B calculated with formula (3) exceeds a preset threshold, the word relationship between A and B is tight and the two can be combined into one independent word segment. The preset threshold is the critical value for deciding whether two adjacent word segments can form an independent word segment: if the mutual information value between them is smaller than this value, the two segments are only loosely related, remain two independent word segments and cannot be combined; if the mutual information value is not smaller than this value, the two segments are tightly related and can be combined into one independent word segment expressing a specific meaning. The value of the preset threshold can be set according to the actual situation and is not limited by the embodiments of the present application; for example, it may be set to 10.
Illustrating: based on the above example, after the problem "what effect paracetamol has" is segmented using the CRF segmentation model, the first segmentation result is: "puff", "heat", "breath", "pain", "have", "what", "effect". And the total number of question-answer pairs in a pre-constructed knowledge base is 1000, wherein the number of occurrence times of 'flutter' is 32, the number of occurrence times of 'heat' is 40, and the preset threshold value is 10. The mutual information value between "puff" and "heat" can be calculated to be 15.81, that is,Exceeding the preset threshold 10 indicates that the "puff" and the "heat" can constitute an independent word of "puff heat", and then, if the number of occurrences of "information" in the pre-constructed knowledge base is 36, the mutual information value between "puff heat" and "information" can be calculated to be 16.67 again using the above formula (3), that is,When the preset threshold 10 is exceeded, it is indicated that the "flapping heat" and the "information" can form an independent word of "flapping heat information", and similarly, it is indicated that the mutual information value between the "flapping heat" and the "pain" still exceeds the preset threshold 10, it is indicated that the "flapping heat" and the "pain" can form an independent word of "paracetamol", and further, it is indicated that the mutual information value between the "paracetamol" and the "having" does not exceed the preset threshold 10, it is indicated that the "paracetamol" and the "having" are two mutually independent words, and cannot form an independent word. Similarly, the same manner can be used to determine what and action are mutually opposite words. Therefore, by combining the calculated mutual information values, the first word segmentation result is subjected to word segmentation again, and a second word segmentation result is obtained as follows: "paracetamol", "there", "what", "action".
It can be seen that the word segmentation accuracy of the second word segmentation result is higher than that of the first word segmentation result.
It should also be noted that, to improve word segmentation accuracy, a threshold on the number of characters in a word segment can be preset according to the characteristics of the technical terms of each field. For example, in the medical field, a single technical term such as a disease name or drug name generally contains no more than 6 characters, so the character-count threshold for a single word segment can be set to 6 in that field. Then, when the mutual information of adjacent word segments in the first word segmentation result is calculated with formula (3) and segments whose mutual information value exceeds the preset threshold are combined into one independent word segment, the combined segment is kept to at most 6 characters; that is, once an independent word segment reaches 6 characters, the mutual information value between it and the following word segment no longer needs to be calculated. The character-count threshold preset for each field can be set according to the actual situation; it may be 6 for the medical field but take other values in other fields, which is not limited by the embodiments of the present application.
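A minimal Python sketch of this re-segmentation step, assuming that word and co-occurrence counts over the knowledge base are available. The greedy left-to-right merging order, the dictionary-based counts, and the illustrative numbers are assumptions made for this sketch rather than details fixed by the patent.

```python
from typing import Dict, List, Tuple


def mutual_information(n_ab: int, n_a: int, n_b: int, total_pairs: int) -> float:
    """Formula (3): I(A, B) = (n(A, B) * N) / (n(A) * n(B))."""
    return (n_ab * total_pairs) / (n_a * n_b)


def merge_segments(segments: List[str],
                   counts: Dict[str, int],
                   pair_counts: Dict[Tuple[str, str], int],
                   total_pairs: int,
                   threshold: float = 10.0,
                   max_chars: int = 6) -> List[str]:
    """Merge adjacent segments whose mutual information exceeds the threshold,
    keeping every merged segment at most max_chars characters long."""
    merged = [segments[0]]
    for nxt in segments[1:]:
        cur = merged[-1]
        n_a, n_b = counts.get(cur, 0), counts.get(nxt, 0)
        n_ab = pair_counts.get((cur, nxt), 0)
        if (n_a and n_b
                and len(cur) + len(nxt) <= max_chars
                and mutual_information(n_ab, n_a, n_b, total_pairs) >= threshold):
            merged[-1] = cur + nxt   # tight relationship: combine into one segment
        else:
            merged.append(nxt)       # loose relationship: keep the segments apart
    return merged


# Illustrative counts only (not taken from the patent).
counts = {"扑": 32, "热": 40, "扑热": 20, "息": 36}
pair_counts = {("扑", "热"): 20, ("扑热", "息"): 12}
print(merge_segments(["扑", "热", "息"], counts, pair_counts, total_pairs=1000))
# ['扑热息']
```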
Step A3: and obtaining the number of the segmented words in the second segmented word result as a first evaluation index of the question-answer pair to be evaluated.
In this implementation manner, after obtaining the second word segmentation result corresponding to the question in the question-answer pair to be evaluated through the step A2, the number of the words in the second word segmentation result may be further counted, and the number of the words is used as the first evaluation index of the question-answer pair to be evaluated.
Illustrating: based on the above example, the second word result of the question to be evaluated, namely, what effect the question "paracetamol has" is obtained through the above step A2 is: "paracetamol", "there", "what", "action". And then the number of the words included in the second word segmentation result is calculated to be 4, and the number of the words can be used as a first evaluation index of the question-answer pair to be evaluated.
In this embodiment, an optional implementation manner, the specific implementation process of "generating the second evaluation index according to the correlation between the subject and the answer in the question-answer pair to be evaluated" in step S101 may include: and obtaining cosine similarity between the subject and the answer in the question-answer pair to be evaluated as a second evaluation index of the question-answer pair to be evaluated.
In this implementation, to generate the second evaluation index, a subject characterizing the core content of the question-answer pair is first extracted from the question of the pair to be evaluated; for example, for the question "What effect does paracetamol have?", the extracted subject is "paracetamol". Then the most important words are identified in the answer, for example by using the term frequency (TF) and inverse document frequency (Inverse Document Frequency, IDF) of each word in the answer to find the words with the highest weights (i.e., the highest importance). The cosine similarity between such a word and the subject is calculated and taken as the second evaluation index of the question-answer pair to be evaluated, characterizing the correlation between the subject and the answer.
The larger the calculated cosine similarity value, the higher the correlation between the subject and the answer in the question-answer pair to be evaluated; the smaller the value, the lower the correlation. The specific way of calculating the cosine similarity is consistent with existing methods and is not described here.
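The patent does not specify how the subject and the answer words are represented as vectors; the sketch below assumes that pre-computed TF-IDF weights and a word-vector table (for example, pre-trained embeddings) are available, and that the single highest-weighted answer word is compared with the subject. These choices, and all names in the sketch, are assumptions for illustration.

```python
import math
from typing import Dict, List


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Standard cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def second_evaluation_index(subject: str,
                            answer_words: List[str],
                            tfidf: Dict[str, float],
                            vectors: Dict[str, List[float]]) -> float:
    """Cosine similarity between the subject and the highest-weighted answer word."""
    key_word = max(answer_words, key=lambda w: tfidf.get(w, 0.0))
    return cosine_similarity(vectors[subject], vectors[key_word])
```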
In this embodiment, in an optional implementation, the specific implementation of "generating the third evaluation index according to the number of words of the answer in the question-answer pair to be evaluated" in step S101 may include: counting the number of words of the answer in the question-answer pair to be evaluated and taking this number as the third evaluation index of the question-answer pair to be evaluated.
Illustrating: the answer in the question-answer pair to be evaluated is assumed to be "paracetamol for the treatment of cold fever". The number of words of the answer in the question-answer pair to be evaluated (i.e. the third evaluation index of the question-answer pair to be evaluated) may be counted as 12.
S102: and evaluating the first evaluation index, the second evaluation index and the third evaluation index of the question-answer pair to be evaluated by utilizing a pre-constructed question-answer pair evaluation model to obtain the quality of the question-answer pair to be evaluated.
In this embodiment, after the first evaluation index, the second evaluation index, and the third evaluation index of the question-answer pair to be evaluated are generated in step S101, data processing may be further performed on these evaluation indexes, and the quality of the question-answer pair to be evaluated may be determined according to the processing result. Specifically, the first evaluation index, the second evaluation index and the third evaluation index of the question-answer pair to be evaluated can be used as input data to be input into a pre-constructed question-answer pair evaluation model so as to obtain the quality of the question-answer pair to be evaluated. It should be noted that, in order to implement the step S102, a question-answer pair evaluation model needs to be built in advance, and a specific building process may be referred to the related description of the second embodiment.
Specifically, after the first, second and third evaluation indexes of the question-answer pair to be evaluated are generated in step S101, they can be fed into the input of the question-answer pair evaluation model, and the model outputs an evaluation score in the interval [0, 100] that characterizes the quality of the question-answer pair to be evaluated. For example, an output score of 90 indicates that the quality of the question-answer pair to be evaluated is high.
Alternatively, an evaluation threshold can be preset as the critical score for distinguishing the quality of the question-answer pair to be evaluated: if the output evaluation score is greater than this critical value, the quality of the corresponding pair is high; if it is not greater than this critical value, the quality is low. The evaluation threshold can be set according to the actual situation and is not limited by the embodiments of the present application; for example, it may be set to 75.
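As a sketch of how the model output might be used, the snippet below assumes a trained scikit-learn-style regressor whose predict method returns a score in [0, 100], and applies the 75-point threshold mentioned above; the interface and names are assumptions rather than details given by the patent.

```python
def judge_quality(model, first_index: int, second_index: float, third_index: int,
                  score_threshold: float = 75.0) -> str:
    """Feed the three evaluation indexes to the trained evaluation model and
    map its [0, 100] score to a coarse quality label."""
    score = model.predict([[first_index, second_index, third_index]])[0]
    return "high quality" if score > score_threshold else "low quality"
```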
It should be further noted that, in one possible implementation of the embodiments of the present application, in order to make questions easy to recognize, avoid the influence of too many word segments irrelevant to the subject and thereby improve the quality of question-answer pairs, the questions in question-answer pairs are generally conceptual questions rather than questions with a large number of word segments. Therefore, if the first evaluation index generated through steps A1-A3 is found to be too large, that is, the number of word segments in the question exceeds a preset word-segment-count threshold, the quality of the question-answer pair to be evaluated can be judged directly to be too low, without carrying out the subsequent evaluation steps. The word-segment-count threshold can be set according to the actual situation and is not limited by the embodiments of the present application; for example, it may be set to 10.
In summary, in the question-answer pair evaluation method provided by this embodiment, when a question-answer pair is evaluated, a first evaluation index is first generated according to the number of word segments of the question in the question-answer pair to be evaluated, a second evaluation index is generated according to the correlation between the subject and the answer, and a third evaluation index is generated according to the number of words of the answer, where the question-answer pair to be evaluated comprises a question and an answer and its subject is extracted from the question. The first, second and third evaluation indexes are then evaluated with a pre-constructed question-answer pair evaluation model to obtain the quality of the question-answer pair to be evaluated. Because the evaluation relies on quantified indexes and a trained model rather than manual judgment, the quality of the question-answer pair can be evaluated more rapidly and accurately.
Example Two
This embodiment will describe a specific construction process of the question-answer pair evaluation model mentioned in embodiment one. By using the pre-constructed question-answer pair evaluation model, the quality of the question-answer pair can be evaluated more rapidly and accurately.
Referring to fig. 2, a schematic flow chart of constructing a question-answer pair evaluation model provided in this embodiment is shown, and the flow chart includes the following steps:
S201: and obtaining a sample question-answer pair in the field of question-answer pairs to be evaluated.
In this embodiment, constructing the question-answer pair evaluation model requires a large amount of preparatory work. First, sample question-answer pairs of the field to which the question-answer pair to be evaluated belongs must be collected. For example, if the pair to be evaluated belongs to the medical field, 1000 medical-field question-answer pairs can be collected in advance, and each collected pair is used as a sample question-answer pair. Professionals of the medical field annotate each sample pair in advance with an evaluation score or grade indicating its actual quality, so that the obtained sample pairs can be used to train the question-answer pair evaluation model.
S202: training a pre-constructed initial question-answer pair evaluation model by using the sample question-answer pair to obtain a question-answer pair evaluation model.
In this embodiment, after obtaining the sample question-answer pairs in the field of question-answer pairs to be evaluated through step S201, the sample question-answer pairs may be further used as training data to train to obtain a question-answer pair evaluation model.
Specifically, after each sample question-answer pair is obtained, a method similar to the first evaluation index, the second evaluation index and the third evaluation index for generating the question-answer pair to be evaluated in step S101 of the embodiment may be adopted, and the first evaluation index, the second evaluation index and the third evaluation index of each sample question-answer pair may be generated by replacing the question-answer pair to be evaluated with the sample question-answer pair, and the description of the related point will be omitted herein. Furthermore, the first evaluation index, the second evaluation index and the third evaluation index of the sample question-answer pair can be utilized to train a pre-constructed initial question-answer pair evaluation model, and relevant model parameters in the initial question-answer pair evaluation model are adjusted to obtain a question-answer pair evaluation model.
Next, the embodiment of the present application describes how to construct an initial question-answer pair evaluation model by the following steps B1 to B3:
Step B1: generating a first evaluation index according to the word segmentation quantity of the questions in the sample question-answering pair; generating a second evaluation index according to the correlation between the subject and the answer in the sample question-answer pair; and generating a third evaluation index according to the number of words of the answers in the sample question-answer pair.
In this embodiment, in order to construct the initial question-answer pair evaluation model, train the question-answer pair evaluation model and improve the efficiency and accuracy of quality evaluation, an optional implementation is to randomly select some of the obtained sample question-answer pairs as initial training data for constructing the initial model. Specifically, a method similar to the generation of the first, second and third evaluation indexes in step S101 of the first embodiment can be adopted, with the question-answer pair to be evaluated replaced by these sample question-answer pairs: the first evaluation index is generated according to the number of word segments of the question in each sample pair, the second evaluation index according to the correlation between the subject and the answer, and the third evaluation index according to the number of words of the answer. The details are not repeated here.
Step B2: and respectively classifying the first evaluation index, the second evaluation index and the third evaluation index of the sample question-answer pair to obtain classification results corresponding to the evaluation indexes.
In this embodiment, after the first, second and third evaluation indexes of the sample question-answer pair are generated in step B1, the first, second and third evaluation indexes may be further classified, so as to obtain classification results corresponding to the respective evaluation indexes. For example, the first evaluation index may be classified according to the number of words of the question, the second evaluation index may be classified according to the magnitude of the cosine similarity value between the subject and the answer, the third evaluation index may be classified according to the number of words of the answer, and so on.
Illustrating: assuming that after data processing is performed on a part of randomly selected sample question-answer pairs, the first evaluation index of each sample question-answer pair is: the number of the word segmentation in the second word segmentation result corresponding to the problem is 1, 2, 3 and 4 respectively. The second evaluation index of each sample question-answer pair is: cosine similarity values between the subject and the answer respectively belong to the following four ranges: 0.2 or less, 0.2 to 0.5, 0.5 to 0.8 and 0.8 to 1. The third evaluation index of each sample question-answer pair is: the number of words of the answer respectively belongs to the following four ranges: 100 words or less, 100 words to 300 words, 300 words to 500 words or more, 500 words or more.
The first, second, and third evaluation indexes may be further classified into four categories, respectively. The four classification results of the first evaluation index are respectively as follows: the word segmentation number 1, the word segmentation number 2, the word segmentation number 3 and the word segmentation number 4. The four classification results of the second evaluation index are respectively: correlation less than 0.2, correlation 0.2-0.5, correlation 0.5-0.8 and correlation 0.8-1. The four classification results of the third evaluation index are respectively: 100 words or less, 100-300 words or 300-500 words or more.
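A small Python sketch of this classification step, using the bin boundaries from the example above; treating the word-segment count as its own category and the helper names are assumptions of the sketch.

```python
import bisect

# Bin edges taken from the example categories above.
SIMILARITY_EDGES = [0.2, 0.5, 0.8]   # four correlation categories
ANSWER_LEN_EDGES = [100, 300, 500]   # four answer-length categories


def classify(value: float, edges) -> int:
    """Return the index (0 .. len(edges)) of the category the value falls into."""
    return bisect.bisect_right(edges, value)


def classify_indexes(first_index: int, second_index: float, third_index: int):
    return (first_index,                               # word-segment count: 1, 2, 3 or 4
            classify(second_index, SIMILARITY_EDGES),  # correlation category 0-3
            classify(third_index, ANSWER_LEN_EDGES))   # answer-length category 0-3


print(classify_indexes(4, 0.73, 120))  # (4, 2, 1)
```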
Step B3: and constructing a corresponding decision tree model according to the classification result corresponding to each evaluation index, and taking the decision tree model as an initial question-answer pair evaluation model.
In this embodiment, the first, second and third evaluation indexes of the sample question-answer pair are respectively classified by the step B2, so that after classification results corresponding to the respective evaluation indexes are obtained, a corresponding decision tree model can be further constructed according to the classification results corresponding to the respective evaluation indexes, and used as an initial question-answer pair evaluation model.
Illustrating: based on the above example, after four classification results of the first evaluation index, the second evaluation index, and the third evaluation index of a part of the sample question-answer pairs selected at random are obtained, a decision tree model including an entrance and an exit decision tree model may be constructed based on the classification results, and parameters of the decision tree model may be initialized to be used as an initial question-answer pair evaluation model, as shown in fig. 3.
It should be understood that the network structure of the initial question-answer pair evaluation model is not unique in the present application; the structure of the decision tree model shown in Fig. 3 is only an example, and other structures may be adopted. The structure of the constructed decision tree model varies with the classification results of the first, second and third evaluation indexes of the sample question-answer pairs, and its specific structural parameters can be initialized according to the actual situation, which is not limited by the embodiments of the present application.
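The patent constructs its own decision tree (Fig. 3). As a rough stand-in only, the sketch below fits a scikit-learn regression tree on classified indexes and annotated scores; the library choice, hyper-parameters and toy data are assumptions and not the patent's model.

```python
from sklearn.tree import DecisionTreeRegressor

# Each row: (word-segment count, correlation category, answer-length category)
# for one sample question-answer pair; labels are the annotated scores in [0, 100].
features = [[1, 3, 0], [2, 2, 1], [4, 0, 3], [3, 1, 2]]   # illustrative values only
labels = [95.0, 80.0, 20.0, 55.0]

initial_model = DecisionTreeRegressor(max_depth=3, random_state=0)
initial_model.fit(features, labels)
print(initial_model.predict([[1, 3, 0]]))   # e.g. array([95.])
```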
After the initial question-answer pair evaluation model is constructed through the steps B1-B3, one sample question-answer pair may be sequentially extracted from the plurality of sample question-answer pairs obtained through the step S201, and multiple rounds of model training may be performed until the training end condition is satisfied, at which time, a question-answer pair evaluation model is generated.
Specifically, in each round of training, the question-answer pair to be evaluated in the first embodiment is replaced by the sample question-answer pair extracted in this round, and the current initial question-answer pair evaluation model produces a quality result for it following the procedure of the first embodiment. That is, according to steps S101 to S102 of the first embodiment, after the first, second and third evaluation indexes of the sample question-answer pair are generated, the initial model outputs an evaluation score in the interval [0, 100]. This score is then compared with the manually annotated evaluation score of the sample pair, and the model parameters are updated according to the difference between the two, until a preset condition is met, for example until the difference no longer changes significantly. At that point the parameter updates stop, training of the question-answer pair evaluation model is complete, and the trained question-answer pair evaluation model is obtained.
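Because a decision tree has no incremental parameter updates in the usual sense, the following sketch approximates the training procedure described above by refitting on accumulating batches of sample pairs and stopping once the mean absolute difference between predicted and annotated scores no longer changes noticeably. The batching, the stopping tolerance and the scikit-learn-style model interface are assumptions.

```python
import statistics


def train_until_stable(sample_batches, model, tol: float = 1.0):
    """Fit the evaluation model on accumulating sample question-answer pairs until
    the mean absolute difference between predicted and annotated scores stabilizes."""
    features, labels = [], []
    prev_error = None
    for batch_features, batch_labels in sample_batches:
        features.extend(batch_features)
        labels.extend(batch_labels)
        model.fit(features, labels)
        predictions = model.predict(features)
        error = statistics.fmean(abs(p - y) for p, y in zip(predictions, labels))
        if prev_error is not None and abs(prev_error - error) < tol:
            break   # preset condition met: the difference no longer changes much
        prev_error = error
    return model
```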
Through the embodiment, the question-answer pair evaluation model can be generated by using sample question-answer pair training, and further, the generated question-answer pair evaluation model can be verified by using verification question-answer pair. The specific verification process may include the following steps C1-C4:
Step C1: obtaining a verification question-answer pair in the field to which the question-answer pair to be evaluated belongs.
In this embodiment, in order to verify the question-answer pair evaluation model, a verification question-answer pair in the field to which the question-answer pair to be evaluated belongs first needs to be obtained. A verification question-answer pair is a question-answer pair that can be used to verify the question-answer pair evaluation model; after it is obtained, the subsequent step C2 can be executed.
Step C2: generating a first evaluation index according to the word segmentation quantity of the questions in the verification question-answer pair; generating a second evaluation index according to the correlation between the subject and the answer in the verification question-answer pair; and generating a third evaluation index according to the number of words of the answers in the verification question-answer pair.
After the verification question-answer pair is obtained in step C1, it cannot be used directly to verify the question-answer pair evaluation model. Instead, a first evaluation index is generated according to the word segmentation quantity of the question in the verification question-answer pair, a second evaluation index is generated according to the correlation between the topic and the answer in the verification question-answer pair, and a third evaluation index is generated according to the number of words of the answer in the verification question-answer pair; the generated first, second and third evaluation indexes of the verification question-answer pair are then used to verify the question-answer pair evaluation model.
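As an illustration only, step C2 could be sketched as follows; whitespace splitting stands in for the CRF segmentation and mutual-information merging described earlier in the application, and a plain bag-of-words cosine similarity stands in for the topic-answer correlation measure.

```python
# Hypothetical sketch of step C2: deriving the three evaluation indexes for one
# verification question-answer pair. Whitespace splitting stands in for the
# real word segmenter; bag-of-words cosine stands in for the correlation measure.
import math
from collections import Counter


def cosine(counter_a, counter_b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(counter_a[w] * counter_b[w] for w in set(counter_a) & set(counter_b))
    norm_a = math.sqrt(sum(v * v for v in counter_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counter_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def make_indexes(question, answer, topic):
    q_segments = question.split()    # stand-in for CRF segmentation + MI merging
    a_words = answer.split()
    idx1 = len(q_segments)                                   # word-segment count of the question
    idx2 = cosine(Counter(topic.split()), Counter(a_words))  # topic-answer correlation
    idx3 = len(a_words)                                      # word count of the answer
    return idx1, idx2, idx3
```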
Step C3: inputting the first evaluation index, the second evaluation index and the third evaluation index of the verification question-answer pair into the question-answer pair evaluation model to obtain the quality evaluation result of the verification question-answer pair.
After the first, second and third evaluation indexes of the verification question-answer pair are generated in step C2, they can be input into the question-answer pair evaluation model to obtain the quality evaluation result of the verification question-answer pair, after which the subsequent step C4 can be executed.
Step C4: when the quality evaluation result of the verification question-answer pair is inconsistent with the quality labeling result corresponding to the verification question-answer pair, re-using the verification question-answer pair as a sample question-answer pair and performing a parameter update on the question-answer pair evaluation model.
After the quality evaluation result of the verification question-answer pair is obtained in step C3, if it is inconsistent with the corresponding manually labeled result of the verification question-answer pair, the verification question-answer pair can be re-used as a sample question-answer pair and the parameters of the question-answer pair evaluation model updated accordingly.
Through this embodiment, the question-answer pair evaluation model can be effectively verified with verification question-answer pairs; when the quality evaluation result of a verification question-answer pair is inconsistent with its corresponding manual labeling result, the question-answer pair evaluation model can be adjusted and updated in time, thereby improving the precision and accuracy of the evaluation model.
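As an illustration only, steps C3 and C4 might be combined as in the sketch below; the callable interfaces (eval_model, make_indexes) and the strict equality check between quality results are hypothetical choices of this illustration.

```python
# Hypothetical sketch of steps C3-C4: run the trained model on verification
# question-answer pairs and collect those whose predicted quality disagrees
# with the manually labeled quality, so they can be re-used as training samples.

def verify_and_collect(eval_model, verification_pairs, labeled_results, make_indexes):
    """eval_model: callable (idx1, idx2, idx3) -> quality result
    verification_pairs: list of (question, topic, answer) tuples
    labeled_results: manually labeled quality results, aligned with verification_pairs
    make_indexes: callable (question, answer, topic) -> (idx1, idx2, idx3)
    """
    reusable_samples = []
    for (question, topic, answer), labeled in zip(verification_pairs, labeled_results):
        predicted = eval_model(*make_indexes(question, answer, topic))
        if predicted != labeled:     # inconsistent with the labeled quality
            reusable_samples.append(((question, topic, answer), labeled))
    return reusable_samples          # feed these back in for a parameter update
```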
In summary, the question-answer pair evaluation model trained in this embodiment can rapidly and accurately evaluate the quality of a question-answer pair to be evaluated by using its first, second and third evaluation indexes, thereby effectively improving the efficiency and accuracy of evaluating question-answer pair quality and avoiding the waste of human resources.
Embodiment III
The present embodiment describes a question-answer pair evaluation device; for related content, reference may be made to the method embodiments above.
Referring to fig. 4, a block diagram of a device for evaluating question-answer pairs according to the present embodiment is provided, the device including:
A first generating unit 401, configured to generate a first evaluation index according to the number of word segments of the question in the question-answer pair to be evaluated; generating a second evaluation index according to the correlation between the subject and the answer in the question-answer pair to be evaluated; generating a third evaluation index according to the number of words of the answers in the question-answer pair to be evaluated; the question-answer pair to be evaluated comprises questions and answers; the subject in the question-answer pair to be evaluated is extracted from the question;
and the evaluation unit 402 is configured to evaluate the first evaluation index, the second evaluation index, and the third evaluation index of the question-answer pair to be evaluated by using a pre-constructed question-answer pair evaluation model, so as to obtain the quality of the question-answer pair to be evaluated.
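For illustration only, the cooperation between the two units can be pictured with the following minimal sketch; the function names and callable interfaces are hypothetical and not part of the application.

```python
# Hypothetical end-to-end usage of the two units: derive the three indexes,
# then let the pre-constructed evaluation model score the pair.

def evaluate_pair(question, topic, answer, make_indexes, eval_model):
    """make_indexes: callable implementing the first generating unit 401
    eval_model: callable implementing the pre-constructed model used by the evaluation unit 402
    """
    idx1, idx2, idx3 = make_indexes(question, answer, topic)
    return eval_model(idx1, idx2, idx3)   # quality of the question-answer pair
```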
In one possible implementation manner, the first generating unit 401 includes:
the first word segmentation subunit is used for segmenting the questions in the question-answer pair to be evaluated by utilizing a conditional random field CRF word segmentation model to obtain a first word segmentation result;
the second word segmentation subunit is used for calculating mutual information values between every two adjacent word segments, and for performing word segmentation on the first word segmentation result according to the mutual information values to obtain a second word segmentation result (a sketch of this mutual-information step follows the description of this unit);
the obtaining subunit is used for obtaining the number of the words in the second word segmentation result and taking the number of the words as a first evaluation index of the question-answer pair to be evaluated.
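The mutual-information merging performed by the second word segmentation subunit could look roughly like the following sketch; the pointwise-mutual-information formulation, the merge threshold, and the assumption that a CRF segmenter has already produced the input token lists are illustrative choices rather than details taken from the application.

```python
import math
from collections import Counter


def merge_by_mutual_information(segmented_questions, threshold=3.0):
    """Merge adjacent segments whose pointwise mutual information is high.

    segmented_questions: list of token lists produced by a CRF segmenter
    (the CRF step itself is assumed to be available elsewhere).
    The PMI threshold is an illustrative assumption.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in segmented_questions:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values()) or 1

    def pmi(a, b):
        p_ab = bigrams[(a, b)] / total_bi
        p_a = unigrams[a] / total_uni
        p_b = unigrams[b] / total_uni
        return math.log(p_ab / (p_a * p_b)) if p_ab > 0 else float("-inf")

    merged_questions = []
    for tokens in segmented_questions:
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and pmi(tokens[i], tokens[i + 1]) > threshold:
                merged.append(tokens[i] + tokens[i + 1])   # treat the pair as one segment
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        merged_questions.append(merged)
    return merged_questions
```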
In one possible implementation manner, the first generating unit 401 is specifically configured to:
and obtaining cosine similarity between the subject and the answer in the question-answer pair to be evaluated as a second evaluation index of the question-answer pair to be evaluated.
In one possible implementation, the apparatus further includes:
the first acquisition unit is used for acquiring the sample question-answer pair in the field to which the question-answer pair to be evaluated belongs;
And the training unit is used for training the pre-constructed initial question-answer pair evaluation model by utilizing the sample question-answer pair to obtain the question-answer pair evaluation model.
In one possible implementation, the apparatus further includes:
the second generation unit is used for generating a first evaluation index according to the word segmentation quantity of the questions in the sample question-answer pair; generating a second evaluation index according to the correlation between the topic and the answer in the sample question-answer pair; generating a third evaluation index according to the number of words of the answers in the sample question-answer pair;
the classification unit is used for classifying the first evaluation index, the second evaluation index and the third evaluation index of the sample question-answer pair respectively to obtain classification results corresponding to the evaluation indexes;
and the construction unit is used for constructing a corresponding decision tree model according to the classification results corresponding to the evaluation indexes and taking the decision tree model as an initial question-answer pair evaluation model.
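As a rough illustration of what the classification unit and the construction unit together produce, the sketch below fits an off-the-shelf decision tree on the per-index classification results; scikit-learn, the integer class encoding, the depth limit, and the use of quality grades as the fitting target are assumptions of this illustration rather than details of the application.

```python
# Hypothetical sketch of the construction unit: build an initial decision tree
# from the per-index classification results of the sample question-answer pairs.
from sklearn.tree import DecisionTreeClassifier


def build_initial_model(classified_indexes, quality_labels):
    """classified_indexes: list of (class_of_idx1, class_of_idx2, class_of_idx3)
    quality_labels: quality grades of the corresponding sample question-answer pairs
    """
    model = DecisionTreeClassifier(max_depth=3, random_state=0)  # depth is an assumption
    model.fit(classified_indexes, quality_labels)
    return model
```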
In one possible implementation, the apparatus further includes:
the second acquisition unit is used for acquiring the verification question-answer pair of the field to which the question-answer pair to be evaluated belongs;
The third generation unit is used for generating a first evaluation index according to the word segmentation quantity of the question in the verification question-answer pair; generating a second evaluation index according to the correlation between the topic and the answer in the verification question-answer pair; generating a third evaluation index according to the number of words of the answers in the verification question-answer pair;
The obtaining unit is used for inputting the first evaluation index, the second evaluation index and the third evaluation index of the verification question-answer pair into the question-answer pair evaluation model to obtain a quality evaluation result of the verification question-answer pair;
And the updating unit is used for re-using the verification question-answer pair as the sample question-answer pair when the quality evaluation result of the verification question-answer pair is inconsistent with the quality marking result corresponding to the verification question-answer pair, and updating parameters of the question-answer pair evaluation model.
With this device, when a question-answer pair to be evaluated is evaluated, a first evaluation index is first generated according to the word segmentation quantity of the question in the question-answer pair to be evaluated, a second evaluation index is generated according to the correlation between the topic and the answer, and a third evaluation index is generated according to the number of words of the answer, where the question-answer pair to be evaluated comprises a question and an answer and the topic is extracted from the question. The first, second and third evaluation indexes are then evaluated by the pre-constructed question-answer pair evaluation model to obtain the quality of the question-answer pair to be evaluated.
In addition, an embodiment of the application further provides a computer-readable storage medium having instructions stored therein which, when run on a terminal device, cause the terminal device to execute the above question-answer pair evaluation method.
An embodiment of the application further provides a question-answer pair evaluation device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above question-answer pair evaluation method when executing the computer program.
An embodiment of the application further provides a computer program product which, when run on a terminal device, causes the terminal device to execute the above question-answer pair evaluation method.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A method for evaluating question-answer pairs, comprising:
Generating a first evaluation index according to the word segmentation quantity of the question in the question-answer pair to be evaluated; generating a second evaluation index according to the correlation between the subject and the answer in the question-answer pair to be evaluated; generating a third evaluation index according to the number of words of the answers in the question-answer pair to be evaluated; the question-answer pair to be evaluated comprises questions and answers; the questions belong to conceptual questions, and the quality of the question-answer pairs to be evaluated is inversely related to the word segmentation quantity of the questions; the answer is used for representing the concept paraphrasing of the concept words in the question; the quality of the question-answer pair to be evaluated is inversely related to the word number of the answer; the subject in the question-answer pair to be evaluated is extracted from the question; the topic includes part of the content of the question; the theme is used for representing the core content of the question-answer pair to be evaluated; the second evaluation index is determined according to cosine similarity between partial words in the answer and the subject, and the importance degree of the partial words is higher than that of other words except the partial words in the answer;
And evaluating the first evaluation index, the second evaluation index and the third evaluation index of the question-answer pair to be evaluated by utilizing a pre-constructed question-answer pair evaluation model to obtain the quality of the question-answer pair to be evaluated.
2. The method according to claim 1, wherein the generating a first evaluation index according to the number of word segments of the question in the question-answer pair to be evaluated includes:
performing word segmentation on the questions in the question-answer pair to be evaluated by using a conditional random field CRF word segmentation model to obtain a first word segmentation result;
calculating mutual information values between every two adjacent word segments; and performing word segmentation on the first word segmentation result according to the mutual information values to obtain a second word segmentation result;
and obtaining the number of the word segmentation in the second word segmentation result as a first evaluation index of the question-answer pair to be evaluated.
3. The method of claim 1, wherein the generating a second evaluation index according to the correlation between the subject and the answer in the question-answer pair to be evaluated comprises:
and obtaining cosine similarity between the subject and the answer in the question-answer pair to be evaluated as a second evaluation index of the question-answer pair to be evaluated.
4. A method according to any one of claims 1 to 3, further comprising:
Acquiring a sample question-answer pair in the field to which the question-answer pair to be evaluated belongs;
Training a pre-constructed initial question-answer pair evaluation model by using the sample question-answer pair to obtain the question-answer pair evaluation model.
5. The method according to claim 4, wherein the method further comprises:
generating a first evaluation index according to the word segmentation quantity of the questions in the sample question-answer pair; generating a second evaluation index according to the correlation between the topic and the answer in the sample question-answer pair; generating a third evaluation index according to the number of words of the answers in the sample question-answer pair;
Classifying the first evaluation index, the second evaluation index and the third evaluation index of the sample question-answer pair respectively to obtain classification results corresponding to the evaluation indexes;
And constructing a corresponding decision tree model according to the classification results corresponding to the evaluation indexes, and taking the decision tree model as an initial question-answer pair evaluation model.
6. The method according to claim 4, wherein the method further comprises:
Acquiring verification question-answer pairs of the field to which the question-answer pairs to be evaluated belong;
Generating a first evaluation index according to the word segmentation quantity of the question in the verification question-answer pair; generating a second evaluation index according to the correlation between the topic and the answer in the verification question-answer pair; generating a third evaluation index according to the number of words of the answers in the verification question-answer pair;
Inputting the first evaluation index, the second evaluation index and the third evaluation index of the verification question-answer pair into the question-answer pair evaluation model to obtain a quality evaluation result of the verification question-answer pair;
And when the quality evaluation result of the verification question-answer pair is inconsistent with the quality marking result corresponding to the verification question-answer pair, the verification question-answer pair is taken as the sample question-answer pair again, and the question-answer pair evaluation model is updated with parameters.
7. An apparatus for evaluating a question-answer pair, the apparatus comprising:
The first generation unit is used for generating a first evaluation index according to the word segmentation quantity of the question in the question-answer pair to be evaluated; generating a second evaluation index according to the correlation between the subject and the answer in the question-answer pair to be evaluated; generating a third evaluation index according to the number of words of the answers in the question-answer pair to be evaluated; the question-answer pair to be evaluated comprises questions and answers; the questions belong to conceptual questions, and the quality of the question-answer pairs to be evaluated is inversely related to the word segmentation quantity of the questions; the answer is used for representing the concept paraphrasing of the concept words in the question; the quality of the question-answer pair to be evaluated is inversely related to the word number of the answer; the subject in the question-answer pair to be evaluated is extracted from the question; the topic includes part of the content of the question; the theme is used for representing the core content of the question-answer pair to be evaluated; the second evaluation index is determined according to cosine similarity between partial words in the answer and the subject, and the importance degree of the partial words is higher than that of other words except the partial words in the answer;
and the evaluation unit is used for evaluating the first evaluation index, the second evaluation index and the third evaluation index of the question-answer pair to be evaluated by utilizing a pre-constructed question-answer pair evaluation model to obtain the quality of the question-answer pair to be evaluated.
8. The apparatus of claim 7, wherein the first generation unit comprises:
the first word segmentation subunit is used for segmenting the questions in the question-answer pair to be evaluated by utilizing a conditional random field CRF word segmentation model to obtain a first word segmentation result;
the second word segmentation subunit is used for calculating mutual information values between every two adjacent word segments, and for performing word segmentation on the first word segmentation result according to the mutual information values to obtain a second word segmentation result;
the obtaining subunit is used for obtaining the number of the words in the second word segmentation result and taking the number of the words as a first evaluation index of the question-answer pair to be evaluated.
9. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein instructions, which when run on a terminal device, cause the terminal device to perform the question-answer pair evaluation method according to any one of claims 1-6.
10. An apparatus for evaluating question-answer pairs, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method of evaluating question-answer pairs according to any one of claims 1-6 when the computer program is executed.
CN201911320757.8A 2019-12-19 2019-12-19 Question-answer pair evaluation method, question-answer pair evaluation device, storage medium and equipment Active CN111090742B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911320757.8A CN111090742B (en) 2019-12-19 2019-12-19 Question-answer pair evaluation method, question-answer pair evaluation device, storage medium and equipment


Publications (2)

Publication Number Publication Date
CN111090742A CN111090742A (en) 2020-05-01
CN111090742B true CN111090742B (en) 2024-05-17

Family

ID=70395830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911320757.8A Active CN111090742B (en) 2019-12-19 2019-12-19 Question-answer pair evaluation method, question-answer pair evaluation device, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN111090742B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113515932B (en) * 2021-07-28 2023-11-10 北京百度网讯科技有限公司 Method, device, equipment and storage medium for processing question and answer information

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000076329A (en) * 1998-09-02 2000-03-14 Recruit Co Ltd Metier evaluating device
CN103049501A (en) * 2012-12-11 2013-04-17 上海大学 Chinese domain term recognition method based on mutual information and conditional random field model
CN105183923A (en) * 2015-10-27 2015-12-23 上海智臻智能网络科技股份有限公司 New word discovery method and device
CN108595433A (en) * 2018-05-02 2018-09-28 北京中电普华信息技术有限公司 A kind of new word discovery method and device
CN109472305A (en) * 2018-10-31 2019-03-15 国信优易数据有限公司 Answer quality determines model training method, answer quality determination method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030074353A1 (en) * 1999-12-20 2003-04-17 Berkan Riza C. Answer retrieval technique


Also Published As

Publication number Publication date
CN111090742A (en) 2020-05-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant