CN114385798A

CN114385798A - Question-answering method, system, equipment and medium based on active learning

Info

Publication number: CN114385798A
Application number: CN202111541745.5A
Authority: CN
Inventors: 冯耀; 王椭; 朱祥; 熊赏; 陈娜; 陆恒宇; 罗浩昇; 赵权有
Original assignee: Shanghai Pudong Development Bank Co Ltd
Current assignee: Shanghai Pudong Development Bank Co Ltd
Priority date: 2021-12-16
Filing date: 2021-12-16
Publication date: 2022-04-22

Abstract

The invention relates to a question-answering method, system, device and medium based on active learning. The method comprises the following steps: constructing a basic knowledge base; answering customer questions based on a similarity detection model and constructing an unanswered question set, wherein the unanswered question set comprises user questions that user feedback marks as unanswered; classifying the questions in the unanswered question set into categories; acquiring expert correction information and correcting the categories of the classified unanswered question set based on the expert correction information; performing optimization training on the similarity detection model based on the category correction result; and acquiring the answer corresponding to each category in the unanswered question set and adding the questions and their corresponding answers to the basic knowledge base. Compared with the prior art, the method offers more accurate question answering, lower labor cost and higher labeling efficiency.

Description

Question-answering method, system, equipment and medium based on active learning

Technical Field

The invention relates to the field of intelligent question answering, and in particular to a question-answering method, system, device and medium based on active learning.

Background

Chat robots can converse with users. As enterprises place increasing emphasis on customer service, chat robots have gradually expanded from entertainment into customer service and other fields.

Since a chat robot cannot answer technical questions by itself, the common practice is to collect a large number of question-answer pairs and construct a knowledge base to support it. When a user asks a question, the chat robot searches the knowledge base for the most relevant question, extracts its answer and replies.

The completeness of the knowledge base determines the user experience: no matter how sophisticated the search and similarity matching techniques are, the chat robot cannot answer a user's question if no relevant question is stored in the knowledge base. To improve the knowledge base, questions that the question-answering system cannot answer can be collected regularly and handed to experts for labeling, and the corresponding question-answer pairs are then added to the knowledge base. However, expert manpower is limited and the number of unanswered questions is large, so experts can hardly label all questions in time and cannot pick out high-frequency questions from the many candidates to label first. When multiple experts label at the same time, similar questions also lead to duplicate labeling, which wastes manpower.

Disclosure of Invention

The purpose of the present invention is to provide a question-answering method, system, device and medium based on active learning that overcome the above-mentioned drawbacks of the prior art.

The purpose of the invention can be realized by the following technical scheme:

A question-answering method based on active learning comprises the following steps:

S1: constructing a basic knowledge base, wherein the basic knowledge base comprises known questions and the answers corresponding to the known questions;

S2: answering customer questions based on a similarity detection model and constructing an unanswered question set, wherein the unanswered question set comprises user questions that user feedback marks as unanswered;

S3: classifying the questions in the unanswered question set into categories;

S4: acquiring expert correction information, and correcting the categories of the classified unanswered question set based on the expert correction information;

S5: performing optimization training on the similarity detection model based on the category correction result;

S6: acquiring the answer corresponding to each category in the unanswered question set, and adding the questions and their corresponding answers to the basic knowledge base.
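The text does not say how the category correction result in S5 is turned into training data, nor how S6 stores the answers. The Python sketch below shows one possible reading, under stated assumptions: questions that experts leave in the same category become positive pairs (label 1) and questions in different categories become negative pairs (label 0), and every question of an answered category is stored with that category's answer. The function names, category names, questions and answer text are all hypothetical.

```python
from itertools import combinations


def build_training_pairs(categories: dict[str, list[str]]) -> list[tuple[str, str, int]]:
    """Assumed S5 input: label a question pair 1 if both share a corrected category, else 0."""
    labelled = [(q, name) for name, qs in categories.items() for q in qs]
    return [
        (qa, qb, int(ca == cb))
        for (qa, ca), (qb, cb) in combinations(labelled, 2)
    ]


def update_knowledge_base(
    knowledge_base: dict[str, str],
    categories: dict[str, list[str]],
    answers: dict[str, str],
) -> None:
    """Assumed S6: store each question of an answered category with that category's answer."""
    for name, questions in categories.items():
        if name in answers:
            for question in questions:
                knowledge_base[question] = answers[name]


# Hypothetical corrected categories after expert review.
corrected = {
    "password": ["reset password", "forgot password"],
    "account": ["open account online"],
}
pairs = build_training_pairs(corrected)

kb: dict[str, str] = {}
update_knowledge_base(kb, corrected, {"password": "Use the 'Forgot password' link."})
```

The labelled pairs could then be fed to whatever fine-tuning procedure the similarity detection model supports; that procedure is outside this sketch.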

Preferably, the step S2 specifically includes:

S21: establishing a text similarity detection model;

S22: acquiring the question to be answered from the user, searching the basic knowledge base through the text similarity detection model for the known question with the highest similarity, and outputting the answer of that known question as the reply;

S23: constructing an unanswered question set, acquiring user feedback, and storing the corresponding question in the unanswered question set if the user feedback indicates that the question was not answered.
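Steps S21 to S23 can be illustrated with a minimal Python sketch. It is not the patented implementation: `token_similarity` is a word-overlap (Jaccard) stand-in for the text similarity detection model, and the class name `QASession` and the knowledge base entries are hypothetical.

```python
from dataclasses import dataclass, field


def token_similarity(a: str, b: str) -> float:
    """Stand-in for the similarity model: Jaccard overlap of lowercase word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class QASession:
    knowledge_base: dict[str, str]  # known question -> answer (S1)
    unanswered: list[str] = field(default_factory=list)

    def answer(self, user_question: str) -> str:
        # S22: return the answer of the most similar known question.
        best = max(
            self.knowledge_base,
            key=lambda known: token_similarity(user_question, known),
        )
        return self.knowledge_base[best]

    def record_feedback(self, user_question: str, answered: bool) -> None:
        # S23: keep the question only when the user marks it as unanswered.
        if not answered:
            self.unanswered.append(user_question)


# Hypothetical knowledge base entries.
session = QASession({
    "how do i reset my password": "Use the 'Forgot password' link on the login page.",
    "what are the branch opening hours": "Branches open 9:00 to 17:00 on weekdays.",
})
reply = session.answer("how can i reset my password")
session.record_feedback("how can i reset my password", answered=True)
session.record_feedback("can i open an account online", answered=False)
```

Note that the sketch always returns the best match, however weak; the unanswered question set is driven only by user feedback, as in S23.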

Preferably, the step S3 specifically includes:

S31: selecting a first question from the unanswered question set as the reference question of a first category;

S32: selecting a question to be classified from the unanswered question set and calculating its similarity to the reference question of each existing category one by one; if the similarity to the reference question of a category is greater than a similarity threshold, assigning the question to be classified to that category; otherwise, taking the question to be classified as the reference question of a new category;

S33: repeating step S32 until all questions in the unanswered question set have been classified into categories.

Step S3 further includes step S34: sorting the categories by the number of questions in each category.
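Steps S31 to S33 amount to a greedy, single-pass grouping around reference questions. The Python sketch below is one way to write it, assuming a question joins the first category whose reference question exceeds the threshold. `token_similarity` is a toy word-overlap stand-in for the similarity detection model, the threshold of 0.5 is chosen only for that toy measure, and the questions are hypothetical.

```python
from typing import Callable


def token_similarity(a: str, b: str) -> float:
    """Toy stand-in for the similarity model: Jaccard overlap of word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def classify_by_reference(
    questions: list[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> list[list[str]]:
    """Group questions; the first item of each category is its reference question."""
    categories: list[list[str]] = []
    for question in questions:
        for category in categories:
            # S32: compare only against the category's reference question.
            if similarity(question, category[0]) > threshold:
                category.append(question)
                break
        else:
            # S31 / S32: no reference question matched, so start a new category.
            categories.append([question])
    return categories


unanswered = [
    "how to reset password",
    "how to reset my password",
    "open account online",
    "how to open account online",
]
groups = classify_by_reference(unanswered, token_similarity, threshold=0.5)
```

Because each question is compared with one reference question per category rather than with every other question, the number of similarity calls grows with the number of categories, not with the square of the number of questions. The result does depend on the order in which questions are visited.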

Preferably, the expert correction information comprises a type merging instruction, a type splitting instruction and a question moving instruction:

the type merging instruction is used for merging two categories in the unanswered question set, together with their questions, into one category;

the type splitting instruction is used for dividing one category in the unanswered question set and its questions into two categories with their corresponding questions;

the question moving instruction is used for moving one or more questions from one category to another category.
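The three correction instructions can be sketched as operations on a mapping from category name to question list. This is an illustrative Python sketch only: the function names, the dictionary representation and the sample categories are assumptions, not details given in the text.

```python
def merge_categories(categories: dict[str, list[str]], keep: str, absorb: str) -> None:
    """Type merging: move every question of `absorb` into `keep`, then drop `absorb`."""
    if keep == absorb:
        raise ValueError("cannot merge a category with itself")
    categories[keep].extend(categories.pop(absorb))


def split_category(
    categories: dict[str, list[str]], source: str, new_name: str, moved: list[str]
) -> None:
    """Type splitting: carve the `moved` questions out of `source` into a new category."""
    if new_name in categories:
        raise ValueError(f"category {new_name!r} already exists")
    selected = set(moved)
    categories[new_name] = [q for q in categories[source] if q in selected]
    categories[source] = [q for q in categories[source] if q not in selected]


def move_questions(
    categories: dict[str, list[str]], source: str, target: str, moved: list[str]
) -> None:
    """Question moving: move one or more questions from `source` into `target`."""
    selected = set(moved)
    to_move = [q for q in categories[source] if q in selected]
    categories[source] = [q for q in categories[source] if q not in selected]
    categories[target].extend(to_move)


# Hypothetical automatic classification that an expert then corrects.
cats = {
    "c1": ["reset password", "forgot password", "open account online"],
    "c2": ["password reset steps"],
    "c3": ["open account", "close account", "account closing fee"],
}
merge_categories(cats, keep="c1", absorb="c2")
split_category(cats, "c3", "c4", ["close account", "account closing fee"])
move_questions(cats, "c1", "c3", ["open account online"])
```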

Preferably, the similarity detection model is a semantic similarity model based on a BERT model.

A question-answering system based on active learning comprises a basic knowledge base module, a question-answering module, a category classification module, a correction module and an answer supplementing module:

the basic knowledge base module is used for constructing a basic knowledge base, wherein the basic knowledge base comprises known questions and the answers corresponding to the known questions;

the question-answering module is used for answering customer questions based on the similarity detection model and constructing an unanswered question set, wherein the unanswered question set comprises user questions that user feedback marks as unanswered;

the category classification module is used for classifying the questions in the unanswered question set into categories, acquiring expert correction information, correcting the categories of the classified unanswered question set based on the expert correction information, and performing optimization training on the similarity detection model based on the category correction result;

the answer supplementing module is used for acquiring the answer corresponding to each category in the unanswered question set and adding the questions and their corresponding answers to the basic knowledge base.

Preferably, the specific steps by which the question-answering module answers user questions and constructs the unanswered question set include:

establishing a text similarity detection model;

acquiring the question to be answered from the user, searching the basic knowledge base through the text similarity detection model for the known question with the highest similarity, and outputting the answer of that known question as the reply;

constructing an unanswered question set, acquiring user feedback, and storing the corresponding question in the unanswered question set if the user feedback indicates that the question was not answered.

Preferably, the specific steps by which the category classification module classifies the categories include:

step one: selecting a first question from the unanswered question set as the reference question of a first category;

step two: selecting a question to be classified from the unanswered question set and calculating its similarity to the reference question of each existing category one by one; if the similarity to the reference question of a category is greater than a similarity threshold, assigning the question to be classified to that category; otherwise, taking the question to be classified as the reference question of a new category;

step three: repeating step two until all questions in the unanswered question set have been classified into categories.

An active learning-based question-answering device comprises a memory, a processor and a computer program stored in the memory, wherein the processor implements the steps of the above active learning-based question-answering method when executing the computer program.

A non-transitory computer storage medium storing an executable program that is executed by a processor to implement the steps of the active learning-based question-answering method described above.

Compared with the prior art, the invention has the following advantages:

(1) The method uses the similarity detection model to receive customer questions and to search the knowledge base for answers, records the unanswered questions, classifies them according to preset rules, and has experts label and correct them; the similarity detection model is then trained further on the corrected results, forming an active learning mechanism that effectively improves the accuracy and reliability of question answering. The basic knowledge base can also be expanded continuously from the collected unanswered questions, which increases the number of questions that can be answered and further improves the user experience.

(2) The invention classifies the unanswered questions using the classification model and rules, and experts then correct the categories and supply the answers, which effectively reduces labor cost, avoids duplicate labeling and improves work efficiency.

Drawings

FIG. 1 is a flow chart of the present invention.

Detailed Description

The invention is described in detail below with reference to the figures and specific embodiments. Note that the following description of the embodiments is merely illustrative in nature; the present invention is not intended to be limited in its application or use, and is not limited to the following embodiments.

Examples

A question-answering method based on active learning, as shown in FIG. 1, includes the following steps:

S1: Constructing a basic knowledge base, wherein the basic knowledge base comprises known questions and the answers corresponding to the known questions.

S2: Answering customer questions based on the similarity detection model and constructing an unanswered question set, wherein the unanswered question set comprises user questions that user feedback marks as unanswered.

Step S2 specifically includes:

S21: Establishing a text similarity detection model. This step uses an STS (text similarity detection) model, specifically a semantic similarity model based on a BERT model, which returns the question-answer pair whose question has the highest similarity as the most relevant match.

S22: Acquiring the question to be answered from the user, searching the basic knowledge base through the text similarity detection model for the known question with the highest similarity, and outputting its answer as the reply. In this embodiment, the user's question is either captured by a voice acquisition device, such as a microphone arranged on the question-answering device, and converted to text by a speech recognition algorithm, or entered through a text input interface.

S23: Constructing an unanswered question set, acquiring user feedback, and storing the corresponding question in the unanswered question set if the user feedback indicates that the question was not answered. User feedback is collected by having the user select one of two options: unanswered or answered. All questions marked as unanswered are stored in the unanswered question bank.

S3: Classifying the questions in the unanswered question set into categories.

Step S3 specifically includes:

A specific example is as follows:

a. Given an unanswered question set containing M questions in total, randomly select a question X1 as the reference question of category 1;

b. Randomly take a question X2 and use the STS model to calculate whether its similarity to question X1 is greater than the similarity threshold, which is 0.9 in this embodiment; if so, put X2 into category 1 as well; if not, take X2 as the reference question of category 2;

c. Take a further question X3 and repeat step b: if X3 is similar to X1, put it into category 1; otherwise, calculate whether it is similar to the reference question of category 2, and so on, until every question has been assigned to a category.

Step S3 further includes step S34: sorting the categories by the number of questions in each category, so that experts focus on high-frequency, hot-spot questions when labeling and obtain a higher return on their labeling effort.
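The ordering in step S34 can be sketched as a sort by category size. In this Python sketch, categories are assumed to be plain lists of questions, the function name is hypothetical and the contents are placeholder data.

```python
def sort_by_frequency(categories: list[list[str]]) -> list[list[str]]:
    """Order categories from most to fewest questions; ties keep their input order."""
    return sorted(categories, key=len, reverse=True)


# Placeholder categories produced by the classification step.
classified = [
    ["transfer limit"],
    ["reset password", "forgot password", "password reset steps"],
    ["open account online", "how to open account"],
]
ranked = sort_by_frequency(classified)
sizes = [len(category) for category in ranked]
```

An expert working from the top of `ranked` reaches the categories that cover the most user questions first.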

S4: Acquiring expert correction information, and correcting the categories of the classified unanswered question set based on the expert correction information.

The expert correction information comprises a type merging instruction, a type splitting instruction and a question moving instruction.

The question moving instruction is used for moving one or more questions from one category into another category.

The invention also provides a question-answering system based on active learning, which comprises a basic knowledge base module, a question-answering module, a category classification module, a correction module and an answer supplementing module:

the question-answering module is used for answering customer questions based on the similarity detection model and constructing an unanswered question set, wherein the unanswered question set comprises user questions that user feedback marks as unanswered;

the category classification module is used for classifying the questions in the unanswered question set into categories, acquiring expert correction information, correcting the categories of the classified unanswered question set based on the expert correction information, and performing optimization training on the similarity detection model based on the category correction result;

the answer supplementing module is used for acquiring the answer corresponding to each category in the unanswered question set and adding the questions and their corresponding answers to the basic knowledge base.

In this embodiment, the specific steps by which the question-answering module answers user questions and constructs the unanswered question set include:

establishing a text similarity detection model;

In this embodiment, the specific steps by which the category classification module classifies the categories include:

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The above embodiments are merely examples and do not limit the scope of the present invention. These embodiments may be implemented in other various manners, and various omissions, substitutions, and changes may be made without departing from the technical spirit of the present invention.

Claims

1. A question-answering method based on active learning, characterized by comprising the following steps:

S3: classifying the questions in the unanswered question set into categories;

2. The question-answering method based on active learning according to claim 1, wherein the step S2 specifically comprises:

s21: establishing a text similarity detection model;

3. The question-answering method based on active learning according to claim 1, wherein the step S3 specifically comprises:

4. The active learning-based question-answering method according to claim 1, wherein the expert correction information comprises a type merging instruction, a type splitting instruction and a question moving instruction,

5. The question-answering method based on active learning of claim 1, wherein the similarity detection model is a semantic similarity model based on a BERT model.

6. The active learning-based question answering method according to claim 1, wherein the step S3 further comprises: step S34: the categories are sorted by the number of questions in each category.

7. A question-answering system based on active learning, characterized by comprising a basic knowledge base module, a question-answering module, a category classification module, a correction module and an answer supplementing module,

8. The active learning-based question-answering system according to claim 7, wherein the specific steps by which the question-answering module answers user questions and constructs the unanswered question set comprise:

establishing a text similarity detection model;

9. An active learning-based question answering device comprising a memory, a processor and a computer program stored on the memory, the processor implementing the steps of the active learning-based question answering method according to any one of claims 1 to 6 when executing the computer program.

10. A non-transitory computer storage medium storing an executable program which, when executed by a processor, implements the steps of the active learning-based question-answering method according to any one of claims 1 to 6.