CN114385798A - Question-answering method, system, equipment and medium based on active learning - Google Patents
Question-answering method, system, equipment and medium based on active learning
- Publication number
- CN114385798A (application CN202111541745.5A)
- Authority
- CN
- China
- Prior art keywords: question, unanswered, questions, category, answering
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/3329—Natural language query formulation or dialogue systems
- G06F16/35—Clustering; Classification
- G06F18/22—Matching criteria, e.g. proximity measures
- G06N5/022—Knowledge engineering; Knowledge acquisition
Abstract
The invention relates to a question-answering method, system, device and medium based on active learning, wherein the method comprises the following steps: constructing a basic knowledge base; answering customer questions based on a similarity detection model and constructing an unanswered question set, wherein the unanswered question set comprises user questions that users have reported as unanswered; classifying the questions in the unanswered question set into categories; acquiring expert correction information and correcting the categories of the classified unanswered question set based on the expert correction information; optimizing the similarity detection model through further training based on the category correction result; and acquiring a corresponding answer for each category in the unanswered question set and updating the questions and their corresponding answers into the basic knowledge base. Compared with the prior art, the method answers questions more accurately, reduces labor cost, and improves labeling efficiency.
Description
Technical Field
The invention relates to the field of intelligent question answering, and in particular to a question-answering method, system, device and medium based on active learning.
Background
A chat robot can converse with users, and as enterprises place growing emphasis on customer service, chat robots have gradually expanded from the entertainment field into customer service and similar fields.
Since a chat robot cannot answer technical questions by itself, common practice is to collect many question-answer pairs and build a knowledge base to support it. When a user asks a question, the chat robot retrieves the most relevant question in the knowledge base, extracts its answer, and replies.
The completeness of the knowledge base determines the user experience: no matter how sophisticated the retrieval and similarity-matching techniques are, if the relevant question is not stored in the knowledge base, the chat robot cannot answer the user. To improve the knowledge base, questions the system could not answer can be collected periodically, given to experts for labeling, and the resulting question-answer pairs added to the knowledge base. However, expert manpower is limited while the volume of unanswered questions is large, so experts can hardly label all questions in time and cannot pick out the high-frequency questions from the many candidates for priority labeling. When multiple experts label simultaneously, similar questions also lead to duplicate labeling and wasted effort.
Disclosure of Invention
The present invention aims to provide a question-answering method, system, device and medium based on active learning that overcome the above drawbacks of the prior art.
The purpose of the invention can be realized by the following technical scheme:
A question-answering method based on active learning comprises the following steps:
S1: constructing a basic knowledge base, wherein the basic knowledge base comprises known questions and the answers corresponding to the known questions;
S2: answering customer questions based on a similarity detection model and constructing an unanswered question set, wherein the unanswered question set comprises user questions that users have reported as unanswered;
S3: classifying the questions in the unanswered question set into categories;
S4: acquiring expert correction information and correcting the categories of the classified unanswered question set based on the expert correction information;
S5: optimizing the similarity detection model through further training based on the category correction result;
S6: acquiring a corresponding answer for each category in the unanswered question set and updating the questions and their corresponding answers into the basic knowledge base.
Preferably, step S2 specifically comprises:
S21: establishing a text similarity detection model;
S22: acquiring the question asked by the user and retrieving, through the text similarity detection model, the answer of the known question with the highest similarity in the basic knowledge base, and outputting it as the answer;
S23: constructing an unanswered question set, acquiring user feedback, and storing the corresponding question in the unanswered question set if the user feedback indicates the question was unanswered.
Preferably, step S3 specifically comprises:
S31: selecting a first question from the unanswered question set as the reference question of a first category;
S32: taking a question to be classified from the unanswered question set and computing its similarity with the reference question of each existing category one by one; if the similarity between a category's reference question and the question to be classified is greater than a similarity threshold, assigning the question to that category; otherwise, taking the question to be classified as the reference question of a new category;
S33: repeating step S32 until all questions in the unanswered question set have been classified.
Step S3 further comprises step S34: sorting the categories by the number of questions in each category.
Preferably, the expert correction information comprises a type merging instruction, a type splitting instruction and a question moving instruction, wherein
the type merging instruction merges two categories in the unanswered question set, together with their corresponding questions, into one category;
the type splitting instruction splits one category in the unanswered question set, together with its questions, into two categories with their corresponding questions;
the question moving instruction moves one or more questions from one category to another category.
Preferably, the similarity detection model is a semantic similarity model based on a BERT model.
A question-answering system based on active learning comprises a basic knowledge base module, a question-answering module, a category classification module, a correction module and an answer supplementing module, wherein
the basic knowledge base module is used for constructing a basic knowledge base, the basic knowledge base comprising known questions and the answers corresponding to the known questions;
the question-answering module is used for answering customer questions based on a similarity detection model and constructing an unanswered question set, wherein the unanswered question set comprises user questions that users have reported as unanswered;
the category classification module is used for classifying the questions in the unanswered question set into categories, acquiring expert correction information, correcting the categories of the classified unanswered question set based on the expert correction information, and optimizing the similarity detection model through training based on the category correction result;
the answer supplementing module is used for acquiring a corresponding answer for each category in the unanswered question set and updating the questions and their corresponding answers into the basic knowledge base.
Preferably, the specific steps by which the question-answering module answers user questions and constructs the unanswered question set include:
establishing a text similarity detection model;
acquiring the question asked by the user and retrieving, through the text similarity detection model, the answer of the known question with the highest similarity in the basic knowledge base, and outputting it as the answer;
constructing an unanswered question set, acquiring user feedback, and storing the corresponding question in the unanswered question set if the user feedback indicates the question was unanswered.
Preferably, the specific steps of category classification by the category classification module include:
step one: selecting a first question from the unanswered question set as the reference question of a first category;
step two: taking a question to be classified from the unanswered question set and computing its similarity with the reference question of each existing category one by one; if the similarity between a category's reference question and the question to be classified is greater than a similarity threshold, assigning the question to that category; otherwise, taking the question to be classified as the reference question of a new category;
step three: repeating step two until all questions in the unanswered question set have been classified.
An active learning-based question answering device comprises a memory, a processor and a computer program stored on the memory, wherein the processor, when executing the computer program, implements the steps of the active learning-based question-answering method described above.
A non-transitory computer storage medium storing an executable program that is executed by a processor to implement the steps of the active learning-based question-answering method described above.
Compared with the prior art, the invention has the following advantages:
(1) The method obtains customer questions and answers them by searching the knowledge base with a similarity detection model; unanswered questions are recorded, classified according to preset rules, labeled and corrected by experts, and then used to further train the similarity detection model, forming an active learning loop. This effectively improves the precision and reliability of question answering, continuously expands the basic knowledge base from the collected unanswered questions, increases the proportion of questions that can be answered, and thereby improves user experience.
(2) The invention classifies the unanswered questions using a classification model and rules, with experts only correcting the categories and supplying answers, which effectively reduces labor cost, avoids duplicate labeling, and improves work efficiency.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. Note that the following description of the embodiments is merely an illustrative example; the invention is not limited to these applications or uses, nor to the following embodiments.
Examples
A question-answering method based on active learning, as shown in fig. 1, includes the following steps:
s1: and constructing a basic knowledge base, wherein the basic knowledge base comprises known questions and answers corresponding to the known questions.
S2: answering customer questions based on the similarity detection model and constructing an unanswered question set, wherein the unanswered question set comprises user questions that users have reported as unanswered.
Step S2 specifically comprises:
S21: establishing a text similarity detection model. This step uses an STS (semantic textual similarity) model, specifically a semantic similarity model based on the BERT model, and outputs the question-answer pair with the highest similarity as the most relevant question.
S22: acquiring the question asked by the user and retrieving, through the text similarity detection model, the answer of the known question with the highest similarity in the basic knowledge base, and outputting it as the answer. In this embodiment, the user's question is either captured by a voice acquisition device such as a microphone on the question-answering device and converted by a speech recognition algorithm, or obtained through a text input interface.
S23: constructing an unanswered question set, acquiring user feedback, and storing the corresponding question in the unanswered question set if the user feedback indicates the question was unanswered. User feedback is collected through user selection; the options are "unanswered" and "answered". All questions marked as unanswered are stored in the unanswered question bank.
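The answer-retrieval and feedback-collection flow of steps S21 to S23 can be sketched as follows. This is an illustrative sketch only: the patent uses a BERT-based STS model, while a simple token-overlap (Jaccard) score stands in here so the example is self-contained, and all names (`answer`, `record_feedback`, the sample knowledge base) are hypothetical.

```python
# Sketch of S21-S23: answer a user question from the basic knowledge base
# via a similarity model, and collect questions users report as unanswered.
# Jaccard token overlap is a stand-in for the BERT-based STS model.

def similarity(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def answer(question: str, kb: dict) -> tuple[str, str]:
    # S22: return the most similar known question and its stored answer.
    best = max(kb, key=lambda known: similarity(question, known))
    return best, kb[best]

unanswered = []  # S23: the unanswered question set

def record_feedback(question: str, was_answered: bool) -> None:
    # Store the question if the user's feedback says it went unanswered.
    if not was_answered:
        unanswered.append(question)

kb = {"how do i reset my password": "Use the 'Forgot password' link."}
matched, reply = answer("how can i reset my password", kb)
record_feedback("how do i export my data", was_answered=False)
```

A production system would also apply a minimum-similarity cutoff before replying, so that poor matches are treated as unanswered automatically rather than only via explicit feedback.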
S3: classifying the questions in the unanswered question set into categories.
Step S3 specifically comprises:
S31: selecting a first question from the unanswered question set as the reference question of a first category;
S32: taking a question to be classified from the unanswered question set and computing its similarity with the reference question of each existing category one by one; if the similarity between a category's reference question and the question to be classified is greater than a similarity threshold, assigning the question to that category; otherwise, taking the question to be classified as the reference question of a new category;
S33: repeating step S32 until all questions in the unanswered question set have been classified.
A concrete example:
a: given an unanswered question set with M questions in total, randomly select a question X1 as the reference question of category 1;
b: randomly take a question X2 and compute, via the STS model, whether its similarity to X1 exceeds the similarity threshold (0.9 in this embodiment); if so, put X2 into category 1 as well; otherwise, take X2 as the reference question of category 2;
c: take question X3 and repeat step b: if X3 is similar to X1, put it into category 1; otherwise, compute whether it is similar to the reference question of category 2, and so on, until every question has been assigned to a category.
Step S3 further comprises step S34: sorting the categories by the number of questions in each category, so that experts focus on high-frequency, hot-spot questions when labeling, yielding a high return on labeling effort.
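The greedy, single-pass category division of steps S31 to S34 can be sketched as follows. Jaccard token overlap again stands in for the STS model; the embodiment's threshold is 0.9, but the usage below passes a lower value because this stand-in score is coarser than a semantic model. All names are illustrative.

```python
# Sketch of S31-S34: greedy single-pass classification of the unanswered set.
# A question joins the first category whose reference question (the category's
# first member) is similar enough; otherwise it founds a new category.

def similarity(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def categorize(questions: list[str], threshold: float = 0.9) -> list[list[str]]:
    categories: list[list[str]] = []
    for q in questions:  # S32/S33: compare against each reference question
        for cat in categories:
            if similarity(q, cat[0]) > threshold:
                cat.append(q)
                break
        else:  # no category matched: q becomes a new reference question
            categories.append([q])
    # S34: largest categories first, so experts label high-frequency topics early.
    categories.sort(key=len, reverse=True)
    return categories

cats = categorize(
    ["reset password", "reset my password", "export data",
     "export my data", "delete account"],
    threshold=0.4,
)
```

Because each question is only compared with one reference question per category, the pass is linear in the number of categories; the expert correction of step S4 then repairs any misassignments this greedy scheme produces.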
S4: acquiring expert correction information and correcting the categories of the classified unanswered question set based on the expert correction information.
The expert correction information comprises a type merging instruction, a type splitting instruction and a question moving instruction, wherein
the type merging instruction merges two categories in the unanswered question set, together with their corresponding questions, into one category;
the type splitting instruction splits one category in the unanswered question set, together with its questions, into two categories with their corresponding questions;
the question moving instruction moves one or more questions from one category to another category.
S5: optimizing the similarity detection model through further training based on the category correction result.
S6: acquiring a corresponding answer for each category in the unanswered question set and updating the questions and their corresponding answers into the basic knowledge base.
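Step S6 then writes each category's expert-supplied answer back for every question in that category. A minimal sketch, assuming (as this example does, not the patent text) that the knowledge base is a question-to-answer dict:

```python
# Sketch of S6: write each category's expert-supplied answer into the
# basic knowledge base for every question in that category.

def update_knowledge_base(kb: dict, categories: list[list[str]],
                          answers: list[str]) -> None:
    # answers[i] is the expert's answer for every question in categories[i].
    for questions, ans in zip(categories, answers):
        for q in questions:
            kb[q] = ans

kb = {"known question": "known answer"}
update_knowledge_base(kb, [["q1", "q2"], ["q3"]],
                      ["answer for q1/q2", "answer for q3"])
```

Storing every phrasing in a category against the same answer is what lets the similarity search in S22 answer future variants of the question directly.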
The invention also provides a question-answering system based on active learning, which comprises a basic knowledge base module, a question-answering module, a category classification module, a correction module and an answer supplementing module, wherein
the basic knowledge base module is used for constructing a basic knowledge base, the basic knowledge base comprising known questions and the answers corresponding to the known questions;
the question-answering module is used for answering customer questions based on a similarity detection model and constructing an unanswered question set, wherein the unanswered question set comprises user questions that users have reported as unanswered;
the category classification module is used for classifying the questions in the unanswered question set into categories, acquiring expert correction information, correcting the categories of the classified unanswered question set based on the expert correction information, and optimizing the similarity detection model through training based on the category correction result;
the answer supplementing module is used for acquiring a corresponding answer for each category in the unanswered question set and updating the questions and their corresponding answers into the basic knowledge base.
In this embodiment, the specific steps by which the question-answering module answers user questions and constructs the unanswered question set include:
establishing a text similarity detection model;
acquiring the question asked by the user and retrieving, through the text similarity detection model, the answer of the known question with the highest similarity in the basic knowledge base, and outputting it as the answer;
constructing an unanswered question set, acquiring user feedback, and storing the corresponding question in the unanswered question set if the user feedback indicates the question was unanswered.
In this embodiment, the specific steps of category classification by the category classification module include:
step one: selecting a first question from the unanswered question set as the reference question of a first category;
step two: taking a question to be classified from the unanswered question set and computing its similarity with the reference question of each existing category one by one; if the similarity between a category's reference question and the question to be classified is greater than a similarity threshold, assigning the question to that category; otherwise, taking the question to be classified as the reference question of a new category;
step three: repeating step two until all questions in the unanswered question set have been classified.
An active learning-based question answering device comprises a memory, a processor and a computer program stored on the memory, wherein the processor, when executing the computer program, implements the steps of the active learning-based question answering method described above.
A non-transitory computer storage medium storing an executable program that is executed by a processor to implement the steps of the active learning-based question-answering method described above.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above embodiments are merely examples and do not limit the scope of the present invention. These embodiments may be implemented in other various manners, and various omissions, substitutions, and changes may be made without departing from the technical spirit of the present invention.
Claims (10)
1. A question-answering method based on active learning, characterized by comprising the following steps:
S1: constructing a basic knowledge base, wherein the basic knowledge base comprises known questions and the answers corresponding to the known questions;
S2: answering customer questions based on a similarity detection model and constructing an unanswered question set, wherein the unanswered question set comprises user questions that users have reported as unanswered;
S3: classifying the questions in the unanswered question set into categories;
S4: acquiring expert correction information and correcting the categories of the classified unanswered question set based on the expert correction information;
S5: optimizing the similarity detection model through further training based on the category correction result;
S6: acquiring a corresponding answer for each category in the unanswered question set and updating the questions and their corresponding answers into the basic knowledge base.
2. The question-answering method based on active learning according to claim 1, wherein step S2 specifically comprises:
S21: establishing a text similarity detection model;
S22: acquiring the question asked by the user and retrieving, through the text similarity detection model, the answer of the known question with the highest similarity in the basic knowledge base, and outputting it as the answer;
S23: constructing an unanswered question set, acquiring user feedback, and storing the corresponding question in the unanswered question set if the user feedback indicates the question was unanswered.
3. The question-answering method based on active learning according to claim 1, wherein step S3 specifically comprises:
S31: selecting a first question from the unanswered question set as the reference question of a first category;
S32: taking a question to be classified from the unanswered question set and computing its similarity with the reference question of each existing category one by one; if the similarity between a category's reference question and the question to be classified is greater than a similarity threshold, assigning the question to that category; otherwise, taking the question to be classified as the reference question of a new category;
S33: repeating step S32 until all questions in the unanswered question set have been classified.
4. The active learning-based question answering method according to claim 1, wherein the expert correction information comprises a type merging instruction, a type splitting instruction and a question moving instruction, wherein
the type merging instruction merges two categories in the unanswered question set, together with their corresponding questions, into one category;
the type splitting instruction splits one category in the unanswered question set, together with its questions, into two categories with their corresponding questions;
the question moving instruction moves one or more questions from one category to another category.
5. The question-answering method based on active learning of claim 1, wherein the similarity detection model is a semantic similarity model based on a BERT model.
6. The active learning-based question answering method according to claim 1, wherein the step S3 further comprises: step S34: the categories are sorted by the number of questions in each category.
7. A question-answering system based on active learning, characterized by comprising a basic knowledge base module, a question-answering module, a category classification module, a correction module and an answer supplementing module, wherein
the basic knowledge base module is used for constructing a basic knowledge base, the basic knowledge base comprising known questions and the answers corresponding to the known questions;
the question-answering module is used for answering customer questions based on a similarity detection model and constructing an unanswered question set, wherein the unanswered question set comprises user questions that users have reported as unanswered;
the category classification module is used for classifying the questions in the unanswered question set into categories, acquiring expert correction information, correcting the categories of the classified unanswered question set based on the expert correction information, and optimizing the similarity detection model through training based on the category correction result;
the answer supplementing module is used for acquiring a corresponding answer for each category in the unanswered question set and updating the questions and their corresponding answers into the basic knowledge base.
8. The active learning-based question-answering system according to claim 7, wherein the specific steps by which the question-answering module answers user questions and constructs the unanswered question set include:
establishing a text similarity detection model;
acquiring the question asked by the user and retrieving, through the text similarity detection model, the answer of the known question with the highest similarity in the basic knowledge base, and outputting it as the answer;
constructing an unanswered question set, acquiring user feedback, and storing the corresponding question in the unanswered question set if the user feedback indicates the question was unanswered.
9. An active learning-based question answering device comprising a memory, a processor and a computer program stored on the memory, wherein the processor, when executing the computer program, implements the steps of the active learning-based question answering method according to any one of claims 1 to 6.
10. A non-transitory computer storage medium storing an executable program for execution by a processor to perform the steps of implementing the active learning-based question-answering method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111541745.5A CN114385798A (en) | 2021-12-16 | 2021-12-16 | Question-answering method, system, equipment and medium based on active learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114385798A true CN114385798A (en) | 2022-04-22 |
Family
ID=81197003
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111541745.5A Pending CN114385798A (en) | 2021-12-16 | 2021-12-16 | Question-answering method, system, equipment and medium based on active learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114385798A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117194602A (en) * | 2023-09-06 | 2023-12-08 | 书音(上海)文化科技有限公司 | Local knowledge base updating method and system based on large language model and BERT model |
CN117194602B (en) * | 2023-09-06 | 2024-04-19 | 书音(上海)文化科技有限公司 | Local knowledge base updating method and system based on large language model and BERT model |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |