CN114385798A - Question-answering method, system, equipment and medium based on active learning - Google Patents


Info

Publication number
CN114385798A
Authority
CN
China
Prior art keywords
question
unanswered
questions
category
answering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111541745.5A
Other languages
Chinese (zh)
Inventor
冯耀
王椭
朱祥
熊赏
陈娜
陆恒宇
罗浩昇
赵权有
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pudong Development Bank Co Ltd
Original Assignee
Shanghai Pudong Development Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-12-16
Filing date
2021-12-16
Publication date
2022-04-22
Application filed by Shanghai Pudong Development Bank Co Ltd
Priority to CN202111541745.5A
Publication of CN114385798A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition

Abstract

The invention relates to a question-answering method, system, device and medium based on active learning. The method comprises: constructing a basic knowledge base; answering customer questions based on a similarity detection model and constructing an unanswered question set, wherein the unanswered question set comprises user questions that users have reported as unanswered; classifying the questions in the unanswered question set into categories; acquiring expert correction information and correcting the categories of the classified unanswered question set based on the expert correction information; performing optimization training on the similarity detection model based on the category correction result; and acquiring the answer corresponding to each category in the unanswered question set and adding the questions and their corresponding answers to the basic knowledge base. Compared with the prior art, the method offers more accurate question answering, lower labor cost and higher labeling efficiency.

Description

Question-answering method, system, equipment and medium based on active learning
Technical Field
The invention relates to the field of intelligent question answering, in particular to a question answering method, a question answering system, question answering equipment and a question answering medium based on active learning.
Background
Chat robots can converse with users, and as enterprises place increasing emphasis on customer service, chat robots have gradually expanded from the entertainment field into customer service and other fields.
Since a chat robot cannot answer technical questions by itself, the common practice is to collect many question-answer pairs and construct a knowledge base to provide support. When a user asks a question, the chat robot searches the knowledge base for the most relevant question, extracts its answer and replies.
The completeness of the knowledge base determines the user experience: no matter how advanced the search and similarity matching techniques are, if no relevant question is stored in the knowledge base, the chat robot cannot answer the user's question. To improve the knowledge base, questions that the question-answering system cannot answer can be collected regularly and handed to experts for labeling, and the corresponding question-answer pairs are then added to the knowledge base. However, because expert manpower is limited and the number of unanswered questions is large, it is difficult for experts to label all questions in time, and they cannot pick out high-frequency questions from the many candidates for priority labeling. When multiple experts label simultaneously, similar questions may also be labeled repeatedly, wasting manpower.
Disclosure of Invention
The present invention is directed to a method, system, device and medium for question answering based on active learning to overcome the above-mentioned drawbacks of the prior art.
The purpose of the invention can be realized by the following technical scheme:
A question-answering method based on active learning comprises the following steps:
S1: constructing a basic knowledge base, wherein the basic knowledge base comprises known questions and answers corresponding to the known questions;
S2: answering customer questions based on a similarity detection model and constructing an unanswered question set, wherein the unanswered question set comprises user questions that users have reported as unanswered;
S3: classifying the questions in the unanswered question set into categories;
S4: acquiring expert correction information, and correcting the categories of the classified unanswered question set based on the expert correction information;
S5: performing optimization training on the similarity detection model based on the category correction result;
S6: acquiring the answer corresponding to each category in the unanswered question set, and adding the questions and their corresponding answers to the basic knowledge base.
Preferably, step S2 specifically includes:
S21: establishing a text similarity detection model;
S22: acquiring the question to be answered from the user, and searching the basic knowledge base, through the text similarity detection model, for the known question with the highest similarity, whose answer is output as the answer;
S23: constructing an unanswered question set, acquiring user feedback, and storing the corresponding question in the unanswered question set if the user feedback indicates that the question is unanswered.
Preferably, step S3 specifically includes:
S31: selecting a first question from the unanswered question set as the reference question of a first category;
S32: selecting a question to be classified from the unanswered question set and calculating its similarity with the reference question of each existing category one by one; if the similarity with the reference question of a category is greater than a similarity threshold, assigning the question to that category; otherwise, taking the question as the reference question of a new category;
S33: repeating step S32 until all questions in the unanswered question set have been classified.
Preferably, step S3 further includes step S34: sorting the categories by the number of questions in each category.
Preferably, the expert correction information comprises a type merging instruction, a type splitting instruction and a question moving instruction, wherein:
the type merging instruction is used for merging two categories in the unanswered question set, together with their questions, into one category;
the type splitting instruction is used for dividing one category in the unanswered question set, together with its questions, into two categories with their corresponding questions;
the question moving instruction is used for moving one or more questions from one category to another category.
Preferably, the similarity detection model is a semantic similarity model based on a BERT model.
A question-answering system based on active learning comprises a basic knowledge base module, a question-answering module, a category classification module, a correction module and an answer supplementing module, wherein:
the basic knowledge base module is used for constructing a basic knowledge base, and the basic knowledge base comprises known questions and answers corresponding to the known questions;
the question-answering module is used for answering customer questions based on the similarity detection model and constructing an unanswered question set, wherein the unanswered question set comprises user questions that users have reported as unanswered;
the category classification module is used for classifying the questions in the unanswered question set into categories, acquiring expert correction information, correcting the categories of the classified unanswered question set based on the expert correction information, and performing optimization training on the similarity detection model based on the category correction result;
the answer supplementing module is used for acquiring the answer corresponding to each category in the unanswered question set and adding the questions and their corresponding answers to the basic knowledge base.
Preferably, the specific steps by which the question-answering module answers user questions and constructs the unanswered question set include:
establishing a text similarity detection model;
acquiring the question to be answered from the user, and searching the basic knowledge base, through the text similarity detection model, for the known question with the highest similarity, whose answer is output as the answer;
constructing an unanswered question set, acquiring user feedback, and storing the corresponding question in the unanswered question set if the user feedback indicates that the question is unanswered.
Preferably, the specific steps by which the category classification module classifies the questions include:
step one: selecting a first question from the unanswered question set as the reference question of a first category;
step two: selecting a question to be classified from the unanswered question set and calculating its similarity with the reference question of each existing category one by one; if the similarity with the reference question of a category is greater than a similarity threshold, assigning the question to that category; otherwise, taking the question as the reference question of a new category;
step three: repeating step two until all questions in the unanswered question set have been classified.
An active learning-based question answering device comprises a memory, a processor and a computer program stored on the memory, wherein the processor realizes the steps of the active learning-based question answering method when executing the computer program.
A non-transitory computer storage medium storing an executable program that is executed by a processor to implement the steps of the active learning-based question-answering method described above.
Compared with the prior art, the invention has the following advantages:
(1) The invention uses the similarity detection model to take customer questions and search the knowledge base for answers; unanswered questions are recorded, classified according to preset rules, and labeled and corrected by experts, and the similarity detection model is then further trained, forming an active learning mechanism. This effectively improves the precision and reliability of question answering, and the basic knowledge base can be continuously expanded with the collected unanswered questions, which increases the number of questions that can be answered and further improves the user experience.
(2) The invention uses the classification model and rules to classify the unanswered questions, after which experts correct the categories and answer the questions, which effectively reduces labor cost, avoids repeated labeling and improves work efficiency.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. Note that the following description of the embodiments is merely illustrative in nature; the present invention is not limited to the following embodiments, nor to any particular application or use.
Examples
A question-answering method based on active learning, as shown in FIG. 1, includes the following steps:
S1: constructing a basic knowledge base, wherein the basic knowledge base comprises known questions and answers corresponding to the known questions.
S2: answering customer questions based on the similarity detection model and constructing an unanswered question set, wherein the unanswered question set comprises user questions that users have reported as unanswered.
Step S2 specifically includes:
S21: establishing a text similarity detection model. This step uses an STS (text similarity detection) model, specifically a semantic similarity model based on the BERT model, which outputs the question-answer pair with the highest similarity as the most relevant question.
S22: acquiring the question to be answered from the user, and searching the basic knowledge base, through the text similarity detection model, for the known question with the highest similarity, whose answer is output as the answer. In this embodiment, the question is obtained either through a voice acquisition device, such as a microphone arranged on the question-answering device, and converted by a speech recognition algorithm, or through a text input interface.
S23: constructing an unanswered question set, acquiring user feedback, and storing the corresponding question in the unanswered question set if the user feedback indicates that the question is unanswered. User feedback is collected through user selection; the options are "unanswered" and "answered". All questions marked as unanswered are stored in the unanswered question bank.
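As an illustrative sketch only, and not part of the disclosed embodiment, steps S21 to S23 may be expressed in Python as follows. The class name `QuestionAnswerer`, its method names and the function `toy_similarity` are hypothetical; in particular, `toy_similarity` is a simple word-overlap score standing in for the BERT-based STS model, which the embodiment does not specify in code.

```python
from typing import Callable, Dict, List, Tuple


def toy_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for the STS model: Jaccard overlap of word sets."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


class QuestionAnswerer:
    """Assumed structure: answers from a knowledge base and logs unanswered questions."""

    def __init__(self, knowledge_base: Dict[str, str],
                 similarity: Callable[[str, str], float] = toy_similarity):
        self.knowledge_base = knowledge_base  # S1: known question -> answer
        self.similarity = similarity          # S21: similarity detection model
        self.unanswered: List[str] = []       # S23: unanswered question set

    def answer(self, question: str) -> Tuple[str, str]:
        """S22: return the most similar known question and its answer."""
        best = max(self.knowledge_base,
                   key=lambda known: self.similarity(question, known))
        return best, self.knowledge_base[best]

    def record_feedback(self, question: str, answered: bool) -> None:
        """S23: store the question only when the user marks it as unanswered."""
        if not answered:
            self.unanswered.append(question)


kb = {
    "how do I reset my password": "Use the reset link.",
    "what are the branch opening hours": "9am to 5pm.",
}
qa = QuestionAnswerer(kb)
matched, reply = qa.answer("how can I reset my password")
qa.record_feedback("how can I reset my password", answered=True)
qa.record_feedback("how do I close my account", answered=False)
```

Passing the similarity function as a parameter is a design choice of this sketch: it allows the toy score to be replaced by a trained model without changing the retrieval or feedback logic.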
S3: classifying the questions in the unanswered question set into categories.
Step S3 specifically includes:
S31: selecting a first question from the unanswered question set as the reference question of a first category;
S32: selecting a question to be classified from the unanswered question set and calculating its similarity with the reference question of each existing category one by one; if the similarity with the reference question of a category is greater than a similarity threshold, assigning the question to that category; otherwise, taking the question as the reference question of a new category;
S33: repeating step S32 until all questions in the unanswered question set have been classified.
A specific example is as follows:
a. The unanswered question set contains M questions in total; a question X1 is randomly selected as the reference question of category 1.
b. A question X2 is randomly taken, and the STS model is used to calculate whether the similarity between X2 and X1 is greater than the similarity threshold, which is 0.9 in this embodiment; if so, X2 is also put into category 1; if not, X2 is taken as the reference question of category 2.
c. A question X3 is then taken and step b is repeated: if X3 is similar to X1, it is put into category 1; otherwise, whether it is similar to the reference question of category 2 is calculated, and so on, until every question has been assigned to a category.
Step S3 further includes step S34: sorting the categories by the number of questions in each category, so that experts focus on high-frequency, hot-spot questions first when labeling, which gives a high return on labeling effort.
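Steps S31 to S34 can be sketched, under stated assumptions, as a single-pass grouping against per-category reference questions. The function names `cluster_questions`, `sort_by_size` and `toy_similarity` are hypothetical. The sketch processes questions in list order rather than by random selection, and it uses a word-overlap score with a threshold of 0.5 for demonstration; the embodiment's threshold of 0.9 applies to the STS model, not to this toy score.

```python
from typing import Callable, Dict, List


def toy_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for the STS model: Jaccard overlap of word sets."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def cluster_questions(questions: List[str],
                      similarity: Callable[[str, str], float],
                      threshold: float = 0.9) -> List[Dict]:
    """S31-S33: assign each question to the first category whose reference
    question is similar enough; otherwise open a new category."""
    categories: List[Dict] = []
    for question in questions:
        for category in categories:
            if similarity(question, category["reference"]) > threshold:
                category["questions"].append(question)
                break
        else:
            # No existing reference matched: the question becomes the
            # reference question of a new category.
            categories.append({"reference": question, "questions": [question]})
    return categories


def sort_by_size(categories: List[Dict]) -> List[Dict]:
    """S34: largest categories first, so frequent questions are labeled first."""
    return sorted(categories, key=lambda c: len(c["questions"]), reverse=True)


unanswered = [
    "card lost what to do",
    "how to reset password",
    "how to reset my password",
    "how to reset the password",
    "lost card what to do",
]
categories = cluster_questions(unanswered, toy_similarity, threshold=0.5)
ranked = sort_by_size(categories)
```

Because each question is compared only with one reference question per category, the number of similarity calls grows with the number of categories rather than with the number of questions already classified.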
S4: acquiring expert correction information, and correcting the categories of the classified unanswered question set based on the expert correction information.
The expert correction information comprises a type merging instruction, a type splitting instruction and a question moving instruction, wherein:
the type merging instruction is used for merging two categories in the unanswered question set, together with their questions, into one category;
the type splitting instruction is used for dividing one category in the unanswered question set, together with its questions, into two categories with their corresponding questions;
the question moving instruction is used for moving one or more questions from one category to another category.
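The three expert correction instructions can be illustrated with the following minimal sketch. It assumes, purely for illustration, that the classified unanswered question set is held as a dictionary mapping a category name to its list of questions; the function names and this data layout are hypothetical and are not specified by the embodiment.

```python
from typing import Dict, Iterable, List

Categories = Dict[str, List[str]]


def merge_categories(categories: Categories, keep: str, absorb: str) -> None:
    """Type merging: fold category `absorb` and its questions into `keep`."""
    categories[keep].extend(categories.pop(absorb))


def split_category(categories: Categories, source: str, new_name: str,
                   questions_to_split: Iterable[str]) -> None:
    """Type splitting: move the chosen questions of `source` into a new category."""
    chosen = set(questions_to_split)
    categories[new_name] = [q for q in categories[source] if q in chosen]
    categories[source] = [q for q in categories[source] if q not in chosen]


def move_questions(categories: Categories, source: str, target: str,
                   questions: Iterable[str]) -> None:
    """Question moving: move one or more questions from `source` to `target`."""
    for question in questions:
        categories[source].remove(question)
        categories[target].append(question)


cats = {"c1": ["a", "b"], "c2": ["c"], "c3": ["d", "e", "f"]}
merge_categories(cats, keep="c1", absorb="c2")
split_category(cats, source="c3", new_name="c4", questions_to_split=["f"])
move_questions(cats, source="c1", target="c3", questions=["b"])
```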
S5: performing optimization training on the similarity detection model based on the category correction result.
S6: acquiring the answer corresponding to each category in the unanswered question set, and adding the questions and their corresponding answers to the basic knowledge base.
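The embodiment does not state how the category correction result is turned into training data for step S5. One plausible reading, offered here only as an assumption, is to label question pairs from the same corrected category as similar (1) and pairs from different categories as dissimilar (0). The sketch below builds such pairs and also shows step S6 as a dictionary update; all function names and the data layout are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Categories = Dict[str, List[str]]


def build_training_pairs(categories: Categories) -> List[Tuple[str, str, int]]:
    """Assumed S5 input: (question_a, question_b, label) pairs, where label 1
    means same corrected category and label 0 means different categories."""
    pairs: List[Tuple[str, str, int]] = []
    names = list(categories)
    for name in names:
        for a, b in combinations(categories[name], 2):
            pairs.append((a, b, 1))
    for first, second in combinations(names, 2):
        for a in categories[first]:
            for b in categories[second]:
                pairs.append((a, b, 0))
    return pairs


def update_knowledge_base(knowledge_base: Dict[str, str],
                          categories: Categories,
                          category_answers: Dict[str, str]) -> Dict[str, str]:
    """S6: every question in an answered category is stored with that answer."""
    for name, answer in category_answers.items():
        for question in categories[name]:
            knowledge_base[question] = answer
    return knowledge_base


corrected = {"reset": ["q1", "q2"], "card": ["q3"]}
pairs = build_training_pairs(corrected)
kb = update_knowledge_base({"known": "existing answer"}, corrected,
                           {"reset": "Use the reset link."})
```

In this sketch a category without an expert answer (here `card`) is simply left out of the knowledge base until an answer is supplied.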
The invention also provides a question-answering system based on active learning, which comprises a basic knowledge base module, a question-answering module, a category classification module, a correction module and an answer supplementing module, wherein:
the basic knowledge base module is used for constructing a basic knowledge base, and the basic knowledge base comprises known questions and answers corresponding to the known questions;
the question-answering module is used for answering customer questions based on the similarity detection model and constructing an unanswered question set, wherein the unanswered question set comprises user questions that users have reported as unanswered;
the category classification module is used for classifying the questions in the unanswered question set into categories, acquiring expert correction information, correcting the categories of the classified unanswered question set based on the expert correction information, and performing optimization training on the similarity detection model based on the category correction result;
the answer supplementing module is used for acquiring the answer corresponding to each category in the unanswered question set and adding the questions and their corresponding answers to the basic knowledge base.
In this embodiment, the specific steps by which the question-answering module answers user questions and constructs the unanswered question set include:
establishing a text similarity detection model;
acquiring the question to be answered from the user, and searching the basic knowledge base, through the text similarity detection model, for the known question with the highest similarity, whose answer is output as the answer;
constructing an unanswered question set, acquiring user feedback, and storing the corresponding question in the unanswered question set if the user feedback indicates that the question is unanswered.
In this embodiment, the specific steps by which the category classification module classifies the questions include:
step one: selecting a first question from the unanswered question set as the reference question of a first category;
step two: selecting a question to be classified from the unanswered question set and calculating its similarity with the reference question of each existing category one by one; if the similarity with the reference question of a category is greater than a similarity threshold, assigning the question to that category; otherwise, taking the question as the reference question of a new category;
step three: repeating step two until all questions in the unanswered question set have been classified.
An active learning-based question answering device comprises a memory, a processor and a computer program stored on the memory, wherein the processor realizes the steps of the active learning-based question answering method when executing the computer program.
A non-transitory computer storage medium storing an executable program that is executed by a processor to implement the steps of the active learning-based question-answering method described above.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above embodiments are merely examples and do not limit the scope of the present invention. These embodiments may be implemented in other various manners, and various omissions, substitutions, and changes may be made without departing from the technical spirit of the present invention.

Claims (10)

1. A question-answering method based on active learning, characterized by comprising the following steps:
S1: constructing a basic knowledge base, wherein the basic knowledge base comprises known questions and answers corresponding to the known questions;
S2: answering customer questions based on a similarity detection model and constructing an unanswered question set, wherein the unanswered question set comprises user questions that users have reported as unanswered;
S3: classifying the questions in the unanswered question set into categories;
S4: acquiring expert correction information, and correcting the categories of the classified unanswered question set based on the expert correction information;
S5: performing optimization training on the similarity detection model based on the category correction result;
S6: acquiring the answer corresponding to each category in the unanswered question set, and adding the questions and their corresponding answers to the basic knowledge base.
2. The question-answering method based on active learning according to claim 1, wherein step S2 specifically comprises:
S21: establishing a text similarity detection model;
S22: acquiring the question to be answered from the user, and searching the basic knowledge base, through the text similarity detection model, for the known question with the highest similarity, whose answer is output as the answer;
S23: constructing an unanswered question set, acquiring user feedback, and storing the corresponding question in the unanswered question set if the user feedback indicates that the question is unanswered.
3. The question-answering method based on active learning according to claim 1, wherein step S3 specifically comprises:
S31: selecting a first question from the unanswered question set as the reference question of a first category;
S32: selecting a question to be classified from the unanswered question set and calculating its similarity with the reference question of each existing category one by one; if the similarity with the reference question of a category is greater than a similarity threshold, assigning the question to that category; otherwise, taking the question as the reference question of a new category;
S33: repeating step S32 until all questions in the unanswered question set have been classified.
4. The active learning-based question answering method according to claim 1, wherein the expert correction information comprises a type merging instruction, a type splitting instruction and a question moving instruction, wherein:
the type merging instruction is used for merging two categories in the unanswered question set, together with their questions, into one category;
the type splitting instruction is used for dividing one category in the unanswered question set, together with its questions, into two categories with their corresponding questions;
the question moving instruction is used for moving one or more questions from one category to another category.
5. The question-answering method based on active learning of claim 1, wherein the similarity detection model is a semantic similarity model based on a BERT model.
6. The active learning-based question answering method according to claim 1, wherein the step S3 further comprises: step S34: the categories are sorted by the number of questions in each category.
7. A question-answering system based on active learning, characterized by comprising a basic knowledge base module, a question-answering module, a category classification module, a correction module and an answer supplementing module, wherein:
the basic knowledge base module is used for constructing a basic knowledge base, and the basic knowledge base comprises known questions and answers corresponding to the known questions;
the question-answering module is used for answering customer questions based on the similarity detection model and constructing an unanswered question set, wherein the unanswered question set comprises user questions that users have reported as unanswered;
the category classification module is used for classifying the questions in the unanswered question set into categories, acquiring expert correction information, correcting the categories of the classified unanswered question set based on the expert correction information, and performing optimization training on the similarity detection model based on the category correction result;
the answer supplementing module is used for acquiring the answer corresponding to each category in the unanswered question set and adding the questions and their corresponding answers to the basic knowledge base.
8. The active learning-based question-answering system according to claim 7, wherein the specific steps by which the question-answering module answers user questions and constructs the unanswered question set comprise:
establishing a text similarity detection model;
acquiring the question to be answered from the user, and searching the basic knowledge base, through the text similarity detection model, for the known question with the highest similarity, whose answer is output as the answer;
constructing an unanswered question set, acquiring user feedback, and storing the corresponding question in the unanswered question set if the user feedback indicates that the question is unanswered.
9. An active learning-based question answering device comprising a memory, a processor and a computer program stored on the memory, the processor implementing the steps of the active learning-based question answering method according to any one of claims 1 to 6 when executing the computer program.
10. A non-transitory computer storage medium storing an executable program which, when executed by a processor, implements the steps of the active learning-based question-answering method according to any one of claims 1 to 6.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111541745.5A CN114385798A (en) | 2021-12-16 | 2021-12-16 | Question-answering method, system, equipment and medium based on active learning

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111541745.5A CN114385798A (en) | 2021-12-16 | 2021-12-16 | Question-answering method, system, equipment and medium based on active learning

Publications (1)

Publication Number | Publication Date
CN114385798A (en) | 2022-04-22

Family

ID=81197003

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111541745.5A (Pending) CN114385798A (en) | Question-answering method, system, equipment and medium based on active learning | 2021-12-16 | 2021-12-16

Country Status (1)

Country | Link
CN (1) | CN114385798A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117194602A (en) * 2023-09-06 2023-12-08 书音(上海)文化科技有限公司 Local knowledge base updating method and system based on large language model and BERT model
CN117194602B (en) * 2023-09-06 2024-04-19 书音(上海)文化科技有限公司 Local knowledge base updating method and system based on large language model and BERT model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination