CN106980652B

CN106980652B - Intelligent question and answer method and system

Info

Publication number: CN106980652B
Application number: CN201710123627.XA
Authority: CN
Inventors: 简仁贤; 叶茂; 杨亮
Original assignee: Emotibot Technologies Ltd
Current assignee: Emotibot Technologies Ltd
Priority date: 2017-03-03
Filing date: 2017-03-03
Publication date: 2020-09-08
Anticipated expiration: 2037-03-03
Also published as: CN106980652A

Abstract

The invention provides an intelligent question-answering method and system. The method comprises: acquiring the questions and corresponding answers stored in a knowledge base, and calculating positive semantic information and negative semantic information of the questions according to a semantic model; calculating the similarity between the keywords of a user question and the keywords of the questions in the knowledge base to obtain a first question set, and obtaining from the knowledge base the positive and negative semantic information of the questions in the first question set; calculating the positive and negative semantic information of the user question; removing the questions whose positive and negative semantic information is inconsistent with that of the user question to obtain a second question set; and randomly selecting one question in the second question set as the matching question, whose corresponding answer is the answer to the user question. According to the invention, the semantic information of the question asked by the user is calculated through the semantic model and compared with the questions and semantic information stored in the knowledge base; questions whose semantics are inconsistent with the user's question are removed, similar questions are matched, and accurate answers are given.

Description

Intelligent question and answer method and system

Technical Field

The invention relates to the technical fields of electrical digital data processing and artificial intelligence, and in particular to an intelligent question-answering method and system.

Background

In intelligent dialogue systems, answers are usually found by question matching. Questions and their corresponding answers are stored in a knowledge base; when a user asks a question A, a question B similar to question A is found in the knowledge base, and the answer to question B is returned to the user. The similarity of two questions is usually calculated by keyword comparison, i.e., based on the keywords of question A and question B. To increase recall, an exact keyword match is usually not required; however, this approach may introduce errors. The keywords extracted from two questions may be highly similar while the semantics of the two questions differ: question A may express a positive meaning and question B a negative one. If the semantics contained in the questions are ignored, the user is therefore likely to receive an inaccurate answer. For example, question A is "I like you" and question B is "I do not like you". The keywords of question A are "I", "like" and "you", and the keywords of question B are "I", "not", "like" and "you". Because questions A and B share three keywords, question B can enter the candidate set, yet the two questions have opposite semantics, one positive and one negative, so the answer returned is likely to be inaccurate.
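The false match described above can be reproduced in a few lines. The sketch below is a minimal illustration, not the method of the invention: it assumes the keywords have already been extracted, and it reads the similarity measure used later in this description ("number of identical keywords / total number of keywords") as the number of shared keywords divided by the number of distinct keywords in both questions. The function name `keyword_similarity` is hypothetical.

```python
def keyword_similarity(keywords_a, keywords_b):
    """Shared keywords divided by distinct keywords across both questions.

    Assumption: "total number of keywords" is read as the size of the union.
    """
    a, b = set(keywords_a), set(keywords_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


question_a = ["I", "like", "you"]          # "I like you"
question_b = ["I", "not", "like", "you"]   # "I do not like you"

# Three of the four distinct keywords match, so B looks highly similar to A
# even though it expresses the opposite (negative) meaning.
print(keyword_similarity(question_a, question_b))  # 0.75
```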

Therefore, the prior art has the defect that an intelligent dialogue system cannot analyze questions according to the different semantics they contain, so the answers it gives are inaccurate.

Disclosure of Invention

To address these technical problems, the invention provides an intelligent question-answering method and system. The semantic information of a question asked by a user is calculated through a semantic model; based on the questions and semantic information stored in the knowledge base, the questions whose semantics are inconsistent with the user's question are removed, a question similar to the user's question is then matched, and an accurate answer is given.

In order to solve the technical problems, the technical scheme provided by the invention is as follows:

In a first aspect, the present invention provides an intelligent question-answering method, comprising:

step S1, acquiring a knowledge base in which questions and corresponding answers are stored, calculating positive semantic information and negative semantic information of the questions according to a pre-established semantic model, and storing the positive semantic information and the negative semantic information in the knowledge base;

step S2, obtaining a user question, and calculating the keywords of the user question;

step S3, calculating the similarity between the keywords of the user question and the keywords of the questions in the knowledge base, obtaining from the knowledge base a first question set whose similarity reaches a preset threshold, and obtaining from the knowledge base the positive semantic information and negative semantic information of the questions in the first question set;

step S4, calculating, through the semantic model, the positive semantic information and negative semantic information of the user question;

step S5, comparing the positive semantic information and negative semantic information of the user question with the positive semantic information and negative semantic information of the questions in the first question set, and removing the questions whose semantic information is inconsistent with that of the user question to obtain a second question set;

and step S6, randomly selecting one question from the second question set as the matching question, the answer corresponding to the matching question being the answer to the user question.
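Steps S1 to S6 can be sketched as a single function. This is a minimal Python sketch under stated assumptions, not the patented implementation: keyword extraction and the semantic model are passed in as functions (`toy_keywords` and `toy_polarity` below are hypothetical stand-ins, not the models described here), each knowledge-base entry carries a single "positive"/"negative" label precomputed in step S1, similarity is shared keywords over distinct keywords, and the 60% threshold is taken from the embodiment. All names (`Entry`, `similarity`, `answer_question`) are illustrative.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Entry:
    question: str
    answer: str
    keywords: List[str]
    polarity: str  # "positive" or "negative", precomputed in step S1


def similarity(a: List[str], b: List[str]) -> float:
    """Shared keywords / distinct keywords (an assumed reading of the measure)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa or sb) else 0.0


def answer_question(user_question: str,
                    knowledge_base: List[Entry],
                    extract_keywords: Callable[[str], List[str]],
                    polarity_of: Callable[[str], str],
                    threshold: float = 0.6,
                    rng: Optional[random.Random] = None) -> Optional[str]:
    rng = rng or random.Random()
    keywords = extract_keywords(user_question)                       # S2
    first_set = [e for e in knowledge_base                           # S3
                 if similarity(keywords, e.keywords) >= threshold]
    user_polarity = polarity_of(user_question)                       # S4
    second_set = [e for e in first_set
                  if e.polarity == user_polarity]                    # S5
    if not second_set:
        return None  # no semantically consistent candidate
    return rng.choice(second_set).answer                             # S6


def toy_keywords(q: str) -> List[str]:
    # Hypothetical stand-in for step S2: whitespace split minus "do".
    return [w for w in q.split() if w.lower() != "do"]


def toy_polarity(q: str) -> str:
    # Hypothetical stand-in for the semantic model: "not" means negative.
    return "negative" if "not" in q.split() else "positive"


kb = [Entry(q, a, toy_keywords(q), toy_polarity(q)) for q, a in [
    ("I like you", "I also like you"),
    ("I do not like you", "(hypothetical answer)"),
]]
print(answer_question("I like you", kb, toy_keywords, toy_polarity))
# I also like you
```

Injecting the keyword extractor and the polarity function keeps the filtering logic independent of how the semantic model is built; passing a seeded `random.Random` makes step S6 reproducible in tests.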

The technical scheme of the intelligent question-answering method is as follows: a knowledge base is acquired, in which questions and corresponding answers are stored; positive semantic information and negative semantic information of the questions are calculated according to a pre-established semantic model and stored in the knowledge base; a user question is acquired and its keywords are calculated; the similarity between the keywords of the user question and the keywords of the questions in the knowledge base is calculated, a first question set whose similarity reaches a preset threshold is obtained from the knowledge base, and the positive and negative semantic information of the questions in the first question set is obtained from the knowledge base;

the positive and negative semantic information of the user question is calculated through the semantic model; this information is compared with the positive and negative semantic information of the questions in the first question set, and the questions whose semantic information is inconsistent with that of the user question are removed to obtain a second question set; and a question is randomly selected from the second question set as the matching question, whose corresponding answer is the answer to the user question.

In the intelligent question-answering method of the invention, the semantic information of the question asked by the user is calculated through a semantic model. Based on the questions and semantic information stored in the knowledge base, a first question set is obtained according to keyword similarity; the questions whose semantics are inconsistent with the user's question are then removed from the first question set to obtain a second question set; a similar question is matched in the second question set, and an accurate answer is given.

Further, the establishment of the semantic model specifically comprises:

acquiring a training corpus, wherein the training corpus comprises sentences and the positive marks and negative marks of the sentences;

and training on the training corpus with a maximum entropy model to obtain the semantic model.

Further, training on the training corpus with the maximum entropy model to obtain the semantic model specifically comprises:

acquiring features from the training corpus, wherein the features are feature sequences obtained from the sentences and the positive and negative marks of the sentences;

and training on the feature sequences to obtain the semantic model.

Further, the features comprise unary features, binary features and the number of negative words in the sentence; the unary features are the feature sequence formed by each character in the sentence, and the binary features are the feature sequence formed by each pair of adjacent characters (a character and the character following it) in the sentence.
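The three feature types can be sketched for a Chinese sentence, where "characters" are individual hanzi. This sketch assumes that the binary features are adjacent-character bigrams and that negative words are counted from a small lexicon; the lexicon `NEGATION_WORDS` and the feature-string prefixes (`U=`, `B=`, `NEG=`) are hypothetical, since the description does not list them. The example sentence 我不喜欢你 means "I do not like you".

```python
from typing import List

# Hypothetical negation lexicon; the description does not list the words used.
NEGATION_WORDS = ("不", "没", "别", "无")


def extract_features(sentence: str, negation_words=NEGATION_WORDS) -> List[str]:
    chars = list(sentence)
    # Unary features: one feature per character.
    unigrams = ["U=" + c for c in chars]
    # Binary features: each pair of adjacent characters.
    bigrams = ["B=" + a + b for a, b in zip(chars, chars[1:])]
    # Number of negative words found in the sentence.
    neg_count = sum(sentence.count(w) for w in negation_words)
    return unigrams + bigrams + ["NEG=" + str(neg_count)]


print(extract_features("我不喜欢你"))
# ['U=我', 'U=不', 'U=喜', 'U=欢', 'U=你', 'B=我不', 'B=不喜', 'B=喜欢', 'B=欢你', 'NEG=1']
```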

In a second aspect, the present invention provides an intelligent question-answering system, comprising:

the knowledge base acquisition module is configured to acquire a knowledge base in which questions and corresponding answers are stored, calculate positive semantic information and negative semantic information of the questions according to a pre-established semantic model, and store the positive semantic information and the negative semantic information in the knowledge base;

the keyword acquisition module is configured to acquire a user question and calculate the keywords of the user question;

the first question set module is configured to calculate the similarity between the keywords of the user question and the keywords of the questions in the knowledge base, obtain from the knowledge base a first question set whose similarity reaches a preset threshold, and obtain from the knowledge base the positive semantic information and negative semantic information of the questions in the first question set;

the semantic model calculation module is configured to calculate, through the semantic model, the positive semantic information and negative semantic information of the user question;

the second question set module is configured to compare the positive and negative semantic information of the user question with the positive and negative semantic information of the questions in the first question set, and remove the questions whose semantic information is inconsistent with that of the user question, so as to obtain a second question set;

and the answer obtaining module is configured to randomly select a question from the second question set as the matching question, wherein the answer corresponding to the matching question is the answer to the user question.

The technical scheme of the intelligent question-answering system is as follows: first, the knowledge base acquisition module acquires a knowledge base in which questions and corresponding answers are stored, calculates positive semantic information and negative semantic information of the questions according to a pre-established semantic model, and stores them in the knowledge base; then, the keyword acquisition module acquires a user question and calculates its keywords; then, the first question set module calculates the similarity between the keywords of the user question and the keywords of the questions in the knowledge base, obtains from the knowledge base a first question set whose similarity reaches a preset threshold, and obtains from the knowledge base the positive and negative semantic information of the questions in the first question set;

then, the semantic model calculation module calculates, through the semantic model, the positive and negative semantic information of the user question; then, the second question set module compares the positive and negative semantic information of the user question with that of the questions in the first question set, and removes the questions whose semantic information is inconsistent with that of the user question to obtain a second question set; and finally, the answer obtaining module randomly selects a question from the second question set as the matching question, wherein the answer corresponding to the matching question is the answer to the user question.

The intelligent question-answering system of the invention calculates the semantic information of the question asked by the user through a semantic model. Based on the questions and semantic information stored in the knowledge base, it first obtains a first question set according to keyword similarity, then removes the questions whose semantics are inconsistent with the user's question from the first question set to obtain a second question set, matches a similar question in the second question set, and gives an accurate answer.

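Further, the system also comprises a semantic model establishing module, which is configured for:

acquiring a training corpus, wherein the training corpus comprises sentences and the positive marks and negative marks of the sentences; and training on the training corpus with a maximum entropy model to obtain the semantic model.

Further, the semantic model establishing module is specifically configured for:

acquiring features from the training corpus, wherein the features are feature sequences obtained from the sentences and their positive and negative marks;

and training on the features to obtain the semantic model.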

Drawings

To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in the detailed description are briefly introduced below.

Fig. 1 is a flowchart illustrating an intelligent question answering method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram illustrating an intelligent question answering system according to an embodiment of the present invention.

Detailed Description

Embodiments of the present invention are described in detail below with reference to the accompanying drawings. The following embodiments are provided only to illustrate the technical solutions of the present invention more clearly; they are merely examples and do not limit the scope of protection of the present invention.

Example one

Fig. 1 is a flowchart of an intelligent question-answering method according to an embodiment of the present invention. As shown in Fig. 1, the intelligent question-answering method provided by Example one comprises:

step S1, acquiring a knowledge base in which questions and corresponding answers are stored, calculating positive semantic information and negative semantic information of the questions according to a pre-established semantic model, and storing the positive semantic information and the negative semantic information in the knowledge base;

step S2, obtaining a user question and calculating the keywords of the user question;

There are two methods for calculating the keywords of the user question. The first method is:

performing word segmentation and part-of-speech tagging on the user question to obtain specified words;

and taking the specified words as the keywords,

wherein the specified words comprise verbs, nouns and personal pronouns.
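The first method reduces to a part-of-speech filter once a segmenter and tagger have run. The sketch below assumes that their output is available as (word, tag) pairs and that tags follow a jieba-like convention in which nouns start with "n", verbs with "v" and pronouns are "r"; treating every "r" tag as a personal pronoun is a simplification. The function name `keywords_by_pos` is hypothetical.

```python
from typing import List, Tuple

# Assumed POS tags (jieba-like): nouns start with "n", verbs with "v",
# pronouns with "r". Personal pronouns are approximated by the "r" tag.
KEEP_PREFIXES = ("n", "v", "r")


def keywords_by_pos(tagged: List[Tuple[str, str]]) -> List[str]:
    """Keep the words whose tag marks a noun, verb or pronoun."""
    return [word for word, tag in tagged if tag.startswith(KEEP_PREFIXES)]


# Hypothetical tagger output for 我不喜欢你 ("I do not like you").
tagged = [("我", "r"), ("不", "d"), ("喜欢", "v"), ("你", "r")]
print(keywords_by_pos(tagged))  # ['我', '喜欢', '你']
```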

The second method is:

obtaining the word segmentation result, the parts of speech and the dependency syntax of the user question;

performing analysis according to the word segmentation result, the parts of speech and the dependency syntax to obtain an analysis result;

and extracting features according to the analysis result, training a maximum entropy model, and labeling the keywords with the maximum entropy model.
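For the second method, the description does not specify the feature template. The sketch below shows one plausible, hypothetical template that turns the segmentation, part of speech and dependency parse of each token into features for a maximum entropy classifier that labels the token as keyword or non-keyword. It assumes LTP-style dependency labels (`SBV`, `ADV`, `HED`, `VOB`) and 0-based head indices with -1 for the root; `Token` and `token_features` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Token:
    word: str
    pos: str
    head: int     # index of the syntactic head; -1 marks the root (assumed)
    deprel: str   # dependency relation label (assumed LTP-style)


def token_features(tokens: List[Token], i: int) -> Dict[str, str]:
    """Hypothetical feature template for labeling token i as keyword or not."""
    t = tokens[i]
    head_pos = tokens[t.head].pos if t.head >= 0 else "ROOT"
    prev_pos = tokens[i - 1].pos if i > 0 else "BOS"
    next_pos = tokens[i + 1].pos if i + 1 < len(tokens) else "EOS"
    return {
        "word": t.word,
        "pos": t.pos,
        "deprel": t.deprel,
        "head_pos": head_pos,
        "prev_pos": prev_pos,
        "next_pos": next_pos,
    }


# Hypothetical parse of 我 不 喜欢 你 ("I do not like you").
sentence = [Token("我", "r", 2, "SBV"), Token("不", "d", 2, "ADV"),
            Token("喜欢", "v", -1, "HED"), Token("你", "r", 2, "VOB")]
print(token_features(sentence, 0))
# {'word': '我', 'pos': 'r', 'deprel': 'SBV', 'head_pos': 'v', 'prev_pos': 'BOS', 'next_pos': 'd'}
```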

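step S3, calculating the similarity between the keywords of the user question and the keywords of the questions in the knowledge base, obtaining from the knowledge base a first question set whose similarity reaches a preset threshold, and obtaining from the knowledge base the positive semantic information and negative semantic information of the questions in the first question set;

step S4, calculating, through the semantic model, the positive semantic information and negative semantic information of the user question;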

step S5, comparing the positive semantic information and negative semantic information of the user question with the positive semantic information and negative semantic information of the questions in the first question set, and removing the questions whose semantic information is inconsistent with that of the user question to obtain a second question set;

step S6, randomly selecting one question from the second question set as the matching question, the answer corresponding to the matching question being the answer to the user question.

The technical scheme of the intelligent question-answering method is as follows: a knowledge base is acquired, in which questions and corresponding answers are stored; positive semantic information and negative semantic information of the questions are calculated according to a pre-established semantic model and stored in the knowledge base; a user question is acquired and its keywords are calculated; the similarity between the keywords of the user question and the keywords of the questions in the knowledge base is calculated, a first question set whose similarity reaches a preset threshold is obtained from the knowledge base, and the positive and negative semantic information of the questions in the first question set is obtained from the knowledge base;

the positive and negative semantic information of the user question is calculated through the semantic model; this information is compared with the positive and negative semantic information of the questions in the first question set, and the questions whose semantic information is inconsistent with that of the user question are removed to obtain a second question set; and a question is randomly selected from the second question set as the matching question, whose corresponding answer is the answer to the user question.

It should be noted that the knowledge base stores a large number of questions and their corresponding answers; only when enough questions are stored can a more accurate answer be provided to the user.

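Specifically, the establishment of the semantic model comprises:

acquiring a training corpus, wherein the training corpus comprises sentences and the positive marks and negative marks of the sentences;

and training on the training corpus with the maximum entropy model to obtain the semantic model.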

Specifically, training on the training corpus with the maximum entropy model to obtain the semantic model comprises:

acquiring features from the training corpus, wherein the features are feature sequences obtained from the sentences and their positive and negative marks;

and training on the feature sequences to obtain the semantic model.

The semantic features in the training corpus are extracted and trained to obtain the semantic model. The maximum entropy model is chosen because: during modeling, the modeler only needs to focus on selecting features, without spending effort on how the features are to be used; feature selection is flexible and requires no additional independence assumptions or internal constraints; the model ports well to different domains; and richer information can be incorporated. Therefore, the maximum entropy model is selected to train on the training corpus to obtain the semantic model.
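For two classes (positive and negative), a conditional maximum entropy model over indicator features has the same form as logistic regression, so training can be sketched with plain stochastic gradient ascent on the log-likelihood. This is a minimal sketch, not the training procedure of the invention: the toy corpus, the learning rate and the number of epochs are hypothetical, no regularization is used, and the features are only character unigrams and bigrams.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def features(sentence: str) -> List[str]:
    """Character unigram and bigram indicator features."""
    chars = list(sentence)
    return (["U=" + c for c in chars]
            + ["B=" + a + b for a, b in zip(chars, chars[1:])])


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Dict[str, float], feats: List[str]) -> float:
    """P(label = negative | features) under the binary maximum entropy model."""
    return sigmoid(w.get("<bias>", 0.0) + sum(w.get(f, 0.0) for f in feats))


def train_maxent(data: List[Tuple[List[str], int]],
                 epochs: int = 300, lr: float = 0.2) -> Dict[str, float]:
    """Fit the weights by stochastic gradient ascent on the log-likelihood."""
    w: Dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for feats, label in data:          # label: 1 = negative, 0 = positive
            g = label - predict_proba(w, feats)   # observed minus expected
            for f in feats:
                w[f] += lr * g
            w["<bias>"] += lr * g
    return dict(w)


# Hypothetical toy corpus: (sentence, 1 if negative else 0).
corpus = [("我喜欢你", 0), ("我不喜欢你", 1), ("他很好", 0),
          ("他不好", 1), ("我爱你", 0), ("我不爱你", 1)]
weights = train_maxent([(features(s), y) for s, y in corpus])
print(predict_proba(weights, features("我不喜欢你")))
```

A production model would normally add a Gaussian (L2) prior and a quasi-Newton optimizer such as L-BFGS; the plain update loop is kept here so that the correspondence between the maximum entropy gradient (observed minus expected feature counts) and the code is easy to see.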

Specifically, the features comprise unary features, binary features and the number of negative words in the sentence; the unary features are the feature sequence formed by each character in the sentence, and the binary features are the feature sequence formed by each pair of adjacent characters (a character and the character following it) in the sentence.

Specifically, the preset threshold is 60%. Verification showed that when the preset threshold is 60%, i.e., a question is kept when its similarity reaches 60%, the questions in the first question set obtained from the knowledge base are similar to the question asked by the user.

Fig. 2 is a schematic diagram of an intelligent question-answering system according to an embodiment of the present invention. As shown in Fig. 2, the intelligent question-answering system 10 of this embodiment comprises:

a knowledge base acquisition module 101, configured to acquire a knowledge base in which questions and corresponding answers are stored, calculate positive semantic information and negative semantic information of the questions according to a pre-established semantic model, and store the positive semantic information and the negative semantic information in the knowledge base;

a keyword acquisition module 102, configured to acquire a user question and calculate the keywords of the user question;

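there are two methods for calculating the keywords of the user question. The first method is: performing word segmentation and part-of-speech tagging on the user question to obtain specified words, and taking the specified words as the keywords,

wherein the specified words comprise verbs, nouns and personal pronouns;

the other method is: obtaining the word segmentation result, the parts of speech and the dependency syntax of the user question; performing analysis according to them to obtain an analysis result; and extracting features according to the analysis result, training a maximum entropy model, and labeling the keywords with the maximum entropy model;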

a first question set module 103, configured to calculate the similarity between the keywords of the user question and the keywords of the questions in the knowledge base, obtain from the knowledge base a first question set whose similarity reaches a preset threshold, and obtain from the knowledge base the positive semantic information and negative semantic information of the questions in the first question set;

a semantic model calculation module 104, configured to calculate, through the semantic model, the positive semantic information and negative semantic information of the user question;

a second question set module 105, configured to compare the positive and negative semantic information of the user question with the positive and negative semantic information of the questions in the first question set, and remove the questions whose semantic information is inconsistent with that of the user question, so as to obtain a second question set;

and an answer obtaining module 106, configured to randomly select a question from the second question set as the matching question, wherein the answer corresponding to the matching question is the answer to the user question.

The technical scheme of the intelligent question-answering system 10 of the invention is as follows: first, the knowledge base acquisition module 101 acquires a knowledge base in which questions and corresponding answers are stored, calculates positive semantic information and negative semantic information of the questions according to a pre-established semantic model, and stores them in the knowledge base; then, the keyword acquisition module 102 acquires a user question and calculates its keywords; then, the first question set module 103 calculates the similarity between the keywords of the user question and the keywords of the questions in the knowledge base, obtains from the knowledge base a first question set whose similarity reaches a preset threshold, and obtains from the knowledge base the positive and negative semantic information of the questions in the first question set;

then, the semantic model calculation module 104 calculates, through the semantic model, the positive and negative semantic information of the user question; then, the second question set module 105 compares the positive and negative semantic information of the user question with that of the questions in the first question set, and removes the questions whose semantic information is inconsistent with that of the user question to obtain a second question set; and finally, the answer obtaining module 106 randomly selects a question from the second question set as the matching question, wherein the answer corresponding to the matching question is the answer to the user question.

The intelligent question-answering system 10 of the invention calculates the semantic information of the question asked by the user through a semantic model. Based on the questions and semantic information stored in the knowledge base, it first obtains a first question set according to keyword similarity, then removes the questions whose semantics are inconsistent with the user's question from the first question set to obtain a second question set, matches a similar question in the second question set, and gives an accurate answer.

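Specifically, the system further comprises a semantic model establishing module, configured for:

acquiring a training corpus, wherein the training corpus comprises sentences and the positive marks and negative marks of the sentences; and training on the training corpus with a maximum entropy model to obtain the semantic model.

Specifically, the semantic model establishing module is specifically configured for:

acquiring features from the training corpus, wherein the features are feature sequences obtained from the sentences and their positive and negative marks;

and training on the features to obtain the semantic model.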

The semantic features in the training corpus are extracted and trained through a maximum entropy model to obtain the semantic model. The maximum entropy model is chosen because: during modeling, the modeler only needs to focus on selecting features, without spending effort on how the features are to be used; feature selection is flexible and requires no additional independence assumptions or internal constraints; the model ports well to different domains; and richer information can be incorporated. Therefore, the maximum entropy model is selected to train on the training corpus to obtain the semantic model.

Example two

Based on the intelligent question-answering method and the intelligent question-answering system 10 of Example one, the intelligent question-answering process is described in detail below:

1. Add the questions (queries) and their answers to the knowledge base,

for example:

2. Index the queries by their keywords, and at the same time calculate the positive and negative semantic information of each query with the semantic model M and store it in the knowledge base;

the keywords and the positive and negative semantic information of the queries are as follows:

3. Calculate the keywords of the user's query A, "I like you"; the keywords are "I", "like" and "you";

4. According to the keywords, obtain from the knowledge base the query set CQS1 (the first question set) whose similarity is in the top n (the preset threshold). The similarity between query A and the queries in the knowledge base (the number of identical keywords divided by the total number of keywords) is calculated as follows:

5. According to the keywords, obtain from the knowledge base the similar query set CQS1 (similarity above 60%), and at the same time obtain from the knowledge base the positive and negative semantic information of these queries. The queries in CQS1, their similarity (the number of identical keywords divided by the total number of keywords) and their positive/negative semantic information are as follows:

Similar query        Similarity    Positive/negative semantics
I like you           100%          positive
I do not like you    75%           negative

6. Using the semantic model M, determine that the user's query A, "I like you", is positive;

7. Using the positive/negative semantic information of query A, filter out of the set CQS1 the queries whose positive/negative semantic information is inconsistent with it, to obtain the set CQS2 (the second question set). The set CQS2 is as follows:

Similar query        Positive/negative semantics
I like you           positive

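8. The answer corresponding to the question in the set CQS2 is returned to the user (CQS2 contains only one question here, so the random selection of step S6 has a single candidate), and the user receives the answer "I also like you".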
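Steps 7 and 8 can be replayed on the table values of this example. This sketch only uses data stated in Example two (the two CQS1 rows, the positive label of query A and the answer "I also like you"); the variable names are illustrative, and `random.choice` stands in for the random selection of step S6.

```python
import random

# Rows of CQS1 from step 5: (query, similarity, positive/negative semantics).
cqs1 = [
    ("I like you", 1.00, "positive"),
    ("I do not like you", 0.75, "negative"),
]
user_polarity = "positive"   # step 6: semantic model M labels query A positive

# Step 7: keep only the queries whose semantics agree with query A.
cqs2 = [row for row in cqs1 if row[2] == user_polarity]

# Step 8 / step S6: pick one remaining query at random (only one remains here).
answers = {"I like you": "I also like you"}
match = random.choice(cqs2)[0]
print(cqs2)             # [('I like you', 1.0, 'positive')]
print(answers[match])   # I also like you
```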

The intelligent question-answering method and system of the invention can answer questions intelligently and provide users with more accurate answers according to the positive and negative semantics of the questions.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims

1. An intelligent question-answering method, characterized by comprising the following steps:

wherein the establishment of the semantic model specifically comprises:

training on the training corpus with a maximum entropy model to obtain the semantic model;

the method for calculating the keywords of the user question comprises:

extracting features according to the analysis result, training a maximum entropy model, and labeling the keywords with the maximum entropy model;

2. The intelligent question-answering method according to claim 1,

training on the training corpus with the maximum entropy model to obtain the semantic model specifically comprises:

and training on the feature sequences to obtain the semantic model.

3. The intelligent question-answering method according to claim 2,

the features comprise unary features, binary features and the number of negative words in the sentence, wherein the unary features are the feature sequence formed by each character in the sentence, and the binary features are the feature sequence formed by each pair of adjacent characters in the sentence.

4. An intelligent question-answering system, comprising:

the keyword acquisition module is used for acquiring a user question and calculating a keyword of the user question; the method for calculating the keywords of the user question comprises the following steps:

an answer obtaining module, configured to randomly obtain a question as a matching question according to the second question set, where an answer corresponding to the matching question is an answer to the user question;

the system also comprises a semantic model establishing module used for:

5. The intelligent question-answering system according to claim 4,

the semantic model establishing module is specifically configured to:

and training the features to obtain a semantic model.

6. The intelligent question-answering system according to claim 5,