CN106874441B - Intelligent question-answering method and device - Google Patents

Info

Publication number
CN106874441B
Authority
CN
China
Prior art keywords
question
credibility
keyword set
answer
answering system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710066973.9A
Other languages
Chinese (zh)
Other versions
CN106874441A (en
Inventor
金星明
李鹏
罗斌
吴永坚
李科
黄飞跃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shanghai Co Ltd
Original Assignee
Tencent Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shanghai Co Ltd
Priority to CN201710066973.9A
Publication of CN106874441A
Application granted
Publication of CN106874441B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems

Abstract

The invention relates to an intelligent question-answering method and device. The method comprises the following steps: acquiring a question to be answered; sending the question to be answered to both a question-answering system based on frequently asked questions (FAQ) and a question-answering system based on a knowledge base; acquiring the candidate answer and corresponding credibility with which the FAQ-based question-answering system responds to the question to be answered, and acquiring the candidate answer and corresponding credibility with which the knowledge-base-based question-answering system responds to the question to be answered; obtaining the highest of these credibilities and comparing it with a credibility threshold; and, if the highest credibility is greater than or equal to the credibility threshold, taking the candidate answer corresponding to the highest credibility as the answer to the question to be answered. Because the credibilities of the answers obtained from two different question-answering systems are compared, the answer obtained for the question to be answered is highly accurate.

Description

Intelligent question-answering method and device
Technical Field
The invention relates to the field of data processing, in particular to an intelligent question-answering method and device.
Background
Automatic question-answering systems are typically built on FAQ (Frequently Asked Questions) data accumulated over time in a limited domain, and are therefore limited by the completeness of the FAQ dataset: the more FAQ data there is, the more types and numbers of questions the system can answer, and vice versa. In domains where little or no data has accumulated, the answers given are less accurate.
Disclosure of Invention
Based on the above, it is necessary to provide an intelligent question-answering method and device for solving the problem of inaccurate question answering of the traditional FAQ system.
An intelligent question-answering method, comprising:
acquiring a question to be answered;
sending the question to be answered to both an FAQ-based question-answering system and a knowledge-base-based question-answering system;
acquiring the candidate answer and corresponding credibility with which the FAQ-based question-answering system responds to the question to be answered, and acquiring the candidate answer and corresponding credibility with which the knowledge-base-based question-answering system responds to the question to be answered;
obtaining the highest of the credibilities, and comparing the highest credibility with a credibility threshold;
and, if the highest credibility is greater than or equal to the credibility threshold, taking the candidate answer corresponding to the highest credibility as the answer to the question to be answered.
An intelligent question-answering device, comprising:
a question acquisition module, configured to acquire a question to be answered;
a sending module, configured to send the question to be answered to both an FAQ-based question-answering system and a knowledge-base-based question-answering system;
a candidate answer acquisition module, configured to acquire the candidate answer and corresponding credibility with which the FAQ-based question-answering system responds to the question to be answered, and to acquire the candidate answer and corresponding credibility with which the knowledge-base-based question-answering system responds to the question to be answered;
a comparison module, configured to obtain the highest of the credibilities and compare the highest credibility with a credibility threshold;
and an answer determining module, configured to take the candidate answer corresponding to the highest credibility as the answer to the question to be answered if the highest credibility is greater than or equal to the credibility threshold.
According to the intelligent question-answering method and device, the question to be answered is sent to both the FAQ-based question-answering system and the knowledge-base-based question-answering system; the candidate answers and corresponding credibilities fed back by the two systems are obtained, and the highest credibility is selected; if the highest credibility is greater than or equal to the credibility threshold, the candidate answer corresponding to the highest credibility is used as the answer to the question to be answered. Because the credibilities of the answers obtained from two different question-answering systems are compared, the answer obtained for the question to be answered is highly accurate.
Drawings
FIG. 1 is a schematic diagram of an application environment of an intelligent question-answering method in one embodiment;
FIG. 2 is a schematic diagram of an internal structure of a server in one embodiment;
FIG. 3 is a flow chart of an intelligent question-answering method in one embodiment;
FIG. 4 is a flowchart, in one embodiment, of acquiring the candidate answer and corresponding credibility with which the FAQ-based question-answering system responds to the question to be answered;
FIG. 5 is a flowchart, in one embodiment, of acquiring the candidate answer and corresponding credibility with which the knowledge-base-based question-answering system responds to the question to be answered;
FIG. 6 is a block diagram of the intelligent question-answering apparatus in one embodiment;
FIG. 7 is a block diagram of the structure of the intelligent question-answering apparatus in one embodiment.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
It will be understood that the terms first, second, etc. as used herein may be used to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another element. For example, a first client may be referred to as a second client, and similarly, a second client may be referred to as a first client, without departing from the scope of the invention. Both the first client and the second client are clients, but they are not the same client.
Fig. 1 is a schematic view of an application environment of the intelligent question-answering method in one embodiment. As shown in fig. 1, the application environment includes a terminal 110 and a server 120. The terminal 110 communicates with the server 120 in a session. The server 120 includes a session manager, an FAQ-based question-answering system, and a knowledge-base-based question-answering system. The session manager acquires the question to be answered and sends it to both the FAQ-based question-answering system and the knowledge-base-based question-answering system; obtains the answer and corresponding credibility returned by each system; selects the highest credibility and compares it with a credibility threshold; and, if the highest credibility is greater than or equal to the credibility threshold, takes the corresponding answer as the answer to the question to be answered.
FIG. 2 is a schematic diagram of an internal structure of a server (or cloud, etc.) in one embodiment. As shown in fig. 2, the server includes a processor, a nonvolatile storage medium, an internal memory, and a network interface connected by a system bus. The non-volatile storage medium of the server is stored with an operating system, a database and an intelligent question-answering device, the database is stored with a question-answering system based on common question solutions and a question-answering system based on a knowledge base, and the intelligent question-answering device is used for realizing an intelligent question-answering method suitable for the server. The processor of the server is used to provide computing and control capabilities, supporting the operation of the entire server. The internal memory of the server provides an environment for the operation of the intelligent question-answering device in the non-volatile storage medium, and the internal memory can store computer readable instructions which, when executed by the processor, can cause the processor to execute an intelligent question-answering method. The network interface of the server is used for communicating with an external terminal through network connection, such as receiving a to-be-solved question sent by the terminal, returning an answer to the terminal, and the like. The server may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers. It will be appreciated by those skilled in the art that the structure shown in fig. 2 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the server to which the present application applies, and that a particular server may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
Fig. 3 is a flow chart of a method of intelligent question answering in one embodiment. As shown in fig. 3, an intelligent question-answering method includes:
Step 302, acquiring the question to be answered.
In this embodiment, the question to be answered is a question on which the user seeks advice. It may be entered through a web page entry, through an application (App), or the like. Its format may be at least one of voice, text, picture, and so on.
If a web-based consultation entry is provided, a web session window is opened and the question to be answered is entered in that window.
If an application entry is provided, an application session window is opened and the question to be answered is entered in that window.
Step 304, sending the question to be answered to both the FAQ-based question-answering system and the knowledge-base-based question-answering system.
In this embodiment, the question-answering system based on frequently asked questions is an FAQ-based question-answering system, that is, a system that answers frequently asked questions. A frequently asked question is a question that has been raised more than a threshold number of times. The threshold may be set as required, for example 100 or 10. The FAQ answers are the answers to those frequently asked questions.
A knowledge base is a structured collection of knowledge within a limited domain.
Step 306, acquiring the candidate answer and corresponding credibility with which the FAQ-based question-answering system responds to the question to be answered, and acquiring the candidate answer and corresponding credibility with which the knowledge-base-based question-answering system responds to the question to be answered.
In this embodiment, the FAQ-based question-answering system searches for the question to be answered to obtain a corresponding candidate answer and calculates the credibility of that candidate answer. The knowledge-base-based question-answering system performs semantic analysis on the question to be answered, matches the analyzed question against the knowledge base to obtain a corresponding candidate answer, and calculates the credibility of that candidate answer. For the FAQ-based system, a similarity value may be computed with a text similarity measure and normalized to the range 0 to 1; the normalized value serves as the credibility, with 1 being the most credible. For the knowledge-base-based system, the credibility is 1 if an answer exists in the knowledge base and 0 otherwise.
Step 308, obtaining the highest of the credibilities, and comparing the highest credibility with a credibility threshold.
Step 310, if the highest credibility is greater than or equal to the credibility threshold, taking the candidate answer corresponding to the highest credibility as the answer to the question to be answered.
In this embodiment, the credibility of the candidate answer from the FAQ-based question-answering system is compared with the credibility of the candidate answer from the knowledge-base-based question-answering system to obtain the highest credibility. The highest credibility is then compared with the credibility threshold, and if it is greater than or equal to the threshold, the candidate answer corresponding to the highest credibility is used as the answer to the question to be answered.
The credibility threshold is the minimum value that the credibility must reach. If the credibility is greater than or equal to the credibility threshold, the answer is considered credible; otherwise it is not.
According to the intelligent question-answering method, the question to be answered is sent to both the FAQ-based question-answering system and the knowledge-base-based question-answering system; the candidate answers and corresponding credibilities fed back by the two systems are obtained and the highest credibility is selected; if the highest credibility is greater than or equal to the credibility threshold, the corresponding candidate answer is used as the answer to the question to be answered. Because the credibilities of the answers obtained from two different question-answering systems are compared, the answer obtained is highly accurate. In addition, for complex questions, answers can be obtained through the retrieval and statistics of the FAQ-based question-answering system, so that they are found quickly and labor is saved; for simple questions, more accurate answers can be obtained thanks to the precision of the knowledge-base-based question-answering system. Furthermore, the method on the one hand effectively alleviates the problem of insufficient FAQ question sets in many limited domains, and on the other hand effectively reduces the implementation complexity of the knowledge-base-based question-answering system and the cost of building a knowledge-base question-answering system for complex questions.
In one embodiment, the intelligent question-answering method further includes: if the highest credibility is smaller than the credibility threshold, acquiring a manual answer and taking the manual answer as the answer to the question to be answered.
In this embodiment, a highest credibility below the credibility threshold indicates that none of the candidate answers is credible, so a manual answer is requested. The manual answer is used as the answer to the question to be answered, which ensures that the question receives an answer.
In one embodiment, the intelligent question-answering method further includes: updating the question to be answered and the corresponding manual answer into the FAQ-based question-answering system.
In this embodiment, the question to be answered and the corresponding manual answer are added to the FAQ-based question-answering system, so that the next time the same or a similar question is encountered it can be answered by the FAQ-based question-answering system, which reduces labor cost.
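The overall flow of steps 302 to 310, together with the manual fallback and FAQ update, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `answer_question`, the callable interfaces `faq_system`, `kb_system` and `ask_human`, the dictionary `faq_store`, and the tie-breaking rule are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

# (answer text or None, credibility in [0, 1]); this shape is an assumption.
Candidate = Tuple[Optional[str], float]


def answer_question(
    question: str,
    faq_system: Callable[[str], Candidate],
    kb_system: Callable[[str], Candidate],
    threshold: float,
    ask_human: Callable[[str], str],
    faq_store: Dict[str, str],
) -> str:
    """Return the most credible automatic answer, or fall back to a human."""
    # Send the question to both systems and collect their candidates.
    candidates = [faq_system(question), kb_system(question)]
    # max() keeps the first candidate on ties, so the FAQ answer wins a tie
    # (a choice made for this sketch only).
    best_answer, best_credibility = max(candidates, key=lambda c: c[1])
    if best_answer is not None and best_credibility >= threshold:
        return best_answer
    # No credible candidate: obtain a manual answer and add it to the FAQ data.
    manual_answer = ask_human(question)
    faq_store[question] = manual_answer
    return manual_answer
```

For instance, with a threshold of 0.8, an FAQ candidate at credibility 0.4 and a knowledge-base candidate at credibility 1 would yield the knowledge-base answer; if both fell below 0.8, the manual answer would be returned and stored.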
Fig. 4 is a flowchart, in one embodiment, of acquiring the candidate answer and corresponding credibility with which the FAQ-based question-answering system responds to the question to be answered. As shown in fig. 4, in one embodiment, this acquiring includes:
Step 402, performing word segmentation on the question to be answered, extracting keywords, and expanding the keywords to form a first keyword set.
In this embodiment, the question to be answered is segmented into words, keywords are extracted, and the keywords are expanded using a synonym library for the limited domain to obtain the first keyword set. For example, for the question to be answered "What models does the Maiteng have?", the word segmentation result is "Maiteng", "have", "what", "model", "?", and the keyword extraction result is "Maiteng" and "model". Expanding the keywords with the limited-domain synonym library adds the synonym "vehicle type" to "model", so the first keyword set of the question to be answered can be represented as [Maiteng, model|vehicle type].
Step 404, performing word segmentation on each question in the FAQ-based question-answering system, extracting keywords, and generating a second keyword set corresponding to each question.
In this embodiment, each question in the FAQ-based question-answering system is segmented into words and its keywords are extracted to obtain the second keyword set corresponding to that question. For example, for the FAQ question "Which vehicle types does the Maiteng have?", the word segmentation result is "Maiteng", "have", "which", "vehicle type", "?", the keyword extraction result is "Maiteng" and "vehicle type", and the resulting second keyword set is [Maiteng, vehicle type].
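Keyword extraction and synonym expansion might look like the following Python sketch. It assumes the question has already been segmented into tokens (the segmenter itself is out of scope here). The stop-word list `STOP_WORDS`, the synonym dictionary `SYNONYMS`, and both function names are hypothetical; the source does not specify how keywords are selected.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical stop words and domain synonyms, for illustration only.
STOP_WORDS: Set[str] = {"have", "what", "which", "?"}
SYNONYMS: Dict[str, Set[str]] = {"model": {"vehicle type"}}


def extract_keywords(tokens: List[str]) -> List[str]:
    """Keep the tokens that are not stop words, preserving their order."""
    return [token for token in tokens if token not in STOP_WORDS]


def expand_keywords(
    keywords: List[str], synonyms: Dict[str, Set[str]]
) -> List[FrozenSet[str]]:
    """Turn each keyword into a group holding the keyword and its synonyms."""
    return [frozenset({k}) | frozenset(synonyms.get(k, ())) for k in keywords]


tokens = ["Maiteng", "have", "what", "model", "?"]
first_keyword_set = expand_keywords(extract_keywords(tokens), SYNONYMS)
```

Representing each expanded keyword as one group keeps a keyword and its synonyms together, so they can later be counted as a single element when sets are compared.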
Step 406, obtaining a similarity value between the first keyword set and each second keyword set.
In this embodiment, the similarity value between the first keyword set and a second keyword set may be calculated using the Jaccard distance, the edit distance, or the Jaccard distance together with the term frequency-inverse document frequency (TF-IDF) weight, among others.
Step 408, selecting the answer corresponding to the second keyword set that has the largest similarity value with the first keyword set as the candidate answer with which the FAQ-based question-answering system responds to the question to be answered, and obtaining the credibility of the candidate answer.
In this embodiment, the second keyword set with the largest similarity value is the one most similar to the first keyword set. The similarity value may be used as the credibility of the candidate answer.
In summary, keywords are extracted from the segmented question to be answered and expanded to generate the first keyword set; keywords are extracted from each segmented question in the FAQ-based question-answering system to generate the corresponding second keyword set; the similarity values between the first keyword set and the second keyword sets are calculated; the answer corresponding to the second keyword set with the largest similarity value is selected as the candidate answer; and that similarity value is taken as the credibility of the candidate answer. The search is therefore simple and convenient.
In one embodiment, obtaining the similarity value between the first keyword set and each second keyword set includes: obtaining the Jaccard distance between the first keyword set and each second keyword set, and obtaining the similarity value between the first keyword set and each second keyword set from the Jaccard distance, wherein the similarity value is directly proportional to the Jaccard distance.
In this embodiment, the Jaccard distance is used to measure the similarity between two sets and may be computed as the number of elements in the intersection of the two sets divided by the number of elements in their union. Defined this way, the quantity is what is more commonly called the Jaccard similarity coefficient, which is why a larger value means greater similarity. The calculation formula is as follows:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
where A and B are the two sets and J(A, B) is the Jaccard distance.
For example, suppose the first keyword set is [Maiteng, model|vehicle type] and the second keyword set is [Maiteng, vehicle type]. A keyword may have several expanded synonyms; the keyword together with its synonyms is counted as a single element. The number of elements in the intersection of the first keyword set and the second keyword set is 2, and the number of elements in their union is 2, so J = 2/2 = 1.
The Jaccard distance between the first keyword set and the second keyword set may be used directly as the similarity value between them. Alternatively, the similarity value may be obtained by multiplying the Jaccard distance by a positive number. The greater the Jaccard distance, the greater the similarity value.
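The set comparison described above can be sketched in Python as follows, treating each keyword together with its expanded synonyms as a single element. The function name `jaccard`, the group representation, and the handling of two empty sets are assumptions made for this illustration.

```python
from typing import FrozenSet, Iterable, Set


def jaccard(first: Iterable[FrozenSet[str]], second: Set[str]) -> float:
    """Intersection size over union size, counting a synonym group as one element.

    `first` is the expanded first keyword set (one group per keyword);
    `second` is a plain second keyword set.
    """
    groups = list(first)
    # A group is in the intersection if any of its members occurs in `second`.
    matched = sum(1 for group in groups if group & second)
    # Keywords of `second` not covered by any group add to the union.
    covered = set().union(*groups)
    unmatched = len(second - covered)
    union = len(groups) + unmatched
    # Two empty sets are given similarity 0.0 here (an assumption).
    return matched / union if union else 0.0


first_set = [frozenset({"Maiteng"}), frozenset({"model", "vehicle type"})]
similarity = jaccard(first_set, {"Maiteng", "vehicle type"})  # 2 / 2
```

Because the value already lies between 0 and 1, it can serve directly as a credibility without further normalization.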
In one embodiment, obtaining the similarity value between the first keyword set and each second keyword set includes: obtaining the Jaccard distance between the first keyword set and each second keyword set, obtaining the term frequency-inverse document frequency weight of each second keyword set, and obtaining the similarity value between the first keyword set and each second keyword set from the Jaccard distance and the term frequency-inverse document frequency weight, wherein the similarity value is directly proportional to the product of the two.
In this embodiment, the term frequency-inverse document frequency weight is the TF-IDF weight. The term frequency (TF) is the frequency with which a given word appears in a document. It may be obtained by dividing the number of occurrences of the word in the document by the total number of occurrences of all words in the document. The inverse document frequency (IDF) is a measure of the general importance of a word. The IDF of a particular word may be obtained by dividing the total number of documents by the number of documents containing the word and taking the logarithm of the quotient. The product of TF and IDF is the TF-IDF weight.
Assuming that there are N documents in the document set and that f(i, j) is the frequency (number of occurrences) of term i in document j, the term frequency of term i in document j can be defined as:
$$TF(i, j) = \frac{f(i, j)}{\max_{z} f(z, j)}$$
This formula normalizes the frequency of term i in document j: f(i, j) is divided by the frequency of the most frequently occurring term z in the same document, so every TF value is less than or equal to 1.
Assuming that there are N documents in the document set and that term i occurs in n_i of them, the IDF may be defined as:
$$IDF(i) = \log \frac{N}{n_i}$$
However, when term i does not appear in any document, the denominator of the above formula is zero, so the IDF is generally defined as:
$$IDF(i) = \log \frac{N}{1 + n_i}$$
Based on the definitions of TF and IDF, the score of term i in document j can be defined as TF(i, j) × IDF(i), that is:
$$TFIDF(i, j) = TF(i, j) \times IDF(i)$$
Obtaining the similarity value of the first keyword set and the second keyword set from both the Jaccard distance and the TF-IDF weight makes the resulting candidate answer more accurate.
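A small Python sketch of the term weighting may help make the definitions concrete. It assumes the max-normalized TF described above and, for IDF, one common smoothed variant that adds 1 to the document count in the denominator; the exact smoothing used by the source is not recoverable from the text. The function names are hypothetical, and how per-term weights are aggregated into a weight for a whole keyword set is not specified in the source, so it is not shown.

```python
import math
from collections import Counter
from typing import List, Sequence


def tf(term: str, doc: Sequence[str]) -> float:
    """Occurrences of `term` divided by those of the most frequent term in `doc`."""
    counts = Counter(doc)
    return counts[term] / max(counts.values()) if counts else 0.0


def idf(term: str, docs: List[Sequence[str]]) -> float:
    """log(N / (1 + n_i)); the +1 avoids a zero denominator (assumed smoothing).

    `docs` must be non-empty.
    """
    n_i = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / (1 + n_i))


def tf_idf(term: str, doc: Sequence[str], docs: List[Sequence[str]]) -> float:
    """Score of `term` in `doc`: TF multiplied by IDF."""
    return tf(term, doc) * idf(term, docs)


docs = [["a", "a", "b"], ["b", "c"], ["c", "d"], ["d"]]
score = tf_idf("b", docs[0], docs)  # 0.5 * log(4 / 3)
```

Note that with this smoothing a term occurring in every document receives a slightly negative IDF, so an implementation may want to clamp the value at zero.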
In one embodiment, obtaining the similarity value between the first keyword set and each second keyword set includes: obtaining the edit distance between the first keyword set and each second keyword set, and obtaining the similarity value of the first keyword set and each second keyword set from the edit distance, wherein the similarity value is inversely proportional to the edit distance.
In this embodiment, the edit distance is the Levenshtein distance, that is, the minimum number of editing operations required to change one string into another. The similarity value may be obtained as (length of the longer string - number of editing operations) / length of the longer string. The smaller the edit distance, the larger the similarity value and the higher the score.
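The edit-distance similarity can be illustrated with the following Python sketch, which uses the standard dynamic-programming Levenshtein algorithm on two strings. How a keyword set is serialized into a string is not stated in the source, so the sketch simply takes strings as input; the function names and the value returned for two empty strings are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + (char_a != char_b),  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """(longer length - edit operations) / longer length, a value in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical (an assumption)
    return (longest - levenshtein(a, b)) / longest
```

Since the number of editing operations never exceeds the length of the longer string, the result stays between 0 and 1 and can be used as a credibility directly.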
Fig. 5 is a flowchart, in one embodiment, of acquiring the candidate answer and corresponding credibility with which the knowledge-base-based question-answering system responds to the question to be answered. As shown in fig. 5, in one embodiment, this acquiring includes:
Step 502, performing semantic analysis on the question to be answered to generate a question vector.
In this embodiment, a plurality of question templates may be preset and used to perform semantic analysis on the question to be answered so as to generate the question vector. For example, the question template "What are the exterior colors of [model]?" matches questions such as "What are the exterior colors of the Maiteng?", and the question template "What is the minimum ground clearance of [model]?" matches questions such as "What is the minimum ground clearance of the Maiteng?". A question template may be represented as a regular expression, and a question may be matched against the templates by regular-expression matching.
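Template matching with regular expressions could be sketched in Python as follows. The two patterns, the property names `exterior_color` and `min_ground_clearance`, and the dictionary used as the "question vector" are all hypothetical; the source does not define the vector's structure.

```python
import re
from typing import Dict, Optional

# Hypothetical templates: (compiled pattern, property name in the knowledge base).
TEMPLATES = [
    (
        re.compile(r"^What are the exterior colors of (?:the )?(?P<series>.+?)\?$", re.I),
        "exterior_color",
    ),
    (
        re.compile(r"^What is the minimum ground clearance of (?:the )?(?P<series>.+?)\?$", re.I),
        "min_ground_clearance",
    ),
]


def parse_question(question: str) -> Optional[Dict[str, str]]:
    """Return a question vector for the first matching template, else None."""
    for pattern, prop in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            return {"series": match.group("series"), "property": prop}
    return None


vector = parse_question("What are the exterior colors of the Maiteng?")
```

A question that matches no template yields `None`, which a caller can treat as "the knowledge-base system has no candidate answer".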
In one embodiment, natural language processing (NLP) may be used to perform semantic analysis on the question to be answered to obtain the corresponding question vector.
Step 504, converting the question vector into a query statement.
In this embodiment, the question vector is converted into a SPARQL query statement. Based on the question vector, the query statement is generated by slot filling. For example, the generated SPARQL statement is: "SELECT ?value { <http://auto-home/series/Maiteng> <http://auto-home/property/model> ?value }".
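Slot filling can be as simple as substituting the fields of the question vector into a fixed query template, as in this Python sketch. The template string, the function name `build_query`, the vector keys `series` and `property`, and the default base URI are assumptions modeled on the example statement above.

```python
from typing import Dict

# Doubled braces are literal braces in str.format(); single braces are slots.
QUERY_TEMPLATE = (
    "SELECT ?value {{ <{base}/series/{series}> <{base}/property/{prop}> ?value }}"
)


def build_query(vector: Dict[str, str], base: str = "http://auto-home") -> str:
    """Fill the series and property slots of the SPARQL template."""
    return QUERY_TEMPLATE.format(
        base=base, series=vector["series"], prop=vector["property"]
    )


query = build_query({"series": "Maiteng", "property": "model"})
```

The sketch performs no escaping or validation of the slot values; a real system would need to ensure they form valid IRIs before the query is sent to a SPARQL endpoint.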
Step 506, searching the knowledge-base-based question-answering system for the query result corresponding to the query statement.
In this embodiment, the query statement conforms to the query syntax of the knowledge-base-based question-answering system. The query statement is used to search the knowledge-base-based question-answering system; if a matching entry exists, the corresponding query result is returned; if not, no result is returned.
Step 508, taking the query result as the candidate answer with which the knowledge-base-based question-answering system responds to the question to be answered, and obtaining the credibility corresponding to the candidate answer.
If a candidate answer is found, its credibility is 1; if none is found, the knowledge-base-based question-answering system has no candidate answer for the question to be answered.
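The lookup and the resulting credibility of 1 or 0 can be sketched in Python with an in-memory dictionary standing in for the knowledge base. No real SPARQL engine is used here; the `TRIPLES` contents are made-up placeholder values, and the function name `kb_answer` and the vector keys are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Stand-in knowledge base: (series, property) -> values. Placeholder data only.
TRIPLES: Dict[Tuple[str, str], List[str]] = {
    ("Maiteng", "model"): ["value-1", "value-2"],
}


def kb_answer(
    vector: Optional[Dict[str, str]],
    triples: Dict[Tuple[str, str], List[str]],
) -> Tuple[Optional[str], float]:
    """Return (candidate answer, 1.0) if the knowledge base has it, else (None, 0.0)."""
    if vector is None:
        return None, 0.0  # the question matched no template
    values = triples.get((vector["series"], vector["property"]))
    if not values:
        return None, 0.0  # nothing stored for this series and property
    return ", ".join(values), 1.0


candidate, credibility = kb_answer({"series": "Maiteng", "property": "model"}, TRIPLES)
```

Returning a credibility of 0 together with `None` lets the caller compare this result with the FAQ candidate using the same threshold test.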
Because the knowledge-base-based question-answering system answers with high accuracy, searching it for the corresponding answer improves the accuracy of the answer to the question to be answered. For complex questions, no knowledge-base-based question-answering system needs to be built, which saves cost.
It should be noted that the above intelligent question-answering method can be applied to the framework of intelligent customer service systems in any limited domain. It can provide services in a B/S (Browser/Server) structure, or speech recognition and TTS (Text To Speech) can be added at the input and output ends to provide services over communication networks such as the telephone network. The method is highly portable and can be quickly migrated from one limited domain to another.
Fig. 6 is a block diagram of the structure of the intelligent question-answering device in one embodiment. As shown in Fig. 6, an intelligent question-answering device 600, which runs on a server, includes a question acquisition module 602, a sending module 604, a candidate answer acquisition module 606, a comparison module 608, and an answer determination module 610. Wherein:
The question acquisition module 602 is configured to obtain a question to be answered.
In this embodiment, the question to be answered is a question the user asks. It may be entered through a web page, an application (App), or the like, and its format may be at least one of voice, text, picture, and so on.
The sending module 604 is configured to send the question to be answered to an FAQ-based question-answering system and to a knowledge-base-based question-answering system, respectively.
The candidate answer acquisition module 606 is configured to obtain the candidate answer with which the FAQ-based question-answering system responds to the question to be answered and the corresponding credibility, and to obtain the candidate answer with which the knowledge-base-based question-answering system responds to the question to be answered and the corresponding credibility.
In this embodiment, the FAQ-based question-answering system retrieves against the question to be answered to obtain a corresponding candidate answer and calculates the credibility of that candidate answer. The knowledge-base-based question-answering system performs semantic analysis on the question to be answered, matches the analyzed question to obtain a corresponding candidate answer, and calculates its credibility. For the FAQ-based question-answering system, the credibility can be computed with a text similarity measure: the similarity value is normalized to the range 0 to 1 and used as the credibility, with 1 being the most credible. For the knowledge-base-based question-answering system, the credibility is 1 if the knowledge base contains an answer and 0 if it does not.
The comparison module 608 is configured to obtain the highest credibility among the credibilities and compare it with a credibility threshold.
The answer determination module 610 is configured to, if the highest credibility is greater than or equal to the credibility threshold, take the candidate answer corresponding to the highest credibility as the answer to the question to be answered.
With the intelligent question-answering device described above, the question to be answered is sent to both the FAQ-based question-answering system and the knowledge-base-based question-answering system; the candidate answers and corresponding credibilities fed back by the two systems are obtained; the highest credibility is selected; and, if it is greater than or equal to the credibility threshold, the corresponding candidate answer is taken as the answer to the question to be answered. Because the credibilities of answers obtained from two different question-answering systems are compared, the resulting answer is highly accurate. In addition, for complex questions, the corresponding answers can be obtained through retrieval and statistics in the FAQ-based question-answering system, so they are found quickly and labor is saved; for simple questions, more accurate answers are obtained thanks to the accuracy of the knowledge-base-based question-answering system. Furthermore, on the one hand the method effectively alleviates the problem of insufficient FAQ question sets in many limited fields; on the other hand it effectively reduces the implementation complexity of the knowledge-base-based question-answering system and avoids the cost of building a knowledge-base question-answering system for complex questions.
In one embodiment, the answer determination module 610 is further configured to, if the highest credibility is less than the credibility threshold, obtain a manual answer and take the manual answer as the answer to the question to be answered.
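The comparison and answer determination described above can be sketched as follows. This is only a minimal illustration, assuming that each question-answering system returns either a candidate with a credibility in the range 0 to 1 or no candidate at all. The names `Candidate`, `choose_answer`, and `ask_human`, the source labels, and the credibility of 1.0 assigned to a manual answer are hypothetical and do not come from the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    answer: str
    credibility: float  # normalized to [0, 1]; 1 is the most credible
    source: str         # hypothetical label such as "faq", "kb" or "manual"


def choose_answer(
    candidates: Sequence[Optional[Candidate]],
    threshold: float,
    ask_human: Callable[[], str],
) -> Candidate:
    """Return the most credible candidate, or a manual answer if it is below the threshold."""
    available = [c for c in candidates if c is not None]
    best = max(available, key=lambda c: c.credibility, default=None)
    if best is not None and best.credibility >= threshold:
        return best
    # The highest credibility is below the threshold, or no system produced a
    # candidate: fall back to a manual answer (credibility 1.0 is an assumption).
    return Candidate(answer=ask_human(), credibility=1.0, source="manual")
```

For example, with a threshold of 0.6, an FAQ candidate of credibility 0.72 and no knowledge-base candidate, the FAQ candidate is returned; with an FAQ candidate of credibility 0.3, the `ask_human` callback is invoked instead.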
Fig. 7 is a block diagram of the structure of the intelligent question-answering device in another embodiment. As shown in Fig. 7, an intelligent question-answering device 600, which runs on a server, includes an update module 612 in addition to the question acquisition module 602, the sending module 604, the candidate answer acquisition module 606, the comparison module 608, and the answer determination module 610. Wherein:
The update module 612 is configured to update the question to be answered and the corresponding manual answer into the FAQ-based question-answering system.
In this embodiment, the question to be answered and the corresponding manual answer are updated into the FAQ-based question-answering system, so that the same or a similar question can be answered by the FAQ-based question-answering system the next time it is encountered, which reduces labor cost.
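As a rough illustration of the update step, the sketch below stores a manually answered question in an in-memory question set so that the same question can be answered automatically the next time. The class `FaqStore` and its key normalization (lower-casing and collapsing whitespace) are assumptions made for the example; matching of merely similar questions would rely on the similarity measures of the other embodiments and is not shown.

```python
from typing import Dict, Optional


class FaqStore:
    """Toy in-memory FAQ question set keyed by a normalized question string."""

    def __init__(self) -> None:
        self._answers: Dict[str, str] = {}

    @staticmethod
    def _key(question: str) -> str:
        # Collapse whitespace and case so trivially different spellings match.
        return " ".join(question.lower().split())

    def update(self, question: str, manual_answer: str) -> None:
        """Record the manual answer for this question (overwrites any old one)."""
        self._answers[self._key(question)] = manual_answer

    def lookup(self, question: str) -> Optional[str]:
        """Return the stored answer for the same question, or None."""
        return self._answers.get(self._key(question))
```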
In one embodiment, the candidate answer acquisition module 606 is further configured to: segment the question to be answered into words, extract keywords, and expand the keywords to form a first keyword set; segment each question in the FAQ-based question-answering system into words, extract keywords, and generate a second keyword set corresponding to each question; obtain the similarity value between the first keyword set and each second keyword set; and select the answer corresponding to the second keyword set that has the greatest similarity value with the first keyword set as the candidate answer with which the FAQ-based question-answering system responds to the question to be answered, and obtain the credibility of the candidate answer.
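The keyword-set procedure of this embodiment might look roughly like the sketch below. It makes several assumptions: a regular-expression tokenizer with a small stop-word list stands in for real word segmentation and keyword extraction, the synonym library is a plain dictionary, and the similarity measure is passed in as a function because this embodiment does not fix one. All function names and the stop-word list are hypothetical.

```python
import re
from typing import AbstractSet, Callable, Dict, Optional, Set, Tuple

# Illustrative stop words; a real system would use a proper segmenter and list.
STOPWORDS = frozenset({"a", "an", "the", "is", "do", "i", "my", "how", "to", "what"})


def extract_keywords(text: str) -> Set[str]:
    """Stand-in for word segmentation followed by keyword extraction."""
    tokens = re.findall(r"\w+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def expand_keywords(keywords: Set[str], synonyms: Dict[str, Set[str]]) -> Set[str]:
    """Add the synonyms of each keyword to form the first keyword set."""
    expanded = set(keywords)
    for word in keywords:
        expanded |= synonyms.get(word, set())
    return expanded


def best_faq_candidate(
    question: str,
    faq: Dict[str, str],
    synonyms: Dict[str, Set[str]],
    similarity: Callable[[AbstractSet[str], AbstractSet[str]], float],
) -> Tuple[Optional[str], float]:
    """Return (candidate answer, similarity value) for the best-matching FAQ question."""
    first_set = expand_keywords(extract_keywords(question), synonyms)
    best_answer, best_score = None, 0.0
    for faq_question, answer in faq.items():
        second_set = extract_keywords(faq_question)
        score = similarity(first_set, second_set)
        if best_answer is None or score > best_score:
            best_answer, best_score = answer, score
    return best_answer, best_score
```

The returned similarity value can then serve as the credibility of the candidate answer, provided the chosen similarity function is normalized to the range 0 to 1.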
In one embodiment, the candidate answer acquisition module 606 is further configured to obtain the Jaccard distance between the first keyword set and each second keyword set, and to obtain the similarity value between the first keyword set and each second keyword set according to the Jaccard distance, where the similarity value is proportional to the Jaccard distance.
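A minimal sketch of this measure follows. It assumes that the "Jaccard distance" of this embodiment denotes the Jaccard coefficient, that is, the size of the intersection divided by the size of the union, since the similarity value is stated to grow with it; the conventional Jaccard distance (one minus that coefficient) would instead grow as the sets diverge. The function name is hypothetical.

```python
from typing import AbstractSet


def jaccard_coefficient(first: AbstractSet[str], second: AbstractSet[str]) -> float:
    """|first ∩ second| / |first ∪ second|, defined as 0.0 when both sets are empty."""
    union = first | second
    if not union:
        return 0.0
    return len(first & second) / len(union)
```

Because the coefficient already lies between 0 and 1, it can be used directly as a normalized similarity value under this reading.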
In one embodiment, the candidate answer acquisition module 606 is further configured to obtain the Jaccard distance between the first keyword set and each second keyword set, obtain the term frequency-inverse document frequency (TF-IDF) weight of each second keyword set, and obtain the similarity value between the first keyword set and each second keyword set according to the Jaccard distance and the TF-IDF weight, where the similarity value is proportional to the product of the Jaccard distance and the TF-IDF weight.
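The embodiment does not state how the TF-IDF weight of a whole keyword set is aggregated, so the sketch below is one possible reading rather than the method itself. It assumes that each second keyword set counts as one document, that term frequency inside a set is 1, that a commonly used smoothed IDF is applied, and that the weight of a set is the mean TF-IDF of its keywords. As in the preceding embodiment, the "Jaccard distance" is taken to mean intersection over union. All names are hypothetical, and the resulting scores are not yet normalized to the range 0 to 1.

```python
import math
from typing import AbstractSet, Dict, List, Sequence


def idf_table(keyword_sets: Sequence[AbstractSet[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency of every keyword over the FAQ sets."""
    n = len(keyword_sets)
    vocabulary = set().union(*keyword_sets)
    return {
        word: math.log((1 + n) / (1 + sum(word in s for s in keyword_sets))) + 1.0
        for word in vocabulary
    }


def set_weight(keywords: AbstractSet[str], idf: Dict[str, float]) -> float:
    """Mean TF-IDF of a keyword set, taking TF as 1 inside a set (assumption)."""
    if not keywords:
        return 0.0
    return sum(idf[word] for word in keywords) / len(keywords)


def weighted_similarities(
    first: AbstractSet[str], second_sets: Sequence[AbstractSet[str]]
) -> List[float]:
    """Jaccard coefficient times set weight, one score per second keyword set."""
    idf = idf_table(second_sets)
    scores = []
    for second in second_sets:
        union = first | second
        jaccard = len(first & second) / len(union) if union else 0.0
        scores.append(jaccard * set_weight(second, idf))
    return scores
```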
In one embodiment, the candidate answer acquisition module 606 is further configured to obtain the edit distance between the first keyword set and each second keyword set, and to obtain the similarity value between the first keyword set and each second keyword set according to the edit distance, where the similarity value is inversely proportional to the edit distance.
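Edit distance is defined on sequences rather than sets, so the following sketch assumes that the keywords are kept as ordered lists (it also works on plain strings). It computes the standard Levenshtein distance and maps it to a similarity value with 1 / (1 + d), an assumed form of "inversely proportional" chosen so that identical sequences score 1 and division by zero is avoided. The function names are hypothetical.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete item_a
                    current[j - 1] + 1,      # insert item_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


def edit_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Similarity that decreases as the edit distance grows; 1.0 for equal input."""
    return 1.0 / (1.0 + edit_distance(a, b))
```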
In one embodiment, the candidate answer acquisition module 606 is further configured to: perform semantic analysis on the question to be answered to generate a question vector; convert the question vector into a query statement; search the knowledge-base-based question-answering system for a query result corresponding to the query statement; and take the query result as the candidate answer with which the knowledge-base-based question-answering system responds to the question to be answered, and obtain the credibility corresponding to the candidate answer.
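The knowledge-base path can be illustrated with the toy sketch below. It assumes that the question vector is simply an (entity, attribute) pair found by substring matching, that the knowledge base is a SQLite table named `facts`, and that the query statement is a parameterized SQL query. The entity and attribute vocabularies, the table layout, and all function names are hypothetical; real semantic analysis would be considerably more involved.

```python
import sqlite3
from typing import Optional, Tuple

ENTITIES = ("basic plan", "premium plan")  # hypothetical domain entities
# Hypothetical mapping from surface words to knowledge-base attribute names.
ATTRIBUTES = {"price": "price", "cost": "price", "storage": "storage"}


def question_vector(question: str) -> Optional[Tuple[str, str]]:
    """Rough stand-in for semantic analysis: find an (entity, attribute) pair."""
    text = question.lower()
    entity = next((e for e in ENTITIES if e in text), None)
    attribute = next((a for word, a in ATTRIBUTES.items() if word in text), None)
    if entity is None or attribute is None:
        return None
    return entity, attribute


def to_query(vector: Tuple[str, str]) -> Tuple[str, Tuple[str, str]]:
    """Convert the question vector into a parameterized query statement."""
    return "SELECT value FROM facts WHERE entity = ? AND attribute = ?", vector


def kb_candidate(conn: sqlite3.Connection, question: str) -> Tuple[Optional[str], float]:
    """Return (candidate answer, 1.0) if the knowledge base has it, else (None, 0.0)."""
    vector = question_vector(question)
    if vector is None:
        return None, 0.0
    sql, params = to_query(vector)
    row = conn.execute(sql, params).fetchone()
    return (row[0], 1.0) if row else (None, 0.0)
```

A credibility of 1 for a found answer and 0 for a missing one follows the convention described for the knowledge-base-based question-answering system.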
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), or the like.
The above embodiments express only several implementations of the invention. Although they are described specifically and in detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the concept of the invention, all of which fall within the scope of protection of the invention. Accordingly, the scope of protection of the invention is to be determined by the appended claims.

Claims (10)

1. An intelligent question-answering method, comprising:
acquiring a question to be answered;
sending the question to be answered to an FAQ-based question-answering system and to a knowledge-base-based question-answering system, respectively, wherein the knowledge base includes structured knowledge within a limited field;
segmenting the question to be answered into words, extracting keywords, and expanding the keywords based on a limited-field synonym library to form a first keyword set;
segmenting each question in the FAQ-based question-answering system into words, extracting keywords, and generating a second keyword set corresponding to each question;
obtaining a similarity value between the first keyword set and each second keyword set;
selecting the answer corresponding to the second keyword set having the greatest similarity value with the first keyword set as the candidate answer with which the FAQ-based question-answering system responds to the question to be answered, and taking the greatest similarity value as the credibility of the candidate answer;
acquiring the candidate answer with which the knowledge-base-based question-answering system responds to the question to be answered and the corresponding credibility;
obtaining the highest credibility among the credibilities, and comparing the highest credibility with a credibility threshold;
if the highest credibility is greater than or equal to the credibility threshold, taking the candidate answer corresponding to the highest credibility as the answer to the question to be answered;
if the highest credibility is less than the credibility threshold, acquiring a manual answer, and taking the manual answer as the answer to the question to be answered; and
updating the question to be answered and the corresponding manual answer into the FAQ-based question-answering system.
2. The method of claim 1, wherein said obtaining a similarity value between the first keyword set and each second keyword set comprises:
obtaining the Jaccard distance between the first keyword set and each second keyword set, and obtaining the similarity value between the first keyword set and each second keyword set according to the Jaccard distance, wherein the similarity value is directly proportional to the Jaccard distance.
3. The method of claim 1, wherein said obtaining a similarity value between the first keyword set and each second keyword set comprises:
obtaining the Jaccard distance between the first keyword set and each second keyword set, obtaining the term frequency-inverse document frequency (TF-IDF) weight of each second keyword set, and obtaining the similarity value between the first keyword set and each second keyword set according to the Jaccard distance and the TF-IDF weight, wherein the similarity value is directly proportional to the product of the Jaccard distance and the TF-IDF weight;
or, obtaining the edit distance between the first keyword set and each second keyword set, and obtaining the similarity value between the first keyword set and each second keyword set according to the edit distance, wherein the similarity value is inversely proportional to the edit distance.
4. The method according to any one of claims 1 to 3, wherein said acquiring the candidate answer with which the knowledge-base-based question-answering system responds to the question to be answered and the corresponding credibility comprises:
performing semantic analysis on the question to be answered to generate a question vector;
converting the question vector into a query statement;
searching the knowledge-base-based question-answering system for a query result corresponding to the query statement; and
taking the query result as the candidate answer with which the knowledge-base-based question-answering system responds to the question to be answered, and acquiring the credibility corresponding to the candidate answer.
5. An intelligent question-answering device, comprising:
a question acquisition module, configured to acquire a question to be answered;
a sending module, configured to send the question to be answered to an FAQ-based question-answering system and to a knowledge-base-based question-answering system, respectively, wherein the knowledge base includes structured knowledge within a limited field;
a candidate answer acquisition module, configured to segment the question to be answered into words, extract keywords, and expand the keywords based on a limited-field synonym library to form a first keyword set; segment each question in the FAQ-based question-answering system into words, extract keywords, and generate a second keyword set corresponding to each question; obtain a similarity value between the first keyword set and each second keyword set; and
select the answer corresponding to the second keyword set having the greatest similarity value with the first keyword set as the candidate answer with which the FAQ-based question-answering system responds to the question to be answered, and take the greatest similarity value as the credibility of the candidate answer;
the candidate answer acquisition module being further configured to acquire the candidate answer with which the knowledge-base-based question-answering system responds to the question to be answered and the corresponding credibility;
a comparison module, configured to obtain the highest credibility among the credibilities and compare the highest credibility with a credibility threshold;
an answer determination module, configured to take the candidate answer corresponding to the highest credibility as the answer to the question to be answered if the highest credibility is greater than or equal to the credibility threshold, and, if the highest credibility is less than the credibility threshold, to acquire a manual answer and take the manual answer as the answer to the question to be answered; and
an update module, configured to update the question to be answered and the corresponding manual answer into the FAQ-based question-answering system.
6. The device of claim 5, wherein the candidate answer acquisition module is further configured to obtain the Jaccard distance between the first keyword set and each second keyword set, and to obtain the similarity value between the first keyword set and each second keyword set according to the Jaccard distance, the similarity value being directly proportional to the Jaccard distance.
7. The device of claim 5, wherein the candidate answer acquisition module is further configured to obtain the Jaccard distance between the first keyword set and each second keyword set, obtain the term frequency-inverse document frequency (TF-IDF) weight of each second keyword set, and obtain the similarity value between the first keyword set and each second keyword set according to the Jaccard distance and the TF-IDF weight, the similarity value being directly proportional to the product of the Jaccard distance and the TF-IDF weight;
or, to obtain the edit distance between the first keyword set and each second keyword set, and obtain the similarity value between the first keyword set and each second keyword set according to the edit distance, wherein the similarity value is inversely proportional to the edit distance.
8. The device according to any one of claims 5 to 7, wherein the candidate answer acquisition module is further configured to: perform semantic analysis on the question to be answered to generate a question vector; convert the question vector into a query statement; search the knowledge-base-based question-answering system for a query result corresponding to the query statement; and take the query result as the candidate answer with which the knowledge-base-based question-answering system responds to the question to be answered, and acquire the credibility corresponding to the candidate answer.
9. A terminal or server comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 4.
10. A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method of any one of claims 1 to 4.
CN201710066973.9A 2017-02-07 2017-02-07 Intelligent question-answering method and device Active CN106874441B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710066973.9A CN106874441B (en) 2017-02-07 2017-02-07 Intelligent question-answering method and device

Publications (2)

Publication Number Publication Date
CN106874441A CN106874441A (en) 2017-06-20
CN106874441B CN106874441B (en) 2024-03-05

Family

ID=59167443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710066973.9A Active CN106874441B (en) 2017-02-07 2017-02-07 Intelligent question-answering method and device

Country Status (1)

Country Link
CN (1) CN106874441B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant