WO2016112558A1 - Question matching method and system in an intelligent interactive system - Google Patents


Publication number
WO2016112558A1
Authority
WO
WIPO (PCT)
Prior art keywords
question
similarity
candidate
matching
answer
Prior art date
Application number
PCT/CN2015/071314
Other languages
English (en)
French (fr)
Inventor
张贯京
陈兴明
葛新科
张少鹏
方静芳
Original Assignee
深圳市前海安测信息技术有限公司
深圳市易特科信息技术有限公司
深圳市贝沃德克生物技术研究院有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市前海安测信息技术有限公司, 深圳市易特科信息技术有限公司, 深圳市贝沃德克生物技术研究院有限公司
Publication of WO2016112558A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures

Definitions

  • The present invention relates to the field of information technology, and in particular to a question matching method and system in an intelligent interactive system.
  • The most important step in an intelligent interactive system is to compute the similarity between the question raised by the user and each question in the question-and-answer database (hereinafter referred to as the question and answer library), and then to find the question with the highest similarity and output the corresponding answer.
  • Most existing methods first calculate the similarity of words and then combine features such as word position, weight, and dependency relations to finally calculate the similarity of sentences.
  • However, in intelligent interactive systems for a specific field, the question matching results of the existing technology are not ideal.
  • The main object of the present invention is to overcome the defect of the prior art that question matching results in intelligent interactive systems are not ideal.
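  • To achieve the above object, the present invention provides a question matching method in an intelligent interactive system, the method comprising the following steps:
  • obtaining the question input by the user;
  • performing word segmentation, stop-word removal, and query expansion on the user's question to obtain the index words of the user's question;
  • matching, from the question and answer library and according to a preset index file, candidate questions related to the index words of the user's question;
  • calculating the similarity between the user's question and the questions in the similar-question sets corresponding to the candidate questions;
  • outputting, according to the calculated similarity and a preset rule, the answer corresponding to the question that matches the user's question.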
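  • Preferably, the step of outputting, according to the calculated similarity and a preset rule, the answer corresponding to the question that matches the user's question includes:
  • if the similar-question sets contain a question whose similarity to the user's question is greater than the upper limit of a preset range, outputting the answer corresponding to the candidate question with the greatest similarity to the user's question;
  • otherwise, if the similar-question sets contain a question whose similarity is within the preset range, outputting the candidate questions whose similarity to the user's question is within the preset range for the user to select, and outputting the answer corresponding to the candidate question selected by the user;
  • otherwise, if the similarity between every question in the similar-question sets and the user's question is less than the lower limit of the preset range, adding the user's question to the question and answer library and outputting a prompt that the match is empty.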
  • To achieve the above object, the present invention also provides a question matching system in an intelligent interactive system, the system comprising:
  • a question obtaining module, configured to obtain the question input by the user;
  • a question preprocessing module, configured to perform word segmentation, stop-word removal, and query expansion on the user's question to obtain the index words of the user's question;
  • a candidate question matching module, configured to match, from the question and answer library and according to a preset index file, candidate questions related to the index words of the user's question;
  • a similar question matching module, configured to calculate the similarity between the user's question and the questions in the similar-question sets corresponding to the candidate questions;
  • a result output module, configured to output, according to the calculated similarity and a preset rule, the answer corresponding to the question that matches the user's question.
  • By adopting the above technical solution, the present invention brings the following technical effects. Performing word segmentation, stop-word removal, and query expansion on the user's question avoids the influence of less relevant words in complex sentences on the question matching result. First matching candidate questions related to the user's question from the question and answer library according to the preset index file reduces the amount of computation for question matching. Then calculating the similarity between the user's question and the questions in the similar-question sets corresponding to the candidate questions avoids the situation where no question is matched because the same question can be asked in many different ways, and so improves the question matching result. Finally, the answer corresponding to the question that matches the user's question is output according to the calculated similarity and the preset rule.
  • The entire solution improves the accuracy of question matching results in intelligent interactive systems.
  • FIG. 1 is a schematic flowchart of a question matching method in an intelligent interactive system according to a first preferred embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of a question matching method in an intelligent interactive system according to a second preferred embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a question matching method in an intelligent interactive system according to a third preferred embodiment of the present invention.
  • FIG. 4 is a detailed flowchart of step S50 shown in FIG. 1 according to a fourth preferred embodiment of the present invention.
  • FIG. 5 is a detailed flowchart of step S50 shown in FIG. 1 according to a fifth preferred embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a question matching system in an intelligent interactive system according to a first preferred embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a question matching system in an intelligent interactive system according to a second preferred embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a question matching system in an intelligent interactive system according to a third preferred embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of the result output module shown in FIG. 6 according to a fourth preferred embodiment of the present invention.
  • Embodiments of the present invention can be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied entirely in hardware, entirely in software (including firmware, resident software, microcode, etc.), or in a combination of hardware and software.
  • FIG. 1 is a schematic flowchart of a question matching method in an intelligent interactive system according to a first preferred embodiment of the present invention.
  • The method includes the following steps:
  • Obtaining the question input by the user refers to obtaining it from an input interface of the intelligent interactive system.
  • The user inputs the question on the client; the question may be in audio form.
  • When the question is in audio or picture form, the intelligent interactive system converts it into text to facilitate the search for the best-matching question.
  • The input interface may be a client APP.
  • The method may be carried out on a server.
  • The server may be a web server or another type of server, such as an APP server.
  • S20: Perform word segmentation, stop-word removal, and query expansion on the user's question to obtain the index words of the user's question.
  • Word segmentation refers to dividing the user's question into a plurality of words; the segmentation process can call the Chinese word segmentation tool ICTCLAS.
  • Stop-word removal refers to removing stop words: a stop-word lexicon can be established in advance and matched against to remove them. Stop-word removal can also include removing conventional asking phrases (such as "may I ask"), auxiliary words (such as the particle "ah"), and other words with little meaning but high frequency.
  • Query expansion mainly refers to synonym expansion (for example, two different words that both mean "doctor", or two that both mean "father"); a synonym lexicon can be used to expand the words obtained by segmenting the user's question with their synonyms. After word segmentation, stop-word removal, and query expansion, index words substantially related to the user's question are obtained.
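  • The preprocessing in step S20 can be sketched as follows. This is a minimal illustration, not the patented implementation: the whitespace-based `segment` function stands in for a real segmenter such as ICTCLAS, and the `STOP_WORDS` and `SYNONYMS` tables are hypothetical English examples.

```python
# Hypothetical stop-word list and synonym table; a real system would load
# much larger lexicons and use a proper segmenter such as ICTCLAS.
STOP_WORDS = {"what", "is", "the", "of", "your", "please", "a"}
SYNONYMS = {"doctor": {"physician"}, "father": {"dad"}}


def segment(question: str) -> list[str]:
    """Stand-in for word segmentation: lowercase and split on non-alphanumerics."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in question.lower())
    return cleaned.split()


def index_words(question: str) -> set[str]:
    """Segment, drop stop words, then expand each kept word with its synonyms."""
    kept = [w for w in segment(question) if w not in STOP_WORDS]
    expanded = set(kept)
    for word in kept:
        expanded |= SYNONYMS.get(word, set())
    return expanded
```

  • Returning a set keeps the later overlap count simple, at the cost of discarding word order and repetition.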
  • S30: Match, from the question and answer library and according to the preset index file, candidate questions related to the index words of the user's question. The purpose of obtaining candidate questions is to confine complex processing, such as the subsequent similarity calculation, to a smaller range of questions.
  • Taking the index words of the user's question as the basic unit, the top 10 questions with the largest number of words overlapping the user's question are found from the preset index file and used as candidate questions (10 is only an example; the number can be chosen according to the needs of the actual intelligent interactive system).
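  • Candidate retrieval by overlapping-word count can be sketched as follows. The in-memory `INDEX` dictionary is a hypothetical stand-in for the preset index file, and the tie-breaking by question id is an assumption added to make the order deterministic.

```python
from collections import Counter

# Hypothetical inverted index: index word -> ids of the questions in the
# question and answer library that contain it.
INDEX = {
    "film": {0, 1, 3},
    "business": {0, 2, 3},
    "process": {0},
    "company": {1, 4},
    "start": {3},
}


def candidate_ids(query_words: set[str], index: dict[str, set[int]], k: int = 10) -> list[int]:
    """Return up to k question ids ranked by number of overlapping index words."""
    overlap = Counter()
    for word in query_words:
        for qid in index.get(word, ()):
            overlap[qid] += 1
    # Sort by overlap count (descending), then by id for a stable order.
    ranked = sorted(overlap.items(), key=lambda item: (-item[1], item[0]))
    return [qid for qid, _ in ranked[:k]]
```

  • Only these k candidates are passed to the more expensive similarity calculation, which is where the reduction in computation comes from.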
  • Calculate the similarity between the user's question and the questions in the similar-question sets corresponding to the candidate questions. For example, calculate the similarity between the user's question "What is the business process of your company's film-viewing service?" and each question in the similar-question sets corresponding to the candidate questions "What kind of specific process is the film-viewing business?", "What is vRad?", "What is your business doing?", "When did the film-viewing business start?", "What company are you?", "How is the diagnostic part of the film-viewing business realized?", and "What is the service business platform?".
  • The similar-question sets may be established in advance when the intelligent interactive system is designed, or may be established and improved during subsequent use of the system.
  • A similar-question set contains at least the question itself and similar questions with the same answer.
  • The similarity between sentences can be calculated by first computing the similarity between words with a word similarity algorithm based on HowNet (a common-sense knowledge base that describes the concepts represented by Chinese and English words and whose basic content is the relationships between concepts and between the attributes of concepts), and then computing the similarity of the sentences from the similarity of the words.
  • The method of calculating the similarity between sentences belongs to the prior art and is not described here.
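  • Since the text leaves the sentence-similarity formula to the prior art, the following is only one possible sketch of deriving sentence similarity from word similarity. The `WORD_SIM` table is a hypothetical stand-in for HowNet-based word scores, and the symmetric best-match average is an assumed aggregation, not the formula used in the patent.

```python
# Hypothetical word-to-word similarity scores; in the described scheme these
# would come from a HowNet-based word similarity algorithm.
WORD_SIM = {
    frozenset(("process", "procedure")): 0.9,
    frozenset(("business", "service")): 0.8,
}


def word_similarity(a: str, b: str) -> float:
    """Identical words score 1.0; pairs missing from the table score 0.0."""
    if a == b:
        return 1.0
    return WORD_SIM.get(frozenset((a, b)), 0.0)


def directed(src: list[str], dst: list[str]) -> float:
    """Average, over the words in src, of the best match found in dst."""
    if not src or not dst:
        return 0.0
    return sum(max(word_similarity(s, d) for d in dst) for s in src) / len(src)


def sentence_similarity(a: list[str], b: list[str]) -> float:
    """Symmetric aggregation of word similarities (an assumed formula)."""
    return (directed(a, b) + directed(b, a)) / 2
```

  • Averaging both directions keeps a short sentence from scoring highly merely because all of its words appear in a much longer one.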
  • S50: According to the calculated similarity, output the answer corresponding to the question that matches the user's question according to a preset rule.
  • The preset rule may be an output rule, designed according to the requirements of the intelligent interactive system in the specific application field and the question and answer library, based on whether the similarity falls within a preset range.
  • Outputting the answer corresponding to the question that matches the user's question includes first determining, according to the calculated similarity, the candidate question that matches the user's question, and then outputting the answer corresponding to that candidate question.
  • That is, it is determined whether there is a question whose similarity satisfies the preset rule: if so, the answer of the candidate question corresponding to the similar-question set containing that question is output; if not, the user's question is added directly to the question and answer library.
  • In this embodiment, performing word segmentation, stop-word removal, and query expansion on the user's question avoids the influence of less relevant words in complex sentences on the question matching result. First matching candidate questions related to the user's question from the question and answer library according to the preset index file reduces the amount of computation for question matching. Then calculating the similarity between the user's question and the questions in the similar-question sets corresponding to the candidate questions avoids the situation where no question is matched because the same question can be asked in many different ways, and so improves the question matching result. Finally, the answer corresponding to the question that matches the user's question is output according to the calculated similarity and the preset rule.
  • The entire solution improves the accuracy of question matching results in intelligent interactive systems.
  • FIG. 2 is a schematic flowchart of a question matching method in an intelligent interactive system according to a second preferred embodiment of the present invention.
  • The question matching method in the intelligent interactive system further includes:
  • S60: Expand each question in the question and answer library to form a similar-question set, the similar-question set including at least the question itself and similar questions with the same answer.
  • A similar-question set is established in advance for every question in the question and answer library. For example, for the matched candidate question "What kind of specific process is the film-viewing business?", the similar-question set consists of "How does the film-viewing business complete the entire process?", "What are the main processes of the film-viewing business?", "How is the film-viewing business operated?", "How is the film-viewing business completed?", and the candidate question itself, "What kind of specific process is the film-viewing business?". Similar-question sets are established in the same way for the other matched candidate questions, such as "What kind of company is the film-viewing company vRad?".
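  • One way to represent similar-question sets and score a user's question against them is sketched below. The `SIMILAR_SETS` data is a hypothetical English example, and `jaccard` is a placeholder word-overlap measure used only so the sketch runs; it is not the HowNet-based similarity the text refers to.

```python
from typing import Callable

# Hypothetical data: each candidate question maps to its similar-question set,
# which contains the candidate itself plus paraphrases sharing its answer.
SIMILAR_SETS = {
    "what is the process of the film-viewing business": [
        "what is the process of the film-viewing business",
        "how does the film-viewing business complete the entire process",
    ],
    "what kind of company is vrad": ["what kind of company is vrad"],
}


def jaccard(a: str, b: str) -> float:
    """Placeholder similarity: word-set overlap (not the HowNet-based measure)."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def best_per_candidate(query, similar_sets, sim: Callable[[str, str], float] = jaccard):
    """For each candidate, keep the highest similarity found in its set."""
    return {
        candidate: max(sim(query, variant) for variant in variants)
        for candidate, variants in similar_sets.items()
    }
```

  • Because the maximum is taken over the whole set, a user phrasing that matches any stored paraphrase credits the candidate question that owns the answer.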
  • FIG. 3 is a schematic flowchart of a question matching method in an intelligent interactive system according to a third preferred embodiment of the present invention.
  • The question matching method in the intelligent interactive system further includes:
  • Establish a question-and-answer database, that is, the question and answer library.
  • The question and answer library is established in advance.
  • The questions and answers in the question and answer library are in one-to-one correspondence.
  • S80: Perform word segmentation on the questions in the question and answer library to obtain index words, and establish an index file that maps the index words to the corresponding questions.
  • Word segmentation is performed on all questions in the question and answer library.
  • For example, take the question "What is the business process of your company's film-viewing service?" in the question and answer library.
  • The segmentation process can call ICTCLAS, the word segmentation tool of the Chinese Academy of Sciences, to obtain the segmented sentence.
  • The open-source full-text search engine toolkit Lucene is then called to establish an index file for the segmented sentence.
  • In step S30, according to the index file established here, candidate questions related to the user's question are matched from the question and answer library by using the index words of the user's question.
  • The candidate questions are found from the index file by taking the index words of the user's question as the basic unit: they are the top 10 questions with the largest number of words overlapping the user's question (10 is only an example; the number can be chosen according to the needs of the actual intelligent interactive system). This is therefore only a preliminary matching process. Because the index words of the user's question have already been processed by word segmentation, stop-word removal, and query expansion, the correctness of the question matching result is improved.
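  • Building the index can be sketched with a plain in-memory inverted index. This is a simplification under stated assumptions: the text uses ICTCLAS for segmentation and Lucene for the index file, whereas the `segment` function here is a whitespace-based stand-in and `build_index` is a hypothetical helper.

```python
from collections import defaultdict


def segment(question: str) -> list[str]:
    """Stand-in for a real segmenter such as ICTCLAS: lowercase word split."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in question.lower())
    return cleaned.split()


def build_index(questions: list[str]) -> dict[str, set[int]]:
    """Map each index word to the positions of the questions containing it."""
    index = defaultdict(set)
    for qid, question in enumerate(questions):
        for word in segment(question):
            index[word].add(qid)
    return dict(index)
```

  • The index is built once, ahead of time, so that each incoming question needs only dictionary lookups to find its candidates.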
  • FIG. 4 is a detailed flowchart of step S50 shown in FIG. 1 according to the fourth preferred embodiment of the present invention.
  • Step S50, outputting the answer corresponding to the question that matches the user's question according to the calculated similarity and the preset rule, includes the following.
  • It is determined whether the similar-question sets contain a question whose similarity to the user's question is greater than the upper limit of the preset range.
  • If so, the answer corresponding to the candidate question with the greatest similarity to the user's question is output directly. For example, suppose that, after calculation, the similarity between the user's question "What is the business process of your company's film-viewing service?" and "How does the film-viewing business complete the entire process?", a question in the similar-question set of the matched candidate question "What kind of specific process is the film-viewing business?", is 90.5% (assuming the preset range is 75% to 90%). The answer to the candidate question "What kind of specific process is the film-viewing business?" is then output directly.
  • Otherwise, that is, if the similarity of every question in the similar-question sets of all candidate questions matched for "What is the business process of your company's film-viewing service?" does not exceed the upper limit of 90% (assuming the preset range is 75% to 90%), it is determined whether any question has a similarity within the preset range.
  • If so, the candidate questions corresponding to the similar-question sets that contain those questions are output for the user to select. For example, if the similarity between the user's question and "How does the film-viewing business complete the entire process?", a similar question of the candidate question "What kind of specific process is the film-viewing business?", is 81%, that candidate question is output for the user to select.
  • If not, the user's question is added directly to the question and answer library. For example, if after calculation the similarity between the user's question and every question in the similar-question sets of all matched candidate questions is less than 75% (assuming the preset range is 75% to 90%), there is no candidate question in the question and answer library that matches the user's question.
  • In this case the user's question is added to the question and answer library to enrich it, and a prompt that the match is empty is output.
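  • The three-way output rule can be sketched as a small decision function. The function name `decide`, the action labels, and the choice to treat both range bounds as inclusive are assumptions for illustration; the 75% and 90% defaults are the example values from the text.

```python
def decide(scores: dict[str, float], lower: float = 0.75, upper: float = 0.90):
    """Apply the three-way rule to candidate -> best similarity scores."""
    if scores:
        best = max(scores, key=scores.get)
        if scores[best] > upper:
            # Above the range: answer with the most similar candidate.
            return ("answer", [best])
        in_range = [c for c, s in scores.items() if lower <= s <= upper]
        if in_range:
            # Within the range: offer the closest candidates first.
            return ("choose", sorted(in_range, key=scores.get, reverse=True))
    # Below the range (or no candidates): the caller adds the question to
    # the question and answer library and reports an empty match.
    return ("no_match", [])
```

  • With the example figures above, a best score of 0.905 yields `("answer", ...)`, a best score of 0.81 yields `("choose", ...)`, and scores all below 0.75 yield `("no_match", [])`.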
  • FIG. 5 is a detailed flowchart of step S50 shown in FIG. 1 according to the fifth preferred embodiment of the present invention.
  • In this embodiment, the method further includes the following.
  • After step S502 outputs candidate questions for the user to select, if the candidate questions include the question the user needs, the system acquires the candidate question selected by the user and performs step S505; if none of the candidate questions is the one the user needs, the user's question is added to the question and answer library and a prompt that the match is empty is output.
  • The system acquires the candidate question selected by the user and adds the user's question to the similar-question set corresponding to that candidate question.
  • Enriching the similar-question set in this way means that the next time a user asks a similar question, a candidate question with high similarity can be matched quickly, which improves the accuracy of the matching result.
  • The answer corresponding to the candidate question selected by the user is output to the client for the user's reference.
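  • The feedback step can be sketched as follows. The function `record_choice` and its dictionary-based stores are hypothetical, and storing an unanswered question with `None` as its answer is an assumption, since the text does not say how a newly added question obtains an answer.

```python
def record_choice(question, candidates, chosen, similar_sets, qa_library):
    """Update the stores after the user reviews the offered candidates.

    Returns the answer for the chosen candidate, or None when the user
    rejects every candidate (the question is then stored for later curation).
    """
    if chosen in candidates:
        variants = similar_sets.setdefault(chosen, [chosen])
        if question not in variants:
            variants.append(question)
        return qa_library[chosen]
    # No suitable candidate: store the question with no answer yet.
    qa_library.setdefault(question, None)
    return None
```

  • Each confirmed selection adds one more paraphrase to the chosen set, so later askers of the same wording can be matched directly.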
  • The present invention also provides a question matching system in an intelligent interactive system.
  • FIG. 6 is a schematic structural diagram of a question matching system in an intelligent interactive system according to a first preferred embodiment of the present invention, where the system includes:
  • The question obtaining module 10 is configured to obtain the question input by the user.
  • Obtaining the question input by the user refers to obtaining it from an input interface of the intelligent interactive system.
  • The user inputs the question on the client; the question may be in audio form.
  • When the question is in audio or picture form, the intelligent interactive system converts it into text to facilitate the search for the best-matching question.
  • The input interface may be a client APP.
  • The method may be carried out on a server.
  • The server may be a web server or another type of server, such as an APP server.
  • The question preprocessing module 20 is configured to perform word segmentation, stop-word removal, and query expansion on the user's question to obtain the index words of the user's question.
  • Word segmentation refers to dividing the user's question into a plurality of words; the segmentation process can call the Chinese word segmentation tool ICTCLAS.
  • Stop-word removal refers to removing stop words: a stop-word lexicon can be established in advance and matched against to remove them. Stop-word removal can also include removing conventional asking phrases (such as "may I ask"), auxiliary words (such as the particle "ah"), and other words with little meaning but high frequency.
  • Query expansion mainly refers to synonym expansion (for example, two different words that both mean "doctor", or two that both mean "father"); a synonym lexicon can be used to expand the words obtained by segmenting the user's question with their synonyms.
  • After word segmentation, stop-word removal, and query expansion are performed on the user's question, index words substantially related to the user's question are obtained.
  • The candidate question matching module 30 is configured to match, from the question and answer library and according to the preset index file, candidate questions related to the index words of the user's question.
  • Candidate questions related to the index words of the user's question are matched from the question and answer library according to the preset index file.
  • The purpose is to confine complex processing, such as the subsequent similarity calculation, to a smaller range of questions.
  • Taking the index words of the user's question as the basic unit, the top 10 questions with the largest number of words overlapping the user's question are found from the preset index file and used as candidate questions (10 is only an example; the number can be chosen according to the needs of the actual intelligent interactive system).
  • The similar question matching module 40 is configured to calculate the similarity between the user's question and the questions in the similar-question sets corresponding to the candidate questions.
  • For example, calculate the similarity between the user's question "What is the business process of your company's film-viewing service?" and each question in the similar-question sets corresponding to the candidate questions "What kind of specific process is the film-viewing business?", "What is vRad?", "What is your business doing?", "When did the film-viewing business start?", "What company are you?", "How is the diagnostic part of the film-viewing business realized?", and "What is the service business platform?".
  • The similar-question sets may be established in advance when the intelligent interactive system is designed, or may be established and improved during subsequent use of the system.
  • A similar-question set contains at least the question itself and similar questions with the same answer.
  • That is, after the candidate questions are matched, the similarity between the user's question and every question in the similar-question sets corresponding to all the candidate questions is calculated.
  • The similarity between sentences can be calculated by first computing the similarity between words with HowNet's word similarity algorithm and then computing the similarity of the sentences from the similarity of the words.
  • The method of calculating the similarity between sentences belongs to the prior art and is not described here.
  • The result output module 50 is configured to output, according to the calculated similarity and a preset rule, the answer corresponding to the question that matches the user's question.
  • The answer corresponding to the question that matches the user's question is output according to a preset rule.
  • The preset rule may be an output rule, designed according to the requirements of the intelligent interactive system in the specific application field and the question and answer library, based on whether the similarity falls within a preset range.
  • Outputting the answer corresponding to the question that matches the user's question includes first determining, according to the calculated similarity, the candidate question that matches the user's question, and then outputting the answer corresponding to that candidate question.
  • That is, it is determined whether there is a question whose similarity satisfies the preset rule: if so, the answer of the candidate question corresponding to the similar-question set containing that question is output; if not, the user's question is added directly to the question and answer library.
  • In this embodiment, performing word segmentation, stop-word removal, and query expansion on the user's question avoids the influence of less relevant words in complex sentences on the question matching result. First matching candidate questions related to the user's question from the question and answer library according to the preset index file reduces the amount of computation for question matching. Then calculating the similarity between the user's question and the questions in the similar-question sets corresponding to the candidate questions avoids the situation where no question is matched because the same question can be asked in many different ways, and so improves the question matching result.
  • Finally, the answer corresponding to the question that matches the user's question is output according to the calculated similarity and the preset rule.
  • The entire solution improves the accuracy of question matching results in intelligent interactive systems.
  • FIG. 7 is a schematic structural diagram of a question matching system in an intelligent interactive system according to a second preferred embodiment of the present invention, where the system further includes:
  • The similar-question set expansion module 60 is configured to expand each question in the question and answer library to form a similar-question set, the similar-question set including at least the question itself and similar questions with the same answer.
  • FIG. 8 is a schematic structural diagram of a question matching system in an intelligent interactive system according to a third preferred embodiment of the present invention, where the system further includes:
  • The index file construction module 70 is configured to establish the question and answer library, perform word segmentation on the questions in the question and answer library to obtain index words, and establish an index file that maps the index words to the corresponding questions.
  • FIG. 9 is a schematic structural diagram of the result output module shown in FIG. 6 according to a fourth preferred embodiment of the present invention, where the result output module 50 includes:
  • The full match output module 501 is configured to output the answer corresponding to the candidate question with the greatest similarity to the user's question when the similar-question sets contain a question whose similarity to the user's question is greater than the upper limit of the preset range.
  • The similar match output module 502 is configured to, when the similar-question sets contain a question whose similarity is within the preset range, output the candidate questions whose similarity to the user's question is within the preset range for the user to select, and output the answer corresponding to the candidate question selected by the user.
  • The empty match output module 503 is configured to, when the similarity between every question in the similar-question sets and the user's question is less than the lower limit of the preset range, add the user's question to the question and answer library and output a prompt that the match is empty.
  • The similar match output module 502 is further configured to add the user's question to the similar-question set corresponding to the candidate question selected by the user.
  • After the similar match output module 502 outputs candidate questions for the user to select, if the candidate questions include the question the user needs, the system acquires the candidate question selected by the user and adds the user's question to the similar-question set corresponding to that candidate question; if none of the candidate questions is the one the user needs, the user's question is added to the question and answer library and a prompt that the match is empty is output.
  • The answer corresponding to the candidate question selected by the user is output to the client for the user's reference.
  • In the specific embodiments of the present invention, performing word segmentation, stop-word removal, and query expansion on the user's question avoids the influence of less relevant words in complex sentences on the question matching result. First matching candidate questions related to the user's question from the question and answer library according to the preset index file reduces the amount of computation for question matching. Then calculating the similarity between the user's question and the questions in the similar-question sets corresponding to the candidate questions avoids the situation where no question is matched because the same question can be asked in many different ways, and so improves the question matching result. Finally, the answer corresponding to the question that matches the user's question is output according to the calculated similarity and the preset rule.
  • The entire solution improves the accuracy of question matching results in intelligent interactive systems.

Abstract

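The present invention discloses a question matching method and system in an intelligent interactive system. Word segmentation, stop-word removal, and query expansion are applied to the user's question, which prevents weakly relevant words in complex sentences from affecting the matching result. Candidate questions related to the user's question are first matched from the question-and-answer library according to a preset index file, which reduces the amount of computation required for matching. The similarity between the user's question and the questions in the similar-question set corresponding to each candidate question is then calculated, which avoids the case where a question goes unmatched merely because it can be phrased in many different ways, and improves the matching result. Finally, according to the calculated similarity, the answer corresponding to the question that matches the user's question is output according to preset rules. The overall scheme improves the accuracy of question matching results in intelligent interactive systems.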

Description

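Question matching method and system in an intelligent interactive system
Technical Field
The present invention relates to the field of information technology, and in particular to a question matching method and system in an intelligent interactive system.
Background
The most important step in an intelligent interactive system is to compute the similarity between the question posed by the user and each question in the question-and-answer database (hereinafter, the Q&A library), then find the question with the highest similarity and output the corresponding answer. Most existing approaches first compute the similarity between words and then combine features such as word position, weight, and dependency relations to arrive at the similarity between sentences. In domain-specific intelligent interactive systems, however, the matching results of these existing techniques are unsatisfactory.
The main reasons are as follows:
(1) The same question can be asked in many different ways.
(2) As for the semantic features of sentences, Chinese sentences are relatively complex and most questions are not grammatically well-formed, so the results of such analysis rarely play their intended role.
Summary of the Invention
The main object of the present invention is to overcome the defect of the prior art that question matching results in intelligent interactive systems are unsatisfactory.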
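To achieve the above object, the present invention provides a question matching method in an intelligent interactive system, the method comprising the following steps:
obtaining a question input by a user;
performing word segmentation, stop-word removal, and query expansion on the question to obtain the index terms of the question;
matching, according to a preset index file, candidate questions related to the index terms of the question from a Q&A library;
calculating the similarity between the question and the questions in the similar-question set corresponding to each candidate question;
outputting, according to the calculated similarity and preset rules, the answer corresponding to the question that matches the user's question.
Preferably, the step of outputting, according to the calculated similarity and preset rules, the answer corresponding to the question that matches the user's question comprises:
if the similar-question sets contain a question whose similarity to the user's question is greater than the upper limit of a preset range, outputting the answer corresponding to the candidate question with the greatest similarity to the user's question;
otherwise, if the similar-question sets contain a question whose similarity lies within the preset range, outputting the candidate questions whose similarity to the user's question lies within the preset range for the user to choose from, and outputting the answer corresponding to the candidate question selected by the user;
otherwise, if the similarity between every question in the similar-question sets and the user's question is less than the lower limit of the preset range, adding the user's question to the Q&A library and outputting an empty-match prompt.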
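The three-way output rule above can be written compactly as a piecewise definition. This is only an illustrative formalization; the symbols are assumptions introduced here and do not appear in the source: q is the user's question, C the set of candidate questions, S(c) the similar-question set of candidate c, sim a sentence similarity function, and θ_lo, θ_hi the lower and upper limits of the preset range. With s(c) = max over p in S(c) of sim(q, p), s_max = max over c in C of s(c), and c* the candidate attaining s_max:

```latex
\[
\operatorname{output}(q) =
\begin{cases}
\text{answer of } c^{*}, & s_{\max} > \theta_{\mathrm{hi}},\\[2pt]
\text{offer } \{\, c \in C : \theta_{\mathrm{lo}} \le s(c) \le \theta_{\mathrm{hi}} \,\}
  \text{ and answer the user's choice}, & \theta_{\mathrm{lo}} \le s_{\max} \le \theta_{\mathrm{hi}},\\[2pt]
\text{add } q \text{ to the Q\&A library and report an empty match}, & s_{\max} < \theta_{\mathrm{lo}}.
\end{cases}
\]
```

Treating both limits as belonging to the middle case follows the wording "greater than the upper limit" and "less than the lower limit"; the source does not state the boundary behaviour explicitly.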
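To achieve the above object, the present invention further provides a question matching system in an intelligent interactive system, the system comprising:
a question acquisition module, configured to obtain a question input by a user;
a question preprocessing module, configured to perform word segmentation, stop-word removal, and query expansion on the question to obtain the index terms of the question;
a candidate question matching module, configured to match, according to a preset index file, candidate questions related to the index terms of the question from a Q&A library;
a similar question matching module, configured to calculate the similarity between the question and the questions in the similar-question set corresponding to each candidate question;
a result output module, configured to output, according to the calculated similarity and preset rules, the answer corresponding to the question that matches the user's question.
By adopting the above technical solution, the present invention achieves the following technical effects. Word segmentation, stop-word removal, and query expansion are applied to the user's question, which prevents weakly relevant words in complex sentences from affecting the matching result. Candidate questions related to the user's question are first matched from the Q&A library according to a preset index file, which reduces the amount of computation required for matching. The similarity between the user's question and the questions in the similar-question set corresponding to each candidate question is then calculated, which avoids the case where a question goes unmatched merely because it can be phrased in many different ways, and improves the matching result. Finally, according to the calculated similarity, the answer corresponding to the question that matches the user's question is output according to preset rules. The overall scheme improves the accuracy of question matching results in intelligent interactive systems.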
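Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the question matching method in an intelligent interactive system according to a first preferred embodiment of the present invention;
FIG. 2 is a schematic flowchart of the question matching method in an intelligent interactive system according to a second preferred embodiment of the present invention;
FIG. 3 is a schematic flowchart of the question matching method in an intelligent interactive system according to a third preferred embodiment of the present invention;
FIG. 4 is a detailed flowchart of step S50 shown in FIG. 1 according to a fourth preferred embodiment of the present invention;
FIG. 5 is a detailed flowchart of step S50 shown in FIG. 1 according to a fifth preferred embodiment of the present invention;
FIG. 6 is a schematic structural diagram of the question matching system in an intelligent interactive system according to a first preferred embodiment of the present invention;
FIG. 7 is a schematic structural diagram of the question matching system in an intelligent interactive system according to a second preferred embodiment of the present invention;
FIG. 8 is a schematic structural diagram of the question matching system in an intelligent interactive system according to a third preferred embodiment of the present invention;
FIG. 9 is a schematic structural diagram of the result output module shown in FIG. 6 according to a fourth preferred embodiment of the present invention.
The realization of the object, the functional features, and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.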
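Detailed Description
It should be understood that the specific embodiments described here are intended only to explain the present invention and not to limit it.
The principle and spirit of the present invention are described below with reference to several exemplary embodiments. It should be understood that these embodiments are given only so that those skilled in the art can better understand and implement the present invention, and not to limit its scope in any way. Rather, they are provided so that this disclosure is thorough and complete and fully conveys its scope to those skilled in the art.
Those skilled in the art know that the embodiments of the present invention may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied entirely in hardware, entirely in software (including firmware, resident software, microcode, and the like), or in a combination of hardware and software.
The main object of the present invention is to overcome the defect of the prior art that question matching results in intelligent interactive systems are unsatisfactory.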
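Referring to FIG. 1, FIG. 1 is a schematic flowchart of the question matching method in an intelligent interactive system according to the first preferred embodiment of the present invention;
In one embodiment, as shown in FIG. 1, the method comprises the following steps:
S10: obtaining a question input by a user;
Specifically, this means obtaining the question that the user enters through the input interface of the intelligent interactive system. When the user wants an answer from the system, the user enters a question on the client; the question may be in audio, text, or image form. If it is in audio or image form, the system converts it into text to facilitate the subsequent search for the best-matching question. The input interface may be a client app, and the method may run on a server, which may be a web server or another type of server, such as an app server.
S20: performing word segmentation, stop-word removal, and query expansion on the question to obtain the index terms of the question;
Specifically, word segmentation means splitting the question into individual words; the segmentation step may call ICTCLAS, the word segmentation tool of the Chinese Academy of Sciences. Stop-word removal means removing stop words; a stop-word lexicon can be built in advance and matched against the question so that the stop words are removed. Stop-word removal may also cover polite expressions (such as 请问 and 请问一下, "may I ask") and particles (such as 的, 吗, 呢, 啊), that is, words that occur frequently but contribute little to the meaning of the question. Query expansion mainly means synonym expansion (for example 大夫 and 医生, both "doctor", or 父亲 and 爸爸, both "father"); the synonym thesaurus 《同义词林》 can be used to expand each word of the segmented question with its synonyms. After segmentation, stop-word removal, and query expansion, the index terms that are essentially relevant to the question are obtained. For example, the question “你们公司看片业务流程是什么样的?” ("What is your company's image-reading service process like?") is segmented into “你们”, “公司”, “看片”, “业务”, “流程”, “是”, “什么样”, “的”, and stop-word removal drops “的”. Query expansion is then applied word by word: “你们” (you) is expanded to “尔等”; “公司” (company) to “商店”, “铺子”, “店铺”, “铺户”, “店家”, “商行”, “商号”, “店”, “铺”, “号”, “庄”, “局”, “柜洋行”, “代销店”, “店堂”, “商社”, “铺面”, “营业所”, “合作社”, “商家”, “企业”, and so on; “看片” (reading images) to “看医学影像文件” (reading medical image files); “业务” (business) to “工作”, “作业”, “事务”, “事情”, “事体”, “务”, “政工”, and so on; “流程” (process) to “流水线”, “工艺流程”, and so on; “是” (is) to “正确”, “对”, “然”, “不错”, “无误”, “对头”, and so on; “什么样” (what kind) to “怎样”, “怎么”, “怎的”, “怎么样”, “怎么着”, “怎”, “哪样”, “何如”, “何等”, “什么”, “如何”, “何以”, “什么样”, “咋样”, “该当何论”, and so on. The processed words “你们”, “公司”, “看片”, “业务”, “流程”, “是”, “什么样”, together with the synonyms of each word, are used as the index terms.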
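The stop-word removal and synonym expansion described in S20 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the question is assumed to be segmented already (the source suggests ICTCLAS, which is not called here), and the `STOP_WORDS` and `SYNONYMS` tables are tiny hypothetical stand-ins for a real stop-word lexicon and thesaurus.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical resources covering only the worked example; a real system
# would load a full stop-word lexicon and a synonym thesaurus instead.
STOP_WORDS: Set[str] = {"的", "吗", "呢", "啊", "请问"}
SYNONYMS: Dict[str, List[str]] = {
    "你们": ["尔等"],
    "公司": ["企业", "商家"],
    "看片": ["看医学影像文件"],
    "流程": ["流水线", "工艺流程"],
}


def remove_stop_words(tokens: Iterable[str]) -> List[str]:
    """Drop every token that appears in the stop-word lexicon."""
    return [t for t in tokens if t not in STOP_WORDS]


def expand_query(tokens: Iterable[str]) -> List[str]:
    """Follow each token with its synonyms, keeping first-seen order and no duplicates."""
    seen: Set[str] = set()
    expanded: List[str] = []
    for token in tokens:
        for term in [token, *SYNONYMS.get(token, [])]:
            if term not in seen:
                seen.add(term)
                expanded.append(term)
    return expanded


def index_terms(segmented_question: List[str]) -> List[str]:
    """Stop-word removal followed by synonym expansion."""
    return expand_query(remove_stop_words(segmented_question))


# Segmentation result taken from the example in the text.
tokens = ["你们", "公司", "看片", "业务", "流程", "是", "什么样", "的"]
terms = index_terms(tokens)
```

Keeping the original word ahead of its synonyms preserves the user's own wording as the first index term for each position, which makes the expanded list easy to inspect.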
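S30: matching, according to a preset index file, candidate questions related to the index terms of the question from the Q&A library;
Specifically, once the index terms have been obtained in S20, candidate questions related to them are matched from the Q&A library according to the preset index file. The purpose of obtaining candidate questions is to confine the subsequent, more expensive similarity computation to a small set of questions. Taking the index terms of the user's question as basic units, the 10 questions that share the largest number of words with the user's question are found from the preset index file and used as candidate questions (the number 10 is only an example and can be set according to the needs of the actual intelligent interactive system). Continuing with the question “你们公司看片业务流程是什么样的?”, the index terms “你们”, “公司”, “看片”, “业务”, “流程”, “是”, “什么样” and the synonyms of each word are used, according to the preset index file, to match candidate questions from the Q&A library such as “看片业务大概是什么样一个具体过程啊?” ("Roughly what is the specific process of the image-reading service?"), “看片公司vRad是什么样的公司?” ("What kind of company is the image-reading company vRad?"), “你们的业务是做什么?” ("What does your business do?"), “你们看片业务是什么时候开始?” ("When did your image-reading service start?"), “你们是什么公司” ("What company are you?"), “看片业务诊断部分是如何实现” ("How is the diagnosis part of the image-reading service implemented?"), “看片业务全称是什么” ("What is the full name of the image-reading service?"), and “看片业务服务平台是什么” ("What is the image-reading service platform?").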
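The candidate selection in S30, picking the questions that share the most words with the user's question, can be sketched with an inverted index held in a dictionary. This is a sketch under assumptions: the index layout (term to set of question ids), the function name `retrieve_candidates`, and the toy data are all hypothetical, and ties are broken by question id purely to make the result deterministic.

```python
from collections import Counter
from typing import Dict, List, Set


def retrieve_candidates(index: Dict[str, Set[int]],
                        terms: List[str],
                        top_k: int = 10) -> List[int]:
    """Return ids of the top_k questions sharing the most index terms with the query."""
    overlap: Counter = Counter()
    for term in set(terms):              # count each distinct term once
        for qid in index.get(term, ()):  # questions containing this term
            overlap[qid] += 1
    # Highest overlap first; ties broken by the smaller question id.
    ranked = sorted(overlap.items(), key=lambda item: (-item[1], item[0]))
    return [qid for qid, _ in ranked[:top_k]]


# Toy inverted index: term -> ids of the library questions that contain it.
index = {
    "看片": {0, 1, 3},
    "业务": {0, 2, 3},
    "公司": {1, 4},
    "你们": {2, 3, 4},
    "什么样": {0, 1},
}
candidates = retrieve_candidates(
    index, ["你们", "公司", "看片", "业务", "流程", "什么样"], top_k=3
)
```

Questions 0, 1, and 3 each share three terms with the query and questions 2 and 4 share two, so with `top_k=3` the first three are returned.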
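S40: calculating the similarity between the question and the questions in the similar-question set corresponding to each candidate question;
Specifically, the similarity is calculated between the user's question “你们公司看片业务流程是什么样的?” and the questions in the similar-question sets corresponding to the candidate questions “看片业务大概是什么样一个具体过程啊?”, “看片公司vRad是什么样的公司?”, “你们的业务是做什么?”, “你们看片业务是什么时候开始?”, “你们是什么公司”, “看片业务诊断部分是如何实现”, “看片业务全称是什么”, and “看片业务服务平台是什么”. In an actual design, the similar-question sets may be built in advance when the intelligent interactive system is designed, or built and refined later while the system is in use. Each similar-question set contains at least the question itself and similar questions that have the same answer. In other words, once the candidate questions have been matched, the similarity is calculated between the user's question and every question in the similar-question sets of all the candidate questions. Sentence similarity may be computed by first computing word-to-word similarity with a word similarity algorithm based on HowNet (知网, a common-sense knowledge base that describes the concepts represented by Chinese and English words and whose basic content is the relations between concepts and between the attributes of concepts), and then deriving sentence similarity from the word similarities. Methods for computing sentence similarity belong to the prior art and are not described further here.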
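Since the source leaves the similarity computation to the prior art, the sketch below only illustrates the general shape of S40: word-level scores are aggregated into a sentence score, and each candidate question is scored by the best-matching question in its similar-question set. Everything here is an assumption for illustration: `word_similarity` is a hand-written stand-in and not the HowNet-based algorithm, and the aggregation (average of best matches in both directions) is one common choice, not the patent's.

```python
from typing import Callable, Dict, List

# Stand-in word similarity (NOT HowNet): identical words score 1.0,
# hand-listed near-synonyms score 0.8, everything else 0.0.
NEAR_SYNONYMS: Dict[frozenset, float] = {
    frozenset(("流程", "过程")): 0.8,
}


def word_similarity(a: str, b: str) -> float:
    if a == b:
        return 1.0
    return NEAR_SYNONYMS.get(frozenset((a, b)), 0.0)


def sentence_similarity(s1: List[str], s2: List[str],
                        word_sim: Callable[[str, str], float] = word_similarity) -> float:
    """Average, in both directions, of each word's best match in the other sentence."""
    if not s1 or not s2:
        return 0.0

    def directed(src: List[str], dst: List[str]) -> float:
        return sum(max(word_sim(w, v) for v in dst) for w in src) / len(src)

    return (directed(s1, s2) + directed(s2, s1)) / 2.0


def score_candidates(query: List[str],
                     similar_sets: Dict[str, List[List[str]]]) -> Dict[str, float]:
    """Score each candidate by the best-matching question in its similar-question set."""
    return {
        cand: max(sentence_similarity(query, variant) for variant in variants)
        for cand, variants in similar_sets.items() if variants
    }


query = ["看片", "业务", "流程"]
similar_sets = {
    "Q1": [["看片", "业务", "过程"], ["看片", "业务", "操作"]],
    "Q2": [["你们", "公司"]],
}
scores = score_candidates(query, similar_sets)
```

Scoring a candidate by the maximum over its set is what lets a differently phrased variant of the same question lift the candidate's score.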
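S50: outputting, according to the calculated similarity and preset rules, the answer corresponding to the question that matches the user's question.
Specifically, according to the similarity calculated in step S40, the answer corresponding to the question that matches the user's question is output according to preset rules. The preset rules may be output rules based on a preset similarity range, designed according to the needs of the intelligent interactive system in the specific application domain and the completeness of the Q&A library. Outputting the answer according to the preset rules involves first determining, from the calculated similarity, the candidate question that matches the user's question, and then outputting the answer corresponding to that candidate question. When determining the matching candidate question, the system checks, against the preset similarity range, whether the similar-question sets contain a question whose similarity to the user's question lies within the preset range. If so, it outputs the answer of the candidate question whose similar-question set contains that question; if not, it adds the user's question directly to the Q&A library.
In this embodiment of the present invention, word segmentation, stop-word removal, and query expansion are applied to the user's question, which prevents weakly relevant words in complex sentences from affecting the matching result. Candidate questions related to the user's question are first matched from the Q&A library according to a preset index file, which reduces the amount of computation required for matching. The similarity between the user's question and the questions in the similar-question set corresponding to each candidate question is then calculated, which avoids the case where a question goes unmatched merely because it can be phrased in many different ways, and improves the matching result. Finally, according to the calculated similarity, the answer corresponding to the question that matches the user's question is output according to preset rules. The overall scheme improves the accuracy of question matching results in intelligent interactive systems.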
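Referring to FIG. 2, FIG. 2 is a schematic flowchart of the question matching method in an intelligent interactive system according to the second preferred embodiment of the present invention;
In one embodiment, as shown in FIG. 2 and based on the first preferred embodiment shown in FIG. 1, before step S10 the question matching method in the intelligent interactive system further comprises:
S60: expanding each question in the Q&A library to form a similar-question set, the similar-question set containing at least the question itself and similar questions that have the same answer.
Specifically, to improve the accuracy of the matching result when the system is in use, similar-question sets are built in advance for all questions in the Q&A library when the intelligent interactive system is designed; each similar-question set contains at least the question itself and similar questions that have the same answer. For example, for the matched candidate question “看片业务大概是什么样一个具体过程啊?” ("Roughly what is the specific process of the image-reading service?"), a similar-question set is built from “看片业务是如何完成整个流程?” ("How does the image-reading service complete the whole process?"), “看片业务主要流程有哪几个部分?” ("What are the main parts of the image-reading service process?"), “看片业务如何操作?” ("How is the image-reading service operated?"), “看片业务怎么完成?” ("How is the image-reading service carried out?"), and the candidate question “看片业务大概是什么样一个具体过程啊?” itself. Similar-question sets are built in the same way for each of the other matched candidate questions: “看片公司vRad是什么样的公司?”, “你们的业务是做什么?”, “你们看片业务是什么时候开始?”, “你们是什么公司”, “看片业务诊断部分是如何实现”, “看片业务全称是什么”, and “看片业务服务平台是什么”.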
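One way to hold the similar-question sets of S60 in memory is sketched below. The `QAEntry` class, its field names, and the placeholder answer are hypothetical; the only property taken from the text is that each set contains at least the question itself plus variants sharing the same answer.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class QAEntry:
    """One Q&A library entry together with its similar-question set."""
    question: str
    answer: str
    similar: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The set must contain at least the question itself.
        if self.question not in self.similar:
            self.similar.insert(0, self.question)


def build_similar_sets(entries: Iterable[QAEntry]) -> Dict[str, List[str]]:
    """Map each library question to its similar-question set."""
    return {entry.question: list(entry.similar) for entry in entries}


entry = QAEntry(
    question="看片业务大概是什么样一个具体过程啊?",
    answer="(answer text stored in the Q&A library)",  # placeholder value
    similar=["看片业务是如何完成整个流程?", "看片业务如何操作?"],
)
similar_sets = build_similar_sets([entry])
```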
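Referring to FIG. 3, FIG. 3 is a schematic flowchart of the question matching method in an intelligent interactive system according to the third preferred embodiment of the present invention;
In one embodiment, as shown in FIG. 3 and based on the second preferred embodiment shown in FIG. 2, before step S60 the question matching method in the intelligent interactive system further comprises:
S70: building the Q&A library;
Specifically, a question-and-answer database, that is, the Q&A library, is built in advance according to the needs of the intelligent interactive system. In a preferred embodiment, to avoid data redundancy, the questions and answers in the Q&A library have a one-to-one relationship.
S80: performing word segmentation on the questions in the Q&A library to obtain index terms, and building an index file that records the correspondence between the index terms and the questions.
Specifically, after the Q&A library has been built, word segmentation is performed on all of its questions. For example, the library question “你们公司看片业务流程是什么样的?” is segmented into “你们”, “公司”, “看片”, “业务”, “流程”, “是”, “什么样”, “的”, and all other questions in the Q&A library are segmented in the same way. The segmentation step may call ICTCLAS, the word segmentation tool of the Chinese Academy of Sciences, which yields the segmented sentence “你们 公司 看片 业务 流程 是 什么样 的”. The open-source full-text search engine toolkit Lucene is then called with the segmented sentence “你们 公司 看片 业务 流程 是 什么样 的” and the original question “你们公司看片业务流程是什么样的” as input, producing the index file that records the correspondence between the index terms and the questions. In step S30, the index file built here is used to match, by means of the index terms of the user's question, candidate questions related to the user's question from the Q&A library. The candidate questions are the 10 questions in the index file that share the largest number of words with the user's question, taking the index terms of the user's question as basic units (the number 10 is only an example and can be set according to the needs of the actual intelligent interactive system). This is therefore only a preliminary matching step. Because the index terms of the user's question have already gone through word segmentation, stop-word removal, and query expansion, the correctness of the matching result is improved.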
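The term-to-question correspondence built in S80 is essentially an inverted index. The sketch below builds one in plain Python as a stand-in for Lucene, which the source names but which is not used here; `build_index` and the toy segmented questions are assumptions for illustration, and segmentation is taken as already done.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_index(segmented_questions: Dict[int, List[str]]) -> Dict[str, Set[int]]:
    """Build an inverted index: each term maps to the ids of the questions containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for qid, tokens in segmented_questions.items():
        for token in tokens:
            index[token].add(qid)
    return dict(index)


# Question id -> segmented question (segmentation assumed done beforehand).
segmented_questions = {
    0: ["你们", "公司", "看片", "业务", "流程", "是", "什么样", "的"],
    1: ["你们", "是", "什么", "公司"],
    2: ["看片", "业务", "全称", "是", "什么"],
}
index = build_index(segmented_questions)
```

Looking up a term such as `index["看片"]` then returns the ids of every library question that contains it, which is the lookup the candidate matching step relies on.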
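Referring to FIG. 4, FIG. 4 is a detailed flowchart of step S50 shown in FIG. 1 according to the fourth preferred embodiment of the present invention;
In one embodiment, as shown in FIG. 4, step S50, outputting, according to the calculated similarity and preset rules, the answer corresponding to the question that matches the user's question, comprises:
S501: if the similar-question sets contain a question whose similarity to the user's question is greater than the upper limit of the preset range, outputting the answer corresponding to the candidate question with the greatest similarity to the user's question;
Specifically, after the similarity between the user's question and the questions in the similar-question sets of all matched candidate questions has been calculated, if the similar-question sets are found to contain a question whose similarity to the user's question is greater than the upper limit of the preset range, the answer corresponding to the candidate question with the greatest similarity is output directly. For example, if the calculation shows that the similarity between the user's question “你们公司看片业务流程是什么样的?” and the question “看片业务是如何完成整个流程?” in the similar-question set of the matched candidate question “看片业务大概是什么样一个具体过程啊?” is 90.5% (assuming a preset range of 75% to 90%), the answer corresponding to the candidate question “看片业务大概是什么样一个具体过程啊?” is output directly.
S502: otherwise, if the similar-question sets contain a question whose similarity lies within the preset range, outputting the candidate questions whose similarity to the user's question lies within the preset range for the user to choose from, and outputting the answer corresponding to the candidate question selected by the user;
Specifically, if the condition of S501 is not met, the system checks whether the similar-question sets contain a question whose similarity to the user's question lies within the preset range, and if so, outputs the corresponding candidate questions for the user to choose from. For example, suppose the calculation shows that none of the questions in the similar-question sets of the candidate questions matched to “你们公司看片业务流程是什么样的?” has a similarity above the upper limit of 90% (assuming a preset range of 75% to 90%). The system then checks whether any of those questions has a similarity within the preset range and, if so, outputs the candidate questions whose similar-question sets contain such a question for the user to choose from. Suppose, for instance, that in the similar-question set of the candidate question “看片业务大概是什么样一个具体过程啊?” the similar question “看片业务是如何完成整个流程?” has a similarity of 81% and the similar question “看片业务主要流程有哪几个部分?” has a similarity of 78%. Both in-range questions belong to the similar-question set of the same candidate question “看片业务大概是什么样一个具体过程啊?”, so that candidate question is output only once for the user to choose. By the same principle, whenever the similar-question set of another matched candidate question contains a similar question whose similarity lies within the preset range, that candidate question is output for the user to choose, and the answer corresponding to the candidate question the user selects is output. Giving the user sufficient choice helps ensure the accuracy of the matching result.
S503: otherwise, if the similarity between every question in the similar-question sets and the user's question is less than the lower limit of the preset range, adding the user's question to the Q&A library and outputting an empty-match prompt.
Specifically, if neither the condition of step S501 nor that of step S502 is met, that is, the similarity between every question in the similar-question sets and the user's question is less than the lower limit of the preset range, the user's question is added directly to the Q&A library. For example, if the calculation shows that the similarity between the user's question “你们公司看片业务流程是什么样的?” and every question in the similar-question sets of all matched candidate questions is less than 75% (assuming a preset range of 75% to 90%), the Q&A library contains no candidate question that matches the user's question. In that case, the user's question is added to the Q&A library to enrich it, and an empty-match prompt is output. Enriching the Q&A library in this way means that the next time a user asks a similar question, a candidate question with high similarity can be matched, which improves the accuracy of the matching result.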
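Steps S501 to S503 amount to a three-way threshold decision, which can be sketched as follows. The sketch assumes each candidate has already been reduced to one score, the highest similarity found in its similar-question set; the `Decision` type and the `decide` function are hypothetical names, and the 75% and 90% limits are the example values from the text.

```python
from dataclasses import dataclass
from typing import Dict, List

LOWER, UPPER = 0.75, 0.90  # example preset range from the text


@dataclass
class Decision:
    kind: str              # "answer", "choose", or "empty"
    candidates: List[str]  # candidate questions relevant to the decision


def decide(best_scores: Dict[str, float],
           lower: float = LOWER, upper: float = UPPER) -> Decision:
    """best_scores maps each candidate to the highest similarity in its similar-question set."""
    if best_scores:
        top = max(best_scores, key=lambda c: best_scores[c])
        if best_scores[top] > upper:           # S501: answer directly
            return Decision("answer", [top])
        in_range = [c for c, s in best_scores.items() if lower <= s <= upper]
        if in_range:                           # S502: let the user choose
            in_range.sort(key=lambda c: -best_scores[c])
            return Decision("choose", in_range)
    return Decision("empty", [])               # S503: caller adds the question to the library


direct = decide({"Q1": 0.905, "Q2": 0.80})
ask_user = decide({"Q1": 0.81, "Q2": 0.60})
no_match = decide({"Q1": 0.50})
```

Because scores are keyed by candidate question, a candidate whose set contains several in-range questions (81% and 78% in the example) is offered only once, as the text requires.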
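Referring to FIG. 5, FIG. 5 is a detailed flowchart of step S50 shown in FIG. 1 according to the fifth preferred embodiment of the present invention;
In one embodiment, as shown in FIG. 5 and based on the flowchart of the fourth preferred embodiment shown in FIG. 4, after step S502 (if the similar-question sets contain a similar question whose similarity lies within the preset range, outputting the candidate questions whose similarity to the user's question lies within the preset range for the user to choose from, and outputting the answer corresponding to the candidate question selected by the user), the method further comprises:
S504: adding the user's question to the similar-question set corresponding to the candidate question selected by the user;
Specifically, step S502 outputs candidate questions for the user to choose from. If the candidate questions offered include the question the user needs, the system obtains the candidate question selected by the user and executes step S504; if none of the candidate questions offered is the one the user needs, the user's question is added to the Q&A library and an empty-match prompt is output.
If the candidate questions offered include the question the user needs, the system obtains the candidate question selected by the user and adds the user's question to the similar-question set corresponding to that candidate question, thereby enriching the set. Enriching the similar-question sets in this way means that the next time a user asks a similar question, a candidate question with high similarity can be matched quickly, which improves the accuracy of the matching result. For example, suppose the candidate questions offered for the user's question “你们公司看片业务流程是什么样的?” are “看片业务大概是什么样一个具体过程啊?” and “看片业务主要流程有哪几个部分?”, and the user selects “看片业务大概是什么样一个具体过程啊?”. The user's question “你们公司看片业务流程是什么样的?” is then added to the similar-question set corresponding to the candidate question “看片业务大概是什么样一个具体过程啊?”. The next time a user asks “你们公司看片业务流程是什么样的?”, the high-similarity candidate question “看片业务大概是什么样一个具体过程啊?” can be matched accurately, improving the accuracy of the matching result.
After the user has selected a specific candidate question, the answer corresponding to that candidate question is output to the client for the user's reference.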
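The feedback step S504 can be sketched as a small update routine. The data layout is an assumption: `similar_sets` maps a candidate question to its list of variants, `qa_library` maps a question to its answer (with `None` marking a newly added, still unanswered question), and `record_choice` is a hypothetical name.

```python
from typing import Dict, List, Optional


def record_choice(question: str,
                  chosen: Optional[str],
                  similar_sets: Dict[str, List[str]],
                  qa_library: Dict[str, Optional[str]]) -> Optional[str]:
    """Apply the user's selection; return the answer, or None for an empty match."""
    if chosen is None:
        # No offered candidate fits: store the question for later answering.
        qa_library.setdefault(question, None)
        return None
    variants = similar_sets.setdefault(chosen, [chosen])
    if question not in variants:   # enrich the similar-question set once
        variants.append(question)
    return qa_library.get(chosen)


qa_library: Dict[str, Optional[str]] = {"C1": "A1"}
similar_sets = {"C1": ["C1"]}
first = record_choice("new phrasing", "C1", similar_sets, qa_library)
again = record_choice("new phrasing", "C1", similar_sets, qa_library)
missed = record_choice("unrelated question", None, similar_sets, qa_library)
```

Repeating the same selection leaves the set unchanged, so frequently asked phrasings do not accumulate as duplicates.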
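To achieve the above object, the present invention further provides a question matching system in an intelligent interactive system.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of the question matching system in an intelligent interactive system according to the first preferred embodiment of the present invention, the system comprising:
a question acquisition module 10, configured to obtain a question input by a user;
Specifically, this means obtaining the question that the user enters through the input interface of the intelligent interactive system. When the user wants an answer from the system, the user enters a question on the client; the question may be in audio, text, or image form. If it is in audio or image form, the system converts it into text to facilitate the subsequent search for the best-matching question. The input interface may be a client app, and the method may run on a server, which may be a web server or another type of server, such as an app server.
a question preprocessing module 20, configured to perform word segmentation, stop-word removal, and query expansion on the question to obtain the index terms of the question;
Specifically, word segmentation means splitting the question into individual words; the segmentation step may call ICTCLAS, the word segmentation tool of the Chinese Academy of Sciences. Stop-word removal means removing stop words; a stop-word lexicon can be built in advance and matched against the question so that the stop words are removed. Stop-word removal may also cover polite expressions (such as 请问 and 请问一下, "may I ask") and particles (such as 的, 吗, 呢, 啊), that is, words that occur frequently but contribute little to the meaning of the question. Query expansion mainly means synonym expansion (for example 大夫 and 医生, both "doctor", or 父亲 and 爸爸, both "father"); the synonym thesaurus 《同义词林》 can be used to expand each word of the segmented question with its synonyms. After segmentation, stop-word removal, and query expansion, the index terms that are essentially relevant to the question are obtained.
a candidate question matching module 30, configured to match, according to a preset index file, candidate questions related to the index terms of the question from the Q&A library;
Specifically, once the question preprocessing module 20 has produced the index terms of the user's question, candidate questions related to them are matched from the Q&A library according to the preset index file. The purpose of obtaining candidate questions is to confine the subsequent, more expensive similarity computation to a small set of questions. Taking the index terms of the user's question as basic units, the 10 questions that share the largest number of words with the user's question are found from the preset index file and used as candidate questions (the number 10 is only an example and can be set according to the needs of the actual intelligent interactive system).
a similar question matching module 40, configured to calculate the similarity between the question and the questions in the similar-question set corresponding to each candidate question;
Specifically, the similarity is calculated between the user's question “你们公司看片业务流程是什么样的?” and the questions in the similar-question sets corresponding to the candidate questions “看片业务大概是什么样一个具体过程啊?”, “看片公司vRad是什么样的公司?”, “你们的业务是做什么?”, “你们看片业务是什么时候开始?”, “你们是什么公司”, “看片业务诊断部分是如何实现”, “看片业务全称是什么”, and “看片业务服务平台是什么”. In an actual design, the similar-question sets may be built in advance when the intelligent interactive system is designed, or built and refined later while the system is in use. Each similar-question set contains at least the question itself and similar questions that have the same answer. In other words, once the candidate questions have been matched, the similarity is calculated between the user's question and every question in the similar-question sets of all the candidate questions. Sentence similarity may be computed by first computing word-to-word similarity with a HowNet-based word similarity algorithm and then deriving sentence similarity from the word similarities. Methods for computing sentence similarity belong to the prior art and are not described further here.
a result output module 50, configured to output, according to the calculated similarity and preset rules, the answer corresponding to the question that matches the user's question.
Specifically, according to the similarity calculated by the similar question matching module 40, the answer corresponding to the question that matches the user's question is output according to preset rules. The preset rules may be output rules based on a preset similarity range, designed according to the needs of the intelligent interactive system in the specific application domain and the completeness of the Q&A library. Outputting the answer according to the preset rules involves first determining, from the calculated similarity, the candidate question that matches the user's question, and then outputting the answer corresponding to that candidate question. When determining the matching candidate question, the system checks, against the preset similarity range, whether the similar-question sets contain a question whose similarity to the user's question lies within the preset range. If so, it outputs the answer of the candidate question whose similar-question set contains that question; if not, it adds the user's question directly to the Q&A library.
In this embodiment of the present invention, word segmentation, stop-word removal, and query expansion are applied to the user's question, which prevents weakly relevant words in complex sentences from affecting the matching result. Candidate questions related to the user's question are first matched from the Q&A library according to a preset index file, which reduces the amount of computation required for matching. The similarity between the user's question and the questions in the similar-question set corresponding to each candidate question is then calculated, which avoids the case where a question goes unmatched merely because it can be phrased in many different ways, and improves the matching result. Finally, according to the calculated similarity, the answer corresponding to the question that matches the user's question is output according to preset rules. The overall scheme improves the accuracy of question matching results in intelligent interactive systems.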
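The way modules 10 to 50 hand data to one another can be sketched as a small pipeline object. This is a structural sketch only: the class name, the callable signatures, and the toy lambdas are assumptions made for illustration, and each callable stands in for a full module.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class QuestionMatchingSystem:
    """Wires the modules together; every callable is a stand-in for one module."""
    preprocess: Callable[[str], List[str]]               # module 20: index terms
    match_candidates: Callable[[List[str]], List[str]]   # module 30: candidate questions
    score: Callable[[str, List[str]], Dict[str, float]]  # module 40: similarities
    output: Callable[[str, Dict[str, float]], str]       # module 50: result

    def ask(self, question: str) -> str:
        """Module 10 receives the question; the rest run in sequence."""
        terms = self.preprocess(question)
        candidates = self.match_candidates(terms)
        scores = self.score(question, candidates)
        return self.output(question, scores)


# Toy stand-ins, only to show the data flow between modules.
system = QuestionMatchingSystem(
    preprocess=lambda q: q.split(),
    match_candidates=lambda terms: ["c1"] if "x" in terms else [],
    score=lambda q, cands: {c: 0.95 for c in cands},
    output=lambda q, scores: max(scores, key=lambda c: scores[c]) if scores else "no match",
)
hit = system.ask("x y")
miss = system.ask("y")
```

Injecting the modules as callables keeps each stage replaceable, for example swapping the similarity function without touching candidate retrieval.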
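Referring to FIG. 7, FIG. 7 is a schematic structural diagram of the question matching system in an intelligent interactive system according to the second preferred embodiment of the present invention, the system further comprising:
a similar-question set expansion module 60, configured to expand each question in the Q&A library to form a similar-question set, the similar-question set containing at least the question itself and similar questions that have the same answer.
Referring to FIG. 8, FIG. 8 is a schematic structural diagram of the question matching system in an intelligent interactive system according to the third preferred embodiment of the present invention, the system further comprising:
an index file construction module 70, configured to build the Q&A library, perform word segmentation on the questions in the Q&A library to obtain index terms, and build an index file that records the correspondence between the index terms and the questions.
Referring to FIG. 9, FIG. 9 is a schematic structural diagram of the result output module shown in FIG. 6 according to the fourth preferred embodiment of the present invention, the result output module 50 comprising:
a full-match output module 501, configured to output the answer corresponding to the candidate question with the greatest similarity to the user's question when the similar-question sets contain a question whose similarity to the user's question is greater than the upper limit of the preset range;
a similar-match output module 502, configured to, when the similar-question sets contain a question whose similarity lies within the preset range, output the candidate questions whose similarity to the user's question lies within the preset range for the user to choose from, and output the answer corresponding to the candidate question selected by the user;
an empty-match output module 503, configured to add the user's question to the Q&A library and output an empty-match prompt when the similarity between every question in the similar-question sets and the user's question is less than the lower limit of the preset range.
In one embodiment, the similar-match output module 502 is further configured to add the user's question to the similar-question set corresponding to the candidate question selected by the user:
Specifically, the similar-match output module 502 outputs candidate questions for the user to choose from. If the candidate questions offered include the question the user needs, the system obtains the candidate question selected by the user and adds the user's question to the similar-question set corresponding to that candidate question; if none of the candidate questions offered is the one the user needs, the user's question is added to the Q&A library and an empty-match prompt is output.
After the user has selected a specific candidate question, the answer corresponding to that candidate question is output to the client for the user's reference.
In the specific embodiments of the present invention, word segmentation, stop-word removal, and query expansion are applied to the user's question, which prevents weakly relevant words in complex sentences from affecting the matching result. Candidate questions related to the user's question are first matched from the Q&A library according to a preset index file, which reduces the amount of computation required for matching. The similarity between the user's question and the questions in the similar-question set corresponding to each candidate question is then calculated, which avoids the case where a question goes unmatched merely because it can be phrased in many different ways, and improves the matching result. Finally, according to the calculated similarity, the answer corresponding to the question that matches the user's question is output according to preset rules. The overall scheme improves the accuracy of question matching results in intelligent interactive systems.
The above are only preferred embodiments of the present invention and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.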

Claims (14)

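  1. A question matching method in an intelligent interactive system, characterized in that the method comprises the following steps:
    obtaining a question input by a user;
    performing word segmentation, stop-word removal, and query expansion on the question to obtain the index terms of the question;
    matching, according to a preset index file, candidate questions related to the index terms of the question from a Q&A library;
    calculating the similarity between the question and the questions in the similar-question set corresponding to each candidate question;
    outputting, according to the calculated similarity and preset rules, the answer corresponding to the question that matches the user's question.
  2. The question matching method in an intelligent interactive system according to claim 1, characterized in that the step of outputting, according to the calculated similarity and preset rules, the answer corresponding to the question that matches the user's question comprises:
    if the similar-question sets contain a question whose similarity to the user's question is greater than the upper limit of a preset range, outputting the answer corresponding to the candidate question with the greatest similarity to the user's question;
    otherwise, if the similar-question sets contain a question whose similarity lies within the preset range, outputting the candidate questions whose similarity to the user's question lies within the preset range for the user to choose from, and outputting the answer corresponding to the candidate question selected by the user;
    otherwise, if the similarity between every question in the similar-question sets and the user's question is less than the lower limit of the preset range, adding the user's question to the Q&A library and outputting an empty-match prompt.
  3. The question matching method in an intelligent interactive system according to claim 1, characterized in that the method further comprises:
    expanding each question in the Q&A library to form a similar-question set, the similar-question set containing at least the question itself and similar questions that have the same answer.
  4. The question matching method in an intelligent interactive system according to claim 3, characterized in that the step of outputting, according to the calculated similarity and preset rules, the answer corresponding to the question that matches the user's question comprises:
    if the similar-question sets contain a question whose similarity to the user's question is greater than the upper limit of a preset range, outputting the answer corresponding to the candidate question with the greatest similarity to the user's question;
    otherwise, if the similar-question sets contain a question whose similarity lies within the preset range, outputting the candidate questions whose similarity to the user's question lies within the preset range for the user to choose from, and outputting the answer corresponding to the candidate question selected by the user;
    otherwise, if the similarity between every question in the similar-question sets and the user's question is less than the lower limit of the preset range, adding the user's question to the Q&A library and outputting an empty-match prompt.
  5. The question matching method in an intelligent interactive system according to claim 1, characterized in that the method further comprises:
    building the Q&A library;
    performing word segmentation on the questions in the Q&A library to obtain index terms, and building an index file that records the correspondence between the index terms and the questions.
  6. The question matching method in an intelligent interactive system according to claim 5, characterized in that the step of outputting, according to the calculated similarity and preset rules, the answer corresponding to the question that matches the user's question comprises:
    if the similar-question sets contain a question whose similarity to the user's question is greater than the upper limit of a preset range, outputting the answer corresponding to the candidate question with the greatest similarity to the user's question;
    otherwise, if the similar-question sets contain a question whose similarity lies within the preset range, outputting the candidate questions whose similarity to the user's question lies within the preset range for the user to choose from, and outputting the answer corresponding to the candidate question selected by the user;
    otherwise, if the similarity between every question in the similar-question sets and the user's question is less than the lower limit of the preset range, adding the user's question to the Q&A library and outputting an empty-match prompt.
  7. The question matching method in an intelligent interactive system according to claim 6, characterized in that, after the step of, if the similar-question sets contain a similar question whose similarity lies within the preset range, outputting the candidate questions whose similarity to the user's question lies within the preset range for the user to choose from and outputting the answer corresponding to the candidate question selected by the user, the method further comprises:
    adding the user's question to the similar-question set corresponding to the candidate question selected by the user.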
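  8. A question matching system in an intelligent interactive system, characterized in that the system comprises:
    a question acquisition module, configured to obtain a question input by a user;
    a question preprocessing module, configured to perform word segmentation, stop-word removal, and query expansion on the question to obtain the index terms of the question;
    a candidate question matching module, configured to match, according to a preset index file, candidate questions related to the index terms of the question from a Q&A library;
    a similar question matching module, configured to calculate the similarity between the question and the questions in the similar-question set corresponding to each candidate question;
    a result output module, configured to output, according to the calculated similarity and preset rules, the answer corresponding to the question that matches the user's question.
  9. The question matching system in an intelligent interactive system according to claim 8, characterized in that the result output module comprises:
    a full-match output module, configured to output the answer corresponding to the candidate question with the greatest similarity to the user's question when the similar-question sets contain a question whose similarity to the user's question is greater than the upper limit of a preset range;
    a similar-match output module, configured to, when the similar-question sets contain a question whose similarity lies within the preset range, output the candidate questions whose similarity to the user's question lies within the preset range for the user to choose from, and output the answer corresponding to the candidate question selected by the user;
    an empty-match output module, configured to add the user's question to the Q&A library and output an empty-match prompt when the similarity between every question in the similar-question sets and the user's question is less than the lower limit of the preset range.
  10. The question matching system in an intelligent interactive system according to claim 8, characterized in that the system further comprises:
    a similar-question set expansion module, configured to expand each question in the Q&A library to form a similar-question set, the similar-question set containing at least the question itself and similar questions that have the same answer.
  11. The question matching system in an intelligent interactive system according to claim 10, characterized in that the result output module comprises:
    a full-match output module, configured to output the answer corresponding to the candidate question with the greatest similarity to the user's question when the similar-question sets contain a question whose similarity to the user's question is greater than the upper limit of a preset range;
    a similar-match output module, configured to, when the similar-question sets contain a question whose similarity lies within the preset range, output the candidate questions whose similarity to the user's question lies within the preset range for the user to choose from, and output the answer corresponding to the candidate question selected by the user;
    an empty-match output module, configured to add the user's question to the Q&A library and output an empty-match prompt when the similarity between every question in the similar-question sets and the user's question is less than the lower limit of the preset range.
  12. The question matching system in an intelligent interactive system according to claim 8, characterized in that the system further comprises:
    an index file construction module, configured to build the Q&A library, perform word segmentation on the questions in the Q&A library to obtain index terms, and build an index file that records the correspondence between the index terms and the questions.
  13. The question matching system in an intelligent interactive system according to claim 12, characterized in that the result output module comprises:
    a full-match output module, configured to output the answer corresponding to the candidate question with the greatest similarity to the user's question when the similar-question sets contain a question whose similarity to the user's question is greater than the upper limit of a preset range;
    a similar-match output module, configured to, when the similar-question sets contain a question whose similarity lies within the preset range, output the candidate questions whose similarity to the user's question lies within the preset range for the user to choose from, and output the answer corresponding to the candidate question selected by the user;
    an empty-match output module, configured to add the user's question to the Q&A library and output an empty-match prompt when the similarity between every question in the similar-question sets and the user's question is less than the lower limit of the preset range.
  14. The question matching system in an intelligent interactive system according to claim 13, characterized in that the similar-match output module is further configured to add the user's question to the similar-question set corresponding to the candidate question selected by the user.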
PCT/CN2015/071314 2015-01-15 2015-01-22 Question matching method and system in an intelligent interactive system WO2016112558A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510020532.6 2015-01-15
CN201510020532.6A CN104657346A (zh) 2015-01-15 2015-01-15 Question matching method and system in an intelligent interactive system

Publications (1)

Publication Number Publication Date
WO2016112558A1 true WO2016112558A1 (zh) 2016-07-21

Family

ID=53248495

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/071314 WO2016112558A1 (zh) 2015-01-15 2015-01-22 Question matching method and system in an intelligent interactive system

Country Status (2)

Country Link
CN (1) CN104657346A (zh)
WO (1) WO2016112558A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
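CN109117474A (zh) * 2018-06-25 2019-01-01 广州多益网络股份有限公司 Sentence similarity calculation method, device, and storage medium
CN109829048A (zh) * 2019-01-23 2019-05-31 平安科技(深圳)有限公司 Electronic device, interview assistance method, and computer-readable storage medium
CN110765247A (zh) * 2019-09-30 2020-02-07 支付宝(杭州)信息技术有限公司 Input prompting method and device for a question-answering robot
CN111177379A (zh) * 2019-12-20 2020-05-19 深圳市优必选科技股份有限公司 Classification method for low-precision questions, intelligent terminal, and computer-readable storage medium
CN113807512A (zh) * 2020-06-12 2021-12-17 株式会社理光 Training method and device for a machine reading comprehension model, and readable storage medium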

Families Citing this family (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
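CN105302859B (zh) * 2015-09-21 2018-11-30 上海智臻智能网络科技股份有限公司 Internet-based intelligent interactive system
CN108845992B (zh) * 2015-10-30 2022-08-26 上海智臻智能网络科技股份有限公司 Computer-readable storage medium and question-answering interaction method
CN105354300B (zh) * 2015-11-05 2019-04-05 上海智臻智能网络科技股份有限公司 Information recommendation method and device
CN106844400A (zh) * 2015-12-07 2017-06-13 南京中兴新软件有限责任公司 Intelligent response method and device
CN105653576A (zh) * 2015-12-16 2016-06-08 上海智臻智能网络科技股份有限公司 Information search method and device, and human agent service method and system
CN105653619B (zh) * 2015-12-25 2019-01-25 上海智臻智能网络科技股份有限公司 Method and device for updating the correct-log library in an intelligent question-answering system
CN107153639A (zh) * 2016-03-04 2017-09-12 北大方正集团有限公司 Intelligent question-answering method and system
CN106339429A (zh) * 2016-08-17 2017-01-18 浪潮电子信息产业股份有限公司 Method, device, and system for implementing intelligent customer service
CN106485370B (zh) * 2016-11-03 2019-09-06 上海智臻智能网络科技股份有限公司 Information prediction method and device
CN106874406A (zh) * 2017-01-18 2017-06-20 北京光年无限科技有限公司 Interactive output method for a robot
CN108509463B (zh) 2017-02-28 2022-03-29 华为技术有限公司 Question response method and device
CN107423326B (zh) * 2017-04-11 2020-01-24 广州亿码科技有限公司 Recommendation and result query method and system based on answer mode
CN107133299B (zh) * 2017-04-26 2019-11-19 消检通(深圳)科技有限公司 Artificial-intelligence-based fire-protection response method, mobile terminal, and readable storage medium
CN107220380A (zh) * 2017-06-27 2017-09-29 北京百度网讯科技有限公司 Artificial-intelligence-based question-and-answer recommendation method, device, and computer equipment
CN107688604A (zh) * 2017-07-26 2018-02-13 阿里巴巴集团控股有限公司 Data response processing method, device, and server
CN107644012B (zh) * 2017-08-29 2019-03-01 平安科技(深圳)有限公司 Electronic device, question identification and confirmation method, and computer-readable storage medium
CN108304437B (zh) * 2017-09-25 2020-01-31 腾讯科技(深圳)有限公司 Automatic question-answering method, device, and storage medium
CN107807960B (zh) * 2017-09-30 2020-05-12 平安科技(深圳)有限公司 Intelligent customer service method, electronic device, and computer-readable storage medium
CN107729510B (zh) * 2017-10-23 2021-07-06 深圳市前海众兴科研有限公司 Information interaction method, information interaction terminal, and storage medium
CN108021691B (zh) * 2017-12-18 2021-09-07 深圳前海微众银行股份有限公司 Answer lookup method, customer service robot, and computer-readable storage medium
CN108256009B (zh) * 2018-01-03 2022-02-15 国网江苏省电力有限公司电力科学研究院 Method for improving the answer accuracy of an intelligent electric-power response robot
CN108595695B (zh) * 2018-05-08 2021-03-16 和美(深圳)信息技术股份有限公司 Data processing method, device, computer equipment, and storage medium
CN109241249B (zh) * 2018-07-16 2021-09-14 创新先进技术有限公司 Method and device for determining sudden questions
CN109327631A (zh) * 2018-10-24 2019-02-12 深圳市万屏时代科技有限公司 Intelligent human customer service system
CN110059172B (zh) * 2019-04-19 2021-09-21 北京百度网讯科技有限公司 Method and device for recommending answers based on natural language understanding
CN110245219A (zh) * 2019-04-25 2019-09-17 义语智能科技(广州)有限公司 Question-answering method and device based on an automatically expanded Q&A database
CN110275946A (zh) * 2019-05-14 2019-09-24 闽江学院 FAQ automatic question-answering method and device
CN110263141A (zh) * 2019-06-25 2019-09-20 杭州微洱网络科技有限公司 BERT-based customer service question-answering system
CN110737759B (zh) * 2019-09-06 2023-07-25 中国平安人寿保险股份有限公司 Evaluation method and device for a customer service robot, computer equipment, and storage medium
CN110866089B (zh) * 2019-11-14 2023-04-28 国家电网有限公司 Robot knowledge base construction system and method based on synonymous multi-context analysis
CN111858846A (zh) * 2020-03-05 2020-10-30 北京嘀嘀无限科技发展有限公司 Information processing method and device
CN111737449B (zh) * 2020-08-03 2020-12-11 腾讯科技(深圳)有限公司 Method and device for determining similar questions, storage medium, and electronic device
CN112073741B (zh) * 2020-08-31 2023-11-17 腾讯科技(深圳)有限公司 Live-streaming information processing method and device, electronic equipment, and storage medium
CN112685545A (zh) * 2020-12-29 2021-04-20 浙江力石科技股份有限公司 Intelligent voice interaction method and system based on multi-core-word matching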

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5802493A (en) * 1994-12-07 1998-09-01 Aetna Life Insurance Company Method and apparatus for generating a proposal response
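CN101174259A (zh) * 2007-09-17 2008-05-07 张琰亮 Intelligent interactive question-answering system
CN101257512A (zh) * 2008-02-02 2008-09-03 黄伟才 Question-answer matching method for a question-answering system, and question-answering method and system
CN101286161A (zh) * 2008-05-28 2008-10-15 华中科技大学 Concept-based intelligent Chinese question-answering system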

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
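JP2005157547A (ja) * 2003-11-21 2005-06-16 Fujitsu Ltd Similar article extraction method and program
CN1952928A (zh) * 2005-10-20 2007-04-25 梁威 Computer system for building a natural-language knowledge base and its automatic question-answering retrieval
CN101339551B (zh) * 2007-07-05 2013-01-30 日电(中国)有限公司 Natural-language query requirement expansion device and method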
US8468143B1 (en) * 2010-04-07 2013-06-18 Google Inc. System and method for directing questions to consultants through profile matching

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
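CN109117474A (zh) * 2018-06-25 2019-01-01 广州多益网络股份有限公司 Sentence similarity calculation method, device, and storage medium
CN109829048A (zh) * 2019-01-23 2019-05-31 平安科技(深圳)有限公司 Electronic device, interview assistance method, and computer-readable storage medium
CN109829048B (zh) * 2019-01-23 2023-06-23 平安科技(深圳)有限公司 Electronic device, interview assistance method, and computer-readable storage medium
CN110765247A (zh) * 2019-09-30 2020-02-07 支付宝(杭州)信息技术有限公司 Input prompting method and device for a question-answering robot
CN110765247B (zh) * 2019-09-30 2022-10-25 支付宝(杭州)信息技术有限公司 Input prompting method and device for a question-answering robot
CN111177379A (zh) * 2019-12-20 2020-05-19 深圳市优必选科技股份有限公司 Classification method for low-precision questions, intelligent terminal, and computer-readable storage medium
CN111177379B (zh) * 2019-12-20 2023-05-23 深圳市优必选科技股份有限公司 Classification method for low-precision questions, intelligent terminal, and computer-readable storage medium
CN113807512A (zh) * 2020-06-12 2021-12-17 株式会社理光 Training method and device for a machine reading comprehension model, and readable storage medium
CN113807512B (zh) * 2020-06-12 2024-01-23 株式会社理光 Training method and device for a machine reading comprehension model, and readable storage medium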

Also Published As

Publication number Publication date
CN104657346A (zh) 2015-05-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15877476

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15877476

Country of ref document: EP

Kind code of ref document: A1