CN104572868A - Method and device for information matching based on questioning and answering system - Google Patents

Method and device for information matching based on questioning and answering system

Info

Publication number
CN104572868A
Authority
CN
China
Prior art keywords
candidate result
threshold
highest
marking
problem candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410800479.7A
Other languages
Chinese (zh)
Other versions
CN104572868B (en)
Inventor
王东
游世学
刘荣
杜新凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING ZHONGKE HUILIAN INFORMATION TECHNOLOGY Co Ltd
Tsinghua University
Original Assignee
BEIJING ZHONGKE HUILIAN INFORMATION TECHNOLOGY Co Ltd
Tsinghua University
Application filed by BEIJING ZHONGKE HUILIAN INFORMATION TECHNOLOGY Co Ltd and Tsinghua University
Priority to CN201410800479.7A
Publication of CN104572868A
Application granted
Publication of CN104572868B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2468 Fuzzy queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Automation & Control Theory (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for information matching based on a question answering system. The method comprises the following steps: obtaining the highest-scoring fuzzy question candidate result and the highest-scoring Lucene question candidate result for the user input; judging whether the score of the highest-scoring fuzzy question candidate result is greater than a first threshold of the fuzzy question candidate results; when it is greater than the first threshold, using the highest-scoring fuzzy question candidate result as the output result for the user input; when it is less than or equal to the first threshold, judging whether the score of the highest-scoring Lucene question candidate result is greater than a second threshold of the Lucene question candidate results; and when it is greater than the second threshold, using the highest-scoring Lucene question candidate result as the output result for the user input. The method solves the prior-art problem that information matching in a question answering system cannot be performed efficiently and accurately.

Description

Method and device for information matching based on a question answering system
Technical field
The present application relates to the field of question answering information technology, and in particular to a method and device for information matching based on a question answering system.
Background
In recent years, with the development of natural language processing technology, intelligent question answering systems have attracted wide attention. From the once-popular chat application 'Little Yellow Chicken' to the answering robots now common on major online platforms, intelligent question answering systems are applied in many fields.
In the prior art, a question answering system usually processes user input as shown in Fig. 1.
Step 101: The question answering system receives the user input.
Step 102: The received user input is analyzed. The analysis includes a series of preprocessing operations, such as keyword extraction and keyword expansion, and yields the preprocessed user input.
Step 103: Lucene retrieval is performed on the preprocessed user input to obtain multiple question candidate results, and the best answer to the user input is selected from these question candidate results.
In the retrieval above, the best answer to the user input is obtained by keyword matching and keyword expansion. However, expression in Chinese is flexible, and the positions at which keywords appear in sentences with the same meaning are not fixed, so sequential keyword matching often fails to meet retrieval requirements. Retrieval and matching by keywords is simple, but it stays at the surface of the language and does not touch semantics. As a result, the best answer given by the question answering system is often not the answer the user needs.
Summary of the invention
The present application provides a method and device for information matching based on a question answering system, to solve the prior-art problem that information matching in a question answering system cannot be performed efficiently and accurately.
To solve the above problem, the present application discloses an information matching method based on a question answering system, comprising: obtaining the highest-scoring fuzzy question candidate result and the highest-scoring Lucene question candidate result for the user input;
Judging whether the score of the highest-scoring fuzzy question candidate result is greater than the first threshold of the fuzzy question candidate results;
If the score of the highest-scoring fuzzy question candidate result is greater than the first threshold, using the highest-scoring fuzzy question candidate result as the output result for the user input;
If the score of the highest-scoring fuzzy question candidate result is less than or equal to the first threshold, obtaining the highest-scoring Lucene question candidate result;
Judging whether the score of the highest-scoring Lucene question candidate result is greater than the second threshold of the Lucene question candidate results, and if so, using the highest-scoring Lucene question candidate result as the output result for the user input.
Preferably, if the score of the highest-scoring Lucene question candidate result is less than or equal to the second threshold, outputting that no match was found for the user input.
Preferably, the method comprises: using a genetic algorithm to calculate the values of the first threshold and the second threshold, wherein the genetic algorithm comprises a selection operator, a crossover operator and a mutation operator.
Preferably, the step of using the genetic algorithm to calculate the values of the first threshold and the second threshold comprises:
Randomly assigning values to the first threshold and the second threshold to obtain multiple one-dimensional arrays;
Using the selection operator to select multiple one-dimensional arrays that meet a certain standard;
Using the crossover operator to process the one-dimensional arrays that meet the standard, to obtain multiple crossed one-dimensional arrays;
Using the mutation operator to process the crossed one-dimensional arrays, to obtain multiple mutated one-dimensional arrays;
Inputting the one-dimensional arrays that meet the standard, the crossed one-dimensional arrays and the mutated one-dimensional arrays into the question answering system, to obtain the accuracy for the user input;
Selecting at least two accuracy values from the sorted accuracy values;
Repeating the above operations on the one-dimensional arrays corresponding to the at least two accuracy values until a converged accuracy is found;
Using the values of the one-dimensional array corresponding to the converged accuracy as the first threshold and the second threshold.
The present application also discloses an information matching method based on a question answering system, comprising: dividing the obtained multiple Lucene question candidate results by type into domain question candidate results and chat question candidate results;
Scoring the multiple domain question candidate results with a fuzzy algorithm, to obtain the highest-scoring domain question candidate result;
Judging whether the score of the highest-scoring domain question candidate result is greater than the first threshold of the fuzzy question candidate results;
If the score of the highest-scoring domain question candidate result is greater than the first threshold, using the highest-scoring domain question candidate result as the output result for the user input;
If the score of the highest-scoring domain question candidate result is less than or equal to the first threshold, outputting that no match was found for the user input;
Scoring the multiple chat question candidate results with a fuzzy algorithm, to obtain the highest-scoring chat question candidate result;
Judging whether the score of the highest-scoring chat question candidate result is greater than the third threshold of the chat question candidate results;
If the score of the highest-scoring chat question candidate result is greater than the third threshold, using the highest-scoring chat question candidate result as the output result for the user input;
If the score of the highest-scoring chat question candidate result is less than or equal to the third threshold, obtaining the highest-scoring Lucene question candidate result;
Judging whether the score of the highest-scoring Lucene question candidate result is greater than the second threshold of the Lucene question candidate results, and if so, using the highest-scoring Lucene question candidate result as the output result for the user input.
Preferably, the method comprises: if the score of the highest-scoring Lucene question candidate result is less than or equal to the second threshold, outputting that no match was found for the user input.
Preferably, the method comprises: using a genetic algorithm to calculate the values of the first threshold, the second threshold and the third threshold, wherein the genetic algorithm comprises a selection operator, a crossover operator and a mutation operator.
Preferably, the step of using the genetic algorithm to calculate the values of the first threshold, the second threshold and the third threshold comprises:
Randomly assigning values to the first threshold, the second threshold and the third threshold to obtain multiple one-dimensional arrays;
Using the selection operator to select, from the multiple one-dimensional arrays, multiple one-dimensional arrays that meet a certain standard;
Using the crossover operator to process the one-dimensional arrays that meet the standard, to obtain multiple crossed one-dimensional arrays;
Using the mutation operator to process the crossed one-dimensional arrays, to obtain multiple mutated one-dimensional arrays;
Inputting the one-dimensional arrays that meet the standard, the crossed one-dimensional arrays and the mutated one-dimensional arrays into the question answering system, to obtain the accuracy for the user input;
Using an evaluation function to select at least two accuracy values from the accuracy values;
Repeating the above operations on the one-dimensional arrays corresponding to the at least two accuracy values until a converged accuracy is found;
Using the values of the one-dimensional array corresponding to the converged accuracy as the first threshold, the second threshold and the third threshold.
Preferably, the formula of the evaluation function is:
$$\mathrm{Acc} = \frac{2\,TP}{2\,TP + FP + FN} + \frac{TR}{SUM}$$
Here, TP is the number of domain question candidate results judged to be domain question candidate results, FP is the number of chat question candidate results judged to be domain question candidate results, FN is the number of domain question candidate results judged to be chat question candidate results, TR is the number of questions that the question answering system answers correctly, SUM is the total number of questions, and Acc is the accuracy.
To solve the above problem, the present application also discloses an information matching device based on a question answering system, comprising: a first acquisition module, configured to obtain the highest-scoring fuzzy question candidate result and the highest-scoring Lucene question candidate result for the user input;
A first judging module, configured to judge whether the score of the highest-scoring fuzzy question candidate result is greater than the first threshold of the fuzzy question candidate results;
If the score of the highest-scoring fuzzy question candidate result is greater than the first threshold, the highest-scoring fuzzy question candidate result is used as the output result for the user input;
If the score of the highest-scoring fuzzy question candidate result is less than or equal to the first threshold, the highest-scoring Lucene question candidate result is obtained;
A second judging module, configured to judge whether the score of the highest-scoring Lucene question candidate result is greater than the second threshold of the Lucene question candidate results, and if so, to use the highest-scoring Lucene question candidate result as the output result for the user input.
Compared with the prior art, the present application has the following advantages:
First, fuzzy search is performed on the user input to obtain multiple fuzzy question candidate results, and the highest-scoring fuzzy question candidate result is compared with the first threshold. If its score is greater than the first threshold, it is used as the output result for the user input. Fuzzy search makes it possible to retrieve user input expressed in different word orders.
Second, the present application performs fuzzy search on the user input to obtain multiple fuzzy question candidate results, and decides from the comparison between the highest-scoring fuzzy question candidate result and the first threshold whether to go on to compare the Lucene question candidate results. The output result for the user input is then determined from the comparison between the highest-scoring Lucene question candidate result and the second threshold. Comparing the question candidate results for the user input against thresholds twice ensures the accuracy of the matched information, and thus improves the accuracy with which the question answering system matches user input.
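The point about word order can be illustrated with a small sketch. The application does not specify the fuzzy scoring function, so the Jaccard similarity below is only a stand-in chosen for illustration, and the function names and sample tokens are hypothetical. It contrasts a position-by-position comparison, which is sensitive to word order, with a set-based score, which is not.

```python
def ordered_match(query_tokens, candidate_tokens):
    """Order-sensitive score: fraction of positions where both sequences agree."""
    length = max(len(query_tokens), len(candidate_tokens))
    if length == 0:
        return 0.0
    hits = sum(1 for q, c in zip(query_tokens, candidate_tokens) if q == c)
    return hits / length


def jaccard(query_tokens, candidate_tokens):
    """Order-insensitive stand-in for a fuzzy score (an assumption, not the
    scoring function of the application): shared tokens over all tokens."""
    q, c = set(query_tokens), set(candidate_tokens)
    if not q and not c:
        return 0.0
    return len(q & c) / len(q | c)


# Hypothetical example: the same three keywords in a different order.
user_input = ["how", "reset", "password"]
stored_question = ["password", "reset", "how"]

print(ordered_match(user_input, stored_question))
print(jaccard(user_input, stored_question))
```

In this example the set-based score treats the two token lists as identical, while the positional comparison agrees only on the middle token.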
Brief description of the drawings
Fig. 1 is a flowchart of a question answering system in the prior art;
Fig. 2 is a flowchart of an information matching method based on a question answering system in Embodiment 1 of the present application;
Fig. 3 is a flowchart of an information matching method based on a question answering system in Embodiment 2 of the present application;
Fig. 4 is a flowchart of the genetic algorithm of the present application;
Fig. 5 is a flowchart of an information matching method based on a question answering system in Embodiment 3 of the present application;
Fig. 6 is a flowchart of an information matching method based on a question answering system in Embodiment 4 of the present application;
Fig. 7 is a schematic diagram of an example in which the present application uses the genetic algorithm to calculate the values of the first threshold, the second threshold and the third threshold;
Fig. 8 is a structural block diagram of an information matching device based on a question answering system in Embodiment 5 of the present application;
Fig. 9 is a structural block diagram of an information matching device based on a question answering system in Embodiment 6 of the present application.
Detailed description of the embodiments
To make the above objects, features and advantages of the present application clearer and easier to understand, the application is described in further detail below with reference to the drawings and specific embodiments.
With reference to Fig. 2, an information matching method based on a question answering system according to the present application is shown, comprising:
Step 201: Obtain the fuzzy question candidate results.
Lucene retrieval is performed on the user input to obtain multiple Lucene question candidate results, and fuzzy search is then performed on these Lucene question candidate results to obtain multiple fuzzy question candidate results for the user input.
The highest-scoring fuzzy question candidate result and the highest-scoring Lucene question candidate result for the user input are obtained.
Step 202: Judge whether the score of the highest-scoring fuzzy question candidate result is greater than the first threshold of the fuzzy question candidate results. If it is greater than the first threshold, perform step 203; if it is less than or equal to the first threshold, perform step 204.
Step 203: Use the highest-scoring fuzzy question candidate result as the output result for the user input; the flow ends.
Step 204: Obtain the highest-scoring Lucene question candidate result, and perform step 205.
Step 205: Judge whether the score of the highest-scoring Lucene question candidate result is greater than the second threshold of the Lucene question candidate results. If so, perform step 206; if not, perform step 207.
Step 206: Use the highest-scoring Lucene question candidate result as the output result for the user input.
Step 207: If the score of the highest-scoring Lucene question candidate result is less than or equal to the second threshold, output that no match was found for the user input.
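Steps 202 to 207 amount to a two-stage threshold cascade, which can be sketched as follows. This is a minimal illustration, not the implementation of the application: the `Candidate` type, the `match` function, and the sample questions and scores are hypothetical, and the candidate scores are assumed to have been computed already by the fuzzy search and by Lucene.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    question: str   # stored question text (hypothetical field name)
    score: float    # score from fuzzy search or from Lucene


def best(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the highest-scoring candidate, or None if there are none."""
    return max(candidates, key=lambda c: c.score) if candidates else None


def match(fuzzy: Sequence[Candidate],
          lucene: Sequence[Candidate],
          first_threshold: float,
          second_threshold: float) -> Optional[Candidate]:
    """Two-threshold cascade; None stands for 'no match found'."""
    top_fuzzy = best(fuzzy)
    # Steps 202-203: accept the top fuzzy candidate if it beats the first threshold.
    if top_fuzzy is not None and top_fuzzy.score > first_threshold:
        return top_fuzzy
    # Steps 204-206: otherwise fall back to the top Lucene candidate.
    top_lucene = best(lucene)
    if top_lucene is not None and top_lucene.score > second_threshold:
        return top_lucene
    # Step 207: neither candidate is good enough.
    return None


# Hypothetical scores for illustration only.
fuzzy_results = [Candidate("how to reset password", 0.55),
                 Candidate("how to change email", 0.20)]
lucene_results = [Candidate("reset password steps", 1.8)]

print(match(fuzzy_results, lucene_results, first_threshold=0.6, second_threshold=1.5))
```

The comparisons are strict, so a score exactly equal to a threshold falls through to the next stage, as in steps 202 and 205.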
Through this embodiment, first, fuzzy search is performed on the user input to obtain multiple fuzzy question candidate results, and the highest-scoring fuzzy question candidate result is compared with the first threshold. If its score is greater than the first threshold, it is used as the output result for the user input. Fuzzy search makes it possible to retrieve user input expressed in different word orders.
Second, the present application performs fuzzy search on the user input to obtain multiple fuzzy question candidate results, and decides from the comparison between the highest-scoring fuzzy question candidate result and the first threshold whether to go on to compare the Lucene question candidate results. The output result for the user input is then determined from the comparison between the highest-scoring Lucene question candidate result and the second threshold. Comparing the question candidate results for the user input against thresholds twice ensures the accuracy of the matched information, and thus improves the accuracy with which the question answering system matches user input.
With reference to Fig. 3, a flowchart of an information matching method based on a question answering system in Embodiment 2 of the present application is shown.
In this embodiment, an information matching method based on a question answering system comprises:
Step 301: Obtain the fuzzy question candidate results.
Step 302: Judge whether the score of the highest-scoring fuzzy question candidate result is greater than the first threshold of the fuzzy question candidate results.
Step 303: If the score of the highest-scoring fuzzy question candidate result is less than or equal to the first threshold, obtain the highest-scoring Lucene question candidate result.
Step 304: Judge whether the score of the highest-scoring Lucene question candidate result is greater than the second threshold of the Lucene question candidate results.
Step 305: Use a genetic algorithm to calculate the values of the first threshold and the second threshold, wherein the genetic algorithm comprises a selection operator, a crossover operator and a mutation operator.
Genetic algorithms (GA) are a class of randomized search algorithms that draw on natural selection and natural genetic mechanisms in the biological world. A genetic algorithm simulates the reproduction, crossover and gene mutation that occur during natural selection and natural inheritance. In each iteration it retains a group of candidate solutions, selects the better individuals from the solution group according to a fitness evaluation function, and combines these individuals using the genetic operators (selection, crossover and mutation) to produce a new generation of candidate solutions. This process is repeated until a convergence criterion is met.
With reference to Fig. 4, a flowchart of the genetic algorithm used to implement the scheme of the present application is shown. The step of using the genetic algorithm to calculate the values of the first threshold and the second threshold comprises:
Step 401: Randomly assign values to the first threshold and the second threshold to obtain multiple one-dimensional arrays, for example: [0.5 0.5], [0.6 0.7], [0.3 0.7].
Step 402: Use the selection operator to select multiple one-dimensional arrays that meet a certain standard.
To preserve the global search capability of the genetic algorithm, the selection operator uses a best-individual preservation strategy: the best individual in the parent population enters the offspring population directly, so that the best individual obtained during evolution is not destroyed by the crossover and mutation operations.
The one-dimensional arrays that meet a certain standard can be understood as the one-dimensional arrays whose individual fitness is high or good.
For example, given [0.5 0.5], [0.6 0.7], [0.3 0.7], [0.5 0.1], [0.6 0.4], [0.3 0.2] and so on, if the selection operator finds that [0.5 0.5], [0.6 0.7], [0.3 0.7] and [0.5 0.1] have high individual fitness, these four arrays are selected from the multiple one-dimensional arrays as the arrays meeting the standard on which crossover is subsequently performed.
Step 403: Use the crossover operator to process the one-dimensional arrays that meet the standard, to obtain multiple crossed one-dimensional arrays.
The crossover operator is the main method of producing new individuals. It determines the global search capability of the genetic algorithm and plays a key role in it. Because the parameters are not very complex and vary in a fairly simple way, a simple and effective single-point crossover operator is chosen.
Step 404: Use the mutation operator to process the crossed one-dimensional arrays, to obtain multiple mutated one-dimensional arrays.
The mutation operator is an auxiliary method of producing new individuals, and it determines the local search capability of the genetic algorithm. The mutation operator and the crossover operator work together to complete both the global search and the local search of the search space.
Step 405: Input the one-dimensional arrays that meet the standard, the crossed one-dimensional arrays and the mutated one-dimensional arrays into the question answering system, to obtain the accuracy for the user input.
Step 406: Select at least two accuracy values from the sorted accuracy values.
Step 407: Repeat the above operations (step 403 to step 406) on the one-dimensional arrays corresponding to the at least two accuracy values until a converged accuracy is found.
Step 408: Use the values of the one-dimensional array corresponding to the converged accuracy as the first threshold and the second threshold.
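Steps 401 to 408 can be sketched as a small genetic search over threshold arrays. The sketch below is illustrative only and rests on several assumptions that are not in the application: the function and parameter names (`evolve_thresholds`, `pop_size`, `keep`, `mutation_rate`) are hypothetical, mutation redraws a value uniformly within its bounds, a fixed number of generations stands in for the convergence test of step 407, and `toy_fitness` is a placeholder for the accuracy that the application obtains by running the question answering system with the given thresholds.

```python
import random


def evolve_thresholds(fitness, bounds, pop_size=20, generations=30,
                      keep=4, mutation_rate=0.2, seed=0):
    """Search for a threshold array that maximizes `fitness`.

    bounds: one (low, high) pair per threshold. keep must be at least 2.
    """
    rng = random.Random(seed)
    # Step 401: random assignment gives the initial one-dimensional arrays.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Steps 402, 405, 406: evaluate, sort, and keep the fittest arrays.
        # Keeping them unchanged preserves the best individual.
        parents = sorted(population, key=fitness, reverse=True)[:keep]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Step 403: single-point crossover.
            point = rng.randrange(1, len(bounds)) if len(bounds) > 1 else 0
            child = a[:point] + b[point:]
            # Step 404: mutation (assumed here: uniform redraw within bounds).
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] = rng.uniform(lo, hi)
            children.append(child)
        # Step 407: repeat with the survivors and their offspring.
        population = parents + children
    # Step 408: the fittest array supplies the threshold values.
    return max(population, key=fitness)


def toy_fitness(thresholds):
    """Placeholder objective, peaked at the arbitrary point [0.6, 0.7]."""
    first, second = thresholds
    return -((first - 0.6) ** 2 + (second - 0.7) ** 2)


best = evolve_thresholds(toy_fitness, [(0.0, 1.0), (0.0, 1.0)])
print(best)
```

Because the selected parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next, which mirrors the best-individual preservation described for the selection operator.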
It should be noted that the steps this embodiment shares with Embodiment 1 of the present application are performed in the same way; Embodiment 2 is described in detail mainly for the parts that differ from Embodiment 1.
Through this embodiment, the multiple one-dimensional arrays are processed with the selection operator, the crossover operator and the mutation operator; the one-dimensional arrays that meet the standard, the crossed one-dimensional arrays and the mutated one-dimensional arrays are input into the question answering system to obtain the accuracy for the user input; the above operations are repeated on the one-dimensional arrays corresponding to at least two accuracy values until a converged accuracy is found; and the values of the one-dimensional array corresponding to the converged accuracy are used as the first threshold and the second threshold, thereby ensuring the accuracy for the user input.
With reference to Fig. 5, an information matching method based on a question answering system in Embodiment 3 of the present application is shown, comprising:
Step 501: Obtain the question candidate results.
The domain database and the chat database are searched with Lucene to obtain the domain question candidate results and the chat question candidate results respectively.
The obtained multiple Lucene question candidate results are divided by type into domain question candidate results and chat question candidate results.
Step 502: Score the multiple domain question candidate results.
The multiple domain question candidate results are scored with a fuzzy algorithm to obtain the scores of the multiple fuzzy question candidate results; the scores are sorted from high to low to obtain the highest-scoring domain question candidate result.
Step 503: Judge whether the score of the highest-scoring domain question candidate result is greater than the first threshold of the fuzzy question candidate results. If it is greater than the first threshold, perform step 504; if it is less than or equal to the first threshold, perform step 505.
Step 504: Use the highest-scoring domain question candidate result as the output result for the user input; the flow ends.
Step 505: Output that no match was found for the user input; the flow ends.
Step 506: Score the multiple chat question candidate results.
The multiple chat question candidate results are scored with a fuzzy algorithm to obtain the scores of the multiple chat question candidate results; the scores are sorted from high to low to obtain the highest-scoring chat question candidate result.
Step 507: Judge whether the score of the highest-scoring chat question candidate result is greater than the third threshold of the chat question candidate results. If it is greater than the third threshold, perform step 508; if it is less than or equal to the third threshold, perform step 509.
Step 508: Use the highest-scoring chat question candidate result as the output result for the user input; the flow ends.
Step 509: Obtain the highest-scoring Lucene question candidate result, and perform step 510.
Step 510: Judge whether the score of the highest-scoring Lucene question candidate result is greater than the second threshold of the Lucene question candidate results. If so, perform step 511; if not, perform step 503.
Step 511: Use the highest-scoring Lucene question candidate result as the output result for the user input.
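The steps above do not state explicitly how the domain branch (steps 502 to 505) and the chat branch (steps 506 to 511) are ordered relative to each other. The sketch below follows one possible reading, taken from the jump in step 510 back to step 503: the chat check runs first, then the Lucene check, then the domain check, and only then is "no match" reported. This ordering is an assumption, as are the function names, the `(question, score)` tuple layout and the sample data.

```python
from typing import List, Optional, Tuple

Scored = Tuple[str, float]  # (candidate question, score); hypothetical layout


def top(candidates: List[Scored]) -> Optional[Scored]:
    """Return the highest-scoring candidate, or None if the list is empty."""
    return max(candidates, key=lambda c: c[1]) if candidates else None


def match_three_thresholds(domain: List[Scored],
                           chat: List[Scored],
                           lucene: List[Scored],
                           first_threshold: float,
                           second_threshold: float,
                           third_threshold: float) -> Optional[Tuple[str, str]]:
    """Return (source, question), or None when no match is found.

    Assumed order: chat (steps 506-508), Lucene (steps 509-511),
    domain (steps 503-504), then 'no match' (step 505).
    """
    best_chat = top(chat)
    if best_chat is not None and best_chat[1] > third_threshold:
        return ("chat", best_chat[0])
    best_lucene = top(lucene)
    if best_lucene is not None and best_lucene[1] > second_threshold:
        return ("lucene", best_lucene[0])
    best_domain = top(domain)
    if best_domain is not None and best_domain[1] > first_threshold:
        return ("domain", best_domain[0])
    return None


# Hypothetical scores for illustration only.
domain_results = [("how to open an account", 0.72)]
chat_results = [("hello there", 0.30)]
lucene_results = [("open account online", 1.1)]

print(match_three_thresholds(domain_results, chat_results, lucene_results,
                             first_threshold=0.6, second_threshold=1.5,
                             third_threshold=0.5))
```

Under a different reading of the flow, for example one in which the type of the question is decided first and only one branch runs, the order of the three checks would change while each individual comparison stays the same.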
Through this embodiment, first, the obtained multiple Lucene question candidate results are divided by type into domain question candidate results and chat question candidate results, and the highest-scoring domain question candidate result is compared with the first threshold. If its score is greater than the first threshold, the highest-scoring domain question candidate result is used as the output result for the user input. Because retrieval of domain question candidate results is added on top of the Lucene question candidate results, the precision for the user input is improved.
Second, the present application compares the chat question candidate results against thresholds multiple times, which ensures the accuracy of matching for the user input and thus improves the accuracy with which the question answering system matches user input.
With reference to Fig. 6, show the process flow diagram of a kind of information matching method based on question answering system in the embodiment of the present application four.
In the present embodiment, a kind of information matching method based on question answering system, comprise: use genetic algorithm to calculate the value of first threshold, Second Threshold and the 3rd threshold value, wherein, genetic algorithm comprises: selection opertor, crossover operator and mutation operator, first threshold and the 3rd threshold value are set between 0-1 and change, and Second Threshold is set between 0-3 and changes.
The step of using the genetic algorithm to calculate the values of the first threshold, the second threshold, and the third threshold comprises:
Step 601: Randomly assign values to the first threshold, the second threshold, and the third threshold to obtain multiple one-dimensional arrays.
Step 602: Use the selection operator to select, from the multiple one-dimensional arrays, multiple one-dimensional arrays that meet a certain criterion.
Step 603: Apply the crossover operator to the multiple one-dimensional arrays that meet the criterion to obtain multiple crossed one-dimensional arrays.
Step 604: Apply the mutation operator to the multiple crossed one-dimensional arrays to obtain multiple mutated one-dimensional arrays.
Step 605: Input the multiple one-dimensional arrays that meet the criterion, the multiple crossed one-dimensional arrays, and the multiple mutated one-dimensional arrays into the question answering system to obtain the accuracy rates for the user's input information.
Step 606: Use the evaluation function to select at least two accuracy rates from the accuracy rates obtained.
Step 607: Repeat the above operations (step 603 to step 606) on the one-dimensional arrays corresponding to the at least two accuracy rates until a converged accuracy rate is found.
Step 608: Use the values of the one-dimensional array corresponding to the converged accuracy rate as the first threshold, the second threshold, and the third threshold.
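The loop in steps 601 to 608 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the application: the array layout `[first, second, third]`, single-point crossover, re-draw mutation, the `patience` stopping rule, and all function names are hypothetical choices, and `evaluate` stands in for running the question answering system on a development set. For simplicity the sketch applies selection at the end of each generation rather than as a separate step 602.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Search ranges from the text: first and third thresholds vary in [0, 1],
# the second (Lucene) threshold varies in [0, 3].
# The ordering [first, second, third] is an assumption of this sketch.
BOUNDS: Tuple[Tuple[float, float], ...] = ((0.0, 1.0), (0.0, 3.0), (0.0, 1.0))

Individual = List[float]


def random_individual(rng: random.Random) -> Individual:
    """Step 601: randomly assign each threshold within its range."""
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]


def crossover(a: Sequence[float], b: Sequence[float],
              rng: random.Random) -> Tuple[Individual, Individual]:
    """Step 603: single-point crossover (an assumed operator)."""
    point = rng.randrange(1, len(a))
    return list(a[:point]) + list(b[point:]), list(b[:point]) + list(a[point:])


def mutate(ind: Sequence[float], rng: random.Random,
           rate: float = 0.3) -> Individual:
    """Step 604: re-draw each value with probability `rate` (assumed scheme)."""
    return [rng.uniform(lo, hi) if rng.random() < rate else g
            for g, (lo, hi) in zip(ind, BOUNDS)]


def tune_thresholds(evaluate: Callable[[Sequence[float]], float],
                    pop_size: int = 4, max_generations: int = 50,
                    patience: int = 5, seed: int = 0) -> Tuple[Individual, float]:
    """Return the best threshold array found and its score."""
    rng = random.Random(seed)
    parents = [random_individual(rng) for _ in range(pop_size)]
    best, best_score, stale = parents[0], float("-inf"), 0
    for _ in range(max_generations):
        crossed: List[Individual] = []
        for i in range(0, pop_size - 1, 2):
            crossed.extend(crossover(parents[i], parents[i + 1], rng))
        mutated = [mutate(c, rng) for c in crossed]
        # Step 605: score parents, crossed arrays, and mutated arrays together.
        scored = sorted(((evaluate(ind), ind) for ind in parents + crossed + mutated),
                        key=lambda pair: pair[0], reverse=True)
        # Step 606: keep the best arrays as the next parents.
        parents = [ind for _, ind in scored[:pop_size]]
        top_score = scored[0][0]
        # Step 607: stop once the best score has stopped improving.
        if top_score > best_score + 1e-9:
            best, best_score, stale = parents[0], top_score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    # Step 608: the best array supplies the three threshold values.
    return best, best_score
```

Because the parents stay in the scored pool, the best score never decreases from one generation to the next, which makes the "no improvement for `patience` generations" rule a workable stand-in for the convergence test that the text leaves unspecified.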
Preferably, the evaluation function is:
$$\mathrm{Acc} = \frac{2TP}{2TP + FP + FN} + \frac{TR}{SUM}$$
where TP is the number of domain question candidate results judged to be domain question candidate results, FP is the number of chat question candidate results judged to be domain question candidate results, FN is the number of domain question candidate results judged to be chat question candidate results, TR is the number of questions that the question answering system answers correctly, SUM is the total number of questions, and Acc is the accuracy rate.
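A small sketch of the evaluation function may help. It assumes the formula is read as the sum of an F1-style classification term, 2TP / (2TP + FP + FN), and the answer rate TR / SUM; the function name and the zero-denominator handling are hypothetical additions.

```python
def evaluation_score(tp: int, fp: int, fn: int, tr: int, total: int) -> float:
    """Acc = 2TP / (2TP + FP + FN) + TR / SUM, under the reading stated above."""
    denom = 2 * tp + fp + fn
    # Guard against empty counts; the source does not define this case.
    classification_term = 2 * tp / denom if denom else 0.0
    answer_term = tr / total if total else 0.0
    return classification_term + answer_term


# Example with made-up counts: 40 domain questions classified correctly,
# 5 chat questions mistaken for domain, 5 domain questions mistaken for chat,
# 80 of 100 questions answered correctly.
score = evaluation_score(tp=40, fp=5, fn=5, tr=80, total=100)
```

Under this reading the value lies between 0 and 2, so it works as a ranking score for candidate threshold arrays rather than as a probability.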
In this embodiment, a genetic algorithm is used to calculate the values of the first threshold, the second threshold, and the third threshold. The evaluation function is used to select at least two accuracy rates from the accuracy rates obtained, and the selection, crossover, and mutation operations are repeated on the one-dimensional arrays corresponding to those accuracy rates until a converged accuracy rate is found. The values of the one-dimensional array corresponding to the converged accuracy rate are then used as the first threshold, the second threshold, and the third threshold, thereby ensuring the accuracy of matching the user's input information.
To help those skilled in the art better understand the technical solution defined by the application for calculating the values of the first threshold, the second threshold, and the third threshold with a genetic algorithm, Fig. 7 shows a schematic diagram of an example in which the application uses a genetic algorithm to calculate these values.
The basic idea of using the genetic algorithm to solve this problem is as follows: four rational numbers are generated at random for each of the given F1, F2, and L1 to form the initial parent sequences; new offspring sequences are then produced by the crossover and mutation operations of the genetic algorithm; all offspring and parents are fed into the question answering system to obtain its accuracy rate (Acc) on the development set; and the four best sequences are obtained by the selection operator and used as the initial parents for the next training iteration.
Step 701: Randomly assign values to the first threshold F1, the second threshold L1, and the third threshold F2 to obtain multiple one-dimensional arrays: [0.5 0.5 0], [0.6 0.5 2], [0.5 0.6 0.4], [0.5 0.7 0.5].
Step 702: Apply the crossover operator to the multiple one-dimensional arrays that meet the criterion to obtain multiple crossed one-dimensional arrays.
Step 703: Apply the mutation operator to obtain multiple mutated one-dimensional arrays: [0.5 0.6 2], [0.7 0.8 0.5], [0.5 0.5 0], [0.6 0.5 0.4].
Step 704: Input the multiple one-dimensional arrays that meet the criterion, [0.5 0.5 0], [0.6 0.5 2], [0.5 0.6 0.4], [0.5 0.7 0.5], the multiple crossed one-dimensional arrays, and the multiple mutated one-dimensional arrays, [0.5 0.6 2], [0.7 0.8 0.5], [0.5 0.5 0], [0.6 0.5 0.4], into the question answering system to obtain the accuracy rates Acc1 to Acc8 for the user's input information.
Step 705: Sort the accuracy rates from high to low, and use the evaluation function to select from Acc1 to Acc8 the one-dimensional arrays corresponding to four accuracy rates: [0.5 0.6 2], [0.7 0.8 0.5], [0.5 0.5 0], [0.6 0.5 0.4].
Step 706: Obtain the converged accuracy rate.
Steps 701 to 705 are repeated on [0.5 0.6 2], [0.7 0.8 0.5], [0.5 0.5 0], [0.6 0.5 0.4] to obtain the converged accuracy rate.
Step 707: Use the values of the one-dimensional array corresponding to the converged accuracy rate as the first threshold, the second threshold, and the third threshold.
Based on the description of the above method embodiments, the present application also provides corresponding embodiments of an information matching device based on a question answering system, which implement the content described in the above method embodiments.
Referring to Fig. 8, a structural block diagram of an information matching device based on a question answering system in Embodiment 5 of the present application is shown. The device may specifically comprise:
A first acquisition module 801, configured to obtain the highest-scoring fuzzy question candidate result and the highest-scoring Lucene question candidate result for the user's input information.
A first judgment module 802, configured to determine whether the highest-scoring fuzzy question candidate result is greater than the first threshold for fuzzy question candidate results.
If the highest-scoring fuzzy question candidate result is greater than the first threshold, it is used as the output result for the user's input information.
If the highest-scoring fuzzy question candidate result is less than or equal to the first threshold, the highest-scoring Lucene question candidate result is obtained.
A second judgment module 803, configured to determine whether the highest-scoring Lucene question candidate result is greater than the second threshold for Lucene question candidate results and, if so, to use the highest-scoring Lucene question candidate result as the output result for the user's input information.
Preferably, if the highest-scoring Lucene question candidate result in the second judgment module is less than or equal to the second threshold, a result indicating that the user's input information was not found is output.
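The two-threshold decision performed by modules 801 to 803 can be sketched as a short cascade. This is an illustrative sketch only: the `(answer, score)` tuple shape, the function name `match`, and the use of `None` for the "not found" outcome are assumptions, and the fuzzy and Lucene scoring themselves are taken as already computed.

```python
from typing import Optional, Tuple

# Hypothetical candidate shape: (answer text, score).
Candidate = Tuple[str, float]


def match(best_fuzzy: Optional[Candidate],
          best_lucene: Optional[Candidate],
          first_threshold: float,
          second_threshold: float) -> Optional[str]:
    """Return the matched answer, or None when nothing passes a threshold."""
    # First judgment: the fuzzy result must be strictly greater than the threshold.
    if best_fuzzy is not None and best_fuzzy[1] > first_threshold:
        return best_fuzzy[0]
    # Second judgment: fall back to the highest-scoring Lucene result.
    if best_lucene is not None and best_lucene[1] > second_threshold:
        return best_lucene[0]
    # Neither result exceeded its threshold: report "not found".
    return None
```

A score exactly equal to a threshold does not pass, which mirrors the "less than or equal to" branches in the description.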
Preferably, a genetic algorithm is used to calculate the values of the first threshold and the second threshold, wherein the genetic algorithm comprises a selection operator, a crossover operator, and a mutation operator.
Preferably, using the genetic algorithm to calculate the values of the first threshold and the second threshold comprises:
Randomly assigning values to the first threshold and the second threshold to obtain multiple one-dimensional arrays.
Using the selection operator to select multiple one-dimensional arrays that meet a certain criterion.
Applying the crossover operator to the multiple one-dimensional arrays that meet the criterion to obtain multiple crossed one-dimensional arrays.
Applying the mutation operator to the multiple crossed one-dimensional arrays to obtain multiple mutated one-dimensional arrays.
Inputting the multiple one-dimensional arrays that meet the criterion, the multiple crossed one-dimensional arrays, and the multiple mutated one-dimensional arrays into the question answering system to obtain the accuracy rates for the user's input information.
Selecting at least two accuracy rates from the sorted accuracy rates.
Repeating the above operations on the one-dimensional arrays corresponding to the at least two accuracy rates until a converged accuracy rate is found.
Using the values of the one-dimensional array corresponding to the converged accuracy rate as the first threshold and the second threshold.
In summary, the information matching device for a question answering system of this embodiment of the application mainly has the following advantages:
First, a fuzzy search is performed on the user's input information to obtain multiple fuzzy question candidate results, and the highest-scoring fuzzy question candidate result is compared with the first threshold. If it is greater than the first threshold, it is used as the output result for the user's input information. Fuzzy search makes it possible to retrieve matches for different word orders of the user's input information.
Second, the application performs a fuzzy search on the user's input information to obtain multiple fuzzy question candidate results; depending on the result of comparing the highest-scoring fuzzy question candidate result with the first threshold, it proceeds to the comparison of the Lucene question candidate result; and it determines the output result for the user's input information from the result of comparing the highest-scoring Lucene question candidate result with the second threshold. Performing two threshold comparisons on the question candidate results for the user's input information ensures the accuracy of the matched information and thereby improves the accuracy with which the question answering system matches the user's input information.
Referring to Fig. 9, a structural block diagram of an information matching device based on a question answering system in Embodiment 6 of the present application is shown. The device may specifically comprise:
A second acquisition module 901, configured to divide the multiple Lucene question candidate results obtained by type into domain question candidate results and chat question candidate results.
A third scoring module 902, configured to score the multiple domain question candidate results with a fuzzy algorithm to obtain the highest-scoring domain question candidate result.
A second judgment module 903, configured to determine whether the highest-scoring domain question candidate result is greater than the first threshold for fuzzy question candidate results.
If the highest-scoring domain question candidate result is greater than the first threshold, it is used as the output result for the user's input information.
If the highest-scoring domain question candidate result is less than or equal to the first threshold, a result indicating that the user's input information was not found is output.
A fourth scoring module 904, configured to score the multiple chat question candidate results with a fuzzy algorithm to obtain the highest-scoring chat question candidate result.
A third judgment module 905, configured to determine whether the highest-scoring chat question candidate result is greater than the third threshold for chat question candidate results.
If the highest-scoring chat question candidate result is greater than the third threshold, it is used as the output result for the user's input information.
If the highest-scoring chat question candidate result is less than or equal to the third threshold, the highest-scoring Lucene question candidate result is obtained.
A fourth judgment module 906, configured to determine whether the highest-scoring Lucene question candidate result is greater than the second threshold for Lucene question candidate results and, if so, to use the highest-scoring Lucene question candidate result as the output result for the user's input information.
Preferably, if the highest-scoring Lucene question candidate result in the third judgment module is less than or equal to the second threshold, a result indicating that the user's input information was not found is output.
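The three-threshold decision of modules 901 to 906 can be sketched in the same style. The description does not spell out how the domain branch and the chat branch are combined, so this sketch makes an explicit assumption: when domain candidates exist, the first threshold alone decides the outcome; otherwise the chat candidates are checked against the third threshold, with the Lucene result and the second threshold as the fallback. The names `match_by_type` and `top`, the `(answer, score)` tuples, and `None` for "not found" are all hypothetical, and the fuzzy scores are taken as already computed.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical candidate shape: (answer text, fuzzy or Lucene score).
Candidate = Tuple[str, float]


def top(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the highest-scoring candidate, or None for an empty list."""
    return max(candidates, key=lambda c: c[1]) if candidates else None


def match_by_type(domain: Sequence[Candidate],
                  chat: Sequence[Candidate],
                  best_lucene: Optional[Candidate],
                  first_threshold: float,
                  second_threshold: float,
                  third_threshold: float) -> Optional[str]:
    """Return the matched answer, or None when no candidate qualifies."""
    top_domain = top(domain)
    if top_domain is not None:
        # Domain branch (assumed to take priority): pass or "not found".
        return top_domain[0] if top_domain[1] > first_threshold else None
    top_chat = top(chat)
    if top_chat is not None and top_chat[1] > third_threshold:
        return top_chat[0]
    # Chat result did not pass: fall back to the Lucene result.
    if best_lucene is not None and best_lucene[1] > second_threshold:
        return best_lucene[0]
    return None
```

A different reading, in which a failed domain comparison falls through to the chat branch, would only require replacing the early `return` in the domain branch with a conditional return on success.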
Preferably, a genetic algorithm is used to calculate the values of the first threshold, the second threshold, and the third threshold, wherein the genetic algorithm comprises a selection operator, a crossover operator, and a mutation operator.
Preferably, using the genetic algorithm to calculate the values of the first threshold, the second threshold, and the third threshold comprises:
Randomly assigning values to the first threshold, the second threshold, and the third threshold to obtain multiple one-dimensional arrays.
Using the selection operator to select, from the multiple one-dimensional arrays, multiple one-dimensional arrays that meet a certain criterion.
Applying the crossover operator to the multiple one-dimensional arrays that meet the criterion to obtain multiple crossed one-dimensional arrays.
Applying the mutation operator to the multiple crossed one-dimensional arrays to obtain multiple mutated one-dimensional arrays.
Inputting the multiple one-dimensional arrays that meet the criterion, the multiple crossed one-dimensional arrays, and the multiple mutated one-dimensional arrays into the question answering system to obtain the accuracy rates for the user's input information.
Using the evaluation function to select at least two accuracy rates from the accuracy rates obtained.
Repeating the above operations on the one-dimensional arrays corresponding to the at least two accuracy rates until a converged accuracy rate is found.
Using the values of the one-dimensional array corresponding to the converged accuracy rate as the first threshold, the second threshold, and the third threshold.
Preferably, the evaluation function is:
$$\mathrm{Acc} = \frac{2TP}{2TP + FP + FN} + \frac{TR}{SUM}$$
where TP is the number of domain question candidate results judged to be domain question candidate results, FP is the number of chat question candidate results judged to be domain question candidate results, FN is the number of domain question candidate results judged to be chat question candidate results, TR is the number of questions that the question answering system answers correctly, SUM is the total number of questions, and Acc is the accuracy rate.
In summary, the information matching device based on a question answering system of this embodiment of the application mainly has the following advantages:
First, the multiple Lucene question candidate results obtained are divided by type into domain question candidate results and chat question candidate results, and the highest-scoring domain question candidate result is compared with the first threshold. If it is greater than the first threshold, it is used as the output result for the user's input information. Because a retrieval step over domain question candidate results is added on top of the Lucene question candidate results, the precision of matching the user's input information is improved.
Second, the application compares the chat question candidate results against thresholds multiple times, which ensures accurate matching of the user's input information and thereby improves the accuracy with which the question answering system matches the user's input information.
Since the device embodiments are basically similar to the method embodiments, their description is relatively brief; for related details, refer to the corresponding parts of the method embodiments.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
The information matching method and device based on a question answering system provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the application, and the description of the above embodiments is only intended to help in understanding the method of the application and its core idea. Meanwhile, for those of ordinary skill in the art, changes may be made to the specific implementations and the scope of application in accordance with the idea of the application. In summary, the contents of this specification should not be construed as limiting the application.

Claims (10)

1. An information matching method based on a question answering system, characterized by comprising:
obtaining the highest-scoring fuzzy question candidate result and the highest-scoring Lucene question candidate result for the user's input information;
determining whether the highest-scoring fuzzy question candidate result is greater than the first threshold for fuzzy question candidate results;
if the highest-scoring fuzzy question candidate result is greater than the first threshold, using the highest-scoring fuzzy question candidate result as the output result for the user's input information;
if the highest-scoring fuzzy question candidate result is less than or equal to the first threshold, obtaining the highest-scoring Lucene question candidate result;
determining whether the highest-scoring Lucene question candidate result is greater than the second threshold for Lucene question candidate results, and if so, using the highest-scoring Lucene question candidate result as the output result for the user's input information.
2. The method according to claim 1, characterized by comprising:
if the highest-scoring Lucene question candidate result is less than or equal to the second threshold, outputting a result indicating that the user's input information was not found.
3. The method according to claim 1, characterized by comprising:
using a genetic algorithm to calculate the values of the first threshold and the second threshold, wherein the genetic algorithm comprises a selection operator, a crossover operator, and a mutation operator.
4. The method according to claim 3, characterized in that the step of using the genetic algorithm to calculate the values of the first threshold and the second threshold comprises:
randomly assigning values to the first threshold and the second threshold to obtain multiple one-dimensional arrays;
using the selection operator to select multiple one-dimensional arrays that meet a certain criterion;
applying the crossover operator to the multiple one-dimensional arrays that meet the criterion to obtain multiple crossed one-dimensional arrays;
applying the mutation operator to the multiple crossed one-dimensional arrays to obtain multiple mutated one-dimensional arrays;
inputting the multiple one-dimensional arrays that meet the criterion, the multiple crossed one-dimensional arrays, and the multiple mutated one-dimensional arrays into the question answering system to obtain the accuracy rates for the user's input information;
selecting at least two accuracy rates from the sorted accuracy rates;
repeating the above operations on the one-dimensional arrays corresponding to the at least two accuracy rates until a converged accuracy rate is found;
using the values of the one-dimensional array corresponding to the converged accuracy rate as the first threshold and the second threshold.
5. An information matching method for a question answering system, characterized by comprising:
dividing the multiple Lucene question candidate results obtained by type into domain question candidate results and chat question candidate results;
scoring the multiple domain question candidate results with a fuzzy algorithm to obtain the highest-scoring domain question candidate result;
determining whether the highest-scoring domain question candidate result is greater than the first threshold for fuzzy question candidate results;
if the highest-scoring domain question candidate result is greater than the first threshold, using the highest-scoring domain question candidate result as the output result for the user's input information;
if the highest-scoring domain question candidate result is less than or equal to the first threshold, outputting a result indicating that the user's input information was not found;
scoring the multiple chat question candidate results with a fuzzy algorithm to obtain the highest-scoring chat question candidate result;
determining whether the highest-scoring chat question candidate result is greater than the third threshold for chat question candidate results;
if the highest-scoring chat question candidate result is greater than the third threshold, using the highest-scoring chat question candidate result as the output result for the user's input information;
if the highest-scoring chat question candidate result is less than or equal to the third threshold, obtaining the highest-scoring Lucene question candidate result;
determining whether the highest-scoring Lucene question candidate result is greater than the second threshold for Lucene question candidate results, and if so, using the highest-scoring Lucene question candidate result as the output result for the user's input information.
6. The method according to claim 5, characterized by comprising:
if the highest-scoring Lucene question candidate result is less than or equal to the second threshold, outputting a result indicating that the user's input information was not found.
7. The method according to claim 5, characterized by comprising:
using a genetic algorithm to calculate the values of the first threshold, the second threshold, and the third threshold, wherein the genetic algorithm comprises a selection operator, a crossover operator, and a mutation operator.
8. The method according to claim 7, characterized by comprising:
the step of using the genetic algorithm to calculate the values of the first threshold, the second threshold, and the third threshold comprises:
randomly assigning values to the first threshold, the second threshold, and the third threshold to obtain multiple one-dimensional arrays;
using the selection operator to select, from the multiple one-dimensional arrays, multiple one-dimensional arrays that meet a certain criterion;
applying the crossover operator to the multiple one-dimensional arrays that meet the criterion to obtain multiple crossed one-dimensional arrays;
applying the mutation operator to the multiple crossed one-dimensional arrays to obtain multiple mutated one-dimensional arrays;
inputting the multiple one-dimensional arrays that meet the criterion, the multiple crossed one-dimensional arrays, and the multiple mutated one-dimensional arrays into the question answering system to obtain the accuracy rates for the user's input information;
using the evaluation function to select at least two accuracy rates from the accuracy rates obtained;
repeating the above operations on the one-dimensional arrays corresponding to the at least two accuracy rates until a converged accuracy rate is found;
using the values of the one-dimensional array corresponding to the converged accuracy rate as the first threshold, the second threshold, and the third threshold.
9. The method according to claim 8, characterized in that the evaluation function is:
$$\mathrm{Acc} = \frac{2TP}{2TP + FP + FN} + \frac{TR}{SUM}$$
where TP is the number of domain question candidate results judged to be domain question candidate results, FP is the number of chat question candidate results judged to be domain question candidate results, FN is the number of domain question candidate results judged to be chat question candidate results, TR is the number of questions that the question answering system answers correctly, SUM is the total number of questions, and Acc is the accuracy rate.
10. An information matching device based on a question answering system, characterized by comprising:
a first acquisition module, configured to obtain the highest-scoring fuzzy question candidate result and the highest-scoring Lucene question candidate result for the user's input information;
a first judgment module, configured to determine whether the highest-scoring fuzzy question candidate result is greater than the first threshold for fuzzy question candidate results;
if the highest-scoring fuzzy question candidate result is greater than the first threshold, using the highest-scoring fuzzy question candidate result as the output result for the user's input information;
if the highest-scoring fuzzy question candidate result is less than or equal to the first threshold, obtaining the highest-scoring Lucene question candidate result;
a second judgment module, configured to determine whether the highest-scoring Lucene question candidate result is greater than the second threshold for Lucene question candidate results and, if so, to use the highest-scoring Lucene question candidate result as the output result for the user's input information.
CN201410800479.7A 2014-12-18 2014-12-18 The method and apparatus of information matches based on question answering system Active CN104572868B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410800479.7A CN104572868B (en) 2014-12-18 2014-12-18 The method and apparatus of information matches based on question answering system

Publications (2)

Publication Number Publication Date
CN104572868A true CN104572868A (en) 2015-04-29
CN104572868B CN104572868B (en) 2017-11-03

Family

ID=53088930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410800479.7A Active CN104572868B (en) 2014-12-18 2014-12-18 The method and apparatus of information matches based on question answering system

Country Status (1)

Country Link
CN (1) CN104572868B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101086843A (en) * 2006-06-07 2007-12-12 中国科学院自动化研究所 A sentence similarity recognition method for voice answer system
CN102236677A (en) * 2010-04-28 2011-11-09 北京大学深圳研究生院 Question answering system-based information matching method and system
US20120078902A1 (en) * 2010-09-24 2012-03-29 International Business Machines Corporation Providing question and answers with deferred type evaluation using text with limited structure
CN104050224A (en) * 2013-03-15 2014-09-17 国际商业机器公司 Combining different type coercion components for deferred type evaluation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
吴全娥: ""汉语句子相似度计算及其在自动问答系统中的应用"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
宗裕朋: ""基于本体的中文智能答疑系统研究与实现"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
郑诚 等: ""改进的VSM算法及其在FAQ中的应用"", 《计算机工程》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019736A (en) * 2017-12-29 2019-07-16 北京京东尚科信息技术有限公司 Question and answer matching process, system, equipment and storage medium based on language model
CN110019736B (en) * 2017-12-29 2021-10-01 北京京东尚科信息技术有限公司 Question-answer matching method, system, equipment and storage medium based on language model
CN109271459A (en) * 2018-09-18 2019-01-25 四川长虹电器股份有限公司 Chat robots and its implementation based on Lucene and grammer networks
CN114845128A (en) * 2022-04-22 2022-08-02 咪咕文化科技有限公司 Bullet screen interaction method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN104572868B (en) 2017-11-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant