CN104572868A - Method and device for information matching based on questioning and answering system - Google Patents
- Publication number
- CN104572868A (application CN201410800479.7A)
- Authority
- CN
- China
- Prior art keywords
- candidate result
- threshold
- highest
- marking
- problem candidate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2468—Fuzzy queries
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Automation & Control Theory (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method and a device for information matching based on a question answering system. The method comprises: obtaining, for the user input, the highest-scoring fuzzy question candidate result and the highest-scoring Lucene question candidate result; judging whether the score of the highest-scoring fuzzy question candidate result is greater than a first threshold for fuzzy question candidate results; if it is greater than the first threshold, using the highest-scoring fuzzy question candidate result as the output result for the user input; if it is less than or equal to the first threshold, judging whether the score of the highest-scoring Lucene question candidate result is greater than a second threshold for Lucene question candidate results; and if it is greater than the second threshold, using the highest-scoring Lucene question candidate result as the output result for the user input. The method addresses the problem in the prior art that information matching in a question answering system cannot be performed efficiently and accurately.
Description
Technical field
This application relates to the field of question-answering information technology, and in particular to a method and device for information matching based on a question answering system.
Background art
In recent years, with the development of natural language processing technology, intelligent question answering systems have attracted great attention. From the once-popular chat software 'Little Yellow Chicken' to the answering robots now common on major online platforms, intelligent question answering systems are applied in many fields.
In the prior art, a question answering system typically processes user input as shown in Fig. 1.
Step 101: the question answering system receives the user input.
Step 102: the received user input is analyzed. The analysis comprises a series of preprocessing operations such as keyword extraction and keyword expansion, and yields the preprocessed user input.
Step 103: a Lucene retrieval is performed on the preprocessed user input to obtain multiple question candidate results, and the best answer for the user input is selected from these candidate results.
The retrieval above obtains the best answer through keyword matching and keyword expansion. However, expression in Chinese is flexible, and the positions of keywords in sentences with the same meaning are not fixed, so matching keywords in order often fails to meet retrieval requirements. Retrieval and matching by keyword is simple, but it stays at the surface of the language and does not reach its semantics. As a result, the best answer given by the question answering system is often not the answer the user needs.
Summary of the invention
This application provides a method and device for information matching based on a question answering system, to solve the problem in the prior art that information matching in a question answering system cannot be performed efficiently and accurately.
To solve the above problem, this application discloses an information matching method based on a question answering system, comprising: obtaining the highest-scoring fuzzy question candidate result and the highest-scoring Lucene question candidate result for the user input;
judging whether the score of the highest-scoring fuzzy question candidate result is greater than the first threshold for fuzzy question candidate results;
if it is greater than the first threshold, using the highest-scoring fuzzy question candidate result as the output result for the user input;
if it is less than or equal to the first threshold, obtaining the highest-scoring Lucene question candidate result;
judging whether the score of the highest-scoring Lucene question candidate result is greater than the second threshold for Lucene question candidate results, and if so, using the highest-scoring Lucene question candidate result as the output result for the user input.
Preferably, if the score of the highest-scoring Lucene question candidate result is less than or equal to the second threshold, the output indicates that no result was found for the user input.
Preferably, the method comprises: using a genetic algorithm to calculate the values of the first threshold and the second threshold, wherein the genetic algorithm comprises a selection operator, a crossover operator and a mutation operator.
Preferably, the step of using the genetic algorithm to calculate the values of the first threshold and the second threshold comprises:
randomly assigning values to the first threshold and the second threshold to obtain multiple one-dimensional arrays;
using the selection operator to select the one-dimensional arrays that meet a certain standard;
processing the selected one-dimensional arrays with the crossover operator to obtain multiple crossed one-dimensional arrays;
processing the crossed one-dimensional arrays with the mutation operator to obtain multiple mutated one-dimensional arrays;
inputting the selected, crossed and mutated one-dimensional arrays into the question answering system to obtain the accuracy on the user input for each array;
selecting at least two accuracies from the sorted accuracies;
repeating the above operations on the one-dimensional arrays corresponding to the at least two accuracies until a converged accuracy is found;
using the values of the one-dimensional array corresponding to the converged accuracy as the first threshold and the second threshold.
This application also discloses an information matching method based on a question answering system, comprising: dividing the obtained Lucene question candidate results by type into domain question candidate results and chat question candidate results;
scoring the domain question candidate results with a fuzzy algorithm to obtain the highest-scoring domain question candidate result;
judging whether the score of the highest-scoring domain question candidate result is greater than the first threshold for fuzzy question candidate results;
if it is greater than the first threshold, using the highest-scoring domain question candidate result as the output result for the user input;
if it is less than or equal to the first threshold, outputting that no result was found for the user input;
scoring the chat question candidate results with the fuzzy algorithm to obtain the highest-scoring chat question candidate result;
judging whether the score of the highest-scoring chat question candidate result is greater than the third threshold for chat question candidate results;
if it is greater than the third threshold, using the highest-scoring chat question candidate result as the output result for the user input;
if it is less than or equal to the third threshold, obtaining the highest-scoring Lucene question candidate result;
judging whether the score of the highest-scoring Lucene question candidate result is greater than the second threshold for Lucene question candidate results, and if so, using the highest-scoring Lucene question candidate result as the output result for the user input.
Preferably, if the score of the highest-scoring Lucene question candidate result is less than or equal to the second threshold, the output indicates that no result was found for the user input.
Preferably, the method comprises: using a genetic algorithm to calculate the values of the first threshold, the second threshold and the third threshold, wherein the genetic algorithm comprises a selection operator, a crossover operator and a mutation operator.
Preferably, the step of using the genetic algorithm to calculate the values of the first threshold, the second threshold and the third threshold comprises:
randomly assigning values to the first threshold, the second threshold and the third threshold to obtain multiple one-dimensional arrays;
using the selection operator to select, from these one-dimensional arrays, the ones that meet a certain standard;
processing the selected one-dimensional arrays with the crossover operator to obtain multiple crossed one-dimensional arrays;
processing the crossed one-dimensional arrays with the mutation operator to obtain multiple mutated one-dimensional arrays;
inputting the selected, crossed and mutated one-dimensional arrays into the question answering system to obtain the accuracy on the user input for each array;
using an evaluation function to select at least two accuracies from these accuracies;
repeating the above operations on the one-dimensional arrays corresponding to the at least two accuracies until a converged accuracy is found;
using the values of the one-dimensional array corresponding to the converged accuracy as the first threshold, the second threshold and the third threshold.
Preferably, the evaluation function is given by the following formula:
where TP is the number of domain question candidate results judged to be domain question candidate results, FP is the number of chat question candidate results judged to be domain question candidate results, FN is the number of domain question candidate results judged to be chat question candidate results, TR is the number of questions the question answering system answers correctly, SUM is the total number of questions, and Acc is the accuracy.
To solve the above problem, this application also discloses an information matching device based on a question answering system, comprising: a first acquisition module, configured to obtain the highest-scoring fuzzy question candidate result and the highest-scoring Lucene question candidate result for the user input;
a first judgment module, configured to judge whether the score of the highest-scoring fuzzy question candidate result is greater than the first threshold for fuzzy question candidate results;
if it is greater than the first threshold, the highest-scoring fuzzy question candidate result is used as the output result for the user input;
if it is less than or equal to the first threshold, the highest-scoring Lucene question candidate result is obtained;
a second judgment module, configured to judge whether the score of the highest-scoring Lucene question candidate result is greater than the second threshold for Lucene question candidate results, and if so, to use the highest-scoring Lucene question candidate result as the output result for the user input.
Compared with the prior art, this application has the following advantages:
First, a fuzzy search is performed on the user input to obtain multiple fuzzy question candidate results, and the highest-scoring one is compared with the first threshold. If its score is greater than the first threshold, it is used as the output result for the user input. Fuzzy search makes it possible to retrieve user input expressed in different word orders.
Second, the application performs a fuzzy search on the user input to obtain multiple fuzzy question candidate results, and decides, according to the comparison between the highest-scoring fuzzy question candidate result and the first threshold, whether to go on to compare the Lucene question candidate result. The output result for the user input is then determined by comparing the highest-scoring Lucene question candidate result with the second threshold. Comparing the candidate results against thresholds twice safeguards the accuracy of the matched information and thereby improves the accuracy with which the question answering system matches user input.
Brief description of the drawings
Fig. 1 is a flowchart of a question answering system in the prior art;
Fig. 2 is a flowchart of an information matching method based on a question answering system in Embodiment one of this application;
Fig. 3 is a flowchart of an information matching method based on a question answering system in Embodiment two of this application;
Fig. 4 is a flowchart of the genetic algorithm of this application;
Fig. 5 is a flowchart of an information matching method based on a question answering system in Embodiment three of this application;
Fig. 6 is a flowchart of an information matching method based on a question answering system in Embodiment four of this application;
Fig. 7 is a schematic diagram of an example in which this application uses the genetic algorithm to calculate the values of the first threshold, the second threshold and the third threshold;
Fig. 8 is a structural block diagram of an information matching device based on a question answering system in Embodiment five of this application;
Fig. 9 is a structural block diagram of an information matching device based on a question answering system in Embodiment six of this application.
Detailed description of the embodiments
To make the above objects, features and advantages of this application more apparent, the application is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 2, an information matching method based on a question answering system according to this application is shown, comprising:
Step 201: obtain the fuzzy question candidate results.
A Lucene retrieval is performed according to the user input to obtain multiple Lucene question candidate results; a fuzzy search is then performed on these Lucene question candidate results to obtain multiple fuzzy question candidate results for the user input.
The highest-scoring fuzzy question candidate result and the highest-scoring Lucene question candidate result for the user input are obtained.
Step 202: judge whether the score of the highest-scoring fuzzy question candidate result is greater than the first threshold for fuzzy question candidate results. If it is greater than the first threshold, perform step 203; if it is less than or equal to the first threshold, perform step 204.
Step 203: use the highest-scoring fuzzy question candidate result as the output result for the user input; the flow ends.
Step 204: obtain the highest-scoring Lucene question candidate result, and perform step 205.
Step 205: judge whether the score of the highest-scoring Lucene question candidate result is greater than the second threshold for Lucene question candidate results. If so, perform step 206; if not, perform step 207.
Step 206: use the highest-scoring Lucene question candidate result as the output result for the user input.
Step 207: the score of the highest-scoring Lucene question candidate result is less than or equal to the second threshold, so output that no result was found for the user input.
Through this embodiment, first, a fuzzy search is performed on the user input to obtain multiple fuzzy question candidate results, and the highest-scoring one is compared with the first threshold. If its score is greater than the first threshold, it is used as the output result for the user input. Fuzzy search makes it possible to retrieve user input expressed in different word orders.
Second, the application performs a fuzzy search on the user input to obtain multiple fuzzy question candidate results, and decides, according to the comparison between the highest-scoring fuzzy question candidate result and the first threshold, whether to go on to compare the Lucene question candidate result. The output result for the user input is then determined by comparing the highest-scoring Lucene question candidate result with the second threshold. Comparing the candidate results against thresholds twice safeguards the accuracy of the matched information and thereby improves the accuracy with which the question answering system matches user input.
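The flow of steps 201 to 207 can be sketched as a two-stage threshold cascade. This is a minimal illustration, not the patented implementation: the candidate shape `(text, score)`, the function names and the `NOT_FOUND` sentinel are assumptions introduced here, and the scores are taken as already computed by the fuzzy search and by Lucene.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical candidate shape: (candidate text, score).
Candidate = Tuple[str, float]

# Hypothetical sentinel standing in for the "no result found" output of step 207.
NOT_FOUND = "NOT_FOUND"


def best(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the highest-scoring candidate, or None if there are none."""
    return max(candidates, key=lambda c: c[1]) if candidates else None


def match(fuzzy: Sequence[Candidate],
          lucene: Sequence[Candidate],
          first_threshold: float,
          second_threshold: float) -> str:
    """Apply the two threshold comparisons in order (steps 202 to 207)."""
    top_fuzzy = best(fuzzy)
    # Steps 202-203: accept the fuzzy result only if it is strictly above the first threshold.
    if top_fuzzy is not None and top_fuzzy[1] > first_threshold:
        return top_fuzzy[0]
    # Steps 204-206: otherwise fall back to the highest-scoring Lucene result.
    top_lucene = best(lucene)
    if top_lucene is not None and top_lucene[1] > second_threshold:
        return top_lucene[0]
    # Step 207: neither candidate cleared its threshold.
    return NOT_FOUND
```

Both comparisons are strict, so a score equal to its threshold falls through to the next stage, as the steps above specify.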
Referring to Fig. 3, a flowchart of an information matching method based on a question answering system in Embodiment two of this application is shown.
In this embodiment, an information matching method based on a question answering system comprises:
Step 301: obtain the fuzzy question candidate results.
Step 302: judge whether the score of the highest-scoring fuzzy question candidate result is greater than the first threshold for fuzzy question candidate results.
Step 303: if the score of the highest-scoring fuzzy question candidate result is less than or equal to the first threshold, obtain the highest-scoring Lucene question candidate result.
Step 304: judge whether the score of the highest-scoring Lucene question candidate result is greater than the second threshold for Lucene question candidate results.
Step 305: use a genetic algorithm to calculate the values of the first threshold and the second threshold, wherein the genetic algorithm comprises a selection operator, a crossover operator and a mutation operator.
Genetic algorithms (GA) are a class of randomized search algorithms modeled on natural selection and natural genetic mechanisms in the biological world. A genetic algorithm simulates the reproduction, crossover and gene mutation that occur during natural selection and natural inheritance. In each iteration it keeps a group of candidate solutions, picks the better individuals from the solution population using a fitness evaluation function, and combines these individuals with the genetic operators (selection, crossover and mutation) to produce a new generation of candidate solutions. This process is repeated until a convergence criterion is met.
Referring to Fig. 4, a flowchart of the genetic algorithm implementing the scheme of this application is shown. The step of using the genetic algorithm to calculate the values of the first threshold and the second threshold comprises:
Step 401: randomly assign values to the first threshold and the second threshold to obtain multiple one-dimensional arrays, for example [0.5 0.5], [0.6 0.7], [0.3 0.7].
Step 402: use the selection operator to select the one-dimensional arrays that meet a certain standard.
The selection operator ensures the global search ability of the genetic algorithm. A best-individual preservation (elitist) operator is adopted: the best individuals in the parent population pass directly into the offspring population, so that good individuals obtained during evolution are not destroyed by the crossover and mutation operations.
The one-dimensional arrays that meet a certain standard can be understood as the arrays whose individual fitness is high.
For example, given [0.5 0.5], [0.6 0.7], [0.3 0.7], [0.5 0.1], [0.6 0.4], [0.3 0.2] and so on, if the selection operator finds that [0.5 0.5], [0.6 0.7], [0.3 0.7] and [0.5 0.1] have high individual fitness, these four are selected as the one-dimensional arrays meeting the standard and are passed on to the subsequent crossover.
Step 403: process the selected one-dimensional arrays with the crossover operator to obtain multiple crossed one-dimensional arrays.
The crossover operator is the main way of producing new individuals; it determines the global search ability of the genetic algorithm and plays a key role in it. Because the parameters are not very complex and vary in a simple way, a simple and effective single-point crossover operator is chosen.
Step 404: process the crossed one-dimensional arrays with the mutation operator to obtain multiple mutated one-dimensional arrays.
The mutation operator is an auxiliary way of producing new individuals; it determines the local search ability of the genetic algorithm. The mutation operator and the crossover operator work together to complete both the global search and the local search of the search space.
Step 405: input the selected, crossed and mutated one-dimensional arrays into the question answering system to obtain the accuracy on the user input for each array.
Step 406: select at least two accuracies from the sorted accuracies.
Step 407: repeat the above operations (step 403 to step 406) on the one-dimensional arrays corresponding to the at least two accuracies until a converged accuracy is found.
Step 408: use the values of the one-dimensional array corresponding to the converged accuracy as the first threshold and the second threshold.
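The three operators named in steps 402 to 404 can be sketched on two-element threshold arrays as follows. This is a hedged illustration only: the text specifies elitist selection and single-point crossover but does not say how mutation is carried out, so the mutation scheme below (resampling a gene uniformly within a range) is an assumption, as are all function and parameter names.

```python
import random
from typing import Callable, List, Tuple

# An individual is a one-dimensional array such as [first_threshold, second_threshold].
Individual = List[float]


def select_best(population: List[Individual],
                fitness: Callable[[Individual], float],
                k: int) -> List[Individual]:
    """Elitist selection: keep the k individuals with the highest fitness."""
    return sorted(population, key=fitness, reverse=True)[:k]


def single_point_crossover(a: Individual, b: Individual,
                           point: int) -> Tuple[Individual, Individual]:
    """Single-point crossover: swap the tails of two parents after index `point`."""
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(ind: Individual, rate: float, low: float, high: float,
           rng: random.Random) -> Individual:
    """Assumed mutation scheme: each gene is resampled in [low, high] with probability `rate`."""
    return [rng.uniform(low, high) if rng.random() < rate else gene for gene in ind]
```

With the example arrays from step 401, crossing [0.5 0.5] and [0.6 0.7] at point 1 yields [0.5 0.7] and [0.6 0.5]. In the method described above, the fitness passed to the selection would be the accuracy the question answering system reaches with each array.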
It should be noted that this embodiment performs the same steps as Embodiment one of this application; Embodiment two mainly details the parts that differ from Embodiment one.
Through this embodiment, multiple one-dimensional arrays are processed with the selection, crossover and mutation operators, and the selected, crossed and mutated one-dimensional arrays are input into the question answering system to obtain the accuracy on the user input. The above operations are repeated on the one-dimensional arrays corresponding to at least two accuracies until a converged accuracy is found, and the values of the corresponding one-dimensional array are used as the first threshold and the second threshold, which safeguards the accuracy of the results for the user input.
Referring to Fig. 5, an information matching method based on a question answering system in Embodiment three of this application is shown, comprising:
Step 501: obtain the question candidate results.
Domain question candidate results and chat question candidate results are obtained respectively from the domain database and the chat database searched by Lucene.
The obtained Lucene question candidate results are divided by type into domain question candidate results and chat question candidate results.
Step 502: score the domain question candidate results.
The domain question candidate results are scored with a fuzzy algorithm to obtain a score for each; the scores are sorted from high to low, and the highest-scoring domain question candidate result is obtained.
Step 503: judge whether the score of the highest-scoring domain question candidate result is greater than the first threshold for fuzzy question candidate results. If it is greater than the first threshold, perform step 504; if it is less than or equal to the first threshold, perform step 505.
Step 504: use the highest-scoring domain question candidate result as the output result for the user input; the flow ends.
Step 505: output that no result was found for the user input; the flow ends.
Step 506: score the chat question candidate results.
The chat question candidate results are scored with the fuzzy algorithm to obtain a score for each; the scores are sorted from high to low, and the highest-scoring chat question candidate result is obtained.
Step 507: judge whether the score of the highest-scoring chat question candidate result is greater than the third threshold for chat question candidate results. If it is greater than the third threshold, perform step 508; if it is less than or equal to the third threshold, perform step 509.
Step 508: use the highest-scoring chat question candidate result as the output result for the user input; the flow ends.
Step 509: obtain the highest-scoring Lucene question candidate result, and perform step 510.
Step 510: judge whether the score of the highest-scoring Lucene question candidate result is greater than the second threshold for Lucene question candidate results. If so, perform step 511; if not, perform step 503.
Step 511: use the highest-scoring Lucene question candidate result as the output result for the user input.
Through this embodiment, first, the obtained Lucene question candidate results are divided by type into domain question candidate results and chat question candidate results, and the highest-scoring domain question candidate result is compared with the first threshold. If its score is greater than the first threshold, it is used as the output result for the user input. Because a retrieval of domain question candidate results is added on top of the Lucene question candidate results, the retrieval precision for the user input is improved.
Second, the application compares the chat question candidate results against thresholds several times, which safeguards the accuracy of matching the user input and thereby improves the accuracy with which the question answering system matches user input.
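The splitting and scoring of steps 501, 502 and 506 can be sketched as below. The text does not say which fuzzy algorithm is used, so this sketch substitutes the similarity ratio of Python's `difflib.SequenceMatcher` as a stand-in score; that choice, the `(kind, text)` candidate shape and the labels `"domain"` and `"chat"` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def fuzzy_score(query: str, candidate: str) -> float:
    """Stand-in fuzzy score in [0, 1]; the patent's actual fuzzy algorithm is not specified here."""
    return SequenceMatcher(None, query, candidate).ratio()


def top_by_type(query: str,
                candidates: List[Tuple[str, str]]) -> Dict[str, Tuple[str, float]]:
    """Group (kind, text) candidates by kind and keep the highest-scoring one per kind."""
    best: Dict[str, Tuple[str, float]] = {}
    for kind, text in candidates:
        score = fuzzy_score(query, text)
        # Keep this candidate if its kind has no entry yet or it beats the current best.
        if kind not in best or score > best[kind][1]:
            best[kind] = (text, score)
    return best
```

The per-type winners returned here are the values that steps 503 and 507 would compare against the first and third thresholds respectively.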
Referring to Fig. 6, a flowchart of an information matching method based on a question answering system in Embodiment four of this application is shown.
In this embodiment, an information matching method based on a question answering system comprises: using a genetic algorithm to calculate the values of the first threshold, the second threshold and the third threshold, wherein the genetic algorithm comprises a selection operator, a crossover operator and a mutation operator. The first threshold and the third threshold are set to vary between 0 and 1, and the second threshold is set to vary between 0 and 3.
The step of using the genetic algorithm to calculate the values of the first threshold, the second threshold and the third threshold comprises:
Step 601: randomly assign values to the first threshold, the second threshold and the third threshold to obtain multiple one-dimensional arrays.
Step 602: use the selection operator to select, from these one-dimensional arrays, the ones that meet a certain standard.
Step 603: process the selected one-dimensional arrays with the crossover operator to obtain multiple crossed one-dimensional arrays.
Step 604: process the crossed one-dimensional arrays with the mutation operator to obtain multiple mutated one-dimensional arrays.
Step 605: input the selected, crossed and mutated one-dimensional arrays into the question answering system to obtain the accuracy on the user input for each array.
Step 606: use an evaluation function to select at least two accuracies from these accuracies.
Step 607: repeat the above operations (step 603 to step 606) on the one-dimensional arrays corresponding to the at least two accuracies until a converged accuracy is found.
Step 608: use the values of the one-dimensional array corresponding to the converged accuracy as the first threshold, the second threshold and the third threshold.
Preferably, the evaluation function is given by the following formula:
where TP is the number of domain question candidate results judged to be domain question candidate results, FP is the number of chat question candidate results judged to be domain question candidate results, FN is the number of domain question candidate results judged to be chat question candidate results, TR is the number of questions the question answering system answers correctly, SUM is the total number of questions, and Acc is the accuracy.
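The formula itself is not reproduced in this text, so the sketch below does not claim to be the evaluation function of this application. It only shows the conventional quantities that counts named TP, FP, FN, TR and SUM are normally used to compute: precision and recall for the domain/chat classification, and accuracy as correct answers over total questions. How (or whether) the application combines them is not recoverable from the text; the function names are hypothetical.

```python
def precision(tp: int, fp: int) -> float:
    """Conventional precision: share of results judged 'domain' that really are domain."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Conventional recall: share of true domain results that were judged 'domain'."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tr: int, total: int) -> float:
    """Assumed reading of Acc: correctly answered questions (TR) over all questions (SUM)."""
    return tr / total if total else 0.0
```

For instance, under these definitions 45 correct answers out of 60 questions give an accuracy of 0.75.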
In this embodiment, a genetic algorithm is used to calculate the values of the first threshold, the second threshold and the third threshold. The evaluation function selects at least two accuracy values from the obtained accuracies, and the selection, crossover and mutation operations are repeated on the corresponding one-dimensional arrays until a converged accuracy is found. The values of the one-dimensional array corresponding to the converged accuracy are then taken as the first threshold, the second threshold and the third threshold, which ensures the accuracy for the user input information.
To help those skilled in the art better understand the technical solution, defined in this application, of using a genetic algorithm to calculate the values of the first threshold, the second threshold and the third threshold, Fig. 7 shows a schematic diagram of an example in which this application applies a genetic algorithm to calculate those values.
The basic idea of how this application uses the genetic algorithm is as follows: for the given F1, F2 and L1, rational values are generated at random to form four initial parent sequences; new offspring sequences are then produced by the crossover and mutation operations of the genetic algorithm; all offspring and parents are fed into the question answering system to obtain its accuracy (Acc) on the development set; the selection operator picks the four best combination sequences, which then serve as the initial parents for the next training iteration.
Step 701: Randomly assign values to the first threshold F1, the second threshold L1 and the third threshold F2 to obtain multiple one-dimensional arrays: [0.5 0.5 0], [0.6 0.5 2], [0.5 0.6 0.4], [0.5 0.7 0.5].
Step 702: Process the one-dimensional arrays that meet the criterion with the crossover operator to obtain multiple crossed-over one-dimensional arrays.
Step 703: Process the crossed-over one-dimensional arrays with the mutation operator to obtain the mutated one-dimensional arrays [0.5 0.6 2], [0.7 0.8 0.5], [0.5 0.5 0], [0.6 0.5 0.4].
Step 704: Input the one-dimensional arrays that meet the criterion, [0.5 0.5 0], [0.6 0.5 2], [0.5 0.6 0.4], [0.5 0.7 0.5], the crossed-over one-dimensional arrays, and the mutated one-dimensional arrays [0.5 0.6 2], [0.7 0.8 0.5], [0.5 0.5 0], [0.6 0.5 0.4] into the question answering system to obtain the accuracies Acc1 to Acc8 for the user input information.
Step 705: Sort the accuracies from high to low, and use the evaluation function to select from Acc1 to Acc8 the one-dimensional arrays corresponding to four accuracies: [0.5 0.6 2], [0.7 0.8 0.5], [0.5 0.5 0], [0.6 0.5 0.4].
Step 706: Obtain the converged accuracy by repeating steps 701 to 705 on [0.5 0.6 2], [0.7 0.8 0.5], [0.5 0.5 0], [0.6 0.5 0.4].
Step 707: Take the values of the one-dimensional array corresponding to the converged accuracy as the first threshold, the second threshold and the third threshold.
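The crossed-over arrays of step 702 are not reproduced in this text. As a purely illustrative sketch, assuming single-point crossover (the text does not say which crossover scheme is used), the first two parent arrays of step 701 could be recombined as follows; the function name and the crossover point are hypothetical.

```python
def single_point_crossover(a, b, point):
    """Swap the elements of two equal-length arrays from index `point` onward."""
    if len(a) != len(b) or not 0 < point < len(a):
        raise ValueError("point must split both arrays into two non-empty parts")
    return a[:point] + b[point:], b[:point] + a[point:]


# Parent arrays taken from step 701.
parents = [[0.5, 0.5, 0], [0.6, 0.5, 2], [0.5, 0.6, 0.4], [0.5, 0.7, 0.5]]

child_a, child_b = single_point_crossover(parents[0], parents[1], point=2)
print(child_a, child_b)  # [0.5, 0.5, 2] [0.6, 0.5, 0]
```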
Based on the description of the method embodiments above, this application also provides corresponding embodiments of an information matching device based on a question answering system, which implement the content described in the method embodiments.
Referring to Fig. 8, a structural block diagram of an information matching device based on a question answering system in Embodiment 5 of this application is shown, which may specifically comprise:
A first acquisition module 801, configured to obtain the highest-scoring fuzzy question candidate result and the highest-scoring Lucene question candidate result for the user input information.
A first judging module 802, configured to judge whether the score of the highest-scoring fuzzy question candidate result is greater than the first threshold for fuzzy question candidate results.
If the highest-scoring fuzzy question candidate result is greater than the first threshold, the highest-scoring fuzzy question candidate result is taken as the output result for the user input information.
If the highest-scoring fuzzy question candidate result is less than or equal to the first threshold, the highest-scoring Lucene question candidate result is obtained.
A second judging module 803, configured to judge whether the score of the highest-scoring Lucene question candidate result is greater than the second threshold for Lucene question candidate results, and if so, to take the highest-scoring Lucene question candidate result as the output result for the user input information.
Preferably, if the highest-scoring Lucene question candidate result in the second judging module is less than or equal to the second threshold, a message that the user input information was not found is output.
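The decision logic of modules 801 to 803 amounts to a two-stage threshold cascade. The sketch below illustrates it under assumptions: the `Candidate` type, the `match` function, the `NOT_FOUND` marker and the example scores and thresholds are hypothetical names and values chosen for illustration only.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    question: str
    score: float


NOT_FOUND = "not found"


def match(best_fuzzy: Optional[Candidate], best_lucene: Optional[Candidate],
          first_threshold: float, second_threshold: float) -> str:
    """Return the matched question, or NOT_FOUND if neither stage passes."""
    # First judging module: fuzzy score must be strictly above the first threshold.
    if best_fuzzy is not None and best_fuzzy.score > first_threshold:
        return best_fuzzy.question
    # Second judging module: fall back to the highest-scoring Lucene candidate.
    if best_lucene is not None and best_lucene.score > second_threshold:
        return best_lucene.question
    return NOT_FOUND


fuzzy = Candidate("How do I reset my password?", 0.8)
lucene = Candidate("Password reset steps", 2.5)
print(match(fuzzy, lucene, first_threshold=0.6, second_threshold=2.0))
```

Both comparisons are strict, so a score exactly equal to its threshold falls through to the next stage, matching the "less than or equal to" wording of the text.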
Preferably, a genetic algorithm is used to calculate the values of the first threshold and the second threshold, wherein the genetic algorithm comprises a selection operator, a crossover operator and a mutation operator.
Preferably, using the genetic algorithm to calculate the values of the first threshold and the second threshold comprises:
Randomly assigning values to the first threshold and the second threshold to obtain multiple one-dimensional arrays.
Using the selection operator to select the one-dimensional arrays that meet a certain criterion.
Processing the one-dimensional arrays that meet the criterion with the crossover operator to obtain multiple crossed-over one-dimensional arrays.
Processing the crossed-over one-dimensional arrays with the mutation operator to obtain multiple mutated one-dimensional arrays.
Inputting the one-dimensional arrays that meet the criterion, the crossed-over one-dimensional arrays and the mutated one-dimensional arrays into the question answering system to obtain the accuracy for the user input information.
Selecting at least two accuracy values from the sorted accuracies.
Repeating the above operations on the one-dimensional arrays corresponding to the at least two accuracy values until a converged accuracy is found.
Taking the values of the one-dimensional array corresponding to the converged accuracy as the first threshold and the second threshold.
In summary, the information matching device for a question answering system of this embodiment of the application mainly has the following advantages:
First, a fuzzy search is performed on the user input information to obtain multiple fuzzy question candidate results, and the highest-scoring fuzzy question candidate result is compared with the first threshold. If it is greater than the first threshold, it is taken as the output result for the user input information. The fuzzy search makes it possible to retrieve user input information expressed in different word orders.
Second, this application performs a fuzzy search on the user input information to obtain multiple fuzzy question candidate results, decides whether to compare the Lucene question candidate result according to the result of comparing the highest-scoring fuzzy question candidate with the first threshold, and determines the output result for the user input information according to the result of comparing the highest-scoring Lucene question candidate result with the second threshold. Comparing the question candidate results for the user input information against thresholds twice ensures the accuracy of the matched information and thus improves the accuracy with which the question answering system matches user input information.
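The point about different word orders can be illustrated with a small scoring function. This is only a sketch standing in for the fuzzy algorithm, which this passage does not specify: it assumes whitespace-separated tokens and uses Python's standard `difflib.SequenceMatcher` on the sorted tokens, so that the score lies in [0, 1] and does not depend on word order. The function name is hypothetical.

```python
from difflib import SequenceMatcher


def order_insensitive_score(query: str, candidate: str) -> float:
    """Similarity in [0, 1] that ignores the order of whitespace-separated tokens."""
    a = " ".join(sorted(query.lower().split()))
    b = " ".join(sorted(candidate.lower().split()))
    return SequenceMatcher(None, a, b).ratio()


print(order_insensitive_score("reset my password", "my password reset"))
print(order_insensitive_score("reset my password", "weather today"))
```

A score of this kind could be compared directly with a first threshold in the 0 to 1 range; text without whitespace word boundaries would need a tokenizer first.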
Referring to Fig. 9, a structural block diagram of an information matching device based on a question answering system in Embodiment 6 of this application is shown, which may specifically comprise:
A second acquisition module 901, configured to divide the obtained Lucene question candidate results by type into field question candidate results and chat question candidate results.
A third scoring module 902, configured to score the multiple field question candidate results with a fuzzy algorithm to obtain the highest-scoring field question candidate result.
A second judging module 903, configured to judge whether the score of the highest-scoring field question candidate result is greater than the first threshold for fuzzy question candidate results.
If the highest-scoring field question candidate result is greater than the first threshold, the highest-scoring field question candidate result is taken as the output result for the user input information.
If the highest-scoring field question candidate result is less than or equal to the first threshold, a message that the user input information was not found is output.
A fourth scoring module 904, configured to score the multiple chat question candidate results with a fuzzy algorithm to obtain the highest-scoring chat question candidate result.
A third judging module 905, configured to judge whether the score of the highest-scoring chat question candidate result is greater than the third threshold for chat question candidate results.
If the highest-scoring chat question candidate result is greater than the third threshold, the highest-scoring chat question candidate result is taken as the output result for the user input information.
If the highest-scoring chat question candidate result is less than or equal to the third threshold, the highest-scoring Lucene question candidate result is obtained.
A fourth judging module 906, configured to judge whether the score of the highest-scoring Lucene question candidate result is greater than the second threshold for Lucene question candidate results, and if so, to take the highest-scoring Lucene question candidate result as the output result for the user input information.
Preferably, if the highest-scoring Lucene question candidate result in the third judging module is less than or equal to the second threshold, a message that the user input information was not found is output.
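The two branches described for Embodiment 6 can be sketched as separate functions. This passage does not spell out how the field branch and the chat branch are combined, so the sketch deliberately keeps them apart; the function names, the `Scored` tuple and the `NOT_FOUND` marker are hypothetical.

```python
from typing import Tuple

Scored = Tuple[str, float]  # (candidate question, score)
NOT_FOUND = "not found"


def field_branch(best_field: Scored, first_threshold: float) -> str:
    """Field questions: accept only if the score exceeds the first threshold."""
    question, score = best_field
    return question if score > first_threshold else NOT_FOUND


def chat_branch(best_chat: Scored, best_lucene: Scored,
                third_threshold: float, second_threshold: float) -> str:
    """Chat questions: try the third threshold, then fall back to Lucene."""
    question, score = best_chat
    if score > third_threshold:
        return question
    question, score = best_lucene
    return question if score > second_threshold else NOT_FOUND


print(field_branch(("How to file a claim", 0.7), first_threshold=0.6))
print(chat_branch(("How are you", 0.3), ("Hello there", 2.4),
                  third_threshold=0.5, second_threshold=2.0))
```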
Preferably, a genetic algorithm is used to calculate the values of the first threshold, the second threshold and the third threshold, wherein the genetic algorithm comprises a selection operator, a crossover operator and a mutation operator.
Preferably, using the genetic algorithm to calculate the values of the first threshold, the second threshold and the third threshold comprises:
Randomly assigning values to the first threshold, the second threshold and the third threshold to obtain multiple one-dimensional arrays.
Using the selection operator to select, from the multiple one-dimensional arrays, the one-dimensional arrays that meet a certain criterion.
Processing the one-dimensional arrays that meet the criterion with the crossover operator to obtain multiple crossed-over one-dimensional arrays.
Processing the crossed-over one-dimensional arrays with the mutation operator to obtain multiple mutated one-dimensional arrays.
Inputting the one-dimensional arrays that meet the criterion, the crossed-over one-dimensional arrays and the mutated one-dimensional arrays into the question answering system to obtain the accuracy for the user input information.
Using the evaluation function to select at least two accuracy values from the obtained accuracies.
Repeating the above operations on the one-dimensional arrays corresponding to the at least two accuracy values until a converged accuracy is found.
Taking the values of the one-dimensional array corresponding to the converged accuracy as the first threshold, the second threshold and the third threshold.
Preferably, the evaluation function is given by the following formula:
where TP is the number of field question candidate results judged to be field question candidate results, FP is the number of chat question candidate results judged to be field question candidate results, FN is the number of field question candidate results judged to be chat question candidate results, TR is the number of questions that the question answering system answers correctly, SUM is the total number of questions, and Acc is the accuracy.
In summary, the information matching device based on a question answering system of this embodiment of the application mainly has the following advantages:
First, the obtained Lucene question candidate results are divided by type into field question candidate results and chat question candidate results, and the field question candidate result is compared with the first threshold. If it is greater than the first threshold, the highest-scoring field question candidate result is taken as the output result for the user input information. Because retrieval of field question candidate results is added on top of the Lucene question candidate results, the precision for the user input information is improved.
Second, by comparing the chat question candidate results against thresholds multiple times, this application ensures the accuracy of matching the user input information and thus improves the accuracy with which the question answering system matches user input information.
Because the device embodiments are basically similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
The information matching method and device based on a question answering system provided by this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the above embodiments is only intended to help in understanding the method of this application and its core idea. Meanwhile, for those of ordinary skill in the art, changes may be made to the specific implementations and the scope of application in accordance with the idea of this application. In summary, the content of this specification should not be construed as limiting this application.
Claims (10)
1. An information matching method based on a question answering system, characterized by comprising:
obtaining the highest-scoring fuzzy question candidate result and the highest-scoring Lucene question candidate result for user input information;
judging whether the highest-scoring fuzzy question candidate result is greater than a first threshold for fuzzy question candidate results;
if the highest-scoring fuzzy question candidate result is greater than the first threshold, taking the highest-scoring fuzzy question candidate result as the output result for the user input information;
if the highest-scoring fuzzy question candidate result is less than or equal to the first threshold, obtaining the highest-scoring Lucene question candidate result;
judging whether the highest-scoring Lucene question candidate result is greater than a second threshold for Lucene question candidate results, and if so, taking the highest-scoring Lucene question candidate result as the output result for the user input information.
2. The method according to claim 1, characterized by comprising:
if the highest-scoring Lucene question candidate result is less than or equal to the second threshold, outputting that the user input information was not found.
3. The method according to claim 1, characterized by comprising:
using a genetic algorithm to calculate the values of the first threshold and the second threshold, wherein the genetic algorithm comprises a selection operator, a crossover operator and a mutation operator.
4. The method according to claim 3, characterized in that the step of using the genetic algorithm to calculate the values of the first threshold and the second threshold comprises:
randomly assigning values to the first threshold and the second threshold to obtain multiple one-dimensional arrays;
using the selection operator to select the one-dimensional arrays that meet a certain criterion;
processing the one-dimensional arrays that meet the criterion with the crossover operator to obtain multiple crossed-over one-dimensional arrays;
processing the crossed-over one-dimensional arrays with the mutation operator to obtain multiple mutated one-dimensional arrays;
inputting the one-dimensional arrays that meet the criterion, the crossed-over one-dimensional arrays and the mutated one-dimensional arrays into the question answering system to obtain the accuracy for the user input information;
selecting at least two accuracy values from the sorted accuracies;
repeating the above operations on the one-dimensional arrays corresponding to the at least two accuracy values until a converged accuracy is found;
taking the values of the one-dimensional array corresponding to the converged accuracy as the first threshold and the second threshold.
5. An information matching method for a question answering system, characterized by comprising:
dividing the obtained Lucene question candidate results by type into field question candidate results and chat question candidate results;
scoring the multiple field question candidate results with a fuzzy algorithm to obtain the highest-scoring field question candidate result;
judging whether the highest-scoring field question candidate result is greater than a first threshold for fuzzy question candidate results;
if the highest-scoring field question candidate result is greater than the first threshold, taking the highest-scoring field question candidate result as the output result for the user input information;
if the highest-scoring field question candidate result is less than or equal to the first threshold, outputting that the user input information was not found;
scoring the multiple chat question candidate results with a fuzzy algorithm to obtain the highest-scoring chat question candidate result;
judging whether the highest-scoring chat question candidate result is greater than a third threshold for chat question candidate results;
if the highest-scoring chat question candidate result is greater than the third threshold, taking the highest-scoring chat question candidate result as the output result for the user input information;
if the highest-scoring chat question candidate result is less than or equal to the third threshold, obtaining the highest-scoring Lucene question candidate result;
judging whether the highest-scoring Lucene question candidate result is greater than a second threshold for Lucene question candidate results, and if so, taking the highest-scoring Lucene question candidate result as the output result for the user input information.
6. The method according to claim 5, characterized by comprising:
if the highest-scoring Lucene question candidate result is less than or equal to the second threshold, outputting that the user input information was not found.
7. The method according to claim 5, characterized by comprising:
using a genetic algorithm to calculate the values of the first threshold, the second threshold and the third threshold, wherein the genetic algorithm comprises a selection operator, a crossover operator and a mutation operator.
8. The method according to claim 7, characterized by comprising:
the step of using the genetic algorithm to calculate the values of the first threshold, the second threshold and the third threshold comprises:
randomly assigning values to the first threshold, the second threshold and the third threshold to obtain multiple one-dimensional arrays;
using the selection operator to select, from the multiple one-dimensional arrays, the one-dimensional arrays that meet a certain criterion;
processing the one-dimensional arrays that meet the criterion with the crossover operator to obtain multiple crossed-over one-dimensional arrays;
processing the crossed-over one-dimensional arrays with the mutation operator to obtain multiple mutated one-dimensional arrays;
inputting the one-dimensional arrays that meet the criterion, the crossed-over one-dimensional arrays and the mutated one-dimensional arrays into the question answering system to obtain the accuracy for the user input information;
using the evaluation function to select at least two accuracy values from the obtained accuracies;
repeating the above operations on the one-dimensional arrays corresponding to the at least two accuracy values until a converged accuracy is found;
taking the values of the one-dimensional array corresponding to the converged accuracy as the first threshold, the second threshold and the third threshold.
9. The method according to claim 8, characterized in that the evaluation function is given by the following formula:
where TP is the number of field question candidate results judged to be field question candidate results, FP is the number of chat question candidate results judged to be field question candidate results, FN is the number of field question candidate results judged to be chat question candidate results, TR is the number of questions that the question answering system answers correctly, SUM is the total number of questions, and Acc is the accuracy.
10. An information matching device based on a question answering system, characterized by comprising:
a first acquisition module, configured to obtain the highest-scoring fuzzy question candidate result and the highest-scoring Lucene question candidate result for user input information;
a first judging module, configured to judge whether the highest-scoring fuzzy question candidate result is greater than a first threshold for fuzzy question candidate results;
if the highest-scoring fuzzy question candidate result is greater than the first threshold, the highest-scoring fuzzy question candidate result is taken as the output result for the user input information;
if the highest-scoring fuzzy question candidate result is less than or equal to the first threshold, the highest-scoring Lucene question candidate result is obtained;
a second judging module, configured to judge whether the highest-scoring Lucene question candidate result is greater than a second threshold for Lucene question candidate results, and if so, to take the highest-scoring Lucene question candidate result as the output result for the user input information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410800479.7A CN104572868B (en) | 2014-12-18 | 2014-12-18 | The method and apparatus of information matches based on question answering system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104572868A (en) | 2015-04-29 |
CN104572868B (en) | 2017-11-03 |
Family
ID=53088930
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410800479.7A (Active, CN104572868B) | 2014-12-18 | 2014-12-18 | The method and apparatus of information matches based on question answering system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104572868B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101086843A (en) * | 2006-06-07 | 2007-12-12 | 中国科学院自动化研究所 | A sentence similarity recognition method for voice answer system |
CN102236677A (en) * | 2010-04-28 | 2011-11-09 | 北京大学深圳研究生院 | Question answering system-based information matching method and system |
US20120078902A1 (en) * | 2010-09-24 | 2012-03-29 | International Business Machines Corporation | Providing question and answers with deferred type evaluation using text with limited structure |
CN104050224A (en) * | 2013-03-15 | 2014-09-17 | 国际商业机器公司 | Combining different type coercion components for deferred type evaluation |
Non-Patent Citations (3)
Title |
---|
吴全娥: ""汉语句子相似度计算及其在自动问答系统中的应用"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
宗裕朋: ""基于本体的中文智能答疑系统研究与实现"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
郑诚 等: ""改进的VSM算法及其在FAQ中的应用"", 《计算机工程》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110019736A (en) * | 2017-12-29 | 2019-07-16 | 北京京东尚科信息技术有限公司 | Question and answer matching process, system, equipment and storage medium based on language model |
CN110019736B (en) * | 2017-12-29 | 2021-10-01 | 北京京东尚科信息技术有限公司 | Question-answer matching method, system, equipment and storage medium based on language model |
CN109271459A (en) * | 2018-09-18 | 2019-01-25 | 四川长虹电器股份有限公司 | Chat robots and its implementation based on Lucene and grammer networks |
CN114845128A (en) * | 2022-04-22 | 2022-08-02 | 咪咕文化科技有限公司 | Bullet screen interaction method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN104572868B (en) | 2017-11-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||