CN110799970A - Question-answering system and question-answering method - Google Patents
- Publication number
- CN110799970A (application number CN201780092702.9A)
- Authority
- CN
- China
- Prior art keywords
- question
- answer
- candidate
- alternative
- candidate answer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
Abstract
A question-answering system (10) and a question-answering method relate to the fields of artificial intelligence and natural language processing (NLP), and solve the problem that existing DeepQA systems have low answer accuracy. The specific scheme is as follows: the question-answering system comprises a user interaction module (101) for receiving a question posed by a user; a discourse structure analysis module (102) for obtaining a first candidate answer set corresponding to the question based on a discourse structure analysis algorithm; a feature statistics module (103) for obtaining a second candidate answer set corresponding to the question based on a feature statistical algorithm; and a combination processing module (104) for combining the first candidate answer set and the second candidate answer set and taking the candidate answer with the highest score after the combination as the correct answer to the question. The user interaction module (101) is further used for feeding the correct answer back to the user. The method is used in the question-answering process.
Description
The embodiments of the present invention relate to the fields of artificial intelligence and natural language processing (NLP), and in particular to a question-answering system and a question-answering method.
A question-answering (QA) system is an advanced form of information retrieval system that can answer questions posed by users in natural language with accurate and concise natural language, meeting people's demand for fast and accurate information retrieval. For example, a user submits the question "When was the telephone invented?", and the system should return the concise answer "1876".
Currently, the most representative question-answering system in the industry is the deep question-answering (DeepQA) system. Fig. 1 is a schematic diagram of the DeepQA architecture. As shown in Fig. 1, a DeepQA system may include the following serial processes: receiving a question input by a user → question analysis → question decomposition → primary search → alternative answer generation → hypothesis generation → soft filtering → hypothesis and evidence scoring → fusion and ranking of final answers → feeding the highest-ranked correct answer back to the user.
As can be seen from Fig. 1, the DeepQA sub-processes are executed in series, so an error in one sub-process accumulates through the subsequent sub-processes and reduces the accuracy of the final answer. For example, assuming that the accuracy of each sub-process is 95%, the end-to-end accuracy of the 9 serial sub-processes is 0.95^9 ≈ 0.63. Secondly, existing DeepQA places very high demands on corpus quality and is usually limited to small-scale professional corpora such as encyclopedias, professional literature, or specially written manual corpora; the candidate answers obtained from such small-scale corpora are limited and error-prone, which seriously affects the accuracy of the final answer.
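The error-accumulation arithmetic above can be sketched as follows; the per-stage accuracy and stage count are the figures from the example, and independence of the stage errors is assumed.

```python
# Sketch of the error-accumulation arithmetic above: with independent stage
# errors, the end-to-end accuracy of a serial pipeline is the product of the
# per-stage accuracies.

def serial_accuracy(stage_accuracy, num_stages):
    return stage_accuracy ** num_stages

# Nine DeepQA sub-processes, each 95% accurate:
print(round(serial_accuracy(0.95, 9), 2))  # → 0.63
```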
Disclosure of Invention
The embodiment of the invention provides a question-answering system and a question-answering method, which solve the problem that the answer accuracy of existing DeepQA systems is not high.
To achieve the above objective, the embodiment of the invention adopts the following technical solutions:
in a first aspect, an embodiment of the present invention provides a question answering system, including:
the user interaction module is used for receiving a question provided by a user;
the discourse structure analysis module is used for obtaining a first candidate answer set corresponding to the question received by the user interaction module based on a discourse structure analysis algorithm; the discourse structure analysis algorithm obtains candidate answers corresponding to the question by means of syntactic structure analysis, defined grammar rules, or a structured knowledge base, and the first candidate answer set comprises at least one first candidate answer corresponding to the question and the score of each first candidate answer;
the feature statistics module is used for obtaining a second candidate answer set corresponding to the question received by the user interaction module based on a feature statistical algorithm; the feature statistical algorithm obtains candidate answers corresponding to the question by means of word frequency statistics, and the second candidate answer set comprises at least one second candidate answer corresponding to the question and the score of each second candidate answer;
the combination processing module is used for combining the first candidate answer set obtained by the discourse structure analysis module and the second candidate answer set obtained by the feature statistics module, and taking the candidate answer with the highest score after the combination as the correct answer to the question;
and the user interaction module is also used for feeding back the correct answer to the user.
Compared with existing question-answering systems, the question-answering system provided by the embodiment of the invention combines a discourse structure analysis algorithm and a feature statistical algorithm to obtain the correct answer to a question. Because discourse structure analysis selects candidate answers based on syntactic structure analysis, defined grammar rules, or a structured knowledge base, while the feature statistical algorithm selects candidate answers based on word frequency statistics, the two algorithms select candidate answers in different ways, and the selected candidate answers therefore differ greatly in kind. For example, incorrect answers contained in the candidate answer set returned by the discourse structure analysis algorithm generally do not appear in the candidate answer set selected by the feature statistical algorithm. Thus, by exploiting the complementarity of the candidate answer sets returned by the two algorithms, the embodiment of the invention can largely remove highly-scored incorrect answers and improve the accuracy of the question-answering system.
With reference to the first aspect, in a possible implementation manner, the discourse structure analysis module may specifically include:
the question analysis unit is used for performing word segmentation, syntactic parsing, and named entity recognition on the question to obtain at least one sub-question and at least one keyword corresponding to each sub-question;
the retrieval unit is used for, for any one of the at least one sub-question, inputting each keyword corresponding to the sub-question into the first corpus and retrieving a related document set for each keyword;
the alternative answer generating unit is used for, for any one of the at least one sub-question, extracting at least one alternative answer corresponding to the sub-question from the related document sets of all keywords corresponding to the sub-question, and performing hypothesis generation and soft filtering on the at least one alternative answer to obtain an alternative answer set corresponding to the sub-question; the alternative answer set comprises at least one alternative answer;
the evidence retrieval and scoring unit is used for, for any one of the at least one sub-question, substituting each alternative answer in the alternative answer set corresponding to the sub-question into the sub-question to generate at least one statement, inputting each statement into an evidence base for retrieval, and scoring the alternative answer corresponding to each statement according to the number of related documents retrieved;
and the answer synthesizing and sorting unit is used for synthesizing the alternative answer sets corresponding to the sub-questions, and taking the top-M alternative answers by score in the synthesized alternative answer set as the first candidate answer set, wherein M is an integer greater than or equal to 1.
In this way, with the aid of the first corpus, the discourse structure analysis module may generate the first candidate answer set through analysis of the question, retrieval of related document sets, generation of alternative answers, scoring of the alternative answers, and synthesis and ranking of the alternative answers.
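As a rough illustration of the evidence retrieval and scoring step described above, the following sketch substitutes alternative answers into a sub-question template and scores each by the number of matching documents in a tiny in-memory evidence base; the template, evidence base, and substring-matching rule are all hypothetical simplifications.

```python
# Toy sketch of evidence retrieval and scoring: each alternative answer is
# substituted into the sub-question to form a statement, and the answer is
# scored by how many evidence documents contain every word of that statement
# (as a substring). The evidence base and matching rule are illustrative.

EVIDENCE_BASE = [
    "Beijing is the capital of China.",
    "Beijing hosted the Olympic Games.",
    "Tianjin is a port city near Beijing.",
]

def evidence_score(template, alternative_answer):
    statement = template.format(answer=alternative_answer)
    words = statement.lower().split()
    return sum(1 for doc in EVIDENCE_BASE
               if all(w in doc.lower() for w in words))

template = "{answer} is the capital of China"
scores = {a: evidence_score(template, a) for a in ["Beijing", "Tianjin"]}
# Top-M (here M = 1) alternative answers by score form the candidate set:
top_m = sorted(scores, key=scores.get, reverse=True)[:1]
print(top_m)  # → ['Beijing']
```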
With reference to the foregoing possible implementation manners, in a possible implementation manner, the feature statistics module may specifically include:
the search unit is used for inputting the question received by the user interaction module into the second corpus and searching to obtain a related document set for the question;
the feature extraction unit is used for extracting features from the relevant document set searched by the search unit based on a feature statistical algorithm to obtain an alternative answer set, and the alternative answer set comprises at least one alternative answer corresponding to the question;
the feature scoring and answer sorting unit is used for scoring each alternative answer in the alternative answer set determined by the feature extraction unit, and taking the top-N alternative answers by score as the second candidate answer set, wherein N is an integer greater than or equal to 1;
the first corpus and the second corpus are different.
In this way, with a second corpus different from the first corpus, the feature statistics module may generate the second candidate answer set by searching for documents related to the question, extracting alternative answers based on the feature statistical algorithm, and scoring them.
In combination with the above possible implementations, in one possible implementation,
the number of the corpora contained in the second corpus is greater than the number of the corpora contained in the first corpus.
Optionally, the first corpus may contain at least one of the following corpora: Wikipedia, a knowledge graph, professional literature, and manually written corpora. The second corpus may include the first corpus plus at least one of: Baidu Knows, forum posts, web portals, blogs, and microblogs.
Therefore, a formal, high-quality corpus can be configured for the discourse structure analysis module, ensuring the quality of the candidate answers it determines; meanwhile, to exploit the advantages of feature statistics, a large-scale corpus is configured for the feature statistics module, expanding the search range so that the feature statistics module determines candidate answers different from those determined by discourse structure analysis, which improves the accuracy of the answers determined by the question-answering system.
With reference to the foregoing possible implementation manners, in a possible implementation manner, the feature statistics module may specifically include:
the feature extraction unit is used for extracting features from all relevant document sets retrieved by the retrieval unit based on a feature statistical algorithm to obtain an alternative answer set, and the alternative answer set comprises at least one alternative answer corresponding to the question;
and the feature scoring and answer ranking unit is used for scoring each alternative answer in the alternative answer set determined by the feature extraction unit, and taking the top-O alternative answers by score as the second candidate answer set, wherein O is an integer greater than or equal to 1.
In this possible implementation, the feature statistics module may extract alternative answers, based on the feature statistical algorithm, from the related document sets retrieved by the discourse structure analysis module, and score them to generate the second candidate answer set. The feature statistics module therefore does not need to retrieve related document sets itself, which greatly reduces its design complexity.
With reference to the foregoing possible implementation manners, in a possible implementation manner, the feature statistics module may specifically include:
the search unit is used for inputting the question received by the user interaction module into the evidence base and searching to obtain a related document set for the question;
the feature extraction unit is used for extracting features from the relevant document set searched by the search unit based on a feature statistical algorithm to obtain an alternative answer set, and the alternative answer set comprises at least one alternative answer corresponding to the question;
and the feature scoring and answer ranking unit is used for scoring each alternative answer in the alternative answer set determined by the feature extraction unit, and taking the top-P alternative answers by score as the second candidate answer set, wherein P is an integer greater than or equal to 1.
Therefore, with the aid of the evidence base, the feature statistics module can generate the second candidate answer set by searching for documents related to the question, extracting alternative answers based on feature statistics, and scoring them, without a separately configured corpus, which greatly reduces the overall complexity of the question-answering system provided by the embodiment of the invention.
With reference to the foregoing possible implementation manners, in one possible implementation manner, the combination processing module may specifically be configured to:
extracting the intersection of the first candidate answer set and the second candidate answer set, and taking the candidate answer with the highest score in the extracted intersection as the correct answer of the question; or
performing weighting processing on the scores of the candidate answers in the first candidate answer set and the second candidate answer set, and taking the candidate answer with the highest score after the weighting processing as the correct answer to the question.
In this way, the final answer may be the highest-scoring candidate answer in the intersection of the candidate answer set obtained by the discourse structure analysis algorithm and the candidate answer set obtained by the feature statistical algorithm, or the highest-scoring candidate answer after the candidate answers in the two sets are weighted.
In a second aspect, an embodiment of the present invention provides a question answering method, including:
receiving a question posed by a user; obtaining a first candidate answer set corresponding to the question based on a discourse structure analysis algorithm; obtaining a second candidate answer set corresponding to the question based on a feature statistical algorithm; combining the first candidate answer set and the second candidate answer set, and taking the candidate answer with the highest score after the combination as the correct answer to the question; and feeding the correct answer back to the user;
the discourse structure analysis algorithm obtains candidate answers corresponding to the question by means of syntactic structure analysis, defined grammar rules, or a structured knowledge base, and the feature statistical algorithm obtains candidate answers corresponding to the question by means of word frequency statistics.
Specifically, for the implementation of the question-answering method, reference may be made to the processes executed by the modules and units in the first aspect or its possible implementation manners, which are not repeated here. The question-answering method provided by this aspect can therefore achieve the same beneficial effects as the first aspect.
In another aspect, an embodiment of the present application provides a question-answering system, where the question-answering system may implement a function executed by a question-answering system element in the foregoing method embodiment, where the function may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
In one possible design, the structure of the question-answering system includes a processor and a communication unit, and the processor is configured to support the question-answering system to execute the corresponding functions in the above method. The communication unit is used for supporting communication between the question answering system and users or other network elements. The question-answering system may also include a memory, for coupling with the processor, that stores the necessary program instructions and data for the question-answering system.
In yet another aspect, the present application provides a computer storage medium for storing computer software instructions for the above question-answering system, where the computer software instructions include a program designed to execute the above aspects.
In yet another aspect, the present application provides a computer program product storing computer software instructions for the question answering system, where the computer software instructions include a program designed to execute the above aspects.
In yet another aspect, the present invention provides an apparatus, which exists in the form of a chip product, and the apparatus includes a processor and a memory, the memory is configured to be coupled to the processor and stores necessary program instructions and data of the apparatus, and the processor is configured to execute the program instructions stored in the memory, so that the apparatus performs the functions corresponding to the question answering system in the above method.
Fig. 1 is a schematic diagram of the network architecture of DeepQA provided in the prior art;
Fig. 2 is a simplified schematic diagram of a question-answering system according to an embodiment of the present invention;
Fig. 3 is a schematic composition diagram of a question-answering system according to an embodiment of the present invention;
Fig. 4 is a schematic composition diagram of a question-answering system according to an embodiment of the present invention;
Fig. 5 is a schematic composition diagram of a question-answering system according to an embodiment of the present invention;
Fig. 6 is a flowchart of a question-answering method according to an embodiment of the present invention;
Fig. 7 is a schematic composition diagram of a question-answering system according to an embodiment of the present invention.
The embodiment of the invention provides a question-answering system whose basic principle is as follows: after receiving a question posed by a user, the system obtains one group of candidate answers corresponding to the question based on a discourse structure analysis algorithm, obtains another group of candidate answers corresponding to the question based on a feature statistical algorithm, combines the two groups of candidate answers, and feeds back the highest-scoring candidate answer after the combination to the user as the final correct answer.
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Fig. 2 is a simplified schematic diagram of a question answering system 10 according to an embodiment of the present invention. The question-answering system 10 may be arranged on a user terminal in the form of application software (APP), and a user may interact with the question-answering system 10 by clicking the APP corresponding to the question-answering system 10 on the user terminal, where the user terminal may be: mobile phones, tablet computers, ultra-mobile personal computers (UMPC), notebook computers, netbooks, and Personal Digital Assistants (PDA); the question-answering system 10 may also be a stand-alone device for directly interacting with the user, as the present invention is not limited in this respect.
Specifically, as shown in Fig. 2, the question-answering system 10 may include: a user interaction module 101, a discourse structure analysis module 102, a feature statistics module 103, and a combination processing module 104.
The user interaction module 101 may be configured to receive a question posed by a user and feed the correct answer to the question back to the user.
The user may pose a question in natural language, and the correct answer to the question may be an answer described in concise natural language, such as a word, a phrase, or a list.
The user interaction module 101 may include a graphical interface with an input box, through which the user inputs questions via an input unit such as a keyboard or a microphone; for example, the user interaction module 101 may receive a question posed by the user through the input box and feed the correct answer back to the user in text form through the graphical interface. Alternatively, the user interaction module 101 may include an audio unit comprising a microphone and a player, where the microphone receives the sounds made by the user and the player feeds the answer determined by the question-answering system 10 back to the user in the form of sound; for example, the user interaction module 101 may receive a question posed by the user through the audio unit and play the correct answer to the user.
The discourse structure analysis module 102 may be configured to obtain the question posed by the user from the user interaction module 101 and obtain a first candidate answer set corresponding to the question based on a discourse structure analysis algorithm.
The discourse structure analysis algorithm obtains candidate answers corresponding to the question by means of syntactic structure analysis, defined grammar rules, or a structured knowledge base; syntactic structure analysis, defined grammar rules, and structured knowledge bases are common means in existing discourse structure analysis algorithms and are not detailed here.
The first candidate answer set may include at least one candidate answer, each corresponding to a score that characterizes the credibility of the candidate answer being the correct answer: the higher the score, the more likely the candidate answer is the correct answer. The score may be expressed as a percentage.
Optionally, in the discourse-structure-analysis-based algorithm of the embodiment of the present invention, the score of a candidate answer may be the result of synthesizing several scoring algorithms. Typical scoring algorithms may include, but are not limited to, the following. 1. Whether the type of the candidate answer matches the answer type of the question: if so, the score of the candidate answer is relatively higher; otherwise it is relatively lower. For example, if the user asks which city something is, the corresponding answer type is the city type, and city-typed candidate answers such as Beijing and Tianjin score higher than candidate answers such as Tiananmen that do not belong to the city type. 2. Whether the candidate answer appears in an important position of an article or encyclopedia entry (such as the title or the first paragraph): if so, its score is relatively high; otherwise relatively low. 3. The statement formed by substituting the candidate answer into the question is input into an evidence base for retrieval: if the number of returned documents is large, the score of the candidate answer is high; otherwise it is low.
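The three scoring signals above can be sketched as a single composite scorer. This is an illustrative sketch only, not the patent's formula: the 0.4/0.2/0.4 weights and the document-count normalization are hypothetical.

```python
# Illustrative composite scorer (the patent gives no concrete formula):
# combines the three signals described above into one score.
# All weights here are hypothetical.

def score_candidate(type_matches, in_important_position, evidence_doc_count,
                    max_docs=100):
    score = 0.0
    score += 0.4 if type_matches else 0.0            # 1. answer-type match
    score += 0.2 if in_important_position else 0.0   # 2. title / first paragraph
    score += 0.4 * min(evidence_doc_count, max_docs) / max_docs  # 3. evidence
    return score

# A city-typed candidate appearing in a title with strong evidence outscores
# a mismatched candidate such as "Tiananmen":
print(score_candidate(True, True, 80) > score_candidate(False, False, 10))  # → True
```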
The feature statistics module 103 may be configured to obtain a question posed by a user from the user interaction module 101, and obtain a second candidate answer set corresponding to the question based on a feature statistics algorithm;
The feature statistical algorithm obtains candidate answers corresponding to the question by means of word frequency statistics. The second candidate answer set may include at least one candidate answer, each corresponding to a score that likewise characterizes the credibility of the candidate answer being the correct answer: the higher the score, the higher the probability of its being the correct answer. Generally, in feature-based statistical algorithms, the score of a word as a candidate answer may be represented by the weight of the word in the article; common word-weighting methods in the industry include word frequency, relative word frequency, and term frequency-inverse document frequency (TF-IDF).
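A minimal sketch of two of the word-weighting schemes named above, relative word frequency and TF-IDF, using the standard log-scaled IDF. Documents are represented as token lists, a document frequency of at least 1 is assumed for any queried word, and real systems would use tuned variants.

```python
# Minimal tf and tf-idf sketch; documents are lists of tokens.
import math

def tf(word, doc):
    return doc.count(word) / len(doc)          # relative word frequency

def tf_idf(word, doc, corpus):
    df = sum(1 for d in corpus if word in d)   # document frequency (assumed >= 1)
    return tf(word, doc) * math.log(len(corpus) / df)

docs = [["beijing", "capital", "china"],
        ["china", "economy", "growth"],
        ["beijing", "olympics"]]

# "olympics" is rarer across the corpus than "beijing", so at equal term
# frequency it receives the larger weight:
print(tf_idf("olympics", docs[2], docs) > tf_idf("beijing", docs[2], docs))  # → True
```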
The combination processing module 104 is configured to combine the first candidate answer set obtained by the discourse structure analysis module 102 and the second candidate answer set obtained by the feature statistics module 103, and take the candidate answer with the highest score after the combination as the correct answer to the question.
Optionally, the combined processing module 104 extracts an intersection of the first candidate answer set and the second candidate answer set, and takes a candidate answer with the highest score in the extracted intersection as a correct answer to the question; or
performs weighting processing on the scores of the candidate answers in the first candidate answer set and the second candidate answer set, and takes the candidate answer with the highest score after the weighting processing as the correct answer to the question.
Here, weighting processing means: for the same word, the scores of the word in the two candidate answer sets are each multiplied by a weight (i.e., a coefficient) and then added to obtain a total score, which is used as the score of the word. If a word does not appear in a candidate answer set, its score in that set may be taken as 0.
For example, suppose that for the question "Which city is the capital of China?" two candidate answer sets are obtained: the first candidate answer set with scores is (Beijing 0.86, Tianjin 0.80) and the second is (Tiananmen 0.81, Beijing 0.78). The intersection of the two sets contains only Beijing, so "Beijing" is taken as the correct answer. Alternatively, set the weight of the first candidate answer set to 2 and that of the second to 1, and weight the two sets: the weighted score of Beijing is 0.86 × 2 + 0.78 × 1 = 2.5; the weighted score of Tianjin is 0.80 × 2 + 0 × 1 = 1.6; the weighted score of Tiananmen is 0 × 2 + 0.81 × 1 = 0.81. Beijing's score is the highest, so "Beijing" is taken as the correct answer to "Which city is the capital of China?".
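The worked example above can be reproduced directly. Both combination strategies are sketched below; note that taking the larger of the two scores inside the intersection is an assumption of this sketch, since the description only says the highest-scoring answer in the intersection wins.

```python
# Reproducing the worked example: intersection (keep only answers present in
# both sets) and weighted summation (an answer absent from a set scores 0
# there, per the weighting definition above).

def combine_weighted(set1, set2, w1=2, w2=1):
    words = set(set1) | set(set2)
    return {w: set1.get(w, 0) * w1 + set2.get(w, 0) * w2 for w in words}

def combine_intersection(set1, set2):
    # Assumption: score an intersected answer by its larger per-set score.
    return {w: max(set1[w], set2[w]) for w in set1.keys() & set2.keys()}

first = {"Beijing": 0.86, "Tianjin": 0.80}     # discourse structure analysis
second = {"Tiananmen": 0.81, "Beijing": 0.78}  # feature statistics

scores = combine_weighted(first, second)
print(round(scores["Beijing"], 2))   # 0.86*2 + 0.78*1 → 2.5
print(round(scores["Tianjin"], 2))   # 0.80*2 + 0*1    → 1.6
print(max(scores, key=scores.get))   # → Beijing
```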
Therefore, the correct answer to a question is obtained by combining the discourse structure analysis algorithm and the feature statistical algorithm. Because the two algorithms select candidate answers in different ways, the selected candidate answers differ greatly in kind, and incorrect answers contained in the candidate answer set returned by the discourse structure analysis algorithm generally do not appear in the candidate answer set selected by the feature statistical algorithm. By means of this complementarity, the question-answering system shown in Fig. 2 can largely remove highly-scored incorrect answers, improving its accuracy.
The functional blocks of the question answering system 10 shown in fig. 2 will be further described with reference to fig. 3, 4 and 5.
In one possible implementation of the embodiment of the present invention, as shown in fig. 3, the chapter structure analysis module 102 may include: a question analysis unit 1021a, a retrieval unit 1022a, an alternative answer generation unit 1023a, an evidence retrieval scoring unit 1024a, an answer synthesis and ranking unit 1025 a; the feature statistics module 103 may include: a search unit 1031a, a feature extraction unit 1032a, and a feature scoring and answer ranking unit 1033 a.
The question analysis unit 1021a is configured to perform word segmentation, syntax parsing, and named entity recognition on a question posed by a user, and obtain at least one sub-question and at least one keyword corresponding to the sub-question.
Word segmentation and syntactic parsing are common processes in the field of natural language processing and are not described in detail here. For example, inputting the question "Who is the president of Huawei?" into the open-source Stanford NLP toolkit produces the following output:
the word segmentation result is as follows:
Huawei / 's / president / is / who / ?
Syntactic parsing result:
(ROOT
  (IP
    (NP
      (DNP
        (NP (NR Huawei))
        (DEG 's))
      (NP (NN president)))
    (VP (VC is)
      (NP (PN who)))
    (PU ?)))
Named entity recognition results:
Who is the president of <ORG>Huawei</ORG>?
In English, the answer type can be determined as person, time, place, and so on from interrogative words such as who, when, and where. The situation in Chinese is more complicated because Chinese interrogative words are very diverse: a question about a person, for example, may use "who" or "which person", and the interrogative word may even be omitted, as in a question of the form "And this person is...?", which asks about a person. However, the method for determining the answer type in Chinese is similar to that in English: the answer type is determined by rules that match interrogative words and sentence patterns.
After the interrogative word (including an omitted interrogative word) is determined, the keywords can be further determined. The keywords are generally the words that modify the interrogative word and can be obtained by analyzing the syntactic parsing result. For example, in the example above, the keywords and the interrogative word they modify are [Huawei - president - (who)].
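The rule-based determination of the answer type from the interrogative word can be sketched as follows (the rule table and function name are illustrative assumptions; a production system would also match sentence-pattern rules as described above):

```python
# Illustrative mapping from interrogative words to expected answer types;
# a real system would use a larger rule table plus sentence-pattern rules.
ANSWER_TYPE_RULES = {
    "who": "person", "whom": "person",
    "when": "time",
    "where": "place",
    "which city": "city",
}

def answer_type(question: str) -> str:
    q = question.lower()
    # Try longer patterns first, so "which city" wins over bare "who"/"which".
    for pattern in sorted(ANSWER_TYPE_RULES, key=len, reverse=True):
        if pattern in q:
            return ANSWER_TYPE_RULES[pattern]
    return "unknown"

print(answer_type("Who is the president of Huawei?"))      # person
print(answer_type("Which city is the capital of China?"))  # city
```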
For each sub-question, search section 1022a inputs the keyword of the sub-question into the first corpus to search, and obtains a document set related to the keyword.
The related document set may contain at least one document related to the keyword. Optionally, retrieving the related document set according to the keywords may be implemented by a general search engine, which is not described here; for example, the retrieval unit 1022a may be configured to input the keyword into the search box of the first corpus and trigger the search button to perform retrieval.
The candidate answer generating unit 1023a is configured to, for each sub-question, extract at least one candidate answer corresponding to the sub-question from the relevant document set of the keyword corresponding to the sub-question acquired by the retrieving unit 1022a, and perform hypothesis generation and soft filtering processing on the at least one candidate answer to obtain a candidate answer set corresponding to the sub-question.
Wherein, the alternative answer set may include: at least one alternative answer.
Alternatively, the candidate answer generating unit 1023a may be configured to extract the alternative answer set from the related document set by using syntactic structure analysis, defined grammar rules, or a structured knowledge base (i.e., a knowledge graph).
Hypothesis generation and soft filtering are general processes in existing question-answering systems and are not detailed here. For example, hypothesis generation may be the process of substituting an alternative answer into the original question to generate a declarative sentence. Suppose the question is "Which city is the capital of China?" and there are two alternative answers, "Beijing" and "Tiananmen"; the generated hypotheses are then "The capital of China is Beijing" and "The capital of China is Tiananmen", respectively. Soft filtering means filtering out alternative answers that do not meet the requirements through lightweight scoring algorithms, type matching algorithms, and the like. In the previous example, the answer type of the question "Which city is the capital of China?" should be "city"; of the two alternative answers, "Beijing" is a city and is thus a possible correct answer, while "Tiananmen" is not a city and is thus likely not the correct answer. Soft filtering can therefore filter out the alternative answer "Tiananmen".
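A minimal sketch of the hypothesis-generation and soft-filtering steps described above (the function names and the toy type lookup are illustrative assumptions standing in for a real type-matching algorithm):

```python
def generate_hypothesis(question_template: str, candidate: str) -> str:
    """Substitute a candidate answer into the question to form a statement."""
    return question_template.format(answer=candidate)

def soft_filter(candidates, expected_type, type_of):
    """Keep only candidates whose type matches the expected answer type."""
    return [c for c in candidates if type_of(c) == expected_type]

# Example from the text: "Which city is the capital of China?"
TYPE_DB = {"Beijing": "city", "Tiananmen": "landmark"}  # toy knowledge, illustrative
hyps = [generate_hypothesis("The capital of China is {answer}.", c)
        for c in ["Beijing", "Tiananmen"]]
kept = soft_filter(["Beijing", "Tiananmen"], "city", TYPE_DB.get)
# kept == ["Beijing"]; "Tiananmen" is filtered out
```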
The evidence retrieval scoring unit 1024a is configured to, for each sub-question acquired by the alternative answer generating unit 1023a, substitute each alternative answer in the alternative answer set corresponding to the sub-question into the sub-question to generate a statement, input the statement into an evidence base for retrieval, and score the alternative answer according to the number of retrieved related documents.
It should be noted that the evidence retrieval scoring unit 1024a may be used not only for scoring according to the number of returned related documents, but also for scoring the alternative answers by using other scoring algorithms (such as the aforementioned scoring algorithm), which is not limited by the embodiment of the present invention.
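The document-count scoring performed by the evidence retrieval scoring unit can be sketched as follows (the normalization constant and the mock evidence base are illustrative assumptions, not from the patent, which leaves the exact scoring formula open):

```python
def evidence_score(statement: str, search, max_hits: int = 100) -> float:
    """Score a candidate by the number of documents the evidence base
    returns for its hypothesis statement, normalized to [0, 1]."""
    hits = search(statement)
    return min(hits, max_hits) / max_hits

# Mock evidence base: pretend hit counts for two hypothesis statements.
FAKE_HITS = {"The capital of China is Beijing.": 86,
             "The capital of China is Tianjin.": 3}
score = evidence_score("The capital of China is Beijing.",
                       lambda s: FAKE_HITS.get(s, 0))
# score == 0.86 under this mock
```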
The answer synthesizing and sorting unit 1025a is configured to synthesize the alternative answer sets corresponding to the sub-questions and take the top M alternative answers by score in the synthesized alternative answer set as the first candidate answer set, where M is an integer greater than or equal to 1.
M may be set as required, which is not limited in the embodiment of the present invention; for example, the top M candidate answers may alternatively be defined as the candidate answers whose scores are greater than or equal to a preset score, and the preset score may likewise be set as required, which is not limited in the embodiment of the present invention.
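The synthesis and top-M selection can be sketched as follows (the function name and the rule of keeping the best score for an answer that appears in several sub-question sets are illustrative assumptions):

```python
import heapq

def top_m(candidate_sets, m):
    """Merge the per-sub-question alternative answer sets and keep the M
    highest-scoring answers (best score wins for a repeated answer)."""
    merged = {}
    for cset in candidate_sets:
        for answer, score in cset.items():
            merged[answer] = max(merged.get(answer, 0.0), score)
    return heapq.nlargest(m, merged.items(), key=lambda kv: kv[1])

first_set = top_m([{"Beijing": 0.86, "Tianjin": 0.80},
                   {"Beijing": 0.70, "Shanghai": 0.40}], m=2)
# first_set == [("Beijing", 0.86), ("Tianjin", 0.80)]
```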
The searching unit 1031a is configured to input the question received by the user interaction module into the second corpus, and search for a related document set of the question.
The feature extraction unit 1032a is configured to perform feature extraction on the relevant document set searched by the search unit 1031a based on a feature statistical algorithm to obtain an alternative answer set, where the alternative answer set includes at least one alternative answer corresponding to the question.
Feature extraction may include word-frequency-based methods, information-gain-based methods, and other feature extraction methods. For example, the features may be the word frequency, the relative word frequency, or the term frequency-inverse document frequency (TF-IDF) of a word in the document set. These algorithms, as well as the information-gain-based method mentioned above, are commonly used in the industry and are not described further here.
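The TF-IDF feature mentioned above can be sketched with the standard formula (the toy document set is illustrative; a real system would use the retrieved related documents, and the +1 smoothing in the idf denominator is one common variant):

```python
import math
from collections import Counter

def tf_idf(term: str, doc: list, docs: list) -> float:
    """TF-IDF of a term: its frequency within one document times the log
    inverse document frequency over the whole document set."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / (1 + df))  # +1 smoothing, a common variant
    return tf * idf

docs = [["beijing", "capital", "china"],
        ["tianjin", "port"],
        ["shanghai", "china"]]
score = tf_idf("beijing", docs[0], docs)  # tf = 1/3, df = 1
```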
The feature scoring and answer ranking unit 1033a is configured to score each alternative answer in the alternative answer set determined by the feature extraction unit 1032a and take the top N alternative answers by score as the second candidate answer set, where N is an integer greater than or equal to 1.
In general, the feature extraction unit 1032a already computes a score (i.e., a weight) for each alternative answer (i.e., feature) during feature extraction, so the algorithm for calculating the score of each alternative answer may be one of the algorithms mentioned above based on word frequency, relative word frequency, TF-IDF, and the like. The two processes of extracting features and scoring each alternative answer may also be separated and computed with different algorithms, which is not limited in the embodiment of the present invention.
In the embodiment of the present invention, the first corpus used by the chapter structure analysis module 102 and the second corpus used by the feature statistics module 103 are different. The first corpus is a corpus of higher purity, and the second corpus is an expanded version of the first corpus: compared with the first corpus, the second corpus is a larger document library containing a wider range of corpora, that is, the second corpus contains more corpora than the first corpus. Specifically, the first corpus may include small-scale, highly specialized, and normative corpora such as Wikipedia, a knowledge graph, professional literature, and manually curated corpora, and the second corpus may include the first corpus together with large-scale corpora that are currently searchable, such as Baidu Knows, forum posts, web portals, and other web pages.
The evidence base is generally a large document library and may contain a wide variety of corpora; for example, the evidence base may include large-scale corpora that are currently searchable, such as web pages, encyclopedias, forum posts, and web portals. These corpora are similar to those of a general search engine and have no special requirements. Optionally, in one implementation, the evidence base and the second corpus may be set to the same corpus; in another implementation, the evidence base and the second corpus may each be set according to their specific uses.
Optionally, when the question-answering function of the question-answering system 10 is started, a prompt for setting a corpus may be sent to the user through the user interaction interface of the question-answering system 10; the user may then input corpora in the input box of the user interaction interface according to the prompt and click a save button on the user interaction interface to store the corpora in the question-answering system 10. Alternatively, the first corpus and the second corpus may be arranged in a database, and the question-answering system accesses the corpora when necessary.
In this way, the chapter structure analysis module 102 obtains candidate answers using the normative corpus, and the feature statistics module 103 obtains candidate answers using the wide-ranging corpus, so that the search range of candidate answers is expanded while the purity of the answers is ensured, and the precision of the question-answering system is improved.
In another possible implementation of the embodiment of the present invention, in order to reduce the design complexity of the question-answering system 10, the feature statistics module 103 may determine the candidate answers without setting a search unit, but using the relevant document set retrieved by the chapter structure analysis module 102, that is, only setting the first corpus for the question-answering system 10.
Specifically, as shown in fig. 4, the chapter structure analysis module 102 in the question-answering system 10 may include: a question analysis unit 1021b, a retrieval unit 1022b, an alternative answer generation unit 1023b, an evidence retrieval scoring unit 1024b, an answer synthesis and ranking unit 1025 b; the feature statistics module 103 may include: a feature extraction unit 1031b, and a feature scoring and answer ranking unit 1032 b.
The function of the question analyzing unit 1021b is the same as that of the question analyzing unit 1021a shown in fig. 3, the function of the retrieving unit 1022b is the same as that of the retrieving unit 1022a shown in fig. 3, the function of the candidate answer generating unit 1023b is the same as that of the candidate answer generating unit 1023a shown in fig. 3, the function of the evidence retrieval scoring unit 1024b is the same as that of the evidence retrieval scoring unit 1024a shown in fig. 3, and the functions of the answer synthesizing and sorting unit 1025b and the answer synthesizing and sorting unit 1025a shown in fig. 3 are the same, and are not repeated herein.
And a feature extraction unit 1031b, configured to perform feature extraction on the document set acquired by the retrieval unit 1022b based on a feature statistical algorithm to obtain an alternative answer set.
The feature extraction unit 1031b has the same function as the feature extraction unit 1032a shown in fig. 3, and the description thereof is not repeated here.
The feature scoring and answer ranking unit 1032b has the same function as the feature scoring and answer ranking unit 1033a shown in fig. 3, and will not be described again.
Therefore, the feature statistical module in the question-answering system can extract features from the relevant document set retrieved by the chapter structure analysis module without setting a search unit to determine a candidate answer set, so that the design complexity of the feature statistical module is reduced, and further the design complexity of the whole question-answering system is reduced.
In yet another possible implementation of the embodiment of the present invention, as shown in fig. 5, the chapter structure analysis module 102 may include: a question analysis unit 1021c, a retrieval unit 1022c, a candidate answer generation unit 1023c, an evidence retrieval scoring unit 1024c, an answer synthesis and ranking unit 1025 c; the feature statistics module 103 may include: a search unit 1031c, a feature extraction unit 1032c, and a feature scoring and answer ranking unit 1033 c.
The function of the question analyzing unit 1021c is the same as that of the question analyzing unit 1021a shown in fig. 3, the function of the retrieving unit 1022c is the same as that of the retrieving unit 1022a shown in fig. 3, the function of the candidate answer generating unit 1023c is the same as that of the candidate answer generating unit 1023a shown in fig. 3, the function of the evidence retrieval scoring unit 1024c is the same as that of the evidence retrieval scoring unit 1024a shown in fig. 3, and the functions of the answer synthesizing and sorting unit 1025c and that of the answer synthesizing and sorting unit 1025a shown in fig. 3 are the same, and are not repeated here.
A search unit 1031c, configured to input the question into an evidence base, and search for a related document set of the question;
The feature extraction unit 1032c has the same function as the feature extraction unit 1032a shown in fig. 3, and is not repeated here. The feature scoring and answer ranking unit 1033c has the same function as the feature scoring and answer ranking unit 1033a shown in fig. 3, and is not described again.
The evidence base used by the search unit 1031c and the evidence base used by the evidence retrieval scoring unit 1024c may be the same corpus.
Optionally, an existing third-party question-answering system design (one that includes only the user interaction module 101 and the chapter structure analysis module 102 shown in fig. 2 to fig. 5) can be slightly modified: the units of the feature statistics module 103 shown in fig. 5 may be integrated with the evidence retrieval scoring unit 1024c of the chapter structure analysis module 102 to improve the accuracy of the "evidence retrieval" step of the chapter structure analysis module 102.
The question answering method provided by the embodiment of the invention is described below with reference to the question answering systems shown in fig. 2 to 5. It should be noted that although a logical order is shown in the method flow diagrams described below, in some cases, the steps shown or described may be performed in an order different than here.
Fig. 6 is a question answering method provided in the embodiment of the present invention, and as shown in fig. 6, the method may include:
step 601: receiving a question posed by a user.
Step 602: and obtaining a first candidate answer set corresponding to the question based on a chapter structure analysis algorithm.
The chapter structure analysis algorithm obtains candidate answers corresponding to the question by using syntactic structure analysis, defined grammar rules, or a structured knowledge base, and the first candidate answer set includes at least one first candidate answer corresponding to the question and the score of each first candidate answer.
Alternatively, the first set of candidate answers may be determined by:
performing word segmentation, syntax analysis and named entity identification on the problem to obtain at least one subproblem and at least one keyword corresponding to the subproblem;
for any sub-problem in at least one sub-problem, respectively inputting at least one keyword corresponding to the sub-problem into a first corpus, and retrieving to obtain a related document set of each keyword;
for any sub-question in at least one sub-question, extracting at least one alternative answer corresponding to the sub-question from the relevant document set of all keywords corresponding to the sub-question, and performing hypothesis generation and soft filtering processing on the at least one alternative answer to obtain an alternative answer set corresponding to the sub-question; the alternative answer set comprises at least one alternative answer;
for any sub-question in at least one sub-question, substituting at least one alternative answer in an alternative answer set corresponding to the sub-question into the sub-question to generate at least one statement, inputting each statement into an evidence base for retrieval, and grading the alternative answer corresponding to the statement according to the number of retrieved related documents;
and synthesizing the alternative answer sets corresponding to the sub-questions, and taking the top M alternative answers by score in the synthesized alternative answer set as the first candidate answer set, where M is an integer greater than or equal to 1.
Step 603: and obtaining a second candidate answer set corresponding to the question based on a feature statistical algorithm.
The feature statistical algorithm is used for obtaining candidate answers corresponding to the questions in a word frequency statistical mode, and the second candidate answer set comprises at least one second candidate answer corresponding to the questions and scores of the second candidate answers.
Alternatively, the second candidate answer set may be obtained by the following manner 1, manner 2, or manner 3:
mode 1: inputting the question into a second corpus, and searching to obtain a relevant document set of the question;
based on a feature statistical algorithm, extracting features from a relevant document set of the question to obtain an alternative answer set, wherein the alternative answer set comprises at least one alternative answer corresponding to the question;
scoring each alternative answer in the alternative answer set obtained after feature extraction, and taking the top N alternative answers by score as the second candidate answer set, where N is an integer greater than or equal to 1;
wherein the first corpus and the second corpus are different.
Mode 2: based on a feature statistical algorithm, extracting features from the relevant document sets of all the keywords corresponding to at least one sub-question to obtain an alternative answer set, wherein the alternative answer set comprises at least one alternative answer corresponding to the question;
and scoring each alternative answer in the alternative answer set obtained after feature extraction, and taking the top O alternative answers by score as the second candidate answer set, where O is an integer greater than or equal to 1.
The relevant document set of all keywords corresponding to at least one sub-question can be obtained through step 602.
Mode 3: inputting the problem into an evidence base, and searching to obtain a relevant document set of the problem;
based on a feature statistical algorithm, extracting features from a relevant document set of the question to obtain an alternative answer set, wherein the alternative answer set comprises at least one alternative answer corresponding to the question;
and scoring each alternative answer in the alternative answer set obtained after feature extraction, and taking the top P alternative answers by score as the second candidate answer set, where P is an integer greater than or equal to 1.
Wherein the evidence base used in this step may be the same as the evidence base used in step 602.
Step 604: and combining the first candidate answer set and the second candidate answer set, and taking the candidate answer with the highest score after the combination processing as the correct answer of the question.
Optionally, an intersection of the first candidate answer set and the second candidate answer set may be extracted, and a candidate answer with the highest score in the extracted intersection may be used as a correct answer to the question.
Alternatively, the same candidate answers in the first candidate answer set and the second candidate answer set may be weighted, and the candidate answer with the highest score after weighting is taken as the correct answer to the question.
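The intersection-based combination in step 604 can be sketched as follows (the function name and the choice of taking the higher of the two scores for ranking within the intersection are illustrative assumptions):

```python
def merge_by_intersection(first: dict, second: dict):
    """Keep only answers present in both candidate sets and return the one
    with the highest score (using the higher of its two scores)."""
    common = set(first) & set(second)
    if not common:
        return None  # no shared answer; fall back to another merge strategy
    return max(common, key=lambda a: max(first[a], second[a]))

best = merge_by_intersection({"Beijing": 0.86, "Tianjin": 0.80},
                             {"Tiananmen": 0.81, "Beijing": 0.78})
# best == "Beijing" (the only answer in the intersection)
```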
Step 605: and feeding back the correct answer to the user.
In this way, the correct answer to the question is obtained by combining the chapter structure analysis algorithm and the feature statistical algorithm. Because the two candidate answer sets are selected in different ways, the types of candidate answers they contain differ greatly, and an incorrect answer contained in the candidate answer set returned by the chapter structure analysis algorithm generally does not appear in the candidate answer set selected by the feature statistical algorithm. The question-answering method can therefore, by means of the complementarity of the two algorithms, largely remove highly ranked incorrect answers, improving the accuracy of the question-answering system.
The above description mainly introduces the solutions provided in the embodiments of the present application from the perspective of the question-answering system. It is understood that, in order to implement the above functions, the question-answering system includes hardware structures and/or software modules corresponding to the respective functions. Those skilled in the art will readily appreciate that the various illustrative algorithm steps described in connection with the embodiments disclosed herein can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the functional modules of the question-answering system may be divided according to the above method example, for example, each functional module (such as the question-answering system shown in fig. 2 to 5) may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, in the embodiment of the present application, the division of the module is schematic, and is only one logic function division, and there may be another division manner in actual implementation.
In the case of integrated units, fig. 7 shows another possible schematic composition of the question-answering system involved in the above-described embodiment. As shown in fig. 7, the question-answering system may include at least one processor 71, a memory 72, a communication unit 73, and a communication bus 74. The following specifically describes each component of the question answering system with reference to fig. 7:
The processor 71 is the control center of the question-answering system and may be a single processor or a collective term for a plurality of processing elements. For example, the processor 71 is a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application, such as one or more digital signal processors (DSPs) or one or more field-programmable gate arrays (FPGAs). The processor 71 may perform various functions of the question-answering system by running or executing software programs stored in the memory 72 and calling data stored in the memory 72.
In a particular implementation, as one example, the processor 71 may include one or more CPUs, such as CPU0 and CPU1 shown in fig. 7. In a particular implementation, as one embodiment, the question-answering system may include a plurality of processors, such as the processor 71 and the processor 75 shown in fig. 7. Each of these processors may be a single-core processor or a multi-core processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
The memory 72 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), optical disc storage (including a compact disc read-only memory (CD-ROM), a laser disc, a digital versatile disc, a Blu-ray disc, and the like), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 72 may be separate and coupled to the processor 71 via the communication bus 74, or may be integrated with the processor 71. The memory 72 is used for storing software programs for executing the schemes provided by the embodiments of the present application, and their execution is controlled by the processor 71.
A communication unit 73 for interacting with a user or other device, such as: the communication unit 73 may be a user interaction interface of the question-answering system.
The communication bus 74 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus.
The question-answering system shown in fig. 7 can perform the operations performed by the question-answering system in the question-answering method provided by the embodiment of the present application. Therefore, all relevant contents of each step related to the method embodiment may be referred to the functional description of the corresponding functional module, which is not described herein again, for example: processor 71 may be configured to support the question answering system to perform steps 602-604, and communication unit 73 may be configured to support the question answering system to perform steps 601, 605. The question answering system provided by the embodiment of the invention is used for executing the question answering method, so that the same effect as the question answering method can be achieved.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical functional division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another device, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may be one physical unit or a plurality of physical units, that is, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a readable storage medium. Based on such understanding, the technical solution of the embodiments of the present invention may, in essence or in the part contributing to the prior art, be embodied in whole or in part in the form of a software product, where the software product is stored in a storage medium and includes several instructions to enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions within the technical scope of the present invention are intended to be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Claims (21)
- A question-answering system, comprising:
a user interaction module, configured to receive a question posed by a user;
a chapter structure analysis module, configured to obtain, based on a chapter structure analysis algorithm, a first candidate answer set corresponding to the question received by the user interaction module, wherein the chapter structure analysis algorithm obtains candidate answers corresponding to the question by using syntactic structure analysis, defined grammar rules, or a structured knowledge base, and the first candidate answer set comprises at least one first candidate answer corresponding to the question and the score of the first candidate answer;
a feature statistics module, configured to obtain, based on a feature statistical algorithm, a second candidate answer set corresponding to the question received by the user interaction module, wherein the feature statistical algorithm obtains candidate answers corresponding to the question by means of word frequency statistics, and the second candidate answer set comprises at least one second candidate answer corresponding to the question and the score of the second candidate answer; and
a combination processing module, configured to combine the first candidate answer set and the second candidate answer set and take the candidate answer with the highest score after the combination processing as the correct answer to the question;
wherein the user interaction module is further configured to feed back the correct answer to the user.
- The question-answering system according to claim 1, wherein the discourse structure analysis module specifically comprises:
  a question analysis unit, configured to perform word segmentation, syntactic analysis, and named entity recognition on the question to obtain at least one sub-question and at least one keyword corresponding to each sub-question;
  a retrieval unit, configured to, for any sub-question of the at least one sub-question, input the at least one keyword corresponding to the sub-question into a first corpus and retrieve a related document set for each keyword;
  an alternative answer generation unit, configured to, for any sub-question of the at least one sub-question, extract at least one alternative answer corresponding to the sub-question from the related document sets of all keywords corresponding to the sub-question, and perform hypothesis generation and soft filtering on the at least one alternative answer to obtain an alternative answer set corresponding to the sub-question, the alternative answer set comprising at least one alternative answer;
  an evidence retrieval scoring unit, configured to, for any sub-question of the at least one sub-question, substitute each alternative answer in the alternative answer set corresponding to the sub-question into the sub-question to generate at least one statement, input each statement into an evidence base for retrieval, and score the alternative answer corresponding to each statement according to the number of related documents retrieved; and
  an answer synthesis and ranking unit, configured to synthesize the alternative answer sets corresponding to the sub-questions, and to take the top M alternative answers by score in the synthesized alternative answer set as the first candidate answer set, where M is an integer greater than or equal to 1.
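The evidence retrieval scoring unit of claim 2 substitutes each alternative answer into the sub-question to form a statement, then scores the alternative by the number of related documents retrieved for that statement. A minimal sketch of that step, assuming a `___` placeholder marks the substitution slot and the evidence base is a list of document strings (both assumptions for illustration; real retrieval would use an index, not substring search):

```python
def evidence_score(sub_question_template, alternatives, evidence_base):
    """Score each alternative answer by how many evidence documents
    contain the statement formed by substituting it into the template."""
    scores = {}
    for alt in alternatives:
        statement = sub_question_template.replace("___", alt)
        # Count documents supporting the statement (toy substring retrieval).
        scores[alt] = sum(1 for doc in evidence_base if statement in doc)
    return scores

# Toy evidence base and sub-question template:
docs = [
    "the capital of france is paris .",
    "paris is the capital of france",
    "lyon is in france",
]
evidence_score("the capital of france is ___", ["paris", "lyon"], docs)
# -> {"paris": 1, "lyon": 0}
```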
- The question-answering system according to claim 2, wherein the feature statistics module specifically comprises:
  a search unit, configured to input the question received by the user interaction module into a second corpus and retrieve a related document set of the question;
  a feature extraction unit, configured to extract features from the related document set retrieved by the search unit based on the feature statistics algorithm to obtain an alternative answer set, the alternative answer set comprising at least one alternative answer corresponding to the question; and
  a feature scoring and answer ranking unit, configured to score each alternative answer in the alternative answer set determined by the feature extraction unit, and to take the top N alternative answers by score as the second candidate answer set, where N is an integer greater than or equal to 1;
  wherein the first corpus and the second corpus are different.
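The feature statistics module of claim 3 derives candidate answers by word frequency statistics. One plausible reading, sketched below with an assumed stop-word filter, counts term frequencies over the retrieved related documents and keeps the top-N terms as scored alternatives; the patent does not specify the exact features, so treat this as a sketch under those assumptions.

```python
from collections import Counter

def feature_statistics(related_docs, stopwords, n=3):
    """Toy word-frequency feature statistics: score candidate terms by
    their frequency across the retrieved documents and keep the top N."""
    counts = Counter()
    for doc in related_docs:
        for token in doc.lower().split():
            if token not in stopwords:  # drop assumed stop words
                counts[token] += 1
    return counts.most_common(n)  # list of (term, frequency) pairs

# Toy usage:
feature_statistics(["Paris Paris France", "France Paris"], stopwords=set(), n=2)
# -> [("paris", 3), ("france", 2)]
```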
- The question-answering system according to claim 3, wherein the amount of corpus data contained in the second corpus is greater than that contained in the first corpus.
- The question-answering system according to claim 2, wherein the feature statistics module specifically comprises:
  a feature extraction unit, configured to extract features from all the related document sets retrieved by the retrieval unit based on the feature statistics algorithm to obtain an alternative answer set, the alternative answer set comprising at least one alternative answer corresponding to the question; and
  a feature scoring and answer ranking unit, configured to score each alternative answer in the alternative answer set determined by the feature extraction unit, and to take the top O alternative answers by score as the second candidate answer set, where O is an integer greater than or equal to 1.
- The question-answering system according to claim 2, wherein the feature statistics module specifically comprises:
  a search unit, configured to input the question received by the user interaction module into the evidence base and retrieve a related document set of the question;
  a feature extraction unit, configured to extract features from the related document set retrieved by the search unit based on the feature statistics algorithm to obtain an alternative answer set, the alternative answer set comprising at least one alternative answer corresponding to the question; and
  a feature scoring and answer ranking unit, configured to score each alternative answer in the alternative answer set determined by the feature extraction unit, and to take the top P alternative answers by score as the second candidate answer set, where P is an integer greater than or equal to 1.
- The question-answering system according to any one of claims 1 to 6, wherein the combination processing module is specifically configured to:
  extract the intersection of the first candidate answer set and the second candidate answer set, and take the candidate answer with the highest score in the extracted intersection as the correct answer to the question; or
  weight the candidate answers that appear in both the first candidate answer set and the second candidate answer set, and take the candidate answer with the highest score after the weighting as the correct answer to the question.
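Claim 7 offers two combination strategies: take the best-scoring answer in the intersection of the two sets, or weight answers shared by both sets and take the best after weighting. A hedged sketch, assuming candidate sets are dicts mapping answers to scores, a max-of-scores rule inside the intersection, and equal weights (all illustrative choices the claims leave open):

```python
def combine_intersection(first, second):
    """Strategy 1: restrict to answers in both sets, return the best scorer.
    Returns None when the intersection is empty (behavior assumed here)."""
    common = first.keys() & second.keys()
    if not common:
        return None
    return max(common, key=lambda a: max(first[a], second[a]))

def combine_weighted(first, second, w1=0.5, w2=0.5):
    """Strategy 2: weighted sum of scores; answers in both sets get
    contributions from both, so they are effectively boosted."""
    merged = {}
    for a in first.keys() | second.keys():
        merged[a] = w1 * first.get(a, 0.0) + w2 * second.get(a, 0.0)
    return max(merged, key=merged.get)

first = {"a": 0.9, "b": 0.2}
second = {"a": 0.5, "c": 0.8}
combine_intersection(first, second)  # -> "a" (only shared answer)
combine_weighted(first, second)      # -> "a" (0.7 vs 0.1 vs 0.4)
```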
- A question-answering method, comprising:
  receiving a question posed by a user;
  obtaining a first candidate answer set corresponding to the question based on a discourse structure analysis algorithm, wherein the discourse structure analysis algorithm obtains candidate answers corresponding to the question by using syntactic structure analysis, defined grammar rules, or a structured knowledge base, and the first candidate answer set comprises at least one first candidate answer corresponding to the question and a score of each first candidate answer;
  obtaining a second candidate answer set corresponding to the question based on a feature statistics algorithm, wherein the feature statistics algorithm obtains candidate answers corresponding to the question by means of word frequency statistics, and the second candidate answer set comprises at least one second candidate answer corresponding to the question and a score of each second candidate answer;
  combining the first candidate answer set and the second candidate answer set, and taking the candidate answer with the highest score after the combination as the correct answer to the question; and
  feeding the correct answer back to the user.
- The method of claim 8, wherein the obtaining a first candidate answer set corresponding to the question based on a discourse structure analysis algorithm comprises:
  performing word segmentation, syntactic analysis, and named entity recognition on the question to obtain at least one sub-question and at least one keyword corresponding to each sub-question;
  for any sub-question of the at least one sub-question, inputting the at least one keyword corresponding to the sub-question into a first corpus and retrieving a related document set for each keyword;
  for any sub-question of the at least one sub-question, extracting at least one alternative answer corresponding to the sub-question from the related document sets of all keywords corresponding to the sub-question, and performing hypothesis generation and soft filtering on the at least one alternative answer to obtain an alternative answer set corresponding to the sub-question, the alternative answer set comprising at least one alternative answer;
  for any sub-question of the at least one sub-question, substituting each alternative answer in the alternative answer set corresponding to the sub-question into the sub-question to generate at least one statement, inputting each statement into an evidence base for retrieval, and scoring the alternative answer corresponding to each statement according to the number of related documents retrieved; and
  synthesizing the alternative answer sets corresponding to the sub-questions, and taking the top M alternative answers by score in the synthesized alternative answer set as the first candidate answer set, where M is an integer greater than or equal to 1.
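The final step of claim 9 synthesizes the per-sub-question answer sets and keeps the top M by score. A small sketch of that synthesis, assuming each set is a dict of scores and a keep-the-best-score rule for answers that appear under several sub-questions (the merge rule is not specified by the patent):

```python
import heapq

def synthesize_top_m(scored_sets, m):
    """Merge per-sub-question {answer: score} dicts and keep the top M.

    Answers appearing under several sub-questions keep their best score
    (an illustrative merge rule; the claim only says the sets are synthesized).
    """
    merged = {}
    for answer_set in scored_sets:
        for ans, score in answer_set.items():
            merged[ans] = max(merged.get(ans, 0.0), score)
    # Top-M alternative answers by score, highest first.
    return heapq.nlargest(m, merged.items(), key=lambda kv: kv[1])

# Example: two sub-questions with overlapping alternative answers.
sets = [{"a": 0.9, "b": 0.5}, {"b": 0.8, "c": 0.3}]
synthesize_top_m(sets, 2)  # -> [("a", 0.9), ("b", 0.8)]
```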
- The method of claim 9, wherein the obtaining a second candidate answer set corresponding to the question based on a feature statistics algorithm comprises:
  inputting the question into a second corpus and retrieving a related document set of the question;
  extracting features from the related document set of the question based on the feature statistics algorithm to obtain an alternative answer set, the alternative answer set comprising at least one alternative answer corresponding to the question; and
  scoring each alternative answer in the alternative answer set obtained after the feature extraction, and taking the top N alternative answers by score as the second candidate answer set, where N is an integer greater than or equal to 1;
  wherein the first corpus and the second corpus are different.
- The method of claim 10, wherein the amount of corpus data contained in the second corpus is greater than that contained in the first corpus.
- The method of claim 9, wherein the obtaining a second candidate answer set corresponding to the question based on a feature statistics algorithm comprises:
  extracting features from the related document sets of all keywords corresponding to the sub-questions based on the feature statistics algorithm to obtain an alternative answer set, the alternative answer set comprising at least one alternative answer corresponding to the question; and
  scoring each alternative answer in the alternative answer set obtained after the feature extraction, and taking the top O alternative answers by score as the second candidate answer set, where O is an integer greater than or equal to 1.
- The method according to claim 9, wherein the obtaining a second candidate answer set corresponding to the question based on a feature statistics algorithm comprises:
  inputting the question into the evidence base and retrieving a related document set of the question;
  extracting features from the related document set of the question based on the feature statistics algorithm to obtain an alternative answer set, the alternative answer set comprising at least one alternative answer corresponding to the question; and
  scoring each alternative answer in the alternative answer set obtained after the feature extraction, and taking the top P alternative answers by score as the second candidate answer set, where P is an integer greater than or equal to 1.
- The method according to any one of claims 8 to 13, wherein the combining the first candidate answer set and the second candidate answer set, and taking the candidate answer with the highest score after the combination as the correct answer to the question, comprises:
  extracting the intersection of the first candidate answer set and the second candidate answer set, and taking the candidate answer with the highest score in the extracted intersection as the correct answer to the question; or
  weighting the candidate answers that appear in both the first candidate answer set and the second candidate answer set, and taking the candidate answer with the highest score after the weighting as the correct answer to the question.
- A question-answering system, comprising:
  a communication unit, configured to receive a question posed by a user; and
  a processor, configured to obtain, based on a discourse structure analysis algorithm, a first candidate answer set corresponding to the question received by the communication unit, wherein the discourse structure analysis algorithm obtains candidate answers corresponding to the question by using syntactic structure analysis, defined grammar rules, or a structured knowledge base, and the first candidate answer set comprises at least one first candidate answer corresponding to the question and a score of each first candidate answer;
  wherein the processor is further configured to obtain, based on a feature statistics algorithm, a second candidate answer set corresponding to the question received by the communication unit, wherein the feature statistics algorithm obtains candidate answers corresponding to the question by means of word frequency statistics, and the second candidate answer set comprises at least one second candidate answer corresponding to the question and a score of each second candidate answer;
  the processor is further configured to combine the first candidate answer set and the second candidate answer set, and to take the candidate answer with the highest score after the combination as the correct answer to the question; and
  the communication unit is further configured to feed the correct answer back to the user.
- The question-answering system according to claim 15, wherein the processor is specifically configured to:
  perform word segmentation, syntactic analysis, and named entity recognition on the question to obtain at least one sub-question and at least one keyword corresponding to each sub-question;
  for any sub-question of the at least one sub-question, input the at least one keyword corresponding to the sub-question into a first corpus and retrieve a related document set for each keyword;
  for any sub-question of the at least one sub-question, extract at least one alternative answer corresponding to the sub-question from the related document sets of all keywords corresponding to the sub-question, and perform hypothesis generation and soft filtering on the at least one alternative answer to obtain an alternative answer set corresponding to the sub-question, the alternative answer set comprising at least one alternative answer;
  for any sub-question of the at least one sub-question, substitute each alternative answer in the alternative answer set corresponding to the sub-question into the sub-question to generate at least one statement, input each statement into an evidence base for retrieval, and score the alternative answer corresponding to each statement according to the number of related documents retrieved; and
  synthesize the alternative answer sets corresponding to the sub-questions, and take the top M alternative answers by score in the synthesized alternative answer set as the first candidate answer set, where M is an integer greater than or equal to 1.
- The question-answering system according to claim 16, wherein the processor is specifically configured to:
  input the question received by the communication unit into a second corpus and retrieve a related document set of the question;
  extract features from the retrieved related document set based on the feature statistics algorithm to obtain an alternative answer set, the alternative answer set comprising at least one alternative answer corresponding to the question; and
  score each alternative answer in the determined alternative answer set, and take the top N alternative answers by score as the second candidate answer set, where N is an integer greater than or equal to 1;
  wherein the first corpus and the second corpus are different.
- The question-answering system according to claim 17, wherein the amount of corpus data contained in the second corpus is greater than that contained in the first corpus.
- The question-answering system according to claim 16, wherein the processor is specifically configured to:
  extract features from all the retrieved related document sets based on the feature statistics algorithm to obtain an alternative answer set, the alternative answer set comprising at least one alternative answer corresponding to the question; and
  score each alternative answer in the determined alternative answer set, and take the top O alternative answers by score as the second candidate answer set, where O is an integer greater than or equal to 1.
- The question-answering system according to claim 16, wherein the processor is specifically configured to:
  input the question received by the communication unit into the evidence base and retrieve a related document set of the question;
  extract features from the retrieved related document set based on the feature statistics algorithm to obtain an alternative answer set, the alternative answer set comprising at least one alternative answer corresponding to the question; and
  score each alternative answer in the determined alternative answer set, and take the top P alternative answers by score as the second candidate answer set, where P is an integer greater than or equal to 1.
- The question-answering system according to any one of claims 15 to 20, wherein the processor is specifically configured to:
  extract the intersection of the first candidate answer set and the second candidate answer set, and take the candidate answer with the highest score in the extracted intersection as the correct answer to the question; or
  weight the candidate answers that appear in both the first candidate answer set and the second candidate answer set, and take the candidate answer with the highest score after the weighting as the correct answer to the question.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2017/090401 WO2019000240A1 (en) | 2017-06-27 | 2017-06-27 | Question answering system and question answering method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110799970A true CN110799970A (en) | 2020-02-14 |
Family
ID=64740209
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201780092702.9A Pending CN110799970A (en) | 2017-06-27 | 2017-06-27 | Question-answering system and question-answering method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110799970A (en) |
WO (1) | WO2019000240A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111353290B (en) * | 2020-02-28 | 2023-07-14 | 支付宝(杭州)信息技术有限公司 | Method and system for automatically responding to user inquiry |
CN111782790A (en) * | 2020-07-03 | 2020-10-16 | 阳光保险集团股份有限公司 | Document analysis method and device, electronic equipment and storage medium |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101329683A (en) * | 2008-07-25 | 2008-12-24 | 华为技术有限公司 | Recommendation system and method |
CN103229162A (en) * | 2010-09-28 | 2013-07-31 | 国际商业机器公司 | Providing answers to questions using logical synthesis of candidate answers |
US20140297571A1 (en) * | 2013-03-29 | 2014-10-02 | International Business Machines Corporation | Justifying Passage Machine Learning for Question and Answer Systems |
CN104536991A (en) * | 2014-12-10 | 2015-04-22 | 乐娟 | Answer extraction method and device |
CN104572797A (en) * | 2014-05-12 | 2015-04-29 | 深圳市智搜信息技术有限公司 | Individual service recommendation system and method based on topic model |
CN104615724A (en) * | 2015-02-06 | 2015-05-13 | 百度在线网络技术(北京)有限公司 | Establishing method of knowledge base and information search method and device based on knowledge base |
US20150347569A1 (en) * | 2014-05-29 | 2015-12-03 | International Business Machines Corporation | Managing documents in question answering systems |
CN105159996A (en) * | 2015-09-07 | 2015-12-16 | 百度在线网络技术(北京)有限公司 | Deep question-and-answer service providing method and device based on artificial intelligence |
CN105760417A (en) * | 2015-01-02 | 2016-07-13 | 国际商业机器公司 | Cognitive Interactive Searching Method And System Based On Personalized User Model And Context |
CN106649258A (en) * | 2016-09-22 | 2017-05-10 | 北京联合大学 | Intelligent question and answer system |
CN106649786A (en) * | 2016-12-28 | 2017-05-10 | 北京百度网讯科技有限公司 | Deep question answer-based answer retrieval method and device |
CN106874441A (en) * | 2017-02-07 | 2017-06-20 | 腾讯科技(上海)有限公司 | Intelligent answer method and apparatus |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3385146B2 (en) * | 1995-06-13 | 2003-03-10 | シャープ株式会社 | Conversational sentence translator |
CN1952928A (en) * | 2005-10-20 | 2007-04-25 | 梁威 | Computer system to constitute natural language base and automatic dialogue retrieve |
CN103605781A (en) * | 2013-11-29 | 2014-02-26 | 苏州大学 | Implicit expression chapter relationship type inference method and system |
- 2017-06-27 WO PCT/CN2017/090401 patent/WO2019000240A1/en active Application Filing
- 2017-06-27 CN CN201780092702.9A patent/CN110799970A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114942986A (en) * | 2022-06-21 | 2022-08-26 | 平安科技(深圳)有限公司 | Text generation method and device, computer equipment and computer readable storage medium |
CN114942986B (en) * | 2022-06-21 | 2024-03-19 | 平安科技(深圳)有限公司 | Text generation method, text generation device, computer equipment and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2019000240A1 (en) | 2019-01-03 |
Similar Documents
Publication | Title | |
---|---|---|
CN109408526B (en) | SQL sentence generation method, device, computer equipment and storage medium | |
CN109783651B (en) | Method and device for extracting entity related information, electronic equipment and storage medium | |
CN108287858B (en) | Semantic extraction method and device for natural language | |
US9223779B2 (en) | Text segmentation with multiple granularity levels | |
WO2021189951A1 (en) | Text search method and apparatus, and computer device and storage medium | |
US20180052823A1 (en) | Hybrid Classifier for Assigning Natural Language Processing (NLP) Inputs to Domains in Real-Time | |
WO2009000103A1 (en) | Word probability determination | |
JPH1145241A (en) | Japanese syllabary-chinese character conversion system and computer-readable recording medium where programs making computer function as means of same system is recorded | |
CN111414763A (en) | Semantic disambiguation method, device, equipment and storage device for sign language calculation | |
JP2010537286A (en) | Creating an area dictionary | |
CN109840255A (en) | Reply document creation method, device, equipment and storage medium | |
CN110032734B (en) | Training method and device for similar meaning word expansion and generation of confrontation network model | |
US20040186706A1 (en) | Translation system, dictionary updating server, translation method, and program and recording medium for use therein | |
CN109815390B (en) | Method, device, computer equipment and computer storage medium for retrieving multilingual information | |
CN110799970A (en) | Question-answering system and question-answering method | |
CN118296120A (en) | Large-scale language model retrieval enhancement generation method for multi-mode multi-scale multi-channel recall | |
CN113505196B (en) | Text retrieval method and device based on parts of speech, electronic equipment and storage medium | |
CN114490984A (en) | Question-answer knowledge extraction method, device, equipment and medium based on keyword guidance | |
CN112182159B (en) | Personalized search type dialogue method and system based on semantic representation | |
CN111859974A (en) | Semantic disambiguation method and device combined with knowledge graph and intelligent learning equipment | |
CN114970524B (en) | Controllable text generation method and device | |
CN115796194A (en) | English translation system based on machine learning | |
CN112800314B (en) | Method, system, storage medium and equipment for search engine query automatic completion | |
CN109727591B (en) | Voice search method and device | |
CN115544204A (en) | Bad corpus filtering method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200214 |