CN107885844A - Automatic question-answering method and system based on systematic searching - Google Patents
Automatic question-answering method and system based on systematic searching
- Publication number
- CN107885844A (application CN201711107543.3A)
- Authority
- CN
- China
- Prior art keywords
- question sentence
- template
- question
- answer
- sentence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Human Computer Interaction (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An automatic question-answering method and system based on classification-based retrieval. A short-answer question first undergoes Chinese natural language processing to generate a syntax tree; then, according to the key verbs and conjunctions in the question, a complex question containing multiple sub-questions is split into simple questions, each semantically more specific and containing a single question. Next, each simple question is classified using the trigger words associated with predefined question types, and the key information required by the question template of that type is extracted from the question and filled into the corresponding template slots, forming a question template that contains all the information needed to answer. Then, according to the type of the question template, the key information in the template slots is matched against the keywords of the corresponding template knowledge base to retrieve several candidate answers. Finally, the best answer is selected from the candidates as the answer to the simple question, and the answers to all simple questions are combined into the final answer.
Description
Technical field
The invention belongs to the field of computer technology and relates to natural language processing and automatic question answering. It is an automatic question-answering method and system based on classification-based retrieval, in particular a Chinese natural-language automatic question-answering method for middle school geography short-answer questions.
Background technology
Natural language processing is a branch of artificial intelligence and linguistics that studies the theories and methods enabling effective communication between people and computers in natural language. The key to processing natural language is getting the computer to "understand" it, so that the computer can interact with people. Key technologies of natural language processing include Chinese word segmentation, part-of-speech tagging, syntactic analysis, named entity recognition, coreference resolution, and dependency analysis.
A question answering system, built on information retrieval and natural language processing, automatically answers a user's natural language question in accurate, concise natural language. It analyzes the question and produces candidate answers; a traditional automatic question answering system consists mainly of question analysis, information retrieval, and answer generation modules.
Automatic question answering is mainly performed over a text collection. The keywords in the question are analyzed first and used to search the text library; the top-ranked documents, those with the highest confidence, are taken from the returned results, and the answer is generated from them.
Summary of the invention
The problem to be solved by the invention is how to use a prepared knowledge base to automatically answer middle school geography short-answer questions, a kind of problem in a specific domain in which the questions are complex, can be classified, and have a corresponding retrieval method for each category.
The technical solution of the invention is an automatic question-answering method based on classification-based retrieval, in which a computer program automatically answers short-answer questions, a short-answer question being a complex question containing multiple sub-questions. The method comprises the following steps. First, Chinese natural language processing, comprising word segmentation, part-of-speech tagging, and syntactic analysis, is applied to the short-answer question to generate a syntax tree; then, according to the key verbs and conjunctions in the question, the complex question containing multiple sub-questions is split into simple questions that are semantically more specific and each contain a single question;
Question templates are preset, each with corresponding trigger words. A question template is a question model that conforms to the grammar of interrogative sentences, and different trigger words correspond to different types of question template. Each template has template slots, which are filled with the key information the template requires;
Using the trigger words of the question templates, each simple question is classified, and the key information required by its question template is extracted from it and filled into the template slots according to the part-of-speech definitions set by the template, forming a question template that contains the information needed to answer. Then, according to the type of the question template, the key information in the template slots is matched against the keywords of the corresponding template knowledge base to retrieve several candidate answers. Finally, the best answer is selected from the candidates as the answer to the simple question, and the answers to all simple questions are combined into the final answer.
After the simple questions are obtained, the progressive relationships among them are analyzed. When the question is split by verbs and conjunctions, the coreference in the question and the features of the split questions are used to judge whether a progressive relationship holds between sub-questions; a progressive relationship means that the answer to an earlier sub-question can serve as key information for a later one. If one exists, the sub-question that uses the earlier sub-question's answer as key information is marked, and the earlier answer is used as key information when that sub-question is later answered.
Splitting into simple questions comprises the following steps:
1.1) the short-answer question is first segmented, part-of-speech tagged, and syntactically analyzed by natural language processing to obtain a syntax tree;
1.2) based on the syntax tree from step 1.1), determine whether the question contains multiple key verbs; if so, split it by key verb into several questions each containing one key verb; otherwise no splitting by key verb is needed;
1.3) for each single-key-verb question obtained in step 1.2), use the syntax tree to determine whether it contains a coordinate structure joined by a conjunction; if so, extract the coordinate structure and split the question at its conjunctions into several questions containing no coordinate structure; otherwise no splitting by conjunction is needed. After this step the original question has been split into simple questions that each contain exactly one key verb and no coordinate structure.
The final answer is obtained from the template knowledge base as follows:
2.1) the template knowledge base is generalized in advance: each specific entity in it is replaced with the key features that entity has in context, and these features serve as keywords to assist answer retrieval;
2.2) according to the semantics of the template slots in each type of question template, an index is built in the template knowledge base of the corresponding type on the column or columns that correspond to the template slots;
2.3) using the indexed knowledge base from step 2.2) and the question template containing the information needed to answer, the key information in the question template is used as keywords and searched, according to the semantics of the template slots, on the corresponding index of the corresponding template knowledge base; several candidate answers are obtained and arranged in order of decreasing similarity;
2.4) from the candidate answers of step 2.3), the three with the highest similarity are selected as the answer to the simple question;
2.5) the answers to all simple questions obtained by splitting, as produced in step 2.4), are combined into the final answer to the whole question.
An automatic question-answering system device based on classification-based retrieval, on which a computer program is stored; when the computer program is executed, the foregoing method is carried out.
The invention mainly addresses the answering of middle school geography short-answer questions. It proposes an automatic question-answering method that first splits a complex question into several simple questions with clear structure and definite semantics; then classifies each simple question and, according to its type, applies the corresponding question template, extracting key information from the question to fill the template slots; then retrieves answers from the template knowledge base of the corresponding type according to the key information in the slots; and finally selects and combines candidate answers into the final answer. An automatic question-answering system implementing the method is also provided.
The beneficial effects of the invention are: (1) A complex question is split into several simple questions, each containing one key verb and no coordinate structure, so that the split questions are clearer in structure and more definite in meaning, which helps semantic analysis and the subsequent classification and slot filling; the progressive relationships between sub-questions are judged and marked during splitting, which improves answer accuracy. (2) Question types, the question templates of each type, and the template trigger words are predefined; questions are classified by template trigger word and retrieved in the knowledge base of the corresponding type. Because templates of different categories require different key information, different types of question can be handled in a more targeted way, which improves answer accuracy. (3) For middle school geography short-answer questions, the method handles question splitting, semantic analysis, classification, and template slot filling well, and can answer questions containing multiple sub-questions linked by progressive relationships; the answering steps are clear and orderly, and the answers obtained are more detailed and accurate.
Brief description of the drawings
Fig. 1 is the overall processing flowchart of the invention.
Embodiment
The invention splits a complex question into several simple questions with clear structure and definite semantics, classifies the split questions, and retrieves answers in the template knowledge base of the corresponding category, thereby automatically answering middle school geography short-answer questions: a kind of problem in a specific domain in which the questions are complex, can be classified, and have a corresponding retrieval method for each category.
Taking middle school geography short-answer questions as an example, the method comprises the following steps. First, Chinese natural language processing (word segmentation, part-of-speech tagging, syntactic analysis, and so on) is applied to the question to generate a syntax tree. Then the question is split according to the key verbs and conjunctions in the sentence, yielding several simple questions that each contain one key verb and no coordinate structure, and the progressive relationships between the questions are marked. Next, the template trigger words of the predefined question types are used to classify each question and determine its question template type, and the key information for that category is extracted from the question and filled into the template slots to form the question template. Answer retrieval follows: according to the type of the question template, the key information in the template slots is searched in the corresponding template knowledge base to obtain several candidate answers. Finally, the most correct answers are selected from the candidates, and the answers to all simple questions are combined into the final answer.
The method flow of the invention is shown in Fig. 1 and comprises three parts. First, Chinese natural language processing is applied to a middle school geography short-answer question, and the question is split using the syntax tree according to key verbs and key conjunctions, so that a complex question becomes simple questions each containing only one key verb and no coordinate constituents. Then each simple question is classified by trigger word, and the key information in the question is extracted and filled into the slots of the question template for its type, giving the question template. Finally, according to the type of the question template, candidate answers are retrieved in the corresponding template knowledge base, and the final answer is produced by selecting and combining answers.
The three parts are described in turn below:
1. For a middle school geography short-answer question, the question is segmented, part-of-speech tagged, and syntactically analyzed by natural language processing. A question containing multiple sub-questions must be split, according to its key verbs and conjunctions, into several simple questions that are clearer in structure, more definite in meaning, and each contain a single question.
1.1) the middle school geography short-answer question is first segmented, part-of-speech tagged, and syntactically analyzed by natural language processing to obtain a syntax tree;
1.2) based on the syntax tree from step 1.1), determine whether the question contains multiple key verbs; if so, split it by key verb into several questions each containing one key verb; otherwise no splitting by key verb is needed;
1.3) for each single-key-verb question obtained in step 1.2), use the syntax tree to determine whether it contains a coordinate structure joined by a conjunction; if so, extract the coordinate structure and split the question at its conjunctions into several questions containing no coordinate structure; otherwise no splitting by conjunction is needed. After this step the original question has been split into simple questions that each contain exactly one key verb and no coordinate structure.
Further, some sub-questions stand in a progressive relationship, that is, the answer to an earlier sub-question can serve as key information for a later one. So when splitting by verbs and conjunctions, the coreference in the question and the features of the split questions must be used to judge whether a progressive relationship holds between sub-questions; if so, the sub-question that needs the earlier answer as key information is marked.
An example follows.
A middle school geography short-answer question may itself be a simple question, for example: "Analyze the natural factors for building a reservoir at place A." This sentence has only one key verb, "analyze", and no coordinate constituents joined by a conjunction, so it is a simple question. Another example: "Explain the main crops of place A and their distribution, and elaborate the natural geographic conditions for agricultural development." This sentence contains two key verbs, "explain" and "elaborate", as well as two coordinate constituents joined by the conjunction "and", so it is a typical complex question.
Natural language processing is first applied to the question using an open-source toolkit, such as the NLP parser from Stanford University in the United States or FudanNLP from Fudan University in China, to perform word segmentation, part-of-speech tagging, syntactic analysis, and so on, and form a syntax tree.
In this process, to improve segmentation accuracy, some geography-related vocabulary is added, mainly specific entity names, long entity names, place names, and proper nouns that the open-source toolkit cannot recognize. This corrects segmentation errors that the open-source Chinese natural language processing toolkit might otherwise make when parsing a sentence and avoids segmentation mistakes as far as possible. On the basis of this segmentation, part-of-speech tagging and syntactic analysis are performed, and the syntax tree of the question is generated.
The first step of question splitting is to split by the key verbs in the question. For middle school geography short-answer questions in particular, the key verbs are the question verbs. A question verb here is a verb used in such questions to express the requirement of the question or the depth of answer expected; it introduces the content of the question and is the main verb of the question in which it appears. Middle school geography short-answer questions use 15 question verbs, which cover most cases. They are listed below:
infer, analyze, briefly describe, explain, outline, elaborate, list, illustrate, conclude, point out, state, describe, propose, summarize, judge.
Splitting by key verb yields questions with only one question verb each. The complex sentence above is first split into: "Explain the main crops of place A and their distribution." and "Elaborate the natural geographic conditions for agricultural development."
The second step of question splitting is to split by the key conjunctions in the question. Using the syntax tree, the conjunctions in the sentence are found, the coordinate constituents are determined, and the shared constituents are kept. For example, in "Explain the main crops of place A and their distribution.", the coordinate constituents are "the main crops of place A" and "their distribution", so it can be split into "Explain the main crops of place A." and "Explain the distribution of the main crops of place A." Coreference resolution has already been done before the split, replacing "their" with "the main crops of place A".
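The two splitting steps can be sketched roughly as follows. This is only a toy illustration, not the patented implementation: it assumes the question has already been chunked into phrases (English glosses of the example sentence, with coreference already resolved), whereas the method described here reads the clauses and coordinate constituents off a syntax tree. The names `QUESTION_VERBS`, `split_by_verbs`, `split_by_conjunctions`, and `split_question` are hypothetical.

```python
# Toy sketch: tokens are pre-chunked phrases; a real system would take the
# clause and coordinate-constituent boundaries from the syntax tree.
QUESTION_VERBS = {"explain", "elaborate", "analyze", "outline"}  # assumed glosses
CONJUNCTIONS = {"and"}
LINKERS = CONJUNCTIONS | {","}


def split_by_verbs(tokens):
    """Step 1: start a new clause at every question verb."""
    clauses, current = [], []
    for tok in tokens:
        if tok in QUESTION_VERBS and current:
            # Drop a trailing ", and" that only linked the two clauses.
            while current and current[-1] in LINKERS:
                current.pop()
            if current:
                clauses.append(current)
            current = []
        current.append(tok)
    if current:
        clauses.append(current)
    return clauses


def split_by_conjunctions(clause):
    """Step 2: give each coordinate constituent its own copy of the verb."""
    verb, rest = clause[0], clause[1:]
    parts, piece = [], []
    for tok in rest:
        if tok in CONJUNCTIONS:
            if piece:
                parts.append(piece)
            piece = []
        else:
            piece.append(tok)
    if piece:
        parts.append(piece)
    return [" ".join([verb] + p) for p in parts]


def split_question(tokens):
    """Apply the verb split, then the conjunction split, in that order."""
    simple = []
    for clause in split_by_verbs(tokens):
        simple.extend(split_by_conjunctions(clause))
    return simple


example = [
    "explain", "the main crops of place A", "and",
    "the distribution of the main crops of place A", ",", "and",
    "elaborate", "the natural geographic conditions for agricultural development",
]
simple_questions = split_question(example)
```

On the example, the verb split gives two clauses and the conjunction split turns the first clause into two simple questions, three in total.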
Progressive relationships exist between some of the sub-questions formed by splitting. For example, "the main crops of place A" in "Explain the distribution of the main crops of place A." refers to the answer to the preceding question, "Explain the main crops of place A." So during splitting, the coreference in the question and the features of the split questions are also used to judge the progressive relationships between sub-questions, and "#" is added as a mark to any question that needs the answer to a preceding question as key information. The latter question in the example is therefore written "Explain the distribution of #the main crops of place A."
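A minimal sketch of the "#" marking, under a simplifying assumption that is not taken from the source: a later sub-question is treated as depending on an earlier one whenever the earlier question's target phrase reappears inside the later target. The method described here relies on coreference and question features instead; `mark_progressive` and the (verb, target) pair representation are hypothetical.

```python
def mark_progressive(questions):
    """Mark sub-questions that reuse an earlier sub-question's answer.

    questions: ordered (verb, target) pairs produced by the split.
    A later target containing an earlier target gets "#" in front of the
    reused phrase (substring matching is an assumption of this sketch).
    """
    marked, earlier_targets = [], []
    for verb, target in questions:
        text = target
        for prev in earlier_targets:
            if prev != target and prev in target:
                text = target.replace(prev, "#" + prev, 1)
                break
        marked.append(f"{verb} {text}")
        earlier_targets.append(target)
    return marked


pairs = [
    ("explain", "the main crops of place A"),
    ("explain", "the distribution of the main crops of place A"),
]
marked_questions = mark_progressive(pairs)
```

When the marked question is later answered, the phrase after "#" would be replaced by the answer retrieved for the earlier sub-question.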
After these two splitting steps, the complex question has been split into simple questions that are clear in structure and definite in meaning, and the progressive relationships between them have been judged, which greatly helps the later answering.
2. Each simple question is classified by the trigger words of the predefined question types, and the key information in the question is extracted to fill the slots of the question template.
The question is first classified using the trigger words of the predefined templates, which determines the template type it belongs to. Each type of template has template slots that record the key information the template requires. The key information is extracted from the question and filled in according to the semantics of the template slots, giving a question template that contains all the information needed to answer:
2.1) several question templates are predefined for the question types. Each template holds the key information needed to solve questions of its type, the slots of the template correspond to the semantics of that key information, and each template has trigger words that help determine the template type. The template type of a simple question is judged from the trigger words of the predefined question templates;
2.2) for the template type obtained, each slot of the template corresponds to the semantics of a piece of required key information. Using the segmentation, part-of-speech tagging, and syntactic analysis results of the question together with the semantics of the template slots for that type, the question is cut into several constituents;
2.3) from the constituents produced by the cut, the key information required by the template is selected and filled in according to the semantics defined for the template slots, giving the question template.
After analyzing middle school geography short-answer questions, 14 question types were predefined according to the content and form of the questions, each corresponding to one question template. The 14 question types are:
problem, countermeasure, influence, advantage, condition, comparison, cause and effect, transportation, distribution, feature, adjustment and development direction, change, significance and role, and entity information statement.
Each template has specific trigger words. For example, trigger words of the countermeasure type include "countermeasure", "measure", and "good plan", and trigger words of the cause-and-effect type include "reason", "factor", "cause of formation", and "grounds". The last type, entity information statement, covers questions that belong to none of the other types and has no trigger words.
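Trigger-word classification might be sketched as below. Only three of the 14 types are shown, the trigger words are English glosses of the examples given in the text, and plain substring matching is an assumption of this sketch (the text does not say how trigger words are matched). `TRIGGERS`, `FALLBACK`, and `classify` are hypothetical names.

```python
# Trigger words per question type (English glosses; only a few types shown).
TRIGGERS = {
    "countermeasure": ("countermeasure", "measure", "good plan"),
    "cause_effect": ("reason", "factor", "cause of formation"),
    "condition": ("condition",),
}
# Questions matching no trigger word fall into the last, trigger-less type.
FALLBACK = "entity_information_statement"


def classify(question):
    """Return the first question type whose trigger word occurs in the text."""
    text = question.lower()
    for qtype, words in TRIGGERS.items():
        if any(word in text for word in words):
            return qtype
    return FALLBACK
```

For instance, `classify("Outline the climatic conditions of Chongli as the main venue for snow events.")` returns `"condition"`. Substring matching is crude: a real system would match on segmented words so that a trigger cannot fire inside an unrelated longer word.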
Using the template trigger words, the simple question is first assigned to its category, which determines the question template to use. For example, the question "Outline the climatic conditions of Chongli as the main venue for snow events." contains the trigger word "condition", so it belongs to the condition type. The template for condition-type questions is "condition(subject, #aspect, #favorable/unfavorable/advantage, )"; "#" in the template indicates that the slot may be empty, and the final position is reserved for the answer to the question.
After classification, key information is extracted from the question to fill the template slots. The question is cut mainly by means of its syntax tree structure to obtain the key information to be filled in, which is then filled in according to the semantics of the template slots. Take "Outline the climatic conditions of Chongli as the main venue for snow events.": "outline" is the question verb, and everything after the question verb up to the structural particle (的 in the Chinese sentence, roughly "of") forms the subject part of the condition-type template. Since no word such as "favorable", "unfavorable", or "advantage" appears after that particle and before "condition", the third slot is empty, and the remainder, "climate", fills the aspect slot of the template. The question template for this question is therefore "condition(Chongli as the main venue for snow events, climate, #, )". The question template thus contains all the information in the question, and filling the key information into slots by their semantics amounts to understanding the question.
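The slot-filling rule for the condition template can be sketched as follows, assuming the question has already been cut into gloss tokens in the original Chinese word order. `"DE"` is a hypothetical token standing for the structural particle, and `ConditionTemplate`, `fill_condition_template`, and the use of `None` for the not-yet-known answer are choices of this sketch rather than details from the source.

```python
from dataclasses import dataclass
from typing import List, Optional

PARTICLE = "DE"  # hypothetical token for the structural particle
POLARITY_WORDS = {"favorable", "unfavorable", "advantage"}
EMPTY = "#"      # marks an empty optional slot, as in the template notation


@dataclass
class ConditionTemplate:
    subject: str
    aspect: str = EMPTY
    polarity: str = EMPTY
    answer: Optional[str] = None  # filled in later by retrieval


def fill_condition_template(tokens: List[str]) -> ConditionTemplate:
    """tokens: [question verb, subject..., DE, modifiers..., "condition"]."""
    de = tokens.index(PARTICLE)
    end = tokens.index("condition")
    # Everything between the question verb and the particle is the subject.
    subject = " ".join(tokens[1:de])
    # Words between the particle and the trigger word fill slots 2 and 3.
    modifiers = tokens[de + 1:end]
    polarity = [t for t in modifiers if t in POLARITY_WORDS]
    aspect = [t for t in modifiers if t not in POLARITY_WORDS]
    return ConditionTemplate(
        subject=subject,
        aspect=" ".join(aspect) or EMPTY,
        polarity=" ".join(polarity) or EMPTY,
    )


tokens = ["outline", "Chongli as the main venue for snow events",
          "DE", "climate", "condition"]
filled = fill_condition_template(tokens)
```

Here `filled` holds the subject "Chongli as the main venue for snow events", the aspect "climate", an empty third slot, and no answer yet.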
3. Using the question template, answers are retrieved from the corresponding template knowledge base to obtain candidate answers, and the final answer is produced by selection and combination.
Each type of question template has a prebuilt template knowledge base, which has been generalized to widen its scope of application. The knowledge base is selected by template type, the information in the template slots is used as keywords to search it, and several candidate answers are obtained and sorted in descending order of similarity. Finally, the best answers are selected and the answers to all simple questions are combined into the answer to the whole question.
3.1) The template knowledge base is generalized in advance: each concrete entity in it is replaced with the key features the entity has at that position, and these features are used as keywords to assist answer retrieval;
3.2) According to the semantics of the slots in each template type, indexes are built in the knowledge base of the corresponding type on the column or columns that correspond to each template slot;
3.3) On the indexed knowledge base from step 3.2), and using the question template that contains the information needed to solve the question, the key information in the question template is used as keywords in the corresponding template knowledge base and retrieved on the index that matches each slot's semantics, yielding a number of candidate answers arranged in descending order of similarity;
3.4) From the candidate answers obtained in step 3.3), the three candidates with the highest similarity are selected each time as the answer to the simple question;
3.5) The answers obtained in step 3.4) for all the simple questions produced by splitting are combined to give the final answer to the whole question to be solved.
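Steps 3.4) and 3.5) can be sketched as follows, taking the ranked candidates of step 3.3) as input. This is a minimal sketch under assumptions: the names `select_top_answers` and `integrate_answers` are hypothetical, and dropping duplicate answers during integration is a choice made here, since the text does not say how the answers are merged.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (answer text, similarity score)


def select_top_answers(candidates: List[Candidate], k: int = 3) -> List[str]:
    """Sort candidates by descending similarity and keep the first k (step 3.4)."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [text for text, _ in ranked[:k]]


def integrate_answers(per_question: List[List[Candidate]], k: int = 3) -> List[str]:
    """Combine the selected answers of every simple question, in question order (step 3.5).

    Assumption: an answer already present in the result is not repeated.
    """
    final: List[str] = []
    for candidates in per_question:
        for answer in select_top_answers(candidates, k):
            if answer not in final:
                final.append(answer)
    return final
```

Keeping the answers in the order of the simple questions preserves the structure that the splitting step gave to the original question.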
A middle-school geography knowledge base has been compiled manually in advance for answer retrieval. This knowledge base is divided into 13 classes according to question template type. Entity-information statement questions do not belong to any of these 13 classes, so questions of this kind must be retrieved across the entire knowledge base.
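The dispatch rule described above, one knowledge base per template type with a fallback to the whole knowledge base for entity-information statement questions, might look like the sketch below. The type label `ENTITY_STATEMENT` and the function name `knowledge_bases_for` are hypothetical, and raising an error for an unknown type is an assumption made here.

```python
from typing import Dict, List

KnowledgeBase = List[dict]  # one list of knowledge entries per template type

ENTITY_STATEMENT = "entity_statement"  # hypothetical label for the class outside the 13 types


def knowledge_bases_for(template_type: str,
                        bases: Dict[str, KnowledgeBase]) -> List[KnowledgeBase]:
    """Return the knowledge bases that a question of the given type should search."""
    if template_type in bases:
        return [bases[template_type]]  # typed question: only its own knowledge base
    if template_type == ENTITY_STATEMENT:
        return list(bases.values())    # untyped statement question: search everything
    raise KeyError(f"unknown template type: {template_type}")
```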
Because the template knowledge base contains some concrete entities, it is generalized: each such entity is replaced with the key attributes it has at that position, and these attributes are used as keywords during retrieval. This widens the applicability of the template knowledge base and can also improve answer accuracy.
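A minimal sketch of this generalization step is shown below. The attribute table, its placeholder values and the names `ENTITY_ATTRIBUTES`, `generalize_entry` and `query_keywords` are all hypothetical; the text does not say how the key attributes of an entity are obtained or stored.

```python
from typing import Dict, List

# Hypothetical attribute table: concrete entity -> key attributes it stands for.
# The values are placeholders, not real geographic data.
ENTITY_ATTRIBUTES: Dict[str, List[str]] = {
    "EntityA": ["attribute_1", "attribute_2"],
}


def generalize_entry(entry: Dict[str, str],
                     attributes: Dict[str, List[str]]) -> Dict[str, object]:
    """Return a copy of a knowledge entry whose entity is replaced by its key attributes.

    An entity without known attributes is kept as a one-element keyword list.
    """
    generalized: Dict[str, object] = dict(entry)
    entity = entry.get("entity", "")
    generalized["entity"] = list(attributes.get(entity, [entity]))
    return generalized


def query_keywords(entity: str, attributes: Dict[str, List[str]]) -> List[str]:
    """Keywords used at retrieval time for an entity that appears in a question."""
    return list(attributes.get(entity, [entity]))
```

Because both the stored entry and the question are mapped to the same attributes, a question about a different entity with the same attributes can still match the entry, which is the widening effect the paragraph above describes.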
The knowledge bases for different question template types store knowledge differently, essentially as several columns divided according to the semantics defined by the question template slots. For example, for the condition-class template "condition (main body, #aspect, #favorable/unfavorable/advantage, )", the columns of the corresponding condition-class template knowledge base are: "entity, behavior, $aspect, $favorable/unfavorable, condition", where "$" indicates that the column may be empty. The storage format of the template knowledge base is thus very close to the template definition.
When the answer retrieval module receives a question template, it first finds the corresponding template knowledge base according to the type of the question template. Each class of template knowledge base is indexed according to the content of the template slots. In the condition-class template knowledge base, for example, the "entity" and "behavior" columns are combined into one index, which is matched against the text of the "main body" slot of a condition-class question template. Likewise, indexes are built on the "aspect" and "favorable/unfavorable" columns, which are matched against the text of the "aspect" and "favorable/unfavorable/advantage" slots.
For each matched knowledge entry, the value of the last column is taken as a candidate answer, and the candidates are arranged in descending order of similarity for the subsequent answer selection step.
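The per-slot indexing and matching described above can be illustrated with a toy in-memory inverted index. This sketch does not use Lucene and does not reproduce its scoring: the token-overlap score, the whitespace tokenizer and the names `TemplateIndex`, `FIELDS` and `tokenize` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Entry = Dict[str, str]

# Template slot -> knowledge-base columns indexed together for that slot.
FIELDS = {
    "subject": ("entity", "behavior"),
    "aspect": ("aspect",),
    "polarity": ("polarity",),
}


def tokenize(text: str) -> List[str]:
    """Whitespace tokenizer; a real system would use a Chinese word segmenter."""
    return text.lower().split()


class TemplateIndex:
    def __init__(self, entries: List[Entry]) -> None:
        self.entries = entries
        # slot name -> token -> ids of the entries that contain the token
        self.postings: Dict[str, Dict[str, Set[int]]] = {f: defaultdict(set) for f in FIELDS}
        for doc_id, entry in enumerate(entries):
            for field, columns in FIELDS.items():
                text = " ".join(entry.get(column, "") for column in columns)
                for token in tokenize(text):
                    self.postings[field][token].add(doc_id)

    def search(self, slots: Dict[str, str]) -> List[Tuple[str, float]]:
        """Return (last-column value, score) pairs in descending order of score."""
        scores: Dict[int, float] = defaultdict(float)
        for field, value in slots.items():
            if field not in FIELDS or value in ("", "#"):
                continue  # empty slots contribute nothing
            tokens = set(tokenize(value))
            for token in tokens:
                for doc_id in self.postings[field].get(token, ()):
                    scores[doc_id] += 1.0 / len(tokens)  # share of slot tokens matched
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return [(self.entries[doc_id]["condition"], score) for doc_id, score in ranked]
```

Each slot is matched only against its own index, so a word in the "aspect" slot cannot accidentally match text in the "entity" column.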
The system implemented by the present invention uses Lucene, a Java-based full-text retrieval toolkit, to build the indexes and to search them. Lucene builds indexes over text data: as long as the data to be indexed can be converted to text, Lucene can index and search the documents. It has worked well in the system implemented by the present invention.
From the candidate answers produced by the answer retrieval step, the system implemented by the present invention selects the top three candidates each time as the answer to the simple question. Finally, the answers to all the simple questions produced by splitting are integrated to give the final answer to the whole question to be solved.
The present invention differs from text-based question answering methods. It uses a natural language semantization method based on template extraction and systematic searching (retrieval by question class), aimed mainly at solving middle-school geography short-answer questions, and implements question splitting followed by classification and slot filling of the split questions. It then implements an automatic question answering system over template knowledge bases using a retrieval method based on Lucene indexes, and can give answers to the question to be solved that are more detailed and accurate than those given by text-based question answering methods.
The present invention differs from many current automatic question answering systems in the following respects:
1. The invention first splits the question and determines whether progressive relationships hold between the resulting sub-questions. Many questions contain more than one sub-question; if such a question is retrieved directly in the knowledge base without being split, the answer obtained is bound to be inaccurate and disorganized. Determining the progressive relationships between sub-questions during splitting works well when the answer to an earlier sub-question serves as key information for a later one, and it makes the answers clearer and better ordered;
2. The invention predefines templates for a number of question types. The purpose of these templates is to understand the question better: different classes of question require different key information, and this key information becomes the keywords in the subsequent retrieval. Without this thorough understanding of the question and selection of keywords from it, the answer obtained would also be inaccurate;
3. The invention applies a generalization operation to the constructed knowledge base. The knowledge base contains concrete entities that do not suit every question, so during generalization they are replaced with features that their position should have. This widens the range of use of the knowledge base, and retrieval can also use the features of the entity in the question as keywords, increasing both the retrieval success rate and the accuracy of the retrieved answers.
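The progressive relationship in the first point, where the answer to one sub-question becomes key information for the next, can be sketched as a simple chaining loop. The `SubQuestion` class, the `PLACEHOLDER` marker and the `solve` callback are hypothetical; the text does not specify how a dependent sub-question is marked or how the earlier answer is inserted.

```python
from dataclasses import dataclass
from typing import Callable, List

PLACEHOLDER = "<PREV>"  # hypothetical marker for "the answer to the previous sub-question"


@dataclass
class SubQuestion:
    text: str
    uses_previous_answer: bool = False  # set when a progressive relationship was detected


def answer_in_order(sub_questions: List[SubQuestion],
                    solve: Callable[[str], str]) -> List[str]:
    """Answer sub-questions in order, feeding each answer forward where marked."""
    answers: List[str] = []
    for sub in sub_questions:
        text = sub.text
        if sub.uses_previous_answer and answers:
            # Substitute the previous answer as key information of this sub-question.
            text = text.replace(PLACEHOLDER, answers[-1])
        answers.append(solve(text))
    return answers
```

Here `solve` stands in for the whole classify, fill and retrieve pipeline applied to one simple question.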
Claims (5)
- 1. An automatic question-answering method based on systematic searching, characterized in that a computer program automatically answers short-answer questions, a short-answer question being a complex question that contains multiple sub-questions, the method comprising the following steps: first, Chinese natural language processing, comprising word segmentation, part-of-speech tagging and syntactic analysis, is applied to a short-answer question to generate a syntax tree, and the complex question containing multiple sub-questions is then split, according to the key verbs and conjunctions in the short-answer question, into simple questions that are semantically more specific and each contain only a single question; question templates are preset, each template being provided with corresponding trigger words, a question template being a question model that conforms to question grammar, different trigger words corresponding to different types of question template, and each template being provided with template slots for the key information the question template requires; using the trigger words of the question templates, each simple question is classified, and the key information required by its question template is extracted from the simple question and filled into the template slots according to the part-of-speech definitions set for the template, forming a question template that contains the information needed to solve the question; then, according to the type of the question template, the key information in the template slots is matched against the keywords in the corresponding template knowledge base to retrieve a number of candidate answers; finally, the best answers are chosen from the candidate answers as the answer to the simple question, and the answers to all simple questions are integrated to give the final answer.
- 2. The automatic question-answering method based on systematic searching according to claim 1, characterized in that after the simple questions are obtained, the progressive relationships between them are analyzed: when the question is split according to verbs and conjunctions, whether a progressive relationship exists between sub-questions is judged from the references in the question and the features of the split questions, a progressive relationship meaning that the answer to the previous sub-question can serve as key information for the next sub-question; if a progressive relationship exists, the sub-question that uses the previous sub-question's answer as key information is marked, and the previous sub-question's answer is used as key information during subsequent solving.
- 3. The automatic question-answering method based on systematic searching according to claim 1 or 2, characterized in that splitting into simple questions comprises the following steps: 1.1) a short-answer question is first processed with natural language processing techniques, namely word segmentation, part-of-speech tagging and syntactic analysis, to obtain a syntax tree; 1.2) based on the syntax tree obtained in step 1.1), it is judged whether the question contains multiple key verbs; if so, the question is split according to the key verbs into several questions each containing one key verb, otherwise no splitting by key verb is needed; 1.3) for each question containing only one key verb obtained in step 1.2), the syntax tree is used to judge whether the question contains a parallel structure whose components are connected by conjunctions; if so, the parallel structure is extracted and split at its conjunctions into several questions that contain no parallel structure, otherwise no splitting by conjunction is needed; after this step the original question has been split into simple questions that each contain only one key verb and no parallel structure.
- 4. The automatic question-answering method based on systematic searching according to claim 1 or 2, characterized in that the process of retrieving the final answer from the template knowledge base is as follows: 2.1) the template knowledge base is generalized in advance: each concrete entity in it is replaced with the key features the entity has at that position, and these features are used as keywords to assist answer retrieval; 2.2) according to the semantics of the template slots in each type of question template, indexes are built in the template knowledge base of the corresponding type on the column or columns that correspond to each template slot; 2.3) on the indexed knowledge base from step 2.2), and using the question template that contains the information needed to solve the question, the key information in the question template is used as keywords in the corresponding template knowledge base and retrieved on the index that matches each slot's semantics, yielding a number of candidate answers arranged in descending order of similarity; 2.4) from the candidate answers obtained in step 2.3), the three candidates with the highest similarity are selected each time as the answer to the simple question; 2.5) the answers obtained in step 2.4) for all the simple questions produced by splitting are combined to give the final answer to the whole question to be solved.
- 5. An automatic question-answering system based on systematic searching, in which a computer program is stored in a system device, characterized in that the computer program, when executed, implements the method of claim 1 or 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711107543.3A CN107885844A (en) | 2017-11-10 | 2017-11-10 | Automatic question-answering method and system based on systematic searching |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107885844A true CN107885844A (en) | 2018-04-06 |
Family
ID=61780193
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711107543.3A Pending CN107885844A (en) | 2017-11-10 | 2017-11-10 | Automatic question-answering method and system based on systematic searching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107885844A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101013421A (en) * | 2007-02-02 | 2007-08-08 | 清华大学 | Rule-based automatic analysis method of Chinese basic block |
US20140163962A1 (en) * | 2012-12-10 | 2014-06-12 | International Business Machines Corporation | Deep analysis of natural language questions for question answering system |
CN105701253A (en) * | 2016-03-04 | 2016-06-22 | 南京大学 | Chinese natural language interrogative sentence semantization knowledge base automatic question-answering method |
CN107193798A (en) * | 2017-05-17 | 2017-09-22 | 南京大学 | A kind of examination question understanding method in rule-based examination question class automatically request-answering system |
Non-Patent Citations (2)
Title |
---|
GONG CHENG,ETC.: "Taking up the Gaokao Challenge: An Information Retrieval Approach", 《IGCAI》 * |
镇丽华等: "自动问答系统中问句分类研究综述", 《安徽工业大学学报(自然科学版)》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180406 |