CN107885844A - Automatic question-answering method and system based on systematic searching - Google Patents


Info

Publication number
CN107885844A
Authority
CN
China
Prior art keywords
question sentence
template
question
answer
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711107543.3A
Other languages
Chinese (zh)
Inventor
张昊
何硙卓
邵菲
程龚
瞿裕忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201711107543.3A
Publication of CN107885844A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An automatic question-answering method and system based on classification-based retrieval. A short-answer question first undergoes Chinese natural language processing to generate a syntax tree. A complex question containing multiple sub-questions is then split, according to the key verbs and conjunctions in the question, into simple questions that are semantically more specific and each contain a single question. Next, each simple question is classified using the trigger words associated with predefined question types, and the key information required by the question template of that type is extracted from the question and filled into the corresponding template slots, forming a question template that contains all the information needed to answer. Then, according to the type of the question template, the key information in the template slots is matched against the keywords in the corresponding template knowledge base to retrieve several candidate answers. Finally, the best answer is chosen from the candidates as the answer to the simple question, and the answers to all simple questions are combined into the final answer.

Description

Automatic question-answering method and system based on systematic searching
Technical field
The invention belongs to the field of computer technology and relates to natural language processing and automatic question answering. It is an automatic question-answering method and system based on classification-based retrieval, in particular a Chinese natural-language automatic question-answering method for middle school geography short-answer questions.
Background
Natural language processing is a branch of artificial intelligence and linguistics that studies the theories and methods that enable effective communication between people and computers in natural language. The key to processing natural language is getting the computer to "understand" it, so that the computer can interact with people. Key natural language processing techniques include Chinese word segmentation, part-of-speech tagging, syntactic analysis, named entity recognition, coreference resolution and dependency parsing.
A question answering system, built on information retrieval and natural language processing, automatically answers a user's natural-language question in accurate, concise natural language. Such a system analyzes the question automatically and produces corresponding candidate answers. A traditional automatic question answering system consists mainly of question analysis, information retrieval and answer generation modules.
Automatic question answering is mainly carried out over text collections. The keywords in the question are analyzed first, the text collection is then searched with those keywords, the top-ranked documents with the highest confidence are taken from the results, and an answer is generated from them.
Summary of the invention
The problem to be solved by the invention is how to use prepared knowledge bases to automatically answer middle school geography short-answer questions, a kind of problem that is domain-specific, has relatively complex questions, can be classified, and has a corresponding retrieval method for each category.
The technical solution of the invention is an automatic question-answering method based on classification-based retrieval, in which a computer program automatically answers short-answer questions, a short-answer question being a complex question that contains multiple sub-questions. The method comprises the following steps. First, Chinese natural language processing, comprising word segmentation, part-of-speech tagging and syntactic analysis, is applied to a short-answer question to generate a syntax tree. Then, according to the key verbs and conjunctions in the short-answer question, the complex question containing multiple sub-questions is split into simple questions that are semantically more specific and each contain a single question;
Question templates are preset, and each template is given corresponding trigger words. A question template is a question pattern that conforms to interrogative-sentence grammar, and different trigger words correspond to different types of question template. Each template has corresponding template slots, which are filled with the key information that the question template requires;
Using the trigger words of the question templates, each simple question is classified, and the key information required by the question template is extracted from the simple question and filled into the template slots according to the part-of-speech definitions set for the template, forming a question template that contains the information needed to answer. Then, according to the type of the question template, the key information in the template slots is matched against the keywords in the corresponding template knowledge base to retrieve several candidate answers. Finally, the best answer is chosen from the candidates as the answer to the simple question, and the answers to all simple questions are combined into the final answer.
After the simple questions are obtained, the progressive relationships among them are analyzed. When the question is split according to verbs and conjunctions, the references in the question and the features of the split questions are used to judge whether a progressive relationship exists between sub-questions. A progressive relationship means that the answer to an earlier sub-question can serve as key information for a later sub-question. If such a relationship exists, the sub-question that uses the earlier answer as key information is marked, and the earlier sub-question's answer is used as key information in subsequent answering.
Splitting into simple questions comprises the following steps:
1.1) First, a short-answer question is segmented, part-of-speech tagged and syntactically analyzed using natural language processing techniques to obtain a syntax tree;
1.2) Based on the syntax tree obtained in step 1.1), determine whether the question contains multiple key verbs. If so, split the question according to the key verbs into several questions that each contain one key verb; otherwise, no splitting by key verb is performed;
1.3) For each question containing only one key verb obtained in step 1.2), use the syntax tree to determine whether the question contains a coordinate structure joined by a conjunction. If so, extract the coordinate structure and split the question at its conjunction into several questions that contain no coordinate structure; otherwise, no splitting by conjunction is performed. After this step the original question has been split into several simple questions, each containing only one key verb and no coordinate structure.
The process of retrieving the final answer from the template knowledge base is as follows:
2.1) The template knowledge base is generalized in advance: each specific entity in it is replaced with the key features that the entity has in that context, and these features serve as keywords to assist answer retrieval;
2.2) According to the semantics of the template slots in each type of question template, indexes are built on the column or columns of the corresponding template knowledge base that correspond to the template slots;
2.3) Using the indexed knowledge base from step 2.2) and the question template containing the information needed to answer, the key information in the question template is used as keywords and searched, according to the semantics of the template slots, on the corresponding indexes of the corresponding template knowledge base; several candidate answers are obtained and arranged in descending order of similarity;
2.4) From the candidate answers obtained in step 2.3), the three candidates with the highest similarity are selected as the answer to the simple question;
2.5) Based on the answers to the simple questions obtained in step 2.4), the answers to all simple questions produced by splitting are combined to give the final answer to the whole question.
An automatic question-answering system device based on classification-based retrieval, on which a computer program is stored, the computer program implementing the foregoing method when executed.
The invention is mainly aimed at answering middle school geography short-answer questions. It proposes an automatic question-answering method that first splits a complex question into several simple questions with clear structure and definite semantics; then classifies each simple question, applies the question template for its type, and extracts key information from the question to fill the template slots; then retrieves answers from the template knowledge base of the corresponding type according to the key information in the slots; and finally selects among and combines the candidate answers to form the final answer. An automatic question-answering system implementing the method is also provided.
The beneficial effects of the invention are: (1) A complex question is split into several simple questions, each containing one key verb and no coordinate structure, so that the split questions have clearer structure and more definite semantics, which helps the semantic analysis of the question and the subsequent classification and slot filling; progressive relationships between sub-questions are judged and marked during splitting, which improves answer accuracy. (2) Question types, together with the question template and trigger words of each type, are predefined; questions are classified by template trigger words and retrieved from the knowledge base of the corresponding type. Because templates of different categories require different key information, different types of question can be handled in a more targeted way, which improves answer accuracy. (3) For middle school geography short-answer questions, the method performs question splitting, semantic analysis, classification and template slot filling well, and can answer questions containing multiple sub-questions with progressive relationships; the answering steps are clear and orderly, and more detailed and accurate answers are obtained.
Brief description of the drawings
Fig. 1 is the overall processing flowchart of the invention.
Detailed description
The invention splits a complex question into several simple questions with clear structure and definite semantics, classifies the resulting simple questions, and then retrieves answers from the template knowledge base of the corresponding category, thereby automatically answering middle school geography short-answer questions, a kind of problem that is domain-specific, has relatively complex questions, can be classified, and has a corresponding retrieval method for each category.
The method is illustrated below using middle school geography short-answer questions as an example. It comprises the following steps. First, Chinese natural language processing (word segmentation, part-of-speech tagging, syntactic analysis and so on) is applied to a middle school geography short-answer question to generate a syntax tree. The question is then split according to the key verbs and conjunctions in the sentence, yielding several simple questions that each contain one key verb and no coordinate structure, and the relationships between the resulting questions are marked. Next, the questions are classified using the template trigger words of the predefined question types, which determines the type of question template; the key information for that category is extracted from the question and filled into the template slots to form a question template. Answer retrieval follows: according to the type of the question template, the key information in the template slots is searched in the corresponding template knowledge base to obtain several candidate answers. Finally, the most correct answer is chosen from the candidates, and the answers to all simple questions are combined into the final answer.
The method flow of the invention is shown in Fig. 1 and has 3 parts. First, Chinese natural language processing is applied to a middle school geography short-answer question, and the question is split according to key verbs and key conjunctions with the help of the syntax tree, so that a complex question becomes simple questions that each contain only one key verb and no coordinate components. Second, the simple questions are classified by trigger word, and the key information in each question is extracted and filled into the slots of the question template for its type, giving a question template. Third, according to the type of the question template, candidate answers are retrieved from the corresponding template knowledge base, and the final answer is produced by selecting and combining answers.
The specific embodiments are described in turn below:
1. For a middle school geography short-answer question, the question is segmented, part-of-speech tagged and syntactically analyzed using natural language processing techniques. A question containing multiple sub-questions must be split, according to its key verbs and conjunctions, into several simple questions that have clearer structure and more definite semantics and that each contain a single question.
1.1) First, a middle school geography short-answer question is segmented, part-of-speech tagged and syntactically analyzed using natural language processing techniques to obtain a syntax tree;
1.2) Based on the syntax tree obtained in step 1.1), determine whether the question contains multiple key verbs. If so, split the question according to the key verbs into several questions that each contain one key verb; otherwise, no splitting by key verb is performed;
1.3) For each question containing only one key verb obtained in step 1.2), use the syntax tree to determine whether the question contains a coordinate structure joined by a conjunction. If so, extract the coordinate structure and split the question at its conjunction into several questions that contain no coordinate structure; otherwise, no splitting by conjunction is performed. After this step the original question has been split into several simple questions, each containing only one key verb and no coordinate structure.
Further, progressive relationships exist between some sub-questions, meaning that the answer to an earlier sub-question can serve as key information for a later one. During splitting by verbs and conjunctions, the references in the question and the features of the split questions are therefore used to judge whether a progressive relationship exists between sub-questions; if it does, the sub-question that needs the earlier answer as key information is marked.
This is illustrated below with an example.
A middle school geography short-answer question may itself be a simple question, for example: "Analyze the natural factors for building a reservoir at place A." This sentence has only one key verb, "analyze", and no coordinate components joined by a conjunction, so it is a simple question. Another example: "Explain the main crops of place A and their distribution, and elaborate on the natural geographic conditions for agricultural development." This sentence contains two key verbs, "explain" and "elaborate", and also two coordinate components joined by the conjunction "and", so it is a typical complex question.
Natural language processing is first applied to the question using open-source toolkits, such as the Stanford Parser from Stanford University in the United States or FudanNLP from Fudan University in China, to perform word segmentation, part-of-speech tagging, syntactic analysis and so on, producing a syntax tree.
In this process, to improve segmentation accuracy, some geography-related vocabulary is added, mainly specific entity names, long entity names, place names and proper nouns that the open-source toolkits cannot recognize. This corrects segmentation errors that the open-source Chinese natural language processing toolkits might otherwise produce when parsing a sentence, avoiding segmentation mistakes as far as possible. Part-of-speech tagging and syntactic analysis are then carried out on the basis of this segmentation, and the syntax tree of the question is generated.
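The effect of adding domain vocabulary can be sketched as follows. This is only an illustration on English tokens, assuming a small hypothetical lexicon of multi-word terms; it uses longest-match merging as a stand-in and is not the mechanism of the Stanford or FudanNLP toolkits.

```python
# Hypothetical domain lexicon: multi-word terms that must stay in one piece.
LEXICON = {("main", "venue"), ("snow", "events")}

def merge_terms(tokens, lexicon):
    """Greedily merge the longest lexicon term starting at each position."""
    max_len = max(len(term) for term in lexicon)  # assumes a non-empty lexicon
    merged, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in lexicon:
                merged.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            merged.append(tokens[i])  # no term starts here: keep the token
            i += 1
    return merged

print(merge_terms("the main venue for snow events".split(), LEXICON))
```

Protecting such terms before tagging and parsing keeps a long entity name from being broken into unrelated words.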
The first step of question splitting is to split according to the key verbs in the question. For middle school geography short-answer questions in particular, the key verbs are the question verbs. A question verb here is a verb used in middle school geography short-answer questions to express what the question requires or how detailed the answer should be; it introduces the content of the question and acts as the main verb of the question in which it appears. Fifteen question verbs are used in middle school geography short-answer questions, covering most cases. They are listed below:
Infer, analyze, briefly describe, explain, outline, elaborate, enumerate, illustrate, conclude, point out, state, describe, propose, generalize, judge.
Splitting by key verb yields questions that each contain only one question verb. For instance, the complex question above is first split into: "Explain the main crops of place A and their distribution." and "Elaborate on the natural geographic conditions for agricultural development."
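A minimal sketch of the key-verb split follows, working on a pre-tokenized English rendering of the example. The verb set is a hypothetical subset of the 15 question verbs, and a real system would locate the verbs on the syntax tree rather than by token lookup.

```python
# Hypothetical English stand-ins for a few of the 15 question verbs.
KEY_VERBS = {"explain", "elaborate", "analyze", "summarize", "describe"}
# Tokens that merely glue two clauses together.
TRAILING = {",", "and", "."}

def split_by_key_verbs(tokens):
    """Split a tokenized question so that each piece starts with one key verb."""
    starts = [i for i, tok in enumerate(tokens) if tok.lower() in KEY_VERBS]
    if len(starts) <= 1:
        return [tokens]  # zero or one key verb: nothing to split
    pieces = []
    # Tokens before the first key verb (none in this example) are dropped.
    for begin, end in zip(starts, starts[1:] + [len(tokens)]):
        piece = tokens[begin:end]
        while piece and piece[-1].lower() in TRAILING:
            piece = piece[:-1]  # strip the comma/conjunction/period at the seam
        pieces.append(piece)
    return pieces

question = ("Explain the main crops of place A and their distribution , "
            "and elaborate on the natural geographic conditions "
            "for agricultural development .").split()
for piece in split_by_key_verbs(question):
    print(" ".join(piece))
```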
The second step of question splitting is to split according to the key conjunctions in the question. Using the syntax tree, the conjunctions in the sentence are located, the coordinate components are determined, and the shared component is retained. For example, in "Explain the main crops of place A and their distribution.", the coordinate components are "the main crops of place A" and "their distribution", so the question can be split into "Explain the main crops of place A." and "Explain the distribution of the main crops of place A." Coreference resolution has already been completed before the split, replacing "their" with "the main crops of place A".
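The conjunction split can be sketched as below, under two simplifying assumptions that are not from the source: coreference has already been resolved in the input, and the shared component is just the leading key verb. A real system would read the coordinate structure off the syntax tree instead of scanning for the token "and".

```python
def split_coordination(tokens, conjunction="and"):
    """Split 'VERB X and Y' into ['VERB X', 'VERB Y'], keeping the shared verb."""
    if conjunction not in tokens:
        return [tokens]  # no coordinate structure: nothing to split
    verb, rest = tokens[0], tokens[1:]
    conjuncts, current = [], []
    for tok in rest:
        if tok == conjunction:
            conjuncts.append(current)
            current = []
        else:
            current.append(tok)
    conjuncts.append(current)
    # Re-attach the shared verb to every non-empty conjunct.
    return [[verb] + c for c in conjuncts if c]

resolved = ("Explain the main crops of place A and "
            "the distribution of the main crops of place A").split()
for q in split_coordination(resolved):
    print(" ".join(q))
```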
Progressive relationships exist between some of the sub-questions produced by splitting. For example, "the main crops of place A" in "Explain the distribution of the main crops of place A." refers precisely to the answer to the earlier question "Explain the main crops of place A." Therefore, during splitting, the references in the question and the features of the split questions are also used to judge progressive relationships between sub-questions, and "#" is added as a mark in any question that needs the answer to an earlier question as key information. The later question in the example is thus represented as "Explain the distribution of #the main crops of place A."
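One way to picture the "#" marking is the sketch below. The heuristic is an assumption made for illustration: a later question is marked when the part after its verb contains the whole part after the verb of an earlier question. The source judges the relationship from references and question features, which this simple string test only approximates.

```python
def mark_progressive(questions):
    """Prefix '#' to a phrase whose value is the answer to an earlier question."""
    marked, seen = [], []
    for q in questions:
        _, _, focus = q.partition(" ")  # text after the leading question verb
        out = q
        for prev in seen:
            # The earlier question's focus reappears inside this question.
            if prev and prev != focus and prev in focus:
                out = q.replace(prev, "#" + prev, 1)
                break
        marked.append(out)
        seen.append(focus)
    return marked

subquestions = [
    "Explain the main crops of place A",
    "Explain the distribution of the main crops of place A",
]
print(mark_progressive(subquestions))
```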
Once these two splitting steps are complete, the complex question has been split into simple questions with clear structure and definite semantics, and the progressive relationships between the questions have been determined, which greatly helps the subsequent answering.
2. Each simple question is classified according to the trigger words of the predefined question types, and the key information in the question is extracted to fill the slots of the question template.
First, the question is classified using the trigger words of the predefined templates, which determines the template type to which the question belongs. Each template type has corresponding template slots that record the key information the template requires. The key information required by the template is extracted from the question and filled in, slot by slot, according to the semantics of each template slot, yielding a question template that contains all the information needed to answer:
2.1) Several question templates are predefined for the question types. Each template contains the key information needed to solve that kind of question, the slots of the template correspond to the semantics of this key information, and each template has trigger words that help determine the template type. According to the trigger words of the predefined question templates, the template type of each simple question is determined;
2.2) Given the template type, the slots of each type of template correspond to the semantics of the required key information. Using the segmentation, part-of-speech tagging and syntactic analysis results for the question, together with the semantics of the template slots for that type, the question is cut into several components;
2.3) From the components obtained by cutting the question, the key information required by the template is selected and filled in, slot by slot, according to the semantics defined for the template slots, giving the question template.
After analyzing middle school geography short-answer questions, 14 question types were predefined according to the content and form of the questions, each corresponding to one question template. The 14 question types are as follows:
Problem, countermeasures, influence, advantage, condition, comparison, cause and effect, transportation, distribution, feature, adjustment and development direction, change, significance and role, and entity information statement.
Each template has specific trigger words. For example, the trigger words of the countermeasures type include "countermeasure", "measure" and "good plan", and the trigger words of the cause-and-effect type include "reason", "factor", "origin" and "cause". The last type, entity information statement, covers questions that belong to none of the above types and has no trigger words.
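Trigger-word classification can be sketched as a simple lookup. The table below covers only three of the 14 types, uses English stand-ins for the trigger words, and matches by substring; all of these are assumptions made for illustration.

```python
# Hypothetical trigger-word table; checked in insertion order.
TRIGGERS = {
    "countermeasures": ["countermeasure", "measure"],
    "cause_and_effect": ["reason", "factor", "cause"],
    "condition": ["condition"],
}
# Questions without any trigger word fall into the catch-all type.
FALLBACK = "entity_information_statement"

def classify(question):
    """Return the first question type whose trigger word occurs in the question."""
    text = question.lower()
    for qtype, words in TRIGGERS.items():
        if any(word in text for word in words):
            return qtype
    return FALLBACK

print(classify("Summarize the climatic conditions of Chongli "
               "as the main venue for snow events."))
```

Substring matching is crude (for example, "because" would match "cause"), so a practical version would match on segmented words instead.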
According to the template trigger words, each simple question is first assigned to its category, which determines the question template to use. For example, the question "Summarize the climatic conditions of Chongli as the main venue for snow events." contains the trigger word "condition", so it belongs to the condition type. The template for condition-type questions is "condition (subject, #aspect, #favorable/unfavorable/advantage,)". A "#" in the template indicates that the slot may be empty, and the final slot stands for the answer to the question.
After classification, key information is extracted from the question to fill the template slots. The question is cut mainly by means of its syntax-tree structure to obtain the key information to be placed in the template slots, which is then filled in slot by slot according to slot semantics. In "Summarize the climatic conditions of Chongli as the main venue for snow events.", "summarize" is the question verb. In the Chinese question, everything from after the question verb up to the structural particle that follows the subject phrase is taken as the subject slot of the condition template. Because no word such as "favorable", "unfavorable" or "advantage" appears between that particle and "condition", the third slot is empty, and the remaining part, "climate", is the content of the aspect slot. The question template for this question is therefore "condition (Chongli as the main venue for snow events, climate, #,)". In this way the question template contains all the information in the question, and the key information filled in according to slot semantics constitutes the understanding of the question.
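Slot filling for the condition template can be sketched with a regular expression over the English rendering of the example. The pattern, the `ConditionTemplate` class and its field names are hypothetical; the source cuts the Chinese question using its syntax tree, which a fixed pattern only imitates for this one sentence shape.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class ConditionTemplate:
    subject: str
    aspect: Optional[str] = None    # may stay empty
    polarity: Optional[str] = None  # favorable / unfavorable / advantage; may stay empty
    answer: Optional[str] = None    # filled in later by retrieval

# Assumed English shape: "<verb> the [polarity] <aspect> conditions of|for <subject>."
PATTERN = re.compile(
    r"^\w+ the (?:(?P<polarity>favorable|unfavorable|advantageous) )?"
    r"(?P<aspect>\w+) conditions (?:of|for) (?P<subject>.+?)\.?$",
    re.IGNORECASE,
)

def fill_condition_template(question):
    """Return a filled template, or None if the question does not fit the pattern."""
    m = PATTERN.match(question.strip())
    if m is None:
        return None
    return ConditionTemplate(subject=m["subject"], aspect=m["aspect"],
                             polarity=m["polarity"])

print(fill_condition_template(
    "Summarize the climatic conditions of Chongli as the main venue for snow events."))
```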
3. Using the question template, answers are retrieved from the corresponding template knowledge base to obtain candidate answers, and the final answer is produced by selection and combination.
For each type of question template there is a pre-built template knowledge base, and the knowledge base has been generalized so that it applies more widely. The knowledge base is selected according to the template type, and the information in the template slots is used as keywords to search it, producing several candidate answers arranged in descending order of similarity. Finally, the best answers are selected and the answers to all simple questions are combined to obtain the answer to the whole question.
3.1) The template knowledge base has been generalized in advance: each specific entity in it is replaced with the key features that the entity has in that context, and these features serve as keywords to assist answer retrieval;
3.2) According to the semantics of the slots in each type of template, indexes are built on the column or columns of the knowledge base of the corresponding type that correspond to the template slots;
3.3) Using the indexed knowledge base from step 3.2) and the question template obtained in step 2.3), the key information in the question template is used as keywords and searched, according to the semantics of the template slots, on the corresponding indexes of the corresponding template knowledge base; several candidate answers are obtained and arranged in descending order of similarity;
3.4) From the candidate answers obtained in step 3.3), the three candidates with the highest similarity are selected as the answer to the simple question;
3.5) Based on the answers to the simple questions obtained in step 3.4), the answers to all simple questions produced by splitting are combined to give the final answer to the whole question.
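Steps 3.3) and 3.4) can be sketched as a similarity ranking followed by a top-three cut. The source does not name a similarity measure, so `difflib.SequenceMatcher` is swapped in here as an assumption, and the knowledge-base rows are made-up placeholders rather than real geography content.

```python
from difflib import SequenceMatcher

# Toy condition-type knowledge base; every row is a made-up placeholder.
KB = [
    {"entity": "ski venue", "aspect": "climate", "condition": "toy answer A"},
    {"entity": "ski venue", "aspect": "terrain", "condition": "toy answer B"},
    {"entity": "reservoir site", "aspect": "climate", "condition": "toy answer C"},
    {"entity": "port city", "aspect": "transport", "condition": "toy answer D"},
]

def similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for the unspecified measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def retrieve(subject, aspect, kb, top_k=3):
    """Score every row against the slot values and keep the top_k answers."""
    scored = []
    for row in kb:
        score = similarity(subject, row["entity"])
        if aspect:  # the aspect slot may be empty
            score += similarity(aspect, row["aspect"])
        scored.append((score, row["condition"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # descending similarity
    return [answer for _, answer in scored[:top_k]]

print(retrieve("ski venue", "climate", KB))
```

The final answer of step 3.5) would then be the collection of these per-question results over all simple questions.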
A middle-school geography knowledge base was compiled manually in advance for answer retrieval. The knowledge base is divided into 13 classes according to the type of question template; because entity-information-statement questions do not belong to any of these 13 classes, questions of that kind must be searched across the entire knowledge base.
Because the template knowledge base contains some specific entities, it is generalized: each such entity is replaced with the key attributes it has in that context, and these attributes are used as keywords during retrieval. This widens the scope of application of the template knowledge base and can also improve the accuracy of the answers.
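The generalization step can be illustrated with a minimal Python sketch. The attribute table, the column name `entity`, and the sample values are all assumptions made for illustration; the patent does not specify how the key attributes of an entity are obtained or stored.

```python
from typing import Dict, List

# Hypothetical attribute table mapping a concrete entity to the key
# attributes it has in context. The entries are made up for illustration.
ENTITY_ATTRIBUTES: Dict[str, List[str]] = {
    "Yangtze River Delta": ["coastal", "plain", "dense river network"],
    "Tibetan Plateau": ["high altitude", "strong solar radiation"],
}


def generalize_entry(entry: Dict[str, str], column: str = "entity") -> Dict[str, str]:
    """Return a copy of a knowledge entry with its entity replaced by attributes."""
    generalized = dict(entry)  # never mutate the original entry
    attributes = ENTITY_ATTRIBUTES.get(entry.get(column, ""))
    if attributes:  # entities without known attributes are left untouched
        generalized[column] = " ".join(attributes)
    return generalized
```

After generalization, an entry about one named region can match a question about any other region that shares the same attributes, which is the widening of scope described above.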
The way knowledge is stored differs between the knowledge bases for different question template types; in general it is stored in several columns that follow the semantics defined by the slots of the question template. For example, for the condition-class template "condition (subject, #aspect, #favourable/unfavourable/advantage, )", the columns of the corresponding condition-class template knowledge base are "entity, behavior, $aspect, $favourable/unfavourable, condition", where "$" indicates that the content of the column may be empty. The storage format of a template knowledge base is thus largely similar to the template formulated for it.
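As a rough sketch of this storage format, a row of the condition-class knowledge base could be modelled in Python as follows. The class name, the field names, and the convention that an empty cell is stored as `None` are assumptions for illustration, not part of the patent.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ConditionEntry:
    """One row of a condition-class template knowledge base (hypothetical model)."""
    entity: str
    behavior: str
    aspect: Optional[str]    # column that may be empty
    polarity: Optional[str]  # "favourable" / "unfavourable"; may be empty
    condition: str           # last column, later used as the candidate answer


def parse_row(cells: Sequence[str]) -> ConditionEntry:
    """Build an entry from five cells; empty optional cells become None."""
    entity, behavior, aspect, polarity, condition = cells
    return ConditionEntry(entity, behavior, aspect or None, polarity or None, condition)
```

Keeping the optional columns explicit makes it easy to skip them when a question template leaves the matching slot unfilled.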
When the answer retrieval module receives a question template, it first finds the corresponding template knowledge base according to the type of the template. Each class of template knowledge base is indexed according to the content of the template slots. In the condition-class template knowledge base, for example, the "entity" and "behavior" columns are combined into one index, against which the "subject" of a condition-class question template is text-matched; likewise, indexes are built on the "aspect" column and on the "favourable/unfavourable" column, against which the "aspect" and "favourable/unfavourable/advantage" slots of the condition-class question template are text-matched.
For each knowledge entry that matches, the value of the last column is taken as a candidate answer. The candidate answers are arranged in descending order of similarity for the subsequent answer selection step.
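The slot-to-column matching and the ranking of candidate answers can be sketched in plain Python as below. This is only a stand-in: it scans the rows instead of building a real index, and the token-overlap score is an assumed substitute for whatever similarity measure the actual index provides. The names `SLOT_TO_COLUMNS`, `overlap` and `retrieve` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]

# Which knowledge-base columns each template slot is matched against,
# following the condition-class example in the text.
SLOT_TO_COLUMNS = {
    "subject": ("entity", "behavior"),
    "aspect": ("aspect",),
    "polarity": ("polarity",),
}


def tokens(text: str) -> Set[str]:
    return set(text.lower().split())


def overlap(query: str, text: str) -> float:
    """Jaccard overlap of the two token sets; 0.0 if either side is empty."""
    q, t = tokens(query), tokens(text)
    if not q or not t:
        return 0.0
    return len(q & t) / len(q | t)


def retrieve(template: Dict[str, str], rows: List[Row]) -> List[Tuple[float, str]]:
    """Score every row per slot and return (score, answer) in descending order."""
    scored = []
    for row in rows:
        score = 0.0
        for slot, columns in SLOT_TO_COLUMNS.items():
            indexed_text = " ".join(row.get(c, "") for c in columns)
            score += overlap(template.get(slot, ""), indexed_text)
        if score > 0:
            scored.append((score, row["condition"]))  # last column is the answer
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

An unfilled slot contributes nothing to the score, so a template that only names a subject is still ranked on the combined "entity" and "behavior" columns.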
To build indexes and search them, the system implemented by the invention uses Lucene, a Java-based full-text information retrieval toolkit. Lucene can index textual data: as long as the data to be indexed can be converted to text, Lucene can index and search the documents. It has performed well in the system implemented by the invention.
From the candidate answers produced by the answer retrieval step, the system implemented by the invention selects the first three each time as the answer to the simple question. Finally, the answers to all simple questions obtained by splitting are combined, which gives the final answer to the whole problem to be solved.
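The selection and synthesis steps can be sketched as follows in Python. Taking the top three candidates follows the text; the output layout (one line per simple question, answers joined by semicolons) is an assumption, since the patent does not describe how the combined answer is formatted.

```python
from typing import Dict, List, Tuple


def top_answers(candidates: List[Tuple[float, str]], k: int = 3) -> List[str]:
    """Keep the k candidate answers with the highest similarity."""
    ranked = sorted(candidates, key=lambda pair: pair[0], reverse=True)
    return [answer for _, answer in ranked[:k]]


def synthesize(per_question: Dict[str, List[Tuple[float, str]]]) -> str:
    """Combine the answers of all simple questions, in splitting order."""
    lines = []
    for question, candidates in per_question.items():
        lines.append(f"{question}: " + "; ".join(top_answers(candidates)))
    return "\n".join(lines)
```

Because the dictionary preserves insertion order, the combined answer follows the order in which the simple questions were produced by splitting.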
The present invention differs from text-based question-answering methods. It uses a natural-language semanticization method based on template extraction and systematic searching, is aimed mainly at solving middle-school geography short-answer questions, and implements question splitting followed by classification and slot filling of the split questions. A retrieval method based on Lucene indexes then realizes an automatic question-answering system based on template knowledge bases, which can give more detailed and accurate answers to the problem to be solved than text-based question-answering methods.
The present invention differs from many current automatic question-answering systems in the following respects:
1. The invention first splits the question and judges the progressive relations between sub-questions. The reason is that many questions contain more than one sub-question; if such a question is searched in the knowledge base directly, without splitting, the answer obtained is bound to be inaccurate and disorganized. During splitting the invention also judges the progressive relations between sub-questions, which works well when the answer to a preceding sub-question serves as key information in the following sub-question, and makes the answers clearer and better ordered;
2. The invention predefines templates of several types. The purpose of these templates is to understand the question better: questions of different classes require different key information, and this key information serves as the keywords in the later retrieval. Without understanding the question this thoroughly and choosing keywords from it, the answers obtained would also be inaccurate;
3. The invention performs a generalization operation on the knowledge bases that have been built. The knowledge bases contain specific entities that do not apply to all questions, so during generalization each entity is replaced with features it should have in its context. This widens the range of use of the knowledge base, and during retrieval the features of the entities in the question can also be used as keywords, which increases both the success rate of retrieval and the accuracy of the retrieved answers.
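The progressive relation in point 1 can be illustrated with a small Python sketch in which the answer to one sub-question is fed into the next. The `{prev}` marker and the `answer_one` callback are hypothetical devices for illustration; the patent only states that the dependent sub-question is marked and later receives the preceding answer as key information.

```python
from typing import Callable, List

PLACEHOLDER = "{prev}"  # hypothetical mark for "answer of the previous sub-question"


def answer_chain(sub_questions: List[str], answer_one: Callable[[str], str]) -> List[str]:
    """Answer sub-questions in order, substituting each marked placeholder."""
    answers: List[str] = []
    previous = ""
    for question in sub_questions:
        if PLACEHOLDER in question:
            # Progressive relation: reuse the preceding answer as key information.
            question = question.replace(PLACEHOLDER, previous)
        previous = answer_one(question)
        answers.append(previous)
    return answers
```

In a full system `answer_one` would run classification, slot filling and retrieval for a single simple question; here any function from question text to answer text can be plugged in.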

Claims (5)

  1. An automatic question-answering method based on systematic searching, characterized in that automatic answering of short-answer questions is realized by a computer program, a short-answer question being a complex question that contains multiple sub-questions, the method comprising the following steps: first, Chinese natural language processing is applied to a short-answer question to generate a syntax tree, the Chinese natural language processing comprising word segmentation, part-of-speech tagging and syntactic analysis; then, according to the key verbs and conjunctions in the short-answer question, the complex question containing multiple sub-questions is split into simple questions that are semantically more specific and each contain only a single question;
    Question templates are preset, each template being provided with corresponding trigger words; a question template is a question model that conforms to interrogative grammar, different trigger words correspond to different types of question template, and each template is provided with corresponding template slots, which are filled with the key information required by that question template;
    Using the trigger words corresponding to the question templates, each simple question is classified, and the key information required by the question template is extracted from the simple question and filled into the template slots according to the part-of-speech definitions set for the template, forming a question template that contains the information needed to solve the problem; then, according to the type of the question template, retrieval is performed in the corresponding template knowledge base using the key information in the template slots as keywords, obtaining a number of candidate answers; finally, the best answers are selected from the candidate answers as the answer to the simple question, and the answers to all simple questions are combined to produce the final answer.
  2. The automatic question-answering method based on systematic searching according to claim 1, characterized in that after the simple questions are obtained, the progressive relations among the simple questions are analyzed: when the question is split according to verbs and conjunctions, whether a progressive relation exists between sub-questions is judged from the references in the question and the features of the questions after splitting, a progressive relation meaning that the answer to the preceding sub-question can serve as key information for the following sub-question; if a progressive relation exists, the sub-question that uses the preceding sub-question's answer as key information is marked, and the answer to the preceding sub-question is used as key information in the subsequent solving.
  3. The automatic question-answering method based on systematic searching according to claim 1 or 2, characterized in that splitting into simple questions comprises the following steps:
    1.1) First, word segmentation, part-of-speech tagging and syntactic analysis are applied to the short-answer question by natural language processing techniques to obtain a syntax tree;
    1.2) Based on the syntax tree obtained in step 1.1), it is judged whether the question contains multiple key verbs; if so, the question is split according to the key verbs into several questions that each contain one key verb; otherwise no splitting according to key verbs is needed;
    1.3) For each question obtained in step 1.2) that contains only one key verb, it is judged from the syntax tree whether the question contains a parallel structure whose components are joined by conjunctions; if so, the parallel structure is taken out and the question is split at the conjunctions of the parallel structure into several questions that contain no parallel structure; otherwise no splitting according to conjunctions is needed; after this step the original question has been split into several simple questions that each contain only one key verb and no parallel structure.
  4. The automatic question-answering method based on systematic searching according to claim 1 or 2, characterized in that the process of obtaining the final answer by retrieval from the template knowledge base is as follows:
    2.1) The template knowledge base is generalized in advance: each specific entity in it is replaced with the key features that the entity has in that context, and these features are used as keywords to assist answer retrieval;
    2.2) According to the semantics of the template slots in each type of question template, an index is built on the column or columns of the corresponding template knowledge base that match each template slot;
    2.3) Based on the indexed knowledge base of step 2.2), and according to the question template containing the information needed to solve the problem, retrieval is performed in the corresponding template knowledge base: the key information in the question template is used as keywords and searched on the index that matches the semantics of each template slot, yielding a number of candidate answers arranged in descending order of similarity;
    2.4) Based on the candidate answers obtained in step 2.3), the three candidate answers with the highest similarity are selected each time as the answer to the simple question;
    2.5) Based on the simple-question answers obtained in said step 2.4), the answers to all simple questions obtained by splitting are combined into the final answer to the whole problem to be solved.
  5. An automatic question-answering system based on systematic searching, in which a computer program is stored in a system device, characterized in that the method according to claim 1 or 2 is implemented when the computer program is executed.
CN201711107543.3A 2017-11-10 2017-11-10 Automatic question-answering method and system based on systematic searching Pending CN107885844A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711107543.3A CN107885844A (en) 2017-11-10 2017-11-10 Automatic question-answering method and system based on systematic searching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711107543.3A CN107885844A (en) 2017-11-10 2017-11-10 Automatic question-answering method and system based on systematic searching

Publications (1)

Publication Number Publication Date
CN107885844A true CN107885844A (en) 2018-04-06

Family

ID=61780193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711107543.3A Pending CN107885844A (en) 2017-11-10 2017-11-10 Automatic question-answering method and system based on systematic searching

Country Status (1)

Country Link
CN (1) CN107885844A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101013421A (en) * 2007-02-02 2007-08-08 清华大学 Rule-based automatic analysis method of Chinese basic block
US20140163962A1 (en) * 2012-12-10 2014-06-12 International Business Machines Corporation Deep analysis of natural language questions for question answering system
CN105701253A (en) * 2016-03-04 2016-06-22 南京大学 Chinese natural language interrogative sentence semantization knowledge base automatic question-answering method
CN107193798A (en) * 2017-05-17 2017-09-22 南京大学 A kind of examination question understanding method in rule-based examination question class automatically request-answering system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GONG CHENG, ET AL.: "Taking up the Gaokao Challenge: An Information Retrieval Approach", 《IJCAI》 *
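ZHEN LIHUA ET AL.: "A Survey of Question Classification in Automatic Question-Answering Systems", 《Journal of Anhui University of Technology (Natural Science)》 *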

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920497A (en) * 2018-05-23 2018-11-30 北京奇艺世纪科技有限公司 A kind of man-machine interaction method and device
CN110147544A (en) * 2018-05-24 2019-08-20 清华大学 A kind of instruction generation method, device and relevant device based on natural language
CN109344177A (en) * 2018-09-18 2019-02-15 图普科技(广州)有限公司 A kind of model combination method and device
CN109543020A (en) * 2018-11-27 2019-03-29 科大讯飞股份有限公司 Inquiry handles method and system
CN109543020B (en) * 2018-11-27 2022-11-04 科大讯飞股份有限公司 Query processing method and system
CN109783801A (en) * 2018-12-14 2019-05-21 厦门快商通信息技术有限公司 A kind of electronic device, multi-tag classification method and storage medium
CN109783801B (en) * 2018-12-14 2023-08-25 厦门快商通信息技术有限公司 Electronic device, multi-label classification method and storage medium
WO2020134684A1 (en) * 2018-12-27 2020-07-02 苏州龙信信息科技有限公司 Information retrieval method, apparatus, device and medium
CN109739963A (en) * 2018-12-27 2019-05-10 苏州龙信信息科技有限公司 Information retrieval method, device, equipment and medium
CN109740077A (en) * 2018-12-29 2019-05-10 北京百度网讯科技有限公司 Answer searching method, device and its relevant device based on semantic indexing
CN109766453A (en) * 2019-01-18 2019-05-17 广东小天才科技有限公司 A kind of method and system of user's corpus semantic understanding
WO2020221142A1 (en) * 2019-04-28 2020-11-05 华为技术有限公司 Picture book-based question and answer interaction method and electronic device
CN111858861A (en) * 2019-04-28 2020-10-30 华为技术有限公司 Question-answer interaction method based on picture book and electronic equipment
CN111858861B (en) * 2019-04-28 2022-07-19 华为技术有限公司 Question-answer interaction method based on picture book and electronic equipment
CN110837547A (en) * 2019-10-16 2020-02-25 云知声智能科技股份有限公司 Method and device for understanding multi-intention text in man-machine interaction
CN110727783A (en) * 2019-10-23 2020-01-24 支付宝(杭州)信息技术有限公司 Method and device for asking question of user based on dialog system
CN110795548A (en) * 2019-10-25 2020-02-14 招商局金融科技有限公司 Intelligent question answering method, device and computer readable storage medium
CN111831810A (en) * 2020-07-23 2020-10-27 中国平安人寿保险股份有限公司 Intelligent question and answer method, device, equipment and storage medium
CN111831810B (en) * 2020-07-23 2024-02-09 中国平安人寿保险股份有限公司 Intelligent question-answering method, device, equipment and storage medium
CN111708874A (en) * 2020-08-24 2020-09-25 湖南大学 Man-machine interaction question-answering method and system based on intelligent complex intention recognition
CN113449117A (en) * 2021-06-24 2021-09-28 武汉工程大学 Bi-LSTM and Chinese knowledge graph-based composite question-answering method
CN113449117B (en) * 2021-06-24 2023-09-26 武汉工程大学 Bi-LSTM and Chinese knowledge graph based compound question-answering method
CN113886557A (en) * 2021-12-07 2022-01-04 北京云迹科技有限公司 Question answering method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180406