CN105701253B - Knowledge base automatic question-answering method based on the semantization of Chinese natural language questions

Info

Publication number
CN105701253B
Authority
CN
China
Prior art keywords
question
question sentence
knowledge base
tree
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610125710.6A
Other languages
Chinese (zh)
Other versions
CN105701253A (en)
Inventor
胡伟
姜成樾
程龚
瞿裕忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201610125710.6A
Publication of CN105701253A
Application granted
Publication of CN105701253B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a knowledge base automatic question-answering method based on the semantization of Chinese natural language questions, comprising the following steps: Chinese natural language processing is applied to the fact-type question entered by the user, performing word segmentation, part-of-speech tagging, and named entity recognition and expansion, and generating a semantic dependency tree; generalized templates and semantic parsing are used to obtain the constituent elements of the question, such as time, place, fact subject, and fact object; semantization is then performed, extracting the element attributes and values of all events in the question and generating multiple "attribute-value" pairs, in which the element to be answered is replaced with an interrogative word, forming a complex fact triple set; the triple containing the part to be answered is combined with the other related fact triples to form a knowledge base query with conditional constraints, similarity-based query matching is performed against the knowledge base, and the result is extracted from the knowledge base to obtain the final answer. The invention achieves fast and accurate query responses over a knowledge base.

Description

Knowledge base automatic question-answering method based on the semantization of Chinese natural language questions
Technical field
The present invention relates to the Semantic Web, natural language processing, and automatic question answering, and in particular to a knowledge base automatic question-answering method based on the semantization of Chinese natural language questions. Specifically, it is a method that converts a Chinese natural language question into a knowledge base query through template-extraction-based semantization, thereby realizing knowledge base automatic question answering for fact-type questions.
Background art
The Semantic Web is an important direction in the development of the World Wide Web; it provides a foundation for knowledge representation, reasoning, exchange, and reuse on the Web. The Semantic Web describes an entity by a set of "attribute-value" pairs. A single "attribute-value" pair can be written as <p_i, v_i>, where p_i denotes an attribute and v_i denotes one value of p_i. An entity can be described as a set of one or more such pairs. For example, the name of Tim Berners-Lee, the inventor of the World Wide Web, is represented in the Semantic Web data source DBpedia as <name, "Tim Berners-Lee">. In general, the description of a Semantic Web entity contains dozens or even hundreds of such "attribute-value" pairs, and one attribute can have several different values. With the rapid development of the Semantic Web, Semantic Web technologies have been studied and applied to varying degrees in many fields.
Natural language processing is the discipline that studies the language problems of human-computer interaction. The key to processing natural language is to make the computer "understand" it. Key techniques include word segmentation, part-of-speech tagging, named entity recognition, coreference resolution, and syntactic dependency parsing of natural sentences.
Question answering is an advanced form of information retrieval that answers a user's natural language question in accurate, concise natural language. An automatic question-answering system automatically analyzes the question and provides corresponding candidate answers. A traditional system consists mainly of question analysis, information retrieval, and answer generation modules.
Traditional automatic question answering operates mainly over text collections: keywords are extracted from the question and submitted to a search engine, relevant documents are retrieved from the text collection, the few highest-confidence documents among the returned results are taken, and the answer is generated from them. However, as Semantic Web technology has developed and spread, structured knowledge bases with a higher degree of information organization, such as knowledge graphs and linked data (for example DBpedia and Freebase), have emerged, making a new kind of automatic question answering based on structured knowledge bases possible.
After semantic parsing, a document collection of considerable size can be represented in a structured knowledge representation (commonly "entity-attribute-value" triples), forming a knowledge base that contains a large number of triples. Automatic question answering over such a knowledge base is more efficient and more accurate than traditional text-based question answering. If users could pose their questions to the knowledge base as queries, they would obtain answers precisely and quickly. In practice, however, most users cannot formulate such "professional" queries and can only ask in human natural language, so knowledge base question answering based on natural language questions is of significant value. In existing knowledge base question answering, after the user enters a Chinese natural language question, conventional methods only apply simple processing to obtain keywords; the generated query is weakly structured and cannot query the knowledge base data precisely and efficiently.
Summary of the invention
The present invention targets knowledge bases with a triple structure (hereinafter "knowledge bases") and proposes an automatic question-answering method for fact-type questions over a knowledge base: the Chinese natural language question entered by the user is semantized by template extraction and thereby converted into a structured query.
Fact-type questions can be divided into simple and complex fact-type questions. A simple fact is expressed directly in the knowledge base as a single triple; for example, "the capital of France is Paris" is a simple fact, represented in the knowledge base as <"France", "capital", "Paris">. A complex fact, by contrast, is usually described in a natural sentence that contains a time or place adverbial as well as the participating subject or object and the action related to the fact, for example "In 1950, Alan Turing proposed the Turing test at the University of Manchester in the UK" or "Nobel died in 1896". Such facts have a more complex representation in the knowledge base, similar to a blank node (described below), and such complex fact sentences often appear in biography-type texts. The invention is described using complex fact-type questions as the example, but the method applies equally to simple fact-type questions.
The purpose of the invention is to use Semantic Web and natural language processing techniques to convert Chinese natural language questions into structured form during knowledge base question answering, so as to achieve fast and accurate query responses over the knowledge base.
The technical solution of the invention is as follows. The user enters a fact-type question that asks about part of the content of a fact (for example the time, place, subject, or object related to the fact). The question is first analyzed with natural language processing tools to extract the corresponding keywords. Generalized templates obtained through statistical learning, together with semantic parsing, are then used to identify the constituent elements of the question: time and place (at least one of which is present) and fact subject and fact object (at least one of which is present). The part to be answered is replaced with an interrogative word, forming a complex fact triple set. The triple containing the part to be answered is combined with the other related fact triples to form a knowledge base query with conditional constraints; similarity-based query matching is performed against the knowledge base, and the element to be answered is extracted from the candidate result with the highest similarity to obtain the final answer.
The knowledge base automatic question-answering method based on the semantization of Chinese natural language questions comprises the following steps:
1. The user enters a fact-type question. Natural language processing techniques such as word segmentation, part-of-speech tagging, and named entity recognition extract the keywords of the question, the keywords are expanded with coreferent entities, and the natural language question is converted into an annotated semantic dependency tree;
2. A set of question matching templates is obtained through fairly large-scale statistical learning, comprising node templates of the dependency tree, structure regular templates of the dependency tree (different question types may have different structure regular templates), and intermediate result templates. Matching the question against the templates performs the various kinds of part-of-speech identification and extracts the trunk content of the question, finally yielding an intermediate result that can be used to construct the query triple set;
3. Using the typical spatio-temporally constrained fact-type question template, the constituent elements of the fact in the question, such as "time", "place", "fact subject", "fact object", and "fact action", are extracted to semantize the intermediate result, generating multiple "entity-attribute-value" tuples. A knowledge base query is then performed with the resulting complex fact triple set, which can be regarded as a knowledge base query constrained by the other triples. During the actual query, similarity-based query matching is performed, the value of the element to be answered is extracted from the candidate with the highest similarity, and it replaces the interrogative word to produce the final answer to the question.
The beneficial effects of the invention are: (1) A set of generalized templates based on statistical learning is defined, applicable to parsing Chinese natural language questions and matching their elements, labeling sentence elements to the greatest possible extent. (2) Semantic Web and natural language processing techniques are used to process fact-type Chinese natural language questions, constructing a logically clear structured semantic model for the question that is finer and more specific than the dependency tree obtained from natural language processing alone, and easier for a machine to understand and process. (3) Based on template extraction and the semantic model of fact-type questions, a knowledge base query with conditional constraints is obtained, making it easier to find an accurate answer in the knowledge base.
Brief description of the drawings
Fig. 1 is the overall processing flowchart of the invention;
Fig. 2 is the semantization model for spatio-temporally constrained fact-type questions defined by the invention.
Detailed description of the embodiments
The invention discloses a knowledge base automatic question-answering method based on template-extraction semantization of Chinese natural language questions, comprising the following steps. First, Chinese natural language processing is applied to the fact-type question entered by the user, performing word segmentation, part-of-speech tagging, and named entity recognition and expansion, and generating a semantic dependency tree. Next, generalized templates obtained through statistical learning, together with semantic parsing, are used to obtain the constituent elements of the question, such as time, place, fact subject, and fact object; semantization is then performed, extracting the element attributes and values of all events in the question and generating multiple "attribute-value" pairs, in which the element to be answered is replaced with an interrogative word, forming a complex fact triple set. Finally, the triple containing the part to be answered is combined with the other related fact triples to form a knowledge base query with conditional constraints, similarity-based query matching is performed against the knowledge base, and the result is extracted from the knowledge base to obtain the final answer.
The complete flow of the invention is shown in Fig. 1 and comprises three parts: (1) Chinese natural language processing is applied to the fact-type question entered by the user, carrying out keyword extraction and coreference expansion and producing a semantic dependency tree; (2) the dependency tree is matched against a set of predefined templates to obtain more detailed part-of-speech tags, extract the trunk content, and generate the intermediate result; (3) the semantic model for spatio-temporally constrained fact-type questions is used to construct a constrained structured query, similarity-based query matching is performed against the knowledge base, and the answer is extracted from the query result.
Each part is described in detail below:
1. Perform Chinese natural language processing on the fact-type question entered by the user, carrying out keyword extraction and coreference expansion, and obtain the semantic dependency tree
For an input Chinese fact-type question, natural language processing is first applied to the question: an open-source toolkit (such as the NLP parser of Stanford University in the USA or FudanNLP of Fudan University in China) performs word segmentation, part-of-speech tagging, named entity recognition, and keyword extraction.
In this process, to improve the accuracy of keyword extraction, several entity vocabularies are added after the open-source tool has processed the sentence (including a proper-noun vocabulary extracted from quoted content in the documents of the original text collection, a noun entry vocabulary taken from Chinese Wikipedia, noun lists, person-name lists, and so on), and the sentence is verified a second time. This corrects segmentation errors that the open-source Chinese natural language processing toolkit may produce when parsing the sentence (mainly particular entity names, long entity names, person names, place names, and the like that the toolkit fails to recognize) and improves segmentation accuracy as far as possible.
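As a rough illustration of this secondary check, the sketch below re-merges adjacent tokens that together spell an entry in an entity vocabulary, using greedy longest match. The function name `merge_entities`, the `max_span` limit, and the sample tokens (a hypothetical segmenter output for the Turing example question used later in this description) are assumptions for illustration; the patent does not specify the merging algorithm.

```python
def merge_entities(tokens, entity_vocab, max_span=4):
    """Re-merge adjacent tokens that together form a known entity name.

    Greedy longest match: at each position the longest span (up to
    max_span tokens) whose concatenation is in entity_vocab wins.
    """
    merged = []
    i = 0
    while i < len(tokens):
        for span in range(min(max_span, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + span])
            if candidate in entity_vocab:
                merged.append(candidate)
                i += span
                break
        else:
            # No multi-token entity starts here; keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical over-segmented output of an open-source segmenter.
tokens = ["1950年", "，", "阿兰", "·", "图灵", "在", "哪里", "提出", "图灵", "测试", "？"]
# Hypothetical entity vocabulary (e.g. built from Chinese Wikipedia titles).
vocab = {"阿兰·图灵", "图灵测试"}
merged_tokens = merge_entities(tokens, vocab)
print(merged_tokens)
```

Longest match is chosen here so that a long entity name is preferred over any shorter entity it contains; other tie-breaking policies are equally possible.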
On the basis of this segmentation, the semantic dependency tree of the question is generated.
After the keywords of the question are extracted, the target text collection does not necessarily contain exactly the same words, so coreference expansion is applied to these keywords, mainly expansion with synonyms and near-synonyms. The added resources are a synonym table extracted from Chinese Wikipedia, the Tongyici Cilin synonym thesaurus, and some manually compiled synonym and near-synonym vocabularies.
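The expansion step can be pictured as a lookup in synonym groups. The following is a minimal sketch, assuming the synonym resources have already been loaded into a list of sets; `expand_keywords` and the sample groups are hypothetical and not taken from the patent.

```python
def expand_keywords(keywords, synonym_groups):
    """Map each keyword to itself plus every word sharing a synonym group with it."""
    expanded = {}
    for keyword in keywords:
        variants = {keyword}
        for group in synonym_groups:
            if keyword in group:
                variants |= set(group)
        expanded[keyword] = variants
    return expanded


# Hypothetical synonym groups (stand-ins for Wikipedia or thesaurus entries).
groups = [{"提出", "提议", "倡议"}, {"首都", "国都"}]
result = expand_keywords(["提出", "图灵测试"], groups)
print(result)
```

A keyword with no known synonyms simply maps to a set containing only itself, so later matching code can treat every keyword uniformly.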
2. Match the obtained semantic dependency tree against a set of predefined templates to obtain more specific part-of-speech tags, extract the trunk content, and generate the intermediate result
A set of question matching templates obtained through fairly large-scale statistical learning is predefined, comprising node templates of the dependency tree, structure regular templates of the dependency tree, and intermediate result representation templates. Matching the question against the templates performs the identification of the various part-of-speech tags and extracts the trunk content of the question, finally yielding a structured triple set that can be used for querying.
The process of applying this set of templates to the dependency tree of the question is as follows:
(1) The tree node template parses the information of every dependency node in the question (it strengthens the semantic annotation, identifying interrogative pronouns, named-entity nouns, predicates, and other sentence constituents). Regular expressions are defined for each category to strengthen the recognition of, for example, person names, place names, times, and entity names. The categories of the dependency tree nodes are then labeled in detail using the named entity recognition and expansion vocabulary labels produced in the natural language processing stage described above.
After the above methods for labeling word types are combined (the regular expressions of each category, the expansion vocabularies, and so on), the structure and content of each tree node are stored using the following tree node template:
The tree node template is used to precisely identify nodes that satisfy given conditions. After the system has applied natural language processing to the question, it traverses the nodes of the dependency tree and strengthens the annotation of each node's content, stating the type of the word content of each node in a more detailed way.
On this basis, the content of each node can be classified under the categories of the tree node templates, which forms the basis for the tree structure regular template matching in the second step.
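The node labeling of step (1) can be sketched as a small classifier that tries category regular expressions first, then the entity vocabulary, then the part-of-speech tag. Everything below is an assumption made for illustration: the label names, the two patterns, and the Penn Chinese Treebank style tags ("P" for prepositions, tags starting with "V" for verbs). The patent's own node template is not reproduced here.

```python
import re

# Hypothetical category patterns; a real system would define many more.
CATEGORY_PATTERNS = [
    ("TIME", re.compile(r"\d{1,4}年(\d{1,2}月)?(\d{1,2}日)?")),
    ("INTERROGATIVE", re.compile(r"哪里|哪儿|什么|谁|何时")),
]


def label_node(word, pos_tag, entity_vocab):
    """Assign a node-template category to one dependency tree node."""
    for label, pattern in CATEGORY_PATTERNS:
        if pattern.fullmatch(word):
            return label
    if word in entity_vocab:
        return "ENTITY"
    if pos_tag == "P":
        return "PREPOSITION"
    if pos_tag.startswith("V"):
        return "PREDICATE"
    return "OTHER"


vocab = {"阿兰·图灵", "图灵测试"}
labels = [label_node(w, t, vocab) for w, t in
          [("1950年", "NT"), ("阿兰·图灵", "NR"), ("在", "P"),
           ("哪里", "PN"), ("提出", "VV"), ("图灵测试", "NN")]]
print(labels)
```

Regular expressions are checked before the vocabulary so that times and interrogatives keep their specific category even if they also appear in a word list.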
(2) The tree structure regular template parses the node paths of the question's dependency tree; matching the syntax tree paths yields the effective structure of the question and extracts the most useful content, generally the trunk of the question and the key modifiers. In template extraction for fact-type questions, the node template matching of the first step can already parse out time and place nouns, and time and place adverbials have no further effect on the trunk structure of the fact sentence. Therefore, before the dependency tree structure regular templates are matched, the tree is preprocessed: the time and place nouns are extracted, and any prepositions in the time or place adverbials (such as " ", "in", etc.) are removed.
Specifically, the regular templates of the tree structure are defined over the paths from the root node of the syntax tree to the leaf nodes, and are used to match the syntax tree paths in regular-expression fashion and extract the useful fields. In general, the question structure nodes of the same question type share common properties; for example, typical fact-type questions usually contain a time or place adverbial with a preposition and a subject-object action, and are consistent in node parsing and structure extraction. This property gives the tree node templates a degree of generalization: one tree node template can match a whole class of nodes with common properties (that is, certain similar sentence patterns or questions on similar topics have the same or similar tree structure regular templates).
The tree is first traversed starting from the root node, producing a series of dependency paths from the root to the leaf nodes. These paths are matched using a form similar to regular expressions. The difference is that the atoms of an ordinary regular expression match characters, whereas the atoms of a tree structure regular template in the system are tree node templates. Such a template can match the generated paths of trees that share the same characteristics but differ in node content.
The regular operations supported by the regular templates are: concatenation ("ab"), alternation ("a|b" or "[ab]"), Kleene repetition (greedy "a*" and non-greedy "a*?"), one-or-more repetition (greedy "a+" and non-greedy "a+?"), optional ("a?"), and position matching (start "^" and end "$").
The task of a template is to identify particular substructures and extract the useful parts from them, so it must be possible to extract the nodes at specific positions of the matched part. Anonymous capture groups based on parentheses are therefore supported, and the content of a capture group is accessed by its integer index. A matching result can thus be accessed conveniently as "regular template name, capture group number" (a capture group is the content matched by a subexpression of the regular expression, referenced by a number assigned in the order in which the "(" characters appear in the expression; by convention, 0 denotes the whole expression).
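One way to picture a regular expression whose atoms are node templates rather than characters is to encode each node category as a single letter and reuse an ordinary regex engine. The sketch below does this with Python's `re` module. The one-letter codes, the template `(V)P?(Q)`, and the sample path are illustrative assumptions, not the patent's template syntax.

```python
import re

# One-letter codes stand in for node templates so that Python's `re`
# can act as the template matcher (an illustrative shortcut).
LABEL_CODES = {"PREDICATE": "V", "PREPOSITION": "P", "ENTITY": "E",
               "INTERROGATIVE": "Q", "TIME": "T"}


def match_path(template, path):
    """Match a root-to-leaf path against a template.

    path is a list of (label, word) pairs; template is a regular
    expression over the one-letter codes. Returns the words covered by
    each capture group (index 0 is the whole match), or None if the
    whole path does not match.
    """
    encoded = "".join(LABEL_CODES[label] for label, _ in path)
    match = re.fullmatch(template, encoded)
    if match is None:
        return None
    groups = []
    for index in range(match.re.groups + 1):
        start, end = match.span(index)
        # span() is (-1, -1) for a group that did not take part in the match.
        groups.append([word for _, word in path[start:end]] if start >= 0 else [])
    return groups


# Hypothetical template: predicate, optional preposition, interrogative.
WHERE_TEMPLATE = r"(V)P?(Q)"
path = [("PREDICATE", "提出"), ("PREPOSITION", "在"), ("INTERROGATIVE", "哪里")]
groups = match_path(WHERE_TEMPLATE, path)
print(groups)
```

Because every node maps to exactly one character, the character offsets reported by the regex engine are also positions in the path, which is what lets capture group 1 and 2 be translated back into node content.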
In addition, each tree generates several paths, and after all paths have been matched, the tree structure must be reassembled. Because different paths can share some nodes, the integration of the path matching results must ensure that the same node has the same matching result, that is, the corresponding nodes under different matched paths must be aligned. For this reason, the regular template of every tree structure has an added "CONSTRAINTS" field that constrains the alignment of matched nodes across paths. As above, matching results are referenced as "regular template name, capture group number". The field only needs to express that the matched contents of the corresponding nodes are equal or unequal, and is therefore written as "(= regular-template-name capture-group-number ...)" or "(!= regular-template-name capture-group-number ...)".
As described above, questions of the same solution category or with similar sentence patterns have the same or similar tree structure regular templates, so applicable generalized templates can be defined as needed when parsing real questions. Because Chinese expression is complex, the number of such structure templates is still fairly large (with templates suited to different Chinese sentence patterns).
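The alignment check expressed by the "CONSTRAINTS" field can be sketched as follows. The data layout (a dictionary from template name to capture groups, and constraints as operator plus references) is a hypothetical encoding chosen for this example, and treating "!=" over more than two references as "all pairwise different" is likewise an assumption.

```python
def satisfies_constraints(matches, constraints):
    """Check CONSTRAINTS-style alignment rules across matched paths.

    matches maps a template name to its list of capture groups.
    Each constraint is (operator, [(template_name, group_number), ...]).
    """
    for operator, references in constraints:
        values = [tuple(matches[name][number]) for name, number in references]
        distinct = len(set(values))
        if operator == "=" and distinct != 1:
            return False
        if operator == "!=" and distinct != len(values):
            return False
    return True


# Hypothetical match results for two paths that share the root predicate.
matches = {
    "subject_path": [["提出", "阿兰·图灵"], ["提出"], ["阿兰·图灵"]],
    "object_path": [["提出", "图灵测试"], ["提出"], ["图灵测试"]],
}
constraints = [
    ("=", [("subject_path", 1), ("object_path", 1)]),   # same root node
    ("!=", [("subject_path", 2), ("object_path", 2)]),  # different leaves
]
aligned = satisfies_constraints(matches, constraints)
print(aligned)
```

Only path combinations that pass every constraint would be merged back into a tree structure.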
The template format defined for typical spatio-temporally constrained fact-type questions is given here. An example run of dependency tree template matching is as follows:
Example: "In 1950, where did Alan Turing propose the Turing test?"
After natural language word segmentation, the preliminary semantic dependency tree of the question is:
"Alan Turing" and "Turing test" are named entities recognized through the entity vocabulary, so their character sequences are treated as contiguous and indivisible.
Note that after the time or place adverbial parts that do not affect the solution are extracted, the root-to-leaf paths of the tree obtained by the templates are "propose → Alan Turing", "propose → → where", and "propose → Turing test". The subsequent template matching process is as follows:
The above covers the definitions of the dependency tree node templates and the regular structure templates, and an example of parsing one spatio-temporally constrained fact-type question with the tree node templates and regular structure templates.
To solve for other fact elements of spatio-temporally constrained fact-type questions, corresponding templates can be constructed in the same way by replacing the interrogative pronoun being solved for; the rest of the basic template format is almost identical.
(3) The intermediate result representation template represents the intermediate result obtained after extraction with the two kinds of templates above, that is, the question-answering formulation of the original question. Based on the intermediate result, the predefined semantization model for spatio-temporally constrained fact-type questions is used to generate the corresponding entity relation triples, which can be used in the structured query of the next step.
For example, the intermediate result of the question "What is the capital of France?" generates the triple <"France", "capital", what>; the intermediate result of "In 1950, where did Alan Turing propose the Turing test?" generates the triple set Q: {<Q, "time", "1950">, <Q, "place", where>, <Q, "subject", "Alan Turing">, <Q, "object", "Turing test">, <"Alan Turing", "propose", "Turing test">}.
3. Use the semantization model for spatio-temporally constrained fact-type questions to organize the intermediate result, construct a constrained structured query, perform similarity-based query matching against the knowledge base, and extract the answer from the query result
In general, a complex fact can be parsed into several constituent elements. The most descriptive ones are the time and place related to the fact, the subject and object related to the fact, and the action between subject and object. From the time noun and place noun obtained by the dependency tree node templates and the subject, object, and action (subject → object) obtained by the regular structure templates, the spatio-temporally constrained fact model precisely extracts the elements {time, place, subject, object, action} of the fact in the sentence. An element without a value is represented as empty (NULL); a triple containing a null value may or may not be generated, as needed.
Using Semantic Web technology, the fact expressed by a sentence is described as a blank node. A blank node is a node that cannot be identified by a specific, concrete URI. Here, the blank node represents a fact statement: no specific value describes the statement itself, but attributes such as time, place, subject, and object, together with their values, can be used to describe it.
After the constituent elements of the event statement and their values are extracted, the elements are taken as attributes and the element values as literals, and multiple "entity-attribute-value" triples are generated for each sentence. Specifically, a triple is written T = <s, p, o>, where s is the subject of the content described by the triple, p is the predicate, and o is the object. With the event Q expressed by the event statement as the central subject, the whole event (time, place, subject, object, action) can be expressed as
Q: {T_t, T_l, T_s, T_o, T_act}, T_t = <Q, time, tValue>, T_l = <Q, location, lValue>, T_s = <Q, subject, sValue>, T_o = <Q, object, oValue>, T_act = <sValue, actValue, oValue>.
The content expressed by this triple set can be presented visually, as shown in Fig. 2. In particular, when a fact sentence has only a fact subject and no object, as in "Nobel died in 1896", the object is merged with the subject, the object value is merged with the subject value, and the action forms a self-loop on the subject.
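The construction of the triple set around the central node Q can be sketched as below. The `Triple` type, the function `build_fact_triples`, the `keep_null` switch, and the "?where" placeholder for the interrogative are hypothetical names introduced for this example. Emitting an object triple that repeats the subject when the object is missing is one possible reading of the merge rule, not a confirmed detail of the patent.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    s: str
    p: str
    o: Optional[str]


def build_fact_triples(q, time=None, location=None, subject=None,
                       obj=None, action=None, keep_null=False) -> List[Triple]:
    """Build the fact triple set {T_t, T_l, T_s, T_o, T_act} around node q."""
    if obj is None:
        # No object: merge it with the subject, so the action becomes a self-loop.
        obj = subject
    triples = [
        Triple(q, "time", time),
        Triple(q, "location", location),
        Triple(q, "subject", subject),
        Triple(q, "object", obj),
    ]
    if not keep_null:
        triples = [t for t in triples if t.o is not None]
    if subject is not None and action is not None:
        triples.append(Triple(subject, action, obj))
    return triples


# "In 1950, where did Alan Turing propose the Turing test?"
turing = build_fact_triples("Q", time="1950", location="?where",
                            subject="Alan Turing", obj="Turing test",
                            action="propose")
# "Nobel died in 1896" has no object and no place.
nobel = build_fact_triples("Q", time="1896", subject="Nobel", action="die")
print(turing)
print(nobel)
```

With `keep_null=True` the triples for missing elements are kept with a `None` value, mirroring the option of generating or omitting null-valued triples.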
Knowledge base query matching is performed with the resulting question triple set. The set can be regarded as a triple query constrained by the other triples; in the question fact triple set, the value of the element to be answered in the triple to be solved is replaced with an interrogative word. During the actual query of the knowledge base, the literal similarity between the fact central node Q and the central nodes of similarly structured fact descriptions in the knowledge base is ignored. For the other literal-valued members of each triple, similarity is computed (the fixed attribute names such as time, place, subject, and object must match strictly, while the attribute values are matched by similarity). The measure is the well-known Jaro-Winkler string distance formula, supplemented by the synonym expansion vocabulary (synonymous words are assigned similarity 1). For a triple that contains several literal variables, the similarities of its members are averaged with equal weights: with n literal variables each has weight 1/n, so each literal variable in T_act has weight 1/3. This yields the similarity of each fact element triple, giving five similarity values {S_t, S_l, S_s, S_o, S_act}.
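For reference, a compact implementation of the standard Jaro-Winkler measure, with the synonym shortcut and the equal-weight averaging over the literal members of one triple, might look as follows. The prefix scale 0.1 and the prefix cap of 4 are the conventional Jaro-Winkler defaults rather than values stated in the patent, and `literal_similarity`, `triple_similarity`, and the `synonym_pairs` layout are hypothetical helpers.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, scale=0.1, max_prefix=4):
    """Jaro-Winkler similarity: Jaro plus a bonus for a shared prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * scale * (1 - base)


def literal_similarity(a, b, synonym_pairs=frozenset()):
    """Similarity of two literals; known synonyms count as identical."""
    if a == b or frozenset((a, b)) in synonym_pairs:
        return 1.0
    return jaro_winkler(a, b)


def triple_similarity(question_literals, candidate_literals, synonym_pairs=frozenset()):
    """Equal-weight (1/n) average over the n literal members of a triple."""
    pairs = list(zip(question_literals, candidate_literals))
    return sum(literal_similarity(a, b, synonym_pairs) for a, b in pairs) / len(pairs)


synonyms = frozenset({frozenset(("propose", "put forward"))})
s_act = triple_similarity(["Alan Turing", "propose", "Turing test"],
                          ["Alan Turing", "put forward", "Turing test"], synonyms)
print(s_act)
```

How the interrogative placeholder itself is scored is not specified in the source, so the sketch leaves that decision to the caller.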
For the question triple set and each candidate triple set, let the similarity weights of the triples be {Wt, Wl, Ws, Wo, Wact}. The purpose here is to judge whether two fact triple sets express the same fact, and each component element contributes equally to that judgment, so all 5 weights are given the same value, 1/5 = 0.2; the possibility of adjusting them flexibly according to the actual situation is nevertheless retained. The final similarity S between the question fact and a candidate answer sentence is then computed as:
S = Wt·St + Wl·Sl + Ws·Ss + Wo·So + Wact·Sact.
In particular, a fact may contain only the time element or only the location element, or only a fact subject; for these cases we additionally let:
Wt+Wl=0.4, Ws+Wo=0.4.
In this special case, an element whose value is empty has a similarity weight of 0 and is not included in the similarity calculation.
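The weighting scheme above can be sketched as follows. The sketch reads the special case as follows: within each pair (time, location) and (subject, object), the pair's total weight of 0.4 is shared equally among the elements that are not empty. That reading, the fixed weight of 0.2 for the action triple, and the function names are assumptions made for the illustration.

```python
from typing import Dict


def effective_weights(present: Dict[str, bool]) -> Dict[str, float]:
    """Weights for keys 't', 'l', 's', 'o', 'act', given which elements are non-empty."""
    weights = {"act": 0.2}  # assumption: the action triple always keeps 0.2
    for pair in (("t", "l"), ("s", "o")):
        filled = [k for k in pair if present.get(k, False)]
        for k in pair:
            # Empty elements get weight 0; the pair's 0.4 is split among the rest.
            weights[k] = 0.4 / len(filled) if k in filled else 0.0
    return weights


def final_similarity(sims: Dict[str, float], present: Dict[str, bool]) -> float:
    """S = Wt*St + Wl*Sl + Ws*Ss + Wo*So + Wact*Sact with the effective weights."""
    weights = effective_weights(present)
    return sum(w * sims.get(k, 0.0) for k, w in weights.items())


# All five elements present: every weight is 0.2.
full = effective_weights({"t": True, "l": True, "s": True, "o": True})

# Only time and subject present: Wt = 0.4, Ws = 0.4, Wl = Wo = 0.
score = final_similarity(
    {"t": 1.0, "s": 0.5, "act": 1.0},
    {"t": True, "l": False, "s": True, "o": False},
)
```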
Using the above similarity formula, the final similarity between the question fact triple set and each candidate fact triple set in the knowledge base is computed, and the candidates are sorted in descending order. The candidate with the largest similarity is taken as the triple set that best fits the question. (If several similarities are very close to the highest one, that is, they differ from it by less than 0.05, all of them are considered qualified.) The part to be answered is extracted from the selected triple set, and the corresponding content replaces the interrogative word in the to-be-solved triple of the original question, which yields the final result. (Because the result is based on similarity calculation, it does not necessarily agree fully with the real facts, since the knowledge base may contain no relevant information.)
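The ranking step and the 0.05 tolerance can be sketched in a few lines of Python. The function name `select_candidates` and the representation of scored candidates as `(candidate, similarity)` pairs are hypothetical.

```python
from typing import List, Tuple, TypeVar

C = TypeVar("C")


def select_candidates(scored: List[Tuple[C, float]], tolerance: float = 0.05) -> List[C]:
    """Return the best candidate plus any whose similarity is within `tolerance` of it."""
    if not scored:
        return []
    # Sort by similarity, highest first.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    best = ranked[0][1]
    return [cand for cand, sim in ranked if best - sim < tolerance]


picked = select_candidates([("fact_b", 0.88), ("fact_a", 0.91), ("fact_c", 0.70)])
```

Here `fact_a` and `fact_b` are both kept because their similarities differ by less than 0.05, while `fact_c` is dropped.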
Unlike text-based question answering methods, the present invention uses a template-extraction-based semantization method for Chinese natural language to convert fact-type Chinese natural language questions into structured queries, and realizes automatic question answering over a knowledge base through query matching based on similarity calculation. Compared with returning a whole sentence as a text answer, it can extract a finer-grained answer that covers only the part to be solved.

Claims (1)

1. A knowledge base automatic question-answering method based on the semantization of Chinese natural language questions, characterized by comprising the following steps:
(1.1) for an input fact-type Chinese natural language question, extracting the keywords in the question, expanding entities based on coreference, and generating a semantic dependency tree;
(1.2) based on the semantic dependency tree obtained in step (1.1), using predefined dependency tree node templates and dependency tree structure rule templates obtained by statistical learning to extract the multiple fact elements contained in the question and their values, and generating an intermediate result using an intermediate result template; step (1.2) comprises the following steps:
(2.1) using the semantic dependency tree obtained in step (1.1), performing reinforced re-matching with the dependency tree node templates, labeling the specific part-of-speech category of each dependency tree node, and generating annotation information;
(2.2) based on the annotation information obtained in step (2.1), extracting the trunk of the question using the dependency tree structure rule templates, parsing the node paths of the question's dependency tree and matching them against syntax tree paths to obtain a valid question structure, and extracting the most useful question trunk content and key modifiers;
(2.3) based on the question trunk content obtained in step (2.2), generating the intermediate result using the intermediate result template, wherein the content represented by the intermediate result is what the question seeks to solve and is the basis for subsequently generating the knowledge base query based on the triple set;
(1.3) based on the relevant element values and the intermediate result extracted from the dependency tree in step (1.2) via the dependency tree node templates and the dependency tree structure rule templates, semantizing the element values and the intermediate result into a triple set using a spatio-temporally constrained fact model, forming a knowledge base query based on similarity calculation, and extracting the answer from the knowledge base; step (1.3) comprises the following steps:
(3.1) defining the attributes of the spatio-temporally constrained fact model as {time, location, subject, object, action between subject and object}, and determining, from the dependency tree nodes and the question trunk content obtained in steps (2.1) and (2.2), the values of the time, location, subject, object and action related to the question fact, including the element to be answered;
(3.2) based on the values extracted in step (3.1), generating each fact part in the form of a subject-predicate-object triple <s, p, o>, wherein the value to be answered in the triple of the element to be answered is replaced with the interrogative word, so that each question can be expressed as one question fact triple set;
(3.3) organizing the question fact triple set obtained in step (3.2) into a knowledge base query with constraint conditions;
(3.4) based on the knowledge base query obtained in step (3.3), performing subgraph matching against the knowledge base: for each candidate triple set in the knowledge base with a similar description structure, performing exact matching of each element attribute name and similarity calculation of each element attribute value; then computing a weighted average according to the fact element weights defined by the semantization model to obtain the final similarity of each fact triple set; sorting by similarity from high to low; and extracting the part to be answered as the final result.
CN201610125710.6A 2016-03-04 2016-03-04 The knowledge base automatic question-answering method of Chinese natural language question semanteme Active CN105701253B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610125710.6A CN105701253B (en) 2016-03-04 2016-03-04 The knowledge base automatic question-answering method of Chinese natural language question semanteme

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610125710.6A CN105701253B (en) 2016-03-04 2016-03-04 The knowledge base automatic question-answering method of Chinese natural language question semanteme

Publications (2)

Publication Number Publication Date
CN105701253A CN105701253A (en) 2016-06-22
CN105701253B true CN105701253B (en) 2019-03-26

Family

ID=56220835

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610125710.6A Active CN105701253B (en) 2016-03-04 2016-03-04 The knowledge base automatic question-answering method of Chinese natural language question semanteme

Country Status (1)

Country Link
CN (1) CN105701253B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001082125A1 (en) * 2000-04-25 2001-11-01 Invention Machine Corporation, Inc. Creation of tree-based and customized industry-oriented knowledge base
CN101373532A (en) * 2008-07-10 2009-02-25 昆明理工大学 FAQ Chinese request-answering system implementing method in tourism field
CN101799802A (en) * 2009-02-05 2010-08-11 日电(中国)有限公司 Method and system for extracting entity relationship by using structural information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Proposed architectural model for optimal transformation of decision table and decision tree into knowledge base;M Shuaib Qureshi.etc;《Indian Journal of Science & Technology》;20100131;第362-364页
导游对话系统的相关技术研究;李静静;《中国优秀硕士学位论文全文数据库 信息科技辑》;20150315;第48-58页

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant