CN105701253B - Knowledge base automatic question-answering method based on semantization of Chinese natural language questions - Google Patents
- Publication number
- CN105701253B CN201610125710.6A
- Authority
- CN
- China
- Prior art keywords
- question
- question sentence
- knowledge base
- tree
- template
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3335—Syntactic pre-processing, e.g. stopword elimination, stemming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a knowledge base automatic question-answering method based on semantization of Chinese natural language questions, comprising the following steps: performing Chinese natural language processing on a factual question entered by the user, including word segmentation, part-of-speech tagging, and named entity recognition and expansion, and generating a semantic dependency tree; obtaining constituents of the question such as time, place, fact subject and fact object using generalized templates and semantic parsing, then performing semantization to extract the attributes of all event-related elements in the question and their values, generating multiple "attribute-value" pairs in which the element to be answered is replaced by an interrogative word, thereby forming a complex fact triple set; combining the triple that contains the part to be answered with the other related fact triples to form a conditionally constrained knowledge base query, performing query matching based on similarity calculation against the knowledge base, and extracting the result from the knowledge base to obtain the final answer. The invention achieves fast and accurate query responses against a knowledge base.
Description
Technical field
The present invention relates to the Semantic Web, natural language processing and automatic question answering, and in particular to a knowledge base automatic question-answering method based on semantization of Chinese natural language questions. Specifically, it is a template-extraction-based method that answers factual questions over a knowledge base by converting a Chinese natural language question, through template extraction and semantization, into a knowledge base query.
Background art
The Semantic Web is an important direction in the development of the World Wide Web; it provides a foundation for knowledge representation, reasoning, exchange and reuse on the Web. The Semantic Web describes an entity using a set of "attribute-value" pairs. A single pair can be written <p_i, v_i>, where p_i denotes an attribute and v_i denotes a value of p_i. An entity can thus be described as a set of one or more such pairs. For example, the name of Tim Berners-Lee, the inventor of the World Wide Web, is represented in the Semantic Web data source DBpedia as <name, "Tim Berners-Lee">. Typically, the description of a Semantic Web entity contains dozens or even hundreds of such pairs, and one attribute may have several different values. With the rapid development of the Semantic Web, Semantic Web technologies have been studied and applied to varying degrees in many fields.
Natural language processing is the discipline that studies the language problems arising in interaction between people and computers. The key to processing natural language is enabling the computer to "understand" it; the key techniques of natural language processing include word segmentation, part-of-speech tagging, named entity recognition, coreference resolution, and syntactic dependency parsing of natural sentences.
Question answering is an advanced form of information retrieval: it answers a user's natural language question in accurate, concise natural language. An automatic question-answering system analyzes the question automatically and provides corresponding candidate answers; a traditional system consists mainly of question analysis, information retrieval and answer generation modules.
Traditional automatic question answering is mainly carried out over text collections: the keywords in the question are analyzed and submitted to a search engine, relevant documents are retrieved from the text collection, the top-ranked documents with the highest confidence among the returned results are taken, and the answer is generated from them. However, as Semantic Web technology has developed and spread, structured knowledge bases with a higher degree of information organization, such as knowledge graphs and linked data, have emerged, for example DBpedia and Freebase, making a new kind of automatic question answering based on structured knowledge bases possible.
After semantic parsing, a document collection of considerable size can be expressed in a structured knowledge representation (commonly the "entity-attribute-value" triple structure), forming a knowledge base that contains a large number of triples. Automatic question answering on such a knowledge base is more efficient and more accurate than traditional text-based question answering. If users could pose structured queries to the knowledge base, they could undoubtedly obtain answers precisely and quickly. In practice, however, most users cannot formulate such "professional" queries and can only ask in human natural language, so knowledge base question answering based on natural language questions is of significant value. In knowledge-base question answering, after the user enters a Chinese natural language question, conventional methods process the question only superficially to obtain keywords; the generated query has a low degree of structure and cannot query the knowledge base data precisely and efficiently.
Summary of the invention
For knowledge bases with a triple structure (hereinafter "knowledge base"), the present invention proposes an automatic question-answering method for factual questions over a knowledge base, in which the Chinese natural language question entered by the user is semantized through template extraction and converted into a structured query.
Factual questions can be divided into simple factual questions and complex factual questions. A simple fact is one that is expressed directly in the knowledge base as a single triple; for example, "the capital of France is Paris" is a simple fact, represented in the knowledge base as <"France", "capital", "Paris">. A complex fact, in a natural sentence, usually contains a time or place adverbial and also includes the participating subject or object and the action related to the fact, for example "In 1950, Alan Turing proposed the Turing test at the University of Manchester in the United Kingdom" or "Nobel died in 1896". Such facts have a more complex representation in the knowledge base, similar to a blank node, as discussed below, and such complex factual sentences often appear in news-style texts. The invention takes complex factual questions as its example, but the method applies equally to simple factual questions.
The purpose of the invention is to apply Semantic Web and natural language processing techniques to convert Chinese natural language questions into structured form during knowledge base question answering, so as to achieve fast and accurate query responses against the knowledge base.
The technical solution of the invention is as follows. The user enters a factual question that asks about part of a fact (for example the time, the place, the subject or the object related to the fact). The question is first analyzed with natural language processing tools to extract the corresponding keywords. Generalized templates obtained through statistical learning, together with semantic parsing, are then used to identify the constituent elements of the question, namely time and place (at least one of them) and fact subject and fact object (at least one of them); the part to be answered is replaced by an interrogative word, forming a complex fact triple set. The triple that contains the part to be answered is combined with the other related fact triples to form a conditionally constrained knowledge base query; query matching based on similarity calculation is performed against the knowledge base, and the element to be answered is extracted from the candidate result with the highest similarity to obtain the final answer.
The knowledge base automatic question-answering method based on semantization of Chinese natural language questions comprises the following steps:
1. The user enters a factual question. Keywords in the question are extracted using natural language processing techniques such as word segmentation, part-of-speech tagging and named entity recognition, and are expanded on the basis of coreferent entities; the natural language question is converted into an annotated semantic dependency tree;
2. A set of question-matching templates is obtained through fairly large-scale statistical learning, including dependency-tree node templates, tree-structure regular templates for the dependency tree (different question types may have different tree-structure regular templates) and intermediate-result templates. By matching the question against the templates, the various parts of speech are identified and the main content of the question is extracted, finally yielding an intermediate result that can be used to construct the query triple set;
3. Using the typical template for spatio-temporally constrained factual questions, the constituent elements of the question, such as the "time", "place", "fact subject", "fact object" and "fact action", are extracted, and the intermediate result is semantized to generate multiple "entity-attribute-value" triples. A knowledge base query is performed with the resulting complex fact triple set, which can be regarded as a knowledge base query carrying constraints from the other triples. In the actual query against the knowledge base, query matching based on similarity calculation is performed, the value of the element to be answered is extracted from the candidate with the highest similarity, and the interrogative word is replaced to generate the final answer to the question.
The beneficial effects of the invention are: (1) it defines a set of generalized templates based on statistical learning that are suited to parsing Chinese natural language questions and matching their constituents, labeling sentence elements as fully as possible; (2) it processes factual Chinese natural language questions using Semantic Web and natural language processing techniques and constructs a logically clear structured semantic model for the question, which is finer-grained and more specific than the dependency tree obtained from natural language processing alone and easier for a machine to understand and process; (3) based on template extraction and the semantic model of factual questions, it obtains a conditionally constrained knowledge base query, which makes it easier to find an accurate answer in the knowledge base.
Brief description of the drawings
Fig. 1 is the overall processing flowchart of the invention;
Fig. 2 is the semantization model that the invention defines for spatio-temporally constrained factual questions.
Detailed description of the embodiments
The invention discloses a knowledge base automatic question-answering method based on template-extraction semantization of Chinese natural language questions, comprising the following steps. First, Chinese natural language processing is performed on the factual question entered by the user, including word segmentation, part-of-speech tagging, and named entity recognition and expansion, and a semantic dependency tree is generated. Next, generalized templates obtained through statistical learning, together with semantic parsing, are used to obtain constituents of the question such as time, place, fact subject and fact object; semantization is then performed to extract the attributes of all event-related elements in the question and their values, generating multiple "attribute-value" pairs in which the element to be answered is replaced by an interrogative word, thereby forming a complex fact triple set. Finally, the triple that contains the part to be answered is combined with the other related fact triples to form a conditionally constrained knowledge base query, query matching based on similarity calculation is performed against the knowledge base, and the result is extracted from the knowledge base to obtain the final answer.
The overall flow of the invention is shown in Fig. 1 and comprises three parts. First, Chinese natural language processing is performed on the factual question entered by the user, with keyword extraction and coreference expansion, to obtain a semantic dependency tree. Second, the dependency tree is matched against a predefined set of templates to obtain more detailed part-of-speech tagging, main content extraction and an intermediate result. Finally, the semantic model for spatio-temporally constrained factual questions is used to construct a constrained structured query, query matching based on similarity calculation is performed against the knowledge base, and the answer is extracted from the query results.
The specific embodiments are described in turn below:
1. Perform Chinese natural language processing on the factual question entered by the user, with keyword extraction and coreference expansion, to obtain the semantic dependency tree
For an input Chinese factual question, natural language processing is first applied to the question: an open-source toolkit (such as the NLP Parser from Stanford University in the United States or FudanNLP from Fudan University in China) is used for word segmentation, part-of-speech tagging, named entity recognition and keyword extraction.
In this process, to improve the accuracy of keyword extraction, several entity vocabularies are added after the open-source tool has processed the sentence (including a vocabulary of special nouns extracted from quoted content in the documents of the original text database, a vocabulary of noun entries from the Chinese Wikipedia, noun lists, person-name lists and so on) to perform a secondary check on the sentence. This resolves segmentation errors that open-source Chinese natural language processing toolkits may produce when parsing a sentence (mainly particular entity names, long entity names, person names, place names and the like that the toolkit fails to recognize) and improves segmentation accuracy as far as possible.
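The secondary check described above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function name `merge_with_lexicon`, the `max_span` limit and the greedy longest-match strategy are assumptions, and the Chinese tokens are an assumed rendering of the example question used later in this description.

```python
def merge_with_lexicon(tokens, lexicon, max_span=6):
    """Greedily merge adjacent tokens whose concatenation is a known entity."""
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so that long entity names win.
        for span in range(min(max_span, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + span])
            if candidate in lexicon:
                merged.append(candidate)
                i += span
                break
        else:
            # No multi-token entity starts here; keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical output of a segmenter that split two entity names.
tokens = ["阿兰", "·", "图灵", "提出", "了", "图灵", "测试"]
# Hypothetical entity vocabulary (for example, built from Wikipedia entries).
lexicon = {"阿兰·图灵", "图灵测试"}

print(merge_with_lexicon(tokens, lexicon))
```

Trying longer spans first keeps long entity names intact even when a shorter prefix is also in the vocabulary.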
On the basis of this segmentation, the semantic dependency tree of the question is generated.
After the keywords of the question have been extracted, and since the target text collection does not necessarily contain exactly the same words, coreference expansion is applied to these keywords, mainly synonym and near-synonym expansion. The added resources are a synonym table extracted from the Chinese Wikipedia, the Cilin synonym thesaurus, and some manually compiled synonym and near-synonym vocabularies.
2. Match the obtained semantic dependency tree against a predefined set of templates to obtain more specific part-of-speech tagging, main content extraction and the intermediate result
A set of question-matching templates obtained through fairly large-scale statistical learning is predefined, including dependency-tree node templates, tree-structure regular templates for the dependency tree, and intermediate-result expression templates. By matching the question against the templates, the various part-of-speech labels are identified and the main content of the question is extracted, finally yielding a structured triple set that can be used for querying.
This set of templates is applied to the dependency tree of the question in the following matching process:
(1) Tree node templates parse the information of every dependency node in the question (strengthening the semantic annotation by identifying interrogative pronouns, named-entity nouns, predicates and other sentence constituents). Regular expressions are defined for each category to strengthen the recognition of person names, place names, times, entity names and so on. Then, using the labels from the named entity recognition and the expansion vocabularies involved in the natural language processing described above, the category of each dependency tree node is annotated in detail.
After the above methods have been combined (the per-category regular expressions, the expansion vocabularies and other methods that can be used to label word types), the structure and content of each tree node are stored using the following tree node template:
Tree node templates are used to identify precisely the nodes that satisfy given conditions. After the system has applied natural language processing to the question, it traverses the nodes of the dependency tree and strengthens the annotation of each node's content, stating the type of the word in each node in more detail.
On this basis, the content of each node can be classified under the categories of the tree node templates, which serves as the basis for matching the tree-structure regular templates in the second step.
(2) Tree-structure regular templates parse the node paths of the question's dependency tree: by matching syntax tree paths they obtain the effective question structure and extract the most useful content, generally the main content of the question and its key modifiers. In the template extraction process for factual questions, the first step, node template matching, can already parse the noun content of times and place names, and time and place adverbials have no further effect on the main structure of the factual sentence. Therefore, before the dependency-tree structure templates are matched, appropriate preprocessing is applied: the time and place nouns are extracted, and any prepositions in the time or place adverbials (such as "in") are removed.
Specifically, tree-structure regular templates are defined over the paths from the root node of the syntax tree to its leaf nodes and are used to match those paths in the manner of regular expressions and extract the useful fields. In general, the question structure nodes of one question type share common features; for example, typical factual questions often contain a time or place adverbial with a preposition together with the subject, the object and the action, and these are consistent in node parsing and structure extraction. This gives tree node templates a certain generalization ability: one tree node template can match a whole class of nodes with common features (that is, certain similar sentence patterns or questions on similar topics share the same or similar tree-structure regular templates).
First, the tree is traversed starting from the root node, yielding a series of dependency paths from the root node to the leaf nodes. These paths are matched using a form similar to regular expressions. The difference is that the atomic items of an ordinary regular expression are characters, whereas the atomic items of a tree-structure regular template in this system are tree node templates; such a template can match paths generated from different trees that share the same features but differ in node content.
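The path-generation step can be sketched as a simple recursive traversal. The adjacency-dictionary tree representation and the function name `root_to_leaf_paths` are assumptions made for illustration; the intermediate preposition node "at" in the sample tree is likewise assumed, since the source text does not show it.

```python
def root_to_leaf_paths(tree, root):
    """Return every path from `root` to a leaf as a list of node names.

    `tree` maps a node name to the list of its dependent (child) nodes;
    nodes that do not appear as keys are treated as leaves.
    """
    children = tree.get(root, [])
    if not children:
        return [[root]]
    paths = []
    for child in children:
        for tail in root_to_leaf_paths(tree, child):
            paths.append([root] + tail)
    return paths


# Hypothetical dependency tree for the example question discussed below.
dependency_tree = {
    "propose": ["Alan Turing", "at", "Turing test"],
    "at": ["where"],
}

paths = root_to_leaf_paths(dependency_tree, "propose")
for path in paths:
    print(" -> ".join(path))
```

Each resulting path is then a flat sequence, which is what makes regular-expression-style matching over node templates possible.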
The regular operations supported by the templates are: concatenation ("ab"), alternation ("a|b" or "[ab]"), Kleene repetition (greedy "a*" and non-greedy "a*?"), ordinary repetition (greedy "a+" and non-greedy "a+?"), optional items ("a?"), and position matching (start position "^" and end position "$").
The task of a template is to identify specific substructures and extract the useful parts from them, so it must be possible to extract the nodes at specific positions of the matched part. The templates therefore support anonymous capture groups delimited by parentheses, whose contents are accessed by integer index. A match result can thus be accessed conveniently as "regular template name, capture group number" (capture groups, i.e. the contents matched by the subexpressions of a regular expression, are numbered for easy reference in the order in which "(" appears in the expression; in general, 0 denotes the whole expression).
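One way to approximate such label-level templates is to join the node-template labels of a path into a string and reuse Python's `re` module, mapping captured groups back to nodes. This is only a sketch under stated assumptions: the labels (`VERB`, `PREP`, `WH_PLACE`, `ENTITY`), the function name `match_path` and the space-separated encoding are hypothetical, and the patent's own template syntax is not shown in this text.

```python
import re


def match_path(pattern, path):
    """Match a label-level regular expression against one dependency path.

    `path` is a list of (word, label) pairs. Returns a dict mapping each
    group number (0 is the whole match) to the captured words, or None if
    the path does not match. Capture groups must wrap whole labels.
    """
    labels = " ".join(label for _, label in path)
    m = re.fullmatch(pattern, labels)
    if m is None:
        return None
    captured = {}
    for g in range(m.re.groups + 1):
        if m.group(g) is None:
            continue  # optional group that did not take part in the match
        # Convert character offsets back into node positions.
        start = labels[:m.start(g)].count(" ")
        length = len(m.group(g).split())
        captured[g] = [word for word, _ in path[start:start + length]]
    return captured


# Hypothetical template: a verb, an optional preposition, a place interrogative.
PLACE_TEMPLATE = r"(VERB)(?: PREP)? (WH_PLACE)"

place_path = [("propose", "VERB"), ("at", "PREP"), ("where", "WH_PLACE")]
object_path = [("propose", "VERB"), ("Turing test", "ENTITY")]

print(match_path(PLACE_TEMPLATE, place_path))
print(match_path(PLACE_TEMPLATE, object_path))
```

Because the pattern is written over labels rather than words, the same template matches any question whose path has the same shape, which is the generalization property described above.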
In addition, each tree produces several paths, and after all paths have been matched the tree structure must be reassembled. Because different paths may share some nodes, when the path matching results are integrated it must be ensured that the same node yields the same match result, i.e. corresponding nodes under different matched paths must be aligned. Each tree-structure regular template therefore carries an additional "CONSTRAINTS" field that constrains the alignment of nodes between the match results of different paths; as above, match results are referenced by "regular template name, capture group number". The field only needs to express that the matched contents of the corresponding nodes are equal or unequal, and is therefore written as "(= regular template name capture group number ...)" or "(!= regular template name capture group number ...)".
As described above, questions of the same solution category or with similar sentence patterns have the same or similar tree-structure regular templates, so suitable generalized templates can be defined as needed when parsing actual questions. Because Chinese expression is complex, the number of such structure templates is still fairly large (to suit the different Chinese sentence patterns).
The template forms defined for typical spatio-temporally constrained factual questions are given here. An example flow of dependency tree template matching is as follows:
Example: "In 1950, where did Alan Turing propose the Turing test?"
After natural language word segmentation, the preliminary semantic dependency tree of the question is:
"Alan Turing" and "Turing test" are named entities identified through the entity vocabularies; their characters are treated as a continuous, indivisible unit.
Note that, once the time or place adverbial parts that do not affect the part to be solved have been extracted, the root-to-leaf paths of the tree used for template matching are "propose → Alan Turing", "propose → → where" and "propose → Turing test". The subsequent template matching process is as follows:
The above covers the definitions of the dependency tree node templates and the tree-structure regular templates, together with an example of tree node and structure template parsing for a spatio-temporally constrained factual question.
When solving for a different fact element of a spatio-temporally constrained factual question, other corresponding templates can be constructed in a similar way by replacing the interrogative pronoun being solved for; the rest of the template's basic format remains almost unchanged.
(3) The intermediate-result expression template represents the intermediate result obtained after extraction with the two kinds of templates above, which is a preliminary form of the question's solution. Based on the intermediate result, the predefined semantization model for spatio-temporally constrained factual questions is then used to generate the corresponding entity-relation triples, which can be used in the structured query of the next step.
For example, the intermediate result of the question "What is the capital of France?" generates the triple <"France", "capital", what>; the intermediate result of "In 1950, where did Alan Turing propose the Turing test?" generates the triple set Q: {<Q, "time", "1950">, <Q, "place", where>, <Q, "subject", "Alan Turing">, <Q, "object", "Turing test">, <"Alan Turing", "propose", "Turing test">}.
3. Arrange the intermediate result using the semantization model for spatio-temporally constrained factual questions, construct a constrained structured query, perform query matching based on similarity calculation against the knowledge base, and extract the answer from the query results
In general, a complex fact can be parsed into several constituent elements; the most descriptive are the time and place related to the fact, the subject and object related to the fact, and the action performed by the subject on the object. The spatio-temporal constraint fact model takes the time and place nouns obtained from the dependency tree node templates and the subject, object and action (subject → object) obtained from the tree-structure regular templates, and accurately extracts the constituent elements {time, place, subject, object, action} of the fact expressed in the sentence. An element that has no value is represented as empty (NULL); triples containing an empty value may be generated or omitted as needed.
Using Semantic Web technology, the fact expressed by a sentence is described as a blank node. A blank node is a node that cannot be identified by a specific, concrete URI. In this setting, a blank node represents a factual statement: no specific value describes the statement itself, but attributes such as time, place, subject and object, together with their values, can be used to describe its extension.
After the constituent elements of the event statement and their values have been extracted, each element is taken as an attribute and its value as a specific literal, and multiple "entity-attribute-value" triples are generated for each sentence. Specifically, a triple is written T = <s, p, o>, where s is the subject of the content described by the triple, p is the predicate and o is the object. Taking the event Q expressed by the event statement as the central subject, the whole event (time, place, subject, object, action) can be represented as
Q: {T_t, T_l, T_s, T_o, T_act}, where T_t = <Q, time, tValue>, T_l = <Q, location, lValue>, T_s = <Q, subject, sValue>, T_o = <Q, object, oValue>, T_act = <sValue, actValue, oValue>.
The content expressed by this triple set can be presented visually, as shown in Fig. 2 (note: when a factual sentence contains only a fact subject, the object merges with the subject, the object value merges with the subject value, and the action forms a self-loop). In particular, when a factual sentence has no object, as in "Nobel died in 1896", the object merges with the subject and the action forms a self-loop on the subject.
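The construction of the triple set, including the self-loop case, can be sketched as below. The function name `build_fact_triples`, its parameters and the use of plain Python tuples are assumptions for illustration; the attribute names follow the formal definition given above, and empty elements are simply omitted unless `keep_empty` is set.

```python
def build_fact_triples(time=None, place=None, subject=None, obj=None,
                       action=None, center="Q", keep_empty=False):
    """Build the triple set {T_t, T_l, T_s, T_o, T_act} for one fact.

    When no object is given, the object merges with the subject, so the
    action triple becomes a self-loop on the subject.
    """
    if obj is None:
        obj = subject
    triples = []
    elements = (("time", time), ("location", place),
                ("subject", subject), ("object", obj))
    for attribute, value in elements:
        # Empty elements are skipped unless the caller asks to keep them.
        if value is not None or keep_empty:
            triples.append((center, attribute, value))
    if subject is not None and action is not None:
        triples.append((subject, action, obj))
    return triples


# "Nobel died in 1896": no object and no place.
nobel = build_fact_triples(time="1896", subject="Nobel", action="die")

# The example question, with "?where" standing in for the interrogative word.
turing = build_fact_triples(time="1950", place="?where",
                            subject="Alan Turing", obj="Turing test",
                            action="propose")

print(nobel)
print(turing)
```

The `?where` placeholder marks the element to be answered; how interrogatives are actually encoded is not specified in the source.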
Knowledge base query matching is then performed with the question triple set obtained. The set can be regarded as a triple query constrained by the other triples; in the fact triple set of the question, the value of the element to be answered is replaced by an interrogative word in its triple. In the actual query against the knowledge base, the literal similarity between the fact central node Q and the central nodes of similarly structured fact descriptions in the knowledge base is ignored. For each triple, similarity is calculated over the members other than Q whose literal values are variable (that is, the fixed attribute names such as time, place, subject and object must match strictly, while attribute values are matched by similarity calculation). The similarity is measured with the well-known Jaro-Winkler string distance formula, with a synonym expansion vocabulary added so that synonymous words are assigned a similarity of 1. For a triple that contains several literal-variable members, the member similarities are further combined as a weighted average: with n literal-variable members, each member has weight 1/n; for example, each literal-variable member of T_act has weight 1/3. This yields the similarity of each fact element triple, giving five similarity values {S_t, S_l, S_s, S_o, S_act}.
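The per-triple similarity can be sketched as follows, using a plain implementation of the standard Jaro-Winkler similarity. The scaling factor `p=0.1` and the prefix cap of 4 are the conventional defaults rather than values stated in the patent, and the function names and the representation of synonyms as a set of `frozenset` pairs are assumptions.

```python
def jaro(s1, s2):
    """Standard Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix (Winkler's variant)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * p * (1 - base)


def member_similarity(a, b, synonyms=frozenset()):
    """Similarity of two literals; listed synonyms count as identical."""
    if a == b or frozenset((a, b)) in synonyms:
        return 1.0
    return jaro_winkler(a, b)


def triple_similarity(query_members, candidate_members, synonyms=frozenset()):
    """Average over the n literal-variable members (weight 1/n each)."""
    if not query_members or len(query_members) != len(candidate_members):
        raise ValueError("member lists must be non-empty and of equal length")
    sims = [member_similarity(q, c, synonyms)
            for q, c in zip(query_members, candidate_members)]
    return sum(sims) / len(sims)


# Hypothetical synonym pair taken from an expansion vocabulary.
synonyms = {frozenset(("propose", "put forward"))}
s_act = triple_similarity(["Alan Turing", "propose", "Turing test"],
                          ["Alan Turing", "put forward", "Turing test"],
                          synonyms)
print(s_act)
```

Only the literal-variable members are passed in; the central node Q and the fixed attribute names are assumed to have been matched strictly beforehand, as the text requires.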
For the question triple set and each candidate triple set, let the similarity weights of the triples be {W_t, W_l, W_s, W_o, W_act}. The purpose here is to judge whether two fact triple sets express the same fact, and the constituent elements contribute equally to that judgment, so the five weights are all set to 1/5 = 0.2, while the possibility of adjusting them flexibly to the actual situation is retained. The final similarity S between the question fact and a candidate answer sentence is then calculated as:
S = W_t·S_t + W_l·S_l + W_s·S_s + W_o·S_o + W_act·S_act.
In particular, a fact may contain only a time element or only a place element, or only a fact subject; it is therefore further stipulated that:
Wt+Wl=0.4, Ws+Wo=0.4.
And under this special case, value is that empty element similarity weight is 0, which is not included in similarity
Calculating process.
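The weighting rule can be illustrated with a short sketch. It assumes one reading of the special case: when one member of the pair (time, place) or (subject, object) is empty, the remaining member takes the pair's whole 0.4. The text does not say what happens when both members of a pair are empty; here that pair simply contributes nothing. All names are hypothetical.

```python
from typing import Dict, Optional

ELEMENTS = ("t", "l", "s", "o", "act")  # time, place, subject, object, action


def effective_weights(present: set) -> Dict[str, float]:
    """Weights with W_t + W_l = 0.4 and W_s + W_o = 0.4; empty elements get 0."""
    weights = dict.fromkeys(ELEMENTS, 0.0)
    for pair in (("t", "l"), ("s", "o")):
        members = [e for e in pair if e in present]
        for e in members:
            # Both present: 0.2 each. One present: it carries the full 0.4.
            weights[e] = 0.4 / len(members)
    if "act" in present:
        weights["act"] = 0.2
    return weights


def final_similarity(sims: Dict[str, Optional[float]]) -> float:
    """S = W_t S_t + W_l S_l + W_s S_s + W_o S_o + W_act S_act.

    `sims` maps each element to its similarity, or None when the element is empty.
    """
    present = {e for e in ELEMENTS if sims.get(e) is not None}
    weights = effective_weights(present)
    return sum(weights[e] * sims[e] for e in ELEMENTS if e in present)


# Example: the place element is empty, so the time element carries weight 0.4.
example = final_similarity({"t": 1.0, "l": None, "s": 0.5, "o": 0.5, "act": 1.0})
```

With all five elements present the weights reduce to the equal 0.2 values given in the text.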
Using the similarity formula above, the final similarity between the question-fact triple set and each candidate
fact triple set in the knowledge base is calculated, and the candidates are sorted in descending order. The
candidate with the highest similarity is taken as the triple set that best fits the question (if several
candidates score very close to the highest similarity, i.e. within 0.05 of it, all of them are considered
qualifying). The part to be answered is extracted from it, and the corresponding content replaces the interrogative
word in the question triple to be solved, giving the final result. (Because the result is based on similarity
calculation, it does not necessarily agree with the actual facts, since the knowledge base may contain no
relevant information.)
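The selection step can be sketched as below, assuming each candidate has already been given a final similarity S. The function name and the (candidate, score) tuple layout are hypothetical; the 0.05 tolerance is the value stated in the text.

```python
def select_answers(scored, tolerance=0.05):
    """Return the best candidate plus any candidate within `tolerance` of it.

    `scored` is a list of (candidate, similarity) tuples.
    """
    if not scored:
        return []
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    best = ranked[0][1]
    return [item for item in ranked if best - item[1] < tolerance]


# Example with made-up scores: "a" and "b" differ by 0.03, so both qualify.
selected = select_answers([("c", 0.60), ("a", 0.91), ("b", 0.88)])
```

Returning every near-tie rather than a single winner matches the statement that all candidates within 0.05 of the highest similarity are treated as qualifying.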
Unlike text-based question answering, the present invention uses a template-extraction-based method for
semanticizing Chinese natural language. For factoid questions, it converts a Chinese natural language question
into a structured query and performs knowledge-base question answering by similarity-based query matching, so it
can extract the part to be answered at a finer granularity than returning a whole sentence of text.
Claims (1)
1. A knowledge base automatic question-answering method based on semanticization of Chinese natural language
questions, characterized by comprising the following steps:
(1.1) for an input factoid Chinese natural language question, extracting the keywords in the question, expanding
entities based on coreference, and generating a semantic dependency tree;
(1.2) based on the semantic dependency tree obtained in step (1.1), using predefined dependency tree node
templates and dependency tree structure regular templates obtained by statistical learning to extract the fact
elements contained in the question and their values, and generating an intermediate result with an intermediate
result template; step (1.2) comprises the following steps:
(2.1) performing reinforced matching again on the semantic dependency tree obtained in step (1.1) with the
dependency tree node templates, labelling the specific part-of-speech category of each dependency tree node, and
generating annotation information;
(2.2) based on the annotation information obtained in step (2.1), extracting the trunk of the question with the
dependency tree structure regular templates: parsing the node paths of the question's dependency tree and matching
them against syntax tree paths to obtain a valid question structure, and extracting the most useful question trunk
content and key modifiers;
(2.3) based on the question trunk content obtained in step (2.2), generating an intermediate result with the
intermediate result template, where the content expressed by the intermediate result is what the question asks to
be solved and is the basis for subsequently generating the triple-set-based knowledge base query;
(1.3) based on the element values and intermediate result extracted from the dependency tree in step (1.2) via
the dependency tree node templates and dependency tree structure regular templates, semanticizing the element
values and intermediate result into a triple set with the space-time-constrained fact model, forming a knowledge
base query based on similarity calculation, and extracting the answer from the knowledge base; step (1.3)
comprises the following steps:
(3.1) defining the attributes of the space-time-constrained fact model as {time, place, subject, object, action
between subject and object}, and determining, from the dependency tree nodes and question trunk content obtained
in steps (2.1) and (2.2), the values of the time, place, subject, object and action related to the question fact,
including the element to be answered;
(3.2) based on the values extracted in step (3.1), generating each fact part in the form of a
subject-predicate-object triple <s, p, o>, where the value to be answered in the triple of the element to be
answered is replaced with an interrogative word, so that each question can be expressed as one question-fact
triple set;
(3.3) organizing the question-fact triple set obtained in step (3.2) into a knowledge base query with
constraints;
(3.4) using the knowledge base query obtained in step (3.3), performing subgraph matching in the knowledge base:
for each candidate triple set in the knowledge base with a similar description structure, matching each element's
attribute name exactly and calculating the similarity of each element's attribute value; then computing the final
similarity of each fact triple set as the weighted average over the fact elements defined by the semanticization
model; sorting by similarity from high to low; and extracting the part to be answered as the final result.
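Steps (3.1) to (3.3) and the final extraction in (3.4) can be illustrated with a small sketch. It makes several assumptions that are not in the claim: each fact part is written as <Q, attribute name, value> around a central node, the interrogative word is represented by the placeholder `?x`, every attribute holds a single value, and the example data is made up for illustration.

```python
ATTRIBUTES = ("time", "place", "subject", "object", "action")
UNKNOWN = "?x"  # hypothetical placeholder standing in for the interrogative word


def to_triples(fact: dict, node: str = "Q") -> list:
    """Turn a space-time-constrained fact into a set of <s, p, o> triples.

    Attributes that are missing or None are left out of the triple set.
    """
    return [
        (node, attr, fact[attr])
        for attr in ATTRIBUTES
        if fact.get(attr) is not None
    ]


def extract_answer(question_triples: list, candidate_triples: list) -> dict:
    """Fill each to-be-answered element from a candidate with the same attribute name.

    Attribute names are matched exactly; the central node literals are ignored.
    """
    candidate = {p: o for _, p, o in candidate_triples}
    return {p: candidate.get(p) for _, p, o in question_triples if o == UNKNOWN}


# Illustrative question: who discovered radium in 1898?
question = to_triples(
    {"time": "1898", "subject": UNKNOWN, "object": "radium", "action": "discover"}
)
# Illustrative candidate fact as it might be stored in a knowledge base.
candidate = to_triples(
    {"time": "1898", "place": "Paris", "subject": "Marie Curie",
     "object": "radium", "action": "discover"},
    node="F1",
)
answer = extract_answer(question, candidate)
```

In a full pipeline the candidate would be the highest-scoring triple set after similarity ranking; this sketch only shows how the triple form makes the to-be-answered element easy to locate and fill.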
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610125710.6A CN105701253B (en) | 2016-03-04 | 2016-03-04 | The knowledge base automatic question-answering method of Chinese natural language question semanteme |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105701253A CN105701253A (en) | 2016-06-22 |
CN105701253B true CN105701253B (en) | 2019-03-26 |
Family
ID=56220835
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610125710.6A Active CN105701253B (en) | 2016-03-04 | 2016-03-04 | The knowledge base automatic question-answering method of Chinese natural language question semanteme |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105701253B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001082125A1 (en) * | 2000-04-25 | 2001-11-01 | Invention Machine Corporation, Inc. | Creation of tree-based and customized industry-oriented knowledge base |
CN101373532A (en) * | 2008-07-10 | 2009-02-25 | 昆明理工大学 | FAQ Chinese request-answering system implementing method in tourism field |
CN101799802A (en) * | 2009-02-05 | 2010-08-11 | 日电(中国)有限公司 | Method and system for extracting entity relationship by using structural information |
Non-Patent Citations (2)
Title |
---|
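Proposed architectural model for optimal transformation of decision table and decision tree into knowledge base; M. Shuaib Qureshi et al.; 《Indian Journal of Science & Technology》; 20100131; pp. 362-364 |
导游对话系统的相关技术研究 [Research on Related Technologies for Tour Guide Dialogue Systems]; 李静静 (Li Jingjing); 《中国优秀硕士学位论文全文数据库 信息科技辑》 [China Master's Theses Full-text Database, Information Science and Technology]; 20150315; pp. 48-58 |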
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||