CN110147436B

CN110147436B - Education knowledge map and text-based hybrid automatic question-answering method

Info

Publication number: CN110147436B
Application number: CN201910203301.7A
Authority: CN
Inventors: 许斌; 刘阳; 杨玉基
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2019-03-18
Filing date: 2019-03-18
Publication date: 2021-02-26
Anticipated expiration: 2039-03-18
Also published as: CN110147436A

Abstract

The invention belongs to the technical field of intelligent education question answering, and particularly relates to a hybrid automatic question-answering method based on an education knowledge graph and text, which comprises the following steps: constructing a basic education knowledge graph by building a basic education ontology, performing semantic annotation, and extracting information; building general question templates by combining keywords with regular expressions; building a full-text search engine and preprocessing massive texts; taking question-answer pairs as a training set and training a deep text matching model until it converges; performing entity recognition on the user question to obtain a subject list and assigning a confidence to each subject; performing template matching to obtain a predicate list and assigning a confidence to each predicate; querying the knowledge graph according to the subject and predicate lists to obtain an answer list and assigning a confidence to each answer; obtaining keywords by part-of-speech tagging, and performing coarse-grained and fine-grained matching to obtain and rank text-based answers; returning the knowledge-graph-based answer if its highest confidence exceeds a threshold; otherwise, returning the top-ranked text-based answer.

Description

Education knowledge map and text-based hybrid automatic question-answering method

Technical Field

The invention belongs to the technical field of intelligent education question answering, and particularly relates to a mixed automatic question answering method based on an education knowledge graph and a text.

Background

Smart education has become an important form of development in the field of education in the information age. Its essence lies in building an intelligent environment through intelligent technology so that students can acquire knowledge and resolve their questions faster and better. An automatic question-answering system is an effective means to this end. On the one hand, it can answer the questions of primary and secondary school students, so that they obtain answers in time during everyday learning. On the other hand, good human-computer interaction can noticeably increase students' enthusiasm for learning. It is therefore necessary to build a question-answering system that accurately understands the questions students pose and quickly gives accurate answers.

Early question-answering systems were template-based "expert systems", in which rules were formulated manually for a specific field in order to build templates; their most obvious defect is that they can process only a small amount of data in that specific field. With the development of search technology, open-domain retrieval-based question answering (IE-QA) emerged, in which answers are extracted from a large body of text according to the keywords and semantic relations in the question, for example IBM's "Watson" and "TREC". This mode alleviates the problem of narrow coverage to some extent, but the extracted answers are not always accurate because the quality of the texts is uneven. Later, as Internet communities emerged, many Internet companies developed community question answering such as "Zhihu" and "Stack Overflow"; in essence, this form provides users with a shared platform, and the correctness of the answers must be judged by the users themselves.

The concept of the "knowledge graph" proposed by Google defines a completely new way of organizing knowledge. Starting from the data itself, it attempts to convert unstructured data into structured data and to link the various data together into a graph model containing a large amount of structured data. Such structured graph data opens a new direction for question-answering systems, namely knowledge-graph-based question answering (KB-QA), which can make full use of the structured data in the knowledge graph to give users concise and accurate answers, and which has therefore gradually become an important research direction. It can also provide effective support for the development of next-generation intelligent retrieval and humanoid robots.

Some work has already been done on question-answering systems in the basic education field, but it has the following problems. Only a single source, either a knowledge graph or text, is used for question answering, so the respective advantages of the two sources cannot be combined. Specifically: knowledge in a knowledge graph is accurate and highly structured, but its coverage is lower than that of text; text contains all the knowledge, but semantic analysis is difficult because text is unstructured. If user questions are answered from the knowledge graph alone, many questions cannot be answered; if they are answered from text alone, many questions are answered incorrectly. Only by combining the knowledge of the two sources and jointly ranking their answers can the advantages of both be fully exploited and the most comprehensive and accurate answer be returned for the user's question. In addition, teaching materials and teaching aids are the most authoritative resources in the basic education field, yet existing basic education question-answering systems do not finely mine and process the knowledge they contain; knowledge points in basic education also have many interdisciplinary associations, yet existing systems do not consider the knowledge of all disciplines together.

Disclosure of Invention

To address the above technical problems, the invention provides a hybrid automatic question-answering method based on an education knowledge graph and text, comprising the following steps:

step 1: constructing a basic education ontology, performing semantic annotation on the teaching materials of each discipline, and extracting information from the teaching materials and internet encyclopedia text resources to construct an all-discipline basic education knowledge graph; constructing general question templates from keywords combined with regular expression grammar;

step 2: building a full-text search engine, and preprocessing the massive texts of teaching materials and internet encyclopedias so that they conform to the index format of the search engine; taking large-scale basic education test question-answer pairs as a training set, and training a deep text matching model until the model converges;

step 3: performing entity recognition on the user question to obtain a subject list, and assigning a corresponding confidence to each subject; performing template matching on the user question to obtain a predicate list, and assigning a corresponding confidence to each predicate; querying the knowledge graph according to the subject list and the predicate list to obtain an answer list based on the education knowledge graph, and assigning a corresponding confidence to each answer;

step 4: obtaining keywords of different levels in the question by a part-of-speech tagging method, and inputting the keywords into the search engine for coarse-grained matching to obtain a text-based answer list; performing fine-grained matching on the text-based answer list with the pre-trained deep text matching model to obtain and rank the answers;

step 5: returning the answer based on the education knowledge graph if its highest confidence exceeds a threshold; otherwise, returning the top-ranked text-based answer.

The basic education ontology is constructed through a semi-automatic ontology construction method.

The information extraction is used to augment instances, relationships, and attributes of knowledge.

Constructing the general question templates specifically includes:

taking a relation or attribute in the education knowledge graph as a keyword and combining it with regular expression grammar to form a general template for that type of question;

analyzing the questions in a large-scale education question-answer data set with a syntactic analysis tool, extracting keywords, and combining them with regular expression grammar to form a general template for that type of question;

generating templates based on highly discriminative question words;

generating templates based on general question words.

The full-text search engine is Elasticsearch, a scalable open-source full-text search and analytics engine.

Assigning a corresponding confidence to each subject specifically includes:

a subject that exactly matches an instance in the instance table has a confidence of 1;

a subject obtained by template segmentation followed by stop-word removal has a confidence of 0.8;

a subject obtained by fuzzy matching, similarity calculation, or longest common substring matching has a confidence of 0.6.

Assigning a corresponding confidence to each predicate specifically includes:

a template generated from a relation or attribute in the education knowledge graph has a confidence of 1;

a template generated from keywords extracted by syntactic analysis has a confidence of 1;

a template generated from highly discriminative question words has a confidence of 2;

a template generated from general question words has a confidence of 3.

Assigning a corresponding confidence to each answer specifically includes:

combining the subject list and the predicate list one by one to generate SPARQL query statements;

querying the education knowledge graph to obtain an answer list;

assigning a corresponding confidence to each answer according to a preset rule, wherein the confidence is calculated as follows:

score = subjectScore × pscore, where pscore is the predicate score and subjectScore is the subject score;

pscore is determined by the template confidence: pscore = 1 / template confidence;

subjectScore is determined by the subject confidence: subjectScore = 20 × rate × subject confidence;

rate is determined by the longest common substring of the subject and the question:

rate = Math.sqrt(length of longest common substring / length of subject) × Math.pow(length of subject, 1.0/2).
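The scoring rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `longest_common_substring_length` and `answer_score` are hypothetical, and string length is measured in characters.

```python
import math


def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common run that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def answer_score(subject: str, question: str,
                 subject_confidence: float,
                 template_confidence: int) -> float:
    """score = subjectScore * pscore, following the formulas in the text."""
    if not subject:
        return 0.0  # assumption: an empty subject contributes nothing
    lcs = longest_common_substring_length(subject, question)
    rate = math.sqrt(lcs / len(subject)) * math.pow(len(subject), 1.0 / 2)
    subject_score = 20 * rate * subject_confidence
    pscore = 1 / template_confidence
    return subject_score * pscore
```

Since the two square roots multiply, rate reduces algebraically to the square root of the longest-common-substring length, so a subject that shares a longer span with the question scores higher, and a template confidence of 2 or 3 halves or thirds the result.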

The part-of-speech tagging method specifically comprises:

setting words that serve as subjects or predicates, whose parts of speech are noun, verb (v), person name (nr), and the like, as primary keywords;

setting words that modify the subject or predicate, such as adverbs (d), numerals (m), and noun morphemes (Ng), as secondary keywords;

setting conjunctions (c), adverb morphemes (Dg), interjections (e), locality words (f), and other words irrelevant to the keywords as tertiary keywords.

The coarse-grained matching specifically comprises:

issuing a strict phrase query for each primary keyword, logically combining all of these phrase queries, and requiring that at least 50% of them match;

issuing a strict phrase query for each secondary keyword, logically combining all of these phrase queries, and setting no minimum number of matches;

issuing no query for the tertiary keywords.

The invention has the beneficial effects that:

the invention fully covers the nine basic education subjects of Chinese, mathematics, English, politics, history, geography, physics, chemistry, and biology; it relies mainly on teaching materials and teaching aids, supplemented by massive internet resources; and it combines the efficiency and accuracy of KB-QA answers with the wide coverage of IE-QA, so that the most accurate answer is returned for the user's question.

Drawings

FIG. 1 is a structure diagram of the hybrid question-answering system based on an educational knowledge graph provided by an embodiment of the invention.

FIG. 2 is a structure diagram of the deep text matching model provided by an embodiment of the invention.

Detailed Description

The embodiments are described in detail below with reference to the accompanying drawings.

Fig. 1 is a flowchart illustrating a hybrid automatic question-answering method based on an educational knowledge graph according to an embodiment of the present invention.

Referring to fig. 1, a method for constructing a basic education knowledge graph according to an embodiment of the present invention includes:

S1, constructing the education knowledge graph and the templates;

S2, preprocessing the digitized paper teaching materials and teaching aids and the Internet text;

S3, question answering and scoring based on the education knowledge graph;

S4, question answering and scoring based on text;

S5, answer selection based on both the education knowledge graph and the text.

In this embodiment, the offline processing in step S1 further includes the following steps shown in fig. 1:

S11, constructing a basic education knowledge graph by means of ontology construction, semantic annotation, information extraction, and the like, relying mainly on teaching materials and teaching aids and supplemented by Internet resources.

S12, establishing a template library according to the existing knowledge graph of the basic education field, mainly by establishing one-to-many regular expression templates for the relations (or attributes) in the knowledge graph.

In this embodiment, constructing the basic education knowledge graph in step S11 by means of ontology construction, semantic annotation, information extraction, and the like further includes the following steps not shown in the drawings:

processing the texts of the teaching materials and teaching aids with the TF-IDF and TextRank algorithms to obtain candidate terms of the basic education field;

referring to knowledge graphs in general fields such as schema.

determining the concepts, the relations between the concepts, and their constraints according to the infoboxes of encyclopedia websites;

inviting experts and teachers in the education field to review and verify the result, completing the ontology construction process;

annotating the knowledge lists of each subject according to the ontology by crowdsourced semi-automatic semantic annotation, to obtain the core knowledge of each subject;

supplementing the required structured data from relevant Internet websites, for example obtaining information on China's administrative divisions from the website of the National Bureau of Statistics and adding it to the knowledge graph;

extracting information from text by machine learning methods, including entity set expansion, relation extraction, and the like;
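As an illustration of the candidate-term step above, the following Python sketch ranks terms by TF-IDF only. It is a simplified stand-in, not the patented pipeline: it assumes the texts are already segmented into word lists, it omits TextRank, and the function name `tfidf_candidate_terms` is hypothetical.

```python
import math
from collections import Counter


def tfidf_candidate_terms(docs, top_k=3):
    """Rank terms by their best TF-IDF score across pre-tokenized documents.

    docs: list of token lists (segmentation is assumed to be done upstream).
    Returns up to top_k terms with a positive score, best first.
    """
    n_docs = len(docs)
    document_frequency = Counter()
    for tokens in docs:
        document_frequency.update(set(tokens))

    best_score = Counter()
    for tokens in docs:
        term_counts = Counter(tokens)
        for term, count in term_counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / document_frequency[term])
            best_score[term] = max(best_score[term], tf * idf)

    # Terms that occur in every document have idf = 0 and are dropped.
    return [term for term, score in best_score.most_common(top_k) if score > 0]
```

Terms that appear in every document receive a score of zero and are filtered out, which is why common function words do not surface as domain terms in this sketch.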

In this embodiment, establishing the template library in step S12 according to the existing knowledge graph of the basic education field, mainly by establishing one-to-many regular expression templates for the relations (or attributes) in the knowledge graph, further includes the following steps not shown in the figure:

the templates are constructed with regular expressions and come mainly from two sources:

1. Generating corresponding templates, in combination with regular expressions, from the relations and attributes contained in the education knowledge graph constructed in step S11.

2. Processing questions collected in advance, obtaining the corresponding keywords (mainly predicates, question words, and the like) by syntactic analysis, and generating corresponding templates in combination with regular expression grammar.

In this embodiment, the templates are stored in a MySQL database; in addition to the regular expression itself, each template record has several fields, such as the attribute and priority corresponding to the template. The specific structure is shown in Table 2.

Table 1 provides part-of-speech priority information for IE-QA in accordance with an embodiment of the present invention.

The use of the various fields of each template is described in detail below:

Content holds the template constructed in step S12, written as a regular expression. For example, there is a template "(? "; if a question matches the template, "geographic location" is considered a possible predicate of the question. "(. For example, when the question "What is the geographic location of Mount Tai, the Eastern Peak?" is matched against this template, the captured subject is "Mount Tai, the Eastern Peak";

Subject indicates whether the subject of the template is determined: it is false if the subject is unknown and true by default otherwise; for example, in "who is known to be sweaty in the day" the subject is unknown, so the value is false.

Value indicates whether the object is determined;

Type represents the relation or attribute corresponding to the template. A relation is an "edge" connecting two entities in the knowledge graph; for example, the two entities "China" and "Beijing" are connected through the relation "capital". An attribute is knowledge about the entity itself; for example, the entity "Beijing" has the attribute "climate type", whose value is "warm temperate continental monsoon climate".

Class denotes the class of the subject of the question and is used to restrict the subject type for some special questions. Its values mainly include person and the like; most of the time the field is empty, and it is mainly used to identify the subject type in specific fields;

Use: when certain results cannot be obtained through a SPARQL query, the question requires special processing, and Use identifies such questions.

Priority identifies the priority of the template, which is mainly used to calculate the score of the predicate.

Table 2 is a schematic diagram of a problem template in the basic education field provided by an embodiment of the present invention.

There are three priorities for templates:

the first priority is a template generated specifically from the predicate of a question, from a relation or attribute in the knowledge graph, or for a specific type of question; it has high confidence, for example "(? Condition (.? ", and is identified as "1" in the database;

the second priority is a template generated from a question word with distinctive features, mainly for attribute questions that the first priority cannot match, for example "(? "; its confidence is lower than that of a first-priority template, and it is identified as "2" in the database;

the third priority applies when neither the first nor the second priority matches: the question is matched with broader question words, for example "(? "; this class of template has the lowest confidence of the three and is identified as "3" in the database.
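The prioritized template lookup described above might be sketched as follows in Python. The three English regular expressions, the `Template` record, and the function `match_predicates` are hypothetical illustrations only; they are not the templates stored in the database described here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    content: str     # regular expression with one capture group for the subject
    predicate: str   # relation or attribute the template maps to (the Type field)
    priority: int    # 1 = most specific, 3 = most general


# Hypothetical example templates, one per priority level.
TEMPLATES = [
    Template(r"^What is the geographic location of (.+?)\?$", "geographic location", 1),
    Template(r"^Who (?:wrote|is the author of) (.+?)\?$", "author", 2),
    Template(r"^What is (.+?)\?$", "definition", 3),
]


def match_predicates(question, templates=TEMPLATES):
    """Return (predicate, captured subject, priority) for every matching template,
    ordered so that the most specific (lowest priority number) comes first."""
    hits = []
    for template in templates:
        match = re.search(template.content, question)
        if match:
            hits.append((template.predicate, match.group(1), template.priority))
    return sorted(hits, key=lambda hit: hit[2])
```

In this sketch a question can match several templates at once; keeping all hits with their priorities leaves the final choice to the scoring step rather than to the matcher.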

In this embodiment, the preprocessing of the digitized paper teaching materials and the Internet text in step S2 further includes the following steps not shown in the figure:

S21, setting up Elasticsearch, a highly scalable open-source full-text search and analytics engine, to support real-time query and retrieval over massive texts.

S22, preprocessing massive texts such as teaching materials and encyclopedias, and adding them to the Elasticsearch index according to the Elasticsearch index format.

S23, taking large-scale basic education question-answer pairs as a training set, and training a deep text matching model until the model converges;

In this embodiment, preprocessing the massive texts such as teaching materials and encyclopedias in step S22 and adding them to the Elasticsearch index according to the Elasticsearch index format further includes the following steps not shown in the drawings:

digitizing the teaching materials, and filtering out web page elements such as HTML tags as well as text unrelated to the knowledge;

acquiring text resources from encyclopedia websites;

segmenting the texts by paragraph to form paragraph texts;

adding a paragraph text to the Elasticsearch index if it can be linked to an entity in the knowledge base;

concatenating the triple knowledge in the knowledge base and adding it to the Elasticsearch index;
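The paragraph-level preparation above could look roughly like the following Python sketch. It only builds the documents to be indexed and does not call any search engine; entity linking is approximated by plain substring lookup, which is an assumption, and the document fields (`source`, `text`, `entities`) and function names are hypothetical.

```python
def build_index_documents(text, entity_names, source):
    """Split text into paragraphs and keep those that mention a known entity.

    Entity linking is approximated by substring lookup (an assumption);
    a real system would use a proper entity linker.
    """
    documents = []
    for paragraph in text.split("\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        linked = [name for name in entity_names if name in paragraph]
        if linked:
            documents.append({"source": source, "text": paragraph, "entities": linked})
    return documents


def triple_to_document(subject, predicate, obj):
    """Concatenate a knowledge-graph triple into a short indexable text."""
    return {
        "source": "knowledge_graph",
        "text": f"{subject} {predicate} {obj}",
        "entities": [subject],
    }
```

Each returned dictionary is one candidate document; paragraphs that mention no known entity are dropped before indexing, mirroring the filtering step in the list above.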

In this embodiment, taking the large-scale basic education question-answer pairs as a training set in step S23 and training the deep text matching model until it converges further includes the following steps not shown in the figure:

digitizing the test questions and teaching aids, and filtering out web page elements such as HTML tags as well as text unrelated to the knowledge;

selecting multiple-choice questions and fill-in-the-blank questions from the test questions, replacing the blank in each with the most appropriate question word to form a question, and taking the correct answer of the test question as the answer, thereby generating a question-answer pair;

according to the following steps: 3, dividing the question-answer pairs into a training set and a verification set;

inputting the question-answer pairs into the deep text matching model shown in FIG. 2, and training until the model converges;

referring to fig. 2, the deep text matching model includes an Embedding layer, a plurality of intermediate layers and an output layer, the intermediate layers may adopt a multi-layer perceptron or LSTM module, and the output layer finally outputs a confidence level indicating whether the input answer is a correct answer to the input question.

In this embodiment, the question answering and scoring based on the education knowledge graph in step S3 further includes the following steps shown in fig. 1:

S31, performing entity recognition and entity linking on the user question to obtain a list of possible subjects, and assigning a corresponding confidence to each subject according to preset rules.

S32, matching the user question against the template library to obtain a list of possible predicates, and assigning a corresponding confidence to each predicate according to preset rules.

S33, generating SPARQL statements from the obtained subject list and predicate list, querying the knowledge graph to obtain an answer list, and assigning a corresponding confidence to each answer according to preset rules;
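A minimal Python sketch of the query generation in S33 is given below. The graph schema is an assumption: entities are looked up by `rdfs:label`, predicates live under the placeholder namespace `http://example.org/edukg/`, and the function name `build_sparql_queries` is hypothetical.

```python
from itertools import product
from urllib.parse import quote

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def _escape_literal(value):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_sparql_queries(subjects, predicates, base_iri="http://example.org/edukg/"):
    """Combine every (subject, confidence) with every (predicate, confidence).

    Returns a list of (query string, subject confidence, predicate confidence),
    one entry per subject-predicate combination.
    """
    queries = []
    for (subject, s_conf), (predicate, p_conf) in product(subjects, predicates):
        query = (
            "SELECT ?answer WHERE { "
            f'?s <{RDFS_LABEL}> "{_escape_literal(subject)}" . '
            f"?s <{base_iri}{quote(predicate)}> ?answer . "
            "}"
        )
        queries.append((query, s_conf, p_conf))
    return queries
```

Carrying the two confidences alongside each query string lets the later scoring step rate each answer by the subject and predicate that produced it.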

In this embodiment, performing entity recognition and entity linking on the user question in step S31 to obtain a list of possible subjects, and assigning a corresponding confidence to each subject according to preset rules, further includes the following steps not shown in the figure:

performing entity recognition and entity linking on the natural language question input by the user to obtain a list of possible subjects, and assigning a corresponding confidence to each subject according to preset rules; the methods mainly used are instance table matching, template segmentation, synonym thesaurus query, similarity calculation, longest common substring matching, and the like, and priorities are set according to the confidence of each method to obtain a candidate entity set. The rules for setting each priority are as follows:

instance table matching, that is, an exact match with an entity in the knowledge graph, has a confidence of 1;

template segmentation matching obtains the subject through a template; for example, the question "Who is the author of 'Quiet Night Thought'?" is first matched to the template "(? ";

the capture group "Quiet Night Thought" is then obtained through the regular expression, and the subject "Quiet Night Thought" is obtained after stop words are removed; the confidence of this method is 0.8;

synonym thesaurus query, similarity calculation, and longest common substring matching all follow a similar idea, so their confidence is set to 0.6.
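The three confidence levels above can be illustrated with the following Python sketch. Several simplifications are assumptions rather than part of the described method: the instance table is a plain list of names, fuzzy matching is reduced to a longest-common-substring ratio with a hypothetical cutoff `min_ratio`, stop words are removed by whitespace tokenization, and the synonym thesaurus lookup is omitted.

```python
from difflib import SequenceMatcher


def lcs_length(a, b):
    """Length of the longest common contiguous substring of a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def subject_candidates(question, instance_table, captured=None,
                       stop_words=(), min_ratio=0.5):
    """Return (subject, confidence) pairs, highest confidence first.

    1.0: instance name appears verbatim in the question
    0.8: template capture group after stop-word removal
    0.6: fuzzy match via longest common substring (min_ratio is an assumption)
    """
    candidates = {}

    def offer(name, confidence):
        # Keep only the best confidence seen for each candidate.
        if name and confidence > candidates.get(name, 0.0):
            candidates[name] = confidence

    for name in instance_table:
        if name in question:
            offer(name, 1.0)
        elif lcs_length(name, question) / len(name) >= min_ratio:
            offer(name, 0.6)

    if captured:
        kept = [w for w in captured.split() if w.lower() not in stop_words]
        offer(" ".join(kept), 0.8)

    return sorted(candidates.items(), key=lambda item: -item[1])
```

When the same subject is found by more than one method, the sketch keeps only its highest confidence, so an exact instance match is never downgraded by a weaker method.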

In this embodiment, matching the user question against the template library in step S32 to obtain a list of possible predicates, and assigning a corresponding confidence to each predicate according to preset rules, further includes the following steps not shown in the figure:

matching the user question against the template library to obtain a list of possible predicates, and assigning a corresponding confidence to each predicate according to preset rules;

the predicate is determined by matching the question against the templates one by one; if a template matches, the attribute corresponding to that template is taken as the predicate of the question. For example, the question "Habitually, which mountain range is used as the boundary dividing our country into monsoon and non-monsoon regions?" matches the template "(? For border ", and its corresponding attribute is determined to be [boundary line].

The corresponding confidence rules are as follows:

for templates generated directly from a relation (or attribute) in the knowledge graph, and for templates formulated for special types of questions, the confidence is set to 1;

for templates generated from highly discriminative question words (such as "who" and "when"), the confidence is set to 2;

for templates generated from ambiguous phrases or question words (such as "what"), the confidence is set to 3;

In this embodiment, generating SPARQL statements in step S33 from the obtained subject list and predicate list, querying the knowledge graph to obtain an answer list, and assigning a corresponding confidence to each answer according to preset rules further includes the following steps not shown in the figure: generating SPARQL statements from the subject list and predicate list obtained in steps S31 and S32, querying the knowledge graph to obtain an answer list, and assigning a corresponding confidence to each answer according to preset rules. There may be multiple subjects and predicates; when generating query statements, they are combined one by one into triples, a query statement is generated for each combination, and the score of each query statement is determined. For example, the query statement for the question "Habitually, which mountain range is used as the boundary dividing our country into monsoon and non-monsoon regions?" is:

According to the confidences of the entities and predicates obtained in steps S31 and S32 and their respective types, the candidate answers in the candidate answer set are scored and ranked, and the answers that reach a threshold are selected as correct answers. The query results obtained through a template are scored mainly according to the priorities of the subject and the template, and the calculation formula is: score = subjectScore × pscore. pscore is the score of the predicate and is determined by the priority of the template; the specific rule is:

pscore = 1 / template priority;

subjectScore is the score of the subject, formulated as: subjectScore = 20 × rate × subject confidence;

rate is determined by the longest common substring of the subject and the question:

rate = Math.sqrt(length of longest common substring / length of subject) × Math.pow(length of subject, 1.0/2)

In this embodiment, the text-based question answering and scoring in step S4 further includes the following steps shown in fig. 1:

S41, obtaining keywords of different levels in the question according to a preset strategy by a part-of-speech tagging method.

S42, inputting the keywords of different levels obtained in the semantic parsing step into the Elasticsearch engine, and performing coarse-grained matching over the massive index according to a preset query strategy to obtain a coarse-grained answer list.

S43, performing fine-grained matching on the coarse-grained answer list obtained in step S42 with the deep text matching model trained in step S23, obtaining and ranking the answers, and returning the top-ranked answer.

In this embodiment, obtaining the keywords of different levels in the question in step S41 according to a preset strategy by the part-of-speech tagging method further includes the following steps not shown in the figure:

first, performing word segmentation and part-of-speech tagging on the question input by the user to obtain the part of speech of each word;

adding each word in the question to the list of the corresponding keyword level, using the keyword level of each part of speech shown in Table 1;
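The level assignment above can be sketched in Python as follows. Because Table 1 is not reproduced in the text, the tag sets are taken from the part-of-speech list given earlier in the disclosure; the tag `n` for nouns is an assumption, the input is assumed to be already segmented and tagged, and the function name `tier_keywords` is hypothetical.

```python
# Tag sets follow the earlier part-of-speech list; "n" for noun is an assumption.
PRIMARY_TAGS = {"n", "v", "nr"}      # nouns, verbs, person names
SECONDARY_TAGS = {"d", "m", "Ng"}    # adverbs, numerals, noun morphemes


def tier_keywords(tagged_words):
    """Group (word, tag) pairs into keyword levels 1, 2 and 3.

    Any tag outside the primary and secondary sets (for example c, Dg, e, f)
    falls into level 3, which is not queried later.
    """
    tiers = {1: [], 2: [], 3: []}
    for word, tag in tagged_words:
        if tag in PRIMARY_TAGS:
            tiers[1].append(word)
        elif tag in SECONDARY_TAGS:
            tiers[2].append(word)
        else:
            tiers[3].append(word)
    return tiers
```

The output of a segmenter and tagger would be passed in as a list of (word, tag) pairs; the sketch deliberately leaves the choice of tagger open.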

In this embodiment, inputting the keywords of different levels obtained in the semantic parsing step into the Elasticsearch engine in step S42, and performing coarse-grained matching over the massive index according to a preset query strategy to obtain a coarse-grained answer list, further includes the following steps not shown in the figure:

issuing a strict phrase query for each primary keyword, logically combining all of these phrase queries, and requiring that at least 50% of them match;

issuing a strict phrase query for each secondary keyword, logically combining all of these phrase queries, and setting no minimum number of matches;

issuing no query for the tertiary keywords;

Elasticsearch returns the candidate answers and a confidence score for each candidate answer according to this strategy;
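One way to express this strategy is as an Elasticsearch bool query body, sketched below as a plain Python dictionary. This is an assumed mapping of the described strategy onto the query DSL (`match_phrase`, `should`, `minimum_should_match`), not the query used in the embodiment; the field name `text`, the default `size`, and the function name `build_coarse_query` are hypothetical.

```python
def build_coarse_query(primary, secondary, field="text", size=10):
    """Build a query body: phrase queries on primary keywords with at least
    50% required to match, optional phrase queries on secondary keywords."""
    primary_clauses = [{"match_phrase": {field: kw}} for kw in primary]
    secondary_clauses = [{"match_phrase": {field: kw}} for kw in secondary]

    bool_query = {"should": secondary_clauses}
    if primary_clauses:
        # Nested bool: at least half of the primary phrase queries must match.
        bool_query["must"] = [{
            "bool": {"should": primary_clauses, "minimum_should_match": "50%"}
        }]
    return {"size": size, "query": {"bool": bool_query}}
```

Because the secondary clauses sit in `should` next to a `must` clause, they only raise the relevance score of a hit and never exclude it, which corresponds to setting no minimum number of matches for them.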

In this embodiment, performing fine-grained matching in step S43 on the coarse-grained answer list obtained in the preceding step with the deep text matching model trained in step S23, obtaining and ranking the answers, and returning the top-ranked answer further includes the following steps not shown in the figure:

taking the 10 candidate answers with the highest confidence scores among those obtained in step S42;

inputting each answer together with the question into the deep text matching model trained in S23 to obtain a confidence score for each answer;

selecting the answer with the highest confidence score and returning it to the user.
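The fine-grained re-ranking can be sketched in Python with the trained model abstracted as a scoring function. The `token_overlap` scorer below is only a toy stand-in for the deep text matching model, and both function names are hypothetical.

```python
def rerank(question, candidates, match_score, top_n=10):
    """Keep the top_n candidates by coarse score, then re-rank them.

    candidates: list of (answer text, coarse score) pairs from the search engine.
    match_score: callable (question, answer) -> float, standing in for the
                 trained deep text matching model.
    """
    shortlist = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_n]
    rescored = [(text, match_score(question, text)) for text, _ in shortlist]
    return sorted(rescored, key=lambda c: c[1], reverse=True)


def token_overlap(question, answer):
    """Toy scorer (Jaccard overlap of lowercase tokens); NOT the trained model."""
    q, a = set(question.lower().split()), set(answer.lower().split())
    return len(q & a) / len(q | a) if (q | a) else 0.0
```

For instance, `rerank(question, candidates, token_overlap)[0][0]` gives the answer this toy scorer prefers; in the described system the scorer would be the model trained in S23.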

In this embodiment, the answer selection in step S5, which draws on both the educational knowledge graph and the text, further includes the following steps not shown in the figure:

sorting the knowledge-graph-based answers by score;

sorting the text-based answers by score;

if the highest-scoring knowledge-graph-based answer exceeds a preset threshold, returning that answer;

if the highest-scoring knowledge-graph-based answer does not exceed the preset threshold, returning the highest-scoring text-based answer.
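The selection rule of step S5 amounts to a thresholded fallback, sketched below. The function name `select_answer`, the (text, score) pair representation and the threshold value used in the example are assumptions; the description only states that the threshold is preset.

```python
def select_answer(kg_answers, text_answers, threshold):
    """Choose between knowledge-graph-based and text-based answers.

    Both inputs are lists of (answer_text, score) pairs.
    """
    kg_ranked = sorted(kg_answers, key=lambda a: a[1], reverse=True)
    text_ranked = sorted(text_answers, key=lambda a: a[1], reverse=True)
    # Prefer the knowledge-graph answer only when its best score
    # strictly exceeds the preset threshold.
    if kg_ranked and kg_ranked[0][1] > threshold:
        return kg_ranked[0][0]
    # Otherwise fall back to the best text-based answer, if any.
    return text_ranked[0][0] if text_ranked else None


kg = [("answer from graph", 0.9), ("weaker graph answer", 0.4)]
text = [("answer from text", 0.7)]
chosen_high = select_answer(kg, text, threshold=0.8)   # graph answer is trusted
chosen_low = select_answer(kg, text, threshold=0.95)   # falls back to text
```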

The system is a hybrid automatic question-answering system built on a basic education knowledge graph and a large body of electronic text. The basic education knowledge graph comprises 2200 million triples, 162 million instances, 1000 concepts and 4000 attributes. Its knowledge sources comprise an annotation library and an external source library: the annotation library is obtained by annotating knowledge points in teaching materials, and the external source library is extracted from encyclopedias and Internet data. The graph covers essentially all knowledge points of the nine subjects taught in primary and secondary school. The electronic text mainly comprises 1300 basic education textbooks from China's major basic education publishers and 10011 electronic extracurricular reading books.

In the preparatory work, a large number of test questions were obtained by digitizing existing teaching materials and supplementary test papers, and a large number were also collected from the Internet. The question types mainly include fill-in-the-blank, multiple-choice, reading comprehension and composition questions. Such questions cannot be parsed directly by the KB-QA system, so they were sampled and rewritten into questions that the system can parse. For example, the fill-in-the-blank item "The world land-to-sea ratio is about ( )" is converted to "About how much is the world land-to-sea ratio?".

The details of the questions for each subject after rule-based conversion are shown in Table 3.

Table 3 provides statistical information on the test cases for the nine subjects in the basic education field according to the embodiment of the present invention.

With answer accuracy as the evaluation index, test cases were designed separately for each subject: the questions in each subject's question bank were input into the question-answering system, and the answers it returned were recorded. The subjects are Chinese, mathematics, English, physics, chemistry, history, geography, biology and politics; a total of 9020 test cases were designed, and the test results are shown in Table 4.

Table 4 provides the test results of the test cases for the nine subjects in the basic education field according to the embodiment of the present invention.

Subject	Total test cases	Executed test cases	Correct cases	Incorrect cases	Accuracy
Chinese	1007	1007	787	220	78.15%
Mathematics	926	926	862	64	93.09%
English	1033	1033	887	146	85.87%
Physics	1000	1000	911	89	88.40%
Chemistry	1001	1001	897	104	89.61%
History	1040	1040	904	136	83.17%
Geography	1017	1017	739	278	72.66%
Biology	1000	1000	860	140	85.5%
Politics	996	996	885	111	88.86%
Total	9020	9020	7732	1288	85.72%
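As a small worked check of the evaluation index, the sketch below assumes that accuracy is the number of correct cases divided by the number of executed cases, expressed as a percentage rounded to two decimals. The function name `accuracy_percent` is hypothetical; the input figures are taken from the Total and Chinese rows of Table 4.

```python
def accuracy_percent(correct, executed):
    """Accuracy as a percentage, rounded to two decimal places."""
    return round(100.0 * correct / executed, 2)


overall = accuracy_percent(7732, 9020)   # Total row of Table 4
chinese = accuracy_percent(787, 1007)    # Chinese row of Table 4
```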

Example:

In the politics subject, consider the question "What is the meaning of enterprise merger?". Because the knowledge graph contains the entity "enterprise merger" and this entity has the attribute "meaning", the KB-QA method can be applied directly to obtain the accurate answer "the economic phenomenon in which a well-managed, economically successful dominant enterprise merges with enterprises at a relative disadvantage". For the question "Which state organ has the highest status in our country?", however, the knowledge graph lacks the relevant entities and relations, so the answer is obtained through the retrieval, screening and matching of IE-QA: "The National People's Congress holds the highest position among China's state organs; the other central state organs are created by it, are responsible to it, and are supervised by it".

The present invention is not limited to the above embodiments, and any changes or substitutions that can be easily made by those skilled in the art within the technical scope of the present invention are also within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A mixed automatic question-answering method based on an educational knowledge graph and a text is characterized by comprising the following steps:

step 5: returning the answer based on the educational knowledge graph if the highest confidence of that answer exceeds a threshold; otherwise, returning the top-ranked answer based on the text;

wherein assigning each answer a corresponding confidence specifically comprises:

querying the educational knowledge graph to obtain an answer list;

the subject score is determined by the subject confidence, where subject score = 20 × rate × subject confidence;

2. The automatic question-answering method according to claim 1, wherein the basic education ontology is constructed by a semi-automatic ontology construction method.

3. The automatic question-answering method according to claim 1, wherein the information extraction is used to supplement the instances, relations and attributes of the knowledge.

4. The automatic question-answering method according to claim 1, characterized in that building the general templates of questions specifically comprises:

forming general question templates by taking the relations or attributes in the educational knowledge graph as keywords and combining them with regular-expression syntax;

analyzing the questions in a large-scale educational question-answer data set with a syntactic analysis tool, extracting keywords, and forming general question templates by combining the keywords with regular-expression syntax;

generating templates based on common question words.

5. The automatic question-answering method according to claim 1, wherein the full-text search engine is Elasticsearch, a scalable open-source full-text search and analytics engine.

6. The automatic question-answering method according to claim 1, characterized in that said assigning to each subject a respective confidence level specifically comprises:

7. The automatic question-answering method according to claim 1, wherein assigning each predicate a corresponding confidence specifically comprises:

8. The automatic question-answering method according to claim 1, wherein the part-of-speech tagging method specifically comprises:

9. The automatic question-answering method according to claim 1, wherein the coarse-grained matching specifically comprises:

no query is made for the third level keywords.