CN104794168B

CN104794168B - Knowledge point association method and system

Info

Publication number: CN104794168B
Application number: CN201510145575.7A
Authority: CN
Inventors: Yang Shuo (杨硕); Gao Fei (高飞); Feng Yansong (冯岩松); Jia Aixia (贾爱霞); Zhao Dongyan (赵东岩); Lu Zuowei (卢作伟); Wang Dong (王冬)
Original assignee: MAINBO EDUCATION TECHNOLOGY Co Ltd; Peking University
Current assignee: MAINBO EDUCATION TECHNOLOGY Co Ltd; Peking University
Priority date: 2015-03-30
Filing date: 2015-03-30
Publication date: 2018-06-05
Anticipated expiration: 2035-03-30
Also published as: CN104794168A

Abstract

The invention discloses a knowledge point association method and system in the field of Internet data mining. The method comprises: obtaining subject terms to be added to an existing knowledge system structure; determining the position of each such term in the structure according to its similarity with the subject terms already in the existing knowledge system structure of the field, thereby improving that structure; obtaining the subject terms in a text from which knowledge points are to be extracted and calculating the importance of each; and finally, according to the importance of the subject terms and their positions in the existing knowledge system structure, calculating the weight of each node position in the structure and taking the subject term at the node position with the largest weight as the knowledge point of the text. The method and system continuously improve the existing knowledge system structure and can match the most relevant knowledge points for a user, so that resources related to those knowledge points can be recommended to the user, improving the user experience.

Description

Knowledge point association method and system

Technical field

The present invention relates to data mining in Internet technology, and in particular to a knowledge point association method and system based on a knowledge system structure.

Background

With the continuous development of Internet technology, online education systems have become widespread. Online education gives learners more opportunities and lets users find content that interests them. To some extent it addresses problems of traditional education such as rigidity, uniform content, uneven distribution of educational resources and lack of openness; it provides a learning platform for more people and plays a positive role in the development of the education sector. Learners have the following need: if the knowledge points in a piece of learning material can be identified, academic papers or other educational resources can be recommended on the basis of those knowledge points, which clarifies learning objectives and greatly improves learning efficiency. Recommending academic resources related to the knowledge points mentioned in a text is therefore clearly valuable work.

At present, online education providers mostly identify knowledge points by manual annotation, having experienced teachers mark the subject knowledge points in a text. This approach has the following main drawbacks:

(1) There are many subject categories and each subject has a very large volume of educational resources, so manual annotation wastes a great deal of labor and time.

(2) Manual annotation inevitably misses some knowledge points. Moreover, experience shows that even among veteran teachers the consistency of the annotated knowledge points is poor and the annotations are highly subjective.

(3) Manual annotation responds slowly to new subjects and new knowledge, and it does not scale.

The present invention addresses the problem that existing online digital education lacks a unified knowledge system and relies on large amounts of manual annotation to associate textbooks with supplementary teaching resources. It studies techniques, based on the knowledge system structure of existing digital teaching, for automatically associating knowledge points with large-scale textbooks and supplementary teaching resources, with the ultimate aim of building a basic-education knowledge application framework for education informatization and recommending related knowledge points to users according to the content of a text.

Summary of the invention

In view of the drawbacks of the prior art, an object of the present invention is to provide a knowledge point association method and system with which the knowledge points in the text a user is reading can be matched quickly and accurately.

To achieve the above object, the present invention adopts the following technical solution:

A knowledge point association method, comprising the following steps:

(1) obtaining subject terms to be added to the existing knowledge system structure of a given field; determining, according to the similarity between each subject term to be added and the subject terms in the existing knowledge system structure of the field, the node position of the subject term to be added in the existing structure; and thereby improving the existing knowledge system structure;

(2) obtaining the subject terms in a text from which knowledge points are to be extracted, and calculating the importance of each obtained subject term;

(3) looking up the node position of each subject term in the text within the existing knowledge system structure of the text's field; calculating, according to the importance of the subject terms in the text and their node positions in the existing structure, the weight of each node position in the structure; and determining the subject term at the node position with the largest weight as the knowledge point of the text.

Further, in the method described above, determining in step (1) the node position of a subject term to be added in the existing knowledge system structure means determining, within the existing structure of the field to which the term belongs, the position of the term's parent node: if the similarity between the subject term to be added and some subject term in the existing structure exceeds a set threshold, that subject term is determined to be the parent node of the subject term to be added.
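The threshold rule above can be sketched as follows. This is a minimal Python sketch, not the patented implementation: the similarity function and threshold are supplied by the caller, and picking the single highest-scoring candidate when several exceed the threshold is an assumption (the text only requires that the similarity exceed the threshold).

```python
from typing import Callable, Iterable, Optional


def choose_parent(
    term: str,
    candidates: Iterable[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> Optional[str]:
    """Return the existing term that should become the parent of `term`,
    or None when no candidate's similarity strictly exceeds `threshold`."""
    best_term: Optional[str] = None
    best_score = threshold  # a candidate must score strictly above this
    for candidate in candidates:
        score = similarity(term, candidate)
        if score > best_score:
            best_term, best_score = candidate, score
    return best_term
```

Returning None corresponds to the case where the term is too weakly related to the existing structure to be added.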

Further, in step (1) of the method described above, the similarity between subject terms comprises a characterization similarity and a semantic similarity; the characterization similarity is the similarity in the composition (surface form) of the subject terms;

The similarity fatherProb(x, y) between a subject term x and a subject term y in the existing knowledge system structure of its field is calculated as:

fatherProb(x, y) = α_y × editSimi(x, y) + β_y × semanticFatherSimi(x, y)

where editSimi(x, y) is the characterization similarity between subject terms x and y, semanticFatherSimi(x, y) is the average semantic similarity between subject term x and the subject terms of all child nodes of subject term y, α_y is the weight of the characterization similarity, and β_y is the weight of the average semantic similarity;

The characterization similarity between subject terms is calculated from the edit distance between them; the characterization similarity editSimi(x, y) between subject terms x and y is calculated as:

where editDistance(x, y) denotes the edit distance between subject terms x and y, and length(y) denotes the word length of subject term y;

The average semantic similarity semanticFatherSimi(x, y) between subject term x and the subject terms of all child nodes of subject term y is calculated as:

semanticFatherSimi(x, y) = (1 / |sonSet(y)|) × Σ_{z ∈ sonSet(y)} semanticSimi(x, z)

semanticSimi(x, z) = vector(x) · vector(z)

where sonSet(y) is the set of all child nodes of subject term y, |sonSet(y)| is the number of child nodes of y, semanticSimi(x, z) is the semantic similarity between subject terms x and z, vector(x) and vector(z) are the word vectors of x and z, and vector(x) · vector(z) is the dot product of the two word vectors.
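The two similarity components can be sketched in Python as below. The Levenshtein routine is standard and semanticFatherSimi follows the averaging defined above, but the normalization inside `edit_simi` is an assumption: the text names editDistance(x, y) and length(y) without reproducing the exact formula, so `1 - editDistance / length(y)` (clipped at 0) is only a hypothetical stand-in.

```python
from typing import Dict, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def edit_simi(x: str, y: str) -> float:
    """HYPOTHETICAL normalization: max(0, 1 - editDistance(x, y) / length(y))."""
    if not y:
        return 0.0
    return max(0.0, 1.0 - edit_distance(x, y) / len(y))


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def semantic_father_simi(x: str, children: Sequence[str],
                         vectors: Dict[str, Sequence[float]]) -> float:
    """Average of semanticSimi(x, z) = vector(x) . vector(z) over z in sonSet(y)."""
    if not children:
        return 0.0  # average undefined for a leaf y; returning 0 is an assumption
    return sum(dot(vectors[x], vectors[z]) for z in children) / len(children)
```

Working at the character level means the same routine handles Chinese terms (one unit per character) and English terms (one unit per letter).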

Further, in the method described above, the weight of the characterization similarity and the weight of the average semantic similarity are calculated as:

β_y = 1 - α_y;

where b is the coefficient used to calculate the characterization-similarity weight, b > 0.
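The text gives β_y = 1 - α_y and states (in the detailed description) that α_y shrinks as y gains child nodes and as b grows, but the formula for α_y itself is not reproduced. The sketch below therefore uses a hypothetical schedule, α_y = 1 / (1 + b·|sonSet(y)|), chosen only because it has those two properties and gives α_y = 1 for a leaf y (so the undefined child average gets zero weight). It is not the patent's formula.

```python
def alpha(num_children: int, b: float) -> float:
    """HYPOTHETICAL alpha_y = 1 / (1 + b * |sonSet(y)|), with b > 0."""
    if b <= 0:
        raise ValueError("b must be positive")
    return 1.0 / (1.0 + b * num_children)


def father_prob(edit_simi: float, semantic_father_simi: float,
                num_children: int, b: float) -> float:
    """fatherProb(x, y) = alpha_y * editSimi + beta_y * semanticFatherSimi."""
    a = alpha(num_children, b)
    beta = 1.0 - a  # beta_y = 1 - alpha_y, as stated in the text
    return a * edit_simi + beta * semantic_father_simi
```

Under this schedule a freshly created node is matched almost purely by surface form, and semantic evidence takes over as its child set grows.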

Further, in step (3) of the method described above, the weight Value(i) of node i in the existing knowledge system structure is calculated as:

Value (i)=I_i+αI_i1+α²I_i2+…+α^kI_ik

where I_i is the importance of the subject term at node i, I_ik is the importance of the subject terms at the k-th-level child node positions of node i, and α is the importance coefficient.
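Read literally, Value(i) sums the importance found at each level below node i, discounted by α per level. A minimal Python sketch of that definition follows; summing over all descendants at the same depth, and giving importance 0 to terms absent from the text, are assumptions consistent with the upward-propagation description later in the document.

```python
from typing import Dict, List


def node_value(node: str, children: Dict[str, List[str]],
               importance: Dict[str, float], alpha: float) -> float:
    """Value(i) = I_i + alpha*I_i1 + alpha^2*I_i2 + ..., where I_ik sums the
    importance of every descendant exactly k levels below node i."""
    total = 0.0
    level, k = [node], 0
    while level:
        total += (alpha ** k) * sum(importance.get(n, 0.0) for n in level)
        level = [c for n in level for c in children.get(n, [])]  # next depth
        k += 1
    return total
```

The breadth-first walk keeps the depth k explicit, which makes the α^k discount easy to check against the formula.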

Further, in step (3) of the method described above, when calculating the weight of each node position in the existing knowledge system structure, only the weights of the node positions corresponding to the subject terms in the text and of the multi-level parent node positions of those nodes are calculated.

To achieve the above object, an embodiment of the present invention further provides a knowledge point association system, comprising:

a subject term registration module, configured to obtain subject terms to be added to the existing knowledge system structure of a given field, determine the node position of each subject term to be added in the existing structure according to the similarity between it and the subject terms in the existing knowledge system structure of the field, and thereby improve the existing knowledge system structure;

a subject term importance calculation module, configured to obtain the subject terms in a text from which knowledge points are to be extracted and calculate the importance of each obtained subject term;

a knowledge point determination module, configured to look up the node position of each subject term in the text within the existing knowledge system structure of the text's field, calculate the weight of each node position in the structure according to the importance of the subject terms in the text and their node positions in the existing structure, and determine the subject term at the node position with the largest weight as the knowledge point of the text.

Further, in the subject term registration module of the system described above, determining the node position of a subject term to be added in the existing knowledge system structure means determining, within the existing structure of the field to which the term belongs, the position of the term's parent node: if the similarity between the subject term to be added and some subject term in the existing structure exceeds a set threshold, that subject term is determined to be the parent node of the subject term to be added.

Further, in the system described above, the similarity between subject terms comprises a characterization similarity and a semantic similarity; the characterization similarity is the similarity in the composition of the subject terms;

The subject term registration module comprises a characterization similarity calculation unit, a semantic similarity calculation unit and a similarity calculation unit, wherein:

the characterization similarity calculation unit is configured to calculate the characterization similarity editSimi(x, y) between a subject term x and a subject term y in the existing knowledge system structure of its field; it computes the characterization similarity from the edit distance between the two subject terms, as:

the semantic similarity calculation unit is configured to calculate the average semantic similarity semanticFatherSimi(x, y) between subject term x and the subject terms of all child nodes of subject term y in the existing knowledge system structure of its field, as:

semanticFatherSimi(x, y) = (1 / |sonSet(y)|) × Σ_{z ∈ sonSet(y)} semanticSimi(x, z)

semanticSimi(x, z) = vector(x) · vector(z)

where sonSet(y) is the set of all child nodes of subject term y, |sonSet(y)| is the number of child nodes of y, semanticSimi(x, z) is the semantic similarity between subject terms x and z, vector(x) and vector(z) are the word vectors of x and z, and vector(x) · vector(z) is the dot product of the two word vectors;

the similarity calculation unit is configured to calculate the similarity fatherProb(x, y) between subject term x and subject term y in the existing knowledge system structure of its field, as:

fatherProb(x, y) = α_y × editSimi(x, y) + β_y × semanticFatherSimi(x, y)

where α_y is the weight of the characterization similarity and β_y is the weight of the average semantic similarity.

Further, in the system described above, the weight of the characterization similarity and the weight of the average semantic similarity are calculated as:

β_y = 1 - α_y;

where b is the coefficient used to calculate the characterization-similarity weight, b > 0.

Further, in the knowledge point determination module of the system described above, the weight Value(i) of node i in the existing knowledge system structure is calculated as:

Value (i)=I_i+αI_i1+α²I_i2+…+α^kI_ik

Further, in the knowledge point determination module of the system described above, when calculating the weight of each node position in the existing knowledge system structure, only the weights of the node positions corresponding to the subject terms in the text and of the multi-level parent node positions of those nodes are calculated.

The beneficial effects of the present invention are as follows. The method and system use the similarity between a subject term and the subject terms in the existing knowledge system structure of its field to find the most likely position of that term in the structure and add it there, so that the existing knowledge system structure is continuously improved. For the text a user is reading, the subject terms in the text are mapped into the existing structure and the most relevant knowledge points are matched for the user, so that resources related to those knowledge points can be recommended, improving the user experience. The method and system overcome the limitations of existing approaches that build knowledge system structures purely by hand.

Description of the drawings

Fig. 1 is a flowchart of a knowledge point association method according to a specific embodiment of the present invention;

Fig. 2 is a structural block diagram of a knowledge point association system according to a specific embodiment of the present invention;

Fig. 3 is a schematic diagram of an existing knowledge system structure in an embodiment of the present invention;

Fig. 4 is a schematic diagram of the existing knowledge system structure of Fig. 3 after further improvement.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Fig. 1 shows the flowchart of a knowledge point association method according to an embodiment of the present invention. As can be seen from the figure, the method may comprise the following steps:

Step S100: determine the node position of each subject term to be added in the existing knowledge system structure, add the term to the structure, and thereby improve the existing knowledge system structure;

Obtain the subject terms to be added to the existing knowledge system structure of a given field; according to the similarity between each subject term to be added and the subject terms in the existing knowledge system structure of the field, determine the node position of the term in the existing structure, thereby improving the structure. In this embodiment, the existing knowledge system structure of a field is a knowledge tree composed of the subject terms of that field and reflecting the hierarchical relationships among them; each node of the knowledge tree corresponds to one subject term.
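One possible in-memory representation of such a knowledge tree is sketched below in Python. The class and method names are illustrative assumptions, not part of the source; the tree built in the test mirrors the Fig. 3 example described in the embodiment (grammar knowledge, then noun clause, then subject, predicative and appositive clause).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class TermNode:
    term: str
    parent: Optional["TermNode"] = field(default=None, repr=False)  # avoid repr cycles
    children: List["TermNode"] = field(default_factory=list)


class KnowledgeTree:
    """One node per subject term; parent/child links encode the hierarchy."""

    def __init__(self, root_term: str) -> None:
        self.root = TermNode(root_term)
        self.nodes: Dict[str, TermNode] = {root_term: self.root}

    def add(self, term: str, parent_term: str) -> TermNode:
        if term in self.nodes:
            raise ValueError(f"{term!r} is already in the tree")
        parent = self.nodes[parent_term]
        node = TermNode(term, parent)
        parent.children.append(node)
        self.nodes[term] = node
        return node

    def children_of(self, term: str) -> List[str]:
        """sonSet(term) as a list of subject terms."""
        return [c.term for c in self.nodes[term].children]

    def ancestors(self, term: str) -> List[str]:
        """Parent first, then grandparent, and so on up to the root."""
        out: List[str] = []
        node = self.nodes[term].parent
        while node is not None:
            out.append(node.term)
            node = node.parent
        return out
```

Keeping a term-to-node dictionary alongside the tree gives constant-time lookup of a term's position, which both the registration step and the knowledge point step rely on.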

It should be noted that the subject terms in this embodiment include, but are not limited to, the technical terms of each discipline, and may also include keywords and key phrases of the knowledge points of each discipline; that is, the exact criteria for what counts as a subject term can be set by the user according to application requirements. The subject terms to be added may be terms annotated manually in text, or may be obtained by running a word segmenter over a large amount of text from the corresponding discipline.

Determining the node position of a subject term to be added in the existing knowledge system structure means determining, within the existing structure of the field to which the term belongs, the position of the term's parent node: if the similarity between the subject term to be added and some subject term in the existing structure exceeds a set threshold, that subject term is determined to be the parent node of the subject term to be added; that is, the subject term to be added is placed as a child node of that subject term.

In present embodiment, the similarity between subject term includes the characterization similarity and semantic phase between subject term Like degree；The characterization similarity refers to the similarity in the composition of subject term, that is, the phase that subject term surface is formal Like degree.

In this embodiment, the similarity fatherProb(x, y) between a subject term x and a subject term y in the existing knowledge system structure of its field is calculated as:

fatherProb(x, y) = α_y × editSimi(x, y) + β_y × semanticFatherSimi(x, y)

where editDistance(x, y) denotes the edit distance between subject terms x and y, and length(y) denotes the word length of subject term y.

semanticFatherSimi(x, y) = (1 / |sonSet(y)|) × Σ_{z ∈ sonSet(y)} semanticSimi(x, z)

semanticSimi(x, z) = vector(x) · vector(z)

In this embodiment, the semantic similarity between two subject terms is measured by the cosine similarity of their word vectors, i.e. semanticSimi(x, z) = vector(x) · vector(z) (the dot product equals the cosine similarity when the word vectors are normalized to unit length). Therefore, before semantic similarities are computed, each subject term is first converted into a word vector with a word2vec (word-to-vector) model; the vector dimension can be chosen as needed, for example 100. Converting words into word vectors is prior art; in this embodiment, the word2vec model described in the 2013 paper "Efficient Estimation of Word Representations in Vector Space" by Tomas Mikolov, Kai Chen, Greg Corrado and Jeffrey Dean can be used to convert words into word vectors.
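The equivalence between the dot product and cosine similarity only holds for unit-length vectors, so a sketch of semanticSimi normalizes first. This is a minimal Python illustration: in practice the vectors would come from a trained word2vec model, and the toy 2-D vectors in the test merely stand in for them.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        raise ValueError("a zero vector has no direction")
    return [a / norm for a in v]


def semantic_simi(vx: Sequence[float], vz: Sequence[float]) -> float:
    """Cosine similarity computed as the dot product of unit-normalized vectors."""
    ux, uz = normalize(vx), normalize(vz)
    return sum(a * b for a, b in zip(ux, uz))
```

If the stored word vectors are normalized once when the model is loaded, each later semanticSimi call reduces to a plain dot product, as the formula in the text is written.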

In the existing knowledge system structure, subject terms of the same category are likely to appear at the same level of the knowledge tree. Therefore, in this embodiment, when judging how likely subject term x is to be a child node of subject term y, the average semantic similarity semanticFatherSimi(x, y) between x and the child nodes of y is used to measure their semantic similarity. Compared with computing the semantic similarity between x and y alone, this more fully accounts for the relationship between the subject term to be added and the subject terms of the same category in the knowledge structure, and gives higher accuracy.

In addition, in this embodiment, α_y and β_y are the weights of the two similarity components and satisfy α_y + β_y = 1. In the initial stage only a small amount of knowledge system structure information is available (the existing structure contains few subject terms, its structure is simple, and the data volume is small), so data sparsity is a problem. Therefore, when node y has few child nodes, more weight is given to the surface-level similarity between x and y (the characterization similarity); as the number of child nodes of y increases, the weight of the semantic similarity increases. α_y and β_y are accordingly calculated as:

β_y = 1 - α_y;

where b is the coefficient used to calculate the characterization-similarity weight, b > 0. The larger b is, the smaller the weight given to edit distance (i.e. the smaller the characterization-similarity weight), and vice versa. In practice, the specific value of b is chosen according to how important edit distance is for the application.

This self-adjusting parameter not only mitigates the sparsity of the knowledge system structure in the initial stage but also adapts better as the structure grows.

When the similarity fatherProb(x, y) between subject term x and subject term y in the existing knowledge system structure of its field exceeds the set threshold, the subject term x to be added is judged to be a child node of subject term y; that is, x is placed under y in the existing structure and added to it, improving the structure. If the similarity between x and every subject term in the existing structure of its field is less than or equal to the threshold, x is too weakly related to the existing structure and is not added to it.

With the approach provided in this embodiment, the similarity between subject terms is judged using word vectors and an edit distance model, the most likely parent of the current node is found, and the current node is added to the knowledge tree. This is repeated until the remaining subject terms are unrelated to the current field, so that the existing knowledge tree is continuously improved and enriched.
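The repeat-until-unrelated loop can be sketched as a fixed-point iteration in Python. Running repeated passes, so that a term attached in one pass can serve as a parent for another term in the next, is an assumption about how "repeated until the remaining terms are unrelated" is realized; the similarity function and the toy scores in the test are hypothetical.

```python
from typing import Callable, Dict, List, Optional


def enrich(parent_of: Dict[str, Optional[str]], pending: List[str],
           similarity: Callable[[str, str], float], threshold: float) -> List[str]:
    """Attach pending terms under their best-scoring existing node until no
    further term can be attached. parent_of maps every term already in the
    tree to its parent (root -> None) and is updated in place.
    Returns the terms judged unrelated to the field."""
    remaining = list(pending)
    changed = True
    while changed and remaining:
        changed = False
        for term in list(remaining):
            best, best_score = None, threshold
            for existing in parent_of:
                score = similarity(term, existing)
                if score > best_score:
                    best, best_score = existing, score
            if best is not None:
                parent_of[term] = best  # the new node may host later terms
                remaining.remove(term)
                changed = True
    return remaining
```

Stopping when a full pass attaches nothing guarantees termination, since each productive pass removes at least one pending term.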

Step S200: obtain the subject terms in the text from which knowledge points are to be extracted, and calculate the importance of each obtained subject term;

For a piece of text the user is reading, the subject terms occurring in it are found and their importance is calculated; the importance measure takes into account both the frequency of the term in the text the user is reading and its inverse document frequency. The subject terms occurring in the text can be found by string matching, comparing the text against the subject terms in the existing subject term database of the text's field. Let K = {k₁, k₂, k₃, …, k_m} be the set of subject terms occurring in the text the user is reading, where m is the number of subject terms. In this embodiment, the importance of each subject term is measured with the TF-IDF (term frequency-inverse document frequency) formula; specifically, the importance of a subject term k_i is calculated as follows:

where TF(k_i) is the frequency of k_i in the text, IDF(k_i) is the inverse document frequency of k_i, document_j is the j-th document in the system knowledge base, N is the total number of documents, corpus denotes the text the user is currently reading, count(k_i, corpus) is the number of occurrences of k_i in the text, Σ_{j=1}^{N} count(k_i, document_j) is the total number of occurrences of k_i in the N documents of the system knowledge base, and |{j : k_i ∈ document_j}| is the number of documents in the system knowledge base that contain k_i.
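A minimal Python sketch of step S200 follows: plain substring matching against a subject term lexicon, then a TF-IDF score. Because the exact TF and IDF formulas are not reproduced in the text, the sketch uses an assumed textbook variant (TF as the term's share of matched term occurrences in the text, smoothed IDF = log((1 + N) / (1 + document frequency))); it does not use the knowledge-base total-occurrence quantity the text also defines. Substring matching also counts a shorter term inside a longer one; a production matcher would likely prefer longest matches.

```python
import math
from typing import Dict, Iterable, List


def find_terms(text: str, lexicon: Iterable[str]) -> Dict[str, int]:
    """Count non-overlapping occurrences of each lexicon term in text."""
    return {t: text.count(t) for t in lexicon if t and t in text}


def importance(text: str, lexicon: Iterable[str],
               documents: List[str]) -> Dict[str, float]:
    """ASSUMED TF-IDF variant, not the patent's exact formula."""
    counts = find_terms(text, lexicon)
    total = sum(counts.values())
    n_docs = len(documents)
    scores: Dict[str, float] = {}
    for term, c in counts.items():
        df = sum(1 for d in documents if term in d)  # documents containing term
        tf = c / total
        idf = math.log((1 + n_docs) / (1 + df))      # smoothed, never negative
        scores[term] = tf * idf
    return scores
```

The smoothing keeps IDF non-negative even for a term that appears in every knowledge-base document.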

Step S300: according to the importance of the subject terms in the text and their node positions in the existing knowledge system structure, calculate the weight of each node position in the structure, and determine the knowledge point of the text according to the weights.

After the subject terms in the text have been found in step S200, the node positions of these terms in the existing knowledge system structure of the text's field are determined. According to the importance of the subject terms in the text and their node positions in the existing structure, the weight of each node position in the structure is calculated, and the subject term at the node position with the largest weight is determined as the knowledge point of the text. The weight reflects how important that subject term is in the text the user is reading.

In this embodiment, the weight Value(i) of node i in the existing knowledge system structure is calculated as:

Value (i)=I_i+αI_i1+α²I_i2+…+α^kI_ik

In this embodiment, node weights are computed by upward reduction over the tree. In the actual computation it is not necessary to calculate the weights of all nodes in the structure; only the nodes of the subject terms occurring in the text and their multi-level parent nodes are needed. The weight of the current node i increases by the importance I_i of its subject term; recursing upward from the current node, the weight of each parent node of i increases by α^k·I_i, where k is the distance (level difference) between node i and that parent node.
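The upward reduction described above can be sketched in Python as follows. Each matched term adds I_i to its own node and α^k·I_i to the ancestor k levels up, so only matched nodes and their ancestors ever receive a weight. Skipping terms that are not in the tree is an assumption; the tree and importance values in the test are toy data.

```python
from typing import Dict, Optional, Tuple


def best_knowledge_point(parent_of: Dict[str, Optional[str]],
                         importance: Dict[str, float],
                         alpha: float) -> Tuple[str, Dict[str, float]]:
    """Return the node with the largest Value(i) and all computed weights.
    parent_of maps each term in the tree to its parent (root -> None)."""
    value: Dict[str, float] = {}
    for term, imp in importance.items():
        if term not in parent_of:
            continue  # term has no position in the existing structure
        node: Optional[str] = term
        k = 0
        while node is not None:
            value[node] = value.get(node, 0.0) + (alpha ** k) * imp
            node = parent_of[node]
            k += 1
    if not value:
        raise ValueError("no subject term of the text is in the structure")
    best = max(value, key=value.get)
    return best, value
```

The choice of α decides how much a parent benefits from its children's evidence: in the test, a small α lets the single most important leaf win, while a larger α lets the shared parent "noun clause" overtake it.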

The method provided in this embodiment continuously expands and improves an initially simple knowledge system structure and can quickly and accurately match the knowledge points of the text a user is reading, so as to provide the user with more relevant material.

Corresponding to the method shown in Fig. 1, an embodiment of the present invention further provides a knowledge point association system. As shown in Fig. 2, the system may include a subject term registration module 100, a subject term importance calculation module 200 and a knowledge point determination module 300, wherein:

The subject term registration module 100 is configured to obtain the subject terms to be added to the existing knowledge system structure of a given field, determine the node position of each subject term to be added in the existing structure according to the similarity between it and the subject terms in the existing knowledge system structure of the field, and thereby improve the existing structure. The existing knowledge system structure of a field is a knowledge tree composed of the subject terms of that field and reflecting the hierarchical relationships among them; each node of the knowledge tree corresponds to one subject term. Determining the node position of a subject term to be added in the existing structure means determining, within the existing structure of the field to which the term belongs, the position of the term's parent node: if the similarity between the subject term to be added and some subject term in the existing structure exceeds a set threshold, that subject term is determined to be the parent node of the subject term to be added.

In this embodiment, the similarity between subject terms comprises a characterization similarity and a semantic similarity; the characterization similarity is the similarity in the composition of the subject terms;

The subject term registration module 100 comprises a characterization similarity calculation unit, a semantic similarity calculation unit and a similarity calculation unit, wherein:

editDistance(x, y) denotes the edit distance between subject terms x and y, and length(y) denotes the word length of subject term y (the number of characters for Chinese, the number of letters for English);

semanticFatherSimi(x, y) = (1 / |sonSet(y)|) × Σ_{z ∈ sonSet(y)} semanticSimi(x, z)

semanticSimi(x, z) = vector(x) · vector(z)

fatherProb(x, y) = α_y × editSimi(x, y) + β_y × semanticFatherSimi(x, y)

where α_y is the weight of the characterization similarity and β_y is the weight of the average semantic similarity; these weights are calculated as:

β_y = 1 - α_y;

where b is the coefficient used to calculate the characterization-similarity weight, b > 0.

The subject term importance calculation module 200 is configured to obtain the subject terms in the text from which knowledge points are to be extracted, and to calculate the importance of each obtained subject term.

The knowledge point determination module 300 is configured to look up the node position of each subject term in the text within the existing knowledge system structure of the text's field, calculate the weight of each node position in the structure according to the importance of the subject terms in the text and their node positions in the existing structure, and determine the subject term at the node position with the largest weight as the knowledge point of the text. In this embodiment, the weight Value(i) of node i in the existing knowledge system structure is calculated as:

Value (i)=I_i+αI_i1+α²I_i2+…+α^kI_ik

The method and system of the present invention are further described below with reference to a specific embodiment.

Embodiment

In this embodiment, junior middle school English grammar is taken as the specific subject field. The aim is to improve the existing knowledge system structure for junior middle school English grammar and, for text in this field that the user is reading, to match the knowledge points in that text.

Fig. 3 shows an existing knowledge system structure for junior middle school English grammar in this embodiment. As shown, the structure is a knowledge tree reflecting the knowledge points of the field (the subject terms in this embodiment) and the hierarchical relationships among them. For example, subject clause, predicative clause and appositive clause are at the same level of the structure, noun clause is their first-level parent node, and grammar knowledge is their second-level parent node.

In this embodiment, the existing knowledge system structure of Fig. 3, which consists of only a few subject terms, is first improved as follows:

Obtain the subject terms to be added to the existing knowledge system structure: these can be extracted from text in the junior middle school English grammar field, or taken directly from an existing set of junior middle school English grammar subject terms. In this embodiment, simple rules were applied to grammar-related text provided by an online education company to remove example sentences and exercises, retaining as much content highly relevant to grammar knowledge as possible, and the subject terms in that content were extracted. The subject terms obtained include "object clause", "adverbial clause", "attributive clause", "adverbial", "compound sentence", "predicate verb", and so on.

Once the subject terms to be added have been obtained, the node position of each one in the existing knowledge hierarchy is determined from its similarity to the subject terms already in the field's hierarchy, and the hierarchy is refined accordingly. Take the candidate "object clause" as an example: it belongs to the set of candidate terms obtained above but is not yet in the existing hierarchy. Whether it should be added, and where, is decided by checking whether its similarity to a subject term in the existing hierarchy exceeds a set threshold. This embodiment illustrates the computation of the similarity between "object clause" and "noun clause" in the existing hierarchy, i.e. it decides whether "object clause" is a child node of "noun clause". Below, x denotes "object clause" and y denotes "noun clause".

As Fig. 3 shows, "noun clause" has three child nodes in the existing knowledge hierarchy: subject clause, predicative clause and appositive clause, i.e.

sonSet(y) = {subject clause, predicative clause, appositive clause}, |sonSet(y)| = 3.

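In this embodiment b = 3, so the weights in fatherProb(x, y) are α_y = 1/(|sonSet(y)| + b) = 1/(3 + 3) = 1/6 and β_y = 1 − α_y = 5/6.

The characterization similarity editSimi(x, y) between "object clause" and "noun clause" is then calculated from the edit distance between the two terms.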

Next, the average semantic similarity semanticFatherSimi(x, y) between "object clause" and the subject terms of the three child nodes of "noun clause" is calculated. First, the semantic similarity between "object clause" and each of "subject clause", "predicative clause" and "appositive clause" is computed: the word vectors of the four terms are obtained from a word2vec model, and semanticSimi(x, z) = vector(x) · vector(z) is evaluated for each pair. The results in this embodiment are shown in the table below:

Child node z of "noun clause"	semanticSimi("object clause", z)
Subject clause	0.78
Predicative clause	0.84
Appositive clause	0.58

From these results, semanticFatherSimi(x, y) = (0.78 + 0.84 + 0.58) / 3 ≈ 0.733.
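The semantic half of the similarity can be sketched in Python as a plain dot product averaged over sonSet(y). The toy 3-dimensional vectors are HYPOTHETICAL stand-ins for word2vec output; they have unit length, so each dot product equals a cosine similarity and stays in [-1, 1].

```python
from typing import List, Sequence


def semantic_simi(u: Sequence[float], v: Sequence[float]) -> float:
    """semanticSimi(x, z) = vector(x) . vector(z), a plain dot product."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def semantic_father_simi(x_vec: Sequence[float],
                         child_vecs: List[Sequence[float]]) -> float:
    """Average of semanticSimi(x, z) over every child z in sonSet(y)."""
    if not child_vecs:
        raise ValueError("sonSet(y) is empty; the average is undefined")
    return sum(semantic_simi(x_vec, z) for z in child_vecs) / len(child_vecs)


# HYPOTHETICAL unit-length vectors; real ones would come from a trained model.
x = [1.0, 0.0, 0.0]
children = [[0.6, 0.8, 0.0], [0.0, 1.0, 0.0], [0.8, 0.0, 0.6]]
print(semantic_father_simi(x, children))
```

If the model's vectors are not normalized, dividing each by its length first keeps the scores comparable across terms.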

Combining the two parts, the likelihood that "object clause" is a child node of "noun clause" is measured by fatherProb(x, y) = α_y × editSimi(x, y) + β_y × semanticFatherSimi(x, y).

In this embodiment the threshold is set to δ = 0.5. The similarity fatherProb(x, y) between "object clause" and "noun clause" exceeds this threshold, so "object clause" is judged to be a subject term relevant to the current field and is added to the existing knowledge hierarchy as a child node of "noun clause". The resulting hierarchy is shown in Fig. 4.
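Putting the pieces together, a minimal Python sketch of the attachment decision. The claims define editSimi(x, y) through editDistance(x, y) and length(y), but its exact formula is not reproduced in this text, so the sketch ASSUMES editSimi = 1 − editDistance/length(y), clipped at 0; the function names are illustrative. Because β_y = 5/6 and semanticFatherSimi ≈ 0.733 here, fatherProb is at least 5/6 × 0.733 ≈ 0.611 for any editSimi in [0, 1], so the decision to attach "object clause" under "noun clause" does not hinge on the missing value.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def edit_simi(x: str, y: str) -> float:
    """ASSUMED form: 1 - editDistance(x, y) / length(y), clipped to [0, 1]."""
    if not y:
        return 0.0
    return max(0.0, 1.0 - edit_distance(x, y) / len(y))


def father_prob(edit: float, sem_father: float, n_children: int, b: float) -> float:
    """fatherProb = alpha_y * editSimi + beta_y * semanticFatherSimi (claims 3-4)."""
    if b <= 0:
        raise ValueError("b must be positive")
    alpha_y = 1.0 / (n_children + b)
    return alpha_y * edit + (1.0 - alpha_y) * sem_father


# Embodiment values: |sonSet(y)| = 3, b = 3, delta = 0.5, table similarities.
sem = (0.78 + 0.84 + 0.58) / 3
delta = 0.5
# editSimi for this pair is not given in the text, so bound it over [0, 1].
low, high = father_prob(0.0, sem, 3, 3), father_prob(1.0, sem, 3, 3)
print(f"fatherProb in [{low:.4f}, {high:.4f}], attach: {low > delta}")
# Character-level distance on Chinese strings such as 宾语从句 (object clause)
# and 名词性从句 (noun clause), using the ASSUMED editSimi form:
print(edit_simi("宾语从句", "名词性从句"))
```

Note that α_y = 1/(|sonSet(y)| + b) shrinks as y gains children, shifting weight from surface form towards the averaged semantic evidence.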

Similarly, the same procedure determines whether each of the other candidate subject terms can be added to the existing knowledge hierarchy, and at which position, thereby expanding and refining the hierarchy.
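Applying the same test to every candidate can be sketched as a loop over the hierarchy. Two choices here are ASSUMPTIONS not stated in the text: when several existing terms exceed δ, the highest-scoring one becomes the parent, and nodes without children are skipped because semanticFatherSimi averages over sonSet(y). The scorer is passed in as a callable (in practice fatherProb), and the scores in the usage example are made up for illustration.

```python
from typing import Callable, Dict, List, Optional

Tree = Dict[str, List[str]]  # parent term -> list of child terms


def best_parent(term: str, tree: Tree,
                score: Callable[[str, str], float],
                delta: float) -> Optional[str]:
    """Highest-scoring existing node whose score exceeds delta, or None."""
    best, best_score = None, delta
    for node, children in tree.items():
        if not children:  # ASSUMPTION: childless nodes are skipped (empty sonSet)
            continue
        s = score(term, node)
        if s > best_score:
            best, best_score = node, s
    return best


def expand(tree: Tree, candidates: List[str],
           score: Callable[[str, str], float],
           delta: float = 0.5) -> Dict[str, str]:
    """Attach each new candidate under its best parent; return {term: parent}."""
    present = set(tree) | {c for kids in tree.values() for c in kids}
    added: Dict[str, str] = {}
    for term in candidates:
        if term in present:
            continue  # already in the hierarchy
        parent = best_parent(term, tree, score, delta)
        if parent is not None:
            tree[parent].append(term)
            tree.setdefault(term, [])
            present.add(term)
            added[term] = parent
    return added


tree = {
    "grammar knowledge": ["noun clause"],
    "noun clause": ["subject clause", "predicative clause", "appositive clause"],
}
# Made-up scores for illustration only; in practice score would be fatherProb.
toy_scores = {("object clause", "noun clause"): 0.61,
              ("object clause", "grammar knowledge"): 0.30}
added = expand(tree, ["object clause", "subject clause"],
               lambda x, y: toy_scores.get((x, y), 0.0))
print(added)
```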

For a passage related to junior-middle-school English grammar that the user is reading, the refined hierarchy is used to extract the closest knowledge point in the passage, so that other materials associated with that knowledge point can be offered to the user. In this embodiment, the passage the user is reading is: "An article is a function word: it cannot be used on its own and has no lexical meaning of its own; it is placed before a noun to help indicate the noun's meaning. Articles are of two kinds, the indefinite article a (an) and the definite article the."
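Before any weights are computed, the subject terms in such a passage have to be located. A minimal Python sketch of that string-matching step, ASSUMING English terms and whole-word, case-insensitive matching (find_terms is a hypothetical name). For unsegmented Chinese text a plain substring search would likely be the natural choice; the lookarounds here only stop English matches such as "the" inside "there". How term importance is derived from the explanatory documents is not specified, so it is not sketched.

```python
import re
from typing import List


def find_terms(text: str, terms: List[str]) -> List[str]:
    """Return the terms occurring in text as whole words (case-insensitive), in input order."""
    found = []
    for term in terms:
        # (?<!\w) / (?!\w) act like word boundaries but also work for terms
        # that begin or end with punctuation, such as "a (an)".
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


passage = ("An article is a function word: it cannot be used on its own and has "
           "no lexical meaning of its own; it is placed before a noun to help "
           "indicate the noun's meaning. Articles are of two kinds, the indefinite "
           "article a (an) and the definite article the.")
terms = ["article", "noun", "indefinite article", "definite article",
         "a (an)", "the", "verb", "adverbial clause"]
print(find_terms(passage, terms))
```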

For this passage, all subject terms it contains (i.e. all its knowledge points) are first found by string matching, and the importance of each is calculated. The subject terms found in this embodiment are: article, noun, indefinite article, definite article, a (an) and the. Using the documents in the system that explain existing knowledge points, their importances are calculated as 0.44, 0.32, 0.58, 0.52, 0.78 and 0.12 respectively. These subject terms are then mapped onto the existing knowledge hierarchy, i.e. their node positions in the hierarchy are determined, as shown by the hatched nodes in Fig. 3 and Fig. 4. Next, the final weight of each node in the existing hierarchy is calculated by upward reduction from the knowledge points. To reduce computation, this embodiment only calculates the weights of the nodes corresponding to the subject terms found in the passage and of their parent nodes. The importance coefficient α is 0.35 in this embodiment, and the calculation is as follows:

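$$
\begin{aligned}
\mathrm{Value}(\text{grammar knowledge}) &= \alpha^2 I(\text{noun}) + \alpha^2 I(\text{article}) + \alpha^3 I(\text{definite article}) + \alpha^3 I(\text{indefinite article}) + \alpha^4 I(\text{a (an)}) + \alpha^4 I(\text{the}) \approx 0.1538\\
\mathrm{Value}(\text{part of speech}) &= \alpha I(\text{noun}) + \alpha I(\text{article}) + \alpha^2 I(\text{definite article}) + \alpha^2 I(\text{indefinite article}) + \alpha^3 I(\text{a (an)}) + \alpha^3 I(\text{the}) \approx 0.439\\
\mathrm{Value}(\text{noun}) &= I(\text{noun}) = 0.32\\
\mathrm{Value}(\text{article}) &= I(\text{article}) + \alpha I(\text{definite article}) + \alpha I(\text{indefinite article}) + \alpha^2 I(\text{a (an)}) + \alpha^2 I(\text{the}) = 0.93525\\
\mathrm{Value}(\text{definite article}) &= I(\text{definite article}) + \alpha I(\text{the}) = 0.562\\
\mathrm{Value}(\text{indefinite article}) &= I(\text{indefinite article}) + \alpha I(\text{a (an)}) = 0.853\\
\mathrm{Value}(\text{a (an)}) &= I(\text{a (an)}) = 0.78\\
\mathrm{Value}(\text{the}) &= I(\text{the}) = 0.12
\end{aligned}
$$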

From these weights, the subject term with the highest weight is "article", so it is inferred that the passage the user is reading most probably concerns the knowledge point "article".
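The upward reduction of claim 5 can be sketched in Python by walking from each matched term up to the root and adding α^k × I to the node k levels above it; this touches only the matched nodes and their ancestors, as claim 6 describes. The parent map is an ASSUMPTION based on this embodiment: noun and article under part of speech, part of speech under grammar knowledge and, following the passage itself, "a (an)" under "indefinite article" and "the" under "definite article". node_values is a hypothetical name.

```python
from typing import Dict, Optional

# ASSUMED hierarchy fragment for the example passage: child -> parent.
PARENT: Dict[str, Optional[str]] = {
    "grammar knowledge": None,
    "part of speech": "grammar knowledge",
    "noun": "part of speech",
    "article": "part of speech",
    "indefinite article": "article",
    "definite article": "article",
    "a (an)": "indefinite article",
    "the": "definite article",
}


def node_values(importance: Dict[str, float],
                parent: Dict[str, Optional[str]],
                alpha: float) -> Dict[str, float]:
    """Each matched term adds alpha**k * I to its k-th ancestor (k = 0 is itself)."""
    values: Dict[str, float] = {}
    for term, imp in importance.items():
        node, k = term, 0
        while node is not None:
            values[node] = values.get(node, 0.0) + alpha ** k * imp
            node, k = parent[node], k + 1
    return values


importance = {"article": 0.44, "noun": 0.32, "indefinite article": 0.58,
              "definite article": 0.52, "a (an)": 0.78, "the": 0.12}
values = node_values(importance, PARENT, alpha=0.35)
best = max(values, key=values.get)
for node, v in sorted(values.items(), key=lambda kv: -kv[1]):
    print(f"{node:20s} {v:.5f}")
print("knowledge point:", best)
```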

Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, if such modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them. For example, the method of computing the similarity between subject terms in the inventive concept can be used to classify words: by directly computing the characterization similarity and the semantic similarity between two words, an overall similarity is obtained, from which it can be judged whether the two words belong to the same category.

Claims

1. A knowledge point association method, comprising the following steps:

(1) obtaining the subject terms to be added to the existing knowledge hierarchy of a given field, and, according to the similarity between each subject term to be added and the subject terms in that field's existing knowledge hierarchy, determining the node position of the subject term to be added in the existing knowledge hierarchy, so as to refine the existing knowledge hierarchy;

(2) obtaining the subject terms in a corpus from which knowledge points are to be extracted, and calculating the importance of each subject term obtained;

(3) locating the node positions of the subject terms in the corpus within the existing knowledge hierarchy of the corpus's field; according to the importance of the subject terms in the corpus and their node positions in the existing knowledge hierarchy, calculating the weight of each node position in the existing knowledge hierarchy; and determining the subject term at the node position with the maximum weight as the knowledge point of the corpus.

2. The knowledge point association method according to claim 1, wherein, in step (1), determining the node position of a subject term to be added in the existing knowledge hierarchy means determining the position of its parent node in the existing knowledge hierarchy of the field to which the term belongs: if the similarity between the subject term to be added and a given subject term in the existing knowledge hierarchy exceeds a set threshold, that given subject term is determined to be the parent node of the subject term to be added.

3. The knowledge point association method according to claim 2, wherein, in step (1), the similarity between subject terms comprises a characterization similarity and a semantic similarity between the subject terms, the characterization similarity being the similarity in the written composition of the subject terms;

the similarity fatherProb(x, y) between a subject term x and a subject term y in the existing knowledge hierarchy of its field is calculated as:

$$\mathrm{fatherProb}(x, y) = \alpha_y \times \mathrm{editSimi}(x, y) + \beta_y \times \mathrm{semanticFatherSimi}(x, y)$$

$$\alpha_y + \beta_y = 1$$

the characterization similarity between subject terms is calculated from the edit distance between the two terms, and the characterization similarity editSimi(x, y) between subject term x and subject term y is calculated as:

where editDistance(x, y) denotes the edit distance between subject terms x and y, and length(y) denotes the length of subject term y in characters;

the average semantic similarity semanticFatherSimi(x, y) between subject term x and the subject terms of all child nodes of subject term y is calculated as:

$$\mathrm{semanticFatherSimi}(x, y) = \frac{1}{|\mathrm{sonSet}(y)|} \sum_{z \in \mathrm{sonSet}(y)} \mathrm{semanticSimi}(x, z)$$

$$\mathrm{semanticSimi}(x, z) = \mathrm{vector}(x) \cdot \mathrm{vector}(z)$$

where sonSet(y) is the set of all child nodes of subject term y, |sonSet(y)| is the number of child nodes of y, semanticSimi(x, z) is the semantic similarity between subject terms x and z, vector(x) and vector(z) are the word vectors of x and z, and vector(x) · vector(z) is the dot product of the two word vectors.

4. The knowledge point association method according to claim 3, wherein the weight of the characterization similarity and the weight of the average semantic similarity are calculated as:

$$\alpha_y = \frac{1}{|\mathrm{sonSet}(y)| + b}$$

$$\beta_y = 1 - \alpha_y$$

where b is the coefficient governing the weight of the characterization similarity, b > 0.

5. The knowledge point association method according to claim 1, wherein, in step (3), the weight Value(i) of node i in the existing knowledge hierarchy is calculated as:

$$\mathrm{Value}(i) = I_i + \alpha I_{i1} + \alpha^2 I_{i2} + \dots + \alpha^k I_{ik}$$

where I_i is the importance of the subject term at node i, I_ik is the importance of the subject terms at the level-k child node positions of node i (summed where there are several such nodes), and α is the importance coefficient.

6. The knowledge point association method according to claim 1 or 5, wherein, in step (3), when the weight of each node position in the existing knowledge hierarchy is calculated, the weights of the node positions corresponding to the subject terms in the corpus and of the multi-level parent node positions of these nodes are calculated.

7. A knowledge point association system, comprising:

a subject term registration module, configured to obtain the subject terms to be added to the existing knowledge hierarchy of a given field and, according to the similarity between each subject term to be added and the subject terms in that field's existing knowledge hierarchy, determine the node position of the subject term to be added in the existing knowledge hierarchy, so as to refine the existing knowledge hierarchy;

a subject term importance calculation module, configured to obtain the subject terms in a corpus from which knowledge points are to be extracted and calculate the importance of each subject term obtained;

a knowledge point determination module, configured to locate the node positions of the subject terms in the corpus within the existing knowledge hierarchy of the corpus's field, calculate the weight of each node position in the existing knowledge hierarchy according to the importance of the subject terms in the corpus and their node positions, and determine the subject term at the node position with the maximum weight as the knowledge point of the corpus.

8. The knowledge point association system according to claim 7, wherein, in the subject term registration module, determining the node position of a subject term to be added in the existing knowledge hierarchy means determining the position of its parent node in the existing knowledge hierarchy of the field to which the term belongs: if the similarity between the subject term to be added and a given subject term in the existing knowledge hierarchy exceeds a set threshold, that given subject term is determined to be the parent node of the subject term to be added.

9. The knowledge point association system according to claim 8, wherein the similarity between subject terms comprises a characterization similarity and a semantic similarity between the subject terms, the characterization similarity being the similarity in the written composition of the subject terms;

the subject term registration module comprises a characterization similarity calculation unit, a semantic similarity calculation unit and a similarity calculation unit, wherein:

the characterization similarity calculation unit is configured to calculate the characterization similarity editSimi(x, y) between subject term x and a subject term y in the existing knowledge hierarchy of its field; it calculates the characterization similarity between subject terms from the edit distance between the two terms, and editSimi(x, y) is calculated as:

the semantic similarity calculation unit is configured to calculate the average semantic similarity semanticFatherSimi(x, y) between subject term x and the subject terms of all child nodes of a subject term y in the existing knowledge hierarchy of its field, calculated as:

$$\mathrm{semanticFatherSimi}(x, y) = \frac{1}{|\mathrm{sonSet}(y)|} \sum_{z \in \mathrm{sonSet}(y)} \mathrm{semanticSimi}(x, z)$$

$$\mathrm{semanticSimi}(x, z) = \mathrm{vector}(x) \cdot \mathrm{vector}(z)$$

where sonSet(y) is the set of all child nodes of subject term y, |sonSet(y)| is the number of child nodes of y, semanticSimi(x, z) is the semantic similarity between subject terms x and z, vector(x) and vector(z) are the word vectors of x and z, and vector(x) · vector(z) is the dot product of the two word vectors;

the similarity calculation unit is configured to calculate the similarity fatherProb(x, y) between subject term x and a subject term y in the existing knowledge hierarchy of its field, calculated as:

$$\mathrm{fatherProb}(x, y) = \alpha_y \times \mathrm{editSimi}(x, y) + \beta_y \times \mathrm{semanticFatherSimi}(x, y)$$

10. The knowledge point association system according to claim 9, wherein the weight of the characterization similarity and the weight of the average semantic similarity are calculated as:

$$\alpha_y = \frac{1}{|\mathrm{sonSet}(y)| + b}$$

$$\beta_y = 1 - \alpha_y$$

where b is the coefficient governing the weight of the characterization similarity, b > 0.

11. The knowledge point association system according to claim 7, wherein, in the knowledge point determination module, the weight Value(i) of node i in the existing knowledge hierarchy is calculated as:

$$\mathrm{Value}(i) = I_i + \alpha I_{i1} + \alpha^2 I_{i2} + \dots + \alpha^k I_{ik}$$

12. The knowledge point association system according to claim 7 or 11, wherein, in the knowledge point determination module, when the weight of each node position in the existing knowledge hierarchy is calculated, the weights of the node positions corresponding to the subject terms in the corpus and of the multi-level parent node positions of these nodes are calculated.