CN106874380A - Method and apparatus for checking knowledge base triples - Google Patents


Info

Publication number
CN106874380A
Authority
CN
China
Prior art keywords
triple
factor function
probability distribution
probability
extension
Legal status
Granted
Application number
CN201710011368.1A
Other languages
Chinese (zh)
Other versions
CN106874380B (en)
Inventor
赵伟华 (Zhao Weihua)
张日崇 (Zhang Richong)
Current Assignee
Beihang University
Original Assignee
Beihang University
Application filed by Beihang University
Priority to CN201710011368.1A
Publication of CN106874380A
Application granted
Publication of CN106874380B
Active (current legal status)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures

Abstract

The present invention provides a method and apparatus for checking knowledge base triples. The rule corresponding to an extended triple is obtained. The factor function corresponding to that rule is determined from an initial factor function and the EM algorithm, and the factor function is used to decide whether the extended triple is credible. This makes it possible to decide whether the extended triple should be added to the knowledge base during expansion, which improves the accuracy of knowledge base expansion.

Description

Method and apparatus for checking knowledge base triples
Technical field
The present invention relates to knowledge base expansion technology, and in particular to a method and apparatus for checking knowledge base triples.
Background
A knowledge base is a database that stores knowledge in structured form as triples, typically holding a large amount of knowledge about a particular domain or industry. For example, a history knowledge base can store extensive knowledge from the field of history, including historical figures and historical events. A knowledge base takes instances as its main objects of description and represents knowledge in an object-oriented way. An instance refers to a specific concrete or abstract thing in the real world, such as a person, a city or an object.
A knowledge base usually contains many instances. The attributes of each instance and the relations between instances are stored as triples, which are the basic structure for representing knowledge in a knowledge base. A triple can be written as <first statement, relation statement, second statement>, where the relation statement expresses the relation between the first statement and the second statement.
Knowledge base expansion applies to an incomplete knowledge base. It uses data mining to predict unknown triples from the known triples that represent knowledge, and adds these new triples to the original knowledge base to make it more complete. Checking whether the new triples are credible has therefore become a technical problem that urgently needs to be solved.
Summary of the invention
The present invention provides a method and apparatus for checking knowledge base triples. It addresses defects of the prior art such as extended triples being untrustworthy.
One aspect of the present invention provides a method for checking knowledge base triples, including:
obtaining the rule corresponding to an extended triple, where the extended triple is obtained by an extension operation based on original triples in an existing knowledge base and the rule; the extended triple includes at least an ordered set consisting of a first statement, a relation statement and a second statement; and the relation statement represents the relation between the first statement and the second statement;
determining the factor function corresponding to the rule, where the factor function represents the probability that the rule is correct and is obtained from an initial factor function and the EM algorithm;
determining, according to the factor function, whether the extended triple is credible.
Optionally, in the method described above, determining whether the extended triple is credible according to the factor function includes:
determining, according to belief propagation and the factor function, a first probability distribution and a second probability distribution for the extended triple, where the first probability distribution is the probability that the extended triple is credible, the second probability distribution is the probability that it is not credible, and second probability distribution = 1 - first probability distribution;
determining whether the extended triple is credible according to a target probability distribution and a preset threshold, where the target probability distribution is either the first probability distribution or the second probability distribution.
Optionally, in the method described above, determining whether the extended triple is credible according to the target probability distribution and the preset threshold includes:
if the preset threshold is a credibility threshold, the target probability distribution is the first probability distribution. If the target probability distribution is greater than or equal to the preset threshold, the extended triple is credible; if it is less than the preset threshold, the extended triple is not credible;
if the preset threshold is a non-credibility threshold, the target probability distribution is the second probability distribution. If the target probability distribution is greater than or equal to the preset threshold, the extended triple is not credible; if it is less than the preset threshold, the extended triple is credible.
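Each of the two threshold cases above comes down to a single comparison. The following is a minimal Python sketch. It assumes the first probability distribution of the extended triple is given as one number in [0, 1]. The names `is_credible` and `threshold_kind` are hypothetical.

```python
def is_credible(p_first: float, threshold: float, threshold_kind: str) -> bool:
    """Decide credibility of an extended triple from its first probability distribution.

    p_first: probability that the extended triple is credible (hypothetical input form).
    threshold_kind: "credible" compares p_first itself with the threshold;
    "non_credible" compares the second probability distribution, 1 - p_first.
    """
    if threshold_kind == "credible":
        # Target = first distribution; at or above the threshold means credible.
        return p_first >= threshold
    if threshold_kind == "non_credible":
        # Target = second distribution; at or above the threshold means not credible.
        return (1.0 - p_first) < threshold
    raise ValueError(f"unknown threshold kind: {threshold_kind!r}")


print(is_credible(0.8, 0.7, "credible"))      # True
print(is_credible(0.8, 0.3, "non_credible"))  # True, because 1 - 0.8 < 0.3
```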
Optionally, in the method described above, determining the factor function corresponding to the rule includes:
determining, with the EM algorithm, the factor function f(t+1) after an iteration according to the following formula:
$$f(t+1) = f(t) \cdot \frac{f'(t)}{p(t)}$$
where f(t) is the value of the factor function in round t, and t is an integer greater than or equal to 0 with initial value 0, so that f(0) is the initialized value of the factor function. f'(t) is the empirical distribution of the factor function in round t, and p(t) is its sample distribution in round t. Both are obtained during the iterations of the EM algorithm.
Optionally, in the method described above, the iteration stops when the value of f(t) no longer changes.
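The update f(t+1) = f(t) * f'(t) / p(t) and its stopping condition can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation. It assumes the factor is stored as a list of non-negative values updated entry by entry. `stats` is a hypothetical hook that returns f'(t) and p(t) for the current round, because the text does not spell out how they are computed. `toy_stats` exists only to exercise the loop.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def em_update(f: Vector, empirical: Vector, sample: Vector) -> Vector:
    """One round of f(t+1) = f(t) * f'(t) / p(t), applied entry by entry."""
    return [fi * ei / pi for fi, ei, pi in zip(f, empirical, sample)]


def iterate_until_stable(
    f0: Vector,
    stats: Callable[[Vector], Tuple[Vector, Vector]],
    tol: float = 1e-12,
    max_rounds: int = 1000,
) -> Tuple[Vector, int]:
    """Repeat the update until f(t) stops changing (within tol).

    `stats(f)` is a hypothetical hook returning (f'(t), p(t)) for the current f.
    Returns the final factor values and the number of rounds performed.
    """
    f = list(f0)
    for t in range(max_rounds):
        empirical, sample = stats(f)
        f_next = em_update(f, empirical, sample)
        if max(abs(a - b) for a, b in zip(f_next, f)) <= tol:
            return f_next, t + 1
        f = f_next
    return f, max_rounds


def toy_stats(f: Vector) -> Tuple[Vector, Vector]:
    # Toy only: fixed empirical distribution; sample distribution = normalized f.
    total = sum(f)
    return [0.75, 0.25], [x / total for x in f]


print(iterate_until_stable([1.0, 1.0], toy_stats))  # ([1.5, 0.5], 2)
```

In the toy, the second round reproduces the first round's values exactly, so the loop stops because f(t) no longer changes.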
Another aspect of the present invention provides an apparatus for checking knowledge base triples, including:
an acquisition module, configured to obtain the rule corresponding to an extended triple. The extended triple is obtained by an extension operation based on original triples in an existing knowledge base and the rule. It includes at least an ordered set consisting of a first statement, a relation statement and a second statement, where the relation statement represents the relation between the first statement and the second statement;
a determining module, configured to determine the factor function corresponding to the rule, where the factor function represents the probability that the rule is correct and is obtained from an initial factor function and the EM algorithm;
a processing module, configured to determine, according to the factor function, whether the extended triple is credible.
Optionally, in the apparatus described above, the processing module includes:
a first submodule, configured to determine, according to belief propagation and the factor function, a first probability distribution and a second probability distribution for the extended triple. The first probability distribution is the probability that the extended triple is credible, the second is the probability that it is not credible, and second probability distribution = 1 - first probability distribution;
a second submodule, configured to determine whether the extended triple is credible according to a target probability distribution and a preset threshold, where the target probability distribution is either the first probability distribution or the second probability distribution.
Optionally, in the apparatus described above, the second submodule is specifically configured to:
if the preset threshold is a credibility threshold, use the first probability distribution as the target probability distribution. If the target probability distribution is greater than or equal to the preset threshold, the extended triple is credible; if it is less than the preset threshold, the extended triple is not credible;
if the preset threshold is a non-credibility threshold, use the second probability distribution as the target probability distribution. If the target probability distribution is greater than or equal to the preset threshold, the extended triple is not credible; if it is less than the preset threshold, the extended triple is credible.
Optionally, in the apparatus described above, the determining module is specifically configured to:
determine, with the EM algorithm, the factor function f(t+1) after an iteration according to the following formula:
$$f(t+1) = f(t) \cdot \frac{f'(t)}{p(t)}$$
where f(t) is the value of the factor function in round t, and t is an integer greater than or equal to 0 with initial value 0, so that f(0) is the initialized value of the factor function. f'(t) is the empirical distribution of the factor function in round t, and p(t) is its sample distribution in round t. Both are obtained during the iterations of the EM algorithm.
Optionally, in the apparatus described above, the determining module is further configured to:
stop the iteration when the value of f(t) no longer changes.
The method and apparatus of the present invention work as follows. The rule corresponding to an extended triple is obtained. The factor function of that rule is determined from an initial factor function and the EM algorithm, and the factor function is used to decide whether the extended triple is credible. It can then be decided whether to add the extended triple to the knowledge base during expansion, which improves the accuracy of knowledge base expansion.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in those descriptions are briefly introduced below. The drawings show only some embodiments of the present invention. A person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the method for checking knowledge base triples provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of the method for checking knowledge base triples provided by another embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the apparatus for checking knowledge base triples provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the apparatus for checking knowledge base triples provided by another embodiment of the present invention;
Fig. 5 is the factor graph constructed in an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments that a person of ordinary skill in the art obtains from them without creative effort fall within the protection scope of the present invention.
Embodiment one
This embodiment provides a method for checking knowledge base triples, used to check whether the extended triples of a knowledge base are credible. The method is executed by the apparatus for checking knowledge base triples.
Fig. 1 is a schematic flowchart of the method for checking knowledge base triples. The method includes:
Step 101: obtain the rule corresponding to an extended triple. The extended triple is obtained by an extension operation based on original triples in the existing knowledge base and the rule. It includes at least an ordered set consisting of a first statement, a relation statement and a second statement, where the relation statement represents the relation between the first statement and the second statement.
A knowledge base, such as Freebase, consists of many triples that express knowledge. A triple can be written as (first statement, relation statement, second statement), where the relation statement represents the relation between the first and second statements. For example, the triple (Li Ming, nationality, China) means that Li Ming's nationality is China, and the triple (Li Ming, residence, Beijing) means that Li Ming lives in Beijing. Knowledge base expansion first uses a rule discovery method to find rules from the original triples in the knowledge base. It then applies these rules to the original triples to obtain extended triples.
The rule discovery method can be any conventional prior-art method, such as the association rule mining method AMIE. There are many ways to extend triples with rules. For example, suppose the existing knowledge base contains the original triples (A, daughter, B), (C, husband, B) and (A, daughter, C). These mean, respectively, that A is the daughter of B, that C is the husband of B, and that A is the daughter of C. From them the following rule can be discovered: (H, daughter, Z) can be deduced from (H, daughter, Y) and (Z, husband, Y). The rule is written as (H, daughter, Y) + (Z, husband, Y) => (H, daughter, Z). Now suppose the knowledge base contains (Xiaohong, daughter, Wang Ying) and (Zhang San, husband, Wang Ying), but no triple relating Xiaohong to Zhang San. Applying the rule to these two original triples yields the extended triple (Xiaohong, daughter, Zhang San). The rule corresponding to this extended triple is (H, daughter, Y) + (Z, husband, Y) => (H, daughter, Z).
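The following is a minimal Python sketch of applying this single rule. It assumes triples are stored as plain (first, relation, second) records. `Triple` and `apply_daughter_rule` are hypothetical names. The join is hard-coded for this one rule and is not a general rule engine.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    first: str      # first statement
    relation: str   # relation statement
    second: str     # second statement


def apply_daughter_rule(kb: List[Triple]) -> List[Triple]:
    """(H, daughter, Y) + (Z, husband, Y) => (H, daughter, Z); returns only new triples."""
    known = set(kb)
    derived: List[Triple] = []
    for t1 in kb:
        if t1.relation != "daughter":
            continue
        for t2 in kb:
            # Join on the shared variable Y: the daughter's parent is the husband's wife.
            if t2.relation == "husband" and t2.second == t1.second:
                new = Triple(t1.first, "daughter", t2.first)
                if new not in known and new not in derived:
                    derived.append(new)
    return derived


KB = [
    Triple("A", "daughter", "B"),
    Triple("C", "husband", "B"),
    Triple("A", "daughter", "C"),
    Triple("Xiaohong", "daughter", "Wang Ying"),
    Triple("Zhang San", "husband", "Wang Ying"),
]
print(apply_daughter_rule(KB))
# [Triple(first='Xiaohong', relation='daughter', second='Zhang San')]
```

The rule also derives (A, daughter, C). That triple is already in the knowledge base, so only (Xiaohong, daughter, Zhang San) is reported as an extended triple.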
Note that the first statement and the second statement (the first and third elements) of each triple in a rule are not fixed values. They are unknowns, like variables in an equation. Each triple that makes up a rule is called an atomic rule, and an atomic rule is an unknown triple. Substituting original or extended triples from the knowledge base that satisfy the rule into it yields an instance of the rule.
Following this process, multiple rules can be obtained from the original triples in the knowledge base. Applying these rules to the original triples then yields multiple extended triples.
Note that after multiple extended triples have been obtained from the rules and the original triples, the extension operation can continue. It is applied to the original triples together with the extended triples, until no new extended triple is produced.
Substituting all original or extended triples in the knowledge base that satisfy the rules into those rules yields the instances of the rules. A rule can have multiple instances.
If an extended triple appears in an instance of a rule, that rule is a rule corresponding to the extended triple.
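Deciding whether a rule corresponds to an extended triple is a membership check over that rule's instances. The following sketch assumes each instance is recorded as (rule id, matched body triples, derived triple). That record layout and the rule id `daughter_via_husband` are hypothetical, and the data reuses the daughter example above.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
# Hypothetical instance record: (rule id, matched body triples, derived triple).
Instance = Tuple[str, Tuple[Triple, ...], Triple]


def rules_for_triple(instances: List[Instance], triple: Triple) -> Set[str]:
    """Ids of rules that have at least one instance in which `triple` appears."""
    return {rule for rule, body, head in instances if triple == head or triple in body}


RULE = "daughter_via_husband"  # (H, daughter, Y) + (Z, husband, Y) => (H, daughter, Z)
INSTANCES: List[Instance] = [
    (RULE, (("A", "daughter", "B"), ("C", "husband", "B")), ("A", "daughter", "C")),
    (RULE, (("Xiaohong", "daughter", "Wang Ying"), ("Zhang San", "husband", "Wang Ying")),
     ("Xiaohong", "daughter", "Zhang San")),
]

print(rules_for_triple(INSTANCES, ("Xiaohong", "daughter", "Zhang San")))  # {'daughter_via_husband'}
```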
Note that in practice all rules may be used. Because the purpose here is to check whether extended triples are credible, only the rules corresponding to the extended triples are obtained.
Step 102: determine the factor function corresponding to the rule. The factor function represents the probability that the rule is correct and is obtained from an initial factor function and the EM algorithm.
Each rule consists of at least two atomic rules, and the number of atomic rules that make up a rule is called the length of the rule. The probability that a rule is correct is represented by a factor function.
The factor function of a rule is obtained from an initial factor function using the EM algorithm. The computation is based on the instances of the rule and the triples involved in them, which may include both original triples and extended triples.
Note that, for ease of description, each instance of a rule is represented in practice by one factor function, so each instance has its own factor function. Because a rule may have multiple instances, it may correspond to multiple factor functions during the computation. The factor functions of the same rule are said to belong to the same family. Factor functions of one family take identical values at each computation step, and that common value represents the probability that the rule is correct. Determining the factor function of a rule is therefore the same as determining the factor functions of its instances.
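One way to make every factor function in a family take the same value is parameter tying: store one value per rule and let each instance read it. The Python sketch below takes this approach. `families` and `instance_factor_values` are hypothetical names, instances are identified only by a label, and the values 0.8 and 0.6 are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical minimal instance reference: (rule id, instance label).
InstanceRef = Tuple[str, str]


def families(instances: List[InstanceRef]) -> Dict[str, List[str]]:
    """Group instance labels by rule: one family of factor functions per rule."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rule, label in instances:
        groups[rule].append(label)
    return dict(groups)


def instance_factor_values(instances: List[InstanceRef], theta: Dict[str, float]) -> Dict[str, float]:
    """Every instance reads its rule's shared value, so a family always agrees."""
    return {label: theta[rule] for rule, label in instances}


refs = [("r1", "inst1"), ("r1", "inst5"), ("r2", "inst3")]
print(families(refs))  # {'r1': ['inst1', 'inst5'], 'r2': ['inst3']}
print(instance_factor_values(refs, {"r1": 0.8, "r2": 0.6}))
# {'inst1': 0.8, 'inst5': 0.8, 'inst3': 0.6}
```

With this layout, an update to a rule's value changes all of its instances at once, so the family never disagrees.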
The EM algorithm (Expectation Maximization algorithm) is an iterative algorithm for maximum likelihood or maximum a posteriori estimation of the parameters of probabilistic models with hidden (latent) variables. It alternates between two steps. In this embodiment, the factor function of a rule is determined through multiple such iterations.
Note that the instances of a rule contain original triples as well as extended triples, and those original triples may also correspond to other rules. Computing the factor function of one rule may therefore involve information from other rules.
The initial factor function is randomly initialized and can be chosen according to actual needs.
Step 103: determine, according to the factor function, whether the extended triple is credible.
Once the factor function of the instance corresponding to the extended triple is determined, it is used to compute the probability that the extended triple is credible or not credible. That probability determines whether the extended triple is credible.
Optionally, the EM algorithm can also be used to compute the credible or non-credible probability of the extended triple from the factor function. In that case steps 102 and 103 depend on each other. In practice, the credible or non-credible probabilities of all triples, covering both original and extended triples, are first computed from the initialized factor functions of all instances. These probabilities are used to update the factor functions of the instances through a series of computations, and the updated factor functions are then used to update the probabilities of all triples. The iteration continues until neither the probabilities of the triples nor the factor functions of the instances change. The factor functions from the last round then give the final credible or non-credible probabilities of all triples, and the extended triple's probability is read from them to decide whether it is credible.
The method for checking knowledge base triples provided by this embodiment works as follows. The rule corresponding to an extended triple is obtained, and its factor function is determined from an initial factor function and the EM algorithm. The factor function is then used to decide whether the extended triple is credible. It can thus be decided whether to add the extended triple to the knowledge base during expansion, which improves the accuracy of knowledge base expansion.
Embodiment two
This embodiment further supplements the method for checking knowledge base triples provided in Embodiment one.
Fig. 2 is a schematic flowchart of the method for checking knowledge base triples provided by this embodiment. The method includes:
Step 201: obtain the rule corresponding to an extended triple. The extended triple is obtained by an extension operation based on original triples in the existing knowledge base and the rule. It includes at least an ordered set consisting of a first statement, a relation statement and a second statement, where the relation statement represents the relation between the first statement and the second statement.
The specific operations of this step are the same as those of step 101 and are not repeated here.
Step 202: determine, with the EM algorithm, the factor function f(t+1) after an iteration according to the following formula:
$$f(t+1) = f(t) \cdot \frac{f'(t)}{p(t)}$$
where f(t) is the value of the factor function in round t, and t is an integer greater than or equal to 0 with initial value 0, so that f(0) is the initialized value of the factor function. f'(t) is the empirical distribution of the factor function in round t, and p(t) is its sample distribution in round t. Both are computed during the iterations of the EM algorithm. The factor function represents the probability that the rule is correct and is obtained from an initial factor function and the EM algorithm.
Step 203: determine, according to belief propagation and the factor function, the first and second probability distributions of the extended triple. The first probability distribution is the probability that the extended triple is credible, the second is the probability that it is not credible, and second probability distribution = 1 - first probability distribution.
The EM algorithm has two steps. The first computes the expectation (E) and is called the E-step. In this embodiment, the E-step determines the first and second probability distributions of the extended triple from belief propagation and the factor function.
The second step is maximization (M), called the M-step. In this embodiment, the M-step updates the factor function using the first and second probability distributions obtained in the E-step. The updated factor function is then used in a new round of belief propagation to obtain new first and second probability distributions. The two steps alternate, and once the iteration stops, the final first and second probability distributions of the extended triple are determined.
Note that steps 202 and 203 depend on each other. In practice, the E-step first runs belief propagation on the initialized factor functions of all instances. This yields the probability distributions (first and second) of all triples, covering both original and extended triples. The M-step uses these distributions to update the factor functions through further computation, and the updated factor functions go back to the E-step to update the distributions of all triples, after which the M-step runs again. This continues until neither the distributions of the triples nor the factor functions of the instances change. A final E-step with the factor functions from the last round gives the final probability distributions of all triples, which ends the EM iteration. The probability distribution of the extended triple is then read from these results.
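The control flow of this alternation, including the closing E-step, can be sketched in Python. `e_step` and `m_step` are placeholders for belief propagation and the factor update. The toy maps in the usage example are arbitrary functions chosen only so that the loop converges. They are not the patent's computations.

```python
from typing import Callable, List, Tuple

Vec = List[float]


def alternate_em(
    f0: Vec,
    e_step: Callable[[Vec], Vec],
    m_step: Callable[[Vec, Vec], Vec],
    tol: float = 1e-10,
    max_rounds: int = 10_000,
) -> Tuple[Vec, Vec, int]:
    """Alternate E- and M-steps until factors and triple probabilities stop changing.

    e_step(f) -> q: triple probabilities from factor values (belief propagation in the text).
    m_step(f, q) -> f_new: updated factor values.
    """
    f = list(f0)
    q = e_step(f)  # first E-step from the initialized factors
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        f_new = m_step(f, q)
        q_new = e_step(f_new)  # E-step with the freshly updated factors
        stable = (max(abs(a - b) for a, b in zip(f_new, f)) <= tol
                  and max(abs(a - b) for a, b in zip(q_new, q)) <= tol)
        f, q = f_new, q_new
        if stable:
            break
    # q was computed from the final f, so it is the result of the closing E-step.
    return f, q, rounds


def toy_e_step(f: Vec) -> Vec:
    return [x / (1.0 + x) for x in f]  # arbitrary map into (0, 1)


def toy_m_step(f: Vec, q: Vec) -> Vec:
    return [0.5 * fi + qi for fi, qi in zip(f, q)]  # fixed point at f = 1, q = 0.5


f, q, rounds = alternate_em([3.0], toy_e_step, toy_m_step)
print(round(f[0], 6), round(q[0], 6))  # 1.0 0.5
```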
Step 204: determine whether the extended triple is credible according to a target probability distribution and a preset threshold, where the target probability distribution is either the first probability distribution or the second probability distribution.
Once the first and second probability distributions of the extended triple are determined, one of them is compared with the preset threshold to decide whether the extended triple is credible.
Optionally, the preset threshold can be a credibility threshold, in which case the target probability distribution is the first probability distribution. If it is greater than or equal to the preset threshold, the extended triple is credible; if it is less than the preset threshold, the extended triple is not credible.
Optionally, the preset threshold can be a non-credibility threshold, in which case the target probability distribution is the second probability distribution. If it is less than the preset threshold, the extended triple is credible; if it is greater than or equal to the threshold, the extended triple is not credible.
The method provided by this embodiment repeatedly updates, within the EM algorithm, both the target probability distribution of the extended triple and the factor function of its corresponding rule. As a result, the probability that a rule is correct is learned from the rule's credibility in the expanded knowledge base. The credibility of an extended triple is computed from the factor function of its corresponding rule, and that factor function in turn depends on the credibility of the original triples in the knowledge base. The credibility of an extended triple therefore takes into account the global correlations between pieces of knowledge, so the knowledge base can be expanded efficiently and with high quality and accuracy.
Embodiment three
This embodiment gives a concrete example of the method for checking knowledge base triples provided in the above embodiments.
Suppose 5 original triples are chosen from the knowledge base, numbered e1 to e5. Rule discovery on the knowledge base finds 4 rules, numbered r1 to r4. Extension operations with these rules produce 3 extended triples, numbered e6, e7 and e8.
The probabilities attached to each triple are defined as follows. For an original triple, the third probability distribution is the probability that it is credible and the fourth is the probability that it is not. For an extended triple, the first probability distribution is the probability that it is credible and the second is the probability that it is not. Thus fourth probability distribution = 1 - third probability distribution, and second probability distribution = 1 - first probability distribution. For example, if the credible probabilities of the 5 original triples are b1 to b5, their fourth probability distributions are (1-b1) to (1-b5). Likewise, if the first probability distributions of the 3 extended triples are a1 to a3, their non-credible probabilities are (1-a1) to (1-a3). The credibility of the 5 original triples and 3 extended triples is represented by x1 to x8, and x = [x1, x2, x3, x4, x5, x6, x7, x8] collects these values for all triples. Each of x1 to x8 is a binary variable taking the value 0 or 1 and is called the variable of the corresponding triple e1 to e8. The value 0 means the triple is not credible, and 1 means it is credible.
The 5 triples chosen from the knowledge base and the 4 discovered rules are shown in Table 1 and Table 2.
Table 1 Original triples
Table 2 Rules
No.  Rule
r1 (H, residence, Y)+(Y, country, Z)=>(H, nationality, Z)
r2 (H, area, Y)+(Y, city, Z)=>(H, under city area, Z)
r3 (H, nationality, Y)+(H, birthplace, Z)=>(Z, country, Y)
r4 (H, under city area, Y)=>(H, country, Y)
Within one rule, H denotes the same unknown statement, but H in different rules does not necessarily denote the same statement; the same holds for Y and Z. Substituting into a rule the first and second statements of existing triples whose relation statements match those of the rule yields an instance of the rule, or an extended triple.
The extension operation is performed on the original triples according to the rules, yielding the extended triples shown in Table 3.
Table 3 Extended triples
The extension operation is then continued on the original triples and the extended triples according to the rules, yielding the new extended triples shown in Table 4.
Table 4 New extended triples
This finally gives data comprising the original triples and all extended triples.
Table 5 Correspondence between all triples and the variables x
Triple No. x
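The substitution step described above can be sketched as simple pattern matching. This is a minimal illustration, not the patent's implementation: the helper names `unify` and `apply_rule` are hypothetical, only H, Y and Z are treated as placeholders (as in Table 2), relation statements must match literally, and the example entities are invented purely to exercise rule r1.

```python
from itertools import product

# Placeholder symbols used in the rules of Table 2; anything else is a constant.
RULE_VARS = {"H", "Y", "Z"}


def unify(atom, triple, binding):
    """Match one rule atom against one triple, extending `binding`.

    Returns the extended binding, or None if the atom does not match.
    """
    new_binding = dict(binding)
    for term, value in zip(atom, triple):
        if term in RULE_VARS:
            # A placeholder must take the same value everywhere in one rule.
            if new_binding.setdefault(term, value) != value:
                return None
        elif term != value:
            # Relation statements must match literally.
            return None
    return new_binding


def apply_rule(body, head, triples):
    """Return (instance, derived_triple) pairs for every way `body` matches `triples`."""
    results = []
    for combo in product(triples, repeat=len(body)):
        binding = {}
        for atom, triple in zip(body, combo):
            binding = unify(atom, triple, binding)
            if binding is None:
                break
        if binding is not None:
            derived = tuple(binding.get(term, term) for term in head)
            results.append((combo, derived))
    return results


if __name__ == "__main__":
    # Hypothetical triples, used only to exercise rule r1.
    kb = [("Alice", "residence", "Paris"), ("Paris", "country", "France")]
    r1_body = [("H", "residence", "Y"), ("Y", "country", "Z")]
    r1_head = ("H", "nationality", "Z")
    for instance, derived in apply_rule(r1_body, r1_head, kb):
        print(instance, "=>", derived)
```

Each returned pair is both a rule instance (the matched triples plus the derived one) and a candidate extended triple, which mirrors the "instance or extended triple" wording above.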
(B,R1,A) e1 x1
(A,R2,C) e2 x2
(B,R3,D) e3 x3
(E,R4,C) e4 x4
(D,R5,E) e5 x5
(B,R6,C) e6 x6
(D,R7,C) e7 x7
(D,R2,C) e8 x8
The instances of all rules obtained from the original triples and the extended triples are:
Instance 1: (B,R1,A) + (A,R2,C) => (B,R6,C)
Instance 2: (B,R3,D) + (B,R6,C) => (D,R2,C)
Instance 3: (D,R5,E) + (E,R4,C) => (D,R7,C)
Instance 4: (D,R7,C) => (D,R2,C)
As can be seen, instance 1 is an instance of rule r1, instance 2 is an instance of rule r3, instance 3 is an instance of rule r2, and instance 4 is an instance of rule r4. It will be appreciated that when more triples are selected from the knowledge base, several instances may correspond to one rule; this example is merely illustrative and not limiting.
A factor graph is built from all triples and instances, as shown in Figure 5, where q = [q1, q2, q3, q4, q5, q6, q7, q8] represents the probability distributions of each triple being credible or not credible. That is, if the triple is an original triple such as e1, then q1 = b1 when x1 = 1 and q1 = 1-b1 when x1 = 0; if the triple is an extended triple such as e7, then q7 = a2 when x7 = 1 and q7 = 1-a2 when x7 = 0. f1 to f4 represent the factor functions of the respective instances, as shown in Table 6.
Table 6 Correspondence between triples, variables x and factor functions q
If the number of atoms of a rule is 3 (for example (H, daughter, Y) + (Z, husband, Y) => (H, daughter, Z)), then every instance of that rule consists of three triples. For example, instance 1 above consists of triples e1, e2 and e6, whose variables are x1, x2 and x6; the factor function of this instance is then f1 = [f11, f12, f13, f14, f15, f16, f17, f18], where f11 to f18 are the probabilities that the rule corresponding to the instance is correct under each of the 8 combinations of 0/1 values of the three variables, as shown in Table 7. Similarly, if the number of atoms of a rule is 2, as in instance 4, the factor function of the instance is f4 = [f41, f42, f43, f44].
For convenience of subsequent description, the circular nodes in the factor graph are called variable nodes and the square nodes are called factor nodes; the factor nodes comprise the q nodes and the f nodes.
Table 7
x1 x2 x6 f1
0 0 0 f11
0 0 1 f12
0 1 0 f13
0 1 1 f14
1 0 0 f15
1 0 1 f16
1 1 0 f17
1 1 1 f18
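Table 7 fixes a row order for a factor's entries: the first variable is the most significant bit, so f1k sits at the position given by reading (x1, x2, x6) as a binary number. The following sketch shows one way to hold such a table; the helper names are hypothetical and the placeholder strings stand in for the unknown entries f11 to f18.

```python
from itertools import product


def factor_rows(num_vars):
    """All 0/1 assignments of a factor's variables, in the row order of Table 7
    (first variable is the most significant bit)."""
    return list(product((0, 1), repeat=num_vars))


def row_index(assignment):
    """Position of an assignment in that row order: its value read as a binary number."""
    index = 0
    for bit in assignment:
        index = index * 2 + bit
    return index


def make_factor(values):
    """Map each assignment to its entry, e.g. f1 = [f11, ..., f18] over (x1, x2, x6)."""
    num_vars = len(values).bit_length() - 1
    if 2 ** num_vars != len(values):
        raise ValueError("a factor over k binary variables needs 2**k entries")
    return dict(zip(factor_rows(num_vars), values))


f1 = make_factor(["f11", "f12", "f13", "f14", "f15", "f16", "f17", "f18"])
print(f1[(1, 0, 1)])  # f16
```

A 2-atom rule such as instance 4 gives a 4-entry table in the same way.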
This embodiment has 4 instances and correspondingly 4 factor functions, f = [f1, f2, f3, f4], where f represents the set of all factor functions.
The factor functions corresponding to the 4 instances are shown in Table 8:
Table 8
Then, f2 = [f21, f22, f23, f24, f25, f26, f27, f28]
f3 = [f31, f32, f33, f34, f35, f36, f37, f38]
f4 = [f41, f42, f43, f44]
After the factor graph is constructed, the EM algorithm is used to iteratively obtain the factor functions and the probability distributions q of the variables of all triples.
The EM algorithm (Expectation Maximization Algorithm) is an iterative algorithm for maximum likelihood estimation or maximum a posteriori estimation of probabilistic models containing hidden (latent) variables. It alternates between two steps:
The first step computes the expectation (E) and is called the E-step: using the current estimates, it computes the distribution of the hidden variables. In this embodiment, this step computes the probability distributions q of the variables x (including the first and second probability distributions);
The second step is maximization (M) and is called the M-step: it computes the parameter values that maximize the likelihood obtained in the E-step. In this embodiment, this step updates the factor functions according to the probability distributions of the variables obtained in the previous step.
The factor functions obtained in the M-step are used in the next E-step, and the two steps alternate until the distributions q of the variables x and the factor functions no longer change. Correspondingly, the first and second probability distributions then no longer change either, and can be used to determine whether the extended triples are credible.
The detailed process is as follows:
(1) Initialization
After the factor graph has been built, all q and f need to be initialized.
1) Initialization of q
As described above, for each triple such as e1, q1 is a two-valued function giving the probability of the triple when the variable x1 takes 0 and 1 respectively. It can be written {1-b1, b1}, where the first entry is the probability that the variable is 0 and the second the probability that it is 1. An original triple already exists in the knowledge base, so it can be initialized to {0.01, 0.99}. For an extended triple, the value of q is initialized consistently with the correctness of the corresponding rule.
In this embodiment, all q are initialized as follows:
q1, q2, q3, q4 and q5 are initialized to {0.01, 0.99};
q6 is initialized to {0.2, 0.8};
q7 is initialized to {0.3, 0.7};
q8 is initialized to {0.1, 0.9}.
2) The factor functions f are initialized randomly; in this example every entry of a factor function is set to the same value, as shown in Table 9.
Table 9
(x1,x2,x6) f1 (x3,x6,x8) f2 (x4,x5,x8) f3 (x7,x8) f4
(0,0,0) 0.125 (0,0,0) 0.125 (0,0,0) 0.125 (0,0) 0.25
(0,0,1) 0.125 (0,0,1) 0.125 (0,0,1) 0.125 (0,1) 0.25
(0,1,0) 0.125 (0,1,0) 0.125 (0,1,0) 0.125 (1,0) 0.25
(0,1,1) 0.125 (0,1,1) 0.125 (0,1,1) 0.125 (1,1) 0.25
(1,0,0) 0.125 (1,0,0) 0.125 (1,0,0) 0.125 - -
(1,0,1) 0.125 (1,0,1) 0.125 (1,0,1) 0.125 - -
(1,1,0) 0.125 (1,1,0) 0.125 (1,1,0) 0.125 - -
(1,1,1) 0.125 (1,1,1) 0.125 (1,1,1) 0.125 - -
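The two initialization steps can be sketched as follows. This is an illustrative sketch with hypothetical helper names; each q is stored as [P(x = 0), P(x = 1)], and the confidence of each extended triple's rule (0.8, 0.7, 0.9) is taken from the example values above rather than computed.

```python
def init_q(original_ids, extended_confidence):
    """q[e] = [P(x = 0), P(x = 1)] for every triple e.

    Original triples start at {0.01, 0.99}; each extended triple starts at the
    correctness assigned to its rule (passed in, since the text gives it per triple).
    """
    q = {e: [0.01, 0.99] for e in original_ids}
    for e, conf in extended_confidence.items():
        q[e] = [round(1.0 - conf, 10), conf]  # rounding only hides float noise
    return q


def init_factor(num_vars):
    """Factor table over num_vars binary variables with every entry equal (Table 9)."""
    size = 2 ** num_vars
    return [1.0 / size] * size


q = init_q(["e1", "e2", "e3", "e4", "e5"], {"e6": 0.8, "e7": 0.7, "e8": 0.9})
f = {"f1": init_factor(3), "f2": init_factor(3), "f3": init_factor(3), "f4": init_factor(2)}
print(q["e8"], f["f4"])  # [0.1, 0.9] [0.25, 0.25, 0.25, 0.25]
```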
(2) E-step
Belief propagation is an algorithm for computing the marginal distributions of the variable nodes in a factor graph. The process of belief propagation is explained below using variable x8 and the factors connected to it as an example.
Factor nodes pass messages to variable nodes (first round): s(f->x) and s(q->x) denote the messages passed from factor node f and factor node q, respectively, to variable node x. These messages are initialized to the marginal distribution of x obtained from f and q, giving:
s1(q8->x8) = {0.1, 0.9};
s1(f2->x8) = {0.5, 0.5};
s1(f3->x8) = {0.5, 0.5};
s1(f4->x8) = {0.5, 0.5};
Here, s1 denotes the first-round messages from factor nodes to variable nodes. s1(q8->x8) is the message that factor node q8 passes to its connected variable node x8 in the first round, i.e. the initialization value {0.1, 0.9} of q8 is passed to variable node x8. s1(f2->x8) = {0.5, 0.5} means that the entries of the initial f2 with x8 = 0 are summed and the entries with x8 = 1 are summed, giving {0.5, 0.5}, which is passed to the connected variable node x8. The messages from f3 and f4 are computed in the same way as for f2 and are not repeated here.
It will be appreciated that the messages from the factor nodes to the other variable nodes x1 to x7 are computed in the same way as the messages to x8, and are not described again here.
Variable nodes pass messages to factor nodes (first round): h(x->f) denotes the message passed from variable node x to factor node f, giving:
h1(x8->f2) = s1(q8->x8) * s1(f3->x8) * s1(f4->x8) = {0.1*0.5*0.5, 0.9*0.5*0.5};
h1(x8->f3) = s1(q8->x8) * s1(f2->x8) * s1(f4->x8) = {0.1*0.5*0.5, 0.9*0.5*0.5};
h1(x8->f4) = s1(q8->x8) * s1(f2->x8) * s1(f3->x8) = {0.1*0.5*0.5, 0.9*0.5*0.5};
Here, h1 denotes the first-round messages from variable nodes to factor nodes. h1(x8->f2) is the message that variable node x8 passes to its connected factor node f2 in the first round: the messages passed to x8 in the previous step by factor node q8 and by the connected factor nodes other than f2 itself (namely f3 and f4) are multiplied, and the product is passed to factor node f2. The messages from x8 to f3 and f4 are computed in the same way as the message to f2 and are not repeated here. The * in the formulas denotes element-wise multiplication of two-valued functions of the form {c1, d1}: all c1 are multiplied together and all d1 are multiplied together, giving a new two-valued function {product of the c1, product of the d1}.
It should be understood that the messages from the other variable nodes to their connected factor nodes are computed in the same way as those from variable x8, and are not described again here.
It should be noted that in this embodiment, messages from factor nodes to variable nodes and from variable nodes to factor nodes are passed only between two directly connected nodes.
The products are then normalized:
h1(x8->f2) = {0.1*0.5*0.5, 0.9*0.5*0.5} = {0.025, 0.225}, normalized to {0.1, 0.9};
h1(x8->f3) = {0.1*0.5*0.5, 0.9*0.5*0.5} = {0.025, 0.225}, normalized to {0.1, 0.9};
h1(x8->f4) = {0.1*0.5*0.5, 0.9*0.5*0.5} = {0.025, 0.225}, normalized to {0.1, 0.9}.
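The variable-to-factor step above (multiply every incoming message except the one from the target factor, then normalize) can be sketched in a few lines. The function names are hypothetical; messages are stored as [value at x = 0, value at x = 1].

```python
from math import prod


def normalize(pair):
    """Scale a two-valued function {c, d} so that c + d = 1."""
    total = pair[0] + pair[1]
    return [pair[0] / total, pair[1] / total]


def var_to_factor(incoming, target):
    """Message h(x -> target): element-wise product of the messages that the
    other connected factor nodes sent to x, then normalized.

    `incoming` maps factor-node name -> [value at x = 0, value at x = 1].
    """
    others = [msg for name, msg in incoming.items() if name != target]
    return normalize([prod(m[0] for m in others), prod(m[1] for m in others)])


# First-round messages into x8 (s1 in the text).
s1_to_x8 = {"q8": [0.1, 0.9], "f2": [0.5, 0.5], "f3": [0.5, 0.5], "f4": [0.5, 0.5]}
h1_x8 = {f: var_to_factor(s1_to_x8, f) for f in ("f2", "f3", "f4")}
```

Each of the three messages comes out as approximately {0.1, 0.9}, matching the normalized values listed above.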
Factor nodes pass messages to variable nodes (second round):
Again taking variable node x8 as an example, its connected factor nodes are q8, f2, f3 and f4, so:
s2(q8->x8) = s1(q8->x8);
s2(f2->x8) = s1(f2->x8) & h1(x3->f2) & h1(x6->f2);
s2(f3->x8) = s1(f3->x8) & h1(x4->f3) & h1(x5->f3);
s2(f4->x8) = s1(f4->x8) & h1(x7->f4);
Here, s2 denotes the second-round messages from factor nodes to variable nodes, and s2(q8->x8) is the message from node q8 to node x8. Within one pass of the E-step, the message from q8 to x8 stays the same until that E-step ends, at which point a new probability distribution of q8 is obtained; when the E-step is run again after the M-step, the message from q8 to x8 is the new probability distribution of q8. s2(f2->x8) is the message from factor node f2 to variable node x8 in the second round: the first-round message s1(f2->x8) from f2 to x8, the first-round message h1(x3->f2) from x3 to f2 and the first-round message h1(x6->f2) from x6 to f2 are combined by multiplication and passed to x8. The & in the formulas denotes combination multiplication of two-valued functions of the form {c2, d2}, which gives an 8-valued function as shown in Table 10; the entries with x8 = 0 are then summed and the entries with x8 = 1 are summed, giving a new two-valued function.
For example, from the above, s1(f2->x8) = {0.5, 0.5}. Since the normalized values of h1(x3->f2) and h1(x6->f2) are not given above (in practice they are all computed and stored), to illustrate the process more clearly we assume h1(x3->f2) = {0.2, 0.8} and h1(x6->f2) = {0.3, 0.7}. Then:
s2(f2->x8) = {0.5, 0.5} & {0.2, 0.8} & {0.3, 0.7}
= {0.03, 0.07, 0.12, 0.28, 0.03, 0.07, 0.12, 0.28}, which gives the message passed by f2 for each of the 8 combinations of 0/1 values of the variables x8, x3 and x6:
Table 10
x8 x3 x6 s2(f2->x8)
0 0 0 0.5*0.2*0.3=0.03
0 0 1 0.5*0.2*0.7=0.07
0 1 0 0.5*0.8*0.3=0.12
0 1 1 0.5*0.8*0.7=0.28
1 0 0 0.5*0.2*0.3=0.03
1 0 1 0.5*0.2*0.7=0.07
1 1 0 0.5*0.8*0.3=0.12
1 1 1 0.5*0.8*0.7=0.28
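The "&" combination step and the subsequent summation onto x8 can be sketched as follows, following the text's description literally (the previous message s1(f2->x8) is combined with the incoming messages h1(x3->f2) and h1(x6->f2)). The helper names are hypothetical and the input values are the assumed ones used for Table 10.

```python
from itertools import product


def combine(*messages):
    """The '&' operation: combination multiplication of two-valued functions.

    Returns a table keyed by 0/1 tuples in the row order of Table 10
    (the first message's variable is the most significant position).
    """
    table = {}
    for bits in product((0, 1), repeat=len(messages)):
        value = 1.0
        for msg, bit in zip(messages, bits):
            value *= msg[bit]
        table[bits] = value
    return table


def sum_onto_first(table):
    """Add up the rows with first variable 0 and with first variable 1."""
    result = [0.0, 0.0]
    for bits, value in table.items():
        result[bits[0]] += value
    return result


# s1(f2->x8) & h1(x3->f2) & h1(x6->f2), using the assumed values from the text.
rows = combine([0.5, 0.5], [0.2, 0.8], [0.3, 0.7])
s2_f2_x8 = sum_onto_first(rows)
```

With these values both halves of Table 10 sum to 0.5, so the new two-valued function s2(f2->x8) is {0.5, 0.5}.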
In the same way, the messages s2(f3->x8) and s2(f4->x8) passed from f3 and f4 to x8 can be obtained.
Variable nodes pass messages to factor nodes (second round): computed in the same way as in the first round; not repeated here.
Factor nodes pass messages to variable nodes (third round): computed in the same way as in the second round; not repeated here.
The above process is iterated until the messages change very little (i.e. converge).
After the iteration completes (the algorithm converges), the probability distribution of each variable is computed from the messages passed from the factor nodes to its variable node. Again taking variable x8 as an example, and assuming convergence after 15 iterations:
q8 = s15(q8->x8) * s15(f2->x8) * s15(f3->x8) * s15(f4->x8)
where s15(q8->x8) = s1(q8->x8).
For convenience of explanation, assume:
q8 = {0.5*0.5*0.4*0.35, 0.5*0.5*0.6*0.65} = {0.035, 0.0975}
After normalization, q8 = {0.264, 0.736}; that is, the probability distribution of variable x8 obtained after the first pass of the E-step is {0.264, 0.736}. Since the first entry is the probability that x8 = 0 and the second the probability that x8 = 1, the first probability distribution (credible) of the corresponding triple e8 is 0.736 and the second probability distribution (not credible) is 0.264.
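The final marginal is the element-wise product of all messages arriving at the variable, normalized. The sketch below (hypothetical function name `belief`) reproduces the worked numbers using the assumed converged messages from the example; entry 1 of the result is the credible probability P(x8 = 1).

```python
from math import prod


def belief(messages):
    """Marginal of a binary variable: element-wise product of every
    factor-to-variable message it received, normalized to sum to 1.
    Entry 0 is P(x = 0) (not credible), entry 1 is P(x = 1) (credible)."""
    p0 = prod(m[0] for m in messages)
    p1 = prod(m[1] for m in messages)
    total = p0 + p1
    return [p0 / total, p1 / total]


# Assumed converged messages into x8 from the example above.
q8 = belief([[0.5, 0.5], [0.5, 0.5], [0.4, 0.6], [0.35, 0.65]])
print([round(v, 3) for v in q8])  # [0.264, 0.736]
```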
Likewise, the probability distributions of the other variables can be obtained; this is not repeated here.
It should be understood that this result is obtained after the first pass of the E-step and is used in the M-step to update the factor functions f. The E-step is then run a second time to obtain a new set of probability distributions of the variables from the updated factor functions, which is again used in the M-step. This is iterated until the final probability distributions of the variables and the updated factor functions no longer change. The probability distribution of each variable obtained at that point is the target probability distribution, which serves as the criterion for judging whether the corresponding triple is credible.
(3) M-step
Updating the factor functions:
For convenience of description, in this step the empirical distribution of the factor functions is denoted f'(0), i.e. f'(0) = [f1'(0), f2'(0), f3'(0), f4'(0)], and f' is obtained by the following formulas:
f1'(0) = q1 ^ q2 ^ q6;
f2'(0) = q3 ^ q6 ^ q8;
f3'(0) = q4 ^ q5 ^ q8;
f4'(0) = q7 ^ q8;
where q1 to q8 are the probability distributions of the variables x1 to x8 obtained in the E-step, f1'(0) to f4'(0) are the empirical distributions of the factor functions obtained in the loop with t = 0, each being an 8-valued or 4-valued function of the same form as f1 described above, and ^ denotes combination multiplication.
It will be appreciated that throughout one complete M-step the empirical distributions of the factor functions used do not change; they are all computed from the probability distributions of the variables obtained in the preceding E-step, i.e.
f1'(0) = f1'(1) = f1'(2) = f1'(3) = ... = f1'(t);
until this M-step ends. New probability distributions of the variables are then obtained by the E-step, and new empirical distributions of the factor functions are computed from them.
Taking f1'(0) as an example, if q1, q2 and q6 obtained in the E-step are {0.01, 0.99}, {0.01, 0.99} and {0.1, 0.9} respectively, the empirical distribution f1'(0) is computed as shown in Table 11.
Table 11
(x1,x2,x6) f1’(0)
(0,0,0) 0.01*0.01*0.1=0.00001
(0,0,1) 0.01*0.01*0.9=0.00009
(0,1,0) 0.01*0.99*0.1=0.00099
(0,1,1) 0.01*0.99*0.9=0.00891
(1,0,0) 0.99*0.01*0.1=0.00099
(1,0,1) 0.99*0.01*0.9=0.00891
(1,1,0) 0.99*0.99*0.1=0.09801
(1,1,1) 0.99*0.99*0.9=0.88209
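The empirical distribution is the combination product (^) of the variables' marginals over all 0/1 combinations. A minimal sketch with a hypothetical helper name, reproducing Table 11 from q1, q2 and q6:

```python
from itertools import product


def empirical_factor(*qs):
    """f'(0) = q_a ^ q_b ^ ...: for every 0/1 combination of the factor's
    variables, multiply the corresponding entries of their marginals."""
    table = {}
    for bits in product((0, 1), repeat=len(qs)):
        value = 1.0
        for q, bit in zip(qs, bits):
            value *= q[bit]
        table[bits] = value
    return table


q1, q2, q6 = [0.01, 0.99], [0.01, 0.99], [0.1, 0.9]
f1_emp = empirical_factor(q1, q2, q6)  # rows (x1, x2, x6), as in Table 11
```

Because each marginal sums to 1, the eight entries of the resulting table also sum to 1.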
The sampling distribution of the factor functions is then obtained by sampling, as follows:
First, initialize t = 0, where t is the number of outer loops of the M-step;
Next, randomly initialize the value (0 or 1) of each variable x1 to x8, for example m = [x1, x2, x3, x4, x5, x6, x7, x8] = [0, 1, 0, 0, 1, 1, 0, 1]. The array m, which we call the sample data, stores the sampled values of the variables;
Then, initialize the factor functions f randomly, in the same way as in the E-step; not repeated here.
Finally, the loop proceeds as follows:
While t < T (T = 50 is the maximum number of outer loops of the M-step; it will be understood that other values may be used, without limitation):
1) If t = 0, L takes a larger value (e.g. L = 100, where L is the maximum number of inner loops of the M-step); if t > 0, L takes a smaller value (e.g. L = 30);
Initialize l = 0, where l is the number of inner loops of the M-step;
While l < L:
1. For each variable x1 to x8, obtain its probability distribution assuming the values of the other variables are fixed, and determine the value of the variable by sampling from this distribution.
For example, taking variable x8, assume the values of x1 to x7 are [0, 1, 0, 0, 1, 1, 0]. The current factor functions have been initialized and can be regarded as known. Since f2, f3 and f4 involve x8, the probability distribution of x8 with respect to each of f2, f3 and f4 can be obtained. Taking f3 as an example, assume its current values are as shown in Table 12, where for convenience the initialized f3 is denoted f3(0). Since x4 and x5 take the values [0, 1], the probability distribution of x8 with respect to f3 is {0.12, 0.18}; the two values must be normalized so that they sum to 1, giving {0.4, 0.6}. Similarly, the probability distributions of x8 with respect to f2 and f4 can be obtained; assume they are {0.3, 0.7} and {0.2, 0.8}. The probability distribution of x8 is then the product of these 3 distributions, normalized: {0.4*0.3*0.2, 0.6*0.7*0.8}, which is {0.067, 0.933} after normalization. x8 is then sampled from this distribution, and the sample array m is updated.
Table 12
(x4,x5,x8) f3(0)
(0,0,0) 0.025
(0,0,1) 0.025
(0,1,0) 0.12
(0,1,1) 0.18
(1,0,0) 0.05
(1,0,1) 0.05
(1,1,0) 0.10
(1,1,1) 0.45
Likewise, the sampled values of the other variables x1 to x7 can be obtained. Finally, the values of all variables in m have been updated, giving one set of sampled values of the variables, which is saved separately.
2. Update l = l + 1 and enter the next inner loop.
In each inner loop, the variables are initialized to the set of sampled values obtained in the previous loop. After L loops the inner loop ends, giving L sets of sampled variable values. The sampling distribution of each factor function is computed from the M sets of sampled values obtained in the last M loops (e.g. M = 15), as follows:
For example, taking f3, assume the sampled results of (x4, x5, x8) in the last 15 loops are as shown in Table 13; for convenience, p3(0) denotes the sampling distribution of f3 in the loop with t = 0.
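One Gibbs update of x8 in step 1 above can be sketched as follows: read the two rows of each connected factor that agree with the current values of the other variables, normalize, multiply the per-factor distributions, renormalize, and sample. The helper names are hypothetical, the sketch assumes the sampled variable is the last coordinate of each factor key (true for x8 in f2, f3 and f4), and the distributions for f2 and f4 are the assumed values from the example.

```python
import random


def conditional_from_factor(table, others):
    """Distribution of the last variable of a factor given the values of the
    other variables (`others`), read from two rows of the factor table."""
    pair = [table[others + (0,)], table[others + (1,)]]
    total = pair[0] + pair[1]
    return [pair[0] / total, pair[1] / total]


def combine_conditionals(dists):
    """Multiply per-factor distributions element-wise and renormalize."""
    p0 = p1 = 1.0
    for d in dists:
        p0 *= d[0]
        p1 *= d[1]
    return [p0 / (p0 + p1), p1 / (p0 + p1)]


def sample_binary(dist, rng):
    """Draw 1 with probability dist[1], otherwise 0."""
    return 1 if rng.random() < dist[1] else 0


# f3(0) from Table 12, rows keyed by (x4, x5, x8).
f3 = {(0, 0, 0): 0.025, (0, 0, 1): 0.025, (0, 1, 0): 0.12, (0, 1, 1): 0.18,
      (1, 0, 0): 0.05, (1, 0, 1): 0.05, (1, 1, 0): 0.10, (1, 1, 1): 0.45}
from_f3 = conditional_from_factor(f3, (0, 1))          # x4 = 0, x5 = 1
x8_dist = combine_conditionals([from_f3, [0.3, 0.7], [0.2, 0.8]])
m = [0, 1, 0, 0, 1, 1, 0, 1]
m[7] = sample_binary(x8_dist, random.Random(0))        # update x8 in the sample array
```

The seeded `random.Random` only makes the sketch reproducible; any uniform source would do.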
Table 13
(x4,x5,x8) Occurrences in the 15 sample sets p3(0)
(0,0,0) 1 1/15
(0,0,1) 2 2/15
(0,1,0) 1 1/15
(0,1,1) 1 1/15
(1,0,0) 0 0/15
(1,0,1) 1 1/15
(1,1,0) 3 3/15
(1,1,1) 6 6/15
The sampling distributions of the other factor functions are obtained in the same way; not repeated here.
The factor functions are then updated as follows.
For example, for f3:
f3(1) = f3(0) * (f3'(0) / p3(0))
where f3(1) is the value of factor function f3 after the update in outer loop t = 0, f3(0) is the initialized value of factor function f3, f3'(0) is the empirical distribution of factor function f3 computed from the probability distributions of the variables obtained in the E-step, and p3(0) is the sampling distribution of factor function f3 at t = 0.
It will be appreciated that at t = 1, f3(2) = f3(1) * (f3'(1) / p3(1));
where f3(2) is the value of factor function f3 after the update in the loop with t = 1, f3'(1) is the empirical distribution of f3 at t = 1, and p3(1) is the sampling distribution of f3 at t = 1; and so on.
The updated values of the other factor functions are obtained in the same way; not repeated here.
It will be appreciated that if the same rule r3 has several instances, f3'(0) is the sum of the empirical distributions obtained from those instances, and similarly p3(0) is the sum of the sampling distributions obtained from them.
For example, if the triples with variables x9, x10 and x11 also form an instance of rule r3, then, following the same process as for x4, x5 and x8, an additional empirical distribution and sampling distribution are obtained; f3'(0) in the formula is then the sum of the two empirical distributions, and p3(0) is the sum of the two sampling distributions.
Update t = t + 1 and enter the next outer loop.
After T outer loops, a set of updated factor functions f1(T), f2(T), f3(T) and f4(T) is obtained.
It will be appreciated that the factor functions obtained by the M-step loop are the result of the current iteration. They are used to initialize the factor functions of the next E-step, and the cycle continues until the obtained factor functions and the probability distributions of the variables no longer change (i.e. converge). The factor functions obtained then are the final factor functions and can be used to judge the correctness of the rules.
After the EM algorithm terminates, the probability distributions of all variables are obtained, and the probability distribution of each extended triple is its target probability distribution. A preset threshold is set, and whether the extended triple is credible is determined from the target probability distribution and the preset threshold, the target probability distribution being either the first or the second probability distribution.
For example, assume the probability distribution of extended triple e8 is {0.1, 0.9}, i.e. its first probability distribution is 0.9 and its second probability distribution is 0.1. If the preset threshold is a credibility threshold i (e.g. i = 0.98), the target probability distribution is taken to be the first probability distribution, 0.9: if 0.9 > i, the extended triple is considered credible and can be added to the knowledge base to expand it; if 0.9 < i, the extended triple is considered not credible and is not added to the knowledge base. If the preset threshold is a non-credibility threshold j (e.g. j = 0.2), the target probability distribution is taken to be the second probability distribution, 0.1: when the second probability distribution < 0.2, the extended triple is credible, and when the second probability distribution > 0.2, the extended triple is not credible.
It should be noted that the update formula for the factor functions, f(t+1) = f(t) * (f'(t) / p(t)), can be derived as follows.
After the factor graph is built, learning the confidence of the rules reduces to learning the factors:
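The sampling distribution is simply the relative frequency of each 0/1 combination of a factor's variables over the last M saved sample arrays. The sketch below uses a hypothetical helper name and rebuilds the counts of Table 13 inside otherwise all-zero arrays m, where x4, x5 and x8 sit at indices 3, 4 and 7.

```python
from collections import Counter
from itertools import product


def sample_distribution(samples, positions, last_m=15):
    """p(t): relative frequency of each 0/1 combination of a factor's variables
    over the last `last_m` saved sample arrays.

    `positions` are the indices of the factor's variables inside each array m.
    """
    window = samples[-last_m:]
    counts = Counter(tuple(s[i] for i in positions) for s in window)
    return {bits: counts[bits] / len(window)
            for bits in product((0, 1), repeat=len(positions))}


# Last 15 sampled (x4, x5, x8) values with the counts of Table 13.
table13 = ([(0, 0, 0)] + [(0, 0, 1)] * 2 + [(0, 1, 0), (0, 1, 1), (1, 0, 1)]
           + [(1, 1, 0)] * 3 + [(1, 1, 1)] * 6)
samples = []
for x4, x5, x8 in table13:
    m = [0] * 8
    m[3], m[4], m[7] = x4, x5, x8
    samples.append(m)
p3 = sample_distribution(samples, positions=(3, 4, 7))
```

Combinations never sampled, such as (1,0,0), get probability 0, which matters for the division in the update that follows.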
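The multiplicative update f(t+1) = f(t) * f'(t) / p(t), including the summation over several instances of the same rule, can be sketched as below. The function name and toy values are hypothetical. The text does not say what to do when a combination was never sampled (p(t) = 0, as for (1,0,0) in Table 13); this sketch assumes such an entry keeps its current value.

```python
def update_factor(f_t, empirical_list, sampled_list):
    """One outer-loop update f(t+1) = f(t) * f'(t) / p(t) for a single rule.

    When several instances share the rule, f'(t) and p(t) are the sums of the
    per-instance distributions. Assumption (not stated in the text): a
    combination that was never sampled (p(t) = 0) keeps its current value.
    """
    updated = {}
    for bits, value in f_t.items():
        emp = sum(e[bits] for e in empirical_list)
        smp = sum(p[bits] for p in sampled_list)
        updated[bits] = value * emp / smp if smp > 0 else value
    return updated


# Toy one-variable factor, only to show the arithmetic.
f_t = {(0,): 0.5, (1,): 0.5}
f_next = update_factor(f_t, [{(0,): 0.2, (1,): 0.8}], [{(0,): 0.4, (1,): 0.6}])
```

Entries whose empirical probability exceeds their sampled probability grow, and the others shrink, which is what drives the sampled distribution toward the empirical one.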
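The two threshold variants can be sketched as a single decision function. The name `is_credible` is hypothetical; the distribution is stored as [second, first] = [P(not credible), P(credible)], and ties are treated as not credible, which is an assumption since the text only covers strict inequalities.

```python
def is_credible(dist, threshold, kind="credible"):
    """Decide whether an extended triple is credible.

    dist = [P(not credible), P(credible)], i.e. [second, first] probability
    distributions. kind="credible": compare the first distribution with a
    credibility threshold i; kind="not_credible": compare the second
    distribution with a non-credibility threshold j. Ties count as not credible
    (an assumption; the text only covers strict inequalities).
    """
    if kind == "credible":
        return dist[1] > threshold
    if kind == "not_credible":
        return dist[0] < threshold
    raise ValueError("kind must be 'credible' or 'not_credible'")


e8 = [0.1, 0.9]
print(is_credible(e8, 0.98))                  # False: 0.9 < i = 0.98
print(is_credible(e8, 0.2, "not_credible"))   # True: 0.1 < j = 0.2
```

As the example values show, the choice of threshold type and value decides the outcome, so the two variants need not agree for the same triple.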
Let $G$ denote a factor graph whose sets of variable nodes and factor nodes are $V$ and $U$ respectively. For any node $u$ of $G$, let $N(u)$ denote the set of nodes connected to $u$; for example, in the embodiment of the invention the nodes connected to factor node f2 are the variable nodes x3, x6 and x8. Let $X$ be an arbitrary set (in this embodiment, $X = \{0, 1\}$). For any set $S \subseteq V$, i.e. any subset containing some or all of the variable nodes, let $\mathcal{F}_S$ denote the set of functions from $X^S$ to the positive real numbers. Let $\mathcal{F}$ denote a set of unknown functions belonging to the same family, i.e. a set of factor functions of the same family: every function in it corresponds to the same rule and takes values in the same way; for example, each rule in the embodiment of the invention is the unknown function summarized from the several instances belonging to that family. A function $f \in \mathcal{F}$ has $m(f)$ variables, each taking values in $X$. Each factor node $u \in U$ of $G$ is associated with a function $g_u \in \mathcal{F}$, with $m(g_u) = |N(u)|$: here $g_u$ is the factor function of factor node $u$, $N(u)$ is the set of variable nodes directly connected to $u$, and $|N(u)|$ is the number of those variable nodes. Note that for different nodes $u \neq u'$, $g_u$ and $g_{u'}$ may be identical; in particular, if $g_u$ and $g_{u'}$ correspond to the same rule, they are identical. For each function $f \in \mathcal{F}$, let ∧(f) denote the set $\{u \in U : g_u = f\}$, i.e. the set of all nodes of $G$ whose factor functions belong to the same family as f and therefore correspond to the same rule. Then $\prod_{u \in U} g_u(x_{N(u)})$ is the product form of the functions, and we assume that this product represents the joint distribution $p(x_V) = \frac{1}{Z}\prod_{u \in U} g_u(x_{N(u)})$ of a set of random variables $x_V$, where $V$ is the set of variable nodes of the factor graph, $x_v$ is the variable associated with variable node $v$ (each such variable is one of the variables x of the embodiment of the invention), and $x_{N(u)}$ is the group of variables attached to factor node $u$, such as the three variables corresponding to one factor function in the embodiment. Since the $g_u$ are unknown, this joint distribution is also unknown.
Suppose an empirical distribution $\hat{p}$ of the joint distribution $p$ is observed; in the embodiment of the invention, this is the empirical distribution of the factor functions computed from the empirical distribution of the variables obtained in the E-step of the EM algorithm. The goal is to estimate each factor function $f \in \mathcal{F}$ from this empirical distribution. Specifically, $\hat{p}$ is sampled M times, so that each value $a \in X^V$ is observed $M\hat{p}(a)$ times; the set of observations $\{a^{(i)} : i = 1, 2, \ldots, M\}$ is denoted D.
Let $\ell(F)$ denote $\log p(D \mid F)$, the log-likelihood given the observation set D, where $p(D \mid F)$ is the likelihood of the factor functions F; we need to maximize $\ell(F)$. For each $S \subseteq V$, let D:S denote the observed combinations in D restricted to S; for example, an observation of the variables S = [x3, x6, x8] may be [0, 0, 0], [0, 0, 1], ..., [1, 1, 1]. For each S and any vector $a \in X^S$, let $m_S(a)$ denote the number of times combination a is observed. For any $a \in X^V$, let a:S denote the subvector of a on S, i.e. the specific values taken by the variable combination S.
We define:
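where $S$ denotes a group of variables, e.g. (x3, x6, x8), and a denotes a specific value of this group, e.g. [0, 0, 0], ..., [1, 1, 1]; then:
Writing $A = \sum_{u}\sum_{a \in X^{N(u)}} m_{N(u)}(a)\log g_u(a)$, with u ranging over the factor nodes, we obtain:
$\ell(F) = A - M\log Z \qquad (10)$
where Z is the normalizing constant of the joint distribution. Now, for each $f \in \mathcal{F}$ and each $b \in X^{m(f)}$, we differentiate $\ell(F)$ with respect to $f(b)$; both A and Z are functions of $f(b)$:
where ∧*(f) denotes the set of all variable combinations corresponding to the function set ∧(f).
For each a ∈ ∧*(f), we have: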
Lemma 1: For each
Setting the above partial derivative to 0 gives, for each b:
Moving f(b) to one side of the equation as $f_{t+1}(b)$ gives the iterative formula of the algorithm:
$f_{t+1}(b) = f_t(b)\cdot\frac{\sum_{u \in \wedge(f)}\hat{p}_{N(u)}(b)}{\sum_{u \in \wedge(f)} p_{t,N(u)}(b)} \qquad (16)$
where $\hat{p}_{N(u)}$ is the empirical distribution restricted to the variables $N(u)$, and $p_{t,N(u)}$ is the distribution obtained by sampling in round t restricted to the same variables, for example the sampling distribution p3(0) of a factor function computed from one instance in the embodiment of the invention.
It should be noted that, for ease of description, iterative formula (16) is written in the embodiment of the invention as:
f(t+1) = f(t) * [f'(t) / p(t)],
where f(t) is the value of the factor function in round t, i.e. $f_t(b)$ in formula (16), t being an integer greater than or equal to 0 with initial value 0, and f(0) is the initialized value of the factor function; f'(t) is the empirical distribution of the factor function in round t, i.e. the empirical-distribution term in formula (16); and p(t) is the sampling distribution of the factor function in round t, i.e. the sampling-distribution term in formula (16).
With the knowledge base triple inspection method provided by this embodiment, the target probability distributions of the extended triples and the factor functions of the corresponding rules are continually updated within the EM algorithm, so that the probability that a rule is correct, i.e. the confidence of the rule, is further learned on the basis of the expanded knowledge base. The confidence of an extended triple is computed from the factor function of its corresponding rule, and the computation of that factor function in turn involves the confidence of the original triples in the knowledge base. The confidence computation for extended triples therefore takes into account the global correlations between pieces of knowledge, allowing the knowledge base to be expanded efficiently, with high quality and high accuracy.
Embodiment 4
This embodiment provides a knowledge base triple inspection device for performing the knowledge base triple inspection method of Embodiment 1.
Figure 3 is a schematic structural diagram of the knowledge base triple inspection device provided by this embodiment. The knowledge base triple inspection device 40 of this embodiment includes an acquisition module 41, a determining module 42 and a processing module 43.
The acquisition module 41 is configured to obtain the rule corresponding to an extended triple, where the extended triple is a triple obtained by an extension operation based on the rule and the original triples in the existing knowledge base, and comprises at least an ordered set consisting of a first statement, a relation statement and a second statement, the relation statement representing the relation between the first statement and the second statement. The determining module 42 is configured to determine the factor function corresponding to the rule obtained by the acquisition module 41, where the factor function represents the probability that the rule is correct and is obtained from the initial factor function and the EM algorithm. The processing module 43 is configured to determine whether the extended triple is credible according to the factor function determined by the determining module 42.
Regarding the device in this embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.
With the knowledge base triple inspection device of this embodiment, the rule corresponding to an extended triple is obtained, the factor function corresponding to the rule is determined from the initial factor function and the EM algorithm, and whether the extended triple is credible is determined from that factor function. It can then be decided whether to add the extended triple to the knowledge base, thereby expanding the knowledge base and improving the accuracy of the expansion.
Embodiment 5
This embodiment further supplements the knowledge base triple checking device of Embodiment 4, so as to perform the knowledge base triple checking method of Embodiment 2.
As shown in Figure 4, the processing module 43 in the knowledge base triple checking device 40 of this embodiment includes a first submodule 51 and a second submodule 52.
The first submodule 51 is configured to determine, according to belief propagation and the factor function, a first probability distribution and a second probability distribution corresponding to the extended triple. The first probability distribution represents the probability that the extended triple is credible, the second probability distribution represents the probability that the extended triple is not credible, and second probability distribution = 1 − first probability distribution. The second submodule 52 is configured to determine whether the extended triple is credible according to a target probability distribution and a preset threshold, where the target probability distribution is either the first probability distribution or the second probability distribution.
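The following Python sketch illustrates submodules 51 and 52 on the smallest possible graph: the premise triples of one rule feed a single rule factor, which connects to one extended-triple variable. The factor table in `rule_factor`, the independence of the premise triples, and the string flags `"credible"` and `"incredible"` are hypothetical choices made for illustration. On this tree-shaped graph a single sum-product pass gives the exact marginal; a graph spanning a whole knowledge base may contain loops, in which case belief propagation is typically iterated and yields approximate marginals. The threshold logic follows the two cases spelled out in claims 3 and 8.

```python
from math import prod


def rule_factor(premises_hold: int, credible: int, f: float) -> float:
    """Hypothetical factor table psi(y, x) for one rule.

    y = premises_hold: 1 if all premise (original) triples hold, else 0.
    x = credible:      1 if the extended triple is credible, else 0.
    If the premises hold, the rule favors x = 1 with weight f (the rule's factor
    function value); if they do not hold, the rule is uninformative about x.
    """
    if premises_hold == 1:
        return f if credible == 1 else 1.0 - f
    return 0.5


def triple_beliefs(premise_confidences: list[float], f: float) -> tuple[float, float]:
    """One sum-product pass: premise triples -> rule factor -> extended triple.

    Returns (first, second) probability distributions, with second = 1 - first.
    """
    # Message into the factor: probability that all premises hold (assumed independent).
    p_hold = prod(premise_confidences)
    msg_y = {1: p_hold, 0: 1.0 - p_hold}
    # Message out of the factor: sum out the premise variable y for each value of x.
    msg_x = {x: sum(msg_y[y] * rule_factor(y, x, f) for y in (0, 1)) for x in (0, 1)}
    first = msg_x[1] / (msg_x[0] + msg_x[1])
    return first, 1.0 - first


def is_credible(first: float, threshold: float, threshold_kind: str) -> bool:
    """Threshold decision following the two cases of claims 3 and 8."""
    if threshold_kind == "credible":      # target distribution = first distribution
        return first >= threshold
    if threshold_kind == "incredible":    # target distribution = second distribution
        second = 1.0 - first
        return second < threshold         # second >= threshold means not credible
    raise ValueError("threshold_kind must be 'credible' or 'incredible'")
```

For example, `triple_beliefs([0.9, 0.8], f=0.9)` gives a first probability distribution of about 0.788, so the triple passes a credibility threshold of 0.7 and also passes a non-credibility threshold of 0.3 (its second distribution, about 0.212, is below 0.3).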
Optionally, the determining module 42 is specifically configured to determine, according to the following formula, the factor function f(t+1) obtained after an iteration of the EM algorithm:
f(t+1) = f(t) * [f'(t) / p(t)];
where f(t) represents the value of the factor function in round t; t is an integer greater than or equal to 0 whose initial value is 0, so f(0) is the value of the initialized factor function; f'(t) represents the empirical distribution of the factor function in round t; p(t) represents the sample distribution of the factor function in round t; and the empirical distribution and the sample distribution are obtained during the iterative operation of the EM algorithm.
Optionally, the determining module 42 is further configured to stop the iterative operation when the value of f(t) no longer changes.
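The update f(t+1) = f(t) * [f'(t) / p(t)] and its stopping condition can be sketched in Python under simplifying assumptions that are not taken from the patent: the factor function of one rule is treated as a single scalar (the probability that the rule is correct); each extended triple produced by the rule has hypothetical likelihoods of its evidence under the credible and non-credible hypotheses; f'(t) is taken as the average posterior credibility of those triples (the E-step); and p(t) is taken as the model's own prediction, f(t). Under these assumptions the multiplicative update reduces to the standard EM M-step for a mixing weight. Because floating-point values rarely stop changing exactly, "no longer changes" is approximated by a tolerance `tol`.

```python
def learn_factor_function(f0, true_lik, false_lik, tol=1e-12, max_iter=10_000):
    """Iterate f(t+1) = f(t) * [f'(t) / p(t)] until f stops changing.

    f0: initialized factor function f(0), strictly between 0 and 1.
    true_lik: hypothetical likelihood of each extended triple's evidence if credible.
    false_lik: hypothetical likelihood of the same evidence if not credible.
    Returns (final factor function value, number of rounds run).
    """
    if not 0.0 < f0 < 1.0:
        raise ValueError("f0 must lie strictly between 0 and 1")
    if not true_lik or len(true_lik) != len(false_lik):
        raise ValueError("need one (true, false) likelihood pair per extended triple")
    f = f0
    for t in range(max_iter):
        # E-step: posterior probability that each extended triple is credible under f(t).
        posteriors = [f * a / (f * a + (1.0 - f) * b) for a, b in zip(true_lik, false_lik)]
        empirical = sum(posteriors) / len(posteriors)  # f'(t)
        sample = f                                     # p(t): the model's own prediction
        f_next = f * (empirical / sample)              # f(t+1) = f(t) * [f'(t) / p(t)]
        if abs(f_next - f) < tol:                      # "no longer changes"
            return f_next, t + 1
        f = f_next
    return f, max_iter


f_hat, rounds = learn_factor_function(0.5, [0.9, 0.8, 0.2, 0.95], [0.1, 0.2, 0.8, 0.05])
```

With strictly positive likelihoods every posterior lies strictly between 0 and 1, so f(t) stays inside (0, 1) and the division by p(t) is safe.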
For the device in this embodiment, the specific manner in which each module performs its operations has been described in detail in the corresponding method embodiment and is not repeated here.
With the knowledge base triple checking device of this embodiment, the target probability distribution of the extended triple and the factor function of the rule corresponding to the extended triple are continuously updated within the EM algorithm. As a result, the probability that a rule is correct is further learned from the expanded knowledge base on the basis of the rule's confidence; the confidence of an extended triple is computed from the factor function of its corresponding rule; and the computation of that factor function in turn involves the confidence of the original triples in the knowledge base. The confidence computation for extended triples therefore takes into account the global correlation between pieces of knowledge, so the knowledge base can be expanded efficiently, with high quality and high accuracy.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be completed by hardware instructed by a program. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical discs.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A knowledge base triple checking method, characterized by comprising:
obtaining a rule corresponding to an extended triple, wherein the extended triple is a triple obtained by performing an extension operation based on an original triple in an existing knowledge base and the rule, the extended triple comprises at least an ordered set consisting of a first statement, a relation statement, and a second statement, and the relation statement represents the relationship between the first statement and the second statement;
determining a factor function corresponding to the rule, wherein the factor function represents the probability that the rule is correct and is obtained according to an initial factor function and an EM algorithm; and
determining, according to the factor function, whether the extended triple is credible.
2. The method according to claim 1, wherein determining, according to the factor function, whether the extended triple is credible comprises:
determining, according to belief propagation and the factor function, a first probability distribution and a second probability distribution corresponding to the extended triple, wherein the first probability distribution represents the probability that the extended triple is credible, the second probability distribution represents the probability that the extended triple is not credible, and the second probability distribution = 1 − the first probability distribution; and
determining, according to a target probability distribution and a preset threshold, whether the extended triple is credible, wherein the target probability distribution is the first probability distribution or the second probability distribution.
3. The method according to claim 2, wherein determining, according to the target probability distribution and the preset threshold, whether the extended triple is credible comprises:
if the preset threshold is a credibility threshold, the target probability distribution is the first probability distribution: if the target probability distribution is greater than or equal to the preset threshold, determining that the extended triple is credible; if the target probability distribution is less than the preset threshold, determining that the extended triple is not credible;
if the preset threshold is a non-credibility threshold, the target probability distribution is the second probability distribution: if the target probability distribution is greater than or equal to the preset threshold, determining that the extended triple is not credible; if the target probability distribution is less than the preset threshold, determining that the extended triple is credible.
4. The method according to any one of claims 1 to 3, wherein determining the factor function corresponding to the rule comprises:
determining, according to the following formula, the factor function f(t+1) obtained after an iteration of the EM algorithm:
f(t+1) = f(t) * [f'(t) / p(t)];
wherein f(t) represents the value of the factor function in round t, t is an integer greater than or equal to 0 whose initial value is 0, f(0) is the value of the initialized factor function, f'(t) represents the empirical distribution of the factor function in round t, p(t) represents the sample distribution of the factor function in round t, and the empirical distribution and the sample distribution are obtained during the iterative operation of the EM algorithm.
5. The method according to claim 4, wherein the iterative operation stops when the value of f(t) no longer changes.
6. A knowledge base triple checking device, characterized by comprising:
an acquisition module, configured to obtain a rule corresponding to an extended triple, wherein the extended triple is a triple obtained by performing an extension operation based on an original triple in an existing knowledge base and the rule, the extended triple comprises at least an ordered set consisting of a first statement, a relation statement, and a second statement, and the relation statement represents the relationship between the first statement and the second statement;
a determining module, configured to determine a factor function corresponding to the rule, wherein the factor function represents the probability that the rule is correct and is obtained according to an initial factor function and an EM algorithm; and
a processing module, configured to determine, according to the factor function, whether the extended triple is credible.
7. The device according to claim 6, wherein the processing module comprises:
a first submodule, configured to determine, according to belief propagation and the factor function, a first probability distribution and a second probability distribution corresponding to the extended triple, wherein the first probability distribution represents the probability that the extended triple is credible, the second probability distribution represents the probability that the extended triple is not credible, and the second probability distribution = 1 − the first probability distribution; and
a second submodule, configured to determine, according to a target probability distribution and a preset threshold, whether the extended triple is credible, wherein the target probability distribution is the first probability distribution or the second probability distribution.
8. The device according to claim 7, wherein the second submodule is specifically configured to:
if the preset threshold is a credibility threshold, take the first probability distribution as the target probability distribution: if the target probability distribution is greater than or equal to the preset threshold, determine that the extended triple is credible; if the target probability distribution is less than the preset threshold, determine that the extended triple is not credible;
if the preset threshold is a non-credibility threshold, take the second probability distribution as the target probability distribution: if the target probability distribution is greater than or equal to the preset threshold, determine that the extended triple is not credible; if the target probability distribution is less than the preset threshold, determine that the extended triple is credible.
9. The device according to any one of claims 6 to 8, wherein the determining module is specifically configured to:
determine, according to the following formula, the factor function f(t+1) obtained after an iteration of the EM algorithm:
f(t+1) = f(t) * [f'(t) / p(t)];
wherein f(t) represents the value of the factor function in round t, t is an integer greater than or equal to 0 whose initial value is 0, f(0) is the value of the initialized factor function, f'(t) represents the empirical distribution of the factor function in round t, p(t) represents the sample distribution of the factor function in round t, and the empirical distribution and the sample distribution are obtained during the iterative operation of the EM algorithm.
10. The device according to claim 9, wherein the determining module is further configured to:
stop the iterative operation when the value of f(t) no longer changes.
CN201710011368.1A 2017-01-06 2017-01-06 Method and device for checking triple of knowledge base Active CN106874380B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710011368.1A CN106874380B (en) 2017-01-06 2017-01-06 Method and device for checking triple of knowledge base

Publications (2)

Publication Number Publication Date
CN106874380A true CN106874380A (en) 2017-06-20
CN106874380B CN106874380B (en) 2020-01-14

Family

ID=59164781

Country Status (1)

Country Link
CN (1) CN106874380B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8037008B2 (en) * 2006-08-28 2011-10-11 Korea Institute Of Science & Technology Information DBMS-based knowledge extension and inference service method recorded on computer-readable medium
CN103500208A (en) * 2013-09-30 2014-01-08 中国科学院自动化研究所 Deep layer data processing method and system combined with knowledge base
CN104866498A (en) * 2014-02-24 2015-08-26 华为技术有限公司 Information processing method and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569335A (en) * 2018-03-23 2019-12-13 百度在线网络技术(北京)有限公司 triple verification method and device based on artificial intelligence and storage medium
US11275810B2 (en) 2018-03-23 2022-03-15 Baidu Online Network Technology (Beijing) Co., Ltd. Artificial intelligence-based triple checking method and apparatus, device and storage medium
CN111506623A (en) * 2020-04-08 2020-08-07 北京百度网讯科技有限公司 Data expansion method, device, equipment and storage medium
CN111506623B (en) * 2020-04-08 2024-03-22 北京百度网讯科技有限公司 Data expansion method, device, equipment and storage medium
CN113204650A (en) * 2021-05-14 2021-08-03 深圳市曙光信息技术有限公司 Evaluation method and system based on domain knowledge graph
CN113204650B (en) * 2021-05-14 2022-03-11 深圳市曙光信息技术有限公司 Evaluation method and system based on domain knowledge graph
CN113901151A (en) * 2021-09-30 2022-01-07 北京有竹居网络技术有限公司 Method, apparatus, device and medium for relationship extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant