CN106874380B

CN106874380B - Method and device for checking triple of knowledge base

Info

Publication number: CN106874380B
Application number: CN201710011368.1A
Authority: CN
Inventors: 赵伟华; 张日崇
Original assignee: Beijing University of Aeronautics and Astronautics
Current assignee: Beijing University of Aeronautics and Astronautics
Priority date: 2017-01-06
Filing date: 2017-01-06
Publication date: 2020-01-14
Anticipated expiration: 2037-01-06
Also published as: CN106874380A

Abstract

The invention provides a method and a device for checking knowledge base triples. A rule corresponding to an extended triple is acquired, a factor function corresponding to the rule is determined according to an initial factor function and an EM algorithm, and whether the extended triple is credible is determined according to the factor function. This makes it possible to decide whether the extended triple is put into the knowledge base, thereby expanding the knowledge base and improving the accuracy of the expansion.

Description

Method and device for checking triple of knowledge base

Technical Field

The invention relates to a knowledge base expansion technology, in particular to a method and a device for checking knowledge base triples.

Background

A knowledge base is a database that stores knowledge in structured form as triples, and is used to store massive amounts of knowledge of a certain field or industry. For example, a historical knowledge base may store a vast amount of knowledge in the historical domain, including individual historical figures, historical events, and the like. A knowledge base takes instances as its main objects of description and expresses knowledge in an object-oriented manner, where an instance refers to a concrete or abstract thing in reality. For example, an instance may represent a person, a city, an object, and so on.

A knowledge base typically includes multiple instances, and the attributes of the instances and the relationships between instances are stored as triples. The triple is the basic structure used to represent knowledge in the knowledge base. Its structure can be expressed as <first statement, relational statement, second statement>, where the relational statement expresses the relationship between the first statement and the second statement.

The knowledge base expansion means that under the condition that an original knowledge base is incomplete, an unknown triple is predicted by using a known triple representing knowledge through a data mining method, so that a new triple is expanded in the original knowledge base, and the knowledge base is more complete. Therefore, checking whether the new triplet is credible becomes a technical problem to be solved urgently.

Disclosure of Invention

The invention provides a method and a device for inspecting a triple of a knowledge base, which aim to overcome the defects of unreliable expanded triples and the like in the prior art.

The invention provides a method for checking knowledge base triples, which comprises the following steps:

acquiring a rule corresponding to an extended triple, wherein the extended triple is a triple obtained by performing extension operation based on an original triple in an existing knowledge base and the rule, the extended triple comprises an ordered set at least comprising a first statement, a relational statement and a second statement, and the relational statement is used for representing the relationship between the first statement and the second statement;

determining a factor function corresponding to the rule, wherein the factor function is used for representing the probability of whether the rule is correct or not, and the factor function is obtained according to an initial factor function and an EM algorithm;

and determining whether the extension triple is credible according to the factor function.

According to the method as described above, optionally, the determining whether the extension triplet is trusted according to the factor function includes:

determining, according to belief propagation and the factor function, a first probability distribution and a second probability distribution corresponding to the extended triple, the first probability distribution representing the probability that the extended triple is credible, the second probability distribution representing the probability that the extended triple is not credible, and the second probability distribution being 1 minus the first probability distribution;

and determining whether the extension triple is credible according to a target probability distribution and a preset threshold, wherein the target probability distribution is the first probability distribution or the second probability distribution.

According to the method as described above, optionally, the determining whether the extension triplet is trusted according to the target probability distribution and the preset threshold includes:

if the preset threshold is a credible threshold, the target probability distribution is the first probability distribution; if the target probability distribution is greater than or equal to the preset threshold, the extension triple is determined to be credible; if the target probability distribution is smaller than the preset threshold, the extension triple is determined to be not credible;

if the preset threshold is an untrusted threshold, the target probability distribution is the second probability distribution; if the target probability distribution is greater than or equal to the preset threshold, the extension triple is determined to be not credible; and if the target probability distribution is smaller than the preset threshold, the extension triple is determined to be credible.

According to the method as described above, optionally, the determining a factor function corresponding to the rule includes:

determining the factor function f (t +1) after iterative operation by the EM algorithm according to the formula:

f(t+1) = f(t) * [f'(t) / p(t)];

wherein f(t) represents the value of the factor function in the t-th round, t is an integer greater than or equal to 0, the initial value of t is 0, f(0) is the value of the initialized factor function, f'(t) represents the empirical distribution of the factor function in the t-th round, p(t) represents the sampling distribution of the factor function in the t-th round, and the empirical distribution and the sampling distribution are obtained in the iterative operation process of the EM algorithm.

According to the method described above, optionally, the iterative operation is stopped when the value of f (t) no longer changes.

Another aspect of the present invention provides an apparatus for triple verification of a knowledge base, comprising:

an acquisition module, configured to acquire a rule corresponding to an extended triple, wherein the extended triple is a triple obtained by performing an extension operation based on an original triple in an existing knowledge base and the rule, the extended triple comprises an ordered set comprising at least a first statement, a relational statement and a second statement, and the relational statement is used for representing the relationship between the first statement and the second statement;

a determining module, configured to determine a factor function corresponding to the rule, wherein the factor function is used for representing the probability of whether the rule is correct, and the factor function is obtained according to an initial factor function and an EM algorithm;

and a processing module, configured to determine whether the extension triple is credible according to the factor function.

According to the apparatus as described above, optionally, the processing module includes:

a first submodule, configured to determine, according to belief propagation and the factor function, a first probability distribution and a second probability distribution corresponding to the extended triple, where the first probability distribution is used to represent the probability that the extended triple is credible, the second probability distribution is used to represent the probability that the extended triple is not credible, and the second probability distribution is 1 minus the first probability distribution;

and the second submodule is used for determining whether the extension triple is credible according to a target probability distribution and a preset threshold, wherein the target probability distribution is the first probability distribution or the second probability distribution.

According to the apparatus as described above, optionally the second sub-module is specifically configured to:

According to the apparatus as described above, optionally, the determining module is specifically configured to:

f(t+1) = f(t) * [f'(t) / p(t)];

According to the apparatus as described above, optionally, the determining module is further configured to:

the iterative operation stops when the value of f (t) no longer changes.

According to the method and the device for inspecting the triple of the knowledge base, the rule corresponding to the expansion triple is obtained, the factor function corresponding to the rule is determined according to the initial factor function and the EM algorithm, whether the expansion triple is credible or not is determined according to the factor function, whether the expansion triple is put into the knowledge base or not can be further determined, the knowledge base is expanded, and the accuracy of the expansion of the knowledge base is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a schematic flow chart of a method for triple inspection of a knowledge base according to an embodiment of the present invention;

FIG. 2 is a flow chart illustrating a method for triple verification of a knowledge base according to another embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an apparatus for triple verification of a knowledge base according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an apparatus for triple verification of a knowledge base according to another embodiment of the present invention;

FIG. 5 is a factor graph constructed in an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example one

This embodiment provides a method for checking knowledge base triples, which is used to check whether an extension triple of the knowledge base is credible. The method of this embodiment is executed by an apparatus for knowledge base triple verification.

As shown in fig. 1, a schematic flow chart of a method for triple inspection of a knowledge base is shown, and the method includes:

step 101, obtaining a rule corresponding to an extended triple, where the extended triple is a triple obtained by performing an extension operation based on an original triple and the rule in an existing knowledge base, and the extended triple includes an ordered set composed of at least a first statement, a relational statement, and a second statement, and the relational statement is used to represent a relationship between the first statement and the second statement.

The knowledge base, for example the Freebase knowledge base, is composed of a plurality of triples representing knowledge. A triple may be represented as (first statement, relational statement, second statement), where the relational statement represents the relationship between the first statement and the second statement. For example, one triple in the knowledge base is (Li Ming, nationality, China), which states that Li Ming's nationality is China, and another is (Li Ming, residence, Beijing), which states that Li Ming lives in Beijing. Expanding the knowledge base means discovering rules from the original triples in the knowledge base with a rule discovery method, and then performing an extension operation according to the original triples and the rules to obtain extended triples.

The rule discovery method may be a method commonly used in the prior art, such as the association rule discovery method AMIE or another discovery method. There are many ways to obtain extended triples according to a rule. For example, suppose the existing knowledge base contains the following original triples: (A, daughter, B), (C, husband, B), (A, daughter, C), where (A, daughter, B) denotes that A is the daughter of B, (C, husband, B) denotes that C is the husband of B, and (A, daughter, C) denotes that A is the daughter of C. A rule can then be discovered that (H, daughter, Y) and (Z, husband, Y) together imply (H, daughter, Z), expressed as (H, daughter, Y) + (Z, husband, Y) => (H, daughter, Z). If the knowledge base contains (Xiaohong, daughter, Wang Ying) and (Zhang San, husband, Wang Ying) but no knowledge relating Xiaohong to Zhang San, then from the rule (H, daughter, Y) + (Z, husband, Y) => (H, daughter, Z) and the two existing original triples (Xiaohong, daughter, Wang Ying) and (Zhang San, husband, Wang Ying), the extended triple (Xiaohong, daughter, Zhang San) can be derived. The rule corresponding to this extended triple is (H, daughter, Y) + (Z, husband, Y) => (H, daughter, Z).

It is understood that the first statement and the second statement of each triple in a rule are not fixed; like unknowns in an equation, they are variables. Each triple that makes up a rule is called an atomic rule, which is a triple with unknown statements. Substituting original triples or extended triples of the knowledge base into a rule yields an instance of the rule, that is, a set of triples that satisfies the rule.

According to the process, a plurality of rules are obtained on the basis of the original triples in the knowledge base, and a plurality of extended triples are obtained by performing extension operation according to the plurality of rules and the original triples.

It can be understood that after the plurality of extended triples are obtained by performing the extension operation according to the plurality of rules and the original triples, the extension operation may be continued according to the plurality of rules, the original triples and the plurality of extended triples until no new extended triples are obtained.
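The extension operation described above can be sketched as a simple forward-chaining loop. This is only an illustrative sketch, not the patented implementation: the `Triple` type, the rule encoding (single upper-case letters as variables), the function names, and the placeholder entities `P`, `Q`, `R` are all assumptions introduced here.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    first: str      # first statement
    relation: str   # relational statement
    second: str     # second statement


# Hypothetical rule encoding: (list of body atomic rules, head atomic rule).
# H, Y and Z are variables, as in the daughter/husband rule of the text.
RULE = (
    [("H", "daughter", "Y"), ("Z", "husband", "Y")],
    ("H", "daughter", "Z"),
)


def match(atom, triple, binding):
    """Extend `binding` so that `atom` matches `triple`; return None on failure."""
    a_first, a_rel, a_second = atom
    if a_rel != triple.relation:
        return None
    new = dict(binding)
    for var, value in ((a_first, triple.first), (a_second, triple.second)):
        if new.setdefault(var, value) != value:
            return None
    return new


def apply_rule(rule, triples):
    """Return every head triple obtained by substituting known triples into the rule."""
    body, head = rule
    bindings = [{}]
    for atom in body:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(atom, t, b)) is not None]
    return {Triple(b[head[0]], head[1], b[head[2]]) for b in bindings}


def expand(rules, original):
    """Repeat the extension operation until no new extended triple is obtained."""
    known = set(original)
    while True:
        derived = set()
        for rule in rules:
            derived |= apply_rule(rule, known)
        new = derived - known
        if not new:
            return known - set(original)
        known |= new


original = {Triple("P", "daughter", "Q"), Triple("R", "husband", "Q")}
extended = expand([RULE], original)
```

Here `extended` contains the single extended triple `("P", "daughter", "R")`. Each successful binding of all body atoms corresponds to one instance of the rule in the sense used above.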

All the original triples and extended triples in the knowledge base that satisfy the rules are substituted into the rules to obtain the instances of the rules. It can be understood that one rule may have multiple instances.

If there is a certain extended triple in an instance of a rule, the rule corresponding to the extended triple is the rule.

It can be understood that all rules are used in actual operation. Because the purpose here is to check whether an extension triple is credible, this step is described only in terms of obtaining the rule corresponding to that extension triple.

Step 102, determining a factor function corresponding to the rule, wherein the factor function is used for indicating the probability of whether the rule is correct, and the factor function is obtained according to the initial factor function and the EM algorithm.

For each rule, it is understood that the rule may be composed of at least two atomic rules, where we refer to the number of atomic rules composing a rule as the length of the rule, and a factor function is used to indicate the probability of whether a rule is correct.

And the factor function corresponding to a rule is obtained by adopting an EM algorithm according to the initial factor function based on the example of the rule and the triplet (which may comprise the original triplet and the extended triplet) involved in the example.

It should be noted that, in actual operation, each instance of a rule is represented by a factor function, that is, each instance has its own factor function. Because one rule may correspond to multiple instances, one rule may correspond to multiple factor functions during calculation. The multiple factor functions corresponding to one rule are referred to as factor functions of the same family; factor functions of the same family have the same value at the same calculation step and indicate the probability of whether the rule is correct. For convenience of description, determining the factor function corresponding to a rule therefore means determining the factor function of the instances corresponding to the rule.

The EM algorithm is the Expectation-Maximization algorithm, an iterative algorithm used for maximum likelihood estimation or maximum a posteriori estimation of probabilistic parametric models containing hidden (latent) variables. It alternates between two calculation steps; in this embodiment, the factor function corresponding to a rule is determined through multiple iterations.

It can be understood that, in the process of calculating the factor function corresponding to the rule, since the rule instance includes not only the extended triplet but also the original triplet, and the original triplet may also correspond to other rules, information of other rules may also be involved in the process of calculating the factor function corresponding to one rule.

The initial factor function is a value initialized randomly, and can be selected according to actual needs.

Step 103, determining whether the extension triple is credible according to the factor function.

After the factor function of the instance corresponding to the extension triple is determined, the probability that the extension triple is credible or not credible is calculated according to the factor function, so as to determine whether the extension triple is credible.

Optionally, an EM algorithm may be used to calculate the probability that the extended triple is credible or not credible according to the factor function. Step 102 and step 103 are interdependent. In actual operation, the credible or not-credible probabilities of all triples (including the original triples of the knowledge base and the extended triples) are first obtained from the initialized factor functions of all instances. These probabilities are then used to update the factor functions of the instances through a series of calculations, and the updated factor functions are in turn used to update the probabilities of all triples. This is repeated until neither the probabilities of all triples nor the factor functions of all instances change, at which point the probabilities of all triples are obtained with the factor functions updated in the last round. The probability that an extended triple is credible or not credible can then be read from the probabilities of all triples to determine whether the extended triple is credible.

According to the method for inspecting the triple of the knowledge base, provided by the embodiment, the rule corresponding to the expansion triple is obtained, the factor function corresponding to the rule is determined according to the initial factor function and the EM algorithm, and whether the expansion triple is credible or not is determined according to the factor function, so that whether the expansion triple is put into the knowledge base or not can be determined, the knowledge base is expanded, and the accuracy of expanding the knowledge base is improved.

Example two

The embodiment further provides a supplementary description of the method for checking the triple of the knowledge base provided in the first embodiment.

Fig. 2 is a schematic flow chart of the method for checking the triple of the knowledge base according to this embodiment. The method comprises the following steps:

step 201, obtaining a rule corresponding to an extended triple, where the extended triple is a triple obtained by performing an extension operation based on an original triple and the rule in an existing knowledge base, and the extended triple includes an ordered set composed of at least a first statement, a relational statement, and a second statement, and the relational statement is used to represent a relationship between the first statement and the second statement.

The specific operation of this step is consistent with step 101, and is not described herein again.

Step 202, determining the factor function f (t +1) after iterative operation by the EM algorithm according to the following formula:

f(t+1) = f(t) * (f'(t) / p(t))

wherein f(t) represents the value of the factor function in the t-th round, t is an integer greater than or equal to 0, the initial value of t is 0, f(0) is the value of the initialized factor function, f'(t) represents the empirical distribution of the factor function in the t-th round, p(t) represents the sampling distribution of the factor function in the t-th round, and the empirical distribution and the sampling distribution are calculated in the iterative operation process of the EM algorithm. The factor function is used for representing the probability of whether the rule is correct, and the factor function is obtained according to the initial factor function and the EM algorithm.
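The update formula can be illustrated with a short sketch. It assumes, purely for illustration, that a factor function is stored as a list of entries, that the update is applied entry by entry, that every entry of the sampling distribution is positive, and that "no longer changes" is tested with a small tolerance; the function names and the tolerance are not taken from the patent.

```python
import math


def update_factor(f_t, empirical_t, sampling_t):
    """One round of f(t+1) = f(t) * f'(t) / p(t), applied to each entry.

    Assumes every entry of sampling_t is strictly positive.
    """
    return [f * e / p for f, e, p in zip(f_t, empirical_t, sampling_t)]


def has_converged(f_prev, f_next, tol=1e-9):
    """Treat the factor function as unchanged when all entries agree within tol."""
    return all(math.isclose(a, b, abs_tol=tol) for a, b in zip(f_prev, f_next))


f0 = [0.5, 0.5]                                   # initialized factor function f(0)
f1 = update_factor(f0, [0.6, 0.4], [0.5, 0.5])    # empirical differs from sampling
f2 = update_factor(f1, [0.3, 0.7], [0.3, 0.7])    # empirical equals sampling
```

When the empirical distribution and the sampling distribution coincide, the ratio is 1 and the factor function stays the same, which corresponds to the stopping condition of the iteration.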

Step 203, determining a first probability distribution and a second probability distribution corresponding to the extended triple according to belief propagation and the factor function, wherein the first probability distribution is used for representing the probability that the extended triple is credible, the second probability distribution is used for representing the probability that the extended triple is not credible, and the second probability distribution is 1 minus the first probability distribution.

The EM algorithm includes two steps, the first step is to calculate the expectation (E), referred to as E-step, and in this embodiment, the first probability distribution and the second probability distribution corresponding to the extended triplet are determined according to the belief propagation and the factor function in this step.
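To make the quantity computed in the E-step concrete, the sketch below computes the probability that each binary variable equals 1 by exhaustive enumeration over all assignments. This is deliberately not belief propagation: brute-force enumeration is swapped in because it is short and yields the exact marginals, which belief propagation computes exactly on tree-structured factor graphs and approximates otherwise. The factor encoding and all numbers are illustrative assumptions.

```python
from itertools import product


def marginals(variables, factors):
    """Return P(variable = 1) for each variable.

    `factors` is a list of (scope, table) pairs, where `scope` is a tuple of
    variable names and `table` maps an assignment tuple to a non-negative weight.
    """
    totals = {v: 0.0 for v in variables}
    z = 0.0  # normalization constant
    for assignment in product((0, 1), repeat=len(variables)):
        value = dict(zip(variables, assignment))
        weight = 1.0
        for scope, table in factors:
            weight *= table[tuple(value[v] for v in scope)]
        z += weight
        for v in variables:
            if value[v] == 1:
                totals[v] += weight
    return {v: totals[v] / z for v in variables}


# Two triples with prior factors q1, q2 and one instance factor f over both.
q1 = (("x1",), {(0,): 0.2, (1,): 0.8})
q2 = (("x2",), {(0,): 0.5, (1,): 0.5})
f = (("x1", "x2"), {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 0.1, (1, 1): 1.0})

first_distribution = marginals(("x1", "x2"), [q1, q2, f])
second_distribution = {v: 1.0 - p for v, p in first_distribution.items()}
```

In this toy setting the factor `f` penalizes the assignment in which `x1` is credible but `x2` is not, so the probability that `x2` is credible rises above its prior of 0.5.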

The second step is the maximization (M), called M-step, in this embodiment, the factor function is updated according to the first probability distribution and the second probability distribution obtained in the previous step. The resulting factor function is then used to derive a new first probability distribution and a second probability distribution based on belief propagation. And performing two-step alternate iteration, and finally determining a first probability distribution and a second probability distribution corresponding to the extension triple after the iteration is stopped.

It will be appreciated that steps 202 and 203 are interdependent. In actual operation, the probability distributions (including the first and second probability distributions) of all triples (including the original triples of the knowledge base and the extended triples) are first obtained through belief propagation in the E-step of the EM algorithm, according to the initialized factor functions of all instances. These probability distributions are then used in the M-step of the EM algorithm to update the factor functions, and the updated factor functions are used again in the E-step to update the probability distributions of all triples. The M-step is then repeated, and the iteration continues until neither the probability distributions of all triples nor the factor functions of all instances change. The final probability distributions of all triples are obtained by applying the E-step with the factor functions updated in the last round, which completes the iteration of the EM algorithm. The probability distribution of the extended triples can then be obtained.
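The alternation between the two steps can be outlined as a generic loop. This is only a control-flow sketch under stated assumptions: `e_step` and `m_step` are hypothetical callables standing in for belief propagation and the factor function update, both factor functions and probabilities are flat lists of numbers, and the tolerance and round limit are illustrative.

```python
def max_change(old, new):
    """Largest absolute difference between two equally long lists of numbers."""
    return max(abs(a - b) for a, b in zip(old, new))


def run_em(initial_factors, e_step, m_step, tol=1e-9, max_rounds=1000):
    """Alternate E-step and M-step until factors and probabilities stop changing."""
    factors = list(initial_factors)
    probabilities = e_step(factors)               # first E-step with f(0)
    for _ in range(max_rounds):
        new_factors = m_step(factors, probabilities)
        new_probabilities = e_step(new_factors)   # E-step with updated factors
        stable = (max_change(factors, new_factors) <= tol
                  and max_change(probabilities, new_probabilities) <= tol)
        factors, probabilities = new_factors, new_probabilities
        if stable:
            break
    return factors, probabilities


# Toy stand-ins (not belief propagation) with a fixed point at 0.5.
toy_e_step = lambda factors: [0.5 * v + 0.25 for v in factors]
toy_m_step = lambda factors, probabilities: list(probabilities)

final_factors, final_probabilities = run_em([0.9], toy_e_step, toy_m_step)
```

The returned probabilities come from an E-step that uses the factor functions of the last round, matching the order of operations described above.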

Step 204, determining whether the extension triplet is credible according to a target probability distribution and a preset threshold, wherein the target probability distribution is the first probability distribution or the second probability distribution.

After determining the first probability distribution and the second probability distribution corresponding to the extended triples, determining whether the extended triples are credible according to the comparison between the first probability distribution or the second probability distribution and a preset threshold.

Optionally, the preset threshold may be set as a trusted threshold, in which case the target probability distribution is the first probability distribution: when the target probability distribution is greater than or equal to the preset threshold, the extended triple is credible, and when it is smaller than the preset threshold, the extended triple is not credible.

Optionally, the preset threshold may be set as an untrusted threshold, in which case the target probability distribution is the second probability distribution: when the target probability distribution is smaller than the preset threshold, the extended triple is credible, and when it is greater than or equal to the preset threshold, the extended triple is not credible.
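The two threshold variants can be written as one small decision function. This is a minimal sketch: the function name, the `threshold_kind` parameter and the example numbers are assumptions, and the handling of equality follows the "greater than or equal to" wording used in the summary of the invention.

```python
def is_credible(first_probability, threshold, threshold_kind="credible"):
    """Decide whether an extended triple is credible.

    first_probability: probability that the triple is credible (first distribution).
    threshold_kind: "credible" compares the first distribution with the threshold,
                    "untrusted" compares the second distribution (1 - first).
    """
    if not 0.0 <= first_probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if threshold_kind == "credible":
        return first_probability >= threshold
    if threshold_kind == "untrusted":
        second_probability = 1.0 - first_probability
        return second_probability < threshold
    raise ValueError("threshold_kind must be 'credible' or 'untrusted'")


accepted = is_credible(0.9, 0.8)                  # credible threshold of 0.8
rejected = is_credible(0.3, 0.5, "untrusted")     # second distribution is 0.7
```

With complementary thresholds the two variants differ only in how the boundary case is assigned, so either form can be used to decide whether the extended triple is put into the knowledge base.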

In the method for checking knowledge base triples provided by this embodiment, the target probability distribution of the extended triple and the factor function of the rule corresponding to the extended triple are updated continuously within the EM algorithm, so that the probability that the rule is correct, that is, the reliability of the rule, is learned on the basis of the extended knowledge base. The credibility of the extended triple is calculated from the factor function of the corresponding rule, and the calculation of that factor function involves the credibility of the original triples in the knowledge base. The calculation of the credibility of the extended triple therefore also takes the global correlation among pieces of knowledge into account, so that the knowledge base can be extended efficiently, accurately and with high quality.

Example three

The embodiment specifically exemplifies the method for checking the triple of the knowledge base provided in the above embodiment.

For example, 5 original triples are selected from the knowledge base and numbered e1 to e5 in sequence; 4 rules are discovered from the knowledge base and numbered r1 to r4 in sequence; and 3 extended triples are obtained by performing the extension operation according to the rules and numbered e6, e7 and e8 in sequence.

For each triple, the probability that an original triple is credible is represented by a third probability distribution, and the probability that it is not credible by a fourth probability distribution; the probability that an extended triple is credible is represented by a first probability distribution, and the probability that it is not credible by a second probability distribution. The fourth probability distribution is 1 minus the third probability distribution, and the second probability distribution is 1 minus the first probability distribution. For example, the third probability distributions of the 5 original triples are denoted b1 to b5 in sequence and their fourth probability distributions (1-b1) to (1-b5); the first probability distributions of the 3 extended triples are denoted a1 to a3 and their second probability distributions (1-a1) to (1-a3). Whether the 5 original triples and the 3 extended triples are credible is represented in sequence by x1 to x8, and x = [x1, x2, x3, x4, x5, x6, x7, x8] represents the credibility of all the triples. Here x1 to x8 are binary variables taking the value 0 or 1, referred to as the variables of triples e1 to e8 respectively, where 0 indicates that the triple is not credible and 1 indicates that it is credible.

For example, 5 triples are selected from the knowledge base, and 4 rules have been discovered, as shown in tables 1 and 2.

TABLE 1 original triplet

TABLE 2 rules

Number	Rule
r1	(H, residence, Y) + (Y, country, Z) => (H, nationality, Z)
r2	(H, block, Y) + (Y, commercial, Z) => (H, lower city, Z)
r3	(H, nationality, Y) + (H, birthplace, Z) => (Z, country, Y)
r4	(H, lower city, Y) => (H, country, Y)

Within one rule, H denotes the same unknown statement, whereas H in different rules does not necessarily denote the same statement; the same holds for Y and Z. An instance of a rule or an extended triple can therefore be obtained by substituting the first and second statements of existing triples that satisfy the corresponding relational statements of the rule.

And performing an extension operation according to the original triple and the rule to obtain an extended triple, as shown in table 3.

Table 3 extension triplets

And continuing to perform the expansion operation according to the original triple, the expansion triple and the rule to obtain a new expansion triple, as shown in table 4.

TABLE 4 New extension triplets

And finally obtaining the data comprising the original triple and all the extended triples.

TABLE 5 Correspondence between all triples and variables x

Triple	Number	x
(B,R1,A)	e1	x1
(A,R2,C)	e2	x2
(B,R3,D)	e3	x3
(E,R4,C)	e4	x4
(D,R5,E)	e5	x5
(B,R6,C)	e6	x6
(D,R7,C)	e7	x7
(D,R2,C)	e8	x8

The instances of all rules obtained from the original triples and the extended triples include:

Instance 1: (B, R1, A) + (A, R2, C) => (B, R6, C)

Instance 2: (B, R3, D) + (B, R6, C) => (D, R2, C)

Instance 3: (D, R5, E) + (E, R4, C) => (D, R7, C)

Instance 4: (D, R7, C) => (D, R2, C)

It is appreciated that instance 1 is an instance of rule r1, instance 2 is an instance of rule r3, instance 3 is an instance of rule r2, and instance 4 is an instance of rule r4. It is to be understood that, when many triples are selected from the knowledge base, multiple instances may correspond to one rule; the above is only an example and is not a limitation.

A factor graph is constructed from all the triples and instances, as shown in FIG. 5, where q = [q1, q2, q3, q4, q5, q6, q7, q8] represents the probability distributions of whether each triple is credible or not credible. That is, if the triple is an original triple, such as e1, then q1 = b1 when x1 is 1 and q1 = 1-b1 when x1 is 0; if the triple is an extended triple, such as e7, then q7 = a2 when x7 is 1 and q7 = 1-a2 when x7 is 0. f1 to f4 represent the factor functions of the respective instances, as shown in Table 6.

TABLE 6 comparison of triplets with variable x, i.e. factor function q

For example, if a rule consists of 3 atomic rules (e.g., (H, daughter, Y) + (Z, husband, Y) => (H, daughter, Z)), then each instance of the rule consists of three triples. Instance 1 above consists of the triples e1, e2 and e6, whose variables are x1, x2 and x6, respectively, and its factor function is written f1 = [f11, f12, f13, f14, f15, f16, f17, f18], where f11 to f18 are the probabilities that the rule corresponding to the instance is correct for each of the 8 combinations of the three variables taking 1 or 0, as shown in Table 7. If a rule consists of 2 atomic rules, as in instance 4, the factor function of the instance is f4 = [f41, f42, f43, f44].

For convenience of later description, the circular nodes in the factor graph are called variable nodes and the square nodes are called factor nodes; the factor nodes comprise q nodes and f nodes.

TABLE 7

x1	x2	x6	f1
0	0	0	f11
0	0	1	f12
0	1	0	f13
0	1	1	f14
1	0	0	f15
1	0	1	f16
1	1	0	f17
1	1	1	f18
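The row order of Table 7 can be reproduced programmatically. The following is a minimal sketch, assuming each factor function is stored as a mapping from a 0/1 assignment to its entry; the helper name `factor_table_keys` and the string labels are illustrative and do not appear in the embodiment.

```python
from itertools import product

def factor_table_keys(num_vars):
    """Return all 0/1 assignments of num_vars binary variables in table order."""
    return list(product((0, 1), repeat=num_vars))

# Attach the labels f11..f18 to the 8 assignments of (x1, x2, x6).
keys = factor_table_keys(3)
f1_labels = {key: f"f1{index + 1}" for index, key in enumerate(keys)}

for key in keys:
    print(key, f1_labels[key])
```

A rule with 2 atomic rules, such as the one behind instance 4, would use `factor_table_keys(2)` and yield 4 entries.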

This embodiment has 4 instances and accordingly 4 factor functions, i.e., f = [f1, f2, f3, f4], where f denotes the set of all factor functions.

The factor functions of the 4 instances are shown in Table 8:

TABLE 8

Then, f2 = [f21, f22, f23, f24, f25, f26, f27, f28]

f3 = [f31, f32, f33, f34, f35, f36, f37, f38]

f4 = [f41, f42, f43, f44]

After the factor graph is constructed, the factor functions and the probability distributions q of the variables corresponding to all triples are obtained iteratively by the EM algorithm.

The EM algorithm (Expectation-Maximization algorithm) is an iterative algorithm for maximum likelihood estimation or maximum a posteriori estimation of the parameters of a probabilistic model containing hidden (latent) variables. The calculation alternates between two steps:

the first step computes the expectation (E) and is called the E-step; it uses the existing estimates of the hidden variables to calculate the expected likelihood. In this embodiment, this step calculates the probability distribution q (including the first probability distribution and the second probability distribution) of each variable x;

the second step is the maximization (M) and is called the M-step; it maximizes the likelihood found in the E-step to calculate the parameter values. In this embodiment, the factor functions are updated based on the probability distributions of the variables found in the previous step.

The factor functions obtained in the M-step are used in the next E-step, and the two steps alternate until neither the distribution q of the variables x nor the factor functions change any more. At that point the first probability distribution and the second probability distribution no longer change either, and they can be used to determine whether the extended triples are trusted.

The specific process is as follows:

(1) initialization

After the factor graph is constructed, all q and f need to be initialized.

1) Initialization of q

As described above, for each triple, e.g. e1, q1 is a binary function giving the probability of the triple when the variable x1 takes 0 or 1, respectively. It can be written {1-b1, b1}, where the first entry is the probability that the variable takes 0 and the second entry is the probability that it takes 1. An original triple already exists in the knowledge base, so its q may be initialized to {0.01, 0.99}. For an extended triple, q is initialized consistently with the correctness of the corresponding rule.

All q in this example are initialized as follows:

q1, q2, q3, q4 and q5 are all initialized to {0.01, 0.99};

q6 is initialized to {0.2, 0.8};

q7 is initialized to {0.3, 0.7};

q8 is initialized to {0.1, 0.9}.

2) The initialization of the factor function f is random initialization as shown in table 9.

TABLE 9

(x1,x2,x6)	f1	(x3,x6,x8)	f2	(x4,x5,x8)	f3	(x7,x8)	f4
(0,0,0)	0.125	(0,0,0)	0.125	(0,0,0)	0.125	(0,0)	0.25
(0,0,1)	0.125	(0,0,1)	0.125	(0,0,1)	0.125	(0,1)	0.25
(0,1,0)	0.125	(0,1,0)	0.125	(0,1,0)	0.125	(1,0)	0.25
(0,1,1)	0.125	(0,1,1)	0.125	(0,1,1)	0.125	(1,1)	0.25
(1,0,0)	0.125	(1,0,0)	0.125	(1,0,0)	0.125	-	-
(1,0,1)	0.125	(1,0,1)	0.125	(1,0,1)	0.125	-	-
(1,1,0)	0.125	(1,1,0)	0.125	(1,1,0)	0.125	-	-
(1,1,1)	0.125	(1,1,1)	0.125	(1,1,1)	0.125	-	-
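The initialization described above can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary layout, the name `uniform_factor` and the `scopes` mapping are hypothetical, and the factor entries are set to the equal values listed in Table 9 rather than drawn at random.

```python
from itertools import product

def uniform_factor(num_vars):
    """Give every 0/1 assignment of num_vars variables the same value."""
    assignments = list(product((0, 1), repeat=num_vars))
    return {assignment: 1.0 / len(assignments) for assignment in assignments}

# q is stored as (P(x = 0), P(x = 1)); original triples e1..e5 get {0.01, 0.99}.
q = {f"x{i}": (0.01, 0.99) for i in range(1, 6)}
q.update({"x6": (0.2, 0.8), "x7": (0.3, 0.7), "x8": (0.1, 0.9)})

# Variables attached to each factor function, taken from the header of Table 9.
scopes = {
    "f1": ("x1", "x2", "x6"),
    "f2": ("x3", "x6", "x8"),
    "f3": ("x4", "x5", "x8"),
    "f4": ("x7", "x8"),
}
f = {name: uniform_factor(len(scope)) for name, scope in scopes.items()}

print(f["f1"][(0, 0, 0)], f["f4"][(1, 1)])
```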

(2)E-step

Belief propagation is an algorithm that solves for the marginal distributions of the variable nodes in a factor graph. We take the variable x8 and the factors connected to it as an example to explain the belief propagation process.

Factor nodes pass information to variable nodes (first round): we denote by s(f->x) and s(q->x) the information passed to the variable node x by the factor node f and the factor node q, respectively. It is initialized to the marginal distribution of x obtained from f and q, so:

s1(q8->x8)＝{0.1,0.9}；

s1(f2->x8)＝{0.5,0.5}；

s1(f3->x8)＝{0.5,0.5}；

s1(f4->x8)＝{0.5,0.5}；

where s1 denotes the first round of information passed from factor nodes to variable nodes. s1(q8->x8) is the information that the q8 factor node passes in the first round to the x8 variable node connected to it, i.e. the initialization value {0.1, 0.9} of q8. s1(f2->x8) = {0.5, 0.5} means that the entries of the initialized f2 with x8 = 0 are added together, and the entries with x8 = 1 are added together, giving {0.5, 0.5}, which is passed to the variable node x8 connected to f2. The passing process for f3 and f4 is similar to that for f2 and is not described again.

It can be understood that the factor nodes pass information to the other variable nodes x1 to x7 in the same way as to x8, which is not described again here.
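The first-round message from f2 to x8 is a marginal sum over the initialized factor table. A minimal sketch, assuming the factor is stored as a mapping from assignments to values; the function name `marginal_message` is illustrative:

```python
from itertools import product

def marginal_message(factor, scope, target):
    """Sum the factor entries by the value of `target`, giving (P(0), P(1))."""
    position = scope.index(target)
    message = [0.0, 0.0]
    for assignment, value in factor.items():
        message[assignment[position]] += value
    return tuple(message)

# f2 initialized as in Table 9: every entry equals 0.125.
f2_scope = ("x3", "x6", "x8")
f2 = {assignment: 0.125 for assignment in product((0, 1), repeat=3)}

s1_f2_x8 = marginal_message(f2, f2_scope, "x8")
print(s1_f2_x8)  # (0.5, 0.5)
```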

Variable nodes pass information to factor nodes (first round): we denote by h (x- > f) the information passed by the variable node x to the factor node f, and there are:

h1(x8->f2)＝s1(q8->x8)*s1(f3->x8)*s1(f4->x8)＝{0.1*0.5*0.5,0.9*0.5*0.5}；

h1(x8->f3)＝s1(q8->x8)*s1(f2->x8)*s1(f4->x8)＝{0.1*0.5*0.5,0.9*0.5*0.5}；

h1(x8->f4)＝s1(q8->x8)*s1(f2->x8)*s1(f3->x8)＝{0.1*0.5*0.5,0.9*0.5*0.5}；

where h1 denotes the first round of information passed from variable nodes to factor nodes. h1(x8->f2) is the information that the variable node x8 passes in the first round to the factor node f2 connected to it: the information passed to x8 in the previous step by the q8 node and by the nodes other than f2, namely f3 and f4, is multiplied together and passed to f2. The passing from x8 to f3 and f4 is similar and is not repeated here. Binary functions of the form {c1, d1} are multiplied component-wise: the first components are multiplied together and the second components are multiplied together, giving a new binary function {product of the first components, product of the second components}.

It can be understood that the other variable nodes pass information to the factor nodes connected to them in the same way as x8, and the details are not repeated here.

In this embodiment, information is passed from factor nodes to variable nodes, and from variable nodes to factor nodes, only between two connected nodes.

Normalizing the result of the multiplication so that the two entries sum to 1:

h1(x8->f2) = {0.1*0.5*0.5, 0.9*0.5*0.5} = {0.025, 0.225}, normalized to {0.1, 0.9};

h1(x8->f3) = {0.1*0.5*0.5, 0.9*0.5*0.5} = {0.025, 0.225}, normalized to {0.1, 0.9};

h1(x8->f4) = {0.1*0.5*0.5, 0.9*0.5*0.5} = {0.025, 0.225}, normalized to {0.1, 0.9}.
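The variable-to-factor message and its normalization can be checked with a short sketch. The function names and the dictionary of incoming messages are illustrative; the numbers are the first-round values given above.

```python
def normalize(pair):
    """Scale a (value for 0, value for 1) pair so that it sums to 1."""
    total = pair[0] + pair[1]
    return (pair[0] / total, pair[1] / total)

def variable_to_factor(incoming, exclude):
    """Multiply all messages into a variable except the one from `exclude`."""
    p0, p1 = 1.0, 1.0
    for source, (m0, m1) in incoming.items():
        if source == exclude:
            continue
        p0 *= m0
        p1 *= m1
    return normalize((p0, p1))

# First-round messages arriving at x8.
incoming_x8 = {
    "q8": (0.1, 0.9),
    "f2": (0.5, 0.5),
    "f3": (0.5, 0.5),
    "f4": (0.5, 0.5),
}

h1_x8_f2 = variable_to_factor(incoming_x8, exclude="f2")
print(h1_x8_f2)  # approximately (0.1, 0.9)
```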

Factor nodes pass information to variable nodes (second round):

Taking the variable node x8 as an example, the factor nodes connected to it are the q8, f2, f3 and f4 nodes, so:

s2(q8->x8)＝s1(q8->x8)；

s2(f2->x8)＝s1(f2->x8)&h1(x3->f2)&h1(x6->f2)；

s2(f3->x8)＝s1(f3->x8)&h1(x4->f3)&h1(x5->f3)；

s2(f4->x8)＝s1(f4->x8)&h1(x7->f4)；

where s2 denotes the second round of information passed from factor nodes to variable nodes. s2(q->x) is the information passed from a q node to an x node; it does not change during one E-step cycle. A new probability distribution q is obtained only when the E-step finishes, and when the E-step is entered again after the M-step, the information passed from q to x is that new distribution. s2(f->x) is the information passed from the factor node f to the variable node x in the second round: the first-round message s1(f->x) is combined with the first-round messages h1(x'->f) from the other variable nodes x' connected to f. The operator & denotes combination multiplication of binary functions of the form {c, d}: multiplying over all value combinations of the variables gives an 8-value function, and adding its entries with x8 = 0 and with x8 = 1, respectively, gives a new binary function, as shown in Table 10.

For example, as shown above, s1(f2->x8) = {0.5, 0.5}. The normalized values of h1(x3->f2) and h1(x6->f2) are not given above (both are obtained and saved in actual operation), so to explain the passing process more clearly we assume h1(x3->f2) = {0.2, 0.8} and h1(x6->f2) = {0.3, 0.7}. Then:

s2(f2->x8) = {0.5, 0.5} & {0.2, 0.8} & {0.3, 0.7} = {0.5, 0.5}

The 8 values {0.03, 0.07, 0.12, 0.28, 0.03, 0.07, 0.12, 0.28} are the information conveyed by f2 for the 8 combinations of the variables x8, x3 and x6 taking 0 or 1:

TABLE 10

x8	x3	x6	s2(f2->x8)
0	0	0	0.5*0.2*0.3 = 0.03
0	0	1	0.5*0.2*0.7 = 0.07
0	1	0	0.5*0.8*0.3 = 0.12
0	1	1	0.5*0.8*0.7 = 0.28
1	0	0	0.5*0.2*0.3 = 0.03
1	0	1	0.5*0.2*0.7 = 0.07
1	1	0	0.5*0.8*0.3 = 0.12
1	1	1	0.5*0.8*0.7 = 0.28
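The combination multiplication and the summation over x8 in Table 10 can be reproduced as follows. This sketch follows the worked example literally (it multiplies the previous message by the incoming messages and then sums by the value of the target variable); the function name `combine_and_sum` is illustrative, and the two h1 messages are the assumed values from the text.

```python
from itertools import product

def combine_and_sum(previous_message, other_messages):
    """Build the table over (target, others...) and sum it by the target value."""
    table = {}
    for combo in product((0, 1), repeat=1 + len(other_messages)):
        value = previous_message[combo[0]]
        for message, bit in zip(other_messages, combo[1:]):
            value *= message[bit]
        table[combo] = value
    summed = [0.0, 0.0]
    for combo, value in table.items():
        summed[combo[0]] += value
    return table, tuple(summed)

s1_f2_x8 = (0.5, 0.5)
h1_x3_f2 = (0.2, 0.8)  # assumed in the text
h1_x6_f2 = (0.3, 0.7)  # assumed in the text

table_10, s2_f2_x8 = combine_and_sum(s1_f2_x8, [h1_x3_f2, h1_x6_f2])
for combo, value in table_10.items():
    print(combo, round(value, 2))
print(tuple(round(v, 2) for v in s2_f2_x8))
```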

Likewise, the information that f3 and f4 convey to x8 can be obtained: s2(f3->x8) and s2(f4->x8).

Variable nodes pass information to factor nodes (second round): the calculation method is the same as the first round, and is not described herein again.

Factor nodes pass information to variable nodes (third round): the calculation method is the same as the second round, and is not described herein again.

The above process is iterated until the information delivered changes little (i.e., converges).

After the iteration is complete (algorithm convergence), we compute the probability distribution of the variables based on the information passed by the factors to the variable nodes. Also taking variable x8 as an example, assume convergence after 15 iterations:

q8＝s15(q8->x8)*s15(f2->x8)*s15(f3->x8)*s15(f4->x8)

where s15(q8->x8) = s1(q8->x8).

For convenience of explanation, assume:

q8＝{0.5*0.5*0.4*0.35,0.5*0.5*0.6*0.65}＝{0.035,0.0975}

After normalization, q8 = {0.264, 0.736}; that is, the probability distribution of the variable x8 obtained after the first E-step cycle is q8 = {0.264, 0.736}. The first probability distribution (trusted, x8 = 1) of the corresponding triple e8 is then 0.736, and the second probability distribution (untrusted, x8 = 0) is 0.264.
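The normalization of q8 can be verified numerically. A minimal sketch, using the four messages assumed in the text; the function name `belief` is illustrative.

```python
def belief(messages):
    """Multiply (P(0), P(1)) messages component-wise and normalize the result."""
    p0, p1 = 1.0, 1.0
    for m0, m1 in messages:
        p0 *= m0
        p1 *= m1
    total = p0 + p1
    return (p0 / total, p1 / total)

# Converged messages into x8, with the values assumed in the text.
messages_x8 = [(0.5, 0.5), (0.5, 0.5), (0.4, 0.6), (0.35, 0.65)]
q8 = belief(messages_x8)
print(round(q8[0], 3), round(q8[1], 3))  # 0.264 0.736
```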

Likewise, the probability distributions of other variables may be found and will not be described further herein.

It can be understood that this result is obtained only after the first E-step cycle. It is used in the M-step to update the factor functions f; the second E-step cycle is then performed with the updated factor functions, giving a new set of probability distributions of the variables for the next M-step. This is iterated many times until the probability distributions of the variables and the updated factor functions no longer change. The probability distributions of the variables obtained at that point are the target probability distributions and are used as the standard for judging whether the corresponding triples are trusted.

(3)M-step

Updating of the factor function:

For convenience of explanation, in this step we denote the empirical distribution of the factor functions by f'(0), i.e., f'(0) = [f1'(0), f2'(0), f3'(0), f4'(0)], and f' is obtained according to the following formulas:

f1’(0)＝q1^q2^q6；

f2’(0)＝q3^q6^q8；

f3’(0)＝q4^q5^q8；

f4’(0)＝q7^q8；

where q1 to q8 are the probability distributions of the variables x1 to x8 obtained in the E-step; f1'(0) to f4'(0) are the empirical distributions of the factor functions used in the cycle t = 0, which are 8-value or 4-value functions of the same form as f1 described above; and ^ denotes combination multiplication.

It will be understood that the empirical distribution of the factor functions is constant during a complete M-step, because it is determined by the probability distributions of the variables obtained in the preceding E-step, i.e.

f1'(t) = f1'(3) = f1'(2) = f1'(1) = f1'(0);

Only after the M-step finishes is a new probability distribution of the variables obtained through the E-step, from which a new empirical distribution of the factor functions is obtained.

Taking f1'(0) as an example, if q1, q2 and q6 obtained in the E-step are {0.01, 0.99}, {0.01, 0.99} and {0.1, 0.9}, respectively, the empirical distribution f1'(0) is calculated as shown in Table 11.

TABLE 11

(x1,x2,x6)	f1'(0)
(0,0,0)	0.01*0.01*0.1 = 0.00001
(0,0,1)	0.01*0.01*0.9 = 0.00009
(0,1,0)	0.01*0.99*0.1 = 0.00099
(0,1,1)	0.01*0.99*0.9 = 0.00891
(1,0,0)	0.99*0.01*0.1 = 0.00099
(1,0,1)	0.99*0.01*0.9 = 0.00891
(1,1,0)	0.99*0.99*0.1 = 0.09801
(1,1,1)	0.99*0.99*0.9 = 0.88209
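The combination multiplication that produces Table 11 can be sketched as follows. The function name `combine` is illustrative, and the three input distributions are the ones implied by the entries of the table.

```python
from itertools import product
from math import prod

def combine(*distributions):
    """Combination multiplication: one product per 0/1 assignment."""
    result = {}
    for combo in product((0, 1), repeat=len(distributions)):
        result[combo] = prod(dist[bit] for dist, bit in zip(distributions, combo))
    return result

q1 = (0.01, 0.99)
q2 = (0.01, 0.99)
q6 = (0.1, 0.9)

f1_empirical = combine(q1, q2, q6)
for combo, value in f1_empirical.items():
    print(combo, round(value, 5))
```

Because each input sums to 1, the 8 entries of the result also sum to 1, so no further normalization is needed at this point.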

Then, the sampling distribution of the factor function is obtained through sampling, and the specific process is as follows:

First, initialize t = 0, where t is the number of M-step outer loop iterations completed;

next, randomly initialize a value (0 or 1) for each variable x1 to x8, for example m = [x1, x2, x3, x4, x5, x6, x7, x8] = [0, 1, 0, 0, 1, 1, 0, 1], where m is an array, which we call the sample array, used to store the sampled values of the variables;

then, initialize the factor function f randomly, in the same way as for the E-step, which is not described again here.

Finally, loop as follows:

While t < T (e.g. T = 50, where T is the maximum number of M-step outer loop iterations; other values are possible and this is not a limitation):

1) If t = 0, L takes a larger value (e.g., L = 100, where L is the maximum number of M-step inner loop iterations); if t > 0, L takes a smaller value (e.g., L = 30);

initialize l = 0, where l is the number of M-step inner loop iterations completed;

while l < L:

1. For each variable x1 to x8, with the values of the other variables fixed, obtain the probability distribution of the variable and determine its value by sampling from that distribution.

Taking the variable x8 as an example, assume that x1 to x7 take the values [0, 1, 0, 0, 1, 1, 0]. The current factor functions have been initialized and can be regarded as known. Since f2, f3 and f4 involve x8, the probability distribution of x8 with respect to each of f2, f3 and f4 can be obtained. Take f3 as an example and assume that the current f3 takes the values shown in Table 12, where for convenience f3(0) denotes the initialized f3. Since x4 and x5 take the values [0, 1], the distribution of x8 with respect to f3 is {0.12, 0.18}, which is normalized so that the two values sum to 1, giving {0.4, 0.6}. Similarly, the distributions of x8 with respect to f2 and f4 can be obtained; assume they are {0.3, 0.7} and {0.2, 0.8}, respectively. The probability distribution of x8 is the normalized product of these 3 distributions, i.e., {0.4*0.3*0.2, 0.6*0.7*0.8}, normalized to {0.067, 0.933}. Having obtained this distribution, we sample x8 from it and update the sample array m.

TABLE 12

(x4,x5,x8)	f3(0)
(0,0,0)	0.025
(0,0,1)	0.025
(0,1,0)	0.12
(0,1,1)	0.18
(1,0,0)	0.05
(1,0,1)	0.05
(1,1,0)	0.10
(1,1,1)	0.45
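One sampling step for x8 can be sketched as follows, using Table 12 for f3 and the assumed distributions for f2 and f4. The function names, the fixed random seed and the dictionary layout are illustrative assumptions, not part of the embodiment.

```python
import random

def normalize(pair):
    """Scale a pair of non-negative values so that it sums to 1."""
    total = pair[0] + pair[1]
    return (pair[0] / total, pair[1] / total)

def conditional_from_factor(factor, scope, target, sample):
    """Distribution of `target` under one factor, other variables held fixed."""
    values = []
    for bit in (0, 1):
        key = tuple(bit if name == target else sample[name] for name in scope)
        values.append(factor[key])
    return normalize(values)

# f3(0) from Table 12, with x4 = 0 and x5 = 1 held fixed.
f3 = {
    (0, 0, 0): 0.025, (0, 0, 1): 0.025, (0, 1, 0): 0.12, (0, 1, 1): 0.18,
    (1, 0, 0): 0.05, (1, 0, 1): 0.05, (1, 1, 0): 0.10, (1, 1, 1): 0.45,
}
sample = {"x4": 0, "x5": 1}
p_f3 = conditional_from_factor(f3, ("x4", "x5", "x8"), "x8", sample)

p_f2 = (0.3, 0.7)  # assumed in the text
p_f4 = (0.2, 0.8)  # assumed in the text
p_x8 = normalize((p_f3[0] * p_f2[0] * p_f4[0], p_f3[1] * p_f2[1] * p_f4[1]))

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
sample["x8"] = 1 if rng.random() < p_x8[1] else 0
print(tuple(round(v, 3) for v in p_x8), sample["x8"])
```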

Similarly, the sampled values of the other variables x1 to x7 are obtained. Once the sampled values of all variables in m have been updated, one group of sampled variable values is obtained and stored.

2. Update l = l + 1 and enter the next inner loop iteration.

The variables in each iteration are initialized to the group of sampled values obtained in the previous iteration. After L iterations the inner loop ends and L groups of sampled variable values have been obtained. The sampling distribution of each factor function is calculated from the M groups of sampled values obtained in the last M iterations (e.g. M = 15), as follows:

Taking f3 as an example, assume that the sampling results of (x4, x5, x8) in the last 15 iterations are as shown in Table 13, where for convenience p3(0) denotes the sampling distribution of f3 in the iteration t = 0.

TABLE 13

(x4,x5,x8)	Number of occurrences in 15 groups of sampled values	p3(0)
(0,0,0)	1	1/15
(0,0,1)	2	2/15
(0,1,0)	1	1/15
(0,1,1)	1	1/15
(1,0,0)	0	0/15
(1,0,1)	1	1/15
(1,1,0)	3	3/15
(1,1,1)	6	6/15
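The sampling distribution in Table 13 is a relative frequency count over the retained groups of sampled values. A minimal sketch, in which the list of samples is constructed to match the counts in the table and the function name is illustrative:

```python
from collections import Counter
from itertools import product

def sampling_distribution(samples, num_vars):
    """Relative frequency of every 0/1 assignment among the samples."""
    counts = Counter(samples)
    total = len(samples)
    return {
        key: counts[key] / total
        for key in product((0, 1), repeat=num_vars)
    }

# 15 groups of (x4, x5, x8) values with the counts listed in Table 13.
samples = (
    [(0, 0, 0)] * 1 + [(0, 0, 1)] * 2 + [(0, 1, 0)] * 1 + [(0, 1, 1)] * 1
    + [(1, 0, 1)] * 1 + [(1, 1, 0)] * 3 + [(1, 1, 1)] * 6
)
p3 = sampling_distribution(samples, 3)
for key, value in p3.items():
    print(key, value)
```

Assignments that never occur, such as (1,0,0) here, get the value 0.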

The sampling distribution of several other factor functions can be obtained as well, and will not be described herein again.

The factor function is updated as follows:

Taking f3 as an example:

f3(1)＝f3(0)*(f3’(0)/p3(0))

where f3(1) is the value of the factor function f3 updated after the outer loop iteration t = 0, f3(0) is the initialized value of f3, f3'(0) is the empirical distribution of f3 obtained from the probability distributions of the variables from the E-step, and p3(0) is the sampling distribution of f3 when t = 0.

It is understood that if t = 1, then f3(2) = f3(1)*(f3'(1)/p3(1));

here, f3(2) is the value of f3 updated in the iteration t = 1, f3'(1) is the empirical distribution of f3 when t = 1, and p3(1) is the sampling distribution of f3 when t = 1, and so on, which is not described further.

Updated values of other factor functions can be obtained as well, and are not described herein again.
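The update f(t+1) = f(t)*(f'(t)/p(t)) is applied entry by entry. The sketch below illustrates it on a 4-value factor. Two points are assumptions of this sketch and are not stated in the embodiment: an entry whose sampling frequency is 0 is left unchanged to avoid division by zero, and the sampling distribution used in the example is hypothetical.

```python
def update_factor(current, empirical, sampled):
    """Entry-wise update: current * empirical / sampled."""
    updated = {}
    for key, value in current.items():
        if sampled[key] > 0:
            updated[key] = value * empirical[key] / sampled[key]
        else:
            # Assumption of this sketch: keep the entry when it was never sampled.
            updated[key] = value
    return updated

# f4(0) as initialized in Table 9.
f4_current = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# Empirical distribution q7 ^ q8 with q7 = {0.3, 0.7} and q8 = {0.1, 0.9}.
f4_empirical = {(0, 0): 0.03, (0, 1): 0.27, (1, 0): 0.07, (1, 1): 0.63}
# Hypothetical sampling distribution for illustration only.
p4_sampled = {(0, 0): 0.0, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.6}

f4_next = update_factor(f4_current, f4_empirical, p4_sampled)
for key, value in f4_next.items():
    print(key, round(value, 4))
```

Entries whose empirical probability exceeds their sampling frequency grow, and the others shrink, which moves the sampling distribution toward the empirical one in later iterations.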

It is understood that if there are multiple instances of the same rule r3, f3'(0) is the sum of the empirical distributions obtained from those instances, and similarly p3(0) is the sum of the sampling distributions obtained from those instances.

For example, if three triples with variables x9, x10 and x11 also form an instance of rule r3, an empirical distribution and a sampling distribution are obtained for them by the above process, as for x4, x5 and x8; then f3'(0) in the formula is the sum of the two empirical distributions, and p3(0) is the sum of the two sampling distributions.

Update t = t + 1 and enter the next outer loop iteration.

When the outer loop has run T times, a group of updated factor functions f1(T), f2(T), f3(T) and f4(T) is obtained.

It can be understood that the factor functions obtained at the end of the M-step loop are used to initialize the factor functions of the next E-step, and the process continues until the factor functions and the probability distributions of the variables no longer change (i.e., converge). The factor functions obtained at that point are the final factor functions and can be used to judge the correctness of the rules.

After the EM algorithm finishes, the probability distributions of all variables are obtained, and the probability distribution of an extended triple is its target probability distribution. A preset threshold is set, and whether the extended triple is trusted is determined from the target probability distribution and the preset threshold, where the target probability distribution is the first probability distribution or the second probability distribution.

For example, assume the probability distribution of the extended triple e8 is {0.1, 0.9}, i.e. the first probability distribution is 0.9 and the second probability distribution is 0.1. If the preset threshold is a trust threshold i (e.g. i = 0.98), the target probability distribution is the first probability distribution, i.e. 0.9: if 0.9 > i, the extended triple is considered trusted and can be placed in the knowledge base to be expanded; if 0.9 < i, it is considered untrusted and is not placed in the knowledge base. If the preset threshold is a distrust threshold j (e.g. j = 0.2), the target probability distribution is the second probability distribution, i.e. 0.1: the extended triple is trusted when the second probability distribution is less than j and untrusted when it is greater than j.
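The two threshold tests can be written as a pair of small predicates. This is a minimal sketch; the function names are illustrative, and the distribution is stored as (second probability, first probability), i.e. (P(x = 0), P(x = 1)), to match the {0.1, 0.9} notation above.

```python
def trusted_by_trust_threshold(distribution, i):
    """Trusted when the first probability (x = 1) exceeds the trust threshold i."""
    return distribution[1] > i

def trusted_by_distrust_threshold(distribution, j):
    """Trusted when the second probability (x = 0) is below the distrust threshold j."""
    return distribution[0] < j

e8 = (0.1, 0.9)
print(trusted_by_trust_threshold(e8, i=0.98))     # False: 0.9 is not above 0.98
print(trusted_by_distrust_threshold(e8, j=0.2))   # True: 0.1 is below 0.2
```

With the example values, the same triple is rejected under the trust threshold i = 0.98 and accepted under the distrust threshold j = 0.2, so the choice of threshold type and value determines how conservative the expansion is.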

It should be noted that the formula f(t+1) = f(t)*(f'(t)/p(t)) for determining the factor function can be derived as follows:

after a factor graph is constructed, the learning problem of the credibility of the rule is converted into the learning of the factor:

suppose that

Representing a factor graph in which the sets of variable nodes and factor nodes are respectivelyAnd

for any one

In (2), we denote by N (u) the set of nodes connected to node u, where u denotes

One node in the set, for example, the node connected to the factor node f2 in the embodiment of the present invention, has variable nodes x3, x6, and x 8. By using

Representing any one set, for any one setWe use

Represents from

To

Of a set of mapping functions, wherein

To represent

Is that

Is selected from the group consisting of (a) a subset of,comprises

Some or all of the variable nodes. By usingTo representIn a set of unknown functions belonging to the same family, i.e.The method includes representing a set of a plurality of functions, that is, representing a set of factor functions belonging to the same family, where each function in the set corresponds to the same rule, and values of the functions are consistent, for example, each rule in the embodiment of the present invention is an unknown function summarized from a plurality of instances belonging to the same family. For the

A function of f:

there are m (f) variables, and each variable is derived from

Taking the value in the step (1). Let factor graph

Each section ofDot

And a function

In the context of a correlation, the correlation,

denotes g_uIs that

where m(g_u) = |N(u)|. Here u denotes a factor node in the factor graph; g_u denotes the factor function corresponding to the factor node u; N(u) denotes the set of variable nodes directly connected to the factor node u; and m(g_u) = |N(u)| is the size of that set, i.e. the number of variable nodes. It should be noted that for different nodes u and u′, g_u and g_u′ may be the same: if g_u and g_u′ correspond to the same rule, then g_u and g_u′ are identical. For each function

we denote by Λ(f) the set

That is, Λ(f) denotes, within

the set of all nodes related to the function f; in other words, Λ(f) denotes, within

the set of all functions belonging to the same family as the function f, i.e. all factor functions in the set correspond to the same rule. Then

Product form of representative function

Assuming that the product represents a set of random variables

Joint distribution of

Wherein the content of the first and second substances,representing a set of variable nodes in the factor graph;representation and variable node

Related variables, where each variable is consistent with each variable in embodiments of the present invention;

representing a set of such variables, e.g. a set of three variables corresponding to a factor function in the embodiment, i.e. here

I.e., variable x in the embodiment of the present invention, since

Is unknown, meaning that this joint distribution is also unknown.

Assuming an empirical distribution is observed

Representing a joint distribution

The empirical distribution of (3) is an empirical distribution of a factor function calculated from an empirical distribution of variables obtained from an E-step of the EM algorithm in an embodiment of the present invention. The objective is to estimate each factor function based on this empirical distribution

Specifically, by the pair

Sampling M times, each value

Observe that

times, and the observed set {a^(i) : i = 1, 2, …, M} is denoted D.

By usingRepresenting logp (D | F), where logp (D | F) represents the likelihood function given the observation set D and p (D | F) represents the posterior distribution of F, then we need to maximize

For each one

we use D_:S to denote the projection onto S of the combinations observed in D; for example, for the variable set S = [x3, x6, x8], an observed combination may be [0,0,0], [0,0,1], …, [1,1,1]. For each

and any vector

we use m_S(a) to denote the number of times the combination a is observed. For any

we use a_:S to denote the subvector of a mapped onto S, i.e. the specific value that a assigns to the variable combination S.

We define:

where

represents a set of variables, e.g. (x3, x6, x8), and a represents a specific value of that set of variables, e.g. [0,0,0], …, [1,1,1]. Then we have

is represented by A

The following can be obtained:

:= A - M log z    (10)

now for each

We derive

So as to obtain the compound with the characteristics of,

and

is that

The function of (a):

where Λ*(f) denotes the combinations of all variables corresponding to the function set Λ(f).

For each a ∈ Λ*(f), we have:

Lemma 1: For each

setting the above partial derivative to 0, for each

the following can be obtained:

Moving f(b) to one side of the equation and denoting it f^(t+1)(b), the iterative formula of the algorithm is obtained:

where the sampling-distribution term is the distribution obtained by sampling in round t, for example the sampling distribution p3(0) of a factor function calculated for an instance in the embodiment of the present invention.

It should be noted that, for convenience of description, the iterative formula (16) is written as:

f(t+1)＝f(t)*[f’(t)/p(t)]，

where f(t) is the value of the factor function in round t, i.e. f^t(b) in formula (16); t is an integer greater than or equal to 0 with initial value 0; f(0) is the initialized value of the factor function; f'(t) is the empirical distribution of the factor function in round t, i.e. in formula (16)

p(t) is the sampling distribution of the factor function in round t, i.e. in formula (16)

In the method for checking knowledge base triples of this embodiment, the target probability distribution of each extended triple and the factor function of its corresponding rule are updated continuously within the EM algorithm, so that the probability that a rule is correct, i.e. the credibility of the rule, is learned on the basis of the expanded knowledge base. The credibility of an extended triple is calculated from the factor function of its corresponding rule, and that factor function in turn depends on the credibility of the original triples in the knowledge base. The credibility of an extended triple therefore takes the global correlation between pieces of knowledge into account, so the knowledge base can be expanded efficiently, with high quality and high accuracy.

Example four

The embodiment provides a device for checking triple of a knowledge base, which is used for executing the method for checking triple of a knowledge base in the first embodiment.

Fig. 3 is a schematic structural diagram of the apparatus for checking a triple of a knowledge base provided in this embodiment. The apparatus 40 for checking triple of knowledge base of the present embodiment includes an obtaining module 41, a determining module 42 and a processing module 43.

The obtaining module 41 is configured to obtain a rule corresponding to an extended triple, where the extended triple is a triple obtained by performing an extension operation based on an original triple and the rule in an existing knowledge base, the extended triple includes an ordered set at least including a first statement, a relational statement, and a second statement, and the relational statement is used to represent a relationship between the first statement and the second statement; the determining module 42 is configured to determine a factor function corresponding to the rule obtained by the obtaining module 41, where the factor function is used to indicate a probability of whether the rule is correct, and the factor function is obtained according to an initial factor function and an EM algorithm; the processing module 43 is configured to determine whether the extension triplet is authentic according to the factor function determined by the determining module 42.

With regard to the apparatus in this embodiment, the specific manner in which the respective modules perform operations has been described in detail in the method embodiments and will not be elaborated upon here.

According to the device for inspecting the triple of the knowledge base, the rule corresponding to the expansion triple is obtained, the factor function corresponding to the rule is determined according to the initial factor function and the EM algorithm, whether the expansion triple is credible or not is determined according to the factor function, whether the expansion triple is put into the knowledge base or not can be further determined, the knowledge base is expanded, and the accuracy of expanding the knowledge base is improved.

Example five

This embodiment further supplements the apparatus for checking knowledge base triples in the fourth embodiment, so as to perform the method for checking knowledge base triples in the second embodiment.

As shown in fig. 4, the processing module 43 in the apparatus 40 for knowledge base triple verification of the present embodiment includes a first sub-module 51 and a second sub-module 52.

The first sub-module 51 is configured to determine, according to belief propagation and the factor function, a first probability distribution and a second probability distribution corresponding to the extended triple, where the first probability distribution indicates the probability that the extended triple is trusted, the second probability distribution indicates the probability that the extended triple is untrusted, and the second probability distribution = 1 - the first probability distribution; the second sub-module 52 is configured to determine whether the extended triple is trusted according to a target probability distribution and a preset threshold, where the target probability distribution is the first probability distribution or the second probability distribution.

Optionally, the determining module 42 is specifically configured to determine the factor function f(t+1) after an iteration of the EM algorithm according to the following formula:

f(t+1) = f(t) * [f'(t) / p(t)];

Optionally, the determining module 42 is further configured to stop the iterative operation when the value of f(t) no longer changes.
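The iteration and its stopping condition can be sketched as follows. This is a hedged illustration only: the callable `ratio` is a hypothetical stand-in for the quantity f'(t)/p(t), which this passage does not define, and the tolerance `tol` and the cap `max_iter` are assumptions that approximate "no longer changes" in floating-point arithmetic.

```python
from typing import Callable, Tuple


def iterate_factor(
    f0: float,
    ratio: Callable[[float], float],
    tol: float = 1e-9,
    max_iter: int = 1000,
) -> Tuple[float, int]:
    """Apply f(t+1) = f(t) * ratio(f(t)) until f stops changing.

    `ratio` is a placeholder for f'(t)/p(t); how that quantity is
    computed is not specified here and must be supplied by the caller.
    Returns the final value and the number of iterations performed.
    """
    f = f0
    for step in range(1, max_iter + 1):
        f_next = f * ratio(f)
        # Stop once the update no longer changes f (within the tolerance).
        if abs(f_next - f) <= tol:
            return f_next, step
        f = f_next
    return f, max_iter


# Toy ratio chosen only so that the update has a fixed point at 0.8:
# f * (0.5 + 0.4 / f) = 0.5 * f + 0.4.
value, steps = iterate_factor(0.5, lambda f: 0.5 + 0.4 / f)
```

With the toy ratio above, the iterate approaches 0.8, halving its distance to the fixed point at each step; a real system would derive the ratio from the quantities computed in the EM steps.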

According to the apparatus for checking knowledge base triples of this embodiment, the target probability distribution of the extended triple and the factor function of the rule corresponding to the extended triple are updated continuously in the EM algorithm, so that the probability that the rule is correct, that is, the reliability of the rule, is learned on the basis of the expanded knowledge base. The credibility of the extended triple is calculated from the factor function of the corresponding rule, and the calculation of that factor function in turn involves the credibility of the original triples in the knowledge base. The global correlation among pieces of knowledge is therefore taken into account when calculating the credibility of the extended triple, so that the knowledge base can be expanded efficiently, with high quality and high accuracy.

Those of ordinary skill in the art will understand that all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method of triple verification of a knowledge base, comprising:

determining whether the extension triple is credible according to the factor function;

the determining a factor function corresponding to the rule includes:

f(t+1) = f(t) * [f'(t) / p(t)];

2. The method of claim 1, wherein the determining whether the extended triplet is trustworthy according to the factor function comprises:

determining, according to belief propagation and the factor function, a first probability distribution and a second probability distribution corresponding to the extended triplet, the first probability distribution representing a probability that the extended triplet is trustworthy, the second probability distribution representing a probability that the extended triplet is untrustworthy, and the second probability distribution being 1 minus the first probability distribution;

3. The method of claim 2, wherein the determining whether the extension triplet is trustworthy according to the target probability distribution and the preset threshold comprises:

4. A method according to any of claims 1 to 3, wherein the iterative operation is stopped when the value of f(t) no longer changes.

5. An apparatus for triple verification of a knowledge base, comprising:

the processing module is used for determining whether the extension triple is credible according to the factor function;

the determining module is specifically configured to:

f(t+1) = f(t) * [f'(t) / p(t)];

6. The apparatus of claim 5, wherein the processing module comprises:

a first submodule, configured to determine, according to belief propagation and the factor function, a first probability distribution and a second probability distribution corresponding to the extended triplet, where the first probability distribution is used to represent a probability that the extended triplet is trustworthy, the second probability distribution is used to represent a probability that the extended triplet is untrustworthy, and the second probability distribution is 1 minus the first probability distribution;

7. The apparatus of claim 6, wherein the second submodule is specifically configured to:

8. The apparatus of any of claims 5-7, wherein the determining module is further configured to:

stop the iterative operation when the value of f(t) no longer changes.