CN111191460A - Relation prediction method combining logic rule and fragmentation knowledge - Google Patents
Relation prediction method combining logic rule and fragmentation knowledge
- Publication number
- CN111191460A (application CN201911390283.4A)
- Authority
- CN
- China
- Prior art keywords
- rule
- logic
- relation
- knowledge
- knowledge base
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
- G06N5/025—Extracting rules from data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
Abstract
The invention relates to a relation prediction method combining logic rules and fragmented knowledge. First, fact triples and logic rules are modeled uniformly, embedding hidden semantic information into a relational inference model based on knowledge representation; second, fragmented knowledge is incorporated through continuous iterative updating, so that the knowledge base becomes more complete. By uniformly modeling fact triples and logic rules and embedding the hidden semantic information into the knowledge-representation-based relational inference model, the invention achieves more accurate prediction.
Description
Technical Field
The invention relates to a relation prediction method combining logic rules and fragmentation knowledge.
Background
In the field of relational reasoning, relational inference models represented by TransE [1] have been a research focus in recent years because they are simple, efficient, and offer good prediction performance. The TransE model directly models the fact triples (h, r, t) in a knowledge base; its basic idea is to map the entities and relations of the knowledge base into a low-dimensional continuous vector space, which simplifies computation over the knowledge base. Although simple and efficient, this basic representation learning model considers only the direct fact triples (h, r, t) and ignores the semantic information hidden in the knowledge base, so its inference precision is limited. Some recent work adds external data such as entity types, textual descriptions, and logic rules to further improve inference accuracy. Document [2] reduces the influence of noisy data on the model by introducing the domain and range of relations to filter out erroneous samples. Document [3] adds entity context information to the representation learning model, improving its semantic expressiveness. Document [4] improves reasoning performance by uniformly modeling additional textual information and the direct fact triples in the knowledge base. Document [5] first extracts a set of Horn logic rules that can represent knowledge base semantic information through a rule mining system, then derives a new set of facts through rule-based materialization inference. Document [6] represents the semantic relationships between entities using the multi-hop relation paths existing between them.
With the rapid development of the internet, new knowledge fragments are continuously generated, so the knowledge base is no longer static. Therefore, when relational reasoning techniques are applied to automatic knowledge base completion, the dynamic growth of the knowledge base should be taken into account. In recent years, relational reasoning based on knowledge representation learning has received great attention. However, most existing knowledge representation learning methods embed only the fact triples and ignore the hidden semantic information in the knowledge network; as a result, the learned vectors cannot accurately express the semantic relations of the original knowledge base, and the value of fragmented knowledge cannot be fully exploited. The invention therefore provides a relation prediction method combining logic rules and fragmented knowledge. First, fact triples and logic rules are modeled uniformly, embedding the hidden semantic information into a relational inference model based on knowledge representation. Second, fragmented knowledge is incorporated through continuous iterative updating, making the knowledge base more complete and realizing relation prediction.
Disclosure of Invention
The invention aims to provide a relation prediction method combining logic rules and fragmented knowledge.
In order to achieve the purpose, the technical scheme of the invention is as follows: a relation prediction method combining logic rules and fragmented knowledge comprises the steps of firstly carrying out unified modeling on fact triples and logic rules, and embedding hidden semantic information into a relation inference model based on knowledge representation; secondly, combining fragmentation knowledge, continuously iterating and updating, so that the knowledge base becomes more complete.
In an embodiment of the present invention, the method is specifically implemented as follows:
the first stage is as follows: modeling the direct fact triples in the knowledge base to obtain vector expressions of all entities and relations in the knowledge base, wherein the vector expressions are used for calculating the semantic association degree among the rules at the third stage;
and a second stage: digging out a group of logic rules which can represent semantic information of a knowledge base through a rule mining algorithm;
and a third stage: the logic rule application and reasoning stage, used in two ways: first, new facts are inferred through materialization reasoning based on the logic rules and added to the knowledge base, realizing its dynamic expansion; second, the relation r in a fact triple (h, r, t), where h and t denote entities, is alternatively represented by logic rules, so that the logic rules are embedded into the representation-learning-based relational inference model; because the knowledge base may contain multiple inference rules having the relation r as rule head, a method for calculating the semantic association degree between the relation r and the different rule bodies having r as rule head is provided;
a fourth stage: with the outputs of the first to third stages as input, the fact triples and logic rules are modeled uniformly; in this way, the rich semantic information of the logic rules is embedded into RTransE, a relational inference model based on representation learning, and relational inference is then carried out with the trained RTransE model, realizing completion of the knowledge base;
the fifth stage: and combining the dynamic knowledge fragments, and continuously updating in an iterative manner, so that the knowledge base becomes more complete.
In the fourth stage, in an embodiment of the present invention, the unified modeling process for the fact triples and the logic rules is as follows:
given a triple (h, r, t), the TransE model expects h + r ≈ t to hold when the triple is true; on the basis of ||h + r - t||1, a normalization improvement is applied to the triple score function, as shown in the following formula (1):
in formula (1), d(h, r, t) = ||h + r - t||1 is the distance function, and it can easily be seen that f(h, r, t) ∈ [0, 1]; if the triple holds, f(h, r, t) should be as small as possible, and as large as possible otherwise;
the model uses the existing fact triples in the knowledge base as positive examples, and triples contradicting existing facts, generated by randomly replacing head entities, tail entities, or relations, as negative examples for training; the triple modeling loss function is shown in the following formula (2):
L_T = Σ_{s+ ∈ S+} Σ_{s- ∈ S-} max(0, f(s+) + γ - f(s-))   (2)
in formula (2), S = {s_1, s_2, ..., s_i, ..., s_n} is the set of fact triples, S+ is the set of positive-example triples, S- is the set of negative-example triples, and γ is an adjustable hyper-parameter (the margin).
In an embodiment of the present invention, in the fourth stage, the specific process of embedding the rich semantic information of the logic rules into the representation-learning-based relational inference model RTransE is as follows:
the distance function of the joint representation embedding the logic rules is shown in the following formula (3):
D_r(B_i) = ||Σ_{j=1..k} b_ij - r||1   (3)
in formula (3), k is the number of relation atoms in the body of the ith logic rule having the relation r as rule head, B_i is the body of the ith logic rule, and the b_ij are the vectors of the body relations; if the logic rule can represent semantic information of the relation r, D_r(B_i) should be as close to 0 as possible, and as large as possible otherwise;
the representation learning model embedding the logic rules uses the mined logic rules as positive examples, and rules contradicting the existing logic rules, generated by randomly replacing the rule head, as negative examples; the loss function over the logic rules is shown in the following formula (4):
in formula (4), LR is the set of logic rules, LR+ is the set of positive-example rules, LR- is the set of negative-example rules, and γ is an adjustable hyper-parameter; conf(r, i) denotes the confidence of the ith logic rule having the relation r as rule head, and w(r, i) denotes the semantic association degree between the ith logic rule having the relation r as rule head and the relation r;
the loss function of the representation learning model embedding the logic rules is shown in the following formula (5):
L = L_T + L_LR   (5)
where L_T is the triple modeling loss of formula (2) and L_LR is the logic rule loss of formula (4); as formula (5) shows, the model loss function consists of two parts, the distance-based loss over the direct fact triples of the knowledge base and the distance-based loss between the logic rules and the relation r.
In one embodiment of the present invention, replacing the entities in a rule instantiation by their entity types makes the embedded representation more predictive; therefore:
the distance function D(h, r, t, h_type, t_type) combining entity types, which improves D(h, r, t) of formula (1), is shown in the following formula (6):
D(h, r, t, h_type, t_type) = ||(h + h_type) + r - (t + t_type)||1   (6)
in formula (6), h_type denotes the entity type corresponding to the head entity h, and t_type denotes the entity type corresponding to the tail entity t;
the triple modeling loss function with entity types added, improving the triple modeling loss function of formula (2), is shown in the following formula (7):
in formula (7), EL = {el_1, el_2, ..., el_n} is the entity type tag set, representing the set of tags that can describe all entity classes in the knowledge base, and f(h, r, t, h_type, t_type) is the new triple score function, shown in the following formula (8):
the distance function of the joint representation embedding both the logic rules and the entity types, improving formula (3), is shown in the following formula (9):
in formula (9), m_ie denotes the sum of the type vectors of the connecting variable entities in the rule body of the ith rule of the relation r;
the logic rule loss function with entity types added, improving formula (4), is shown in the following formula (10):
in formula (10), m_ie denotes the sum of the type vectors of the connecting variable entities in the rule body of the ith rule of the relation r;
the representation learning model loss function embedding the logic rules with entity types added, improving formula (5), is shown in the following formula (11):
in an embodiment of the present invention, in the fifth stage, the conditional function Rt that triggers iterative training of the model is shown in the following formula (12):
Rt: #facts / #entity ≥ θ   (12)
In formula (12), #facts and #entity are respectively the number of fact triples and the number of entities in the temporary knowledge base KB', and θ is the model iterative training threshold.
In an embodiment of the present invention, the method is applied to human relationship prediction.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention carries out unified modeling on the fact triples and the logic rules, and embeds the hidden semantic information into a relation inference model based on knowledge representation, thereby realizing more accurate prediction;
2. the method can apply an activation strategy to dynamically inflowing knowledge fragments, adapting to a dynamic knowledge network and realizing more accurate knowledge reasoning.
Drawings
FIG. 1 is a block diagram of the method of the present invention.
FIG. 2 is an example temporary repository.
FIG. 3 is a relational inference overall framework diagram.
FIG. 4 is a diagram of a process for reasoning in conjunction with logic rules and fragmented knowledge.
Detailed Description
The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.
The invention provides a relation prediction method combining logic rules and fragmented knowledge, which comprises the steps of firstly, carrying out unified modeling on fact triples and logic rules, and embedding hidden semantic information into a relation inference model based on knowledge representation; secondly, combining fragmentation knowledge, continuously iterating and updating, so that the knowledge base becomes more complete. The method is concretely realized as follows:
the first stage is as follows: modeling the direct fact triples in the knowledge base to obtain vector expressions of all entities and relations in the knowledge base, wherein the vector expressions are used for calculating the semantic association degree among the rules at the third stage;
and a second stage: digging out a group of logic rules which can represent semantic information of a knowledge base through a rule mining algorithm;
and a third stage: the logic rule application and reasoning stage, used in two ways: first, new facts are inferred through materialization reasoning based on the logic rules and added to the knowledge base, realizing its dynamic expansion; second, the relation r in a fact triple (h, r, t), where h and t denote entities, is alternatively represented by logic rules, so that the logic rules are embedded into the representation-learning-based relational inference model; because the knowledge base may contain multiple inference rules having the relation r as rule head, a method for calculating the semantic association degree between the relation r and the different rule bodies having r as rule head is provided;
a fourth stage: with the outputs of the first to third stages as input, the fact triples and logic rules are modeled uniformly; in this way, the rich semantic information of the logic rules is embedded into RTransE, a relational inference model based on representation learning, and relational inference is then carried out with the trained RTransE model, realizing completion of the knowledge base;
the fifth stage: and combining the dynamic knowledge fragments, and continuously updating in an iterative manner, so that the knowledge base becomes more complete.
The following is a specific implementation of the present invention.
The method provided by the invention mainly comprises five stages: in the first stage, modeling is carried out on the direct fact triples in the knowledge base to obtain vector expressions of all entities and relations in the knowledge base, and the vector expressions are used for calculating the semantic association degree among the rules in the third stage. And in the second stage, a group of logic rules which can represent semantic information of the knowledge base is mined through a rule mining algorithm. In the third stage, a logical rule reasoning stage is applied, and two main modes are provided: firstly, reasoning out new facts and adding the new facts into a knowledge base through materialization reasoning based on logic rules to realize dynamic expansion of the knowledge base; and secondly, the relation r in the fact triple (h, r, t) is replaced and expressed by a logic rule, so that the logic rule is embedded into a relation inference model based on expression learning. Because a plurality of inference rules with the relation r as a rule head exist in the knowledge base, three methods for calculating semantic association degrees between different rule bodies with the relation r as the rule head and the relation r are provided; and in the fourth stage, the output of the three stages is used as the input of the fourth stage, the key idea is to uniformly model the fact triples and the logic rules, and through the way, the semantic information rich in the logic rules is embedded into a relational inference model based on representation learning, and then the relational inference is carried out through a trained RTransE model, so that the completion of the knowledge base is realized. And in the fifth stage, combining the dynamic knowledge fragments, and continuously updating in an iterative manner, so that the knowledge base becomes more complete. The overall framework design of the method of the invention is shown in figure 1:
the relevant definitions herein are given below.
Definition 1 (knowledge base, KB): a knowledge base is a tuple KB = <E, R, F, P, V>, where E denotes the set of entities, R denotes the set of relations, F denotes the set of facts in the knowledge base, P denotes the set of properties, and V denotes the set of property values.
Definition 2 (entity set, E): the entity set E = {e1, e2, ..., en} = Π_subject(KB) ∪ Π_object(KB) describes all entities in the data layer of the semantic network knowledge base, corresponding to the set of instances in RDF.
Definition 3 (relation set, R): the relation set R = {r1, r2, ..., rn} = Π_relation(KB) represents the relations between entities.
Definition 4 (fact set, F): the fact set F ⊆ E × R × E represents the set of all instance triples in the knowledge base.
Definition 5 (attribute set, P): the attribute set P = {p1, p2, ..., pn} represents the set of all attributes, associating E with the attribute values V.
Definition 6 (attribute value set, V): the attribute value set V = {v1, v2, ..., vn} represents the set of all attribute values.
Definition 7 (entity tag set, EL): the entity tag set EL = {el1, el2, ..., eln} is the set of tags that can represent all entity classes in the knowledge base. For commonly used datasets such as YAGO and DBpedia, the coarse classes PER, LOC, and ORG are extended, and 39 types are defined herein as the entity tag set, denoted EL, with Cf = {PER | ORG | LOC} representing the set of the three major classes, as shown in Table 1.
TABLE 1 entity tag set
The method comprises the following steps:
1. triple modeling
Given a triple (h, r, t), the TransE model is expected to satisfy h + r ≈ t as far as possible when the triple is true; for example, yaoming + nationality ≈ china and james + nationality ≈ the united states. Herein, on the basis of ||h + r - t||1, a normalization improvement is made to the triple score function.
Definition 8: the triple score function f(h, r, t) is defined as shown in formula (1) below:
In formula (1), d(h, r, t) = ||h + r - t||1 is the distance function, and it can easily be seen that f(h, r, t) ∈ [0, 1]; if a triple holds, f(h, r, t) should be as small as possible, and as large as possible otherwise.
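As a minimal illustration of the distance and score functions above (the exact normalization of formula (1) is not reproduced in the recovered text, so tanh is used here purely as a plausible stand-in that maps the L1 distance into [0, 1)):

```python
import math

def distance(h, r, t):
    # d(h, r, t) = ||h + r - t||_1, the TransE translation distance.
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))

def score(h, r, t):
    # Normalized triple score f(h, r, t) in [0, 1).  The patent's exact
    # normalization in formula (1) is not given in the text, so tanh is
    # used here purely as an illustrative squashing function.
    return math.tanh(distance(h, r, t))
```

A true triple such as h = [1, 0], r = [0, 1], t = [1, 1] yields distance 0 and score 0, the ideal case of h + r ≈ t.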
The model is trained using the existing fact triples in the knowledge base as positive examples, and triples contradicting existing facts, generated by randomly replacing head entities, tail entities, or relations, as negative examples; the loss function of the model is given in Definition 9:
L_T = Σ_{s+ ∈ S+} Σ_{s- ∈ S-} max(0, f(s+) + γ - f(s-))   (2)
In formula (2), S = {s_1, s_2, ..., s_i, ..., s_n} is the set of fact triples, S+ is the set of positive-example triples, S- is the set of negative-example triples, and γ is an adjustable hyper-parameter (the margin).
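The margin-ranking objective of formula (2) can be sketched as follows; the one-to-one pairing of each positive score with a corrupted-negative score is an implementation assumption, not spelled out in the recovered text:

```python
def margin_loss(pos_scores, neg_scores, gamma):
    # For each positive triple score f(s+) paired with a corrupted
    # negative score f(s-), penalize max(0, f(s+) + gamma - f(s-)):
    # positives should score lower than negatives by at least gamma.
    return sum(max(0.0, p + gamma - n)
               for p, n in zip(pos_scores, neg_scores))
```

When a positive already beats its negative by more than the margin, the pair contributes zero loss.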
2. Logical rule mining
Following document [7], the present invention implements HornConcerto, an algorithm that finds Horn logic rules in large graph data. The algorithm outperforms existing methods in running time and memory consumption, and mines higher-quality logic rules for knowledge reasoning tasks. HornConcerto is inspired by the AMIE+ algorithm and likewise adopts the PCA confidence measure. The support and confidence of logic rules are given in Definitions 10 and 11.
Definition 10: the support supp of a rule is the number of fact triples in the knowledge base that satisfy the rule head and the rule body simultaneously:
supp(B ⇒ r(x, y)) = #{(x, y) : ∃ z_1, ..., z_m : B ∧ r(x, y)}
where z_1, ..., z_m denote the rule variables other than x and y.
Definition 11: the PCA confidence of a rule is
conf_pca(B ⇒ r(x, y)) = supp(B ⇒ r(x, y)) / #{(x, y) : ∃ z_1, ..., z_m, y' : B ∧ r(x, y')}
where the numerator is the support of the rule and y' in the denominator ranges over all possible objects of the rule head admitted under the PCA (partial completeness) assumption. The confidence of a rule reflects its reliability and the richness of the semantics it expresses; the closer the confidence is to 1, the more reliable the rule.
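A sketch of support and PCA confidence for the simplest case of a single-atom Horn rule body(x, y) ⇒ head(x, y); the function names and the triple representation are illustrative, not from the patent:

```python
def rule_support(facts, body, head):
    # Support: number of (x, y) pairs for which both body(x, y) and
    # head(x, y) are facts in the knowledge base.  `facts` is a set of
    # (subject, relation, object) triples.
    body_pairs = {(s, o) for (s, r, o) in facts if r == body}
    head_pairs = {(s, o) for (s, r, o) in facts if r == head}
    return len(body_pairs & head_pairs)

def pca_confidence(facts, body, head):
    # PCA confidence: support divided by the number of body pairs whose
    # subject has *some* head fact in the KB (the partial completeness
    # assumption used by AMIE+ and HornConcerto).
    body_pairs = {(s, o) for (s, r, o) in facts if r == body}
    head_subjects = {s for (s, r, o) in facts if r == head}
    denom = sum(1 for (s, o) in body_pairs if s in head_subjects)
    return rule_support(facts, body, head) / denom if denom else 0.0
```

Under PCA, a body pair whose subject has no known head fact at all is excluded from the denominator rather than counted as a counterexample.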
3. Applying logic rules
The knowledge base itself typically already contains enough information to derive and add new facts, and some rules can be found in it by a rule mining algorithm. For example, we can mine the rule
livesIn(y, z) ⇐ spouse(x, y) ∧ livesIn(x, z)
This rule captures the fact that a person's spouse often lives in the same place as the person.
As noted in the background, these logic rules imply rich semantic information, which makes knowledge reasoning more predictive. The logic rules mined herein are applied in two ways:
first, the relation r in the fact triple (h, r, t) is replaced by a logic rule, so that the logic rule is embedded into a relation inference model based on representation learning. However, there may be multiple inference rules in the knowledge base with the relationship r as the rule head. For example, the following two rules can infer the relationship of "nationality", but the semantic association degree between the "nationality" and the "nationality" is different.
In this subsection, we introduce three methods to measure the semantic association between the relation r and the different rule bodies having r as rule head. Let N be the number of logic rules whose rule head is the relation r.
Definition 12, Equal (average): every inference rule is considered to influence the relation r to the same degree, so the semantic association degree between the ith logic rule with rule head r and the relation r is computed as shown in formula (5) below:
w(r, i) = 1 / N   (5)
Definition 13, NumberRatio (number ratio): the semantic association degree is measured by the proportion of the number of fact triples satisfying the logic rule to the total number of fact triples satisfying the N logic rules, so the semantic association degree for the ith logic rule with rule head r is computed as shown in formula (6) below:
w(r, i) = n_i / Σ_{j=1..N} n_j   (6)
where n_i is the number of fact triples in the knowledge base satisfying the ith rule.
Definition 14, VectorDistance (vector distance): the semantic association degree between the rule body and the rule head of a logic rule is computed from the relation vectors learned by the TransE model; that is, similarity is measured by the cosine of the angle between two vectors in the vector space, so the semantic association degree between the ith logic rule with rule head r and the relation r is computed as shown in formula (7) below:
In formula (7), n is the dimension of the vectors, k is the number of relation atoms in the ith logic rule, B_i is the rule body of the ith logic rule, and T is a normalization factor; the closer the cosine value is to 1, the closer the angle is to 0 and the more similar the vectors.
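The three association measures can be sketched as follows; the function names are illustrative, and the normalization factor T of formula (7) is omitted for simplicity:

```python
import math

def assoc_equal(n_rules, i):
    # Equal (Definition 12): every rule with head r gets the same weight.
    return 1.0 / n_rules

def assoc_number_ratio(rule_counts, i):
    # NumberRatio (Definition 13): share of the fact triples instantiating
    # rule i among those instantiating all N rules with head r.
    return rule_counts[i] / sum(rule_counts)

def assoc_vector_distance(body_vectors, r):
    # VectorDistance (Definition 14): cosine between the sum of the
    # rule-body relation vectors (TransE-style composition) and the
    # head relation vector r.
    s = [sum(v[d] for v in body_vectors) for d in range(len(r))]
    dot = sum(a * b for a, b in zip(s, r))
    norm = math.sqrt(sum(a * a for a in s)) * math.sqrt(sum(b * b for b in r))
    return dot / norm if norm else 0.0
```

If the composed body vector points in the same direction as r, the cosine is 1, indicating the rule body expresses the same semantics as the head relation.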
Second, new facts are inferred through materialization reasoning based on the logic rules and added to the knowledge base, so that knowledge is exploited more fully. However, untrusted fact triples would contribute noise to the model; herein the noisy data are filtered through formulas (3) and (4), introducing more reliable rules and fact triples into the model.
Definition 15: the confidence function TripleConf(h, r, t) of an inferred new fact triple is shown in formula (8) below:
4. representation learning model RTransE of embedded logic rule
The TransE series of models introduced in the preceding sections considers only the direct fact triples in the knowledge base, and existing logic-rule-based representation learning methods consider only the logic rule with the highest confidence while ignoring the influence of the other inference rules on the fact triples; they therefore suffer from low inference precision for the complex relation types 1-N, N-1, and N-N. This section presents RTransE, a relational inference model based on logic rules and representation learning. A schematic diagram of the model is shown in FIG. 1, part 4.
As the schematic diagram in FIG. 1, part 4 shows, the model considers not only the direct relation r between entity h and entity t in the triple (h, r, t), but also the N logic rules having the relation r as rule head, so the entity and relation vectors learned by the representation learning model embedded with the logic rules express the semantic information in the knowledge base more completely, enabling more accurate prediction.
Definition 16: the distance function of the joint representation embedding the logic rules is shown in formula (9) below:
D_r(B_i) = ||Σ_{j=1..k} b_ij - r||1   (9)
In formula (9), k is the number of relation atoms in the body of the ith logic rule with rule head r, and the b_ij are the vectors of the body relations. If the logic rule can represent semantic information of the relation r, D_r(B_i) should be as close to 0 as possible, and as large as possible otherwise.
The representation learning model embedding the logic rules uses the mined logic rules as positive examples, and rules contradicting the existing logic rules, generated by randomly replacing the rule head, as negative examples. Its loss function is given in Definition 17:
In the loss function, LR is the set of logic rules, LR+ is the set of positive-example rules, LR- is the set of negative-example rules, and γ is an adjustable hyper-parameter; conf(r, i) denotes the confidence of the ith logic rule having the relation r as rule head, and w(r, i) denotes the semantic association degree between the ith logic rule having the relation r as rule head and the relation r.
Definition 18: the loss function of the representation learning model embedding the logic rules is shown in formula (11) below:
L = L_T + L_LR   (11)
As formula (11) shows, the model loss function consists of two parts, the distance-based loss over the direct fact triples of the knowledge base and the distance-based loss between the logic rules and the relation r. The main idea is to make the distance function values of the positive samples much smaller than those of the negative samples.
5. Influence of entity type on representation learning model
As formula (9) shows, the distance function considers only the logic rules themselves and ignores the influence of the entities in the rule instantiations on the embedding model. Different instantiations of a rule necessarily correspond to different entities, so it is difficult to represent those entities properly. To alleviate this problem, we represent entities using the entity types present in the knowledge base. For a mined logic rule, for example, the instance variable y may correspond to different entities, but its corresponding entity type is the same; therefore, introducing entity types to replace the embedded representations of the entities in rule instantiations is more predictive.
Definition 19: the distance function D(h, r, t, h_type, t_type) combining entity types, which improves D(h, r, t) of formula (1), is shown in formula (12) below:
D(h, r, t, h_type, t_type) = ||(h + h_type) + r - (t + t_type)||1   (12)
In formula (12), h_type denotes the entity type corresponding to the head entity h, and t_type denotes the entity type corresponding to the tail entity t.
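Formula (12) translates directly into code; vectors are plain Python lists here for illustration:

```python
def typed_distance(h, r, t, h_type, t_type):
    # Formula (12):
    #   D(h, r, t, h_type, t_type) = ||(h + h_type) + r - (t + t_type)||_1
    # i.e. each entity vector is augmented with its entity-type vector
    # before applying the TransE translation distance.
    return sum(abs((hv + htv) + rv - (tv + ttv))
               for hv, htv, rv, tv, ttv in zip(h, h_type, r, t, t_type))
```

When the type-augmented head plus the relation lands exactly on the type-augmented tail, the distance is zero, just as in the untyped case.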
Definition 20: the distance function jointly representing the embedded logic rules and the entity types, improving formula (9), is shown in formula (13) below:
In formula (13), m_ie denotes the sum of the type vectors of the connecting variable entities in the rule body of the ith rule of the relation r.
Definition 21: the triple modeling loss function with entity types added, improving formula (2), is shown in formula (14) below:
In formula (14), EL is the entity type tag set (see Definition 7), and f(h, r, t, h_type, t_type) is the new triple score function, shown in formula (15) below.
Definition 22: the logic rule loss function with entity types added, improving formula (10), is shown in formula (16) below:
Definition 23: the representation learning model loss function embedding the logic rules with entity types added, improving formula (11), is shown in formula (17) below.
6. Relationship reasoning incorporating dynamic knowledge fragmentation
With the rapid development of the Internet, new knowledge fragments are continuously generated, so the knowledge base is no longer static. Using dynamic knowledge fragments for relational reasoning is therefore one of the most effective means of realizing the dynamic growth of the knowledge base.
For example, suppose the fact triple "<yaoming, place of birth, shanghai>" exists in the knowledge base and the newly inflowing knowledge fragment "<shanghai, country, china>" arrives. If corresponding entity vectors for "shanghai" and "china" can be found in the knowledge base, the relation "nationality" can be completed between the entities "yaoming" and "china" through the trained RTransE model. If not, the fragment is added to the temporary knowledge base pool; when the ratio of the number of fact triples to the number of entities in the pool reaches the iterative training threshold θ, the temporary knowledge base KB' is merged with the original knowledge base KB and retrained to obtain new logic rules and a new RTransE model, and relational reasoning is again performed in combination with the fragmented knowledge. This repeats until no new knowledge fragments flow in or no new facts are generated.
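The iterative loop described above can be sketched as follows; `retrain` is a caller-supplied stand-in for the rule mining and RTransE training steps, which are not reproduced here:

```python
def complete_with_fragments(kb, fragments, theta, retrain):
    # Inflowing fragments are pooled in a temporary knowledge base; once
    # the ratio of facts to distinct entities in the pool reaches theta,
    # the pool is merged into the KB and the model is retrained.
    pool = []
    for triple in fragments:
        pool.append(triple)
        entities = {e for (h, _, t) in pool for e in (h, t)}
        if entities and len(pool) / len(entities) >= theta:
            kb |= set(pool)   # merge KB' into KB
            retrain(kb)       # mine new rules, retrain RTransE
            pool.clear()
    return kb, pool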
Setting this threshold reduces the number of iterations and improves execution efficiency. This subsection uses the ratio of the number of fact triples to the number of entities to measure how densely connected the temporary knowledge base is: the larger the ratio, the closer the relations between the entities and the more latent relations can be mined, so retraining is more meaningful, and vice versa. As FIG. 2 makes clear, the temporary knowledge base of FIG. 2-a better satisfies the retraining criterion.
A conditional function Rt that triggers iterative training of the model is defined as shown in equation (18) below:
Rt = 1 if #facts / #entity ≥ θ, and Rt = 0 otherwise (18)
In equation (18), #facts and #entity are the number of fact triples and the number of entities in the temporary knowledge base KB', respectively, and θ is the model's iterative-training threshold.
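Under this reading of equation (18), an indicator that fires when the fact/entity ratio of the pool reaches θ, the trigger can be sketched as follows (a minimal sketch; the function and variable names are ours):

```python
def rt(num_facts: int, num_entities: int, theta: float) -> int:
    """Conditional trigger Rt: return 1 (retrain) when the ratio of fact
    triples to entities in the temporary pool KB' reaches threshold theta,
    else 0 (keep accumulating fragments)."""
    if num_entities == 0:
        return 0  # empty pool: nothing worth retraining on
    return 1 if num_facts / num_entities >= theta else 0

# A densely connected pool (many facts per entity) triggers retraining;
# a sparse one does not.
print(rt(120, 40, theta=2.5), rt(50, 40, theta=2.5))
```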
7. Application of the method of the invention
Although a typical knowledge base already contains millions of entities and hundreds of millions of facts, it is still incomplete. Knowledge-base completion infers and predicts relations between entities from prior knowledge combined with fragmented knowledge. The invention achieves further knowledge-base completion by implementing HornConcerto [8], an algorithm for discovering Horn logic rules in large-scale graph data, together with the RACRFK algorithm, thereby making fuller use of the knowledge.
As the background knowledge shows, logic rules carry rich semantic information, which makes knowledge reasoning more predictive. Therefore, a set of Horn logic rules that represent the semantic information of the knowledge base is first mined with the HornConcerto rule-mining algorithm. The rules are then applied in RACRFK, the relational inference algorithm combining logic rules with dynamic knowledge fragments; by incorporating fragmented knowledge and updating iteratively, the knowledge base grows larger and more complete.
The overall framework for relational inference using the HornConcerto and RACRFK algorithms is shown in Fig. 3.
First, a set of Horn logic rules that represent the semantic information of the knowledge base is mined with the HornConcerto rule-mining algorithm; for example, a rule of the form placeOfBirth(x, y) ∧ country(y, z) ⇒ nationality(x, z) can be mined (this concrete rule is illustrative, matching the nationality example above).
Next, the knowledge base is completed by RACRFK, the relational inference algorithm combining logic rules and fragmented knowledge. The reasoning process combining logic rules and fragmented knowledge is shown in Fig. 4.
It can be observed from Fig. 4 that the inflow of a single knowledge fragment may trigger the model to iteratively infer thousands of new facts for the knowledge base, which reflects how fully the model exploits dynamic knowledge fragments. In Fig. 4, the dotted lines (the nationality part of the figure) are new facts inferred by the model in combination with fragmented knowledge.
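The fan-out shown in Fig. 4, one incoming fragment yielding new facts, can be illustrated with a minimal forward-chaining pass over one hypothetical Horn rule (our own sketch, not the RACRFK implementation; the rule and relation names are assumed from the nationality example above):

```python
# Knowledge base as a set of (head, relation, tail) triples.
kb = {("yao_ming", "placeOfBirth", "shanghai")}
fragment = ("shanghai", "country", "china")  # newly arriving knowledge fragment
kb.add(fragment)

def materialize_nationality(triples):
    """Materialise placeOfBirth(x, y) ∧ country(y, z) => nationality(x, z):
    join placeOfBirth facts with country facts on the shared middle entity."""
    birthplaces = [(h, t) for h, r, t in triples if r == "placeOfBirth"]
    countries = {h: t for h, r, t in triples if r == "country"}
    inferred = {(person, "nationality", countries[place])
                for person, place in birthplaces if place in countries}
    return inferred - triples  # keep only genuinely new facts

derived = materialize_nationality(kb)
print(derived)  # the completed nationality triple
```

A real materialisation pass would apply every mined rule repeatedly until no new facts are produced, exactly the fixed-point behaviour the figure depicts.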
References:
[1] Bordes A, Usunier N, García-Durán A, et al. Translating Embeddings for Modeling Multi-relational Data [C] // International Conference on Neural Information Processing Systems, 2013.
[2] Krompaß D, Baier S, Tresp V. Type-Constrained Representation Learning in Knowledge Graphs, 2015.
[3] Xie R, Liu Z, Sun M. Representation Learning of Knowledge Graphs with Hierarchical Types [C] // International Joint Conference on Artificial Intelligence. AAAI Press, 2016.
[4] Wang Z, Li J. Text-Enhanced Representation Learning for Knowledge Graph [C] // International Joint Conference on Artificial Intelligence. AAAI Press, 2016.
[5] Chen Xi, Chen Huajun, Zhang Wen. Rule-enhanced knowledge graph representation learning method [J]. Information Engineering, 2017, 3(1): 026-034. (in Chinese)
[6] Lin Y, Liu Z, Luan H, et al. Modeling Relation Paths for Representation Learning of Knowledge Bases [J]. Computer Science, 2015.
[7] Soru T, Valdestilhas A, Marx E, et al. Beyond Markov Logic: Efficient Mining of Prediction Rules in Large Graphs [J]. 2018.
[8] Soru T, Valdestilhas A, Marx E, et al. Beyond Markov Logic: Efficient Mining of Prediction Rules in Large Graphs [J]. 2018.
The above are preferred embodiments of the present invention; any change made according to the technical scheme of the present invention that produces an equivalent functional effect without exceeding the scope of the technical scheme falls within the protection scope of the present invention.
Claims (7)
1. A relation prediction method combining logic rules and fragmented knowledge, characterized in that: first, fact triples and logic rules are modeled in a unified way, and the implicit semantic information is embedded into a relational inference model based on knowledge representation; second, fragmented knowledge is incorporated and the model is updated iteratively, so that the knowledge base becomes more complete.
2. The relation prediction method combining logic rules and fragmented knowledge according to claim 1, characterized in that it is implemented as follows:
the first stage: model the direct fact triples in the knowledge base to obtain vector representations of all entities and relations in the knowledge base; these representations are used in the third stage to compute the degree of semantic association between rules;
the second stage: mine a set of logic rules that represent the semantic information of the knowledge base with a rule-mining algorithm;
the third stage: the mined logic rules are applied to reasoning in two ways: first, new facts are inferred by rule-based materialisation and added to the knowledge base, realising its dynamic expansion; second, the relation r in a fact triple (h, r, t) is represented by a logic rule instead, so that the logic rule is embedded into the relational inference model based on representation learning, where h and t both denote entities; since the knowledge base contains several inference rules whose head is the relation r, a method is provided for measuring the semantic association between the different rule bodies headed by the relation r and the relation r itself;
the fourth stage: taking the outputs of the first to third stages as input, fact triples and logic rules are modeled in a unified way; in this manner the rich semantic information of the logic rules is embedded into the representation-learning-based relational inference model RTransE, and relational inference is then performed with the trained RTransE model to realise completion of the knowledge base;
the fifth stage: dynamic knowledge fragments are incorporated and the model is updated iteratively, so that the knowledge base becomes more complete.
3. The relation prediction method combining logic rules and fragmented knowledge according to claim 2, characterized in that, in the fourth stage, the unified modeling of fact triples and logic rules proceeds as follows:
given a triple (h, r, t), the TransE model requires h + r ≈ t when the triple holds; to address the deficiencies of the raw distance ||h + r − t||₁, the triple score function is improved by normalisation, as shown in the following formula (1):
in formula (1), d(h, r, t) = ||h + r − t||₁ is the distance function; it is easy to see that f(h, r, t) ∈ [0, 1]; if the triple holds, f(h, r, t) should be as small as possible, and otherwise as large as possible;
the model takes the existing fact triples in the knowledge base as positive examples and, by randomly replacing head entities, tail entities, or relations, generates triples that contradict existing facts in the knowledge base as negative examples for training; the triple-modeling loss function is shown in the following formula (2):
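The positive/negative training scheme just described is the standard margin-based objective; a minimal sketch follows (our own illustrative code, not the patent's exact formula (2); the margin hyper-parameter name and the absence of the normalisation step are assumptions):

```python
import numpy as np

def l1_dist(h, r, t):
    """TransE distance d(h, r, t) = ||h + r - t||_1 over embedding vectors."""
    return float(np.abs(h + r - t).sum())

def margin_loss(positives, negatives, margin=1.0):
    """Hinge loss over paired positive/corrupted triples: push the distance of
    each true triple below that of its corrupted counterpart by the margin."""
    total = 0.0
    for (h, r, t), (h2, r2, t2) in zip(positives, negatives):
        total += max(0.0, margin + l1_dist(h, r, t) - l1_dist(h2, r2, t2))
    return total
```

In training, the gradient of this loss with respect to the entity and relation vectors is what actually updates the embeddings.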
4. The relation prediction method combining logic rules and fragmented knowledge according to claim 3, characterized in that, in the fourth stage, the rich semantic information of the logic rules is embedded into the representation-learning-based relational inference model RTransE as follows:
the distance function of the joint representation embedding the logic rules is shown in the following formula (3):
where k is the number of relations in the rule body of the i-th logic rule whose head is the relation r, and Bᵢ is the rule body of the i-th logic rule; if the logic rule represents semantic information of the relation r, the distance in formula (3) should be as close to 0 as possible, and otherwise as large as possible;
the representation-learning model embedding the logic rules takes the mined logic rules as positive examples and, by randomly replacing rule heads, generates rules that contradict existing logic rules as negative examples; the logic-rule loss function is shown in the following formula (4):
where LR is the set of logic rules, with its positive-example set and negative-example set; γ denotes an adjustable hyper-parameter; the remaining two terms denote, respectively, the confidence of the i-th logic rule whose head is the relation r, and the degree of semantic association between the i-th logic rule headed by the relation r and the relation r;
the loss function of the representation-learning model embedding the logic rules is shown in the following formula (5):
as formula (5) shows, the model loss function consists of two parts: the distance function over the direct fact triples of the knowledge base, and the distance function between the logic rules and the relation r.
5. The relation prediction method combining logic rules and fragmented knowledge according to claim 4, characterized in that:
replacing the embedded representations of the entities instantiating the rules with their entity types makes the model more predictive, so:
the distance function D(h, r, t, h_type, t_type) combining entity types, improved from d(h, r, t) in formula (1), is shown in the following formula (6):
D(h, r, t, h_type, t_type) = ||(h + h_type) + r − (t + t_type)||₁ (6)
in formula (6), h_type denotes the entity type corresponding to the head entity h, and t_type denotes the entity type corresponding to the tail entity t;
the triple-modeling loss function with entity types added, improved from the triple-modeling loss function of formula (2), is shown in the following formula (7):
in formula (7), EL is the entity-type label set, i.e. the set of labels of all entity classes, and f(h, r, t, h_type, t_type) is the new triple score function, expressed as shown in the following formula (8):
the distance function of the joint representation embedding both the logic rules and the entity types, improved from formula (3), is shown in the following formula (9):
in formula (9), m_i^e denotes the sum of the type vectors of the connected variable entities of the rule body of the i-th rule of the relation r;
the logic-rule loss function with entity types added, improved from formula (4), is shown in the following formula (10):
in formula (10), M_i^e denotes the sum of the type vectors of the connected variable entities of the rule body of the i-th rule of the relation r;
the loss function of the representation-learning model embedding the logic rules with entity types added, improved from formula (5), is shown in the following formula (11):
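Of the formulas in this claim, formula (6) is stated fully in the text; its type-augmented distance can be sketched directly (a minimal illustration with hand-made vectors; note that with zero type vectors it reduces to the plain TransE distance):

```python
import numpy as np

def typed_dist(h, r, t, h_type, t_type):
    """Formula (6): D(h, r, t, h_type, t_type) = ||(h + h_type) + r - (t + t_type)||_1,
    i.e. each entity is offset by its entity-type vector before the translation check."""
    return float(np.abs((h + h_type) + r - (t + t_type)).sum())

# With zero type vectors this is exactly ||h + r - t||_1.
h, r, t = np.zeros(2), np.ones(2), np.ones(2)
print(typed_dist(h, r, t, np.zeros(2), np.zeros(2)))
```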
6. The relation prediction method combining logic rules and fragmented knowledge according to claim 5, characterized in that, in the fifth stage, the conditional function Rt that triggers iterative training of the model is shown in the following formula (12):
in formula (12), #facts and #entity are the number of fact triples and the number of entities in the temporary knowledge base KB', respectively, and θ is the model's iterative-training threshold.
7. The relation prediction method combining logic rules and fragmented knowledge according to any one of claims 1-6, applied to the prediction of interpersonal relationships.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911390283.4A CN111191460B (en) | 2019-12-30 | 2019-12-30 | Relation prediction method combining logic rule and fragmentation knowledge |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111191460A true CN111191460A (en) | 2020-05-22 |
CN111191460B CN111191460B (en) | 2023-01-03 |
Family
ID=70707772
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911390283.4A Active CN111191460B (en) | 2019-12-30 | 2019-12-30 | Relation prediction method combining logic rule and fragmentation knowledge |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111191460B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112417171A (en) * | 2020-11-23 | 2021-02-26 | 南京大学 | Data augmentation method for knowledge graph representation learning |
CN113901151A (en) * | 2021-09-30 | 2022-01-07 | 北京有竹居网络技术有限公司 | Method, apparatus, device and medium for relationship extraction |
CN114741460A (en) * | 2022-06-10 | 2022-07-12 | 山东大学 | Knowledge graph data expansion method and system based on association between rules |
CN115033716A (en) * | 2022-08-10 | 2022-09-09 | 深圳市人马互动科技有限公司 | General self-learning system and self-learning method based on same |
WO2023007270A1 (en) * | 2021-07-26 | 2023-02-02 | Carl Wimmer | Foci analysis tool |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106228245A (en) * | 2016-07-21 | 2016-12-14 | 电子科技大学 | Infer based on variation and the knowledge base complementing method of tensor neutral net |
US20170017221A1 (en) * | 2015-07-16 | 2017-01-19 | Siemens Aktiengesellschaft | Knowledge-based programmable logic controller with flexible in-field knowledge management and analytics |
CN106528609A (en) * | 2016-09-28 | 2017-03-22 | 厦门理工学院 | Vector constraint embedded transformation knowledge graph inference method |
CN109376864A (en) * | 2018-09-06 | 2019-02-22 | 电子科技大学 | A kind of knowledge mapping relation inference algorithm based on stacking neural network |
CN110069638A (en) * | 2019-03-12 | 2019-07-30 | 北京航空航天大学 | A kind of knowledge mapping combination table dendrography learning method of binding rule and path |
Non-Patent Citations (1)
Title |
---|
Guo Jun: "Research on Data Topology Structure Based on a Triple Graph Model", Wanfang Data Dissertation Library *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112417171A (en) * | 2020-11-23 | 2021-02-26 | 南京大学 | Data augmentation method for knowledge graph representation learning |
CN112417171B (en) * | 2020-11-23 | 2023-10-03 | 南京大学 | Knowledge graph representation learning-oriented data augmentation method |
WO2023007270A1 (en) * | 2021-07-26 | 2023-02-02 | Carl Wimmer | Foci analysis tool |
CN113901151A (en) * | 2021-09-30 | 2022-01-07 | 北京有竹居网络技术有限公司 | Method, apparatus, device and medium for relationship extraction |
CN114741460A (en) * | 2022-06-10 | 2022-07-12 | 山东大学 | Knowledge graph data expansion method and system based on association between rules |
CN114741460B (en) * | 2022-06-10 | 2022-09-30 | 山东大学 | Knowledge graph data expansion method and system based on association between rules |
CN115033716A (en) * | 2022-08-10 | 2022-09-09 | 深圳市人马互动科技有限公司 | General self-learning system and self-learning method based on same |
WO2024031813A1 (en) * | 2022-08-10 | 2024-02-15 | 深圳市人马互动科技有限公司 | General self-learning system and self-learning method based on general self-learning system |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |