CN111191460A - Relation prediction method combining logic rule and fragmentation knowledge - Google Patents

Publication number: CN111191460A (granted as CN111191460B)
Application number: CN201911390283.4A
Authority: CN (China)
Prior art keywords: rule, logic, relation, knowledge, knowledge base
Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN111191460B
Inventors: 汪璟玢, 张梨贤
Assignee (original and current): Fuzhou University
Application filed by Fuzhou University; priority to CN201911390283.4A
Publication of CN111191460A; application granted and published as CN111191460B
Legal status: Active


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 — Computing arrangements using knowledge-based models
    • G06N5/02 — Knowledge representation; Symbolic representation
    • G06N5/022 — Knowledge engineering; Knowledge acquisition
    • G06N5/025 — Extracting rules from data
    • G06N5/04 — Inference or reasoning models

Abstract

The invention relates to a relation prediction method combining logic rules and fragmented knowledge. First, fact triples and logic rules are modeled in a unified way, and the hidden semantic information is embedded into a relation inference model based on knowledge representation; second, fragmented knowledge is incorporated and the model is iteratively updated, so that the knowledge base becomes progressively more complete. By modeling fact triples and logic rules in a unified way and embedding the hidden semantic information into the knowledge-representation-based relation inference model, the invention achieves more accurate prediction.

Description

Relation prediction method combining logic rule and fragmentation knowledge
Technical Field
The invention relates to a relation prediction method combining logic rules and fragmented knowledge.
Background
In the field of relational reasoning, relational inference models represented by TransE [1] have been a research focus in recent years, because they are simple and efficient and offer good predictive performance. The TransE model directly models the fact triples (h, r, t) in a knowledge base; its basic idea is to map the entities and relations of the knowledge base into a low-dimensional continuous vector space, which greatly simplifies computations over the knowledge base. Although simple and efficient, such basic representation learning models consider only the direct fact triples (h, r, t) and ignore the semantic information hidden in the knowledge base, so their inference precision is limited. Some recent work improves inference accuracy by adding external data such as entity types, textual descriptions, and logic rules. Document [2] reduces the influence of noisy data on the model by introducing the domain and range of relations to filter out erroneous samples. Document [3] adds entity context information to the representation learning model, thereby improving its semantic expressiveness. Document [4] improves reasoning performance by jointly modeling additional textual information and the direct fact triples of the knowledge base. Document [5] first extracts, via a rule mining system, a set of Horn logic rules that can represent the semantic information of the knowledge base, and then derives a new set of facts through rule-based materialization inference. Document [6] represents the semantic relationships between entities using the multi-hop relation paths that exist between them.
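The TransE idea above, mapping entities and relations to vectors so that h + r ≈ t for true triples, can be sketched numerically. The toy 3-dimensional embeddings below are invented for illustration only:

```python
import numpy as np

def transe_distance(h, r, t, norm=1):
    """d(h, r, t) = ||h + r - t||; a small distance suggests the triple holds."""
    return float(np.linalg.norm(h + r - t, ord=norm))

# Toy embeddings (illustrative values only, chosen so h + r equals t exactly).
h = np.array([0.25, 0.5, -0.125])   # head entity
r = np.array([0.25, 0.25, 0.375])   # relation
t = np.array([0.5, 0.75, 0.25])     # tail entity

d_true = transe_distance(h, r, t)    # 0.0: the triple is consistent
d_false = transe_distance(t, r, h)   # corrupted triple: larger distance
```

A trained model ranks candidate triples by this distance, which is what makes link prediction a nearest-neighbor style computation in the embedding space.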
With the rapid development of the Internet, new knowledge fragments are continuously generated, and the knowledge base is no longer static. Therefore, when relational reasoning techniques are applied to automatically complete a knowledge base, its dynamic growth should be taken into account. In recent years, relational reasoning based on knowledge representation learning has received great attention. However, most existing knowledge representation learning methods embed only the fact triples and ignore hidden semantic information in the knowledge network; as a result, the learned vectors cannot accurately express the semantic relations of the original knowledge base, and the value brought by fragmented knowledge cannot be fully exploited. The invention therefore provides a relation prediction method combining logic rules and fragmented knowledge. The method first models fact triples and logic rules in a unified way, thereby embedding the hidden semantic information into a relation inference model based on knowledge representation; it then incorporates fragmented knowledge and updates iteratively, making the knowledge base more complete and realizing relation prediction.
Disclosure of Invention
The invention aims to provide a relation prediction method combining logic rules and fragmented knowledge.
In order to achieve the above purpose, the technical scheme of the invention is as follows: a relation prediction method combining logic rules and fragmented knowledge first models fact triples and logic rules in a unified way and embeds the hidden semantic information into a relation inference model based on knowledge representation; second, it incorporates fragmented knowledge and updates iteratively, so that the knowledge base becomes progressively more complete.
In an embodiment of the present invention, the method is specifically implemented as follows:
The first stage: model the direct fact triples in the knowledge base to obtain vector representations of all entities and relations in the knowledge base; these vectors are used in the third stage to compute the semantic association degree between rules;
The second stage: mine, with a rule mining algorithm, a set of logic rules that can represent the semantic information of the knowledge base;
The third stage: apply the logic rules for reasoning, in two ways: first, infer new facts through rule-based materialization inference and add them to the knowledge base, realizing its dynamic expansion; second, replace the relation r of a fact triple (h, r, t), where h and t both denote entities, by a logic rule representation, thereby embedding the logic rules into the representation-learning-based relation inference model; because the knowledge base may contain several inference rules whose rule head is the relation r, methods are provided for computing the semantic association between the relation r and the different rule bodies whose rule head is r;
The fourth stage: taking the outputs of the first through third stages as input, model the fact triples and logic rules in a unified way; in this way, the rich semantic information of the logic rules is embedded into the representation-learning-based relation inference model RTransE, and relational inference is then performed with the trained RTransE model to realize completion of the knowledge base;
The fifth stage: incorporate the dynamic knowledge fragments and update iteratively, so that the knowledge base becomes progressively more complete.
In an embodiment of the present invention, in the fourth stage, the unified modeling process for fact triples and logic rules is as follows:
given a triple (h, r, t), the TransE model expects the relationship h + r ≈ t to hold when the triple is true; on the basis of ||h + r − t||1, a normalization improvement is applied to the triple score function, as shown in the following formula (1):
[Formula (1): normalized triple score function f(h, r, t); the equation is rendered as an image in the original and is not reproduced here]
In formula (1), d(h, r, t) = ||h + r − t||1 is the distance function; it is easy to see that f(h, r, t) ∈ [0, 1]. If the triple holds, f(h, r, t) should be as small as possible, and otherwise as large as possible;
the model takes the existing fact triples in the knowledge base as positive examples, and generates negative examples for training by randomly replacing head entities, tail entities, and relations so that the resulting triples contradict the existing facts in the knowledge base; the triple modeling loss function is shown in the following formula (2):
L_triple = Σ_{(h,r,t)∈S+} Σ_{(h',r',t')∈S−} max(γ + f(h, r, t) − f(h', r', t'), 0)   (2)
In formula (2), S = {s1, s2, …, si, …, sn} is the set of fact triples, S+ is the set of positive-example triples, S− is the set of negative-example triples, and γ is an adjustable margin hyper-parameter.
In an embodiment of the present invention, in the fourth stage, the specific process of embedding the rich semantic information of the logic rules into the representation-learning-based relation inference model RTransE is as follows:
The distance function D_rule(B_i, r) of the joint representation with an embedded logic rule is shown in the following formula (3):

D_rule(B_i, r) = ||Σ_{k=1..K} b_k − r||1   (3)
In formula (3), K is the number of relations in the rule body of the ith logic rule whose rule head is the relation r, B_i is that rule body, and b_k are its relation vectors. If the logic rule can represent the semantic information of the relation r, then D_rule(B_i, r) should be as close to 0 as possible, and otherwise as large as possible;
the representation learning model with embedded logic rules takes the mined logic rules as positive examples, and rules generated by randomly replacing the rule head, contradicting the existing logic rules, as negative examples; the loss function over the logic rules is shown in the following formula (4):
L_rule = Σ_{(B_i,r)∈LR+} Σ_{(B_i',r')∈LR−} conf_i · λ_i · max(γ + D_rule(B_i, r) − D_rule(B_i', r'), 0)   (4)

In formula (4), LR is the set of logic rules, LR+ is the set of positive-example rules, LR− is the set of negative-example rules, and γ is an adjustable margin hyper-parameter; conf_i denotes the confidence of the ith logic rule whose rule head is the relation r, and λ_i denotes the semantic association degree between that rule and the relation r;
the loss function L_RTransE of the rule-embedded representation learning model is shown in the following formula (5):

L_RTransE = L_triple + L_rule   (5)
As shown in formula (5), the model loss function consists of two parts: the distance function over the direct fact triples of the knowledge base, and the distance function between the logic rules and the relation r.
In one embodiment of the present invention, substituting entity types for the concrete entities in rule instantiations yields a more predictive embedded representation of those entities; therefore:
the distance function D(h, r, t, h_type, t_type) incorporating entity types, which improves d(h, r, t) of formula (1), is shown in the following formula (6):

D(h, r, t, h_type, t_type) = ||(h + h_type) + r − (t + t_type)||1   (6)

In formula (6), h_type denotes the entity type corresponding to the head entity h, and t_type denotes the entity type corresponding to the tail entity t;
the triple modeling loss function with entity types, which improves formula (2), is shown in the following formula (7):

L'_triple = Σ_{(h,r,t)∈S+} Σ_{(h',r',t')∈S−} max(γ + f(h, r, t, h_type, t_type) − f(h', r', t', h'_type, t'_type), 0)   (7)
In formula (7), EL = {el1, el2, …, eln} is the entity type tag set, i.e. the set of tags representing all entity classes in the knowledge base; f(h, r, t, h_type, t_type) is the new triple score function, shown in the following formula (8):

[Formula (8): normalized score function f(h, r, t, h_type, t_type), analogous to formula (1); the equation is rendered as an image in the original and is not reproduced here]
The distance function of the joint representation with embedded logic rules and entity types, which improves formula (3), is shown in the following formula (9):

D_rule(B_i, M_i, r) = ||Σ_{k=1..K} b_k + M_i − r||1   (9)

In formula (9), M_i denotes the sum of the type vectors of the connecting variable entities in the rule body of the ith rule for the relation r;
the logic rule loss function with entity types, which improves formula (4), is shown in formula (10); it is obtained from formula (4) by replacing D_rule(B_i, r) with the type-aware distance D_rule(B_i, M_i, r), where M_i again denotes the sum of the type vectors of the connecting variable entities in the rule body of the ith rule for the relation r;
the loss function of the rule-embedded representation learning model with entity types, which improves formula (5), is shown in the following formula (11):

L'_RTransE = L'_triple + L'_rule   (11)
In an embodiment of the present invention, in the fifth stage, the condition function Rt that triggers iterative model training is shown in the following formula (12):

Rt = 1 if #facts / #entity ≥ θ, and Rt = 0 otherwise   (12)

In formula (12), #facts and #entity are respectively the number of fact triples and the number of entities in the temporary repository KB', and θ is the model's iterative-training threshold.
In an embodiment of the present invention, the method is applied to human relationship prediction.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention models fact triples and logic rules in a unified way and embeds the hidden semantic information into a relation inference model based on knowledge representation, thereby realizing more accurate prediction;
2. The method can run an activation strategy on dynamically inflowing knowledge fragments, adapting to a dynamic knowledge network and realizing more accurate knowledge reasoning.
Drawings
FIG. 1 is a block diagram of the method of the present invention.
FIG. 2 is an example temporary repository.
FIG. 3 is a relational inference overall framework diagram.
FIG. 4 is a diagram of a process for reasoning in conjunction with logic rules and fragmented knowledge.
Detailed Description
The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.
The invention provides a relation prediction method combining logic rules and fragmented knowledge. The method first models fact triples and logic rules in a unified way and embeds the hidden semantic information into a relation inference model based on knowledge representation; second, it incorporates fragmented knowledge and updates iteratively, so that the knowledge base becomes progressively more complete. The method is concretely realized as follows:
The first stage: model the direct fact triples in the knowledge base to obtain vector representations of all entities and relations in the knowledge base; these vectors are used in the third stage to compute the semantic association degree between rules;
The second stage: mine, with a rule mining algorithm, a set of logic rules that can represent the semantic information of the knowledge base;
The third stage: apply the logic rules for reasoning, in two ways: first, infer new facts through rule-based materialization inference and add them to the knowledge base, realizing its dynamic expansion; second, replace the relation r of a fact triple (h, r, t), where h and t both denote entities, by a logic rule representation, thereby embedding the logic rules into the representation-learning-based relation inference model; because the knowledge base may contain several inference rules whose rule head is the relation r, methods are provided for computing the semantic association between the relation r and the different rule bodies whose rule head is r;
The fourth stage: taking the outputs of the first through third stages as input, model the fact triples and logic rules in a unified way; in this way, the rich semantic information of the logic rules is embedded into the representation-learning-based relation inference model RTransE, and relational inference is then performed with the trained RTransE model to realize completion of the knowledge base;
The fifth stage: incorporate the dynamic knowledge fragments and update iteratively, so that the knowledge base becomes progressively more complete.
The following is a specific implementation of the present invention.
The method provided by the invention comprises five main stages. In the first stage, the direct fact triples in the knowledge base are modeled to obtain vector representations of all entities and relations, which are used in the third stage to compute the semantic association degree between rules. In the second stage, a set of logic rules that can represent the semantic information of the knowledge base is mined with a rule mining algorithm. The third stage applies the logic rules for reasoning, in two main ways: first, new facts are inferred through rule-based materialization inference and added to the knowledge base, realizing its dynamic expansion; second, the relation r of a fact triple (h, r, t) is replaced by a logic rule representation, thereby embedding the logic rules into the representation-learning-based relation inference model. Because the knowledge base may contain several inference rules whose rule head is the relation r, three methods are provided for computing the semantic association degree between the relation r and the different rule bodies whose rule head is r. In the fourth stage, the outputs of the first three stages serve as input; the key idea is to model the fact triples and logic rules in a unified way, embedding the rich semantic information of the logic rules into the representation-learning-based relation inference model, after which relational inference is performed with the trained RTransE model to realize completion of the knowledge base. In the fifth stage, the dynamic knowledge fragments are incorporated and the model is updated iteratively, so that the knowledge base becomes progressively more complete. The overall framework of the method is shown in Figure 1:
the relevant definitions herein are given below.
Definition 1 (knowledge base, KB): let the knowledge base KB = <E, R, F, P, V>, where E denotes the entity set, R the relation set, F the set of facts in the knowledge base, P the attribute set, and V the attribute value set.
Definition 2 (entity set, E): the entity set E = {e1, e2, …, en} = Π_subject(KB) ∪ Π_object(KB); it describes all entities in the data layer of the semantic-network knowledge base and corresponds to the set of instances in RDF.
Definition 3 (relation set, R): the relation set R = {r1, r2, …, rn} = Π_relation(KB); it represents the relations between entities.
Definition 4 (fact set, F): the fact set F ⊆ E × R × E; it represents the set of all instance triples in the knowledge base.
Definition 5 (attribute set, P): the attribute set P = {p1, p2, …, pn} is the set of all attributes; it associates E with the attribute values V.
Definition 6 (attribute value set, V): the attribute value set V = {v1, v2, …, vn} is the set of all attribute values.
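Definitions 1-6 can be collected into a minimal container. The sketch below is a hypothetical reading of KB = <E, R, F, P, V> in which, following Definitions 2-3, the entity set E and relation set R are derived as projections of the fact set F; the class name and simplified P/V fields are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class KB:
    """Minimal knowledge base KB = <E, R, F, P, V> (Definitions 1-6, simplified)."""
    F: set = field(default_factory=set)   # fact triples (h, r, t)   -- Definition 4
    P: set = field(default_factory=set)   # attributes               -- Definition 5
    V: set = field(default_factory=set)   # attribute values         -- Definition 6

    @property
    def E(self):
        # Entity set: projection of subjects and objects of F (Definition 2).
        return {h for h, _, _ in self.F} | {t for _, _, t in self.F}

    @property
    def R(self):
        # Relation set: projection of the relations of F (Definition 3).
        return {r for _, r, _ in self.F}

kb = KB(F={("yaoming", "bornIn", "shanghai"), ("shanghai", "country", "china")})
```

Deriving E and R from F keeps the container consistent by construction as facts are added.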
Definition 7 (entity tag set, EL): let the entity tag set be EL = {el1, el2, …, eln}. For commonly used datasets such as YAGO and DBpedia, the three major classes PER, LOC and ORG are extended, and 39 types are defined herein as the entity tag set, denoted EL; Cf = {PER | ORG | LOC} denotes the set of the three major classes, as shown in Table 1.
TABLE 1 entity tag set
[Table 1 is rendered as images in the original and is not reproduced here]
The method comprises the following steps:
1. triple modeling
Given a triple (h, r, t), the TransE model expects the relationship h + r ≈ t to hold as far as possible when the triple is true; for example, yaoming + nationality ≈ china, and james + nationality ≈ the united states. On the basis of ||h + r − t||1, this paper applies a normalization improvement to the triple score function.
Definition 8: the score function f(h, r, t) of a triple is defined as shown in the following formula (1):
[Formula (1): normalized triple score function f(h, r, t); the equation is rendered as an image in the original and is not reproduced here]
In formula (1), d(h, r, t) = ||h + r − t||1 is the distance function; it is easy to see that f(h, r, t) ∈ [0, 1]. If a triple holds, f(h, r, t) should be as small as possible, and otherwise as large as possible.
The model is trained by taking the existing fact triples in the knowledge base as positive examples, and by randomly replacing head entities, tail entities, and relations to generate triples that contradict the existing facts in the knowledge base as negative examples; its loss function is given in Definition 9 below.
Definition 9: the triple modeling loss function L_triple is shown in the following formula (2):

L_triple = Σ_{(h,r,t)∈S+} Σ_{(h',r',t')∈S−} max(γ + f(h, r, t) − f(h', r', t'), 0)   (2)
In formula (2), S = {s1, s2, …, si, …, sn} is the set of fact triples, S+ is the set of positive-example triples, S− is the set of negative-example triples, and γ is an adjustable margin hyper-parameter.
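The training scheme just described, positive triples versus randomly corrupted negatives under a margin loss, can be sketched as follows. The exact normalized score f of formula (1) is not recoverable from the text, so the plain L1 distance stands in for it here, and the helper names are invented for illustration:

```python
import random
import numpy as np

def l1_distance(h, r, t):
    """Stand-in for the score: ||h + r - t||_1 (formula (1)'s normalization omitted)."""
    return float(np.linalg.norm(h + r - t, ord=1))

def margin_loss(pos_dist, neg_dist, gamma=1.0):
    """One term of the formula-(2) sum: max(0, gamma + d(positive) - d(negative))."""
    return max(0.0, gamma + pos_dist - neg_dist)

def corrupt(triple, entities, facts, rng):
    """Negative sampling: replace the head or the tail with a random entity,
    rejecting candidates that are already known facts."""
    h, r, t = triple
    while True:
        e = rng.choice(entities)
        cand = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if cand not in facts:
            return cand
```

Training then iterates over positive triples, draws a corrupted negative for each, and takes gradient steps on the summed margin loss.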
2. Logical rule mining
Following document [7], the present invention implements HornConcerto, an algorithm that mines Horn logic rules in large graph data. The algorithm outperforms existing methods in running time and memory consumption, and it mines higher-quality logic rules for the knowledge reasoning task. HornConcerto is inspired by the AMIE+ algorithm and likewise adopts the PCA (partial completeness assumption) confidence measure. The support and confidence of logic rules are given in Definitions 10 and 11.
Definition 10: the support supp of a logic rule is shown in the following formula (3):

supp(B ⇒ r(x, y)) = #{(x, y) : ∃ z1, …, zm : B ∧ r(x, y)}   (3)

The support supp of a rule represents the number of fact triples in the knowledge base that satisfy the rule head and the rule body simultaneously; z1, …, zm denote the rule variables other than x and y.
Definition 11: the PCA confidence of a logic rule is shown in the following formula (4):

conf_pca(B ⇒ r(x, y)) = supp(B ⇒ r(x, y)) / #{(x, y) : ∃ z1, …, zm, y' : B ∧ r(x, y')}   (4)

The numerator is the support of the rule, and y' in the denominator ranges over all possible objects of the rule head admitted under the PCA hypothesis. The confidence of a rule reflects its reliability and the richness of the semantics it expresses; the closer the confidence is to 1, the more reliable the rule.
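For a rule with a single body atom, support and PCA confidence can be computed as below. The toy facts and the body ⇒ head rule shape are illustrative assumptions, not data from the patent:

```python
def support(facts, body_rel, head_rel):
    """# of pairs (x, y) with body_rel(x, y) and head_rel(x, y) both in the KB."""
    body = {(x, y) for x, r, y in facts if r == body_rel}
    head = {(x, y) for x, r, y in facts if r == head_rel}
    return len(body & head)

def pca_confidence(facts, body_rel, head_rel):
    """Support divided by the number of body pairs whose subject x has at least
    one head_rel fact; under the PCA only such subjects are assumed complete."""
    body = {(x, y) for x, r, y in facts if r == body_rel}
    has_head = {x for x, r, y in facts if r == head_rel}
    denom = sum(1 for x, y in body if x in has_head)
    return support(facts, body_rel, head_rel) / denom if denom else 0.0

facts = {
    ("anna", "bornIn", "paris"), ("anna", "livesIn", "paris"),
    ("ben",  "bornIn", "rome"),  ("ben",  "livesIn", "milan"),
    ("carl", "bornIn", "oslo"),
}
# Rule: bornIn(x, y) => livesIn(x, y)
s = support(facts, "bornIn", "livesIn")          # only anna matches both
c = pca_confidence(facts, "bornIn", "livesIn")   # carl has no livesIn fact, so he
                                                 # is excluded from the denominator
```

Note how the PCA denominator (2) is smaller than the naive one (3), which is exactly what makes the measure robust to the open-world incompleteness of the knowledge base.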
3. Applying logic rules
The knowledge base itself typically already contains enough information to derive new facts. Such rules can be found in the knowledge base by a rule mining algorithm; for example, we can mine the rule

spouse(x, y) ∧ livesIn(x, z) ⇒ livesIn(y, z)

which captures the fact that a person's spouse usually lives in the same place as that person.
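A single materialization pass for a rule of this shape (a person's spouse lives where the person lives) might look like the sketch below; the relation names and the two-atom rule shape are assumptions for illustration, not code from the patent:

```python
def materialize(facts):
    """One forward-chaining pass of: spouse(x, y) ∧ livesIn(x, z) => livesIn(y, z)."""
    lives = {(x, z) for x, r, z in facts if r == "livesIn"}
    derived = set()
    for x, r, y in facts:
        if r == "spouse":
            for x2, z in lives:
                if x2 == x and (y, z) not in lives:
                    derived.add((y, "livesIn", z))
    return derived

facts = {("alice", "spouse", "bob"), ("alice", "livesIn", "berlin")}
new_facts = materialize(facts)
```

Repeating such passes until no new facts appear, and adding the derived triples back into the knowledge base, is the materialization inference the patent uses for dynamic expansion.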
As background knowledge suggests, these logic rules carry rich semantic information, which makes knowledge reasoning more predictive. The logic rules mined herein are applied in two ways:
first, the relation r in the fact triple (h, r, t) is replaced by a logic rule, so that the logic rule is embedded into a relation inference model based on representation learning. However, there may be multiple inference rules in the knowledge base with the relationship r as the rule head. For example, the following two rules can infer the relationship of "nationality", but the semantic association degree between the "nationality" and the "nationality" is different.
1. [first rule: rendered as an image in the original and not reproduced here]
2. [second rule: rendered as an image in the original and not reproduced here]
In this subsection, we introduce three methods to measure the semantic association between the relation r and the different rule bodies whose rule head is r. Let rule_1^r, …, rule_N^r denote the N logic rules whose rule head is the relation r.
Definition 12, Equal (average): the influence of each inference rule on the relation r is considered the same, so the semantic association degree between the relation r and the ith logic rule whose rule head is r is computed as shown in the following formula (5):

λ_i = 1 / N   (5)
Definition 13, NumberRatio (number ratio): the semantic association degree is measured by the proportion of the number of fact triples satisfying the given logic rule to the total number of fact triples satisfying the N logic rules, so the semantic association degree of the ith logic rule whose rule head is the relation r is computed as shown in the following formula (6):

λ_i = supp_i / Σ_{j=1..N} supp_j   (6)
Definition 14, VectorDistance (vector distance): the semantic association degree between the rule body and the rule head of a logic rule is computed from the relation vectors learned by the TransE model, i.e. the similarity between vectors is measured by the cosine of the angle between them in the vector space; the semantic association degree between the relation r and the ith logic rule whose rule head is r is computed as shown in the following formula (7):

λ_i = (1/T) · cos(B̄_i, r) = (1/T) · (Σ_{j=1..n} B̄_ij · r_j) / (||B̄_i||2 · ||r||2),  with B̄_i = Σ_{k=1..K} b_k   (7)

In formula (7), n is the dimension of the vectors, K is the number of relations in the rule body of the ith logic rule, B_i is that rule body, and T is a normalization factor. The closer the cosine value is to 1, the closer the angle is to 0 and the more similar the vectors.
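The three measures can be sketched as follows. `equal` and `number_ratio` follow Definitions 12-13 directly; `vector_distance` computes the cosine term of Definition 14 without the normalization factor T, and the function names are invented for illustration:

```python
import numpy as np

def equal(n_rules):
    """Definition 12: every rule for relation r gets the same weight 1/N."""
    return [1.0 / n_rules] * n_rules

def number_ratio(supports):
    """Definition 13: weight rule i by its support over the summed supports."""
    total = sum(supports)
    return [s / total for s in supports]

def vector_distance(rule_bodies, r_vec):
    """Definition 14 (cosine part): similarity between the summed rule-body
    relation vectors and the head relation vector r."""
    sims = []
    for body in rule_bodies:           # each body: array of relation vectors
        b = np.sum(body, axis=0)
        sims.append(float(b @ r_vec / (np.linalg.norm(b) * np.linalg.norm(r_vec))))
    return sims
```

Equal is a uniform prior, NumberRatio is a frequency-based weighting, and VectorDistance is the only one that uses the learned embeddings from the first stage.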
Second, new facts are inferred through rule-based materialization inference and added to the knowledge base, so that the knowledge is exploited more fully. However, untrustworthy fact triples would contribute noise to the model; filtering the noisy data via formulas (3) and (4) introduces only the more reliable rules and fact triples into the model.
Definition 15: the confidence function tripleConf(h, r, t) of an inferred new fact triple is shown in the following formula (8):

[Formula (8): confidence function tripleConf(h, r, t); the equation is rendered as an image in the original and is not reproduced here]
4. Representation learning model RTransE with embedded logic rules
The Trans-series models introduced in the previous section consider only the direct fact triples in the knowledge base, and existing logic-rule-based representation learning methods consider only the rule with the highest confidence while ignoring the influence of the other inference rules on the fact triples; they therefore suffer from low inference precision when facing the complex relation types 1-N, N-1 and N-N. This section presents RTransE, a relational inference model based on logic rules and representation learning. A schematic diagram of the model is shown in Figure 1, part 4.
As the schematic of Figure 1, part 4 shows, the model considers not only the direct relation r between the entities h and t of a triple (h, r, t) but also the N logic rules whose rule head is r; the entity and relation vectors learned by the rule-embedded representation learning model therefore express the semantic information of the knowledge base more completely, realizing more accurate prediction.
Definition 16: the distance function D_rule(B_i, r) of the joint representation with an embedded logic rule is shown in the following formula (9):

D_rule(B_i, r) = ||Σ_{k=1..K} b_k − r||1   (9)
In formula (9), K is the number of relations in the rule body of the ith logic rule whose rule head is the relation r, and b_k are the relation vectors of that rule body. If the logic rule can represent the semantic information of the relation r, then D_rule(B_i, r) should be as close to 0 as possible, and otherwise as large as possible.
The representation learning model with embedded logic rules takes the mined logic rules as positive examples, and rules generated by randomly replacing the rule head, contradicting the existing logic rules, as negative examples. The loss function is given in Definition 17 below.
Definition 17: the loss function L_rule over the logic rules is shown in the following formula (10):

L_rule = Σ_{(B_i,r)∈LR+} Σ_{(B_i',r')∈LR−} conf_i · λ_i · max(γ + D_rule(B_i, r) − D_rule(B_i', r'), 0)   (10)
In formula (10), LR is the set of logic rules, LR+ is the set of positive-example rules, LR− is the set of negative-example rules, and γ is an adjustable margin hyper-parameter; conf_i denotes the confidence of the ith logic rule whose rule head is the relation r, and λ_i denotes the semantic association degree between that rule and the relation r.
Definition 18: the loss function L_RTransE of the rule-embedded representation learning model is shown in the following formula (11):

L_RTransE = L_triple + L_rule   (11)
As shown in formula (11), the model loss function consists of two parts: the distance function over the direct fact triples of the knowledge base, and the distance function between the logic rules and the relation r. The main idea is to make the distance function values of the positive samples much smaller than those of the negative samples.
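Under the assumed forms of formulas (9)-(11), the two-part loss can be sketched as below; weighting each rule term by conf_i · λ_i is a reading of Definition 17, not verbatim from the patent, and the toy vectors in the usage are illustrative:

```python
import numpy as np

def triple_distance(h, r, t):
    """||h + r - t||_1 over the triple's embeddings."""
    return float(np.linalg.norm(h + r - t, ord=1))

def rule_distance(body_vecs, r):
    # Assumed formula (9): rule-body relation vectors compose additively toward r.
    return float(np.linalg.norm(np.sum(body_vecs, axis=0) - r, ord=1))

def rtranse_loss(pos_triples, neg_triples, pos_rules, neg_rules, gamma=1.0):
    """L = L_triple + L_rule: a margin loss over triples plus a margin loss over
    rules, the latter weighted by w = conf_i * lambda_i for each positive rule."""
    loss = 0.0
    for (h, r, t), (h2, r2, t2) in zip(pos_triples, neg_triples):
        loss += max(0.0, gamma + triple_distance(h, r, t) - triple_distance(h2, r2, t2))
    for (body, r, w), (body2, r2) in zip(pos_rules, neg_rules):
        loss += w * max(0.0, gamma + rule_distance(body, r) - rule_distance(body2, r2))
    return loss
```

Because both parts are hinge losses over the same embedding space, a single optimizer pass updates the entity and relation vectors to satisfy the triples and the rules simultaneously.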
5. Influence of entity type on representation learning model
From formula (9), the distance function D_rule(B_i, r) considers only the logic rule itself and ignores the influence on the embedding model of the entities appearing in the rule's instantiations. The instantiations of a rule necessarily correspond to different entities, so it is difficult to represent those entities properly. To alleviate this problem, we represent entities by the entity types present in the knowledge base: for a logic rule [rendered as an image in the original and not reproduced here], the instances of the variable y correspond to different entities but to the same entity type, so substituting entity types for the entities in rule instantiations yields a more predictive embedded representation.
Definition 19. The distance function D(h, r, t, h_type, t_type) incorporating entity types, which improves D(h, r, t) in Equation (1), is shown in Equation (12):

D(h, r, t, h_type, t_type) = ||(h + h_type) + r − (t + t_type)||_1   (12)

In Equation (12), h_type denotes the entity type corresponding to the head entity h, and t_type denotes the entity type corresponding to the tail entity t.
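Equation (12) translates directly into code; the sketch below assumes plain NumPy vectors for the entity, relation, and type embeddings.

```python
import numpy as np

def type_distance(h, r, t, h_type, t_type):
    """Equation (12): D(h, r, t, h_type, t_type) =
    ||(h + h_type) + r - (t + t_type)||_1.
    h_type and t_type are the type vectors of the head and tail entities;
    here they are plain NumPy arrays for illustration."""
    return np.sum(np.abs((h + h_type) + r - (t + t_type)))
```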
Definition 20. The distance function D_rule^i(r, M_i^e) of the joint representation of the embedded logic rule and the entity types, which improves Equation (9), is shown in Equation (13):

D_rule^i(r, M_i^e) = ||Σ_{r_j∈B_i} r_j + M_i^e − r||_1   (13)

where M_i^e denotes the sum of the type vectors of the connecting variable entities in the rule body B_i of the i-th rule of the relation r.
Definition 21. The triple modeling loss function L_S with entity types, which improves Equation (2), is shown in Equation (14):

L_S = Σ_{(h,r,t)∈S⁺} Σ_{(h',r',t')∈S⁻} max(0, γ + f(h, r, t, h_type, t_type) − f(h', r', t', h'_type, t'_type))   (14)

In Equation (14), EL is the set of entity-type labels (see Definition 7), and f(h, r, t, h_type, t_type) is the new triple score function, shown in Equation (15):

f(h, r, t, h_type, t_type) = D(h, r, t, h_type, t_type) / (||h + h_type||_1 + ||r||_1 + ||t + t_type||_1)   (15)
Definition 22. The logic rule loss function L_LR with entity types, which improves Equation (10), is shown in Equation (16):

L_LR = Σ_{lr∈LR⁺} Σ_{lr'∈LR⁻} c_i^r · sim_i^r · max(0, γ + D_rule^i(r, M_i^e) − D_rule^i(r', M_i^e))   (16)
Definition 23. The loss function L of the representation learning model with embedded logic rules and entity types, which improves Equation (11), is shown in Equation (17):

L = L_S + L_LR   (17)

where L_S is the triple loss of Equation (14) and L_LR is the rule loss of Equation (16).
6. Relationship reasoning incorporating dynamic knowledge fragmentation
The modern era is one of rapid Internet development: new knowledge fragments are generated continuously, so the knowledge base is no longer static. Exploiting dynamic knowledge fragments for relation reasoning is therefore one of the most effective means of achieving dynamic growth of the knowledge base.
For example, suppose the fact triple <Yao Ming, place of birth, Shanghai> already exists in the knowledge base and the knowledge fragment <Shanghai, country, China> flows in. If entity vectors for "Shanghai" and "China" can be found in the knowledge base, the trained RTransE model can complete the relation "nationality" between the entities "Yao Ming" and "China". Otherwise, the fragment is added to a temporary knowledge base pool; when the ratio of the number of fact triples to the number of entities in the pool reaches the iterative-training threshold θ, the temporary knowledge base KB′ is merged with the original knowledge base KB, a new set of logic rules and a new RTransE model are retrained, and relation reasoning is performed again in combination with the fragmented knowledge. This process is repeated until no new knowledge fragments flow in or no new facts are generated.
Setting the threshold reduces the number of iterations and improves the execution efficiency of the algorithm. This subsection measures the degree of dispersion of the temporary knowledge base by the ratio of the number of fact triples to the number of entities: the larger the ratio, the closer the relations among the entities and the more likely that latent relations among them can be mined, so retraining is more worthwhile; and vice versa. As Fig. 2 shows, a temporary knowledge base satisfying Fig. 2-a better meets the retraining criterion.
Definition 24. The condition function Rt that triggers iterative training of the model is shown in Equation (18):

Rt = 1 if #facts / #entity ≥ θ, and Rt = 0 otherwise   (18)

In Equation (18), #facts and #entity are respectively the number of fact triples and the number of entities in the temporary knowledge base KB′, and θ is the threshold for iterative training of the model.
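The trigger condition Rt and the temporary pool KB′ can be sketched as follows; the class name `TemporaryPool` and its interface are illustrative, not taken from the patent.

```python
def should_retrain(num_facts, num_entities, theta):
    """Equation (18): Rt fires (returns 1) when the ratio of fact triples
    to entities in the temporary knowledge base KB' reaches theta."""
    if num_entities == 0:
        return 0
    return 1 if num_facts / num_entities >= theta else 0

class TemporaryPool:
    """Minimal sketch of the temporary knowledge base pool KB'. Fragments
    whose entities are unknown to the main KB accumulate here; add()
    returns the Rt value after each insertion."""
    def __init__(self, theta):
        self.theta = theta
        self.triples = []

    def add(self, triple):
        self.triples.append(triple)
        # Count distinct entities over head and tail positions.
        entities = {e for (h, _, t) in self.triples for e in (h, t)}
        return should_retrain(len(self.triples), len(entities), self.theta)
```

When Rt returns 1, KB′ would be merged into KB and the rules and RTransE model retrained, as described above.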
7. Application of the method of the invention
Although a typical knowledge base already contains millions of entities and hundreds of millions of facts, it is still incomplete. Knowledge base completion reasons about and predicts the relations between entities from prior knowledge combined with fragmented knowledge. The invention achieves further knowledge base completion by implementing HornConcerto [8], an algorithm for discovering Horn logic rules in large-scale graph data, together with the RACRFK algorithm, thereby exploiting the knowledge more fully.
As the background knowledge shows, logic rules carry rich semantic information, which makes knowledge reasoning more predictive. Therefore, a set of Horn logic rules that can represent the semantic information of the knowledge base is first mined with the rule mining algorithm HornConcerto. The logic rules are then applied in RACRFK, a relation inference algorithm combining logic rules with dynamic knowledge fragments, and the model is updated iteratively so that the knowledge base becomes ever larger and more complete.
The overall framework for relation inference with the HornConcerto and RACRFK algorithms is shown in Fig. 3.
First, a set of Horn logic rules that can represent the semantic information of the knowledge base is mined by the HornConcerto rule mining algorithm; for example, a rule such as (x, place of birth, y) ∧ (y, country, z) ⇒ (x, nationality, z) can be mined.
Next, the knowledge base is completed by RACRFK, the relation inference algorithm combining logic rules and fragmented knowledge. The reasoning process combining logic rules and fragmented knowledge is shown in Fig. 4.
As Fig. 4 shows, the inflow of a single knowledge fragment can trigger the model to iteratively infer thousands of facts for the knowledge base, which reflects the model's ability to make full use of dynamic knowledge fragments. In Fig. 4, the dotted lines (the "nationality" part of the figure) are the new facts inferred by the model in combination with the fragmented knowledge.
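To illustrate the materialization side of this reasoning, the following hedged sketch applies chain rules of the form (x, p, y) ∧ (y, q, z) ⇒ (x, s, z) to a toy knowledge base. It reproduces the Yao Ming example from the description, but omits the RTransE embedding component of the actual RACRFK algorithm; the relation names are illustrative.

```python
def materialize(kb, rules):
    """Repeatedly apply two-atom Horn chain rules (p, q, s), meaning
    (x, p, y) ^ (y, q, z) => (x, s, z), until no new fact is produced.
    This is only the symbolic materialization step, not full RACRFK."""
    facts = set(kb)
    changed = True
    while changed:
        changed = False
        for (p, q, s) in rules:
            for (x, rel1, y) in list(facts):
                if rel1 != p:
                    continue
                for (y2, rel2, z) in list(facts):
                    if rel2 == q and y2 == y and (x, s, z) not in facts:
                        facts.add((x, s, z))   # newly inferred fact
                        changed = True
    return facts

# The example from the description: a "place of birth" fact plus an
# incoming "country" fragment lets the rule infer a nationality.
kb = {("Yao Ming", "place of birth", "Shanghai"),
      ("Shanghai", "country", "China")}
rules = [("place of birth", "country", "nationality")]
```

Running `materialize(kb, rules)` adds the fact ("Yao Ming", "nationality", "China") to the knowledge base.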
Reference documents:
[1] Bordes A, Usunier N, García-Durán A, et al. Translating Embeddings for Modeling Multi-relational Data[C]//International Conference on Neural Information Processing Systems. 2013.
[2] Krompaß D, Baier S, Tresp V. Type-Constrained Representation Learning in Knowledge Graphs. 2015.
[3] Xie R, Liu Z, Sun M. Representation Learning of Knowledge Graphs with Hierarchical Types[C]//International Joint Conference on Artificial Intelligence. AAAI Press, 2016.
[4] Wang Z, Li J. Text-Enhanced Representation Learning for Knowledge Graph[C]//International Joint Conference on Artificial Intelligence. AAAI Press, 2016.
[5] Chen X, Chen H, Zhang W. Rule-enhanced knowledge graph representation learning method[J]. Information Engineering, 2017, 3(1): 26-34.
[6] Lin Y, Liu Z, Luan H, et al. Modeling Relation Paths for Representation Learning of Knowledge Bases[J]. Computer Science, 2015.
[7] Soru T, Valdestilhas A, Marx E, et al. Beyond Markov Logic: Efficient Mining of Prediction Rules in Large Graphs[J]. 2018.
[8] Soru T, Valdestilhas A, Marx E, et al. Beyond Markov Logic: Efficient Mining of Prediction Rules in Large Graphs[J]. 2018.
The above are preferred embodiments of the present invention. All changes made according to the technical scheme of the present invention that produce equivalent functional effects, without exceeding the scope of the technical scheme, fall within the protection scope of the present invention.

Claims (7)

1. A relation prediction method combining logic rules and fragmented knowledge, characterized in that fact triples and logic rules are modeled uniformly and the hidden semantic information is embedded into a knowledge-representation-based relation inference model; then, in combination with fragmented knowledge, the model is iteratively updated so that the knowledge base becomes ever more complete.
2. The relation prediction method combining logic rules and fragmented knowledge according to claim 1, characterized in that it is implemented as follows:
The first stage: model the direct fact triples in the knowledge base to obtain vector representations of all entities and relations; these vectors are used in the third stage to compute the degree of semantic association between rules;
The second stage: mine a set of logic rules that can represent the semantic information of the knowledge base with a rule mining algorithm;
The third stage: apply the logic rules in two ways during reasoning: first, infer new facts through materialization reasoning based on the logic rules and add them to the knowledge base, realizing its dynamic expansion; second, replace the relation r in a fact triple (h, r, t) with a logic rule, thereby embedding the logic rule into the representation-learning-based relation inference model, where h and t denote entities; because the knowledge base contains multiple inference rules whose rule head is the relation r, a method is provided for measuring the semantic association between the different rule bodies with head r and the relation r;
The fourth stage: taking the outputs of the first to third stages as input, model the fact triples and the logic rules uniformly; in this way the rich semantic information of the logic rules is embedded into the representation-learning-based relation inference model RTransE, and relation inference is then performed with the trained RTransE model to complete the knowledge base;
The fifth stage: combine dynamic knowledge fragments and update iteratively so that the knowledge base becomes ever more complete.
3. The relation prediction method combining logic rules and fragmented knowledge according to claim 2, characterized in that the unified modeling of fact triples and logic rules in the fourth stage is performed as follows:
Given a triple (h, r, t), the TransE model requires h + r ≈ t when the triple holds; to remedy the shortcomings of ||h + r − t||_1, the triple score function is normalized on this basis, as shown in Equation (1):

f(h, r, t) = ||h + r − t||_1 / (||h||_1 + ||r||_1 + ||t||_1)   (1)

In Equation (1), D(h, r, t) = ||h + r − t||_1 is the distance function; it is easy to see that f(h, r, t) ∈ [0, 1]. If the triple holds, f(h, r, t) should be as small as possible, and as large as possible otherwise;
The model takes the existing fact triples in the knowledge base as positive examples and, as negative examples for training, triples that contradict the existing facts, generated by randomly replacing the head entity, tail entity, or relation; the triple modeling loss function is shown in Equation (2):

L_S = Σ_{s∈S⁺} Σ_{s'∈S⁻} max(0, γ + f(h, r, t) − f(h', r', t'))   (2)

In Equation (2), S = {s_1, s_2, …, s_i, …, s_n} is the set of fact triples, S⁺ is the set of positive triples, S⁻ is the set of negative triples, and γ represents an adjustable margin hyper-parameter.
4. The relation prediction method combining logic rules and fragmented knowledge according to claim 3, characterized in that, in the fourth stage, the specific process of embedding the rich semantic information of the logic rules into the representation-learning-based relational inference model RTransE is as follows:

The distance function D_rule^i(r) of the joint representation of the embedded logic rule is shown in Equation (3):

D_rule^i(r) = ||Σ_{j=1}^{K} r_j − r||_1   (3)

where K is the number of relations in the rule body of the i-th logic rule whose rule head is the relation r, B_i is the rule body of the i-th logic rule, and r_j ∈ B_i; if the logic rule can represent the semantic information of the relation r, then D_rule^i(r) should be as close to 0 as possible, and as large as possible otherwise;
The representation learning model with embedded logic rules takes the mined logic rules as positive examples and, as negative examples, rules that contradict the existing logic rules, generated by randomly replacing the rule head; the logic rule loss function is shown in Equation (4):

L_LR = Σ_{lr∈LR⁺} Σ_{lr'∈LR⁻} c_i^r · sim_i^r · max(0, γ + D_rule^i(r) − D_rule^i(r'))   (4)

where LR is the set of logic rules, LR⁺ is the set of positive logic rules, LR⁻ is the negative set, γ represents an adjustable margin hyper-parameter, c_i^r denotes the confidence of the i-th logic rule whose rule head is the relation r, and sim_i^r denotes the degree of semantic association between the i-th logic rule with rule head r and the relation r;
The loss function L of the representation learning model with embedded logic rules is shown in Equation (5):

L = L_S + L_LR   (5)

As Equation (5) shows, the model loss function consists of two parts: the distance-based loss over the direct fact triples of the knowledge base and the distance-based loss between the logic rules and the relation r.
5. The relation prediction method combining logic rules and fragmented knowledge according to claim 4, characterized in that replacing the entity embeddings in rule instantiations with entity-type embeddings gives the model better predictive power, so:

The distance function D(h, r, t, h_type, t_type) incorporating entity types, which improves D(h, r, t) in Equation (1), is shown in Equation (6):

D(h, r, t, h_type, t_type) = ||(h + h_type) + r − (t + t_type)||_1   (6)

In Equation (6), h_type denotes the entity type corresponding to the head entity h, and t_type denotes the entity type corresponding to the tail entity t;
The triple modeling loss function L_S with entity types, which improves Equation (2), is shown in Equation (7):

L_S = Σ_{s∈S⁺} Σ_{s'∈S⁻} max(0, γ + f(h, r, t, h_type, t_type) − f(h', r', t', h'_type, t'_type))   (7)

In Equation (7), EL = {el_1, el_2, …} is the set of labels of all entity classes, and f(h, r, t, h_type, t_type) is the new triple score function, shown in Equation (8):

f(h, r, t, h_type, t_type) = D(h, r, t, h_type, t_type) / (||h + h_type||_1 + ||r||_1 + ||t + t_type||_1)   (8)
The distance function D_rule^i(r, M_i^e) of the joint representation of the embedded logic rule and the entity types, which improves Equation (3), is shown in Equation (9):

D_rule^i(r, M_i^e) = ||Σ_{j=1}^{K} r_j + M_i^e − r||_1   (9)

In Equation (9), M_i^e denotes the sum of the type vectors of the connecting variable entities in the rule body of the i-th rule of the relation r;
The logic rule loss function L_LR with entity types, which improves Equation (4), is shown in Equation (10):

L_LR = Σ_{lr∈LR⁺} Σ_{lr'∈LR⁻} c_i^r · sim_i^r · max(0, γ + D_rule^i(r, M_i^e) − D_rule^i(r', M_i^e))   (10)

In Equation (10), M_i^e denotes the sum of the type vectors of the connecting variable entities in the rule body of the i-th rule of the relation r;
The loss function L of the representation learning model with embedded logic rules and entity types, which improves Equation (5), is shown in Equation (11):

L = L_S + L_LR   (11)
6. The relation prediction method combining logic rules and fragmented knowledge according to claim 5, characterized in that, in the fifth stage, the condition function Rt that triggers iterative training of the model is shown in Equation (12):

Rt = 1 if #facts / #entity ≥ θ, and Rt = 0 otherwise   (12)

In Equation (12), #facts and #entity are respectively the number of fact triples and the number of entities in the temporary knowledge base KB′, and θ is the threshold for iterative training of the model.
7. The relation prediction method combining logic rules and fragmented knowledge according to any one of claims 1 to 6, applied to the prediction of relationships between persons.
CN201911390283.4A 2019-12-30 2019-12-30 Relation prediction method combining logic rule and fragmentation knowledge Active CN111191460B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911390283.4A CN111191460B (en) 2019-12-30 2019-12-30 Relation prediction method combining logic rule and fragmentation knowledge


Publications (2)

Publication Number Publication Date
CN111191460A true CN111191460A (en) 2020-05-22
CN111191460B CN111191460B (en) 2023-01-03

Family

ID=70707772


Country Status (1)

Country Link
CN (1) CN111191460B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228245A (en) * 2016-07-21 2016-12-14 电子科技大学 Infer based on variation and the knowledge base complementing method of tensor neutral net
US20170017221A1 (en) * 2015-07-16 2017-01-19 Siemens Aktiengesellschaft Knowledge-based programmable logic controller with flexible in-field knowledge management and analytics
CN106528609A (en) * 2016-09-28 2017-03-22 厦门理工学院 Vector constraint embedded transformation knowledge graph inference method
CN109376864A (en) * 2018-09-06 2019-02-22 电子科技大学 A kind of knowledge mapping relation inference algorithm based on stacking neural network
CN110069638A (en) * 2019-03-12 2019-07-30 北京航空航天大学 A kind of knowledge mapping combination table dendrography learning method of binding rule and path


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Guo Jun. Research on Data Topology Structure Based on the Triple Graph Model. Wanfang Data Dissertation Database. *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112417171A (en) * 2020-11-23 2021-02-26 南京大学 Data augmentation method for knowledge graph representation learning
CN112417171B (en) * 2020-11-23 2023-10-03 南京大学 Knowledge graph representation learning-oriented data augmentation method
WO2023007270A1 (en) * 2021-07-26 2023-02-02 Carl Wimmer Foci analysis tool
CN113901151A (en) * 2021-09-30 2022-01-07 北京有竹居网络技术有限公司 Method, apparatus, device and medium for relationship extraction
CN114741460A (en) * 2022-06-10 2022-07-12 山东大学 Knowledge graph data expansion method and system based on association between rules
CN114741460B (en) * 2022-06-10 2022-09-30 山东大学 Knowledge graph data expansion method and system based on association between rules
CN115033716A (en) * 2022-08-10 2022-09-09 深圳市人马互动科技有限公司 General self-learning system and self-learning method based on same
WO2024031813A1 (en) * 2022-08-10 2024-02-15 深圳市人马互动科技有限公司 General self-learning system and self-learning method based on general self-learning system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant