CN116186278A - Knowledge graph completion method based on hyperplane projection and relational path neighborhood - Google Patents


Info

Publication number
CN116186278A
Authority
CN
China
Prior art keywords
entity
entities
triplet
path
tail
Prior art date
Legal status
Pending
Application number
CN202211648882.3A
Other languages
Chinese (zh)
Inventor
韩亚丹
陆广泉
Current Assignee
Guangxi Normal University
Original Assignee
Guangxi Normal University
Priority date
Filing date
Publication date
Application filed by Guangxi Normal University
Priority to CN202211648882.3A
Publication of CN116186278A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge graph completion method based on hyperplane projection and relation path neighborhood, comprising the following steps: 1) embedding the knowledge graph using the structural information of the triples; 2) incorporating path neighborhood information; 3) adding relation mapping attributes; 4) designing the scoring function of TransH-RPN; 5) during model training, replacing head and tail entities probabilistically and selecting replacement entities according to entity similarity; 6) link prediction for knowledge graph completion based on hyperplane projection and relation path neighborhood; 7) triple classification for knowledge graph completion based on hyperplane projection and relation path neighborhood. The method adds relation mapping attributes to the TransH model and, by incorporating path neighborhood information, models large-scale knowledge graphs based on their path neighborhoods, which improves the representation learning capability of the model and its knowledge graph completion performance.

Description

Knowledge graph completion method based on hyperplane projection and relational path neighborhood
Technical Field
The invention belongs to the technical field of knowledge representation learning and knowledge graph completion, and in particular relates to a knowledge graph completion method based on relation path neighborhoods.
Background
A knowledge graph stores a large number of real-world facts. It is a multi-relational graph consisting of entities (nodes) and relations (edges of different types), and is usually expressed as triples (head entity, relation, tail entity), written (h, r, t). Many knowledge graphs have been constructed, such as WordNet, Freebase and YAGO, and they are widely used in knowledge reasoning, question answering and recommendation systems.
Because knowledge bases keep growing and their data are updated ever more frequently, a knowledge graph cannot contain all real-world knowledge. Missing knowledge therefore has to be predicted from the knowledge already in the graph. This task is called knowledge graph completion (KGC) and includes link prediction and triple classification.
Knowledge representation learning was proposed to complete knowledge graphs. The main idea is to first embed the entities and relations of the triples with a knowledge representation learning model, then score candidate triples with a scoring function, and finally rank the results by plausibility to complete the knowledge graph.
Because traditional knowledge representation learning methods model knowledge graphs well, they have attracted wide attention from researchers. However, these traditional models have some drawbacks. On the one hand, the typical models are constrained by their translation rules and cannot model complex and diverse entities. On the other hand, when embedding a knowledge graph they attend only to the structural information of triples and take a single triple fact as input; the entity information available is very limited and the resulting vectors are not expressive enough, so entities and relations are poorly represented and these models remain unsatisfactory for knowledge graph completion. In recent years, to strengthen the knowledge representation learning ability of models, various kinds of multi-source information have been used, such as text descriptions, type constraints, visual information, entity attributes, logical rules and relation paths. Combining this auxiliary information with the structural information of triples can significantly improve the knowledge representation ability of a model. However, multi-source information also has several problems: (1) its quality is uneven, and existing models lack an effective way to extract useful information from it; (2) it is very diverse, but this rich information is not fully exploited; (3) the heterogeneity of head and tail entities in triples is ignored: under the same relation, the numbers of head entities and tail entities in a knowledge graph can differ greatly, yet current models do not consider how this imbalance affects entity modeling.
Disclosure of Invention
To address these problems of existing knowledge representation models, the invention provides a knowledge graph completion method based on hyperplane projection and path neighborhood. It adds relation mapping attributes to the TransH model and, by incorporating path neighborhood information, models large-scale knowledge graphs based on their path neighborhoods, which improves the representation learning capability of the model and its knowledge graph completion performance.
The technical solution for achieving the aim of the invention is as follows:
A knowledge graph completion method based on hyperplane projection and relation path neighborhood comprises the following steps:
1) Embedding the knowledge graph using the structural information of triples: given a triple (h, r, t), the entities are projected onto a relation-specific hyperplane following the hyperplane projection idea of TransH, and the projected head and tail entities are:
$h_\perp = h - w_r^\top h \, w_r, \qquad t_\perp = t - w_r^\top t \, w_r$
where w_r is the normal vector of the hyperplane and d_r is the translation vector corresponding to the relation. The scoring function of TransH is defined as: $f_r(h, t) = \lVert h_\perp + d_r - t_\perp \rVert$;
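A minimal sketch of the TransH projection and score in step 1), using plain Python lists as vectors. It assumes w_r is already a unit vector and uses the unsquared L2 norm as written in the text; the function names `dot`, `project` and `transh_score` are illustrative, not part of the patent.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def project(e, w_r):
    """Project entity vector e onto the hyperplane with unit normal w_r: e - (w_r . e) w_r."""
    s = dot(w_r, e)
    return [ei - s * wi for ei, wi in zip(e, w_r)]

def transh_score(h, t, w_r, d_r):
    """TransH score ||h_perp + d_r - t_perp||_2; a smaller value means more plausible."""
    h_p, t_p = project(h, w_r), project(t, w_r)
    return math.sqrt(sum((a + d - b) ** 2 for a, d, b in zip(h_p, d_r, t_p)))

# Usage: with normal w_r along the z axis, the z components are projected away.
w_r = [0.0, 0.0, 1.0]
d_r = [1.0, 0.0, 0.0]
print(transh_score([1.0, 0.0, 5.0], [2.0, 0.0, -3.0], w_r, d_r))  # 0.0
```

The two entities differ greatly along the normal direction, yet score as a perfect match once projected, which is the flexibility TransH adds over a plain translation.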
2) Incorporating path neighborhood information: many paths surround the head or tail entity of a triple. For the model to use the most valuable path neighborhood information, the weight of each path must be computed; the larger a path's weight, the more valuable its information. The head and tail entities of a triple can be connected in two ways: first, directly, forming a direct path; second, indirectly, forming an indirect path, i.e., the two entities do not form a triple directly because the relation between them is missing. The same holds for tail entities as for head entities. When embedding entities and relations, the influence of paths on entity embeddings must also be considered; this influence appears mainly as a secondary embedding, i.e., a further computation over the entity and relation embeddings. The weight computation is therefore also split into two cases: for a direct path, the shortest path is selected and the reciprocal of its length is taken as the weight; for an indirect path, intermediate nodes within five hops are selected (more nodes are unnecessary, since they make paths long and consume a great deal of time and memory during training), the relation values along each path through these nodes are accumulated, the path with the smallest total is selected, and the reciprocal of that minimum is taken as the weight;
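One possible reading of the path-weight rule in step 2), sketched in Python. Several details are assumptions not fixed by the text: edges are traversed in both directions, every relation edge has cost 1.0 unless a positive per-relation cost is supplied, and "within five" is read as at most five hops. The weight is the reciprocal of the cheapest accumulated path cost, which covers the direct case (one edge) and the indirect case alike; `build_graph` and `path_weight` are hypothetical helpers.

```python
import math
from collections import defaultdict

def build_graph(triples, relation_cost=None):
    """Hypothetical helper: undirected adjacency list entity -> [(neighbor, edge_cost)]."""
    graph = defaultdict(list)
    for h, r, t in triples:
        cost = 1.0 if relation_cost is None else relation_cost[r]
        graph[h].append((t, cost))
        graph[t].append((h, cost))
    return graph

def path_weight(graph, head, tail, max_hops=5):
    """Reciprocal of the cheapest head-tail path with at most max_hops edges; 0.0 if none."""
    best = {head: 0.0}      # cheapest accumulated cost found so far for each node
    frontier = {head: 0.0}  # nodes whose cost improved in the previous round
    for _ in range(max_hops):
        improved = {}
        for node, cost in frontier.items():
            for neighbor, edge_cost in graph.get(node, []):
                new_cost = cost + edge_cost
                if new_cost < min(best.get(neighbor, math.inf),
                                  improved.get(neighbor, math.inf)):
                    improved[neighbor] = new_cost
        if not improved:
            break
        best.update(improved)
        frontier = improved
    if tail == head or tail not in best:
        return 0.0
    return 1.0 / best[tail]

# Usage on a chain a -r1- b -r2- c -r3- d
g = build_graph([("a", "r1", "b"), ("b", "r2", "c"), ("c", "r3", "d")])
print(path_weight(g, "a", "b"))  # direct path of length 1 -> 1.0
print(path_weight(g, "a", "d"))  # indirect path of length 3 -> about 0.333
```

The hop limit doubles as the memory and time cap the text motivates: the search never expands beyond `max_hops` rounds.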
3) Adding relation mapping attributes: following TransM, each training triple is associated with a weight representing its degree of mapping. Since the mapping property of a triple depends largely on the relation between its head and tail entities, the weight is relation-specific. To improve the model's handling of complex relations, different relations are given different weights so that the model can distinguish them. To compute the weights, first compute tph_r, the average number of tail entities per head entity, and hpt_r, the average number of head entities per tail entity, and then compute the weight of each relation according to equation (1):
[Equation (1): formula image not reproduced]
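The statistics tph_r and hpt_r used in step 3) (and again in step 5.1) can be computed as below. Because the image for equation (1) is not reproduced, `relation_weight` assumes the form used in the original TransM paper, w_r = 1 / log(tph_r + hpt_r); treat it as an assumption, not as the patent's exact formula.

```python
import math
from collections import defaultdict

def mapping_stats(triples):
    """Return {relation: (tph, hpt)}: average tails per head and heads per tail."""
    tails_of = defaultdict(lambda: defaultdict(set))  # r -> h -> set of tails
    heads_of = defaultdict(lambda: defaultdict(set))  # r -> t -> set of heads
    for h, r, t in triples:
        tails_of[r][h].add(t)
        heads_of[r][t].add(h)
    stats = {}
    for r, by_head in tails_of.items():
        by_tail = heads_of[r]
        tph = sum(len(ts) for ts in by_head.values()) / len(by_head)
        hpt = sum(len(hs) for hs in by_tail.values()) / len(by_tail)
        stats[r] = (tph, hpt)
    return stats

def relation_weight(tph, hpt):
    """Assumed TransM-style weight 1 / log(tph + hpt); tph, hpt >= 1 keeps the log positive."""
    return 1.0 / math.log(tph + hpt)

triples = [("alice", "born_in", "paris"), ("bob", "born_in", "paris"),
           ("carol", "born_in", "rome")]
stats = mapping_stats(triples)
print(stats["born_in"])                    # (1.0, 1.5)
print(relation_weight(*stats["born_in"]))  # about 1.09
```

Under this form, relations with more "many" sides (larger tph_r + hpt_r) receive smaller weights.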
4) The scoring function of TransH-RPN, the knowledge graph completion model based on hyperplane projection and relation path neighborhood, is designed as follows:
[Scoring function: formula image not reproduced]
where
[Formula image not reproduced]
5) During model training, head and tail entities are replaced probabilistically, and replacement entities are selected according to entity similarity;
5.1) Replacing head and tail entities probabilistically: to reduce the generation of false negative triples, the tail entity is replaced with high probability for many-to-one relations, and the head entity is replaced with high probability for one-to-many relations. Given a relation and all positive triples (h, r, t) involving it, first compute tph_r, the average number of tail entities per head entity, and hpt_r, the average number of head entities per tail entity. The probabilistic method then uses
[Formula image for q not reproduced]
and samples from a Bernoulli distribution: when a negative triple is constructed from a positive triple, the head entity is replaced with probability q and the tail entity with probability 1 - q, so the probabilities sum to 1 and the sampling follows a Bernoulli distribution;
For each relation r, compute tph_r and hpt_r. If tph_r < 1.5 and hpt_r < 1.5, relation r is one-to-one; if tph_r > 1.5 and hpt_r > 1.5, it is many-to-many; if tph_r ≥ 1.5 and hpt_r < 1.5, it is one-to-many; if tph_r < 1.5 and hpt_r ≥ 1.5, it is many-to-one;
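A sketch of the relation categorization and Bernoulli corruption of step 5.1). The image for q is not reproduced, so `head_replace_prob` assumes the "bern" convention from the TransH paper, q = tph_r / (tph_r + hpt_r), which replaces the head more often for one-to-many relations as the text requires. Boundary cases the text leaves unassigned (for example tph_r = 1.5 with hpt_r ≥ 1.5) fall through to many-to-many here; all function names are illustrative.

```python
import random

def mapping_category(tph, hpt, threshold=1.5):
    """Classify a relation as 1-1, 1-N, N-1 or N-N using the 1.5 threshold."""
    if tph < threshold and hpt < threshold:
        return "1-1"
    if tph >= threshold and hpt < threshold:
        return "1-N"
    if tph < threshold and hpt >= threshold:
        return "N-1"
    return "N-N"  # also absorbs boundary cases the text leaves unassigned

def head_replace_prob(tph, hpt):
    """Assumed TransH 'bern' probability q of replacing the head entity."""
    return tph / (tph + hpt)

def corrupt(triple, entities, tph, hpt, rng):
    """Build one negative triple: replace the head with probability q, else the tail."""
    h, r, t = triple
    if rng.random() < head_replace_prob(tph, hpt):
        return (rng.choice([e for e in entities if e != h]), r, t)
    return (h, r, rng.choice([e for e in entities if e != t]))

rng = random.Random(0)
print(mapping_category(3.0, 1.0), head_replace_prob(3.0, 1.0))  # 1-N 0.75
print(corrupt(("alice", "likes", "jazz"), ["alice", "bob", "jazz", "rock"], 3.0, 1.0, rng))
```

The random choice here is uniform over entities; step 5.2) replaces it with a similarity-based choice.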
5.2) Selecting entities based on similarity:
Entities of similar types are distributed close together in the vector space. For example, if for the relation "lives in" the head entities are person names and the tail entities are places, then person names cluster in one region and places in another. Similarity between entities is judged by the semantic similarity between entities or relations, which is reflected in the vector space as the similarity between vectors, computed as:
[Formula image for dis not reproduced]
Thus, given a positive triple (h, r, t), when the head entity is replaced to generate a negative triple (h', r, t), h' is chosen to minimize dis(h, h'); when the tail entity is replaced to generate a negative triple (h, r, t'), t' is chosen to minimize dis(t, t');
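A sketch of the similarity-based replacement in step 5.2). The image for dis is not reproduced, so Euclidean (L2) distance between embeddings is assumed. The optional `exclude` set is an addition, not from the text, for skipping candidates that would recreate a known positive triple; `nearest_replacement` is a hypothetical name.

```python
import math

def nearest_replacement(entity, embeddings, exclude=frozenset()):
    """Return the entity closest to `entity` in L2 distance, skipping itself and `exclude`."""
    target = embeddings[entity]
    best, best_dist = None, math.inf
    for candidate, vec in embeddings.items():
        if candidate == entity or candidate in exclude:
            continue
        d = math.dist(target, vec)  # assumed form of dis(., .)
        if d < best_dist:
            best, best_dist = candidate, d
    return best

emb = {"paris": [0.0, 0.0], "rome": [0.1, 0.0],
       "alice": [5.0, 5.0], "lyon": [0.2, 0.0]}
print(nearest_replacement("paris", emb))                    # rome
print(nearest_replacement("paris", emb, exclude={"rome"}))  # lyon
```

Replacing "paris" with another place rather than a person yields a harder, more informative negative. The linear scan is O(|E|) per query.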
During model training, to distinguish correct triples from incorrect ones, the following margin-based loss function is used as the optimization objective:
$L = \sum_{(h,r,t) \in S} \sum_{(h',r,t') \in S'} \max\left(0,\ f(h,r,t) + \gamma - f(h',r,t')\right)$
where f is the scoring function of step 4), S is the set of correct triples, S' is the set of incorrect triples, max(x, y) returns the larger of x and y, and γ is the margin separating the scores of correct triples from those of incorrect triples. Optimizing the objective therefore separates correct triples from incorrect ones as far as possible;
When minimizing the objective L, the following constraints are considered:
$\forall e \in E,\ \lVert e \rVert_2 \le 1 \qquad (2)$
$\forall r \in R,\ |w_r^\top d_r| \,/\, \lVert d_r \rVert_2 \le \epsilon \qquad (3)$
$\forall r \in R,\ \lVert w_r \rVert_2 = 1 \qquad (4)$
where E and R are the entity and relation sets and ε is a small constant. Constraint (2) keeps every entity vector at length at most 1, constraint (3) keeps the translation vector d_r of relation r in the projection hyperplane, and constraint (4) makes the hyperplane normal vector a unit vector;
In concrete training, the objective is optimized with stochastic gradient descent, and the best experimental results are obtained by tuning the learning rate, margin, embedding dimension, batch size and norm type;
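The margin loss and constraints (2) to (4) can be sketched as follows. Assumptions: each positive score is paired with one negative score, entity norms and normal vectors are enforced by rescaling after each SGD step, and constraint (3) is handled as a soft penalty in the style of the TransH paper with a hypothetical tolerance `eps`. Gradient computation is omitted.

```python
import math

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def margin_loss(pos_scores, neg_scores, gamma=1.0):
    """Sum of max(0, f(pos) + gamma - f(neg)) over paired positive/negative scores."""
    return sum(max(0.0, p + gamma - n) for p, n in zip(pos_scores, neg_scores))

def clip_entity(e):
    """Constraint (2): rescale e so that ||e|| <= 1."""
    n = norm(e)
    return [x / n for x in e] if n > 1.0 else list(e)

def unit_normal(w):
    """Constraint (4): rescale the hyperplane normal to unit length."""
    n = norm(w)
    return [x / n for x in w]

def orthogonality_penalty(w, d, eps=1e-3):
    """Constraint (3) as a soft penalty: max(0, (w.d)^2 / ||d||^2 - eps^2)."""
    wd = sum(a * b for a, b in zip(w, d))
    return max(0.0, wd * wd / norm(d) ** 2 - eps * eps)

print(margin_loss([0.5, 2.0], [3.0, 2.5], gamma=1.0))  # 0.5
print(clip_entity([3.0, 4.0]))                          # [0.6, 0.8]
```

Only the second pair contributes to the loss, because the first negative is already more than γ farther than its positive.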
6) Link prediction for knowledge graph completion based on hyperplane projection and relation path neighborhood: link prediction aims to predict the missing h or t of a triple (h, r, t) from the existing knowledge in the knowledge base. First, negative triples are constructed: the head or tail entity of each positive test triple (h, r, t) is removed and replaced in turn by every entity in the entity set. The scores of these corrupted triples are then computed and ranked, most plausible first. Finally, the rank of the correct entity is recorded; the task emphasizes the rank of the correct entity rather than only finding the single best entity;
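Tail-side link-prediction ranking, sketched with a generic `score_fn(h, r, t)` where a lower score means more plausible, as with the TransH distance f_r(h, t) in step 1). Mean rank and Hits@k are common summary metrics but are not named in the text, and the optional `known` argument (the "filtered" setting) is likewise an assumption. Ties are counted in favor of the true entity.

```python
def tail_rank(score_fn, h, r, t, entities, known=frozenset()):
    """Rank of the true tail t among candidate tails; lower score = more plausible.

    Candidates forming a triple in `known` are skipped (the "filtered" setting);
    leave `known` empty for the raw setting.
    """
    true_score = score_fn(h, r, t)
    rank = 1
    for e in entities:
        if e == t or (h, r, e) in known:
            continue
        if score_fn(h, r, e) < true_score:
            rank += 1
    return rank

def mean_rank_and_hits(ranks, k=10):
    """Mean rank and the fraction of ranks <= k (Hits@k)."""
    return sum(ranks) / len(ranks), sum(1 for x in ranks if x <= k) / len(ranks)

# Toy 1-D "embeddings" and a translation-style score |h + 1 - t| (illustrative only).
pos = {"a": 0.0, "b": 1.0, "c": 3.0, "d": 1.5}

def score(h, r, t):
    return abs(pos[h] + 1.0 - pos[t])

print(tail_rank(score, "a", "r", "d", pos))                           # 2
print(tail_rank(score, "a", "r", "d", pos, known={("a", "r", "b")}))  # 1
```

Head-side ranking is symmetric: iterate over candidate heads with t fixed.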
7) Triple classification for knowledge graph completion based on hyperplane projection and relation path neighborhood: triple classification determines whether a given triple (h, r, t) is correct, which is a binary classification task. This evaluation requires negative samples, but all triples in an existing knowledge graph are regarded as correct, so a negative sample set must be constructed with a 1:1 ratio of positive to negative samples. After the negative set is built, the scoring function is applied to the entity and relation vectors learned by the model to score every triple. During training, a threshold σ_r is chosen as the value that maximizes classification accuracy on the validation set; because σ_r depends closely on the relation, a different threshold is determined for each relation. A triple (h, r, t) is predicted correct if its score is below σ_r, and incorrect otherwise.
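Per-relation threshold selection for step 7), sketched under the rule "score < σ_r means correct". Candidate thresholds are the midpoints between consecutive observed validation scores plus one value below and one above the range; this search strategy is an assumption, since the text only says σ_r maximizes validation accuracy. The function names are illustrative.

```python
def best_threshold(pos_scores, neg_scores):
    """Threshold sigma maximising accuracy of the rule 'score < sigma => correct'."""
    values = sorted(set(pos_scores) | set(neg_scores))
    candidates = ([values[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(values, values[1:])]
                  + [values[-1] + 1.0])

    def accuracy(sigma):
        hits = sum(s < sigma for s in pos_scores) + sum(s >= sigma for s in neg_scores)
        return hits / (len(pos_scores) + len(neg_scores))

    return max(candidates, key=accuracy)

def fit_thresholds(validation):
    """validation: {relation: (positive_scores, negative_scores)} -> {relation: sigma_r}."""
    return {r: best_threshold(p, n) for r, (p, n) in validation.items()}

def classify(score, sigma_r):
    """Predict a triple correct when its score is below its relation's threshold."""
    return score < sigma_r

sigmas = fit_thresholds({"born_in": ([0.2, 0.4], [0.9, 1.0])})
print(round(sigmas["born_in"], 3), classify(0.5, sigmas["born_in"]))  # 0.65 True
```

Fitting one threshold per relation lets a relation whose true triples naturally score higher (for example an N-N relation) still be classified accurately.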
Compared with existing knowledge representation models, this technical solution has the following beneficial effects:
1) It makes full use of path neighborhood information, improving the representation learning capability of the model;
2) It adds relation mapping attributes, making the model better at handling complex relations in triples;
3) It replaces head and tail entities probabilistically, improving the quality of the generated negative triples.
Taken together, these improvements optimize link prediction and triple classification, and the solution outperforms traditional baseline models.
The method adds relation mapping attributes to the TransH model and, by incorporating path neighborhood information, models large-scale knowledge graphs based on their path neighborhoods, which improves the representation learning capability of the model and its knowledge graph completion performance.
Drawings
FIG. 1 is a diagram of a TransH-RPN model in an embodiment;
FIG. 2 is an example diagram of indistinguishable entities.
Detailed Description
The present invention will now be further illustrated with reference to the drawings and examples, but is not limited thereto.
Examples:
Referring to FIG. 1, a knowledge graph completion method based on hyperplane projection and relation path neighborhood comprises the following steps:
1) Embedding the knowledge graph using the structural information of triples: given a triple (h, r, t), the entities are projected onto a relation-specific hyperplane following the hyperplane projection idea of TransH, and the projected head and tail entities are:
$h_\perp = h - w_r^\top h \, w_r, \qquad t_\perp = t - w_r^\top t \, w_r$
where w_r is the normal vector of the hyperplane and d_r is the translation vector corresponding to the relation. The scoring function of TransH is defined as: $f_r(h, t) = \lVert h_\perp + d_r - t_\perp \rVert$;
2) Incorporating path neighborhood information: many paths surround the head and tail entities of a triple. For the model to use the most valuable path neighborhood information, the weight of each path connected to the head and tail entities must be computed; the larger a path's weight, the more valuable its information. The head and tail entities of a triple in the knowledge graph can be connected in two ways: first, directly, forming a direct path; second, indirectly, forming an indirect path, i.e., the two entities do not form a triple directly because the relation between them is missing. The same holds for tail entities as for head entities. When embedding entities and relations, the effect of paths on entity embeddings must also be considered; this effect appears mainly as a secondary embedding, i.e., a further computation over the entity and relation embeddings. The weight computation is therefore split into two cases: for a direct path, the shortest path is selected and the reciprocal of its length is taken as the weight; for an indirect path, intermediate nodes within five hops are selected, the relation values along each path through these nodes are accumulated, the path with the smallest total is selected, and the reciprocal of that minimum is taken as the weight;
3) Adding relation mapping attributes: relations between the head and tail entities of triples fall into four types: 1-1, 1-N, N-1 and N-N. However, TransH does not distinguish these relation types when projecting entities and relations, so a relation mapping attribute is added on top of the hyperplane projection of TransH, drawing on the idea of TransM. In this example, each training triple is associated with a weight representing its degree of mapping; since the mapping property of a triple depends largely on the relation between its head and tail entities, the weight is relation-specific. To improve the model's handling of complex relations, different relations are given different weights so that the model can distinguish them. To compute the weights, first compute tph_r, the average number of tail entities per head entity, and hpt_r, the average number of head entities per tail entity, and then compute the weight of each relation according to equation (1):
[Equation (1): formula image not reproduced]
By computing weights for the different relations, the model can distinguish the relations in triples, which improves its ability to handle complex relations;
4) Based on steps 1) to 3), the scoring function of TransH-RPN, the knowledge graph completion model based on hyperplane projection and relation path neighborhood, is designed as follows:
[Scoring function: formula image not reproduced]
where
[Formula image not reproduced]
[Formula image not reproduced]
denotes the weight of the relation path;
5) During model training, head and tail entities are replaced probabilistically, and replacement entities are selected according to entity similarity;
5.1) Replacing head and tail entities probabilistically: to reduce the generation of false negative triples, the tail entity is replaced with high probability for many-to-one relations, and the head entity is replaced with high probability for one-to-many relations. Given a relation and all positive triples (h, r, t) involving it, first compute tph_r, the average number of tail entities per head entity, and hpt_r, the average number of head entities per tail entity. The probabilistic method then uses
[Formula image for q not reproduced]
When a negative triple is constructed from a positive triple, the head entity is replaced with probability q and the tail entity with probability 1 - q, so the probabilities sum to 1 and the sampling follows a Bernoulli distribution;
5.2) Selecting entities based on similarity:
Entities of similar types are distributed close together in the vector space. For example, if for a relation the head entities are person names and the tail entities are places, then person names cluster in one region and places in another, as shown in FIG. 2. Similarity between entities is judged by the semantic similarity between entities or relations, which is reflected in the vector space as the similarity between vectors, computed as:
[Formula image for dis not reproduced]
Thus, given a positive triple (h, r, t), when the head entity is replaced to generate a negative triple (h', r, t), h' is chosen to minimize dis(h, h'); when the tail entity is replaced to generate a negative triple (h, r, t'), t' is chosen to minimize dis(t, t');
During model training, to distinguish correct triples from incorrect ones, the following margin-based loss function is used as the optimization objective:
$L = \sum_{(h,r,t) \in S} \sum_{(h',r,t') \in S'} \max\left(0,\ f(h,r,t) + \gamma - f(h',r,t')\right)$
where f is the scoring function of step 4), S is the set of correct triples, S' is the set of incorrect triples, max(x, y) returns the larger of x and y, and γ is the margin separating the scores of correct triples from those of incorrect triples, so the objective is optimized to separate correct triples from incorrect ones as far as possible;
When minimizing the objective L, the following constraints are considered:
$\forall e \in E,\ \lVert e \rVert_2 \le 1 \qquad (2)$
$\forall r \in R,\ |w_r^\top d_r| \,/\, \lVert d_r \rVert_2 \le \epsilon \qquad (3)$
$\forall r \in R,\ \lVert w_r \rVert_2 = 1 \qquad (4)$
where E and R are the entity and relation sets and ε is a small constant. Constraint (2) keeps every entity vector at length at most 1, constraint (3) keeps the translation vector d_r of relation r in the projection hyperplane, and constraint (4) makes the hyperplane normal vector a unit vector;
In concrete training, the objective is optimized with stochastic gradient descent, and the best experimental results are obtained by tuning the learning rate, margin, embedding dimension, batch size and norm type;
6) Link prediction for knowledge graph completion based on hyperplane projection and relation path neighborhood: link prediction aims to predict the missing h or t of a triple (h, r, t) from the existing knowledge in the knowledge base. First, negative triples must be constructed: the head or tail entity of each positive test triple (h, r, t) is removed and replaced in turn by every entity in the entity set. The scores of these corrupted triples are then computed and ranked, most plausible first. Finally, the rank of the correct entity is recorded; the task emphasizes the rank of the correct entity rather than only finding the single best one;
7) Triple classification for knowledge graph completion based on hyperplane projection and relation path neighborhood: triple classification determines whether a given triple (h, r, t) is correct, which is a binary classification task. This evaluation requires negative samples, but all triples in an existing knowledge graph are regarded as correct, so a negative sample set must be constructed with a 1:1 ratio of positive to negative samples. The scoring function is then applied to the entity and relation vectors learned by the model to score every triple. During training, a threshold σ_r is chosen as the value that maximizes classification accuracy on the validation set; σ_r depends closely on the relation, so a different threshold is determined for each relation. A triple (h, r, t) is predicted correct if its score is below σ_r, and incorrect otherwise.
The method is used as a knowledge representation learning model and applied to knowledge graph completion. A knowledge representation learning model maps entities and relations into a low-dimensional continuous space and predicts missing links in the knowledge graph by computing with numerical vectors, thereby completing the graph. In this example, the proposed TransH-RPN model first projects the entities of the knowledge graph onto relation-specific hyperplanes to obtain vector representations of the head and tail entities on the hyperplane. The scoring function f(h, t) is then computed on these vectors to judge how likely a candidate triple is to hold, and the objective function is optimized so that positive triples are scored as more plausible than negative triples. After scoring, all triples are ranked from most to least plausible; the more plausible a triple, the more likely it is to hold, and the highest-ranked triples are added to the knowledge graph, completing it.

Claims (1)

1. A knowledge graph completion method based on hyperplane projection and relation path neighborhood, characterized by comprising the following steps:
1) Embedding the knowledge graph using the structural information of triples: given a triple (h, r, t), the entities are projected onto a relation-specific hyperplane following the hyperplane projection idea of TransH, and the projected head and tail entities are:
$h_\perp = h - w_r^\top h \, w_r, \qquad t_\perp = t - w_r^\top t \, w_r$
where w_r is the normal vector of the hyperplane and d_r is the translation vector corresponding to the relation. The scoring function of TransH is defined as: $f_r(h, t) = \lVert h_\perp + d_r - t_\perp \rVert$;
2) Incorporating path neighborhood information: path neighborhood information is added to improve the representation capability of the model. Many paths surround the head and tail entities of a triple; for the model to use the most valuable path neighborhood information, the weight of each path must be computed, and the larger a path's weight, the more valuable its information. The head and tail entities of a triple can be connected in two ways: first, directly, forming a direct path; second, indirectly, forming an indirect path, i.e., the two entities do not form a triple directly because the relation between them is missing. The same holds for tail entities as for head entities. When embedding entities and relations, the influence of paths on entity embeddings must also be considered; this influence appears mainly as a secondary embedding, i.e., a further computation over the entity and relation embeddings. The weight computation is therefore also split into two cases: for a direct path, the shortest path is selected and the reciprocal of its length is taken as the weight; for an indirect path, intermediate nodes within five hops are selected, the relation values along each path through these nodes are accumulated, the path with the smallest total is selected, and the reciprocal of that minimum is taken as the weight;
3) Add the mapping properties of relations: following TransM, each training triplet is associated with a weight that represents its degree of mapping; since the mapping property of a triplet depends on the relation between its head and tail entities, the weight is relation-specific. To improve the model's ability to handle complex relations, different relations are given different weights so that the model can distinguish them. To calculate the weight, first compute tqh_r, the average number of tail entities per head entity, and hqt_r, the average number of head entities per tail entity; then compute the weight of each relation according to equation (1):
$w_r = \dfrac{1}{\log\left(tqh_r + hqt_r\right)}$   (1)
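The per-relation statistics of step 3) and a TransM-style weight can be sketched as follows. The statistics are computed from distinct (head, tail) pairs; the weight formula w_r = 1 / log(tqh_r + hqt_r) is an assumption taken from TransM, since equation (1) appears in the source only as an image, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]


def relation_stats(triples: Iterable[Triple]) -> Dict[str, Tuple[float, float]]:
    """Return {r: (tqh_r, hqt_r)}: average tails per head and heads per tail."""
    tails = defaultdict(lambda: defaultdict(set))  # r -> head -> {tails}
    heads = defaultdict(lambda: defaultdict(set))  # r -> tail -> {heads}
    for h, r, t in triples:
        tails[r][h].add(t)
        heads[r][t].add(h)
    stats = {}
    for r in tails:
        tqh = sum(len(ts) for ts in tails[r].values()) / len(tails[r])
        hqt = sum(len(hs) for hs in heads[r].values()) / len(heads[r])
        stats[r] = (tqh, hqt)
    return stats


def transm_weight(tqh: float, hqt: float) -> float:
    """Assumed TransM-style weight; tqh + hqt >= 2, so the logarithm is positive."""
    return 1.0 / math.log(tqh + hqt)
```

Because both averages are at least 1, relations with more complex mappings (larger averages) receive smaller weights.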
4) The scoring function of the knowledge graph completion model based on hyperplane projection and relational path neighborhood, TransH-RPN, is designed as follows:
[Formula image FDA0004008111680000013: TransH-RPN scoring function]
where
[Formula image FDA0004008111680000014]
5) During model training, a probabilistic method is used to choose whether to replace the head or the tail entity, and the replacement entity is selected according to entity similarity;
5.1) Replacing head and tail entities probabilistically: to reduce the generation of false-negative triples, the tail entity is replaced with high probability for many-to-one relations, and the head entity is replaced with high probability for one-to-many relations. Given a relation and all positive triples (h, r, t) involving it, first compute tqh_r, the average number of tail entities per head entity, and hqt_r, the average number of head entities per tail entity; the probability of replacing the head entity is then:
$q = \dfrac{tqh_r}{tqh_r + hqt_r}$
When negative triples are constructed from positive triples, the head entity is replaced with probability q and the tail entity with probability 1 − q; since the two probabilities sum to 1, this sampling follows a Bernoulli distribution;
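The Bernoulli head/tail choice of step 5.1) can be sketched as below, assuming q = tqh_r / (tqh_r + hqt_r). For simplicity this sketch draws the replacement entity uniformly at random, whereas the method selects it by similarity (step 5.2); the function names are hypothetical.

```python
import random
from typing import List, Tuple

Triple = Tuple[str, str, str]


def head_replace_prob(tqh: float, hqt: float) -> float:
    """q = tqh_r / (tqh_r + hqt_r): probability of corrupting the head."""
    return tqh / (tqh + hqt)


def bernoulli_corrupt(triple: Triple, entities: List[str], tqh: float, hqt: float,
                      rng: random.Random) -> Triple:
    """Replace the head with probability q, otherwise the tail.

    The replacement entity is drawn uniformly here (a simplification);
    entities must contain at least two names.
    """
    h, r, t = triple
    q = head_replace_prob(tqh, hqt)
    if rng.random() < q:
        return (rng.choice([e for e in entities if e != h]), r, t)
    return (h, r, rng.choice([e for e in entities if e != t]))
```

For a one-to-many relation (tqh_r large), q approaches 1, so the head is corrupted most of the time, which avoids accidentally producing another valid tail of the same head.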
For each relation r, compute tqh_r, the average number of tail entities per head entity, and hqt_r, the average number of head entities per tail entity. If tqh_r < 1.5 and hqt_r < 1.5, r is one-to-one; if tqh_r > 1.5 and hqt_r > 1.5, r is many-to-many; if tqh_r ≥ 1.5 and hqt_r < 1.5, r is one-to-many; if tqh_r < 1.5 and hqt_r ≥ 1.5, r is many-to-one;
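The categorization rule above can be sketched as a small function. One assumption: the claim uses a strict ">" for many-to-many, which leaves the case of exactly 1.5 on both sides undefined, so this sketch treats 1.5 as "many" on both sides to make the rule total; the function name is hypothetical.

```python
def relation_category(tqh: float, hqt: float, threshold: float = 1.5) -> str:
    """Map (tqh_r, hqt_r) to a mapping category using the 1.5 threshold."""
    many_tails = tqh >= threshold  # each head links to many tails
    many_heads = hqt >= threshold  # each tail links to many heads
    if many_tails and many_heads:
        return "many-to-many"
    if many_tails:
        return "one-to-many"
    if many_heads:
        return "many-to-one"
    return "one-to-one"
```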
5.2) Selecting the replacement entity based on similarity:
When entities are of a similar type, they are distributed relatively close to one another in the vector space. For example, for the relation "lives in", the head entity is a person's name and the corresponding tail entity is a place, so person names tend to cluster in one region and places in another. To judge the similarity between entities, the semantic similarity between entities or relations is reflected in the vector space, i.e., the similarity between vectors is calculated with the following formula:
[Formula image FDA0004008111680000022: distance dis(·,·) between entity vectors]
Thus, given a positive triplet (h, r, t), when the head entity is replaced to generate a negative triplet (h', r, t), the entity h' ≠ h that minimizes dis(h, h') is chosen; when the tail entity is replaced to generate a negative triplet (h, r, t'), the entity t' ≠ t that minimizes dis(t, t') is chosen;
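Similarity-based selection of the replacement entity can be sketched as a nearest-neighbor search. Assumptions: dis(·,·) is taken to be Euclidean distance (the source formula is only an image), embeddings are plain lists in a dictionary, and the optional `known` set of true triples used to skip false negatives is a hypothetical addition; the function names are likewise hypothetical.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_replacement(triple: Triple, emb: Dict[str, List[float]], replace_head: bool,
                        known: Optional[Set[Triple]] = None) -> Optional[Triple]:
    """Corrupt the head (or tail) with the closest other entity in embedding space."""
    h, r, t = triple
    target = h if replace_head else t
    best, best_d = None, math.inf
    for e, vec in emb.items():
        if e == target:
            continue
        cand = (e, r, t) if replace_head else (h, r, e)
        if known is not None and cand in known:
            continue  # skip candidates that are actually true triples
        d = euclidean(emb[target], vec)
        if d < best_d:
            best, best_d = cand, d
    return best
```

Negatives built this way are "hard": a person is replaced by a nearby person and a place by a nearby place, which forces the model to separate semantically similar entities.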
During training, to distinguish correct triples from incorrect ones, the following margin-based loss function is used as the optimization objective:
$L = \sum_{(h,r,t) \in S} \ \sum_{(h',r,t') \in S'} \max\left(0,\ f_r(h,t) + \gamma - f_r(h',t')\right)$
where S is the set of correct triples, S' is the set of incorrect triples, max(x, y) returns the larger of x and y, and γ is the margin between the scores of correct triples and the scores of incorrect triples; the objective therefore separates correct and incorrect triples as far as possible;
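The margin-based loss can be sketched over precomputed scores. This assumes, as in the rest of the method, that a smaller score means a more plausible triplet, and that each correct triplet is paired with the negative triplet generated from it (a common practical choice; the formula itself sums over S and S'). The function name is hypothetical.

```python
from typing import Iterable, Tuple


def margin_loss(score_pairs: Iterable[Tuple[float, float]], gamma: float) -> float:
    """Sum of max(0, f(pos) + gamma - f(neg)) over (pos_score, neg_score) pairs."""
    return sum(max(0.0, pos + gamma - neg) for pos, neg in score_pairs)
```

A pair contributes nothing once the negative scores at least γ worse than the positive; for example, with γ = 1.0 the pair (0.5, 2.0) adds 0, while (1.0, 1.2) adds 0.8.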
When minimizing the objective function L, the following constraints are considered:
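$\forall e \in E, \ \lVert e \rVert_2 \le 1$   (2)
$\forall r \in R, \ \lvert w_r^{\top} d_r \rvert / \lVert d_r \rVert_2 \le \epsilon$, where ε is a small positive constant   (3)
$\forall r \in R, \ \lVert w_r \rVert_2 = 1$   (4)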
Formula (2) ensures that the length of each entity vector is at most 1, formula (3) ensures that the translation vector d_r of relation r lies in the projection hyperplane, and formula (4) ensures that the normal vector of each hyperplane is a unit vector;
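One way to enforce these constraints during training is sketched below: entity vectors are rescaled after each update, normal vectors are renormalized, and the orthogonality condition is turned into a soft penalty that can be added to L. The penalty form max(0, (w_r^T d_r)^2 / ||d_r||^2 − ε^2) follows the original TransH paper and is an assumption here, as are the function names.

```python
import math
from typing import List

Vector = List[float]


def l2(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def clip_entity(e: Vector) -> Vector:
    """Constraint (2): rescale so that ||e||_2 <= 1."""
    n = l2(e)
    return [x / n for x in e] if n > 1.0 else list(e)


def normalize_normal(w: Vector) -> Vector:
    """Constraint (4): make the hyperplane normal a unit vector."""
    n = l2(w)
    return [x / n for x in w]


def orthogonality_violation(w: Vector, d: Vector, eps: float) -> float:
    """Soft penalty for constraint (3): max(0, (w^T d)^2 / ||d||^2 - eps^2)."""
    cos2 = sum(a * b for a, b in zip(w, d)) ** 2 / sum(x * x for x in d)
    return max(0.0, cos2 - eps ** 2)
```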
During training, stochastic gradient descent is used to optimize the objective function, and the best experimental results are obtained by tuning the learning rate, the margin, the embedding dimension, the batch size, and the norm type;
6) Link prediction with the knowledge graph completion model based on hyperplane projection and relational path neighborhood: the goal of link prediction is to predict the missing h or t of a triplet (h, r, t) from the knowledge already in the knowledge base. First, corrupted triples are constructed: for each positive triplet (h, r, t) in the test set, the head or tail entity is removed and replaced in turn by every entity in the entity set. The scores of these corrupted triples are then computed and sorted in ascending order, since a smaller score indicates a more plausible triplet. Finally, the rank of the correct entity is recorded; the task emphasizes the rank of the correct entity rather than only finding the single best entity;
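The ranking step of link prediction can be sketched with any scoring callable. Assumptions: smaller scores are better, ties are resolved in favor of the correct entity (only strictly better candidates push it down), and `entity_rank` is a hypothetical name; a real evaluation would plug in the trained TransH-RPN scoring function.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]


def entity_rank(triple: Triple, entities: List[str],
                score: Callable[[Triple], float], replace_head: bool) -> int:
    """1-based rank of the true head (or tail) among all candidates, ascending score."""
    h, r, t = triple
    true_score = score(triple)
    removed = h if replace_head else t
    better = 0
    for e in entities:
        if e == removed:
            continue
        cand = (e, r, t) if replace_head else (h, r, e)
        if score(cand) < true_score:  # strictly more plausible than the true triplet
            better += 1
    return better + 1
```

Averaging these ranks over the test set (or counting how often the rank is within the top k) turns the per-triplet ranks into a summary metric.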
7) Triplet classification with the knowledge graph completion model based on hyperplane projection and relational path neighborhood: triplet classification determines whether a given triplet (h, r, t) is correct, which is a binary classification task. This evaluation requires negative samples; however, all data in an existing knowledge graph is regarded as correct, so a negative sample set must be constructed such that the ratio of positive to negative samples is 1:1. After the negative sample set is constructed, the scores of all triples are computed with the scoring function from the entity and relation vectors learned by the model, and, during training, the threshold σ_r that gives the maximum classification accuracy on the validation set is determined. Because σ_r is closely tied to the relation, a different threshold is determined for each relation: a triplet (h, r, t) is predicted correct if its score is smaller than σ_r, and incorrect otherwise.
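Choosing the per-relation threshold σ_r can be sketched as a search over candidate cut points on the validation scores of one relation. Assumptions: candidates are the midpoints between consecutive sorted scores plus one point below and one above all scores, ties in accuracy keep the first candidate found, and the function names are hypothetical.

```python
from typing import List, Tuple


def best_threshold(scored: List[Tuple[float, bool]]) -> float:
    """Pick sigma_r maximizing accuracy of the rule 'score < sigma_r => correct'.

    scored: (score, is_positive) pairs for one relation's validation triplets.
    """
    scores = sorted(s for s, _ in scored)
    candidates = ([scores[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(scores, scores[1:])]
                  + [scores[-1] + 1.0])

    def accuracy(sigma: float) -> float:
        return sum((s < sigma) == pos for s, pos in scored) / len(scored)

    return max(candidates, key=accuracy)


def classify(score: float, sigma_r: float) -> bool:
    """Predict a triplet correct when its score is below the relation's threshold."""
    return score < sigma_r
```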
CN202211648882.3A 2022-12-20 2022-12-20 Knowledge graph completion method based on hyperplane projection and relational path neighborhood Pending CN116186278A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211648882.3A CN116186278A (en) 2022-12-20 2022-12-20 Knowledge graph completion method based on hyperplane projection and relational path neighborhood

Publications (1)

Publication Number Publication Date
CN116186278A true CN116186278A (en) 2023-05-30

Family

ID=86437484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211648882.3A Pending CN116186278A (en) 2022-12-20 2022-12-20 Knowledge graph completion method based on hyperplane projection and relational path neighborhood

Country Status (1)

Country Link
CN (1) CN116186278A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116705338A (en) * 2023-08-08 2023-09-05 中国中医科学院中医药信息研究所 Traditional Chinese medicine multi-mode knowledge graph reasoning method and device based on rules and paths
CN116705338B (en) * 2023-08-08 2023-12-08 中国中医科学院中医药信息研究所 Traditional Chinese medicine multi-mode knowledge graph reasoning method and device based on rules and paths

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination