CN110472233B - Relation similarity measurement method and system based on head-tail entity distribution in knowledge base - Google Patents

Publication number
CN110472233B
Authority
CN
China
Prior art keywords
head
relations
tail entity
divergence
tail
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910639564.2A
Other languages
Chinese (zh)
Other versions
CN110472233A (en)
Inventor
刘知远
陈暐泽
朱昊
韩旭
孙茂松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Application filed by Tsinghua University
Priority to CN201910639564.2A
Publication of CN110472233A
Application granted
Publication of CN110472233B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a relation similarity measurement method and a relation similarity measurement system based on head and tail entity distribution in a knowledge base, wherein the method comprises the following steps: acquiring two relations to be compared; acquiring the head and tail entity distribution corresponding to the two relations respectively; calculating KL divergence between head and tail entity distributions corresponding to the two relations respectively, and determining similarity between the two relations based on the calculated KL divergence. Based on the relation similarity measurement mode of head and tail entity distribution in the knowledge base, the similarity between the two relations can be determined by using the information of the head and tail entities in the knowledge base. Meanwhile, the embodiment of the invention focuses on the distribution of head and tail entities of the two relations, so that the interpretability of the similarity of the two relations is enhanced.

Description

Relation similarity measurement method and system based on head-tail entity distribution in knowledge base
Technical Field
The invention relates to the technical field of natural language processing and knowledge representation, in particular to a method and a system for measuring relational similarity based on head and tail entity distribution in a knowledge base.
Background
To store and process real-world knowledge in a structured manner, and to help computer models achieve better results with the aid of that knowledge, a number of large-scale knowledge graphs have been built, such as Wikidata, DBpedia and YAGO. A knowledge graph takes proper nouns such as person names, place names and organization names, as well as things, as entities, takes the relations among entities as relations, and stores knowledge as triples (ternary relation groups) of the form (head entity, relation, tail entity). For example, the fact "Yao Ming was born in Shanghai" is represented in the knowledge graph by the triple (Yao Ming, born in …, Shanghai).
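As a minimal illustration of this storage format, the sketch below represents facts as (head, relation, tail) triples and groups the head-tail pairs by relation. The class name `Triple`, the helper `group_pairs_by_relation` and the two extra toy facts are assumptions made for illustration; they are not taken from any particular knowledge graph.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact stored as (head entity, relation, tail entity)."""
    head: str
    relation: str
    tail: str


def group_pairs_by_relation(triples):
    """Map each relation to the list of (head, tail) pairs it connects."""
    pairs = defaultdict(list)
    for tr in triples:
        pairs[tr.relation].append((tr.head, tr.tail))
    return dict(pairs)


# Toy facts for illustration; only the first one appears in the text above.
TRIPLES = [
    Triple("Yao Ming", "born in", "Shanghai"),
    Triple("Yao Ming", "citizen of", "China"),
    Triple("Liu Xiang", "born in", "Shanghai"),
]

PAIRS = group_pairs_by_relation(TRIPLES)
```

Grouping by relation is convenient here because the head-tail pairs of a relation are exactly what a head-tail entity distribution is defined over.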
Based on the existing knowledge base, people explore many tasks, such as automatic completion of the knowledge base, relationship extraction and the like. We have found that in these tasks, existing models tend to have difficulty distinguishing similar relationships. If the similarity between the relations can be measured, the ability of the model for distinguishing the similar relations can be strengthened in a targeted mode in the training process of the model, and therefore the ability of the model is enhanced.
Disclosure of Invention
The embodiment of the invention provides a relation similarity measurement method and a relation similarity measurement system based on head and tail entity distribution in a knowledge base, which are used to solve the problem that existing ways of measuring the similarity between entity relations are unsatisfactory, to better quantify the similarity between relations in the knowledge base, and to ensure that the similarity produced by the measure agrees closely with human judgments of similarity.
The embodiment of the invention provides a relation similarity measurement method based on head and tail entity distribution in a knowledge base, which comprises the following steps:
acquiring two relations to be compared;
acquiring the head and tail entity distribution corresponding to the two relations respectively;
calculating KL divergence between head and tail entity distributions corresponding to the two relations respectively, and determining similarity between the two relations based on the calculated KL divergence.
Further, the step of calculating the KL divergence between the distributions of the head and tail entities corresponding to the two relationships further includes:
based on Monte Carlo simulation, calculating KL divergence between head and tail entity distributions corresponding to the two relations.
Further, the step of obtaining two relationships to be compared further includes:
defining the distribution of the ternary relationship group and defining the calculation mode of the distribution of the ternary relationship group.
Further, the step of defining the distribution of the set of ternary relationships and defining the calculation mode of the distribution of the set of ternary relationships further includes:
and calculating optimization model parameters, and optimizing the distribution of the ternary relationship group based on the optimization model parameters.
Further, the step of calculating the KL divergence between the head and tail entity distributions corresponding to the two relationships based on the monte carlo simulation further includes:
calculating the KL divergence between the head and tail entity distributions corresponding to the two relations based on the following formula:
$$D_{\mathrm{KL}}\big(P_{\theta^{*}}(h,t\mid r_1)\,\|\,P_{\theta^{*}}(h,t\mid r_2)\big) \approx \frac{1}{|\mathcal{S}|}\sum_{(h,t)\in\mathcal{S}}\log\frac{P_{\theta^{*}}(h,t\mid r_1)}{P_{\theta^{*}}(h,t\mid r_2)}$$
wherein $D_{\mathrm{KL}}(\cdot\,\|\,\cdot)$ denotes the KL divergence; $P_{\theta^{*}}(h,t\mid r_1)$ denotes the head and tail entity distribution corresponding to the relation $r_1$, and $P_{\theta^{*}}(h,t\mid r_2)$ denotes the head and tail entity distribution corresponding to the relation $r_2$; $h$ and $t$ are respectively the head entity and the tail entity of a ternary relation group corresponding to the relation; $\theta^{*}$ is the model parameter; and $\mathcal{S}$ is a set of head and tail entity pairs sampled from $P_{\theta^{*}}(h,t\mid r_1)$.
Further, $\theta^{*}$ satisfies the following condition:
$$\theta^{*} = \arg\max_{\theta}\prod_{(h,r,t)\in\mathcal{G}} P_{\theta}(h,r,t)$$
wherein $\mathcal{G}$ is the set of relational triples, $\varepsilon$ is the set of entities, and $\mathcal{R}$ is the set of relations; $\theta$ denotes the model parameters before optimization.
The embodiment of the invention provides a relation similarity measurement system based on head and tail entity distribution in a knowledge base, which comprises:
the acquisition module is used for acquiring two relations to be compared;
the acquisition module is further used for acquiring the head and tail entity distribution corresponding to the two relations respectively;
and the calculation module is used for calculating KL divergence between head and tail entity distributions corresponding to the two relations and determining the similarity between the two relations based on the calculated KL divergence.
Further, the calculation module is further configured to:
based on Monte Carlo simulation, calculating KL divergence between head and tail entity distributions corresponding to the two relations.
An embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the relationship similarity measurement method according to any one of the above descriptions when executing the computer program.
Embodiments of the present invention provide a non-transitory computer readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the relationship similarity measure method according to any one of the above-mentioned methods.
With the method and system for measuring relation similarity based on head-tail entity distribution in a knowledge base provided by the embodiment of the invention, the similarity between two relations can be determined from the information about head and tail entities in the knowledge base. Because the measure focuses on the head-tail entity distributions of the two relations, the resulting similarity is also easier to interpret. The method and the system provided by the embodiment of the invention can be applied directly to real-world tasks: for example, they can help an open-domain relation extraction task merge redundant relations, and they can serve as a component of a heuristic algorithm for optimizing relation extraction models and relation prediction models.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flowchart of an embodiment of the relation similarity measurement method based on head-tail entity distribution in a knowledge base according to the present invention;
FIG. 2 is a schematic structural diagram of an embodiment of the relation similarity measurement system based on head-tail entity distribution in a knowledge base according to the present invention;
FIG. 3 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to solve at least one technical problem in the prior art, an embodiment of the present invention provides a method for measuring relationship similarity. As shown in fig. 1, the method for measuring relationship similarity generally includes the following steps:
Step S1: two relations to be compared are acquired.
Step S2: the head and tail entity distributions corresponding to the two relations are acquired.
Step S3: the KL divergence between the head and tail entity distributions corresponding to the two relations is calculated, and the similarity between the two relations is determined based on the calculated KL divergence.
It should be noted that the triples corresponding to the relations to be compared are predefined, and the expression for calculating the head-tail entity distribution of each relation is also defined in advance. These definitions may follow methods in the prior art and are not specifically limited by the embodiment of the present invention.
It is assumed that the similarity between the distributions of the head and tail entities corresponding to the two relationships reflects the similarity between the two relationships, that is, if the distributions of the head and tail entities corresponding to the two relationships are very similar, the embodiment of the present invention reasonably considers that the two relationships are also very similar. Therefore, the embodiment of the invention measures the similarity based on the Kullback-Leibler divergence (KL divergence) between the head and tail entity distributions corresponding to the two relations.
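To make the idea concrete, the following minimal sketch computes the exact KL divergence between two small discrete head-tail distributions. The distributions `P_R1` and `P_R2` and the entity names are hypothetical toy values, and the function `kl_divergence` is an illustrative helper, not the implementation of the invention.

```python
import math


def kl_divergence(p, q):
    """Exact KL divergence D_KL(p || q) for two discrete distributions.

    p and q are dicts mapping (head, tail) pairs to probabilities. If some
    pair has p > 0 but q == 0, the divergence is infinite.
    """
    total = 0.0
    for pair, p_val in p.items():
        if p_val == 0.0:
            continue  # a term with p == 0 contributes nothing
        q_val = q.get(pair, 0.0)
        if q_val == 0.0:
            return math.inf
        total += p_val * math.log(p_val / q_val)
    return total


# Hypothetical head-tail distributions for two relations over the same pairs.
P_R1 = {("a", "x"): 0.5, ("a", "y"): 0.3, ("b", "x"): 0.2}
P_R2 = {("a", "x"): 0.4, ("a", "y"): 0.4, ("b", "x"): 0.2}

KL_12 = kl_divergence(P_R1, P_R2)  # small positive value: similar distributions
KL_11 = kl_divergence(P_R1, P_R1)  # zero: identical distributions
```

A divergence of zero means the two relations place the same probability on every head-tail pair; larger values indicate less similar relations.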
Compared with the prior art, the embodiment of the invention provides a new relation similarity measure based on head-tail entity distribution in the knowledge base, so that the similarity between two relations can be determined from the information about head and tail entities in the knowledge base. Because the measure focuses on the head-tail entity distributions of the two relations, the resulting similarity is also easier to interpret. The method provided by the embodiment of the invention can be applied directly to real-world tasks: for example, it can help an open-domain relation extraction task merge redundant relations, and it can serve as a component of a heuristic algorithm for optimizing relation extraction models and relation prediction models.
On the basis of the foregoing embodiment of the present invention, a method for measuring relationship similarity is provided, where the step of calculating KL divergence between distributions of head and tail entities corresponding to two relationships further includes: based on Monte Carlo simulation, calculating KL divergence between the two head-tail entity distributions.
In the prior art, computing the KL divergence over the full head-tail entity distributions of two relations requires enumerating all head-tail entity pairs, which is very resource-consuming; the embodiment of the present invention therefore estimates the KL divergence by Monte Carlo approximation.
The Monte Carlo method, also called the statistical simulation method or the statistical test method, is a numerical simulation method that takes probabilistic phenomena as its object of study: an unknown quantity is estimated from statistics obtained by random sampling. The method is named after Monte Carlo, the famous casino district of Monaco, to indicate its reliance on random sampling. It is therefore suitable for computational simulation tests on discrete systems. In computational simulation, the stochastic behavior of a system can be simulated by constructing a probabilistic model that approximates the behavior of the system and running random trials on a digital computer.
Specifically, the KL divergence is estimated as:
$$D_{\mathrm{KL}}\big(P_{\theta^{*}}(h,t\mid r_1)\,\|\,P_{\theta^{*}}(h,t\mid r_2)\big) \approx \frac{1}{|\mathcal{S}|}\sum_{(h,t)\in\mathcal{S}}\log\frac{P_{\theta^{*}}(h,t\mid r_1)}{P_{\theta^{*}}(h,t\mid r_2)}$$
wherein $D_{\mathrm{KL}}(\cdot\,\|\,\cdot)$ denotes the KL divergence; $P_{\theta^{*}}(h,t\mid r_1)$ denotes the head and tail entity distribution corresponding to the relation $r_1$, and $P_{\theta^{*}}(h,t\mid r_2)$ denotes the head and tail entity distribution corresponding to the relation $r_2$; $h$ and $t$ are respectively the head entity and the tail entity of a ternary relation group corresponding to the relation; $\theta^{*}$ is the model parameter; and $\mathcal{S}$ is a set of head and tail entity pairs sampled from $P_{\theta^{*}}(h,t\mid r_1)$.
$\mathcal{G}$ is the set of relational triples on which the model is trained, $\varepsilon$ is the set of entities, and $\mathcal{R}$ is the set of relations.
Compared with the prior art, the embodiment of the invention provides a new relation similarity measurement mode based on head and tail entity distribution in a knowledge base, and the KL divergence between all the head and tail entity distributions of two relations is estimated based on Monte Carlo approximation, so that the calculation resources are saved, and the calculation rate is improved.
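The Monte Carlo estimate described above can be sketched as follows: head-tail pairs are sampled from the distribution of the first relation, and the log-ratio of the two probabilities is averaged over the samples. The toy distributions, the sample size and the helper names (`sample_pairs`, `mc_kl`, `exact_kl`) are assumptions for illustration; in the invention the probabilities would come from the trained model rather than from a lookup table.

```python
import math
import random


def sample_pairs(dist, n, rng):
    """Draw n (head, tail) pairs from a discrete distribution given as a dict."""
    pairs = list(dist)
    weights = [dist[p] for p in pairs]
    return rng.choices(pairs, weights=weights, k=n)


def mc_kl(p, q, n, rng):
    """Monte Carlo estimate of D_KL(p || q): mean log-ratio over samples from p."""
    samples = sample_pairs(p, n, rng)
    return sum(math.log(p[s] / q[s]) for s in samples) / len(samples)


def exact_kl(p, q):
    """Exact D_KL(p || q) by enumerating the whole support (for comparison)."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0.0)


# Hypothetical distributions standing in for P(h, t | r1) and P(h, t | r2).
P_R1 = {("a", "x"): 0.5, ("a", "y"): 0.3, ("b", "x"): 0.2}
P_R2 = {("a", "x"): 0.4, ("a", "y"): 0.4, ("b", "x"): 0.2}

RNG = random.Random(0)  # fixed seed so the sketch is reproducible
ESTIMATE = mc_kl(P_R1, P_R2, 20000, RNG)
EXACT = exact_kl(P_R1, P_R2)
```

The cost of the estimate grows with the number of samples rather than with the number of possible head-tail pairs, which is the source of the saving claimed above.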
On the basis of any one of the above embodiments of the present invention, there is provided a method for measuring relationship similarity, where the step of obtaining two relationships to be compared further includes:
defining the distribution of the ternary relationship group and defining the calculation mode of the distribution of the ternary relationship group.
First, the distribution of the ternary relation groups is defined.
A ternary relation group can be written as $(h, r, t)$, where $h$ is the head entity, $t$ is the tail entity and $r$ is the relation between the two, with $h, t \in \varepsilon$ and $r \in \mathcal{R}$, where $\varepsilon$ is the set of entities and $\mathcal{R}$ is the set of relations. First consider a function $F_{\theta}$ that maps every ternary relation group to a scalar. In particular, one can define
$$F_{\theta}: \varepsilon\times\mathcal{R}\times\varepsilon \to \mathbb{R}.$$
Further, $F_{\theta}$ is used to define an unnormalized probability:
$$\tilde{P}_{\theta}(h,r,t) = \exp F_{\theta}(h,r,t).$$
In the embodiment of the invention, only a locally normalized version of $F_{\theta}$ is considered:
$$F_{\theta}(h,r,t) = f_{\theta}(r) + f_{\theta}(h\mid r) + f_{\theta}(t\mid h,r),$$
$$f_{\theta}(h\mid r) = u_{\theta}(h;r) - \log\sum_{h'\in\varepsilon}\exp u_{\theta}(h';r),$$
$$f_{\theta}(t\mid h,r) = u_{\theta}(t;h,r) - \log\sum_{t'\in\varepsilon}\exp u_{\theta}(t';h,r),$$
where $u_{\theta}(h;r)$ and $u_{\theta}(t;h,r)$ can be computed directly by a feed-forward neural network. With this local normalization, $\tilde{P}_{\theta}(h,r,t)$ is naturally a valid probability distribution, because $\sum_{h,r,t}\exp F_{\theta}(h,r,t) = 1$, and thus
$$P_{\theta}(h,r,t) = \exp F_{\theta}(h,r,t).$$
Second, the way in which the distribution of the ternary relation groups is calculated is defined.
For the first part $f_{\theta}(r)$, each relation is given its own parameter, and this parameter is taken as the log probability, i.e.:
$$f_{\theta}(r) = \theta_{1}(r)$$
where $\theta_{1}(r)$ is the parameter corresponding to the relation $r$.
The second and third parts are calculated by multilayer perceptrons:
$$u_{\theta}(h;r) = \mathrm{MLP}_{\theta_{2}}(\mathbf{r})^{\top}\mathbf{h},$$
$$u_{\theta}(t;h,r) = \mathrm{MLP}_{\theta_{3}}([\mathbf{h};\mathbf{r}])^{\top}\mathbf{t},$$
where each MLP denotes a multilayer perceptron whose layers each have the form $y = \mathrm{ReLU}(Wx + b)$, and $\mathbf{h}$, $\mathbf{r}$ and $\mathbf{t}$ are the vector representations of $h$, $r$ and $t$.
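The three-part calculation described above (a per-relation log-probability parameter plus two perceptron-scored parts) can be illustrated with the small pure-Python sketch below. Everything specific in it is an assumption made for illustration: the class name `LocalModel`, the random untrained weights, the single-layer perceptrons, the dot-product scoring of entity vectors, and the log-softmax applied to the relation parameters so that they form valid log probabilities for arbitrary values.

```python
import math
import random


def relu_layer(weights, bias, x):
    """One perceptron layer y = relu(Wx + b), with W given as a list of rows."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def random_matrix(rows, cols, rng):
    """Matrix of uniform random values in [-1, 1], as a list of rows."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class LocalModel:
    """Toy locally normalized model: log P(h,r,t) = log P(r) + log P(h|r) + log P(t|h,r)."""

    def __init__(self, n_entities, n_relations, dim, seed=0):
        rng = random.Random(seed)
        self.ent = random_matrix(n_entities, dim, rng)   # entity vectors
        self.rel = random_matrix(n_relations, dim, rng)  # relation vectors
        self.theta1 = [rng.uniform(-1.0, 1.0) for _ in range(n_relations)]
        self.w2, self.b2 = random_matrix(dim, dim, rng), [0.0] * dim
        self.w3, self.b3 = random_matrix(dim, 2 * dim, rng), [0.0] * dim

    def _entity_log_probs(self, query):
        # Score every entity by a dot product with the query, then normalize.
        scores = [sum(q * e for q, e in zip(query, vec)) for vec in self.ent]
        return log_softmax(scores)

    def log_prob(self, h, r, t):
        log_p_r = log_softmax(self.theta1)[r]
        head_query = relu_layer(self.w2, self.b2, self.rel[r])
        log_p_h = self._entity_log_probs(head_query)[h]
        # List concatenation builds the joint input [h; r] for the tail part.
        tail_query = relu_layer(self.w3, self.b3, self.ent[h] + self.rel[r])
        log_p_t = self._entity_log_probs(tail_query)[t]
        return log_p_r + log_p_h + log_p_t


MODEL = LocalModel(n_entities=4, n_relations=3, dim=5)
# Because each part is normalized locally, the joint probabilities sum to one.
TOTAL = sum(math.exp(MODEL.log_prob(h, r, t))
            for h in range(4) for r in range(3) for t in range(4))
```

The point of the sketch is the normalization property: no global partition function over all triples has to be computed, since each of the three factors is already a proper distribution.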
On the basis of any of the above embodiments of the present invention, there is provided a method for measuring relationship similarity, where the step of defining the distribution of the set of ternary relationships and the calculation manner of the distribution of the set of ternary relationships further includes: and calculating optimization model parameters, and optimizing the distribution of the ternary relationship group based on the optimization model parameters.
In some embodiments, the joint probability of the training set is maximized, i.e., a set of model parameters $\theta^{*}$ is sought such that:
$$\theta^{*} = \arg\max_{\theta}\prod_{(h,r,t)\in\mathcal{G}} P_{\theta}(h,r,t)$$
wherein $\mathcal{G}$ is the set of relational triples, $\varepsilon$ is the set of entities, and $\mathcal{R}$ is the set of relations; $\theta$ denotes the model parameters before optimization.
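As a small illustration of maximizing the joint probability of a training set, the sketch below fits only the relation part of such a model by gradient ascent on the mean log-likelihood. The softmax parameterization, the step size, the number of steps and the toy list `TRAIN_RELATIONS` are assumptions of this sketch; the source does not state which optimizer is used.

```python
import math


def softmax(params):
    """Numerically stable softmax over a list of parameters."""
    m = max(params)
    exps = [math.exp(p - m) for p in params]
    z = sum(exps)
    return [e / z for e in exps]


def fit_relation_prior(relation_ids, n_relations, steps=2000, lr=1.0):
    """Gradient ascent on the mean log-likelihood of the observed relation ids."""
    counts = [0] * n_relations
    for r in relation_ids:
        counts[r] += 1
    n = len(relation_ids)
    theta = [0.0] * n_relations
    for _ in range(steps):
        probs = softmax(theta)
        # Gradient w.r.t. theta_k: empirical frequency minus model probability.
        theta = [th + lr * (c / n - p) for th, c, p in zip(theta, counts, probs)]
    return softmax(theta)


# Hypothetical training set: the relation ids of ten triples.
TRAIN_RELATIONS = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
FITTED = fit_relation_prior(TRAIN_RELATIONS, n_relations=3)
```

For this simple categorical part, the maximum-likelihood solution is just the empirical relation frequencies, which gives an easy check that the optimization behaves as expected.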
On the basis of any of the above embodiments of the present invention, there is provided a method for measuring relationship similarity,
the step of calculating the KL divergence between the head and tail entity distributions corresponding to the two relationships based on the Monte Carlo simulation further includes:
calculating the KL divergence between the head and tail entity distributions corresponding to the two relations based on the following formula:
$$D_{\mathrm{KL}}\big(P_{\theta^{*}}(h,t\mid r_1)\,\|\,P_{\theta^{*}}(h,t\mid r_2)\big) \approx \frac{1}{|\mathcal{S}|}\sum_{(h,t)\in\mathcal{S}}\log\frac{P_{\theta^{*}}(h,t\mid r_1)}{P_{\theta^{*}}(h,t\mid r_2)}$$
wherein $D_{\mathrm{KL}}(\cdot\,\|\,\cdot)$ denotes the KL divergence; $P_{\theta^{*}}(h,t\mid r_1)$ denotes the head and tail entity distribution corresponding to the relation $r_1$, and $P_{\theta^{*}}(h,t\mid r_2)$ denotes the head and tail entity distribution corresponding to the relation $r_2$; $h$ and $t$ are respectively the head entity and the tail entity of a ternary relation group corresponding to the relation; $\theta^{*}$ is the model parameter; and $\mathcal{S}$ is a set of head and tail entity pairs sampled from $P_{\theta^{*}}(h,t\mid r_1)$.
It is assumed that the distance between the distributions of the head and tail entities corresponding to the two relationships reflects the similarity of the two relationships, i.e. if the distributions of the head and tail entities corresponding to the two relationships are very similar, the two relationships are reasonably considered to be also very similar. Therefore, the similarity between two relationships can be defined based on the Kullback-Leibler divergence (KL divergence) between the distributions of head and tail entities corresponding to the two relationships:
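$$S(r_1, r_2) = g\Big(D_{\mathrm{KL}}\big(P_{\theta^{*}}(h,t\mid r_1)\,\|\,P_{\theta^{*}}(h,t\mid r_2)\big),\; D_{\mathrm{KL}}\big(P_{\theta^{*}}(h,t\mid r_2)\,\|\,P_{\theta^{*}}(h,t\mid r_1)\big)\Big)$$
wherein $D_{\mathrm{KL}}(\cdot\,\|\,\cdot)$ denotes the KL divergence,
$$D_{\mathrm{KL}}\big(P_{\theta^{*}}(h,t\mid r_1)\,\|\,P_{\theta^{*}}(h,t\mid r_2)\big) = \mathbb{E}_{(h,t)\sim P_{\theta^{*}}(h,t\mid r_1)}\log\frac{P_{\theta^{*}}(h,t\mid r_1)}{P_{\theta^{*}}(h,t\mid r_2)},$$
and the divergence in the reverse direction is defined in the same way with the two relations exchanged. The function $g(\cdot,\cdot)$ is a symmetric function, and to be consistent with the meaning of "similarity", $g$ should be monotonically decreasing. In the present invention, $g(x, y) = e^{-\max(x, y)}$.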
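The mapping from the two KL divergences to a similarity score can be sketched in a few lines. The function name `relation_similarity` and the numeric inputs are hypothetical; only the form of g, the exponential of the negated maximum, comes from the text above.

```python
import math


def relation_similarity(kl_12, kl_21):
    """Similarity g(x, y) = exp(-max(x, y)) applied to the two KL divergences."""
    return math.exp(-max(kl_12, kl_21))


# Hypothetical divergence values, for illustration only.
SIM_IDENTICAL = relation_similarity(0.0, 0.0)  # identical distributions
SIM_AB = relation_similarity(0.2, 0.7)
SIM_BA = relation_similarity(0.7, 0.2)         # same value: g is symmetric
```

Because the KL divergence is non-negative, the score lies in (0, 1], equals 1 only when both divergences are zero, and is governed by the larger of the two directions, so a pair of relations is rated similar only if each distribution is well covered by the other.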
Since computing the full head-tail entity distributions of two relations involves $O(|\varepsilon|^{2})$ head-tail entity pairs, which is very resource-consuming, the divergence is estimated using a Monte Carlo approximation. The KL divergence between the two head-tail entity distributions is calculated as:
$$D_{\mathrm{KL}}\big(P_{\theta^{*}}(h,t\mid r_1)\,\|\,P_{\theta^{*}}(h,t\mid r_2)\big) \approx \frac{1}{|\mathcal{S}|}\sum_{(h,t)\in\mathcal{S}}\log\frac{P_{\theta^{*}}(h,t\mid r_1)}{P_{\theta^{*}}(h,t\mid r_2)}$$
wherein $D_{\mathrm{KL}}(\cdot\,\|\,\cdot)$ denotes the KL divergence; $P_{\theta^{*}}(h,t\mid r_1)$ denotes the head and tail entity distribution corresponding to the relation $r_1$, and $P_{\theta^{*}}(h,t\mid r_2)$ denotes the head and tail entity distribution corresponding to the relation $r_2$; $h$ and $t$ are respectively the head entity and the tail entity of a ternary relation group corresponding to the relation; $\theta^{*}$ is the model parameter; and $\mathcal{S}$ is a set of head and tail entity pairs sampled from $P_{\theta^{*}}(h,t\mid r_1)$.
On the basis of any one of the above embodiments of the invention, $\theta^{*}$ satisfies the following condition:
$$\theta^{*} = \arg\max_{\theta}\prod_{(h,r,t)\in\mathcal{G}} P_{\theta}(h,r,t)$$
wherein $\mathcal{G}$ is the set of relational triples, $\varepsilon$ is the set of entities, and $\mathcal{R}$ is the set of relations; $\theta$ denotes the model parameters before optimization.
On the basis of any of the above embodiments of the present invention, as shown in fig. 2, there is provided a relationship similarity measurement system, including:
an obtaining module 21, configured to obtain two relationships to be compared;
the obtaining module 21 is further configured to obtain respective head-tail entity distributions corresponding to the two relationships;
and the calculating module 22 is configured to calculate KL divergence between head and tail entity distributions corresponding to the two relationships, and determine similarity between the two relationships based on the calculated KL divergence.
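The division of work between the two modules can be illustrated with the sketch below. The class names, the lookup-table representation of the distributions and the toy relation names are assumptions for illustration, and the KL divergence is computed exactly here rather than by sampling to keep the example short; both distributions are assumed to share the same support.

```python
import math


class AcquisitionModule:
    """Holds head-tail distributions per relation and returns them on request."""

    def __init__(self, distributions):
        # distributions: relation name -> {(head, tail): probability}
        self._distributions = distributions

    def get_distributions(self, r1, r2):
        return self._distributions[r1], self._distributions[r2]


class CalculationModule:
    """Computes both KL divergences exactly and maps them to a similarity."""

    @staticmethod
    def _kl(p, q):
        # Assumes q assigns non-zero probability to every pair in p.
        return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0.0)

    def similarity(self, p1, p2):
        return math.exp(-max(self._kl(p1, p2), self._kl(p2, p1)))


# Hypothetical relations and distributions, for illustration only.
DISTS = {
    "born in": {("a", "x"): 0.5, ("b", "y"): 0.5},
    "place of birth": {("a", "x"): 0.5, ("b", "y"): 0.5},
    "works in": {("a", "x"): 0.1, ("b", "y"): 0.9},
}
ACQ = AcquisitionModule(DISTS)
CALC = CalculationModule()
SAME = CALC.similarity(*ACQ.get_distributions("born in", "place of birth"))
DIFF = CALC.similarity(*ACQ.get_distributions("born in", "works in"))
```

In this toy setting the two relations with identical distributions receive the maximum similarity, while the relation with a different distribution receives a lower score.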
It should be noted that the triples corresponding to the relations to be compared are predefined, and the expression for calculating the head-tail entity distribution of each relation is also defined in advance. These definitions may follow methods in the prior art and are not specifically limited by the embodiment of the present invention.
It is assumed that the similarity between the distributions of the head and tail entities corresponding to the two relationships reflects the similarity between the two relationships, that is, if the distributions of the head and tail entities corresponding to the two relationships are very similar, the embodiment of the present invention reasonably considers that the two relationships are also very similar. Therefore, the embodiment of the invention measures the similarity based on the Kullback-Leibler divergence (KL divergence) between the head and tail entity distributions corresponding to the two relations.
Compared with the prior art, the embodiment of the invention provides a new relation similarity measure based on head-tail entity distribution in the knowledge base, so that the similarity between two relations can be determined from the information about head and tail entities in the knowledge base. Because the measure focuses on the head-tail entity distributions of the two relations, the resulting similarity is also easier to interpret. The system provided by the embodiment of the invention can be applied directly to real-world tasks: for example, it can help an open-domain relation extraction task merge redundant relations, and it can serve as a component of a heuristic algorithm for optimizing relation extraction models and relation prediction models.
On the basis of the foregoing embodiment of the present invention, a relationship similarity measurement system is provided, wherein the calculation module is further configured to: based on Monte Carlo simulation, calculating KL divergence between head and tail entity distributions corresponding to the two relations.
In the prior art, computing the KL divergence over the full head-tail entity distributions of two relations requires enumerating all head-tail entity pairs, which is very resource-consuming; the embodiment of the present invention therefore estimates the KL divergence by Monte Carlo approximation.
The Monte Carlo method, also called the statistical simulation method or the statistical test method, is a numerical simulation method that takes probabilistic phenomena as its object of study: an unknown quantity is estimated from statistics obtained by random sampling. The method is named after Monte Carlo, the famous casino district of Monaco, to indicate its reliance on random sampling. It is therefore suitable for computational simulation tests on discrete systems. In computational simulation, the stochastic behavior of a system can be simulated by constructing a probabilistic model that approximates the behavior of the system and running random trials on a digital computer.
Specifically, the calculation module estimates the KL divergence as:
$$D_{\mathrm{KL}}\big(P_{\theta^{*}}(h,t\mid r_1)\,\|\,P_{\theta^{*}}(h,t\mid r_2)\big) \approx \frac{1}{|\mathcal{S}|}\sum_{(h,t)\in\mathcal{S}}\log\frac{P_{\theta^{*}}(h,t\mid r_1)}{P_{\theta^{*}}(h,t\mid r_2)}$$
wherein $D_{\mathrm{KL}}(\cdot\,\|\,\cdot)$ denotes the KL divergence; $P_{\theta^{*}}(h,t\mid r_1)$ denotes the head and tail entity distribution corresponding to the relation $r_1$, and $P_{\theta^{*}}(h,t\mid r_2)$ denotes the head and tail entity distribution corresponding to the relation $r_2$; $h$ and $t$ are respectively the head entity and the tail entity of a ternary relation group corresponding to the relation; $\theta^{*}$ is the model parameter; and $\mathcal{S}$ is a set of head and tail entity pairs sampled from $P_{\theta^{*}}(h,t\mid r_1)$.
$\mathcal{G}$ is the set of relational triples on which the model is trained, $\varepsilon$ is the set of entities, and $\mathcal{R}$ is the set of relations.
Compared with the prior art, the embodiment of the invention provides a new relation similarity measurement mode based on head and tail entity distribution in a knowledge base, and the KL divergence between all the head and tail entity distributions of two relations is estimated based on Monte Carlo approximation, so that the calculation resources are saved, and the calculation rate is improved.
An example is as follows:
Fig. 3 illustrates the physical structure of an electronic device. As shown in Fig. 3, the electronic device may include a processor 310, a communication interface 320, a memory 330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with one another via the communication bus 340. The processor 310 may call logic instructions in the memory 330 to perform the following method: acquiring two relations to be compared; acquiring the head-tail entity distributions corresponding to the two relations respectively; calculating the KL divergence between the head-tail entity distributions corresponding to the two relations, and determining the similarity between the two relations based on the calculated KL divergence.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, performs the relation similarity measurement method provided in the foregoing embodiments, the method including, for example: acquiring two relations to be compared; acquiring the head-tail entity distributions corresponding to the two relations respectively; calculating the KL divergence between the head-tail entity distributions corresponding to the two relations, and determining the similarity between the two relations based on the calculated KL divergence.
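The three steps recited above (acquire two relations, obtain their head-tail entity distributions, compute the KL divergence and derive a similarity) can be sketched end to end in Python. This is only an illustration under stated assumptions: the distributions are smoothed empirical counts over a toy triple list rather than the output of a model with parameters θ*, and the symmetrization by `max` together with the `exp(-x)` mapping from divergence to similarity is a hypothetical choice, since the text only says that the similarity is determined from the KL divergence. All function and relation names are invented for the example.

```python
import math
from collections import Counter


def head_tail_distribution(triples, relation, all_pairs, smoothing=1.0):
    """Empirical P(h, t | relation) with additive smoothing over all_pairs."""
    counts = Counter((h, t) for h, r, t in triples if r == relation)
    total = sum(counts.values()) + smoothing * len(all_pairs)
    return {pair: (counts[pair] + smoothing) / total for pair in all_pairs}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined on the same set of pairs."""
    return sum(pv * math.log(pv / q[pair]) for pair, pv in p.items())


def relation_similarity(triples, r1, r2):
    """Map the KL divergence between two relations to a score in (0, 1]."""
    all_pairs = sorted({(h, t) for h, _, t in triples})
    p1 = head_tail_distribution(triples, r1, all_pairs)
    p2 = head_tail_distribution(triples, r2, all_pairs)
    # Assumption: take the larger of the two directed divergences, then exp(-x).
    divergence = max(kl_divergence(p1, p2), kl_divergence(p2, p1))
    return math.exp(-divergence)


# Hypothetical knowledge-base triples (head, relation, tail).
triples = [
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("Paris", "located_in", "France"),
    ("Berlin", "located_in", "Germany"),
    ("Lyon", "located_in", "France"),
    ("Curie", "born_in", "Warsaw"),
    ("Einstein", "born_in", "Ulm"),
]

sim_close = relation_similarity(triples, "capital_of", "located_in")
sim_far = relation_similarity(triples, "capital_of", "born_in")
print(f"capital_of vs located_in: {sim_close:.3f}")
print(f"capital_of vs born_in:    {sim_far:.3f}")
```

In this toy knowledge base the two relations that share head-tail pairs receive the higher score, which is the behavior the measure is meant to capture.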
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (6)

1. A relation similarity measurement method, comprising:
acquiring two relations to be compared;
acquiring the head and tail entity distribution corresponding to the two relations respectively;
calculating KL divergence between head and tail entity distributions corresponding to the two relations respectively, and determining similarity between the two relations based on the calculated KL divergence;
wherein, the step of calculating the KL divergence between the head and tail entity distributions corresponding to the two relationships further comprises:
calculating KL divergence between head and tail entity distributions corresponding to the two relations based on Monte Carlo simulation;
the step of calculating the KL divergence between the head and tail entity distributions corresponding to the two relationships based on the Monte Carlo simulation further includes:
calculating the KL divergence between the two head-tail entity distributions based on the following formula:
$$D_{KL}\big(P(h,t\mid r_1;\theta^*)\,\|\,P(h,t\mid r_2;\theta^*)\big)\approx\frac{1}{|\mathcal{S}|}\sum_{(h,t)\in\mathcal{S}}\log\frac{P(h,t\mid r_1;\theta^*)}{P(h,t\mid r_2;\theta^*)}$$
wherein $D_{KL}(\cdot\,\|\,\cdot)$ denotes the KL divergence; $P(h,t\mid r_1;\theta^*)$ denotes the head-tail entity distribution corresponding to relation $r_1$, and $P(h,t\mid r_2;\theta^*)$ denotes the head-tail entity distribution corresponding to relation $r_2$; $h$ and $t$ are respectively the head entity and the tail entity of a triple of the relation; $\theta^*$ denotes the model parameters; and $\mathcal{S}$ is a set of head-tail entity pairs sampled from $P(h,t\mid r_1;\theta^*)$;
said $\theta^*$ satisfies the following condition:
Figure FDA0002806678160000012
wherein $\mathcal{G}$ is the set of relational triples, $\mathcal{E}$ is the set of entities, and $\mathcal{R}$ is the set of relations; and $\theta$ denotes the model parameters before optimization.
2. The relationship similarity metric method according to claim 1, wherein the step of obtaining two relationships to be compared further comprises:
defining the distribution of the ternary relationship group and defining the calculation mode of the distribution of the ternary relationship group.
3. The relationship similarity measurement method according to claim 2, wherein the step of defining the distribution of the set of ternary relationships and defining the calculation manner of the distribution of the set of ternary relationships further comprises:
and calculating optimization model parameters, and optimizing the distribution of the ternary relationship group based on the optimization model parameters.
4. A relational similarity measurement system, comprising:
the acquisition module is used for acquiring two relations to be compared;
the acquisition module is further used for acquiring the head and tail entity distribution corresponding to the two relations respectively;
the calculation module is used for calculating KL divergence between head and tail entity distributions corresponding to the two relations and determining similarity between the two relations based on the calculated KL divergence;
wherein the computing module is further configured to:
calculating KL divergence between head and tail entity distributions corresponding to the two relations based on Monte Carlo simulation;
the step of calculating the KL divergence between the head and tail entity distributions corresponding to the two relationships based on the Monte Carlo simulation further includes:
calculating the KL divergence between the two head-tail entity distributions based on the following formula:
$$D_{KL}\big(P(h,t\mid r_1;\theta^*)\,\|\,P(h,t\mid r_2;\theta^*)\big)\approx\frac{1}{|\mathcal{S}|}\sum_{(h,t)\in\mathcal{S}}\log\frac{P(h,t\mid r_1;\theta^*)}{P(h,t\mid r_2;\theta^*)}$$
wherein $D_{KL}(\cdot\,\|\,\cdot)$ denotes the KL divergence; $P(h,t\mid r_1;\theta^*)$ denotes the head-tail entity distribution corresponding to relation $r_1$, and $P(h,t\mid r_2;\theta^*)$ denotes the head-tail entity distribution corresponding to relation $r_2$; $h$ and $t$ are respectively the head entity and the tail entity of a triple of the relation; $\theta^*$ denotes the model parameters; and $\mathcal{S}$ is a set of head-tail entity pairs sampled from $P(h,t\mid r_1;\theta^*)$;
said $\theta^*$ satisfies the following condition:
Figure FDA0002806678160000023
wherein
Figure FDA0002806678160000024
Is a set of relational triples, epsilon is a set of entities,
Figure FDA0002806678160000025
is a collection of relationships; and theta is a parameter model before optimization.
5. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the relationship similarity measure method according to any of claims 1 to 3 when executing the program.
6. A non-transitory computer readable storage medium, having stored thereon a computer program, which, when being executed by a processor, carries out the steps of the relational similarity measure method according to any one of claims 1 to 3.
CN201910639564.2A 2019-07-16 2019-07-16 Relation similarity measurement method and system based on head-tail entity distribution in knowledge base Active CN110472233B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910639564.2A CN110472233B (en) 2019-07-16 2019-07-16 Relation similarity measurement method and system based on head-tail entity distribution in knowledge base

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910639564.2A CN110472233B (en) 2019-07-16 2019-07-16 Relation similarity measurement method and system based on head-tail entity distribution in knowledge base

Publications (2)

Publication Number Publication Date
CN110472233A CN110472233A (en) 2019-11-19
CN110472233B true CN110472233B (en) 2021-02-12

Family

ID=68508664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910639564.2A Active CN110472233B (en) 2019-07-16 2019-07-16 Relation similarity measurement method and system based on head-tail entity distribution in knowledge base

Country Status (1)

Country Link
CN (1) CN110472233B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111506706B (en) * 2020-04-15 2022-06-17 重庆邮电大学 Relationship similarity based upper and lower meaning relationship forest construction method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103425710A (en) * 2012-05-25 2013-12-04 北京百度网讯科技有限公司 Subject-based searching method and device
CN109446339A (en) * 2018-10-11 2019-03-08 广东工业大学 A kind of knowledge mapping representation method based on multicore Gaussian Profile

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
知识表示学习研究进展;刘知远 等;《计算机研究与发展》;20160229;第247-260页 *

Also Published As

Publication number Publication date
CN110472233A (en) 2019-11-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant