CN116955653A - Knowledge graph data enhancement method and device and electronic equipment - Google Patents

Knowledge graph data enhancement method and device and electronic equipment

Info

Publication number
CN116955653A
Authority
CN
China
Prior art keywords
triplet
existing
rule
distribution
knowledge graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311213014.7A
Other languages
Chinese (zh)
Inventor
张建伟
刘靖楠
姜东基
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Capinfo Co ltd
Original Assignee
Capinfo Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Capinfo Co ltd filed Critical Capinfo Co ltd
Priority to CN202311213014.7A priority Critical patent/CN116955653A/en
Publication of CN116955653A publication Critical patent/CN116955653A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data enhancement method and device for a knowledge graph, and electronic equipment. The method comprises: acquiring a knowledge graph to be processed, in which the relation between entities is a two-way relation and a logic rule is a mapping from a relation path to a relation; acquiring logic rules and data distribution information by random walk; traversing each existing triplet in the knowledge graph to be processed and determining at least one candidate triplet corresponding to each existing triplet; and calculating the score of each candidate triplet to determine the corresponding enhanced triplet, and updating the enhanced triplet into the knowledge graph to be processed. The method defines the logic rules in the knowledge graph to be processed, learns the logic rules and the data distribution information by random walk, and uses what it has learned to enhance the data of the knowledge graph to be processed, so that the information is fully utilized and the accuracy of auxiliary prediction is improved.

Description

Knowledge graph data enhancement method and device and electronic equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and an apparatus for enhancing data of a knowledge graph, and an electronic device.
Background
In recent years, knowledge graphs have developed rapidly as large-scale databases for storing knowledge and, by means of representation learning, are widely used in search engines, dialogue systems, commodity recommendation, various vertical fields, and the like. However, because almost all knowledge graphs are incomplete, existing representation learning methods cannot learn vector representations that fully contain the semantic information of entities, which leads to insufficient information utilization and inaccurate auxiliary prediction when knowledge graphs are applied.
Disclosure of Invention
The invention aims to provide a data enhancement method and device for a knowledge graph and electronic equipment, so as to solve the problems of insufficient information utilization and inaccurate auxiliary prediction of the knowledge graph.
The invention provides a data enhancement method of a knowledge graph, which comprises the following steps: acquiring a knowledge graph to be processed; the relation between the entities in the knowledge graph to be processed is a two-way relation; the logic rule in the knowledge graph to be processed is the mapping from the relation path to the relation; a random walk mode is adopted to acquire logic rules and data distribution information; wherein the data distribution information includes: the distribution of the relation in the knowledge graph to be processed, the distribution of the relation path, the distribution of the head entity and the tail entity under the existing relation path, and the distribution of the rule under the existing triplet; traversing each existing triplet in the knowledge graph to be processed according to the logic rule and the data distribution information, and determining at least one candidate triplet corresponding to the existing triplet aiming at each existing triplet; and calculating the score of each candidate triplet, determining the enhanced triplet corresponding to the existing triplet according to the score of each candidate triplet, and updating the enhanced triplet into the to-be-processed knowledge graph.
Further, the step of obtaining logic rules and data distribution information by adopting a random walk mode comprises the following steps: obtaining logic rules by adopting a random walk mode; traversing each existing triplet in the knowledge graph to be processed, and aiming at each existing triplet, obtaining at least one path starting from a head entity of the existing triplet and ending at a tail entity of the existing triplet by adopting a random walk mode; extracting a rule body from at least one obtained path to obtain the distribution of the rule body under the existing triplet; updating the distribution of the relation in the to-be-processed knowledge graph, the distribution of the relation path and the distribution of the head entity and the tail entity under the existing relation path.
Further, the step of updating the distribution of the relationship in the knowledge graph to be processed, the distribution of the relationship path, and the distribution of the head entity and the tail entity under the existing relationship path includes: based on the distribution of the rule bodies under the existing triplet, updating the frequency of each rule head, the frequency of each rule body, the frequency of each rule head under each rule body, and the frequencies of the head entity and the tail entity under each rule body; and, after every existing triplet has been traversed, normalizing the frequency of each rule head, the frequency of each rule body, the frequency of each rule head under each rule body, and the frequencies of the head entity and the tail entity under each rule body, respectively, to obtain the distribution of the relationship, the distribution of the relationship path, and the distribution of the head entity and the tail entity under the existing relationship path in the knowledge graph to be processed.
Further, for each existing triplet, the step of determining at least one candidate triplet corresponding to the existing triplet includes: querying the distribution of rule bodies under each existing triplet aiming at each existing triplet; querying the distribution of the head entity, the distribution of the tail entity and the distribution of the rule head under the rule body aiming at each rule body under the existing triplet; and forming at least one candidate triplet according to the rule head, the head entity and the tail entity corresponding to all rule bodies under the existing triplet.
Further, the step of calculating a score for each candidate triplet includes:
calculating the score of each candidate triplet using a formula over the quantities listed below. [The formula itself is not reproduced in the source text.]
In that formula, $(x, r, y)$ denotes the existing triplet, where $x$ is the head entity of the existing triplet, $y$ is its tail entity, and $r$ is the relation between head entity $x$ and tail entity $y$. The candidate triplet consists of a head entity $h$, a tail entity $t$, and the relation between $h$ and $t$. The remaining terms are: the frequency of the $i$-th rule body under the existing triplet; the frequency of the $i$-th rule body under global statistics; the frequency of the $j$-th rule head under the $i$-th rule body; the frequency of the $j$-th rule head under global statistics; and the frequency of the $k$-th head-tail entity pair under the current rule body.
Further, the step of determining an enhanced triplet corresponding to the existing triplet based on the score of each candidate triplet includes: and determining the candidate triples with the highest scores as enhancement triples corresponding to the existing triples.
The invention provides a data enhancement device of a knowledge graph, which comprises: the first acquisition module is used for acquiring a knowledge graph to be processed; the relation between the entities in the knowledge graph to be processed is a two-way relation; the logic rule in the knowledge graph to be processed is the mapping from the relation path to the relation; the second acquisition module is used for acquiring logic rules and data distribution information by adopting a random walk mode; wherein the data distribution information includes: the distribution of the relation in the knowledge graph to be processed, the distribution of the relation path, the distribution of the head entity and the tail entity under the existing relation path, and the distribution of the rule under the existing triplet; the traversal module is used for traversing each existing triplet in the to-be-processed knowledge graph according to the logic rule and the data distribution information, and determining at least one candidate triplet corresponding to the existing triplet aiming at each existing triplet; and the calculation module is used for calculating the score of each candidate triplet, determining the enhanced triplet corresponding to the existing triplet according to the score of each candidate triplet, and updating the enhanced triplet into the to-be-processed knowledge graph.
Further, the second acquisition module is further configured to: obtaining logic rules by adopting a random walk mode; traversing each existing triplet in the knowledge graph to be processed, and aiming at each existing triplet, obtaining at least one path starting from a head entity of the existing triplet and ending at a tail entity of the existing triplet by adopting a random walk mode; extracting a rule body from at least one obtained path to obtain the distribution of the rule body under the existing triplet; updating the distribution of the relation in the to-be-processed knowledge graph, the distribution of the relation path and the distribution of the head entity and the tail entity under the existing relation path.
The invention provides an electronic device, which comprises a processor and a memory, wherein the memory stores machine executable instructions which can be executed by the processor, and the processor executes the machine executable instructions to realize the data enhancement method of the knowledge graph of any one of the above.
The present invention provides a machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement a data enhancement method of a knowledge-graph of any of the above.
The invention provides a data enhancement method and device for a knowledge graph and electronic equipment, and the knowledge graph to be processed is obtained; the relation between the entities in the knowledge graph to be processed is a two-way relation; the logic rule in the knowledge graph to be processed is the mapping from the relation path to the relation; a random walk mode is adopted to acquire logic rules and data distribution information; wherein the data distribution information includes: the distribution of the relation in the knowledge graph to be processed, the distribution of the relation path, the distribution of the head entity and the tail entity under the existing relation path, and the distribution of the rule under the existing triplet; traversing each existing triplet in the knowledge graph to be processed according to the logic rule and the data distribution information, and determining at least one candidate triplet corresponding to the existing triplet aiming at each existing triplet; and calculating the score of each candidate triplet, determining the enhanced triplet corresponding to the existing triplet according to the score of each candidate triplet, and updating the enhanced triplet into the to-be-processed knowledge graph. The method defines the logic rules in the knowledge graph to be processed, learns the logic rules and the data distribution information in the knowledge graph to be processed by adopting a random walk mode, and enhances the data of the knowledge graph to be processed by utilizing the learned logic rules and the data distribution information, so that the purposes of fully utilizing the information and improving the auxiliary prediction accuracy can be achieved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a data enhancement method of a knowledge graph according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a random walk algorithm according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a rule learning algorithm according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a data enhancement device for a knowledge graph according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be clearly and completely described in connection with the embodiments, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Currently, a knowledge graph is used as a structured database for storing knowledge, and its basic constituent elements are entities and the relationships between entities. The basic form of knowledge organization in a knowledge graph is the triplet, namely (head entity, relation, tail entity). Each triplet represents a piece of knowledge in the real world; for example, (Zhou Shuren, author, A Madman's Diary) represents that Zhou Shuren is the author of the work A Madman's Diary. Knowledge graphs are widely applied in many internet information fields, such as semantic search, semantic question answering, and commodity recommendation. These applications are all based on representation learning, which converts entities and relations into semantic vectors that serve downstream tasks. However, because knowledge graphs are generally incomplete, existing representation learning algorithms for knowledge graphs cannot fully learn the semantic information contained in each entity, which leads to problems such as insufficient information utilization and low accuracy in application. Based on the above, the embodiments of the invention provide a data enhancement method and device for a knowledge graph and electronic equipment; the technology can be applied wherever a knowledge graph requires data enhancement.
For the convenience of understanding the present embodiment, first, a data enhancement method for a knowledge graph disclosed in the present embodiment is described, as shown in fig. 1, where the method includes the following steps:
Step S102, acquiring a knowledge graph to be processed; the relation between the entities in the knowledge graph to be processed is a two-way relation; the logic rule in the knowledge graph to be processed is the mapping from the relationship path to the relationship.
The knowledge graph to be processed is essentially a semantic network for revealing the relationship between entities, and can formally describe things in the real world and the interrelationships thereof; in actual implementation, the knowledge graph to be processed can be converted into graph data, wherein the entities in the knowledge graph to be processed are used as nodes of the graph, and the relationships among the entities are used as edges of the graph. As the edges in the to-be-processed knowledge graph are directed edges, the embodiment of the disclosure adds reverse edges to all edges on the graph, namely the relationship between the entities is a two-way relationship, so that the information of the to-be-processed knowledge graph can be richer.
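The conversion described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_bidirectional_graph`, the adjacency-list layout, and the `inv_` prefix used to label reverse edges are all assumptions made for the example.

```python
from collections import defaultdict


def build_bidirectional_graph(triples, inverse_prefix="inv_"):
    """Map each entity to a list of (relation, neighbor) edges.

    For every directed triple (head, relation, tail), a reverse edge from
    tail to head is also added, so each relationship becomes two-way.
    The "inv_" label for reverse edges is a hypothetical naming scheme.
    """
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
        graph[tail].append((inverse_prefix + relation, head))
    return dict(graph)


# Toy data: entities "a", "b", "c" and relations "r", "p" are placeholders.
triples = [("a", "r", "b"), ("a", "p", "c")]
graph = build_bidirectional_graph(triples)
```

Keeping reverse edges under a distinct label lets a later walk tell a relation from its inverse; whether the patent labels them this way is not stated.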
In this embodiment, a logic rule in the knowledge graph to be processed is defined as a mapping from a relationship path to a relationship. The mapping may be one-to-one or one-to-many; in general it is one-to-many, that is, one relationship path corresponds to a plurality of relationships. Each relationship has a corresponding score that indicates the probability that the relationship is true: the higher the score, the more likely the relationship is true, and the lower the score, the less likely it is true. The logic rules of the knowledge graph to be processed are abstracted from the facts in the knowledge graph to be processed, and are formally defined as:
[The formal definition is not reproduced in the source text.] In that definition, one side is called the rule body and the other the rule head; the relation symbols denote relations in the knowledge graph to be processed, and the entity symbols denote abstract sets of entities in a relation. The facts in the knowledge graph are instances of such relations. The rule body corresponds to a path between the head entity and the tail entity of the rule head.
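For orientation only, rules of this kind (a relation path implying a relation) are commonly written in Horn-clause form. The expression below is a sketch in assumed notation: the symbols $r_1, \dots, r_n$, $r$, $X$, $Y$, and $Z_i$ are not taken from the patent and may differ from its own definition.

```latex
r_1(X, Z_1) \wedge r_2(Z_1, Z_2) \wedge \dots \wedge r_n(Z_{n-1}, Y) \;\rightarrow\; r(X, Y)
```

Read this way, the conjunction on the left is the rule body (a relation path from $X$ to $Y$) and $r(X, Y)$ on the right is the rule head.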
Step S104, a random walk mode is adopted to acquire logic rules and data distribution information; wherein the data distribution information includes: the distribution of the relationship in the knowledge graph to be processed, the distribution of the relationship path, the distribution of the head entity and the tail entity under the existing relationship path, and the distribution of the rule body under the existing triplet.
Random walk refers to randomly selecting the next position according to probability distribution, namely randomly walking in unordered paths. The existing relationship path can be understood as the existing relationship path in the knowledge graph to be processed; the existing triples can be understood as triples which are actually existing in the knowledge graph to be processed; in actual implementation, random walk can be performed in the knowledge graph to be processed, logic rules in the knowledge graph to be processed are learned, and distribution of relations in the knowledge graph to be processed, distribution of relation paths, distribution of head entities and tail entities under existing relation paths and distribution of rule bodies under existing triples are obtained.
Step S106, traversing each existing triplet in the to-be-processed knowledge graph according to the logic rule and the data distribution information, and determining at least one candidate triplet corresponding to the existing triplet aiming at each existing triplet.
A candidate triplet may be valid or invalid. In actual implementation, each existing triplet in the knowledge graph to be processed is traversed using the learned logic rules and data distribution information to obtain the candidate triplets corresponding to each existing triplet; each existing triplet may correspond to one candidate triplet or to several.
Step S108, calculating the score of each candidate triplet, determining the enhanced triplet corresponding to the existing triplet according to the score of each candidate triplet, and updating the enhanced triplet into the to-be-processed knowledge graph.
The score of a candidate triplet indicates the probability that the candidate triplet is true: the higher the score, the higher that probability, and the lower the score, the lower that probability. In actual implementation, a score may be calculated for each candidate triplet, and the enhanced triplet corresponding to the existing triplet may be determined according to the scores; for example, the candidate triplet with the highest score may be determined as the enhanced triplet. Once obtained, the enhanced triplet is added to the knowledge graph to be processed, which implements the data enhancement of the knowledge graph.
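The selection step can be sketched as below, assuming the candidate scores have already been computed. The scoring formula is not reproduced in this text, so the scores here are made-up placeholders, and the function name `enhance` is hypothetical.

```python
def enhance(existing_triples, scored_candidates):
    """Return the triple set extended with the highest-scoring candidate.

    scored_candidates maps candidate triples to scores; the values are
    placeholders and are NOT computed with the patent's formula.
    """
    updated = set(existing_triples)
    if scored_candidates:
        best = max(scored_candidates, key=scored_candidates.get)
        updated.add(best)
    return updated


existing = {("a", "r", "b")}
scores = {("a", "s", "b"): 0.7, ("c", "r", "d"): 0.2}
updated = enhance(existing, scores)
```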
The data enhancement method of the knowledge graph acquires the knowledge graph to be processed; the relation between the entities in the knowledge graph to be processed is a two-way relation; the logic rule in the knowledge graph to be processed is the mapping from the relation path to the relation; a random walk mode is adopted to acquire logic rules and data distribution information; wherein the data distribution information includes: the distribution of the relation in the knowledge graph to be processed, the distribution of the relation path, the distribution of the head entity and the tail entity under the existing relation path, and the distribution of the rule under the existing triplet; traversing each existing triplet in the knowledge graph to be processed according to the logic rule and the data distribution information, and determining at least one candidate triplet corresponding to the existing triplet aiming at each existing triplet; and calculating the score of each candidate triplet, determining the enhanced triplet corresponding to the existing triplet according to the score of each candidate triplet, and updating the enhanced triplet into the to-be-processed knowledge graph. The method defines the logic rules in the knowledge graph to be processed, learns the logic rules and the data distribution information in the knowledge graph to be processed by adopting a random walk mode, and enhances the data of the knowledge graph to be processed by utilizing the learned logic rules and the data distribution information, so that the purposes of fully utilizing the information and improving the auxiliary prediction accuracy can be achieved.
The embodiment of the invention also provides another data enhancement method of the knowledge graph, which is realized on the basis of the method of the embodiment, and comprises the following steps:
Step one, acquiring a knowledge graph to be processed; the relation between the entities in the knowledge graph to be processed is a two-way relation; the logic rule in the knowledge graph to be processed is the mapping from the relationship path to the relationship.
Step two, acquiring logic rules by adopting a random walk mode.
Step three, traversing each existing triplet in the to-be-processed knowledge graph, and, for each existing triplet, obtaining by random walk at least one path that starts from the head entity of the existing triplet and ends at its tail entity.
Step four, extracting rule bodies from the at least one obtained path to obtain the distribution of the rule bodies under the existing triplet.
In actual implementation, the logic rules in the knowledge graph to be processed can be learned by random walk. For each existing triplet in the knowledge graph to be processed, the purpose of the random walk is to find one or more paths that start from the head entity of the current existing triplet and end at its tail entity; such a path serves as a rule body, and the relation of the current existing triplet serves as the rule head. At each step of the random walk, the probability of moving from the current entity along a relation to each of its tail entities is the same. In the first step, the current existing triplet itself is excluded; in the second and subsequent steps, the walk is not allowed to return to the entity of the previous step. For each existing triplet, this embodiment sets the number of paths to be obtained to M, whose value can be set according to actual requirements. Because a random walk in the knowledge graph may fail to yield M qualifying paths, this embodiment also sets the maximum number of random walks for each edge to M, that is, the loop count for each edge is M. The random walk process of this embodiment is illustrated by the schematic of the random walk algorithm in FIG. 2, which gives pseudocode for performing the random walk for each existing triplet in the knowledge graph.
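The walk constraints described above can be sketched as follows. This is an illustration under stated assumptions rather than the pseudocode of FIG. 2, which is not available in this text: the function name `sample_paths`, the adjacency-list graph layout, and the `max_len` cap on path length are assumptions.

```python
import random


def sample_paths(graph, triple, num_paths, max_attempts, max_len=3, rng=None):
    """Sample relation paths from the triple's head entity to its tail entity.

    graph maps an entity to a list of (relation, neighbor) edges.
    max_len is an assumed cap on path length, not stated in the source.
    """
    rng = rng or random.Random(0)
    head, relation, tail = triple
    paths = []
    for _ in range(max_attempts):
        if len(paths) >= num_paths:
            break
        current, previous, path = head, None, []
        for step in range(max_len):
            edges = graph.get(current, [])
            if step == 0:
                # First step: exclude the existing triplet itself.
                edges = [e for e in edges if e != (relation, tail)]
            else:
                # Later steps: do not walk back to the previous entity.
                edges = [e for e in edges if e[1] != previous]
            if not edges:
                break
            # Every remaining outgoing edge is equally likely.
            rel, nxt = rng.choice(edges)
            path.append(rel)
            previous, current = current, nxt
            if current == tail:
                paths.append(tuple(path))
                break
    return paths


# Toy graph: besides the direct edge a -r-> b, there is a path a -p-> c -q-> b.
toy_graph = {"a": [("r", "b"), ("p", "c")], "c": [("q", "b")]}
paths = sample_paths(toy_graph, ("a", "r", "b"), num_paths=2, max_attempts=5)
```

In the toy graph every walk has exactly one admissible edge at each step, so each attempt recovers the same rule body `("p", "q")`.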
This embodiment learns the logic rules in the knowledge graph to be processed by random walk. Each existing triplet in the knowledge graph is traversed; for the current existing triplet, paths from its head entity to its tail entity are obtained by random walk, rule bodies are sampled from these paths, and the distribution of rule bodies under the current existing triplet is recorded. [The formula giving this distribution, together with its example path length, is not reproduced in the source text.]
In that formula, each rule body obtained by random walk under the current existing triplet is paired with its frequency; the rule bodies are indexed by $k$ up to $p$, where $p$ is the total number of rule bodies under the current triplet. The indices $i$ and $j$ of the relations in a rule body depend on the path length; the example in the original uses $i = 1$ and $j = 2$.
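Recording the rule bodies sampled for one triplet can be sketched as below. Whether the patent stores raw counts or relative frequencies at this stage is not recoverable from the text, so the normalization here is an assumption, as is the function name `rule_body_distribution`.

```python
from collections import Counter


def rule_body_distribution(sampled_paths):
    """Turn the rule bodies sampled under one triplet into relative frequencies."""
    counts = Counter(sampled_paths)
    total = sum(counts.values())
    return {body: n / total for body, n in counts.items()}


# Three sampled paths yield two distinct rule bodies.
dist = rule_body_distribution([("p", "q"), ("p", "q"), ("s",)])
```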
Step five, updating the distribution of the relation in the to-be-processed knowledge graph, the distribution of the relation path and the distribution of the head entity and the tail entity under the existing relation path.
Step five can be specifically realized by the following steps A and B:
Step A, based on the distribution of the rule bodies under the existing triplet, updating the frequency of each rule head, the frequency of each rule body, the frequency of each rule head under each rule body, and the frequencies of the head entity and the tail entity under each rule body.
Step B, after every existing triplet has been traversed, normalizing the frequency of each rule head, the frequency of each rule body, the frequency of each rule head under each rule body, and the frequencies of the head entity and the tail entity under each rule body, respectively, to obtain the distribution of the relation in the knowledge graph to be processed, the distribution of the relation path, and the distribution of the head entity and the tail entity under the existing relation path.
When the distribution of rule bodies under an existing triplet is obtained, the frequency of each rule head, the frequency of each rule body, and the frequency of each rule head under each rule body can be updated at the same time. After all existing triplets have been traversed, the global frequency of each rule head and the global frequency of each rule body are normalized into probability distributions. [The formulas for the following distributions are not reproduced in the source text; only their descriptions survive.]
The distribution of relations pairs each kind of relation in the knowledge graph with its frequency, over the total number of kinds of relations.
The distribution of all relation paths pairs each rule body with its frequency. The indices α and β of the relations in a rule body depend on the path length; the example in the original uses α = 1 and β = 2.
The distribution of head and tail entities under a given relation path pairs each head-tail entity pair under the current relation path with its frequency, over the total number of head-tail entity pairs under the current relation path.
The rule mapping relates rule bodies to rule heads. [Its formula is not reproduced in the source text.] Its terms are the rule head, the distribution of the rule under the rule head, the total number of kinds of rules, and the number of rule heads. For details, refer to the schematic of the rule learning algorithm in FIG. 3. The pseudocode in FIG. 3 traverses the knowledge graph to establish the rule mapping from relation paths to relations, and to obtain the distribution of relations in the knowledge graph, the distribution of relation paths, the distribution of head and tail entities under a given relation path, and the distribution of rule bodies under a given triplet.
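Steps A and B can be sketched as a counting pass followed by normalization. This is not the pseudocode of FIG. 3, which is unavailable here; the function name `learn_statistics`, the dictionary keys, and the choice to count once per sampled path are assumptions.

```python
from collections import Counter, defaultdict


def normalize(counter):
    """Convert raw frequencies into a probability distribution."""
    total = sum(counter.values())
    return {key: n / total for key, n in counter.items()}


def learn_statistics(samples):
    """samples: iterable of ((x, r, y), [rule_body, ...]) pairs."""
    head_freq = Counter()                  # frequency of each rule head (relation)
    body_freq = Counter()                  # frequency of each rule body (relation path)
    heads_under_body = defaultdict(Counter)
    pairs_under_body = defaultdict(Counter)
    # Step A: update all frequencies while traversing the existing triplets.
    for (x, r, y), bodies in samples:
        for body in bodies:
            head_freq[r] += 1
            body_freq[body] += 1
            heads_under_body[body][r] += 1
            pairs_under_body[body][(x, y)] += 1
    # Step B: normalize once the traversal is complete.
    return {
        "relation": normalize(head_freq),
        "path": normalize(body_freq),
        "rule_map": {b: normalize(c) for b, c in heads_under_body.items()},
        "entity_pairs": {b: normalize(c) for b, c in pairs_under_body.items()},
    }


samples = [
    (("a", "r", "b"), [("p", "q"), ("p", "q")]),
    (("c", "s", "d"), [("p", "q")]),
]
stats = learn_statistics(samples)
```

Here `rule_map` plays the role of the one-to-many mapping from a relation path to its candidate relations, each carrying a normalized weight.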
Step six: traverse each existing triplet in the knowledge graph to be processed according to the logic rules and the data distribution information and, for each existing triplet, query the distribution of rule bodies under it.
Step seven: for each rule body under the existing triplet, query the distribution of head entities, the distribution of tail entities, and the distribution of rule heads under that rule body.
Step eight: form at least one candidate triplet from the rule heads, head entities, and tail entities corresponding to all rule bodies under the existing triplet.
That is, the existing triples of the knowledge graph to be processed are traversed. For each existing triplet, the distribution of rule bodies under it is queried; for each rule body, the distributions of head and tail entities and of rule heads under that rule body are queried; and one or more candidate triples are formed from the rule heads and head-tail entities corresponding to all rule bodies under the current existing triplet. Candidate triples are typically absent from the existing triples, so they can add edges that the graph lacks: for example, two entities may have no connection between them, and a candidate triplet can supply that relation. Each existing triplet may yield multiple candidate triples, and whether each candidate is valid must be determined by scoring it with the formula below.
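Steps six to eight can be sketched in Python as follows. The dictionary layouts (`bodies_by_triple`, `heads_by_body`, `pairs_by_body`), the function name, and the sample data are assumptions made for illustration; filtering out candidates that already exist in the graph is likewise an assumption based on the remark that candidates are typically not existing triples.

```python
def candidate_triples(triple, bodies_by_triple, heads_by_body, pairs_by_body, existing):
    """Combine rule heads and head-tail pairs of every rule body under `triple`."""
    candidates = set()
    # Step six: rule bodies observed under this existing triplet.
    for body in bodies_by_triple.get(triple, {}):
        # Step seven: rule heads and head-tail entity pairs under the rule body.
        for head_relation in heads_by_body.get(body, {}):
            for head_entity, tail_entity in pairs_by_body.get(body, {}):
                candidate = (head_entity, head_relation, tail_entity)
                # Assumption: keep only triples that are not already in the graph.
                if candidate not in existing:
                    candidates.add(candidate)
    # Step eight: the collected set holds the candidate triples.
    return candidates


# Hypothetical toy data.
existing = {
    ("alice", "born_in", "paris"),
    ("paris", "city_of", "france"),
    ("bob", "born_in", "lyon"),
    ("lyon", "city_of", "france"),
    ("alice", "nationality", "france"),
}
body = ("born_in", "city_of")
bodies_by_triple = {("alice", "nationality", "france"): {body: 1.0}}
heads_by_body = {body: {"nationality": 1.0}}
pairs_by_body = {body: {("alice", "france"): 0.5, ("bob", "france"): 0.5}}

found = candidate_triples(
    ("alice", "nationality", "france"),
    bodies_by_triple, heads_by_body, pairs_by_body, existing,
)
print(found)
```

In this toy graph the rule body (born_in, city_of) also holds for bob, so the sketch proposes a nationality edge for bob that the graph does not yet contain.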
Step nine, calculating the score of each candidate triplet by adopting the following formula:
where (x, r, y) denotes the existing triplet; x denotes the head entity in the existing triplet; y denotes the tail entity in the existing triplet; r denotes the relation between head entity x and tail entity y; h denotes the head entity in the candidate triplet; and t denotes the tail entity in the candidate triplet. The remaining symbols denote, in order: the relation between head entity h and tail entity t; the frequency of the i-th rule body under the existing triplet; the frequency of the i-th rule body under global statistics; the frequency of the j-th rule head under the i-th rule body; the frequency of the j-th rule head under global statistics; and the frequency of the k-th head-tail entity pair under the current rule body.
As the formula shows, the score of each candidate triplet is calculated in the manner of inverse document frequency. The score is the product of three terms: the score (i.e., frequency) of the rule under the existing triplet, the score (i.e., frequency) of the rule head under the rule, and the score (i.e., frequency) of the head-tail entity pair under the rule.
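The product structure of the score can be sketched in Python. This is only an illustration under stated assumptions, not the patent's exact formula: the IDF-style weight `log(1 / global frequency)`, the function names, and the sample values are all hypothetical.

```python
import math


def idf_weight(global_freq):
    """Assumed IDF-style weight: globally rarer items receive a larger weight."""
    if not 0.0 < global_freq <= 1.0:
        raise ValueError("global_freq must be in (0, 1]")
    return math.log(1.0 / global_freq)


def score_candidate(body_freq_local, body_freq_global,
                    head_freq_under_body, head_freq_global,
                    pair_freq_under_body):
    """Multiply the three terms: rule body, rule head, head-tail entity pair."""
    # Frequency of the rule body under the existing triplet, weighted by rarity.
    body_score = body_freq_local * idf_weight(body_freq_global)
    # Frequency of the rule head under the rule body, weighted by rarity.
    head_score = head_freq_under_body * idf_weight(head_freq_global)
    # Frequency of the head-tail entity pair under the rule body.
    return body_score * head_score * pair_freq_under_body


# Hypothetical frequencies for one candidate triplet.
example_score = score_candidate(0.5, 0.1, 0.8, 0.2, 0.25)
print(example_score)
```

Under this assumed weighting, a rule body that is frequent under the current triplet but rare globally contributes more to the score than one that is common everywhere.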
Step ten: determine the highest-scoring candidate triples as the enhanced triples corresponding to the existing triplet, and update the enhanced triples into the knowledge graph to be processed.
After the scores of all candidate triples have been calculated, the top-ranked candidate triples can be selected as the enhanced triples for the corresponding existing triplet. The enhanced triples are then combined with the existing triples as training data, and the knowledge graph vector representations obtained through training can be used for downstream tasks. The trained vector representations can effectively overcome the incompleteness of the knowledge graph to be processed and thereby improve its effect in the application field.
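Selecting the top-ranked candidates and merging them into the graph can be sketched as follows. The function names, the `top_k` parameter (defaulting to a single best candidate), and the sample scores are assumptions for illustration.

```python
def select_enhanced(scored, top_k=1):
    """Return the `top_k` candidates with the highest scores."""
    ranked = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    return [candidate for candidate, _ in ranked[:top_k]]


def update_graph(existing, scored_by_triple, top_k=1):
    """Merge the enhanced triples of every existing triplet into a new triple set."""
    enhanced = set(existing)
    for scored in scored_by_triple.values():
        enhanced.update(select_enhanced(scored, top_k))
    return enhanced


# Hypothetical scores for the candidates of one existing triplet.
graph = {("alice", "nationality", "france")}
scored_by_triple = {
    ("alice", "nationality", "france"): {
        ("bob", "nationality", "france"): 0.8,
        ("bob", "nationality", "paris"): 0.1,
    }
}
training_triples = update_graph(graph, scored_by_triple)
print(training_triples)
```

The returned set contains both the existing and the enhanced triples, which matches the idea of using their union as training data for a representation model.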
In the data enhancement method of the knowledge graph described above, the logic rules in the knowledge graph to be processed are defined, the logic rules and the data distribution information are learned by random walk, and similar facts are found from the learned logic rules and data distribution information to serve as enhancement data. Learning and applying the logic rules fully mines the latent semantic information in the knowledge graph to be processed, compensates for its incompleteness, and effectively improves the efficiency and accuracy of its application.
The method abstracts the knowledge graph to be processed into graph data, defines the logic rules in it, learns those rules with a random walk over the graph, and uses the learned logic rules to enhance the existing knowledge in the graph. This makes full use of entity semantic information and improves precision, thereby solving the problems of insufficient information utilization and inaccurate auxiliary prediction that incompleteness causes in large-scale knowledge graph applications.
The embodiment of the invention provides a data enhancement device of a knowledge graph, as shown in fig. 4, the device comprises: a first obtaining module 40, configured to obtain a knowledge graph to be processed; the relation between the entities in the knowledge graph to be processed is a two-way relation; the logic rule in the knowledge graph to be processed is the mapping from the relation path to the relation; a second obtaining module 41, configured to obtain logic rules and data distribution information by adopting a random walk manner; wherein the data distribution information includes: the distribution of the relation in the knowledge graph to be processed, the distribution of the relation path, the distribution of the head entity and the tail entity under the existing relation path, and the distribution of the rule under the existing triplet; a traversing module 42, configured to traverse each existing triplet in the to-be-processed knowledge graph according to the logic rule and the data distribution information, and determine, for each existing triplet, at least one candidate triplet corresponding to the existing triplet; and the calculating module 43 is configured to calculate a score of each candidate triplet, determine an enhanced triplet corresponding to the existing triplet according to the score of each candidate triplet, and update the enhanced triplet to the knowledge graph to be processed.
The data enhancement device of the knowledge graph acquires the knowledge graph to be processed; the relation between the entities in the knowledge graph to be processed is a two-way relation; the logic rule in the knowledge graph to be processed is the mapping from the relation path to the relation; a random walk mode is adopted to acquire logic rules and data distribution information; wherein the data distribution information includes: the distribution of the relation in the knowledge graph to be processed, the distribution of the relation path, the distribution of the head entity and the tail entity under the existing relation path, and the distribution of the rule under the existing triplet; traversing each existing triplet in the knowledge graph to be processed according to the logic rule and the data distribution information, and determining at least one candidate triplet corresponding to the existing triplet aiming at each existing triplet; and calculating the score of each candidate triplet, determining the enhanced triplet corresponding to the existing triplet according to the score of each candidate triplet, and updating the enhanced triplet into the to-be-processed knowledge graph. The device defines the logic rules in the knowledge graph to be processed, learns the logic rules and the data distribution information in the knowledge graph to be processed by adopting a random walk mode, and enhances the data of the knowledge graph to be processed by utilizing the learned logic rules and the data distribution information, so that the purposes of fully utilizing the information and improving the auxiliary prediction accuracy can be achieved.
Further, the second obtaining module 41 is further configured to: obtain the logic rules by random walk; traverse each existing triplet in the knowledge graph to be processed and, for each existing triplet, obtain by random walk at least one path that starts from the head entity of the existing triplet and ends at its tail entity; extract rule bodies from the at least one obtained path to obtain the distribution of rule bodies under the existing triplet; and update the distribution of relations in the knowledge graph to be processed, the distribution of relationship paths, and the distribution of head entities and tail entities under the existing relationship paths.
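The path sampling performed by the second obtaining module can be sketched in Python. This is a minimal sketch under assumptions: inverse edges are named with a `^-1` suffix to model the bidirectional relations, walks are capped at `max_len` steps, the direct edge of the triplet itself is not counted as a rule body, and all function names and sample data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_adjacency(triples):
    """Index the graph in both directions (inverse edges get a '^-1' suffix)."""
    adjacency = defaultdict(list)
    for head, relation, tail in triples:
        adjacency[head].append((relation, tail))
        adjacency[tail].append((relation + "^-1", head))
    return adjacency


def sample_rule_bodies(triple, adjacency, max_len=3, n_walks=200, rng=None):
    """Count relation paths from the head entity to the tail entity of `triple`."""
    rng = rng or random.Random(0)
    x, r, y = triple
    bodies = Counter()
    for _ in range(n_walks):
        node, path = x, []
        for _ in range(max_len):
            edges = adjacency.get(node, [])
            if not edges:
                break
            relation, node = rng.choice(edges)
            path.append(relation)
            if node == y:
                # Assumption: the triplet's own edge is not a useful rule body.
                if tuple(path) != (r,):
                    bodies[tuple(path)] += 1
                break
    return bodies


# Hypothetical toy graph.
toy_triples = [
    ("alice", "born_in", "paris"),
    ("paris", "city_of", "france"),
    ("alice", "nationality", "france"),
]
adjacency = build_adjacency(toy_triples)
rule_bodies = sample_rule_bodies(("alice", "nationality", "france"), adjacency)
print(rule_bodies)
```

The resulting counter gives the raw frequencies of rule bodies under the triplet; normalizing them yields the distribution of rule bodies under that triplet.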
Further, the second obtaining module 41 is further configured to: based on the distribution of rule bodies under the existing triplet, update the frequency of each rule head, the frequency of each rule body, the frequency of each rule head under each rule body, and the frequencies of each head entity and each tail entity under each rule body; and, after every existing triplet has been traversed, normalize the frequency of each rule head, the frequency of each rule body, the frequency of each rule head under each rule body, and the frequencies of each head entity and each tail entity under each rule body, to obtain the distribution of relations in the knowledge graph to be processed, the distribution of relationship paths, and the distribution of head entities and tail entities under the existing relationship paths.
Further, the traversing module 42 is further configured to: querying the distribution of rule bodies under each existing triplet aiming at each existing triplet; querying the distribution of the head entity, the distribution of the tail entity and the distribution of the rule head under the rule body aiming at each rule body under the existing triplet; and forming at least one candidate triplet according to the rule head, the head entity and the tail entity corresponding to all rule bodies under the existing triplet.
Further, the calculating module 43 is further configured to: the score for each candidate triplet is calculated using the following formula:
where (x, r, y) denotes the existing triplet; x denotes the head entity in the existing triplet; y denotes the tail entity in the existing triplet; r denotes the relation between head entity x and tail entity y; h denotes the head entity in the candidate triplet; and t denotes the tail entity in the candidate triplet. The remaining symbols denote, in order: the relation between head entity h and tail entity t; the frequency of the i-th rule body under the existing triplet; the frequency of the i-th rule body under global statistics; the frequency of the j-th rule head under the i-th rule body; the frequency of the j-th rule head under global statistics; and the frequency of the k-th head-tail entity pair under the current rule body.
Further, the calculating module 43 is further configured to: and determining the candidate triples with the highest scores as enhancement triples corresponding to the existing triples.
The data enhancement device of the knowledge graph provided by the embodiment of the invention has the same implementation principle and technical effects as the foregoing method embodiment. For brevity, where the device embodiment omits a detail, reference can be made to the corresponding content in the method embodiment.
The embodiment of the present invention further provides an electronic device, as shown in fig. 5, where the electronic device includes a processor 130 and a memory 131, where the memory 131 stores machine executable instructions that can be executed by the processor 130, and the processor 130 executes the machine executable instructions to implement the data enhancement method of the knowledge graph.
Further, the electronic device shown in fig. 5 further includes a bus 132 and a communication interface 133, and the processor 130, the communication interface 133, and the memory 131 are connected through the bus 132.
The memory 131 may include high-speed random access memory (RAM) and may further include non-volatile memory, such as at least one magnetic disk memory. The communication connection between the network element of this system and at least one other network element is implemented via at least one communication interface 133 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, etc. Bus 132 may be an ISA bus, a PCI bus, an EISA bus, or the like. Buses may be classified as address buses, data buses, control buses, etc. For ease of illustration, only one bidirectional arrow is shown in FIG. 5, but this does not mean that there is only one bus or one type of bus.
The processor 130 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 130 or by instructions in the form of software. The processor 130 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Such a processor can implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 131; the processor 130 reads the information in the memory 131 and, in combination with its hardware, performs the steps of the method of the foregoing embodiment.
The embodiment of the invention also provides a machine-readable storage medium storing machine-executable instructions that, when called and executed by a processor, cause the processor to implement the data enhancement method of the knowledge graph. For the specific implementation, refer to the method embodiment; it is not repeated here.
The computer program product of the knowledge graph data enhancement method, device, and electronic equipment provided in the embodiments of the present invention includes a computer-readable storage medium storing program code. The instructions in the program code can be used to execute the method described in the foregoing method embodiments. For the specific implementation, refer to the method embodiments; it is not repeated here.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. A method for enhancing data of a knowledge graph, the method comprising:
acquiring a knowledge graph to be processed; the relationship between the entities in the knowledge graph to be processed is a bidirectional relationship; the logic rule in the knowledge graph to be processed is the mapping from the relation path to the relation;
the logic rule and the data distribution information are acquired by adopting a random walk mode; wherein the data distribution information includes: the distribution of the relation in the knowledge graph to be processed, the distribution of the relation path, the distribution of the head entity and the tail entity under the existing relation path and the distribution of the rule under the existing triplet;
Traversing each existing triplet in the knowledge graph to be processed according to the logic rule and the data distribution information, and determining at least one candidate triplet corresponding to each existing triplet;
and calculating the score of each candidate triplet, determining an enhanced triplet corresponding to the existing triplet according to the score of each candidate triplet, and updating the enhanced triplet into the to-be-processed knowledge graph.
2. The method of claim 1, wherein the step of obtaining the logic rules and data distribution information using a random walk comprises:
the logic rule is obtained by adopting a random walk mode;
traversing each existing triplet in the knowledge graph to be processed, and aiming at each existing triplet, obtaining at least one path starting from a head entity of the existing triplet and ending at a tail entity of the existing triplet by adopting a random walk mode;
extracting rule bodies from the at least one obtained path to obtain the distribution of rule bodies under the existing triplet;
updating the distribution of the relation in the to-be-processed knowledge graph, the distribution of the relation path and the distribution of the head entity and the tail entity under the existing relation path.
3. The method according to claim 2, wherein the step of updating the distribution of the relationships in the knowledge-graph to be processed, the distribution of the relationship paths, the distribution of the head entities and the tail entities under the existing relationship paths comprises:
updating the frequency of each rule head, the frequency of each rule body, the frequency of each rule head under each rule body, and the frequencies of each head entity and each tail entity under each rule body, based on the distribution of the rule bodies under the existing triplet;
after traversing each existing triplet, carrying out normalization processing on the frequency of each rule head, the frequency of each rule body, the frequency of each rule head under each rule body, and the frequencies of each head entity and each tail entity under each rule body respectively, to obtain the distribution of the relation in the knowledge graph to be processed, the distribution of the relation path, and the distribution of the head entity and the tail entity under the existing relation path.
4. The method of claim 1, wherein for each of the existing triples, determining at least one candidate triplet corresponding to the existing triplet comprises:
querying the distribution of rule bodies under each existing triplet aiming at each existing triplet;
Querying the distribution of the head entity, the distribution of the tail entity and the distribution of the rule head under the rule body aiming at each rule body under the existing triplet;
and forming at least one candidate triplet according to the rule head, the head entity and the tail entity corresponding to all rule bodies under the existing triplet.
5. The method of claim 1, wherein the step of calculating a score for each of the candidate triples comprises:
the score for each of the candidate triples is calculated using the following formula:
wherein (x, r, y) denotes the existing triplet; x denotes the head entity in the existing triplet; y denotes the tail entity in the existing triplet; r denotes the relation between head entity x and tail entity y; h denotes the head entity in the candidate triplet; and t denotes the tail entity in the candidate triplet; the remaining symbols denote, in order: the relation between head entity h and tail entity t; the frequency of the i-th rule body under the existing triplet; the frequency of the i-th rule body under global statistics; the frequency of the j-th rule head under the i-th rule body; the frequency of the j-th rule head under global statistics; and the frequency of the k-th head-tail entity pair under the current rule body.
6. The method of claim 1, wherein the step of determining the enhanced triplet corresponding to the existing triplet according to the score of each of the candidate triples comprises:
and determining the candidate triples with the highest scores as enhancement triples corresponding to the existing triples.
7. A data enhancement device for a knowledge graph, the device comprising:
the first acquisition module is used for acquiring a knowledge graph to be processed; the relationship between the entities in the knowledge graph to be processed is a bidirectional relationship; the logic rule in the knowledge graph to be processed is the mapping from the relation path to the relation;
the second acquisition module is used for acquiring the logic rule and the data distribution information by adopting a random walk mode; wherein the data distribution information includes: the distribution of the relation in the knowledge graph to be processed, the distribution of the relation path, the distribution of the head entity and the tail entity under the existing relation path and the distribution of the rule under the existing triplet;
the traversing module is used for traversing each existing triplet in the knowledge graph to be processed according to the logic rule and the data distribution information, and determining at least one candidate triplet corresponding to the existing triplet aiming at each existing triplet;
the calculation module is used for calculating the score of each candidate triplet, determining the enhanced triplet corresponding to the existing triplet according to the score of each candidate triplet, and updating the enhanced triplet into the to-be-processed knowledge graph.
8. The apparatus of claim 7, wherein the second acquisition module is further configured to:
the logic rule is obtained by adopting a random walk mode;
traversing each existing triplet in the knowledge graph to be processed, and aiming at each existing triplet, obtaining at least one path starting from a head entity of the existing triplet and ending at a tail entity of the existing triplet by adopting a random walk mode;
extracting rule bodies from the at least one obtained path to obtain the distribution of rule bodies under the existing triplet;
updating the distribution of the relation in the to-be-processed knowledge graph, the distribution of the relation path and the distribution of the head entity and the tail entity under the existing relation path.
9. An electronic device comprising a processor and a memory, the memory storing machine executable instructions executable by the processor, the processor executing the machine executable instructions to implement the data enhancement method of the knowledge-graph of any one of claims 1-6.
10. A machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement the data enhancement method of the knowledge-graph of any one of claims 1-6.
CN202311213014.7A 2023-09-20 2023-09-20 Knowledge graph data enhancement method and device and electronic equipment Pending CN116955653A (en)

Publications (1)

Publication Number Publication Date
CN116955653A true CN116955653A (en) 2023-10-27

Family

ID=88460464

Country Status (1)

Country Link
CN (1) CN116955653A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination