CN112685573A - Knowledge graph embedding training method and related device - Google Patents

Knowledge graph embedding training method and related device

Info

Publication number
CN112685573A
CN112685573A
Authority
CN
China
Prior art keywords
sample
negative
knowledge
knowledge graph
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110013880.6A
Other languages
Chinese (zh)
Inventor
陈川
杜尔鑫
郑子彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202110013880.6A priority Critical patent/CN112685573A/en
Publication of CN112685573A publication Critical patent/CN112685573A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a knowledge graph embedding training method and a related device, wherein the method comprises the following steps: acquiring network topology information in the graph structure of a knowledge graph; calculating a first similarity distance between different entity nodes in the knowledge graph according to a similarity calculation method and the network topology information; calculating a second similarity distance between a positive sample and a negative sample based on the first similarity distance and the entity nodes included in the positive sample and the negative sample in the knowledge graph; calculating the comprehensive weight corresponding to each negative sample according to the second similarity distance corresponding to that negative sample; and performing model training of the corresponding type according to the positive samples, the negative samples and the comprehensive weight corresponding to each negative sample to obtain a knowledge graph embedded representation. This solves the technical problem that, in the existing knowledge graph embedding training process, treating different negative samples identically may lower the accuracy of the embedded representation.

Description

Knowledge graph embedding training method and related device
Technical Field
The present application relates to the field of knowledge graph technology, and in particular, to a knowledge graph embedding training method and related apparatus.
Background
A knowledge graph is a semantic network that reveals the relationships between entities, using a graph structure to formally describe and store complex real-world things and their interrelationships. The development of knowledge graph embedding research enables the large number of human character and symbol data sets existing in the real world to be understood, utilized and expanded by machines, so the emergence of knowledge graphs improves the performance of related applications such as intelligent search, personalized recommendation and intelligent question answering. However, the real world is extremely complex, and capturing all of its relationships is an enormous undertaking.
The task of knowledge graph embedding is to represent the semantic information of a research object as a low-dimensional continuous vector through supervised machine learning. The different negative samples introduced during embedding training differ from one another; existing embedding training nevertheless treats all negative samples the same, which may lower the accuracy of the embedded representation.
Disclosure of Invention
The application provides a knowledge graph embedding training method and a related device, solving the technical problem that, in the existing knowledge graph embedding training process, treating different negative samples identically may lower the accuracy of the embedded representation.
In view of the above, a first aspect of the present application provides a knowledge-graph embedding training method, including:
acquiring network topology information in the graph structure of a knowledge graph;
calculating a first similarity distance between different entity nodes in the knowledge graph according to a similarity calculation method and the network topology information;
calculating a second similarity distance between a positive sample and a negative sample based on the first similarity distance and the entity nodes included in the positive sample and the negative sample in the knowledge graph;
calculating the comprehensive weight corresponding to each negative sample according to the second similarity distance corresponding to that negative sample;
and performing model training of the corresponding type according to the positive samples, the negative samples and the comprehensive weight corresponding to each negative sample to obtain a knowledge graph embedded representation.
Optionally, calculating a first similarity distance between different entity nodes in the knowledge graph according to a similarity calculation method and the network topology information specifically includes:
calculating the first similarity distance between different entity nodes in the knowledge graph according to a Euclidean distance calculation method and the network topology information.
Optionally, calculating a first similarity distance between different entity nodes in the knowledge graph according to a similarity calculation method and the network topology information specifically includes:
calculating the first similarity distance between different entity nodes in the knowledge graph according to a cosine distance calculation method and the network topology information.
Optionally, calculating a second similarity distance between the positive sample and the negative sample based on the first similarity distance and the entity nodes included in the positive sample and the negative sample in the knowledge graph specifically includes:
acquiring a positive sample and a negative sample corresponding to the positive sample in the knowledge graph;
acquiring a first node in the positive sample and a second node in the negative sample, wherein the first node is an entity node in the positive sample, and the second node is an entity node in the negative sample;
and taking the first similarity distance between the first node and the second node as the second similarity distance between the positive sample and the negative sample.
Optionally, calculating the comprehensive weight corresponding to each negative sample according to the second similarity distance corresponding to that negative sample specifically includes:
respectively calculating the neighbor node context weight and the edge context weight of the negative sample according to the second similarity distance corresponding to the negative sample;
and calculating the corresponding comprehensive weight according to the neighbor node context weight and the edge context weight corresponding to each negative sample.
Optionally, when the trained model is a translation distance model, performing model training of the corresponding type according to the positive samples, the negative samples and the comprehensive weight corresponding to each negative sample to obtain a knowledge graph embedded representation specifically includes:
combining the positive sample with each corresponding negative sample into sample combinations, and setting the comprehensive weight of the negative sample in each sample combination as the first weight of that sample combination;
and training the translation distance model according to the sample combinations and the corresponding first weights to obtain the knowledge graph embedded representation.
Optionally, when the trained model is a semantic matching model, performing model training of the corresponding type according to the positive samples, the negative samples and the comprehensive weight corresponding to each negative sample to obtain a knowledge graph embedded representation specifically includes:
setting a second weight corresponding to the positive sample as a preset weight;
and training the semantic matching model according to the positive samples, the preset weights corresponding to the positive samples, and the comprehensive weights corresponding to the negative samples to obtain the knowledge graph embedded representation.
A second aspect of the present application provides a knowledge-graph embedding training apparatus, comprising:
an acquisition unit, used for acquiring network topology information in the graph structure of a knowledge graph;
a first calculating unit, used for calculating a first similarity distance between different entity nodes in the knowledge graph according to a similarity calculation method and the network topology information;
a second calculating unit, used for calculating a second similarity distance between a positive sample and a negative sample based on the first similarity distance and the entity nodes included in the positive sample and the negative sample in the knowledge graph;
a third calculating unit, used for calculating the comprehensive weight corresponding to each negative sample according to the second similarity distance corresponding to that negative sample;
and a training unit, used for performing model training of the corresponding type according to the positive samples, the negative samples and the comprehensive weight corresponding to each negative sample to obtain a knowledge graph embedded representation.
A third aspect of the present application provides a knowledge graph embedding training device, comprising a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the knowledge graph embedding training method of the first aspect according to the instructions in the program code.
A fourth aspect of the present application provides a storage medium for storing program code for performing the method of knowledge-graph embedding training according to the first aspect.
According to the technical scheme, the method has the following advantages:
the application provides a knowledge graph embedding training method, which comprises the following steps: acquiring network topology information in a spectrogram structure of a knowledge graph; calculating a first similarity distance between different entity nodes in the knowledge graph spectrogram according to a similarity calculation method and the network topology information; calculating a second similarity distance between the positive sample and the negative sample based on the first similarity distance and entity nodes included by the positive sample and the negative sample in the knowledge map spectrogram; calculating the comprehensive weight corresponding to each negative sample according to the second similarity distance corresponding to each negative sample; and performing model training of corresponding types according to the positive samples, the negative samples, each negative sample and corresponding comprehensive weight to obtain knowledge graph embedded representation. The contribution of different load examples of differentiation in this application gives different weights to different load examples, makes different load examples obtain different concerns under the environment of difference, makes the model reach better effect to solved current training process to knowledge map embedding, looked at same benevolence to different load examples, probably led to embedding the lower technical problem of the result degree of accuracy that shows.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and other drawings can be obtained by those skilled in the art based on these drawings without inventive effort.
FIG. 1 is a simple schematic diagram of a knowledge-graph;
FIG. 2 is a flowchart illustrating a first embodiment of a knowledge-graph embedding training method according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating a second embodiment of a knowledge-graph embedding training method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an embodiment of a knowledge-graph embedding training apparatus in an embodiment of the present application.
Detailed Description
For ease of understanding, the relevant principles and definitions of the knowledge graph are first described as follows:
As shown in fig. 1, a knowledge graph formally describes and stores complex real-world things and their interrelationships using a graph structure. These structured data typically appear in the form of triples (subject node (h), relation (r), object node (t)), such as (Sun Yat-sen University, founding time, 1924), (Sun Yat-sen University, founder, Sun Yat-sen), (1924, century, 20th), and so on.
A negative sample is defined as follows: for a positive sample (h, r, t), replacing the subject node h or the object node t and sampling yields a negative sample (h', r, t) or (h, r, t'), denoted (h', r', t').
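As a concrete illustration of this sampling scheme, the following Python sketch corrupts either the subject or the object node of a positive triple. The entity pool, the uniform choice of which side to corrupt, and the function name are assumptions made for illustration, not details fixed by the application.

```python
import random

def corrupt_triple(positive, entities):
    """Generate a negative sample (h', r, t) or (h, r, t') from a positive
    triple (h, r, t) by replacing the subject node or the object node."""
    h, r, t = positive
    if random.random() < 0.5:  # corrupt the subject node
        h_neg = random.choice([e for e in entities if e != h])
        return (h_neg, r, t)
    t_neg = random.choice([e for e in entities if e != t])  # corrupt the object node
    return (h, r, t_neg)

# Example: corrupt the triple (Sun Yat-sen University, founding time, 1924).
entities = ["Sun Yat-sen University", "1924", "Sun Yat-sen", "Guangzhou"]
print(corrupt_triple(("Sun Yat-sen University", "founding time", "1924"), entities))
```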
The differences between negative samples mean that their reference value for model training differs. However, in the existing model training process, different negative samples carry equal weight and are treated identically, which is clearly unsuitable. The present application therefore seeks to differentiate the weights of different negative samples, so that different negative samples contribute differently to model training, thereby achieving higher model accuracy.
In view of this, the embodiments of the present application provide a knowledge graph embedding training method and a related device, which solve the technical problem that, in the existing knowledge graph embedding training process, treating different negative samples identically may lower the accuracy of the embedded representation.
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 2, a flowchart of a first embodiment of a knowledge-graph embedding training method in the embodiment of the present application is shown.
The knowledge graph embedding training method in this embodiment comprises the following steps:
Step 201: network topology information in the graph structure of the knowledge graph is obtained.
The knowledge graph is essentially a graph network; specific connection relationships exist among its nodes, which are not independent of one another. Different nodes are connected by different edges to form a huge network. The structural information formed by these nodes and edges is the network topology information.
The network topology information supplements the information carried by the triples themselves, allowing the model to fuse relevant information beyond the bare facts. Specifically, in the training of a translation distance model, the network topology information can be used to judge the rationality of a given positive/negative sample combination and to measure that combination's reference significance for training, thereby determining its importance. In the training of a semantic matching model, the network topology information can be used to judge the rationality of a negative sample and to measure its reference significance for training, thereby determining its importance.
The network topology information in this embodiment includes neighbor node context information and edge context information, which are explained in turn below:
1) Neighbor node context information. Taking a subject node h_specific as the center, all object nodes directly connected to it are its outgoing neighbor nodes. The set of all such neighbor nodes, {entities | entities = t, (h_specific, r, t) ∈ D+}, is the outgoing neighbor node context information. Correspondingly, taking an object node t_specific as the center, the set of all subject nodes directly connected to it, {entities | entities = h, (h, r, t_specific) ∈ D+}, is the incoming neighbor node context information, where D+ is the set of triples originally existing in the knowledge graph structure, i.e., the positive sample set.
Neighbor node context information can reveal useful information by itself. For example, if the neighbor node context information includes keywords such as male/female, Chinese, Guangzhou, manager or doctor, it can be inferred that the entity is a person. If it includes keywords such as South China, subtropical monsoon climate, Canton Tower or 7434.4 square kilometers, it can be inferred that the entity is a city.
2) Edge context information. Taking a subject node h_specific as the center, all relations directly connected to it are its outgoing edges. The set of all such directly connected edges, {relations | relations = r, (h_specific, r, t) ∈ D+}, is the subject node's outgoing edge context information; because the knowledge graph is a directed network structure, this can be described as outgoing-edge context information. Correspondingly, taking an object node t_specific as the center, the set of all edges directly connected to it, {relations | relations = r, (h, r, t_specific) ∈ D+}, is the object node's incoming edge context information.
Edge context information, like neighbor node context information, reveals some information in a different way. For example, if the edge context information includes keywords such as gender, language, city, job title or education, it can be inferred that the entity is a person. If it includes keywords such as region, climate, landmark building or area, it can be inferred that the entity is a city.
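The four context sets above can be collected in a single pass over the triple set D+. The following Python sketch assumes triples are stored as (h, r, t) tuples; the function and variable names are illustrative.

```python
from collections import defaultdict

def build_context(triples):
    """Collect, for every entity node, its outgoing/incoming neighbor node
    context and its outgoing/incoming edge context from the triple set D+."""
    out_neighbors = defaultdict(set)  # h -> {t | (h, r, t) in D+}
    in_neighbors = defaultdict(set)   # t -> {h | (h, r, t) in D+}
    out_edges = defaultdict(set)      # h -> {r | (h, r, t) in D+}
    in_edges = defaultdict(set)       # t -> {r | (h, r, t) in D+}
    for h, r, t in triples:
        out_neighbors[h].add(t)
        in_neighbors[t].add(h)
        out_edges[h].add(r)
        in_edges[t].add(r)
    return out_neighbors, in_neighbors, out_edges, in_edges
```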
Step 202: a first similarity distance between different entity nodes in the knowledge graph is calculated according to the similarity calculation method and the network topology information.
During training, the network topology information is used to determine the rationality and importance of negative samples, and this embodiment measures that relationship with a similarity index. Similarity can be expressed by distance: a small distance means high similarity. Specifically, the more similar the network topology information of two nodes, the smaller the distance and the higher the similarity, and vice versa.
Specifically, the first similarity distances between different entity nodes in the knowledge graph are calculated according to the similarity calculation method and the network topology information.
Step 203: a second similarity distance between the positive sample and the negative sample is calculated based on the first similarity distance and the entity nodes included in the positive sample and the negative sample in the knowledge graph.
As described above, a negative sample is obtained by replacing an entity (subject node or object node) of a positive sample. Once the differing entity nodes between the positive sample and the negative sample have been identified, the second similarity distance between the corresponding positive and negative samples can be calculated from the first similarity distance between those two entity nodes.
Step 204: the comprehensive weight corresponding to each negative sample is calculated according to the second similarity distance corresponding to that negative sample.
The relationship between a generated negative sample triple and the original positive sample triple can be observed intuitively through the similarity distance. If the two are close, i.e., the similarity distance is small, the replaced entity and the replacing entity are closely related and highly similar. It can further be inferred that the generated negative sample is likely a false negative sample: one that may exist in the real world but has not been observed. Logically, such a negative sample / positive-negative combination has little reference significance, and its importance in the training process should be reduced appropriately. If the two are far apart, i.e., the similarity distance is large, there is often no association between the replaced entity and the replacing entity. It can further be inferred that the generated negative sample is likely a true negative sample. Logically, such a negative sample / positive-negative combination has greater reference significance, and its importance in the training process should be increased appropriately.
The existing training process treats every negative sample equally, i.e., different negative samples carry the same weight during training. This embodiment instead differentiates the contribution of each negative sample during training.
Step 205: model training of the corresponding type is performed according to the positive samples, the negative samples and the comprehensive weight corresponding to each negative sample, obtaining the knowledge graph embedded representation.
After the comprehensive weight corresponding to each negative sample is obtained, the corresponding model training can be carried out according to the positive samples, the negative samples and these comprehensive weights, yielding the knowledge graph embedded representation.
In this embodiment, network topology information in the graph structure of a knowledge graph is obtained; a first similarity distance between different entity nodes in the knowledge graph is calculated according to a similarity calculation method and the network topology information; a second similarity distance between the positive sample and the negative sample is calculated from the first similarity distance between their differing entity nodes; the comprehensive weight corresponding to each negative sample is calculated according to the second similarity distance corresponding to that negative sample; and model training of the corresponding type is performed according to the positive samples, the negative samples and the comprehensive weight corresponding to each negative sample, obtaining a knowledge graph embedded representation. The application differentiates the contributions of different negative samples by giving them different weights, so that different negative samples receive different attention in different contexts and the model achieves a better effect. This solves the technical problem that, in the existing knowledge graph embedding training process, treating different negative samples identically may lower the accuracy of the embedded representation.
The above is the first embodiment of the knowledge graph embedding training method provided in the embodiments of the present application; the following is the second embodiment.
Referring to fig. 3, a flowchart of a second embodiment of a knowledge-graph embedding training method in the embodiment of the present application is shown.
The knowledge graph embedding training method in this embodiment comprises the following steps:
Step 301: network topology information in the graph structure of the knowledge graph is obtained.
It should be noted that the description of step 301 is the same as that of step 201 in the first embodiment, and is not repeated here.
Step 302: a first similarity distance between different entity nodes in the knowledge graph is calculated according to the similarity calculation method and the network topology information.
In one embodiment, calculating a first similarity distance between different entity nodes in the knowledge graph according to a similarity calculation method and the network topology information specifically includes:
calculating the first similarity distance between different entity nodes in the knowledge graph according to the Euclidean distance calculation method and the network topology information.
In this case, the first similarity distance is calculated as:
Euclidean_Distance(p, q) = Euclidean_Distance(q, p) = sqrt( Σ_{i=1}^{n} (p_i − q_i)² )
where Euclidean_Distance(p, q) and Euclidean_Distance(q, p) denote the Euclidean distance between two entity nodes p and q; p_i takes the value 1 when node p is connected to the i-th item of the context information and 0 otherwise (the context information here is divided into neighbor node context and edge context); q_i is defined analogously for node q; and n is the length of the context information set.
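As a minimal sketch of this formula, the snippet below treats each entity's context set as a binary indicator vector over a shared universe of context items; the set-based encoding and the names are assumptions for illustration. For 0/1 vectors the result reduces to the square root of the size of the symmetric difference of the two sets.

```python
import math

def euclidean_distance(ctx_p, ctx_q, universe):
    """Euclidean distance between the context sets of two entity nodes,
    treating each set as a binary indicator vector over `universe`
    (p_i = 1 if the i-th context item is connected to p, else 0)."""
    return math.sqrt(sum((int(item in ctx_p) - int(item in ctx_q)) ** 2
                         for item in universe))
```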
In another embodiment, calculating a first similarity distance between different entity nodes in the knowledge graph according to a similarity calculation method and the network topology information specifically includes:
calculating the first similarity distance between different entity nodes in the knowledge graph according to the cosine distance calculation method and the network topology information.
In this case, the first similarity distance is calculated as:
Cosine_Distance(p, q) = Cosine_Distance(q, p) = 1 − ( Σ_{i=1}^{n} p_i q_i ) / ( sqrt(Σ_{i=1}^{n} p_i²) · sqrt(Σ_{i=1}^{n} q_i²) )
where Cosine_Distance(p, q) and Cosine_Distance(q, p) denote the cosine distance between two entity nodes p and q, with p_i, q_i and n defined as above.
It should be noted that, as defined in the above formula, the first similarity distance includes a neighbor node context distance and an edge context distance.
It is understood that various similarity calculation methods may be used, such as the Jaccard distance, the Pearson correlation distance, the Manhattan distance, and so on. Those skilled in the art can make an adaptive choice as necessary, following the above examples.
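For binary context vectors the cosine distance can be computed directly on the sets, since the dot product is the intersection size and each norm is the square root of the set size. The sketch below assumes this set encoding; the handling of empty context sets is an added assumption.

```python
import math

def cosine_distance(ctx_p, ctx_q):
    """Cosine distance between two binary context vectors, computed on the
    context sets directly: dot product = |intersection|, norm = sqrt(|set|)."""
    if not ctx_p or not ctx_q:
        return 1.0  # no overlap possible; treat as maximally distant
    return 1.0 - len(ctx_p & ctx_q) / (math.sqrt(len(ctx_p)) * math.sqrt(len(ctx_q)))
```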
Step 303: a positive sample and a negative sample corresponding to the positive sample are acquired in the knowledge graph.
That is, a positive sample and its corresponding negative sample are acquired from the knowledge graph.
Step 304: a first node in the positive sample and a second node in the negative sample are acquired, wherein the first node is an entity node in the positive sample and the second node is an entity node in the negative sample.
It can be understood that a positive sample and its negative sample differ in exactly one entity node. Therefore, after obtaining the positive sample and the corresponding negative sample, this embodiment acquires the first node in the positive sample and the second node in the negative sample, where the first node and the second node are the differing entity nodes.
Step 305: the first similarity distance between the first node and the second node is taken as the second similarity distance between the positive sample and the negative sample.
The first similarity distances between different entity nodes were obtained in step 302, so the first similarity distance between the first node and the second node acquired in step 304 is known. Since the difference between the corresponding positive and negative samples is embodied in this pair of differing entity nodes, the first similarity distance between the two nodes can be used as the second similarity distance between the positive sample and the negative sample.
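Steps 303 to 305 can be summarized in a few lines of Python. Here `first_distance(a, b)` is an assumed callable that returns the precomputed first similarity distance between entity nodes a and b; the names are illustrative.

```python
def second_similarity_distance(positive, negative, first_distance):
    """Second similarity distance between a positive sample and its negative
    sample: the first similarity distance between the one pair of entity
    nodes in which the two triples differ."""
    (h, r, t), (h_neg, r_neg, t_neg) = positive, negative
    if h != h_neg:
        return first_distance(h, h_neg)  # the subject node was replaced
    return first_distance(t, t_neg)      # the object node was replaced
```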
Step 306: the neighbor node context weight and the edge context weight of the negative sample are respectively calculated according to the second similarity distance corresponding to the negative sample.
It is to be understood that the second similarity distance of the negative sample calculated above includes a neighbor node context distance and an edge context distance, so the neighbor node context weight and the edge context weight of the negative sample can be calculated from it.
The neighbor node context weight is calculated as:
W_neighbours = Neighbour_Distance((h, r, t), (h', r', t')) = Similarity_Method(h, h') + Similarity_Method(t, t')
where W_neighbours is the neighbor node context weight; Neighbour_Distance((h, r, t), (h', r', t')) is the neighbor node context distance in the second similarity distance between the positive and negative samples; Similarity_Method(h, h') is the neighbor node context distance in the first similarity distance between the subject nodes of the positive and negative samples; and Similarity_Method(t, t') is the neighbor node context distance in the first similarity distance between the object nodes of the positive and negative samples.
The edge context weight is calculated as:
W_edges = Edge_Distance((h, r, t), (h', r', t')) = Similarity_Method(E_h→r, E_h'→r') + Similarity_Method(E_r→t, E_r'→t')
where W_edges is the edge context weight; Edge_Distance((h, r, t), (h', r', t')) is the edge context distance in the second similarity distance between the positive and negative samples; Similarity_Method(E_h→r, E_h'→r') is the edge context distance in the first similarity distance between the subject nodes of the positive and negative samples; Similarity_Method(E_r→t, E_r'→t') is the edge context distance in the first similarity distance between the object nodes of the positive and negative samples; E_h→r is the edge context information of the positive sample's subject node; E_h'→r' is the edge context information of the negative sample's subject node; E_r→t is the edge context information of the positive sample's object node; and E_r'→t' is the edge context information of the negative sample's object node.
Step 307: the corresponding comprehensive weight is calculated according to the neighbor node context weight and the edge context weight corresponding to each negative sample.
The corresponding comprehensive weight can be obtained from the neighbor node context weight and the edge context weight obtained in step 306; the specific calculation formula can be:
Weight((h, r, t), (h', r', t')) = 1.0 + λ_neighbours · W_neighbours + λ_edges · W_edges
where Weight((h, r, t), (h', r', t')) is the comprehensive weight, λ_neighbours is the proportion of the neighbor node context weight in the comprehensive weight, and λ_edges is the proportion of the edge context weight in the comprehensive weight.
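A one-line Python rendering of step 307 follows; the default λ values are illustrative placeholders, not values prescribed by the application.

```python
def composite_weight(w_neighbours, w_edges, lam_neighbours=0.5, lam_edges=0.5):
    """Comprehensive weight of a positive/negative sample combination:
    Weight = 1.0 + lambda_neighbours * W_neighbours + lambda_edges * W_edges."""
    return 1.0 + lam_neighbours * w_neighbours + lam_edges * w_edges
```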
Step 308: model training of the corresponding type is performed according to the positive samples, the negative samples and the comprehensive weight corresponding to each negative sample, obtaining the knowledge graph embedded representation.
In one embodiment, when the trained model is a translation distance model, performing model training of the corresponding type according to the positive samples, the negative samples and the comprehensive weight corresponding to each negative sample to obtain a knowledge graph embedded representation specifically includes:
combining the positive sample with each corresponding negative sample into sample combinations, and setting the comprehensive weight of the negative sample in each sample combination as the first weight of that sample combination;
and training the translation distance model according to the sample combinations and the corresponding first weights to obtain the knowledge graph embedded representation.
Correspondingly, the error equation of the translation distance model is:
L = Σ_{(h,r,t)∈D+} Σ_{(h',r',t')∈D−} Weight((h, r, t), (h', r', t')) · max(0, margin − f_r(h, t) + f_r'(h', t'))
where max(0, margin − f_r(h, t) + f_r'(h', t')) is the hinge error (hinge loss) between a positive and a negative sample, f_r(h, t) is the score of the model's evaluation function on the positive sample, f_r'(h', t') is the score of the model's evaluation function on the negative sample, and D− is the set of sampled negative samples.
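A sketch of this weighted hinge error in Python, where each sample combination is represented as a tuple of the two evaluation scores and the combination's first weight; the data layout is an assumption for illustration.

```python
def weighted_translation_error(combinations, margin=1.0):
    """Sum the weighted hinge error over all (positive, negative) sample
    combinations. Each element is (f_pos, f_neg, weight), with
    f_pos = f_r(h, t) and f_neg = f_r'(h', t')."""
    return sum(w * max(0.0, margin - f_pos + f_neg)
               for f_pos, f_neg, w in combinations)
```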
In another embodiment, when the trained model is a semantic matching model, performing model training of the corresponding type according to the positive samples, the negative samples and the comprehensive weight corresponding to each negative sample to obtain a knowledge graph embedded representation specifically includes:
setting the second weight corresponding to the positive sample as a preset weight;
and training the semantic matching model according to the positive samples, the preset weights corresponding to the positive samples and the comprehensive weights corresponding to the negative samples to obtain the knowledge graph embedded representation.
Correspondingly, the error equation of the semantic matching model is:
L = Σ_{(h,r,t)∈D+} Weight_positive · log(1 + exp(−f_r(h, t))) + Σ_{(h',r',t')∈D−} Weight((h, r, t), (h', r', t')) · log(1 + exp(f_r'(h', t')))
where Weight_positive is the preset weight hyperparameter for positive samples, Weight((h, r, t), (h', r', t')) is the comprehensive weight corresponding to the negative sample, and (h, r, t) is the positive sample from which the negative sample (h', r', t') was generated.
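A sketch of a weighted semantic matching error of this shape; both the logistic form log(1 + exp(·)) and the data layout are assumptions rather than details fixed by the application.

```python
import math

def weighted_semantic_matching_error(pos_scores, neg_scores_and_weights,
                                     weight_positive=1.0):
    """Weighted logistic error: each positive sample carries the preset
    weight, each negative sample its comprehensive weight.
    `pos_scores` holds f_r(h, t); `neg_scores_and_weights` holds
    (f_r'(h', t'), comprehensive_weight) pairs."""
    loss = sum(weight_positive * math.log1p(math.exp(-s)) for s in pos_scores)
    loss += sum(w * math.log1p(math.exp(s)) for s, w in neg_scores_and_weights)
    return loss
```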
In this embodiment, network topology information in the graph structure of a knowledge graph is obtained; a first similarity distance between different entity nodes in the knowledge graph is calculated according to a similarity calculation method and the network topology information; a second similarity distance between the positive sample and the negative sample is calculated from the first similarity distance between their differing entity nodes; the comprehensive weight corresponding to each negative sample is calculated according to the second similarity distance corresponding to that negative sample; and model training of the corresponding type is performed according to the positive samples, the negative samples and the comprehensive weight corresponding to each negative sample, obtaining a knowledge graph embedded representation. The application differentiates the contributions of different negative samples by giving them different weights, so that different negative samples receive different attention in different contexts and the model achieves a better effect. This solves the technical problem that, in the existing knowledge graph embedding training process, treating different negative samples identically may lower the accuracy of the embedded representation.
The above is the second embodiment of the knowledge graph embedding training method provided in the embodiments of the present application; the following is an embodiment of the knowledge graph embedding training apparatus.
Referring to fig. 4, a schematic structural diagram of an embodiment of a knowledge-graph embedding training apparatus in an embodiment of the present application is shown.
The knowledge graph embedding training apparatus in this embodiment includes:
an obtaining unit 401, configured to obtain network topology information in the graph structure of a knowledge graph;
a first calculating unit 402, configured to calculate a first similarity distance between different entity nodes in the knowledge graph according to a similarity calculation method and the network topology information;
a second calculating unit 403, configured to calculate a second similarity distance between a positive sample and a negative sample based on the first similarity distance and the entity nodes included in the positive sample and the negative sample in the knowledge graph;
a third calculating unit 404, configured to calculate the comprehensive weight corresponding to each negative sample according to the second similarity distance corresponding to that negative sample;
and a training unit 405, configured to perform model training of the corresponding type according to the positive samples, the negative samples and the comprehensive weight corresponding to each negative sample, to obtain a knowledge graph embedded representation.
In this embodiment, network topology information in the graph structure of a knowledge graph is obtained; a first similarity distance between different entity nodes in the knowledge graph is calculated according to a similarity calculation method and the network topology information; a second similarity distance between the positive sample and the negative sample is calculated from the first similarity distance between their differing entity nodes; the comprehensive weight corresponding to each negative sample is calculated according to the second similarity distance corresponding to that negative sample; and model training of the corresponding type is performed according to the positive samples, the negative samples and the comprehensive weight corresponding to each negative sample, obtaining a knowledge graph embedded representation. The application differentiates the contributions of different negative samples by giving them different weights, so that different negative samples receive different attention in different contexts and the model achieves a better effect. This solves the technical problem that, in the existing knowledge graph embedding training process, treating different negative samples identically may lower the accuracy of the embedded representation.
The embodiment of the present application also provides an embodiment of a knowledge graph embedding training device; the device in this embodiment comprises a processor and a memory; the memory is used for storing program code and transmitting the program code to the processor; and the processor is configured to perform the knowledge graph embedding training method of the first or second embodiment according to the instructions in the program code.
The embodiment of the present application further provides an embodiment of a storage medium, where the storage medium is used to store program code for executing the knowledge graph embedding training method of the first or second embodiment.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only one kind of logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to the needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing over the prior art, or in whole or in part, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A knowledge graph embedding training method, characterized by comprising the following steps:
acquiring network topology information in the graph structure of a knowledge graph;
calculating a first similarity distance between different entity nodes in the knowledge graph according to a similarity calculation method and the network topology information;
calculating a second similarity distance between a positive sample and a negative sample based on the first similarity distance and the entity nodes included in the positive sample and the negative sample in the knowledge graph;
calculating the comprehensive weight corresponding to each negative sample according to the second similarity distance corresponding to that negative sample;
and performing model training of the corresponding type according to the positive samples, the negative samples and the comprehensive weight corresponding to each negative sample to obtain a knowledge graph embedded representation.
2. The knowledge graph embedding training method of claim 1, wherein calculating a first similarity distance between different entity nodes in the knowledge graph according to a similarity calculation method and the network topology information specifically comprises:
calculating the first similarity distance between different entity nodes in the knowledge graph according to a Euclidean distance calculation method and the network topology information.
3. The knowledge graph embedding training method of claim 1, wherein calculating a first similarity distance between different entity nodes in the knowledge graph according to a similarity calculation method and the network topology information specifically comprises:
calculating the first similarity distance between different entity nodes in the knowledge graph according to a cosine distance calculation method and the network topology information.
4. The knowledge graph embedding training method of claim 1, wherein calculating a second similarity distance between a positive sample and a negative sample based on the first similarity distance and the entity nodes included in the positive sample and the negative sample in the knowledge graph specifically comprises:
acquiring a positive sample and a negative sample corresponding to the positive sample in the knowledge graph;
acquiring a first node in the positive sample and a second node in the negative sample, wherein the first node is an entity node in the positive sample, and the second node is an entity node in the negative sample;
and taking the first similarity distance between the first node and the second node as the second similarity distance between the positive sample and the negative sample.
5. The knowledge graph embedding training method of claim 1, wherein calculating the comprehensive weight corresponding to each negative sample according to the second similarity distance corresponding to that negative sample specifically comprises:
respectively calculating the neighbor node context weight and the edge context weight of the negative sample according to the second similarity distance corresponding to the negative sample;
and calculating the corresponding comprehensive weight according to the neighbor node context weight and the edge context weight corresponding to each negative sample.
6. The knowledge graph embedding training method of claim 1, wherein, when the trained model is a translation distance model, performing model training of the corresponding type according to the positive samples, the negative samples and the comprehensive weight corresponding to each negative sample to obtain a knowledge graph embedded representation specifically comprises:
combining the positive sample with each corresponding negative sample into sample combinations, and setting the comprehensive weight of the negative sample in each sample combination as the first weight of that sample combination;
and training the translation distance model according to the sample combinations and the corresponding first weights to obtain the knowledge graph embedded representation.
7. The knowledge graph embedding training method of claim 1, wherein, when the trained model is a semantic matching model, performing model training of the corresponding type according to the positive samples, the negative samples and the comprehensive weight corresponding to each negative sample to obtain a knowledge graph embedded representation specifically comprises:
setting a second weight corresponding to the positive sample as a preset weight;
and training the semantic matching model according to the positive samples, the preset weights corresponding to the positive samples, and the comprehensive weights corresponding to the negative samples to obtain the knowledge graph embedded representation.
8. A knowledge graph embedding training apparatus, characterized by comprising:
an acquisition unit, configured to acquire network topology information in the graph structure of a knowledge graph;
a first calculating unit, configured to calculate a first similarity distance between different entity nodes in the knowledge graph according to a similarity calculation method and the network topology information;
a second calculating unit, configured to calculate a second similarity distance between a positive sample and a negative sample based on the first similarity distance and the entity nodes included in the positive sample and the negative sample in the knowledge graph;
a third calculating unit, configured to calculate the comprehensive weight corresponding to each negative sample according to the second similarity distance corresponding to that negative sample;
and a training unit, configured to perform model training of the corresponding type according to the positive samples, the negative samples and the comprehensive weight corresponding to each negative sample to obtain a knowledge graph embedded representation.
9. A knowledge graph embedding training device, the device comprising a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the knowledge-graph embedding training method of any one of claims 1 to 7 according to instructions in the program code.
10. A storage medium for storing program code for performing the knowledge-graph embedding training method of any one of claims 1 to 7.
CN202110013880.6A 2021-01-06 2021-01-06 Knowledge graph embedding training method and related device Pending CN112685573A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110013880.6A CN112685573A (en) 2021-01-06 2021-01-06 Knowledge graph embedding training method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110013880.6A CN112685573A (en) 2021-01-06 2021-01-06 Knowledge graph embedding training method and related device

Publications (1)

Publication Number Publication Date
CN112685573A true CN112685573A (en) 2021-04-20

Family

ID=75456024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110013880.6A Pending CN112685573A (en) 2021-01-06 2021-01-06 Knowledge graph embedding training method and related device

Country Status (1)

Country Link
CN (1) CN112685573A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113642392A * 2021-07-07 2021-11-12 Shanghai Jiao Tong University Target searching method and device
CN113642392B * 2021-07-07 2023-11-28 Shanghai Jiao Tong University Target searching method and device
CN113377968A * 2021-08-16 2021-09-10 Nanchang Hangkong University Knowledge graph link prediction method adopting fused entity context
CN113377968B * 2021-08-16 2021-10-29 Nanchang Hangkong University Knowledge graph link prediction method adopting fused entity context


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210420