CN111858955B - Knowledge graph representation learning enhancement method and device based on encryption federal learning - Google Patents


Info

Publication number
CN111858955B
CN111858955B (application CN202010629643.8A)
Authority
CN
China
Prior art keywords: word vector; learning; knowledge; entity; knowledge graph
Prior art date
Legal status
Active
Application number
CN202010629643.8A
Other languages
Chinese (zh)
Other versions
CN111858955A (en)
Inventor
刘明生
马伯元
张诣
温洪念
许爱雪
滕琦
杜林峰
赵尉钦
Current Assignee
Shijiazhuang Institute of Railway Technology
Original Assignee
Shijiazhuang Institute of Railway Technology
Priority date
Application filed by Shijiazhuang Institute of Railway Technology filed Critical Shijiazhuang Institute of Railway Technology
Priority to CN202010629643.8A priority Critical patent/CN111858955B/en
Publication of CN111858955A publication Critical patent/CN111858955A/en
Application granted granted Critical
Publication of CN111858955B publication Critical patent/CN111858955B/en


Classifications

    • G06F16/367: Ontology (Information retrieval; creation of semantic tools, e.g. ontology or thesauri)
    • G06F40/295: Named entity recognition (Handling natural language data; recognition of textual entities)
    • G06F40/30: Semantic analysis (Handling natural language data)
    • G06N20/00: Machine learning (Computing arrangements based on specific computational models)

Abstract

The invention belongs to the technical field of knowledge graph representation learning, and provides a knowledge graph representation learning enhancement method based on encryption federal learning. The invention also provides an asynchronous training device, a generative adversarial learning device and a federal learning device for implementing the method. The technical scheme provided by the invention realizes federal learning across multiple knowledge graphs among mutually untrusted data providers under homomorphic encryption, and enhances each data provider's ability to learn representations of its own knowledge graph.

Description

Knowledge graph representation learning enhancement method and device based on encryption federal learning
Technical Field
The invention relates to an artificial intelligence data processing technology, in particular to a knowledge graph representation learning enhancement method and device based on encryption federal learning.
Background
Federal learning (Federated Learning) is a distributed machine learning technique that uses the information held in databases that do not communicate their raw data to train a global machine learning model across those databases. Federal learning enables multiple databases to build a model jointly, i.e. to train a machine learning model together, on the basis of guaranteed data privacy, security and legal compliance, thereby improving the effect of the machine learning model.
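As a generic illustration of this federated training loop (not the patent's GAN-based protocol; the least-squares model, toy data and learning rate below are invented for the example), a minimal federated-averaging sketch: each database runs a local update, and only model parameters, never raw data, are shared and aggregated.

```python
# Minimal federated averaging: train a shared model without pooling the data.
# The model (least squares y = w * x), the databases and the learning rate are
# invented for this illustration.

def local_update(w, data, lr=0.1):
    # one gradient step of least squares on the local database only
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def federated_round(global_w, databases):
    local_models = [local_update(global_w, db) for db in databases]
    return sum(local_models) / len(local_models)  # aggregate parameters only

db1 = [(1.0, 2.0), (2.0, 4.0)]  # both databases are consistent with w = 2
db2 = [(3.0, 6.0)]
w = 0.0
for _ in range(200):
    w = federated_round(w, [db1, db2])
assert abs(w - 2.0) < 1e-3  # the shared model converges without pooling data
```

Each participant sees only the aggregated parameter, which is the privacy property that federal learning builds on; the patent further protects even the exchanged vectors with homomorphic encryption.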
A Knowledge Graph is a knowledge base organized as a semantic network: a machine learning semantic model stored in the form of a multi-relational graph, obtained by extracting natural entities and the relations among them. In the knowledge graph, entities and relations are represented by the semantic information carried in word vectors. Based on the knowledge graph's representation of the entities and relations of the objective world, a computer system can better organize, manage and understand big data on the Internet.
In the field of Natural Language Processing (NLP), a word vector represents a word as a vector, so that natural language is mathematically symbolized for a computer to process. More specifically, the word vectors in this patent are the "knowledge word vectors" of the entity nodes in the knowledge graph.
Homomorphic encryption is a technique in which certain operations can be performed on the ciphertext of data such that decrypting the result of those operations yields the result of the corresponding operations on the data itself. Homomorphic encryption achieves mutual confidentiality between the data provider and the party executing the computation, i.e. the data provider does not reveal plaintext data to the computation executor. Homomorphic encryption enables data exchange and computation under untrusted conditions, and is the basis of untrusted cloud computing and distributed computing.
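The homomorphic property can be shown with a toy Paillier cryptosystem: multiplying two ciphertexts decrypts to the sum of the plaintexts. The tiny hard-coded primes and fixed randomizers below are for demonstration only and are not secure; a real deployment uses large random parameters and a vetted cryptographic library.

```python
import math

# Toy Paillier cryptosystem illustrating additive homomorphism.
# NOT secure: tiny primes and fixed randomizers, for demonstration only.

def keygen(p=293, q=433):
    n = p * q
    lam = math.lcm(p - 1, q - 1)   # private exponent
    g = n + 1                      # standard simple choice of generator
    mu = pow(lam, -1, n)           # modular inverse of lam (Python 3.8+)
    return (n, g), (lam, mu)

def encrypt(pub, m, r):            # r must be random and coprime to n
    n, g = pub
    n2 = n * n
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return (((x - 1) // n) * mu) % n

pub, priv = keygen()
c1 = encrypt(pub, 42, r=17)
c2 = encrypt(pub, 58, r=23)
c_sum = (c1 * c2) % (pub[0] ** 2)  # the executor operates on ciphertext only
assert decrypt(pub, priv, c_sum) == 42 + 58
```

Because ciphertext multiplication corresponds to plaintext addition (and ciphertext exponentiation to scalar multiplication), linear operations on encrypted word vectors lose no information after decryption, which is the property the patent relies on.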
In the prior art, when a data provider performs knowledge graph representation learning on its own data set, it cannot use the knowledge graph representations of the other data providers in the network to enhance the local learning effect, because the data providers do not trust one another.
Disclosure of Invention
The invention provides a knowledge graph representation learning enhancement method and device based on encryption federal learning, which are used for performing federal learning under multiple knowledge graphs among untrusted data providers under homomorphic encryption, so as to enhance the representation learning ability of the respective knowledge graphs of the data providers.
An embodiment of a first aspect of the present invention provides a knowledge-graph representation learning enhancement method based on encryption federal learning, including:
at a first data processing end, performing characterization learning on a first knowledge graph to obtain a first word vector of a first entity in the knowledge graph;
at a second data processing end, performing characterization learning on a second knowledge graph to obtain a second word vector of a second entity aligned with the first entity in the knowledge graph;
the second data processing end receives the homomorphically encrypted first word vector, performs federal learning in the form of a generative adversarial network using the first word vector and the second word vector, and obtains a fused third word vector of the second entity;
and at a second data processing end, after replacing the second word vector of the second entity in the second knowledge-graph with the third word vector, continuing to perform characterization learning on the second knowledge-graph so as to obtain an enhanced fourth word vector of each entity in the knowledge-graph.
To improve the effect of federal learning, the method further extends the information exchanged between the data processing ends to the one-hop nodes of the aligned entities. An improved embodiment of the knowledge graph representation learning enhancement method comprises the following steps:
at a first data processing end, performing characterization learning on a first knowledge graph to obtain a first word vector of a first entity in the knowledge graph and a first word vector of a one-hop node of the first entity;
at a second data processing end, performing characterization learning on a second knowledge graph to obtain a second word vector of a second entity aligned with the first entity and a second word vector of a second entity one-hop node aligned with the first entity one-hop node in the knowledge graph;
the second data processing end receives the homomorphically encrypted first word vectors, performs federal learning in the form of a generative adversarial network using the first word vectors and the second word vectors, and obtains fused third word vectors of the second entity and of its one-hop nodes;
and at the second data processing end, after replacing the second word vectors of the second entity and of its one-hop nodes in the second knowledge graph with the third word vectors, continuing to perform characterization learning on the second knowledge graph so as to obtain an enhanced fourth word vector for each entity in the knowledge graph.
In a preferred embodiment of the knowledge graph representation learning enhancement method, the representation learning is performed by a translation-based (Trans-series) method such as TransE.
In a preferred embodiment of the learning enhancement method, the step of performing federal learning in the form of a generative adversarial network includes:
noting the set of first word vectors received by the second data provider as X = {x_1, …, x_n}, and the set of second word vectors in the second knowledge graph corresponding to the first word vectors in X as Y = {y_1, …, y_n};
given the generator W in the generative adversarial network, learning a discriminator in the generative adversarial network to distinguish elements randomly sampled from WX = {Wx_1, …, Wx_n} and from Y = {y_1, …, y_n};
learning the generator W in the generative adversarial network so that it maps the elements of X onto the word vectors of the corresponding nodes in Y as accurately as possible, making it difficult for the discriminator to decide whether an element belongs to WX or Y;
training the generative adversarial network, and
taking the elements of WX obtained after training as the third word vectors,
or
taking, as the third word vectors, the element-wise averages of the elements of WX obtained after training and the corresponding elements of Y,
or
taking, as the third word vectors, the element-wise sums of the elements of WX obtained after training and the corresponding elements of Y.
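The adversarial scheme above can be sketched in miniature: a linear generator W maps the received vectors X toward the local space containing Y, while a logistic-regression discriminator tries to tell Wx apart from y. The 2-D toy vectors, rotation angle and hyperparameters are invented for this illustration, not taken from the patent; practical alignment systems additionally constrain W to be near-orthogonal and refine the mapping after adversarial training.

```python
import math, random

# Miniature adversarial alignment of two toy word-vector spaces.

random.seed(0)

def rotate(v, a):  # rotate a 2-D vector by angle a
    c, s = math.cos(a), math.sin(a)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

Y = [[math.cos(t), math.sin(t)] for t in (i * 0.17 for i in range(40))]  # local
X = [rotate(y, 0.9) for y in Y]  # "received" vectors: the same points, rotated

W = [[1.0, 0.0], [0.0, 1.0]]     # generator: a 2x2 linear map
dw, db = [0.1, -0.1], 0.0        # discriminator: logistic regression

def apply_W(x):
    return [W[0][0] * x[0] + W[0][1] * x[1], W[1][0] * x[0] + W[1][1] * x[1]]

def disc(z):  # estimated probability that z comes from the local space Y
    t = max(-30.0, min(30.0, dw[0] * z[0] + dw[1] * z[1] + db))  # clamped logit
    return 1.0 / (1.0 + math.exp(-t))

lr = 0.05
for step in range(2000):
    x, y = random.choice(list(zip(X, Y)))
    wx = apply_W(x)
    # discriminator update: push D(y) toward 1 and D(Wx) toward 0
    for z, target in ((y, 1.0), (wx, 0.0)):
        g = disc(z) - target  # gradient of cross-entropy w.r.t. the logit
        dw[0] -= lr * g * z[0]; dw[1] -= lr * g * z[1]; db -= lr * g
    # generator update: push D(Wx) toward 1, i.e. fool the discriminator
    g = disc(wx) - 1.0
    for r in range(2):
        for c in range(2):
            W[r][c] -= lr * g * dw[r] * x[c]

fused = [apply_W(x) for x in X]  # candidate fused ("third") word vectors
```

The alternating updates are the GAN training loop in its simplest form; the mapped vectors `fused` play the role of the WX elements from which the third word vectors are taken.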
A further improvement of each of the foregoing technical solutions is that the second data processing end shares the generative adversarial network with the first data processing end.
An embodiment of a second aspect of the present invention provides an asynchronous training device, disposed at a data processing end, for implementing the above knowledge graph representation learning enhancement method based on encryption federal learning, where the device includes:
the reading module is used for reading the local original knowledge base and preprocessing the local original knowledge base into a local knowledge map containing word vectors;
the representation learning module is used for receiving a first request, and starting one-time representation learning of the local knowledge graph as a response so as to update word vectors of all entities in the local knowledge graph;
the communication module is used for communicating with the federal learning server and other data processing terminals so as to receive shared word vector information of the knowledge maps of the other data processing terminals; the shared word vector information is homomorphic encrypted;
the federal learning module is used for feeding the shared word vector information and the corresponding local word vectors into the generative adversarial network for learning, and for obtaining the fused word vectors; the fused word vectors are used to replace the word vectors of all or part of the entities of the local knowledge graph.
In an embodiment of the asynchronous training device, the communication module sends shared word vector information of the local knowledge graph to the other data processing end; the shared word vector information is homomorphic encrypted.
One embodiment of the asynchronous training device includes an operation monitoring module, used for supervising the asynchronous training device in executing the knowledge graph representation learning enhancement method of any one of claims 1 to 5; and/or for arbitrating the operating state of the asynchronous training device; and/or for adjusting the learning effect of the federal learning module.
An embodiment of a third aspect of the present invention provides a generative adversarial learning apparatus deployed at a data processing end or a federal learning server, the apparatus including:
a memory for storing computer executable code and a generative adversarial network;
a communication interface for communicating, through the communication module of the asynchronous training device provided by the second aspect, with the federal learning module of that asynchronous training device, so as to receive the first word vector set and the second word vector set from the federal learning module and to send the third word vector set back to the federal learning module;
a processor for reading and executing the computer executable code to configure and train the generative adversarial network; when the processor executes the computer executable code, the instructions of the computer executable code cause the processor to:
note the set of first word vectors as X = {x_1, …, x_n}, and the set of second word vectors corresponding to the first word vectors in X as Y = {y_1, …, y_n};
given the generator W in the generative adversarial network, learn a discriminator in the generative adversarial network to distinguish elements randomly sampled from WX = {Wx_1, …, Wx_n} and from Y = {y_1, …, y_n};
learn the generator W in the generative adversarial network so that it maps the elements of X onto the word vectors of the corresponding nodes in Y as accurately as possible, making it difficult for the discriminator to decide whether an element belongs to WX or Y;
train the generative adversarial network, and
output the elements of WX obtained after training as the third word vectors,
or
output, as the third word vectors, the element-wise averages of the elements of WX obtained after training and the corresponding elements of Y,
or
output, as the third word vectors, the element-wise sums of the elements of WX obtained after training and the corresponding elements of Y.
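The three output options above (the trained Wx alone, its average with the corresponding y, or its sum with it) can be written compactly; the function name and mode strings are illustrative, not from the patent text.

```python
# Element-wise fusion of a mapped vector Wx with its local counterpart y.

def fuse(wx, y, mode):
    if mode == "generator":  # third word vector = Wx
        return list(wx)
    if mode == "average":    # third word vector = (Wx + y) / 2
        return [(a + b) / 2 for a, b in zip(wx, y)]
    if mode == "sum":        # third word vector = Wx + y
        return [a + b for a, b in zip(wx, y)]
    raise ValueError(mode)

wx, y = [0.25, 0.5], [0.75, 0.5]
assert fuse(wx, y, "generator") == [0.25, 0.5]
assert fuse(wx, y, "average") == [0.5, 0.5]
assert fuse(wx, y, "sum") == [1.0, 1.0]
```

Averaging keeps the fused vector on the scale of the originals, while summing preserves both contributions at full magnitude; which option works best is an empirical choice.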
An embodiment of a fourth aspect of the present invention provides a federal learning apparatus comprising the asynchronous training apparatus of the second aspect described above and the generative adversarial learning apparatus of the third aspect described above. The asynchronous training apparatuses are deployed in a distributed manner on different data processing ends in a network, thereby realizing enhancement of distributed knowledge graph representation learning.
In the technology provided by the aspects of the invention, federal learning is based on generative adversarial network (GAN) technology: by mapping the word vectors of aligned entities in different knowledge graphs into the same characterization space, the semantic information that other knowledge graphs contain about an aligned entity, such as its Word vector, is introduced, improving the expressive power of the aligned entity's word vector. The federal learning strategy comprises: acquiring the information and original word vectors of the aligned entities in the different knowledge graphs; using the aligned-entity information to extract the entities' word vectors in the different knowledge graphs as training input for the generative adversarial network; obtaining the fused word vectors; replacing the aligned entities' word vectors in the original representation learning result with the fused word vectors; and performing the next round of representation learning training with these word vectors as initial values. On this basis, the one-hop nodes of the aligned entities in the other knowledge graphs are also introduced as joint input to the generative adversarial network, further enriching the semantics of the fused word vectors. Encrypted federal learning applies a homomorphic encryption algorithm during the transmission of the aligned entities' word vectors, trains the generative adversarial network on the ciphertexts of the word vectors from the different knowledge graphs, and decrypts the resulting fused word vectors. Owing to the homomorphism of the encryption algorithm, no semantic information is lost in the fusion and decryption of the ciphertext word vectors.
The asynchronous training device is a framework in which multiple knowledge graphs jointly carry out federal learning. In the invention, after the word vector model of one knowledge graph has been enhanced, fusion requests and aligned-node information are sent to the other knowledge graphs, guiding them to carry out federal learning training. The invention has the advantage of not revealing the specific information of the participating knowledge graphs, while the joint federal learning of multiple knowledge graph nodes improves the overall knowledge representation capability.
Drawings
In order to more clearly illustrate the examples of the invention or the technical solutions of the prior art, the drawings used in the examples or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are examples of the invention and that other drawings can be obtained from them without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an embodiment of a knowledge-graph representation learning enhancement method based on encrypted federal learning according to the present invention;
fig. 2 is a schematic structural diagram of an asynchronous training device in an embodiment of a knowledge graph representation learning enhancement device based on encryption federal learning.
Detailed Description
It should be noted that, in the prior art related to the present invention, the data owners may be companies, organizations, etc. with computing requirements, or devices that perform computing tasks, such as edge computing devices and data processing ends. Each data owner has its own knowledge base, and these knowledge bases may be structured or unstructured data sets. The knowledge bases of the individual data owners are correlated, and each owner needs to perform characterization learning on its own knowledge base for further machine learning tasks such as classification and analysis. Because the data owners cannot share their knowledge bases, they cannot directly obtain additional semantic information to optimize the characterization learning result, and can only perform characterization learning within their own trusted local scope, obtaining the original knowledge graph of each knowledge base. A knowledge graph at least comprises sets of Entities, Relations and Facts, where an entity may be an instance, a concept or a literal. Because the knowledge bases are correlated, their original knowledge graphs necessarily contain identical or equivalent entities: when an entity e1 of one knowledge graph and an entity e2 of another knowledge graph are given the same or an equivalent relation, e1 and e2 are each other's aligned entities in their respective knowledge graphs. This same-or-equivalent relation, established for example by ontology matching or entity alignment in knowledge fusion methods, provides the basis of data alignment for federal learning. In the present invention, the aligned entity nodes of the various knowledge graphs' structures can be considered to be mapped into the same characterization space.
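As a minimal illustration of the aligned entities described above, exact name match can stand in for the more general ontology matching or entity alignment of knowledge fusion; the entity names are invented examples.

```python
# Aligned entity nodes as the intersection of two knowledge graphs'
# entity sets (exact-name matching, the simplest alignment criterion).

kg1_entities = {"Beijing", "China", "airplane", "railway"}
kg2_entities = {"Beijing", "airplane", "Shijiazhuang"}

aligned = kg1_entities & kg2_entities  # entities present in both graphs
assert aligned == {"Beijing", "airplane"}
```

Only these shared entities (and, in the improved embodiment, their one-hop neighbours) ever have word vectors exchanged between data processing ends.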
By the knowledge graph representation learning enhancement method based on encryption federal learning, in some embodiments, a better characterization embedding effect can be obtained locally for data owners, and in other embodiments, a characterization embedding effect based on the union of knowledge bases of the data owners can be obtained.
It should be noted that characterization learning (Representation Learning), also called representation learning, is a method that uses machine learning to obtain vectorized expressions of entities or relations, facilitating the extraction of useful information when constructing classifiers or other predictors. In machine learning, representation learning is the technical integration of feature learning: converting raw data into a form that machine learning can exploit, which avoids the complexity of manually engineering features and lets the learner acquire the features together with the method of extracting them.
The technical concept of the invention is: first, perform representation learning training on the knowledge graph locally to obtain the word vectors of the aligned entities; then, in a federal learning manner, receive the word vectors of aligned nodes sent from outside and update the local aligned nodes' word vectors through the GAN of federal learning; finally, carry out the next round of local representation learning training on the knowledge graph, repeating the cycle to obtain an enhanced representation learning effect. For the purpose of making the objects, technical solutions and advantages of the present examples clearer, the technical solutions in the examples are described below clearly and completely with reference to the accompanying drawings; the described examples are obviously some, but not all, examples of the present invention. All other examples obtained by a person of ordinary skill in the art from these examples without inventive effort fall within the scope of the invention.
A first embodiment of the first aspect of the present invention provides a knowledge-graph representation learning enhancement method based on encrypted federal learning. In this embodiment, the data owner F1 has a knowledge base D1 and runs, in its local secure network, a first data processing end for learning the knowledge graph representation of D1; the data owner F2 has a knowledge base D2 and runs, in its local secure network, a second data processing end for learning the knowledge graph representation of D2. D1 and D2 are correlated; for example, both contain facts related to the geographic concept "Beijing" and the vehicle "airplane". The data owners F1 and F2 cannot disclose their respective knowledge bases to each other, but the first and second data processing ends have a cross-gateway communication connection. The present embodiment improves the second data processing end's representation learning effect on D2 through the following steps. As shown in the flowchart of FIG. 1, the method of this example includes steps 101 to 106.
And step 101, acquiring a knowledge graph of the original data set, and performing knowledge representation learning on the independent knowledge graph.
The original data sets are the knowledge bases of the respective data owners. The relevant knowledge bases are considered essentially static, not being updated during the implementation of the method, so the number of entities in the knowledge graph obtained in this step does not change in the subsequent process. In this embodiment, the knowledge information of an original data set is extracted by information extraction technology, the corresponding original knowledge graph is then obtained through semantic analysis, and each entity in the original knowledge graph is converted into a dense vector serving as the entity's word vector. In some other embodiments, the entities' word vectors may also be obtained through other word embedding or word2vec techniques. Representation learning is performed on the original knowledge graph to obtain the word vector set of each aligned entity of the original knowledge graph.
Specifically, in this embodiment, at the second data processing end, representation learning is performed on the second knowledge graph extracted from D2, and all word vectors in the knowledge graph are updated; that is, the word vector set of the knowledge graph after its multi-relational structure has been embedded through representation learning is obtained, the set containing the second word vectors of all second entities in the second knowledge graph.
One round of representation learning on a knowledge graph in the invention can be realized by TransE, TransH, TransR, TransD and other methods. The specific flow of one round of knowledge representation learning on a knowledge graph implemented with TransE in this embodiment is exemplified as follows:
Step 201: read all entities of a knowledge graph and the relation information between the entities, the relation information describing the relations between different entities; for example, "capital" in "China - capital - Beijing" is relation information, and (entity 1, relation, entity 2) forms a triplet. Each entity and each relation is represented by a dense vector: the dense vector representing an entity is that entity's word vector, and the dense vector representing a relation is that relation's word vector. In this distributed representation of the knowledge graph's entities and relations, each fact is expressed as a triplet instance (h, l, t), where the relation word vector l is regarded as a translation from the entity word vector h to the entity word vector t; the set of all triplet instances of the knowledge graph is S.
Step 202: in knowledge representation learning, h + l is made as close as possible to t by continuously adjusting h, l and t, i.e. h + l ≈ t.
Let the loss function of knowledge representation learning be:

L = Σ_{(h,l,t)∈S} Σ_{(h',l,t')∈S'} [ γ + d(h + l, t) - d(h' + l, t') ]_+

where h and t are the characterization vectors of entities in the knowledge graph, l is the characterization vector of a relation, d(·, ·) is a dissimilarity measure (e.g. the L2 distance), S is the set of all triples in the knowledge graph to be trained, S' is the set of negative samples of those triples, [x]_+ denotes the positive part max(0, x), and γ is a preset margin hyperparameter.
Specifically, according to the method of transition, a specific Algorithm flow of the primary knowledge representation learning in this embodiment is as follows Algorithm 1:
where k is the dimension of the generated token vector, E is the set of all entities in the knowledge-graph to be trained, and L is the set of all relationships in the knowledge-graph to be trained.
In Algorithm 1, lines 1-3 represent initialization: for each element of every triplet (h, l, t) of the input untrained knowledge graph, a characterization vector is generated by random assignment and its length is normalized to 1.
In Algorithm 1, lines 4-12 represent the training of the knowledge graph's characterization vectors, and the process is as follows: first, a set S_batch containing b triples is drawn from S by minibatch sampling as the sample set of the current iteration; then T_batch is generated from S_batch by negative sampling. Each element of T_batch is a triplet pair ((h, l, t), (h', l, t')) consisting of a triplet (h, l, t) in S_batch and a corresponding randomly generated negative sample (h', l, t'). Here a negative sample means: (h', l, t') is the negative sample corresponding to (h, l, t) if and only if (h, l, t) belongs to S_batch while (h', l, t') does not.
Then the characterization vectors are updated by gradient descent on the loss over each element of T_batch.
Through the TransE algorithm, knowledge representation learning can be performed independently on a knowledge graph. After the word vectors of the knowledge graph have been updated in this process, the word vector set of the knowledge graph with its multi-relational structure embedded by TransE is obtained, and the knowledge graph is regarded as having improved its knowledge representation capability.
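Algorithm 1 as described can be sketched as follows. The toy triples, dimension k, margin γ and learning rate are invented for the example; negatives are built by corrupting the head or tail, and entity vectors are re-normalized to unit length after each update.

```python
import random

# Minimal TransE sketch: train h + l to approximate t under the margin loss
# [gamma + d(h + l, t) - d(h' + l, t')]_+ with randomly corrupted negatives.

random.seed(1)
k, gamma, lr = 8, 1.0, 0.01          # toy hyperparameters
triples = [("China", "capital", "Beijing"),
           ("France", "capital", "Paris"),
           ("China", "currency", "yuan")]
entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
relations = sorted({l for _, l, _ in triples})

def unit(v):
    n = sum(x * x for x in v) ** 0.5 or 1.0
    return [x / n for x in v]

emb = {name: unit([random.uniform(-1, 1) for _ in range(k)])
       for name in entities + relations}

def dist(h, l, t):  # squared L2 distance between h + l and t
    return sum((emb[h][i] + emb[l][i] - emb[t][i]) ** 2 for i in range(k))

def sgd_step(h, l, t, h2, t2):
    if gamma + dist(h, l, t) - dist(h2, l, t2) <= 0:
        return  # hinge inactive: no gradient
    for i in range(k):
        gp = 2 * (emb[h][i] + emb[l][i] - emb[t][i])    # from positive triple
        gn = 2 * (emb[h2][i] + emb[l][i] - emb[t2][i])  # from negative triple
        emb[h][i] -= lr * gp
        emb[l][i] -= lr * (gp - gn)
        emb[t][i] += lr * gp
        emb[h2][i] += lr * gn
        emb[t2][i] -= lr * gn
    for name in (h, t, h2, t2):  # keep entity vectors on the unit sphere
        emb[name] = unit(emb[name])

for epoch in range(500):
    for h, l, t in triples:
        if random.random() < 0.5:   # corrupt the head ...
            h2, t2 = random.choice(entities), t
        else:                       # ... or the tail
            h2, t2 = h, random.choice(entities)
        if (h2, l, t2) not in triples:
            sgd_step(h, l, t, h2, t2)
```

This trains each triple against one corrupted negative per pass, the same positive-versus-negative contrast that the minibatch loop of Algorithm 1 performs at scale.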
Step 102: a message is obtained that a certain knowledge graph has improved its knowledge representation capability.
Specifically, at the first data processing end, the representation learning of the first knowledge graph extracted from D1 has been completed, and the first word vectors of all first entities in that knowledge graph have been obtained. The second data processing end receives the message of improved knowledge representation capability from the first data processing end via whole-network broadcast, point-to-point transmission, or third-party scheduling. In some embodiments, the message may be forwarded by a coordinator trusted by every data processing end of the federal learning, so that the messages carry unencrypted identifying information of each aligned first entity; when processing the second knowledge graph, the second data processing end screens out the first entities corresponding to second entities in the second knowledge graph, so as to assign corresponding indices, or to obtain the second word vector of the second entity aligned with each first entity.
And step 103, sending the aligned node word vector to other knowledge graph nodes. The aligned nodes are intersections of entity node sets in the knowledge maps, for example, the first knowledge map has an entity "Beijing", and the second knowledge map also has an entity "Beijing", so that the "Beijing" is one of aligned entity nodes of the two knowledge maps. The homomorphic encryption technology is utilized to protect data in the process of sharing data among different knowledge graphs, federal learning is carried out after the transmitted word vectors are encrypted, and the knowledge graph nodes transmitting the word vectors can be ensured not to leak word vector information and entity information to other knowledge graph nodes.
Specifically, the second data processing end asynchronously acquires information of first vectors of first entities aligned with a plurality of second entities of the second knowledge graph in the first knowledge graph from the first data processing end, wherein the information is based on homomorphic encryption in federal learning.
And 105, after the knowledge graph receives the word vector, performing federal learning by using the generated type countermeasure network.
Specifically, the second data processing end receives the homomorphically encrypted first word vectors, performs federated learning in the form of a generative adversarial network using the first word vectors and the second word vectors, and obtains fused third word vectors for the second entities. That is, the receiving data processing end inputs the word vector information of the aligned entities, together with the word vectors of the aligned nodes of its local knowledge graph, into the generative adversarial network for training, and the trained fused word vectors replace the original word vectors in the local knowledge graph.
Exemplarily, in this embodiment, the step of performing federated learning in the form of a generative adversarial network to obtain fused word vectors includes:
Step 301: denote the set of first word vectors received by the second data provider as X = {x_1, …, x_n}, i.e., the set of word vectors of the aligned nodes provided by the first data provider, the first data provider acting as the remote data owner; denote the set of second word vectors in the second knowledge graph corresponding to the first word vectors in X as Y = {y_1, …, y_n}, i.e., the set of word vectors of the corresponding aligned nodes within the local knowledge graph.
Step 302: given the generator W of the generative adversarial network, learn the discriminator of the network so that it can distinguish elements randomly sampled from WX = {Wx_1, …, Wx_n} and Y = {y_1, …, y_n}. WX is the set of vector elements produced by the generator W from the elements of X; each element contains partial information of the corresponding first word vector.
Step 303: learn the generator W of the generative adversarial network so that it maps the elements of X as accurately as possible onto the word vectors of the corresponding nodes in Y, making it difficult for the discriminator to judge whether an element belongs to WX or to Y.
Step 304: train the generative adversarial network; after training, the elements of WX are used as the third word vectors, or the average of each trained element with the corresponding element of Y is used as the third word vector, or the sum of each trained element with the corresponding element of Y is used as the third word vector. That is, there are three fusion modes, all of which process the embeddings of the aligned nodes:
(1) directly replacing the aligned-node embedding with the result of the GAN;
(2) averaging the result of the GAN with the aligned-node embedding, and replacing the aligned-node embedding with the average;
(3) summing the result of the GAN with the aligned-node embedding, and replacing the aligned-node embedding with the sum.
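The three fusion modes can be sketched as follows (numpy; the two vectors are made-up illustrations of a GAN output and a local aligned-node embedding):

```python
import numpy as np

gan_out = np.array([0.2, 0.8])    # element of WX produced by the trained generator
local_emb = np.array([0.4, 0.6])  # aligned node's embedding in the local knowledge graph

fused_replace = gan_out                      # mode (1): direct replacement
fused_average = (gan_out + local_emb) / 2.0  # mode (2): average, then replace
fused_sum     = gan_out + local_emb          # mode (3): sum, then replace
print(fused_average)  # [0.3 0.7]
```

Whichever mode is chosen, the resulting vector overwrites the aligned node's embedding before representation learning resumes in step 106.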
The generator and the discriminator are learned with a standard deep adversarial network training procedure: for the given sample sets X and Y, the discriminator and the generator are updated in turn by stochastic gradient descent, so as to minimize the objective function of the discriminator and the objective function of the generator, respectively.
The objective function of the discriminator can be written as:

L_D(θ_D | W) = −(1/n) · Σ_{i=1..n} [ log P_{θ_D}(Y | y_i) + log P_{θ_D}(WX | W x_i) ]

The objective function of the generator can be written as:

L_W(W | θ_D) = −(1/n) · Σ_{i=1..n} [ log P_{θ_D}(Y | W x_i) + log P_{θ_D}(WX | y_i) ]

wherein θ_D denotes the parameters of the discriminator, P_{θ_D}(Y | z) denotes the probability with which the discriminator judges the word vector z to belong to Y, and P_{θ_D}(WX | z) denotes the probability with which the discriminator judges the word vector z to belong to WX.
After the generator and the discriminator are trained, the GAN finally yields the word vectors in WX.
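The alternating stochastic gradient descent updates described above can be sketched in numpy with a logistic discriminator P(Y | z) = σ(u·z + b) and a linear generator W. The data, dimensions, learning rate, and step count below are all made-up illustrations, not the patent's training configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 256, 2
X = rng.normal(size=(n, d)) * np.array([3.0, 0.5])  # remote aligned-node vectors
R = np.array([[0.0, -1.0], [1.0, 0.0]])
Y = X @ R.T                                         # local vectors: a rotation of X

W = np.eye(d)                           # generator: linear map W
u = rng.normal(size=d) * 0.1            # discriminator weights
b = 0.0                                 # discriminator bias
sig = lambda s: 1.0 / (1.0 + np.exp(-s))
lr = 0.05

for _ in range(500):
    # discriminator step: minimize L_D = -mean[log P(Y|y_i) + log P(WX|W x_i)]
    WX = X @ W.T
    p_y, p_wx = sig(Y @ u + b), sig(WX @ u + b)
    grad_u = ((p_y - 1)[:, None] * Y).mean(0) + (p_wx[:, None] * WX).mean(0)
    grad_b = (p_y - 1).mean() + p_wx.mean()
    u -= lr * grad_u
    b -= lr * grad_b
    # generator step: minimize -mean[log P(Y|W x_i)], i.e. make WX look like Y
    p_wx = sig((X @ W.T) @ u + b)
    grad_W = np.outer(u, ((p_wx - 1)[:, None] * X).mean(0))
    W -= lr * grad_W

print(np.round(W, 2))  # learned mapping; W X is now harder to tell apart from Y
```

The gradients follow directly from the two objective functions: for a logistic score s(z) = u·z + b, the derivative of −log σ(s) is (σ(s) − 1) and of −log(1 − σ(s)) is σ(s), which is what the two update blocks implement.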
Step 106: replace the original word vectors of the knowledge graph with the obtained fused word vectors, and continue knowledge representation learning.
Specifically, at the second data processing end, after the second word vectors of the second entities in the second knowledge graph are replaced with the third word vectors, representation learning on the second knowledge graph continues, so as to obtain the enhanced fourth word vectors of all entities in the knowledge graph.
In steps 103, 105 and 106, each node sends the word vectors of its aligned nodes to the data owners of the corresponding aligned nodes of the other knowledge graphs; each receiving data owner inputs the received word vectors, together with the word vectors of the aligned nodes of its own knowledge graph, into a generative adversarial network for training, obtains the trained fused word vectors of those nodes, and replaces the nodes' original word vectors.
A second embodiment of the first aspect of the present invention provides a knowledge graph representation learning enhancement method based on encrypted federated learning. It differs from the first embodiment in that step 103 is replaced by step 104.
Step 104: send the word vectors of the one-hop neighbor nodes of the aligned nodes, together with the word vectors of the aligned nodes, to the other knowledge graph nodes, so that in step 105 federated learning is performed in the form of a generative adversarial network and the trained fused word vectors replace the original word vectors of the aligned nodes.
In this embodiment, steps 104, 105 and 106 send the word vectors of all one-hop neighbor nodes of the aligned nodes, together with the word vectors of the aligned nodes, to the knowledge graph nodes of the other data owners, perform federated learning in the form of a generative adversarial network, and replace the original word vectors of the aligned nodes with the fused word vectors obtained by training.
Correspondingly, in this embodiment: at the first data processing end, representation learning is performed on the first knowledge graph to obtain the first word vectors of the first entities and of the one-hop neighbor nodes of the first entities; at the second data processing end, representation learning is performed on the second knowledge graph to obtain the second word vectors of the second entities aligned with the first entities and of the second-entity one-hop neighbor nodes aligned with the first-entity one-hop neighbor nodes; the second data processing end receives the homomorphically encrypted first word vectors, performs federated learning in the form of a generative adversarial network using the first and second word vectors, and obtains the fused third word vectors of the second entities and their one-hop neighbor nodes; then, at the second data processing end, the second word vectors of the second entities and of their one-hop neighbor nodes in the second knowledge graph are replaced with the third word vectors, and representation learning on the second knowledge graph continues, so as to obtain the enhanced fourth word vectors of all entities in the knowledge graph.
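Collecting the entities whose word vectors are shared in step 104 — the aligned nodes plus their one-hop neighbors — can be sketched in pure Python. The triples and entity names here are illustrative only:

```python
# Triples of the local knowledge graph: (head, relation, tail)
triples = [
    ("Beijing", "capital_of", "China"),
    ("Beijing", "hosts", "Tsinghua"),
    ("Shanghai", "located_in", "China"),
]
aligned = {"Beijing"}  # entities present in both knowledge graphs

# One-hop neighbors: any entity sharing a triple with an aligned node
one_hop = set()
for h, _, t in triples:
    if h in aligned:
        one_hop.add(t)
    if t in aligned:
        one_hop.add(h)

to_share = aligned | one_hop  # word vectors of these entities are encrypted and sent
print(sorted(to_share))       # ['Beijing', 'China', 'Tsinghua']
```

Note that "Shanghai" is excluded: it touches no aligned node, so its word vector stays local, which is the point of restricting sharing to the aligned neighborhood.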
In a third embodiment of the first aspect of the present invention, the second data processing end shares the generative adversarial network with the first data processing end.
In a fourth embodiment of the first aspect of the present invention, a third data provider exists in the network; according to the methods of the foregoing embodiments, the third data provider asynchronously uses the word vector information of the aligned entity nodes in the respective knowledge graphs of the first and second data providers to improve its own knowledge representation capability.
The invention also provides an embodiment of a knowledge graph representation learning enhancement device based on encrypted federated learning. The device of this embodiment is a federated learning device, which comprises asynchronous training devices deployed at both the first data processing end local to the first data owner and the second data processing end local to the second data owner, and a generative adversarial learning device deployed at a federated learning server.
Each asynchronous training device has the structure shown in fig. 2 and comprises:
the reading module 11 is configured to read the local original knowledge base and preprocess the local original knowledge base into a local knowledge graph containing word vectors. The module reads the original knowledge base of the data owner into the federation learning device, preprocesses the knowledge base, prepares operation conditions for federation learning, and if the word vectors and the relations are obtained in the step 101, constructs a triplet set.
The representation learning module 12 is configured to receive a first request and, in response, start one round of representation learning on the local knowledge graph so as to update the word vectors of the entities in the local knowledge graph. This module starts knowledge representation learning on each knowledge graph in a distributed manner and records the learning state of its own knowledge graph, for example completing the representation learning of steps 101 and 106. The first request may be issued by the asynchronous training device itself or by an external generative adversarial learning device.
The communication module 13 is configured to communicate with the federated learning server and the other data processing ends so as to receive the shared word vector information of the knowledge graphs of the other data processing ends; the shared word vector information is homomorphically encrypted. This module connects the knowledge graph node with the whole distributed federated learning device and with the other knowledge graph nodes: when its own knowledge representation capability improves, it sends shared word vector information — namely the word vectors of all aligned nodes — to the other knowledge graphs, and it receives the shared word vector information sent by the other knowledge graph nodes. Specifically, the communication module 13 is configured to receive the message of step 102. In some other embodiments, the communication module sends the shared word vector information of the local knowledge graph to the other data processing ends; that shared word vector information is homomorphically encrypted.
The federated learning module 14 is configured to send the shared word vector information and the corresponding local word vectors into the generative adversarial network for learning and to obtain the fused word vectors; the fused word vectors replace the word vectors of all or part of the entities of the local knowledge graph. In other words, this module feeds the received shared word vectors and their local counterparts into the generative adversarial network, and the resulting fused word vectors replace the word vectors of the aligned nodes of the knowledge graph.
The operation monitoring module 15 is configured to supervise the asynchronous training device in performing the knowledge graph representation learning enhancement method provided in the first aspect; and/or to arbitrate the operating state of the asynchronous training device; and/or to adjust the learning effect of the federated learning module.
The generative adversarial learning device of this embodiment comprises: a memory for storing computer executable code and a generative adversarial network; a communication interface configured to be communicatively connected, through the communication module of the asynchronous training device, to the federated learning module of that asynchronous training device, so as to receive the set of first word vectors and the set of second word vectors from the federated learning module and to send the set of third word vectors to the federated learning module; and a processor for reading and executing the computer executable code to configure and train the generative adversarial network. When the processor executes the computer executable code, the instructions cause the processor to: denote the set of first word vectors as X = {x_1, …, x_n} and the set of second word vectors corresponding to the first word vectors in X as Y = {y_1, …, y_n}; given the generator W of the generative adversarial network, learn the discriminator of the network so that it can distinguish elements randomly sampled from WX = {Wx_1, …, Wx_n} and Y = {y_1, …, y_n}; learn the generator W so that it maps the elements of X as accurately as possible onto the word vectors of the corresponding nodes in Y, making it difficult for the discriminator to judge whether an element belongs to WX or to Y; and train the generative adversarial network, outputting as the third word vectors the elements of WX obtained after training, or the averages of those elements with the corresponding elements of Y, or the sums of those elements with the corresponding elements of Y.
In this embodiment, all the data processing ends jointly train the same generative adversarial network, so that a representation embedding effect based on the union of the knowledge bases of all data owners can be obtained.
The device of this embodiment may be used to implement the technical solution of the method embodiment shown in fig. 1; its implementation principle and technical effects are similar and are not repeated here.
The embodiments of the invention disclose a knowledge graph representation learning enhancement method based on encrypted federated learning and an asynchronous training device. A knowledge graph is obtained from the original data set, and knowledge representation learning is performed on each knowledge graph independently. After learning on one knowledge graph has improved its knowledge representation capability, the word vectors of its aligned nodes are encrypted and sent to the other knowledge graph nodes, which perform federated learning upon receiving them. In federated learning, the received word vectors and the word vectors of the aligned nodes of the local knowledge graph are input together into a generative adversarial network for training; the fused word vectors obtained after training replace the original word vectors of those nodes. On this basis, the word vectors of the one-hop neighbor nodes of the aligned nodes may also be input, together with the word vectors of the aligned nodes, into the generative adversarial network for federated learning, the trained fused word vectors replacing the original word vectors of those nodes.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be performed by hardware associated with program instructions. The foregoing program may be stored in a computer readable storage medium; when executed, it performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.
Finally, it should be noted that the above embodiments merely illustrate the technical solution of the present invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. A knowledge graph representation learning enhancement method based on encryption federal learning, comprising:
at a first data processing end, performing representation learning on a first knowledge graph to obtain a first word vector of a first entity in the knowledge graph;
at a second data processing end, performing representation learning on a second knowledge graph to obtain a second word vector of a second entity aligned with the first entity in the knowledge graph;
receiving, by the second data processing end, the homomorphically encrypted first word vector, performing federated learning in the form of a generative adversarial network using the first word vector and the second word vector, and obtaining a fused third word vector of the second entity; and
at the second data processing end, after replacing the second word vector of the second entity in the second knowledge graph with the third word vector, continuing to perform representation learning on the second knowledge graph so as to obtain an enhanced fourth word vector of each entity in the knowledge graph.
2. The knowledge-graph representation learning enhancement method according to claim 1, comprising:
at a first data processing end, performing representation learning on a first knowledge graph to obtain a first word vector of a first entity in the knowledge graph and a first word vector of a one-hop node of the first entity;
at a second data processing end, performing representation learning on a second knowledge graph to obtain a second word vector of a second entity aligned with the first entity and a second word vector of a one-hop node of the second entity aligned with the one-hop node of the first entity;
receiving, by the second data processing end, the homomorphically encrypted first word vectors, performing federated learning in the form of a generative adversarial network using the first word vectors and the second word vectors, and obtaining fused third word vectors of the second entity and its one-hop node; and
at the second data processing end, replacing the second word vectors of the second entity and of its one-hop node in the second knowledge graph with the third word vectors, and continuing to perform representation learning on the second knowledge graph so as to obtain an enhanced fourth word vector of each entity in the knowledge graph.
3. The knowledge graph representation learning enhancement method of claim 1, wherein the representation learning is representation embedding by a translation-based (trans-series) method.
4. The knowledge graph representation learning enhancement method of claim 1, wherein the step of performing federated learning in the form of a generative adversarial network comprises:
denoting the set of first word vectors received by the second data provider as X = {x_1, …, x_n};
denoting the set of second word vectors in the second knowledge graph corresponding to the first word vectors in X as Y = {y_1, …, y_n};
given the generator W of the generative adversarial network, learning the discriminator of the network so that it can distinguish elements randomly sampled from WX = {Wx_1, …, Wx_n} and Y = {y_1, …, y_n};
learning the generator W so that it maps the elements of X as accurately as possible onto the word vectors of the corresponding nodes in Y, making it difficult for the discriminator to judge whether an element belongs to WX or to Y;
training the generative adversarial network; and
taking as the third word vectors the elements of WX obtained after training, or the averages of the elements of WX obtained after training with the corresponding elements of Y, or the sums of the elements of WX obtained after training with the corresponding elements of Y.
5. The knowledge graph representation learning enhancement method according to any one of claims 1 to 4, characterized in that the second data processing end shares the generative adversarial network with the first data processing end.
6. An asynchronous training device deployed at a data processing end, comprising:
a reading module, configured to read the local original knowledge base and preprocess it into a local knowledge graph containing word vectors;
a representation learning module, configured to receive a first request and, in response, start one round of representation learning on the local knowledge graph so as to update the word vectors of the entities in the local knowledge graph;
a communication module, configured to communicate with the federated learning server and other data processing ends so as to receive shared word vector information of the knowledge graphs of the other data processing ends, the shared word vector information being homomorphically encrypted; and
a federated learning module, configured to send the shared word vector information and the corresponding local word vectors into the generative adversarial network for learning and to obtain fused word vectors, the fused word vectors being used to replace the word vectors of all or part of the entities of the local knowledge graph.
7. The asynchronous training device of claim 6, wherein the communication module sends shared word vector information of the local knowledge graph to the other data processing ends; the shared word vector information is homomorphically encrypted.
8. The asynchronous training device of claim 6, comprising an operation monitoring module, wherein the operation monitoring module is configured to supervise the asynchronous training device in performing the knowledge graph representation learning enhancement method of any one of claims 1 to 5; and/or to arbitrate the operating state of the asynchronous training device; and/or to adjust the learning effect of the federated learning module.
9. A generative adversarial learning device deployed at a data processing end or at a federated learning server, comprising:
a memory for storing computer executable code and a generative adversarial network;
a communication interface configured to be communicatively connected, through the communication module of the asynchronous training device of any one of claims 6 to 8, to the federated learning module of that asynchronous training device, so as to receive the set of first word vectors and the set of second word vectors from the federated learning module and to send the set of third word vectors to the federated learning module; and
a processor for reading and executing the computer executable code to configure and train the generative adversarial network, wherein, when the processor executes the computer executable code, the instructions of the computer executable code cause the processor to:
denote the set of first word vectors as X = {x_1, …, x_n}, and denote the set of second word vectors corresponding to the first word vectors in X as Y = {y_1, …, y_n};
given the generator W of the generative adversarial network, learn the discriminator of the network so that it can distinguish elements randomly sampled from WX = {Wx_1, …, Wx_n} and Y = {y_1, …, y_n};
learn the generator W so that it maps the elements of X as accurately as possible onto the word vectors of the corresponding nodes in Y, making it difficult for the discriminator to judge whether an element belongs to WX or to Y; and
train the generative adversarial network, and output as the third word vectors the elements of WX obtained after training, or the averages of the elements of WX obtained after training with the corresponding elements of Y, or the sums of the elements of WX obtained after training with the corresponding elements of Y.
10. A federal learning apparatus, comprising:
an asynchronous training device as claimed in any one of claims 6 to 8;
the generative adversarial learning device of claim 9.
CN202010629643.8A 2020-07-01 2020-07-01 Knowledge graph representation learning enhancement method and device based on encryption federal learning Active CN111858955B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010629643.8A CN111858955B (en) 2020-07-01 2020-07-01 Knowledge graph representation learning enhancement method and device based on encryption federal learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010629643.8A CN111858955B (en) 2020-07-01 2020-07-01 Knowledge graph representation learning enhancement method and device based on encryption federal learning

Publications (2)

Publication Number Publication Date
CN111858955A CN111858955A (en) 2020-10-30
CN111858955B true CN111858955B (en) 2023-08-18

Family

ID=73152608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010629643.8A Active CN111858955B (en) 2020-07-01 2020-07-01 Knowledge graph representation learning enhancement method and device based on encryption federal learning

Country Status (1)

Country Link
CN (1) CN111858955B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112365429B (en) * 2020-12-21 2022-07-22 神思电子技术股份有限公司 Knowledge-driven image fuzzy region definition enhancement method
CN113157938B (en) * 2021-03-25 2022-05-17 支付宝(杭州)信息技术有限公司 Method and device for jointly processing multiple knowledge graphs for protecting privacy data
CN113434626B (en) * 2021-08-27 2021-12-07 之江实验室 Multi-center medical diagnosis knowledge map representation learning method and system
CN113886598A (en) * 2021-09-27 2022-01-04 浙江大学 Knowledge graph representation method based on federal learning
CN113973125A (en) * 2021-10-26 2022-01-25 杭州博盾习言科技有限公司 Communication method and device in federal learning, electronic equipment and storage medium

Citations (7)

Publication number Priority date Publication date Assignee Title
KR20190103088A (en) * 2019-08-15 2019-09-04 엘지전자 주식회사 Method and apparatus for recognizing a business card using federated learning
CN110266771A (en) * 2019-05-30 2019-09-20 天津神兔未来科技有限公司 Distributed intelligence node and distributed swarm intelligence system dispositions method
CN110428058A (en) * 2019-08-08 2019-11-08 深圳前海微众银行股份有限公司 Federal learning model training method, device, terminal device and storage medium
CN110572253A (en) * 2019-09-16 2019-12-13 济南大学 Method and system for enhancing privacy of federated learning training data
CN110633805A (en) * 2019-09-26 2019-12-31 深圳前海微众银行股份有限公司 Longitudinal federated learning system optimization method, device, equipment and readable storage medium
CN110874648A (en) * 2020-01-16 2020-03-10 支付宝(杭州)信息技术有限公司 Federal model training method and system and electronic equipment
CN110955907A (en) * 2019-12-13 2020-04-03 支付宝(杭州)信息技术有限公司 Model training method based on federal learning

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US11475350B2 (en) * 2018-01-22 2022-10-18 Google Llc Training user-level differentially private machine-learned models

Patent Citations (7)

Publication number Priority date Publication date Assignee Title
CN110266771A (en) * 2019-05-30 2019-09-20 天津神兔未来科技有限公司 Distributed intelligence node and distributed swarm intelligence system dispositions method
CN110428058A (en) * 2019-08-08 2019-11-08 深圳前海微众银行股份有限公司 Federal learning model training method, device, terminal device and storage medium
KR20190103088A (en) * 2019-08-15 2019-09-04 엘지전자 주식회사 Method and apparatus for recognizing a business card using federated learning
CN110572253A (en) * 2019-09-16 2019-12-13 济南大学 Method and system for enhancing privacy of federated learning training data
CN110633805A (en) * 2019-09-26 2019-12-31 深圳前海微众银行股份有限公司 Longitudinal federated learning system optimization method, device, equipment and readable storage medium
CN110955907A (en) * 2019-12-13 2020-04-03 支付宝(杭州)信息技术有限公司 Model training method based on federal learning
CN110874648A (en) * 2020-01-16 2020-03-10 支付宝(杭州)信息技术有限公司 Federal model training method and system and electronic equipment

Non-Patent Citations (1)

Title
Research on a Cloud Data Security Scheme Based on Fully Homomorphic Encryption; Xu Aixue et al.; Journal of Shijiazhuang Institute of Railway Technology; 63-67 *

Also Published As

Publication number Publication date
CN111858955A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN111858955B (en) Knowledge graph representation learning enhancement method and device based on encryption federal learning
US9787647B2 (en) Secure computer evaluation of decision trees
US20230039182A1 (en) Method, apparatus, computer device, storage medium, and program product for processing data
Liu et al. Performing co-membership attacks against deep generative models
JP2021531533A (en) Enhanced privacy using a reliable execution environment Deep learning cloud service
CN111031071B (en) Malicious traffic identification method and device, computer equipment and storage medium
CN109684797B (en) Virtual IP protection method and system for confrontation network generated picture based on block chain
Stripelis et al. Secure neuroimaging analysis using federated learning with homomorphic encryption
Zheng et al. Securely and efficiently outsourcing decision tree inference
Niu et al. Toward verifiable and privacy preserving machine learning prediction
CN112765652B (en) Method, device and equipment for determining leaf node classification weight
CN111767411A (en) Knowledge graph representation learning optimization method and device and readable storage medium
CN110717555B (en) Picture generation system and device based on natural language and generation countermeasure network
Liu et al. Encryption method and security analysis of medical images based on stream cipher enhanced logical mapping
CN116743743A (en) Metadata universe data sharing method and system
CN114329127A (en) Characteristic box dividing method, device and storage medium
Kortoçi et al. Federated Split GANs
Le et al. Generating high-fidelity cybersecurity data with generative adversarial networks
Shah et al. Secure featurization and applications to secure phishing detection
Ogiela et al. Data understanding techniques for management application and cryptography
Zhang et al. PrivacyAsst: Safeguarding User Privacy in Tool-Using Large Language Model Agents
CN116821838B (en) Privacy protection abnormal transaction detection method and device
CN114121206B (en) Case portrait method and device based on multi-party combined K mean modeling
Hoefer et al. Performance evaluation of a differentially-private neural network for cloud computing
Abbas et al. Exploring deep federated learning for the internet of things: A gdpr-compliant architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant