CN116757275A - Knowledge graph federated learning device and method - Google Patents

Info

Publication number
CN116757275A
CN116757275A CN202310674594.3A CN202310674594A CN116757275A CN 116757275 A CN116757275 A CN 116757275A CN 202310674594 A CN202310674594 A CN 202310674594A CN 116757275 A CN116757275 A CN 116757275A
Authority
CN
China
Prior art keywords
data
federated learning
knowledge
graph
public
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310674594.3A
Other languages
Chinese (zh)
Other versions
CN116757275B (en)
Inventor
汤克云
徐炽明
高俊杰
徐荣文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingxin Data Technology Co ltd
Original Assignee
Jingxin Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingxin Data Technology Co ltd
Priority to CN202310674594.3A (patent CN116757275B)
Publication of CN116757275A
Application granted
Publication of CN116757275B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/098 Distributed learning, e.g. federated learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/096 Transfer learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Medical Informatics (AREA)

Abstract

The application discloses a knowledge graph federated learning device and method. The federated learning device comprises participants, an arbitrator, federated learning, a knowledge graph and a privacy knowledge graph. A participant is a real demand party in a federated learning computing task; the arbitrator is the coordinator of the federated learning computing task; federated learning serves as the entry point through which the participants carry out federated learning; the knowledge graph serves as the data carrier with which the participants and the arbitrator perform the federated learning computing task; and the privacy knowledge graph provides the training/prediction data used in federated learning. With the federated learning device and method, each party uses its own private data together with a public knowledge graph constructed and issued by the federated learning arbitrator to build its own knowledge graph locally. By performing entity alignment, data enhancement and intersection-ID construction within its own knowledge graph, each party forms a private data set with common sample IDs or common features, which improves the universality of the model.

Description

Knowledge graph federated learning device and method
Technical Field
The application belongs to the technical field of big data computing, and in particular relates to a knowledge graph federated learning device and method.
Background
With the advent of the big data era, the value of data has become increasingly prominent. However, enterprises and individuals have serious concerns about sharing data because of data privacy and security issues. Federated learning is a distributed machine learning method that protects data privacy and allows multi-party data to be used jointly without directly sharing the raw data. Depending on how samples and features overlap, federated learning can be subdivided into horizontal federated learning (HFL), vertical federated learning (VFL) and federated transfer learning (FTL).
Federated transfer learning is a technique that performs knowledge transfer and model training using data from different but related domains. It is the federated learning method suited to scenarios in which the parties share neither a feature space nor a sample ID space, and it suffers from problems such as high computational cost, difficult privacy protection and low model universality.
To solve these problems, the application provides a knowledge graph federated learning device and method.
Disclosure of Invention
In view of the deficiencies of the prior art, the application aims to provide a knowledge graph federated learning device and method. Each party uses its own private data together with a public knowledge graph constructed and issued by the federated learning arbitrator to build its own knowledge graph. By performing functions such as entity alignment, data enhancement and intersection-ID construction within its own knowledge graph, each party can form a private data set with common sample IDs or common features. While protecting data privacy, each party enhances and aligns its data features through the knowledge graph conversion, which improves the universality and performance of the model, expands the application range of vertical federated learning, and helps make full use of the value of the data.
The aim of the application can be achieved by the following technical scheme:
A knowledge graph federated learning device comprises participants, an arbitrator, federated learning, a knowledge graph and a privacy knowledge graph.
The participants are the real demand parties in the federated learning computing task and provide data and computing power; there are multiple participants.
The arbitrator is the coordinator of the federated learning computing task.
Federated learning serves as the entry point through which the participants carry out federated learning; its functions are agreeing on a security protocol and providing federated learning modeling and training parameter interaction.
The knowledge graph serves as the data carrier with which the participants and the arbitrator perform federated learning computing tasks.
The privacy knowledge graph provides the training/prediction data used in federated learning. Each participant temporarily generates and maintains it when a privacy computing task starts, and it is automatically destroyed after the privacy computing task ends.
Further, one of the participants acts as the initiator of the federated learning computing task and starts the task, and the remaining participants assist in completing it.
Further, in the federated learning computing task, the arbitrator plays the role of coordinator and is responsible for overall parameter transmission; in the knowledge graph replacement task, the arbitrator acts as the intermediary for external knowledge graph data and is responsible for acquiring external data and providing it to each participant.
Further, the participants are responsible for model training and prediction in the federated learning process; the arbitrator is responsible for parameter interaction in the federated learning process.
Further, the knowledge graph of a participant is mainly used for knowledge alignment with external graph data to generate the privacy subgraph data required by federated learning; the knowledge graph of the arbitrator mainly provides public graph data support for the participants according to task requirements.
Further, the knowledge graph of every computing node comprises a graph database and a single-machine graph machine learning middle platform.
Further, for the arbitrator among the computing nodes, the knowledge graph is additionally provided with a public comprehensive graph data warehouse and an external graph data acquisition module.
The knowledge graph federated learning method is applied to the above federated learning device and specifically comprises the following steps:
S1: Initiating a federated learning computing task;
The initiator, which is one of the participants, uploads its private data to its privacy node to construct the data to be used by the federated learning computing task, denoted D_private.
The initiator initiates a federated learning computing task and assembles the data requirement, denoted D_R, and the specific federated learning algorithm required, denoted FL_A. D_R contains the following three kinds of data:
Metadata information of the initiator's private data, denoted GD_meta. GD_meta is used by the arbitrator to analyze and construct a public knowledge graph based on the initiator's metadata types, and to unify the data types of all parties when they construct their privacy knowledge graphs from the public knowledge graph.
Data type information of the private data, denoted GD_category. GD_category is used by the arbitrator to analyze and construct a public knowledge graph related to the data types of the initiator's D_private.
Requested type information of public data, denoted GD_common_category. GD_common_category is used by the arbitrator to obtain, according to these types, publicly available knowledge data of the relevant types from its own holdings or from external sources.
Finally, the initiator sends D_R and FL_A to the arbitrator.
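The request assembled in S1 can be pictured as a small data structure. The following is a minimal sketch only: the class names, field names, example values and the algorithm name are hypothetical and are not taken from the application, which does not specify a concrete encoding for D_R or FL_A.

```python
from dataclasses import dataclass


@dataclass
class DataRequirement:
    """D_R: the three kinds of information the initiator assembles (hypothetical layout)."""
    gd_meta: dict              # attribute name -> data type name
    gd_category: list          # data types of the initiator's private data
    gd_common_category: list   # requested types of public data


@dataclass
class TaskRequest:
    """What the initiator sends to the arbitrator: D_R plus the algorithm FL_A."""
    d_r: DataRequirement
    fl_a: str                  # hypothetical identifier of the required algorithm


request = TaskRequest(
    d_r=DataRequirement(
        gd_meta={"age": "int", "city": "str"},
        gd_category=["customer"],
        gd_common_category=["company", "region"],
    ),
    fl_a="vertical_logistic_regression",
)
```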
S2: Constructing a privacy knowledge graph;
The privacy knowledge graph is a temporary data carrier created in the privacy computing task. It provides training/prediction data for the privacy computing task and fuses a party's own business data with public knowledge data.
S21: Initiating a public data acquisition requirement;
The arbitrator sends an electronic confirmation letter for the federated learning computing task to the other participants of the task; the letter contains D_R and FL_A. After receiving the electronic confirmation letter, each participant further confirms the D_R and FL_A in it. During confirmation, each participant uploads its private data D_private to its privacy node according to the requirements in D_R.
S22: Querying public data;
After every participant has confirmed the federated learning computing task, the arbitrator searches its own knowledge graph, according to D_R, for data matching GD_category and GD_common_category in D_R. When the arbitrator finds that its own public data resources are insufficient for the federated learning computing task, it actively acquires the corresponding external public knowledge data and then performs knowledge fusion and entity linking between the external public knowledge data and its own public data. The public knowledge data finally formed is denoted COM_graph. The arbitrator sends COM_graph and GD_meta down to all parties.
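The query-then-fallback behaviour of S22 can be sketched as follows. This is an illustration under stated assumptions: triples are modelled as (head, relation, tail, type) tuples, `fetch_external` is a hypothetical callable standing in for the external graph data acquisition module, and "fusion" is reduced to a de-duplicating union, which is far simpler than real knowledge fusion and entity linking.

```python
def build_com_graph(own_graph, requested_types, fetch_external):
    """Return public triples covering requested_types (a stand-in for COM_graph).

    own_graph: list of (head, relation, tail, type) tuples held by the arbitrator.
    fetch_external: hypothetical callable that returns triples for missing types.
    """
    com_graph = [t for t in own_graph if t[3] in requested_types]
    covered = {t[3] for t in com_graph}
    missing = [c for c in requested_types if c not in covered]
    if missing:
        # Own resources are insufficient: acquire external public knowledge data.
        seen = set(com_graph)
        for triple in fetch_external(missing):
            if triple not in seen:  # naive fusion: union without duplicates
                com_graph.append(triple)
                seen.add(triple)
    return com_graph


own = [("AcmeCo", "located_in", "Shenzhen", "company")]


def fake_external(types):
    # Hypothetical external source used only for this example.
    if "region" in types:
        return [("Shenzhen", "part_of", "Guangdong", "region")]
    return []


com_graph = build_com_graph(own, ["company", "region"], fake_external)
```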
S23: Constructing a knowledge graph;
After receiving COM_graph from the arbitrator, each party combines it with its own data to create the privacy knowledge graph belonging to the federated learning computing task. The privacy knowledge graph comprises two parts, COM_graph and D_private: the data in COM_graph are denoted V_public, and the data in D_private are denoted V_private.
S3: Constructing commonality data;
Each party performs a commonality query on its own knowledge graph; the commonality query is completed in an unsupervised manner throughout.
First, a GAE (graph autoencoder) model is trained on V_public and V_private and used to encode the data of each node in the graph, so that each node is represented as a vector, denoted Embed. A similarity algorithm is then used to compute the distance between nodes in the vector space. The similarity formula is:
Sim(V_public, V_private) = cos(Embed_public, Embed_private)
Based on the similarity algorithm, a temporary relation, denoted E_tmp, is created between nodes whose similarity reaches a certain threshold.
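The similarity step can be illustrated with a short sketch. It assumes the node embeddings have already been produced (the hand-written vectors below merely stand in for GAE output), and the threshold value 0.9 and all node identifiers are hypothetical; the application only speaks of "a certain threshold".

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def temporary_edges(embed_public, embed_private, threshold=0.9):
    """Create a temporary (public, private) edge for every pair at or above the threshold."""
    edges = []
    for pub_id, pub_vec in embed_public.items():
        for pri_id, pri_vec in embed_private.items():
            if cosine(pub_vec, pri_vec) >= threshold:
                edges.append((pub_id, pri_id))
    return edges


embed_public = {"pub:AcmeCo": [1.0, 0.0], "pub:Shenzhen": [0.0, 1.0]}
embed_private = {"pri:Acme": [0.9, 0.1], "pri:order42": [-1.0, 0.0]}
e_tmp = temporary_edges(embed_public, embed_private)
```

Only public-private pairs are compared; pairs inside the same set are never scored.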
V_public and V_private are thereby preliminarily combined into one piece of preliminary graph data, on which knowledge fusion is then performed.
In the knowledge fusion process, the ID features of the data take the public data as the reference, and the specific features of the data take the private data as the reference.
When a V_public and a V_private satisfy the E_tmp relation, the two are deemed fusable. The data features already present on V_private are determined first and copied to V_public; the E_tmp connection between V_public and V_private is then broken, the node V_private is deleted from the private data, and the V_public is marked as V_pubpri.
When a V_public that does not satisfy the E_tmp relation has an entity connection with a V_pubpri, that V_public is marked as a V_C entity. Finally, following the knowledge fusion steps above, with V_public as the primary source and V_private as the auxiliary source, each party outputs its own fused privacy knowledge graph, denoted G_pubpri.
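The fusion rules above can be sketched on plain dictionaries. This is a simplified illustration, not the application's implementation: nodes are modelled as id-to-feature dictionaries, edges as id pairs, and the function name and sample data are hypothetical. What happens to private nodes that match no public node is not stated in the text, so the sketch simply returns them untouched.

```python
def fuse(public_nodes, private_nodes, e_tmp, public_edges):
    """Merge matched private features into public nodes and label the result.

    Returns (fused_nodes, labels, unmatched_private_nodes).
    """
    fused = {nid: dict(feats) for nid, feats in public_nodes.items()}
    remaining_private = dict(private_nodes)
    labels = {nid: "V_public" for nid in fused}
    for pub_id, pri_id in e_tmp:
        if pri_id not in remaining_private:
            continue  # this private node was already merged elsewhere
        # The ID stays public; the specific features come from the private node.
        fused[pub_id].update(remaining_private.pop(pri_id))
        labels[pub_id] = "V_pubpri"
    # An unmatched public node connected to a V_pubpri node becomes V_C.
    for a, b in public_edges:
        for node, other in ((a, b), (b, a)):
            if labels[node] == "V_public" and labels[other] == "V_pubpri":
                labels[node] = "V_C"
    return fused, labels, remaining_private


public_nodes = {"pub:AcmeCo": {"name": "AcmeCo"}, "pub:Shenzhen": {"name": "Shenzhen"}}
private_nodes = {"pri:Acme": {"credit_limit": 5000}}
fused, labels, leftover = fuse(
    public_nodes,
    private_nodes,
    e_tmp=[("pub:AcmeCo", "pri:Acme")],
    public_edges=[("pub:AcmeCo", "pub:Shenzhen")],
)
```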
Subsequently, coreference resolution and entity disambiguation techniques from the knowledge graph field are applied to G_pubpri for further cleaning. The entity attributes in G_pubpri are converted to the data types given in GD_meta, so that the entity attribute data types of all parties are unified. This finally forms the initial version of the privacy knowledge graph for training, denoted G_pubpri_init.
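The type-unification part of this step can be sketched as a cast driven by GD_meta. The mapping from type names to Python constructors, the attribute names and the choice to turn unconvertible values into None are all assumptions made for illustration; coreference resolution and entity disambiguation are not covered here.

```python
# Hypothetical mapping from GD_meta type names to Python constructors.
CASTERS = {"int": int, "float": float, "str": str}


def unify_types(nodes, gd_meta):
    """Cast each attribute to the type named in gd_meta; leave unknown attributes as-is."""
    unified = {}
    for node_id, attrs in nodes.items():
        converted = {}
        for name, value in attrs.items():
            caster = CASTERS.get(gd_meta.get(name))
            if caster is None or value is None:
                converted[name] = value
                continue
            try:
                converted[name] = caster(value)
            except (TypeError, ValueError):
                # Unconvertible value: keep a null so a later quality check can flag it.
                converted[name] = None
        unified[node_id] = converted
    return unified


g_pubpri = {"pub:AcmeCo": {"credit_limit": "5000", "name": "AcmeCo", "score": "n/a"}}
gd_meta = {"credit_limit": "int", "score": "float"}
g_pubpri_init = unify_types(g_pubpri, gd_meta)
```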
S4: Enhancing data;
Data enhancement is performed on the data in G_pubpri_init. It comprises two steps: data quality detection and knowledge data enhancement.
Data quality detection mainly detects null values, noise values, outliers and data connectivity in the data.
According to the data types and service types in the federated learning computing task, the federated learning device automatically selects a set of data quality checks suited to the characteristics of the task, and then marks the data that have quality problems.
The data marked as abnormal are then repaired. Based on the marks, the federated learning device returns to the knowledge graph generated in step S2 and performs a recall query. Data that can be recalled undergo feature repair: the normally available data features are refilled and the abnormal marks on the corresponding data are removed. Data that cannot be recalled, or that still carry an abnormal mark, are deleted directly by the federated learning device.
The finally obtained privacy knowledge graph is denoted G_pubpri_res.
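The mark, recall and repair cycle of S4 can be sketched for a single numeric feature. Several things here are assumptions: nulls are found by a direct check, outliers by a z-score rule with a limit of 3 standard deviations (one common way to apply the normal distribution), and the step-S2 graph is modelled as a dictionary keyed by the same ids. The sketch also does not re-check recalled values, whereas the text deletes data that still carry an abnormal mark after repair.

```python
import statistics


def flag_issues(rows, feature, z_limit=3.0):
    """Return the ids whose feature is null or lies more than z_limit std devs from the mean."""
    present = [r[feature] for r in rows.values() if r.get(feature) is not None]
    flagged = {rid for rid, r in rows.items() if r.get(feature) is None}
    if len(present) >= 2:
        mean = statistics.mean(present)
        std = statistics.pstdev(present)
        if std > 0:
            for rid, r in rows.items():
                value = r.get(feature)
                if value is not None and abs(value - mean) / std > z_limit:
                    flagged.add(rid)
    return flagged


def repair(rows, flagged, feature, source_graph):
    """Refill flagged values from the step-S2 graph; drop rows that cannot be recalled."""
    repaired = {}
    for rid, r in rows.items():
        if rid not in flagged:
            repaired[rid] = r
            continue
        recalled = source_graph.get(rid, {}).get(feature)
        if recalled is not None:
            repaired[rid] = {**r, feature: recalled}
    return repaired


rows = {"a": {"x": 1.0}, "b": {"x": None}, "c": {"x": 2.0}, "d": {"x": None}}
s2_graph = {"b": {"x": 1.5}}  # hypothetical recall source
flagged = flag_issues(rows, "x")
g_pubpri_res = repair(rows, flagged, "x", s2_graph)
```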
S5: Vertical federated learning;
Before a specific vertical federated learning algorithm is run, the data of all parties must be sample-aligned.
Sample alignment is performed on each party's G_pubpri_res. It uses relation alignment: samples with the same relation are regarded as one class.
The entity attributes in G_pubpri_res and the corresponding relations are extracted to form wide-table data, denoted W_data. W_data is a table in which V denotes an entity in G_pubpri_res that satisfied the E_tmp relation in S3; x_1pri, x_2pri, x_3pri ... x_npri are the private features of the corresponding entity in D_private; and x_4pub, x_5pub, x_6pub ... x_npub are the public features of the entity in V_public. As can be seen from S3, the relation column is provided by V_public and serves as the criterion for relation alignment.
Each party uses the relation column of its W_data as the intersection ID column, and all parties jointly perform secure intersection to obtain their intersection data on the common relation column, denoted W_intersection_data. Finally, the FL_A algorithm is trained on the corresponding W_intersection_data.
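The intersection on the relation column can be illustrated as below. Important caveat: this sketch compares salted SHA-256 digests of the relation values, which only shows the data flow and is NOT a secure private set intersection protocol; a real deployment would use a cryptographic PSI scheme, which the application does not specify. The salt, column names and sample rows are hypothetical.

```python
import hashlib


def blind(relation, salt):
    """Hash a relation value with a shared salt (illustrative only, not secure PSI)."""
    return hashlib.sha256((salt + relation).encode("utf-8")).hexdigest()


def intersect_on_relation(tables, salt="hypothetical-shared-salt"):
    """Keep, for every party, only the rows whose relation value is common to all parties.

    tables: one wide table per party, each a list of row dicts with a 'relation' key.
    """
    blinded_sets = [{blind(row["relation"], salt) for row in table} for table in tables]
    common = set.intersection(*blinded_sets)
    return [
        [row for row in table if blind(row["relation"], salt) in common]
        for table in tables
    ]


w_data_a = [
    {"V": "AcmeCo", "x_1pri": 5000, "relation": "AcmeCo-located_in-Shenzhen"},
    {"V": "BetaLtd", "x_1pri": 700, "relation": "BetaLtd-located_in-Beijing"},
]
w_data_b = [
    {"V": "AcmeCo", "x_1pri": 0.3, "relation": "AcmeCo-located_in-Shenzhen"},
]
w_intersection_a, w_intersection_b = intersect_on_relation([w_data_a, w_data_b])
```

After this step each party holds rows for the same relation values, so a vertical federated learning algorithm can treat them as aligned samples.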
Further, in the data quality detection, null values are detected using simple count statistics; noise values and outliers are detected using knowledge related to the normal distribution; and data connectivity is detected using a graph machine learning algorithm.
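For the connectivity check, one simple stand-in is a breadth-first search that splits the graph into connected components and reports isolated nodes. This is a plain graph traversal swapped in for illustration, not the graph machine learning algorithm the application refers to; the node names are hypothetical.

```python
from collections import deque


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of node sets."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


nodes = ["AcmeCo", "Shenzhen", "Orphan"]
edges = [("AcmeCo", "Shenzhen")]
components = connected_components(nodes, edges)
# Single-node components are candidates for a connectivity quality mark.
isolated = [c for c in components if len(c) == 1]
```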
The application has the following beneficial effects:
1. With the federated learning device and method, each party uses its own private data together with a public knowledge graph constructed and issued by the federated learning arbitrator to build its own knowledge graph locally, and each party can form a private data set with common sample IDs or common features by performing functions such as entity alignment, data enhancement and intersection-ID construction within its own knowledge graph.
2. Through each party's knowledge graph construction process, the federated learning device turns the data sets into data usable for horizontal and vertical federated learning, not only for federated transfer learning. Because the device mainly performs a secondary construction of the data, it is universal across federated learning models. It can convert federated transfer learning into horizontal or vertical federated learning, which reduces the complexity of the federated learning model and in turn its computational cost.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to those skilled in the art that other drawings can be obtained according to these drawings without inventive effort.
FIG. 1 is an overall architecture diagram of the federated learning device of the present application;
FIG. 2 is a flow chart of each party constructing a privacy computing subgraph in the present application;
FIG. 3 is a general flow chart of the federated learning method of the present application;
FIG. 4 is a detailed flow chart of constructing a privacy knowledge graph in the present application;
FIG. 5 is a routine flow chart of the federated learning device of the present application;
FIG. 6 shows the privacy knowledge graphs constructed by the parties through knowledge fusion in the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
A knowledge graph federated learning device, as shown in FIG. 1, comprises participants, an arbitrator, federated learning, a knowledge graph and a privacy knowledge graph.
The participants are the real demand parties in the federated learning computing task and provide data and computing power. There are multiple participants: one acts as the initiator of the federated learning computing task and starts the task, and the remaining participants assist in completing it.
The arbitrator is the coordinator of the federated learning computing task. In the federated learning computing task, the arbitrator plays the role of coordinator and is responsible for overall parameter transmission; in the knowledge graph replacement task, the arbitrator acts as the intermediary for external knowledge graph data and is responsible for acquiring external data and providing it to each participant.
Federated learning serves as the entry point through which the participants carry out federated learning; its functions are agreeing on a security protocol and providing federated learning modeling and training parameter interaction. The participants are responsible for model training and prediction in the federated learning process, and the arbitrator is responsible for parameter interaction in the federated learning process.
The knowledge graph serves as the data carrier with which the participants and the arbitrator perform federated learning computing tasks. The knowledge graph of a participant is mainly used for knowledge alignment with external graph data to generate the privacy subgraph data required by federated learning; the knowledge graph of the arbitrator mainly provides public graph data support for the participants according to task requirements. The knowledge graph of every computing node comprises a graph database and a single-machine graph machine learning middle platform; for the arbitrator among the computing nodes, the knowledge graph is additionally provided with a public comprehensive graph data warehouse and an external graph data acquisition module.
The privacy knowledge graph provides the training/prediction data used in federated learning. Each participant temporarily generates and maintains it when a privacy computing task starts, and it is automatically destroyed after the privacy computing task ends.
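The temporary lifecycle of the privacy knowledge graph (created when the task starts, destroyed when it ends) maps naturally onto a scoped resource. The following is a minimal sketch under the assumption that the graph fits in an in-memory dictionary; the function name and the sample rows are hypothetical, and a real privacy node would also need to wipe persistent storage.

```python
from contextlib import contextmanager


@contextmanager
def privacy_graph_session(private_rows, public_rows):
    """Hypothetical helper: build a temporary graph and destroy it when the task ends."""
    graph = {"private": list(private_rows), "public": list(public_rows)}
    try:
        yield graph
    finally:
        # Destroy the temporary data whether the task finished or failed.
        graph.clear()


held = None
with privacy_graph_session([{"id": "u1"}], [{"id": "p1"}]) as graph:
    held = graph
    size_during_task = len(graph["private"]) + len(graph["public"])
```

Once the `with` block exits, the dictionary referenced by `held` is empty, so no training/prediction data outlives the task.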
The knowledge graph federated learning method is applied to the above federated learning device and, as shown in FIGS. 2-6, specifically comprises the following steps:
S1: Initiating a federated learning computing task;
The initiator, which is one of the participants, uploads its private data to its privacy node to construct the data to be used by the federated learning computing task, denoted D_private;
The initiator initiates a federated learning computing task and assembles the data requirement, denoted D_R, and the specific federated learning algorithm required, denoted FL_A. D_R contains the following three kinds of data:
Metadata information of the initiator's private data, denoted GD_meta. GD_meta is used by the arbitrator to analyze and construct a public knowledge graph based on the initiator's metadata types, and to unify the data types of all parties when they construct their privacy knowledge graphs from the public knowledge graph, so that the initiator and the other parties are better unified with respect to their own D_private, which helps the final federated learning model training.
Data type information of the private data, denoted GD_category. GD_category is used by the arbitrator to analyze and construct a public knowledge graph related to the data types of the initiator's D_private. Its purpose is to unify, as far as possible, the data types of all parties when they construct their privacy knowledge graphs from the public knowledge graph and to align the parties' knowledge graphs, so that the privacy knowledge graphs of all parties intersect to some extent in a spatial sense. Otherwise the parties would remain in a state where, for example, the privacy knowledge graph of node A belongs to class X while that of node B belongs to class Y, and the knowledge graphs still could not achieve the goal of finding common relation pairs.
Requested type information of public data, denoted GD_common_category. GD_common_category is used by the arbitrator to obtain, according to these types, publicly available knowledge data of the relevant types from its own holdings or from external sources. This allows feature dimensions to be added and features to be enhanced, to a certain extent, when the initiator's D_private lacks sufficient features, which benefits the final federated learning model training. The public knowledge graph that the arbitrator builds from the category information helps each party's privacy knowledge graph find common entity-relation-entity relation pairs.
Finally, the initiator sends D_R and FL_A to the arbitrator.
S2: Constructing a privacy knowledge graph;
The privacy knowledge graph is a temporary data carrier created in the privacy computing task. It provides training/prediction data for the privacy computing task and fuses a party's own business data with public knowledge data, so that data suitable for secure intersection can be obtained even when the parties hold data in different forms.
S21: Initiating a public data acquisition requirement;
The arbitrator sends an electronic confirmation letter for the federated learning computing task to the other participants of the task; the letter contains D_R and FL_A. After receiving the electronic confirmation letter, each participant further confirms the D_R and FL_A in it. During confirmation, each participant uploads its private data D_private to its privacy node according to the requirements in D_R. A participant may also upload private data that does not strictly satisfy the conditions in D_R; the main purpose is for every participant to learn from the initiator which federated learning model is to be launched in this round of federated learning, which type of data is to be aligned, and so on.
S22: Querying public data;
After every participant has confirmed the federated learning computing task, the arbitrator searches its own knowledge graph, according to D_R, for data matching GD_category and GD_common_category in D_R. When the arbitrator finds that its own public data resources are insufficient for the federated learning computing task, it actively acquires the corresponding external public knowledge data and performs knowledge fusion and entity linking between the external public knowledge data and its own public data. This finally forms larger and more comprehensive public knowledge data, denoted COM_graph. The arbitrator sends COM_graph and GD_meta down to all parties. In this application, "the parties" refers collectively to the initiator and the other participants.
S23: Constructing a knowledge graph;
After receiving COM_graph from the arbitrator, each party combines it with its own data to create the privacy knowledge graph belonging to the federated learning computing task. The privacy knowledge graph comprises two parts, COM_graph and D_private. COM_graph is held by all parties, so its data are the same for everyone and are denoted V_public; D_private is maintained by each party itself, so its data differ between parties and are denoted V_private.
S3: constructing commonality data;
Each party performs a commonality query on its own knowledge graph.
Because the federated learning device is a fully automatic knowledge graph conversion device, no manual error labeling is available for the public knowledge data and the private data, so the commonality query in this application is carried out in an unsupervised manner throughout.
First, a GAE (Graph Auto-Encoder) model is trained on V_public and V_private to encode the data of every node in the graph, so that each node is represented as a vector, denoted Embed. The purpose is to map the public data and the private data into the same vector space. A similarity algorithm then computes the distance between nodes in that vector space. Because the system assumes by default that the public data and the private data each already have complete, usable internal connections, similarity is computed only between points of V_public and points of V_private, not between points inside the same data set. The similarity formula is:
Sim(V_public, V_private) = cos(Embed_public, Embed_private)
Based on the similarity algorithm, a temporary relation, denoted E_tmp, is created between points whose similarity reaches a given threshold.
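The similarity step can be illustrated with a minimal Python sketch. It assumes the GAE has already produced one embedding vector per node. The node IDs, the vectors, the helper names cosine and build_e_tmp, and the threshold of 0.9 are all hypothetical, since the application does not specify them.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_e_tmp(embed_public, embed_private, threshold=0.9):
    """Create temporary relations for public/private node pairs.

    Only cross-set pairs are compared, mirroring the text: pairs inside
    V_public or inside V_private are never scored.
    The threshold value is an assumption, not taken from the application.
    """
    e_tmp = []
    for pub_id, pub_vec in embed_public.items():
        for pri_id, pri_vec in embed_private.items():
            sim = cosine(pub_vec, pri_vec)
            if sim >= threshold:
                e_tmp.append((pub_id, pri_id, sim))
    return e_tmp


# Hypothetical embeddings standing in for GAE output.
embed_public = {"pub:acme": [1.0, 0.0, 1.0], "pub:city": [0.0, 1.0, 0.0]}
embed_private = {"pri:acme_ltd": [0.9, 0.1, 1.1], "pri:order_7": [-1.0, 0.5, 0.0]}
e_tmp = build_e_tmp(embed_public, embed_private)
```

The nested loop is quadratic in the number of nodes, which is acceptable for a sketch; a larger graph would likely need an approximate nearest-neighbour index instead.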
Through the steps above, V_public and V_private are preliminarily combined into one set of preliminary graph data, on which knowledge fusion is then performed. The goal of knowledge fusion is to produce target graph data from which points clearly duplicated between the private data and the public data have been filtered out.
During knowledge fusion, the ID features of the data follow the public data, and the specific features of the data follow the private data. When a V_public and a V_private satisfy the E_tmp relation, the two are deemed fusable. The existing data features of V_private take priority: the data features of V_private are copied into V_public, the E_tmp connection between V_public and V_private is broken, the point V_private is deleted from the private data, and the V_public is marked as V_pubpri.
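A minimal sketch of this fusion rule follows, under stated assumptions: nodes are held in a plain dictionary, each node carries a label and a feature dictionary, and E_tmp is a list of (public ID, private ID) pairs. This data layout and the function name fuse are illustrative only and are not taken from the application.

```python
def fuse(nodes, e_tmp):
    """Merge each private node into its matched public node.

    nodes: {node_id: {"label": "V_public" | "V_private", "features": dict}}
    e_tmp: iterable of (public_id, private_id) pairs.
    Returns a new node dictionary; the input is left unmodified.
    """
    fused = {
        node_id: {"label": node["label"], "features": dict(node["features"])}
        for node_id, node in nodes.items()
    }
    for pub_id, pri_id in e_tmp:
        if pub_id not in fused or pri_id not in fused:
            continue  # pair already consumed by an earlier merge
        # Private feature values take priority; the public node ID is kept.
        fused[pub_id]["features"].update(fused[pri_id]["features"])
        fused[pub_id]["label"] = "V_pubpri"
        # Deleting the private node also drops its temporary E_tmp link.
        del fused[pri_id]
    return fused


# Hypothetical example data.
nodes = {
    "pub:acme": {"label": "V_public", "features": {"industry": "retail", "founded": 1999}},
    "pub:city": {"label": "V_public", "features": {"population": 120000}},
    "pri:acme_ltd": {"label": "V_private", "features": {"industry": "wholesale", "credit_limit": 5000}},
}
fused = fuse(nodes, [("pub:acme", "pri:acme_ltd")])
```

In this example the merged node keeps the public ID pub:acme, takes the private value for industry, and gains credit_limit, while pub:city is left as an ordinary V_public node.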
When a V_public that does not satisfy the E_tmp relation has an entity connection with a V_pubpri, that V_public is marked as a V_C entity. Each party thus performs knowledge fusion with V_public as primary and V_private as auxiliary and, after completing the fusion steps above, outputs its own privacy knowledge graph, analogous to V_public LEFT JOIN V_private in an SQL statement. The fused privacy knowledge graph of each party is denoted G_pubpri, as shown in FIG. 6. The formation of G_pubpri provides two kinds of characteristics for the subsequent federated learning:
G_pubpri contains both V_private and V_public, which enriches the entity attribute features of each party's D_private data. D_private originally contains only each party's own private features and is somewhat limited in feature dimension, whereas G_pubpri, being built on V_public, also carries the V_public features.
V_public is built from the arbitrator's COM_graph, so the parties' G_pubpri are similar knowledge graphs and can therefore contain V_C entities. For example, party A has the entity relation pair A_1 → V_C; because the G_pubpri of both A and B are built on V_public, party B also contains the V_C entity and has the relation pair B_1 → V_C. In this application, such A → V_C relations serve as the intersection ID for secure intersection in vertical federated learning: seen from the V_C relation, A_1 and B_1 are regarded as entities of the same kind.
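The V_C rule and the A_1 → V_C, B_1 → V_C example can be sketched as follows. The label dictionary, the edge list, the node names and the function mark_v_c are hypothetical; in particular, the two parties' V_C sets are compared in the clear here only to show why the relation can act as a shared key, whereas the application performs this comparison through secure intersection.

```python
def mark_v_c(labels, edges):
    """Return the public nodes that qualify as V_C entities.

    labels: {node_id: "V_public" | "V_pubpri" | ...}
    edges:  iterable of (node_id, node_id) entity connections, treated as undirected.
    A node qualifies if it is still labelled V_public (no E_tmp match)
    and is connected to at least one V_pubpri node.
    """
    v_c = set()
    for a, b in edges:
        for src, dst in ((a, b), (b, a)):
            if labels.get(src) == "V_pubpri" and labels.get(dst) == "V_public":
                v_c.add(dst)
    return v_c


# Both parties start from the same V_public nodes (from COM_graph).
labels_a = {"A1": "V_pubpri", "industry:retail": "V_public", "region:north": "V_public"}
edges_a = [("A1", "industry:retail")]

labels_b = {"B1": "V_pubpri", "industry:retail": "V_public", "region:north": "V_public"}
edges_b = [("industry:retail", "B1")]

v_c_a = mark_v_c(labels_a, edges_a)
v_c_b = mark_v_c(labels_b, edges_b)
shared = v_c_a & v_c_b  # illustration only, not a secure intersection
```

Because both parties derive their V_C nodes from the same public graph, A1 and B1 end up attached to the same node even though their own entity IDs have nothing in common.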
Next, coreference resolution and entity disambiguation techniques from the knowledge graph field are applied to G_pubpri for further cleaning. The main purpose is to improve data quality for model training by eliminating a large amount of noisy data. The data types of the entity attributes in G_pubpri are then converted according to GD_meta so that the entity attribute data types of all parties are unified. The result is the initial privacy knowledge graph for training, denoted G_pubpri_init.
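The type unification step might look like the sketch below. The application does not describe the format of GD_meta, so this example assumes it is a mapping from attribute name to one of the type names "int", "float" or "str"; the function name unify_types and the choice to store None for unconvertible values are likewise assumptions.

```python
def unify_types(features, gd_meta):
    """Cast entity attributes to the types named in gd_meta.

    features: {attribute_name: raw_value}
    gd_meta:  {attribute_name: "int" | "float" | "str"} (assumed format)
    Attributes absent from gd_meta are passed through unchanged.
    Values that cannot be converted become None so that a later
    quality check can flag them.
    """
    casters = {"int": int, "float": float, "str": str}
    unified = {}
    for name, value in features.items():
        caster = casters.get(gd_meta.get(name))
        if caster is None or value is None:
            unified[name] = value
            continue
        try:
            unified[name] = caster(value)
        except (TypeError, ValueError):
            unified[name] = None
    return unified


gd_meta = {"age": "int", "score": "float", "code": "str", "limit": "int"}
raw = {"age": "42", "score": "3.5", "code": 7, "limit": "abc", "note": [1]}
unified = unify_types(raw, gd_meta)
```

Applying the same gd_meta on every party gives each attribute the same type everywhere, which is what later column-wise training requires.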
S4: enhancing data;
Data enhancement is performed on the data of G_pubpri_init in two steps: data quality detection and knowledge data enhancement.
For data quality detection, this application mainly checks four aspects of the data: null values, noise values, outliers, and data connectivity.
Null values can be detected with simple count statistics; noise values and outliers are detected using statistical knowledge such as the normal distribution; data connectivity is checked with a graph machine algorithm.
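The checks named above can be sketched in plain Python. The concrete choices here are assumptions rather than the application's method: outliers are flagged with a k-sigma rule (k = 3 by default) under a normality assumption, and connectivity is checked by listing connected components with a breadth-first search. All function names and data are hypothetical.

```python
import statistics
from collections import deque


def count_nulls(rows, column):
    """Count rows where the column is missing or None."""
    return sum(1 for row in rows if row.get(column) is None)


def find_outliers(values, k=3.0):
    """Return indices of values more than k population standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) > k * std]


def connected_components(nodes, edges):
    """Group nodes into connected components (edges are undirected
    and must only reference nodes that appear in `nodes`)."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        components.append(component)
    return components


rows = [{"age": 30}, {"age": None}, {}]
null_count = count_nulls(rows, "age")
outlier_idx = find_outliers([10.0] * 10 + [100.0])
parts = connected_components(["a", "b", "c", "d"], [("a", "b"), ("b", "c")])
```

A graph that splits into more than one component, as in the last call, contains isolated entities that the later repair step would have to reconnect or drop.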
According to the data types and service types involved in the federated learning computing task, the federated learning device automatically selects a set of data quality checks suited to the characteristics of the task and then marks the data that has quality problems.
The data marked as abnormal is then repaired. Based on the marks, a recall query is run against the knowledge graph generated in step S2. For data that can be recalled, feature repair is performed: the normally usable data features are refilled and the abnormal mark on the corresponding data is deleted. Data that cannot be recalled, or that still carries an abnormal mark, is deleted directly by the federated learning device.
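The repair-or-delete logic can be sketched as below. The recall query against the step S2 knowledge graph is represented by a caller-supplied function that returns a replacement value or None; that interface, the record layout and the name repair are assumptions made for illustration.

```python
def repair(records, flagged, recall):
    """Repair flagged features via recall, dropping records that stay abnormal.

    records: {entity_id: {feature_name: value}}
    flagged: {entity_id: set of feature names marked abnormal}
    recall:  callable(entity_id, feature_name) -> replacement value or None
    Returns a new dictionary containing only clean or fully repaired records.
    """
    repaired = {}
    for entity_id, features in records.items():
        fixed = dict(features)
        recoverable = True
        for name in flagged.get(entity_id, set()):
            value = recall(entity_id, name)
            if value is None:
                recoverable = False  # still abnormal, so the record is deleted
                break
            fixed[name] = value  # refill with the recalled, usable feature
        if recoverable:
            repaired[entity_id] = fixed
    return repaired


# Hypothetical data: e1 can be recalled, e3 cannot.
s2_graph = {("e1", "age"): 31}


def recall_from_s2(entity_id, feature_name):
    return s2_graph.get((entity_id, feature_name))


records = {"e1": {"age": None, "city": "X"}, "e2": {"age": 40}, "e3": {"age": -1}}
flagged = {"e1": {"age"}, "e3": {"age"}}
clean = repair(records, flagged, recall_from_s2)
```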
The privacy knowledge graph finally obtained is denoted G_pubpri_res.
S5: vertical federated learning;
Before a specific vertical federated learning algorithm is run, the data of the parties must be sample-aligned. After the preceding steps, each party holds a G_pubpri_res, which is a knowledge graph.
Sample alignment is performed on each party's G_pubpri_res. Because the entity ID spaces of the parties' G_pubpri_res have nothing in common, sample alignment in this application uses relation alignment: samples with the same relation are treated as one class, which achieves the same effect as sample ID alignment.
The attributes of each entity V in G_pubpri_res and its corresponding relation are extracted to form a wide table, denoted W_data. W_data is structured as shown in the following table:
where V denotes an entity in G_pubpri_res that satisfied the E_tmp relation in S3; x_1pri, x_2pri, x_3pri ... x_npri are the private features of that entity in D_private; x_4pub, x_5pub, x_6pub ... x_npub are the public features of the corresponding entity in V_public; and, as follows from S3, the relation column is provided by V_public and serves as the criterion for relation alignment.
Each party uses the relation column of its W_data as the intersection ID column, and the parties jointly perform secure intersection. Each party obtains the intersection data on the common relation column, denoted W_intersection_data, and finally trains the FL_A algorithm on its W_intersection_data.
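The relation-column alignment can be illustrated with the sketch below. It is deliberately not secure: the relation values are intersected in the clear to show the data flow only, whereas the application calls for secure intersection, which in practice would use a private set intersection protocol. The row layout, column names and the function intersect_on_relation are hypothetical.

```python
def intersect_on_relation(tables):
    """Keep, for every party, only the rows whose relation value all parties share.

    tables: {party_name: list of row dicts, each with a "relation" key}
    Returns {party_name: filtered rows}, standing in for W_intersection_data.
    NOTE: plain set intersection, used here only as a stand-in for a
    secure intersection protocol.
    """
    if not tables:
        return {}
    common = set.intersection(
        *({row["relation"] for row in rows} for rows in tables.values())
    )
    return {
        party: [row for row in rows if row["relation"] in common]
        for party, rows in tables.items()
    }


# Hypothetical W_data for two parties; entity IDs differ, relations overlap.
w_data = {
    "A": [
        {"V": "A1", "x_1pri": 0.7, "x_4pub": 12, "relation": "industry:retail"},
        {"V": "A2", "x_1pri": 0.2, "x_4pub": 5, "relation": "region:north"},
    ],
    "B": [
        {"V": "B1", "x_1pri": 3.1, "x_4pub": 12, "relation": "industry:retail"},
        {"V": "B2", "x_1pri": 9.9, "x_4pub": 8, "relation": "region:south"},
    ],
}
w_intersection = intersect_on_relation(w_data)
```

Here only the rows tied to industry:retail survive on both sides, so A1 and B1 are treated as aligned samples even though their entity IDs never matched.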
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing has shown and described the basic principles, principal features and advantages of the application. It will be understood by those skilled in the art that the present application is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present application, and various changes and modifications may be made without departing from the spirit and scope of the application, which is defined in the appended claims.

Claims (9)

1. A federated learning device for knowledge graphs, characterized by comprising participants, an arbitrator, federated learning, a knowledge graph and a privacy knowledge graph;
the participants are the actual demand parties in the federated learning computing task, provide data and computing power, and are plural in number;
the arbitrator is the coordinator in the federated learning computing task;
the federated learning serves as the entry point through which the participants carry out federated learning, and has the functions of agreeing on a security protocol, providing federated learning modeling, and exchanging training parameters;
the knowledge graph serves as the data carrier with which the participants and the arbitrator perform federated learning computing tasks;
the privacy knowledge graph provides the training/prediction data used in federated learning, is temporarily generated and maintained by each participant when a privacy computing task starts, and is automatically destroyed after the privacy computing task ends.
2. The federated learning device for knowledge graphs according to claim 1, wherein one of the participants initiates a federated learning computing task as the initiator of that task, and the remaining participants are responsible for assisting in completing the task.
3. The federated learning device for knowledge graphs according to claim 2, wherein, in the federated learning computing task, the arbitrator acts as coordinator and is responsible for overall parameter transmission in the task; and in the knowledge graph replacement task, the arbitrator acts as the intermediary for external knowledge graph data and is responsible for acquiring external data and providing it to each participant.
4. The federated learning device for knowledge graphs according to claim 3, wherein the participants are responsible for the model training and prediction functions in the federated learning process, and the arbitrator is responsible for parameter interaction in the federated learning process.
5. The federated learning device for knowledge graphs according to claim 4, wherein the knowledge graph of a participant is mainly used to perform knowledge alignment with external graph data and generate the privacy sub-graph data required for federated learning, and the knowledge graph of the arbitrator mainly provides public graph data support to the participants according to task requirements.
6. The federated learning device for knowledge graphs according to claim 5, wherein the knowledge graph of every computing node comprises a graph database and a stand-alone graph machine learning center.
7. The federated learning device for knowledge graphs according to claim 6, wherein the knowledge graph in the arbitrator's computing node additionally has a public integrated graph data warehouse and an external graph data acquisition module.
8. A federated learning method for knowledge graphs, applied to the federated learning device of claim 7, characterized in that the method comprises the following steps:
S1: initiating a federated learning computing task;
the initiator uploads private data to its privacy node to construct the data needed by the federated learning computing task, denoted D_private, the initiator being one of the participants;
the initiator initiates a federated learning computing task, assembling the data requirement, denoted D_R, and the specific federated learning algorithm required, denoted FL_A, wherein D_R includes the following three kinds of data:
metadata information of the private data in the initiator, denoted GD_meta, wherein GD_meta is used by the arbitrator to analyze and construct a public knowledge graph based on the initiator's metadata types, and is used by each party to unify data types when constructing its own privacy knowledge graph based on the public knowledge graph;
data type information of the private data, denoted GD_category, wherein GD_category is used by the arbitrator to analyze and construct a public knowledge graph related to the data types of the initiator's D_private;
request type information for public data, denoted GD_common_category, wherein the arbitrator analyzes GD_common_category by type to obtain knowledge data of the related types published by the arbitrator or by external parties;
finally, the initiator sends D_R and FL_A to the arbitrator;
S2: constructing a privacy knowledge graph;
the privacy knowledge graph is a temporary data carrier created within the privacy computing task, provides training/prediction data for the privacy computing task, and fuses the party's own business data with public knowledge data;
S21: initiating a public data acquisition requirement;
the arbitrator sends an electronic confirmation letter for the federated learning computing task to the other participants in the task, the electronic confirmation letter containing D_R and FL_A; after receiving the electronic confirmation letter, each participant confirms D_R and FL_A in the letter and, during confirmation, uploads its private data D_private to its privacy node according to the requirements in D_R;
S22: inquiring public data;
after every participant has confirmed the federated learning computing task, the arbitrator searches its own knowledge graph for public data matching GD_category and GD_common_category in D_R; when the arbitrator finds that its public data resources are insufficient for the federated learning computing task, the arbitrator actively acquires the corresponding external public knowledge data and performs knowledge fusion and entity connection on the external public knowledge data and its own public data; the public knowledge data finally formed is denoted COM_graph, and the arbitrator issues COM_graph and GD_meta to each party;
S23: constructing a knowledge graph;
after receiving COM_graph from the arbitrator, each party combines it with its own data to create the privacy knowledge graph for the federated learning computing task, the privacy knowledge graph comprising two parts, COM_graph and D_private, wherein the data in COM_graph is denoted V_public and the data in D_private is denoted V_private;
S3: constructing commonality data;
each party performs a commonality query on its own knowledge graph, the commonality query being carried out in an unsupervised manner throughout;
first, a GAE algorithm model is trained on V_public and V_private to encode the data of every node in the graph, so that each node is represented as a vector, denoted Embed; a similarity algorithm then computes the distance between nodes in the vector space, the similarity formula being:
Sim(V_public, V_private) = cos(Embed_public, Embed_private)
based on the similarity algorithm, a temporary relation, denoted E_tmp, is created between points whose similarity reaches a given threshold;
V_public and V_private are preliminarily combined into one set of preliminary graph data, and knowledge fusion is performed on the preliminary graph data;
during knowledge fusion, the ID features of the data follow the public data, and the specific features of the data follow the private data;
when a V_public and a V_private satisfy the E_tmp relation, the two are deemed fusable, the existing data features of V_private take priority, the data features of V_private are copied into V_public, the E_tmp connection between V_public and V_private is broken, the point V_private is deleted from the private data, and the V_public is marked as V_pubpri;
when a V_public that does not satisfy the E_tmp relation has an entity connection with a V_pubpri, that V_public is marked as a V_C entity; each party performs knowledge fusion with V_public as primary and V_private as auxiliary and, after completing the knowledge fusion steps, outputs its own privacy knowledge graph, the fused privacy knowledge graph of each party being denoted G_pubpri;
subsequently, coreference resolution and entity disambiguation techniques from the knowledge graph field are applied to G_pubpri for further cleaning, the data types of the entity attributes in G_pubpri are converted according to GD_meta so that the entity attribute data types of all parties are unified, and the initial privacy knowledge graph for training is finally formed, denoted G_pubpri_init;
S4: enhancing data;
data enhancement is performed on the data of G_pubpri_init in two steps: data quality detection and knowledge data enhancement;
the data quality detection mainly checks null values, noise values, outliers and data connectivity in the data;
according to the data types and service types involved in the federated learning computing task, the federated learning device automatically selects a set of data quality checks suited to the characteristics of the task and then marks the data that has quality problems;
the data marked as abnormal is then repaired: based on the marks, a recall query is run against the knowledge graph generated in step S2; for data that can be recalled, feature repair is performed, the normally usable data features are refilled, and the abnormal mark on the corresponding data is deleted; data that cannot be recalled, or that still carries an abnormal mark, is deleted directly by the federated learning device;
the privacy knowledge graph finally obtained is denoted G_pubpri_res;
S5: vertical federated learning;
before a specific vertical federated learning algorithm is run, the data of the parties must be sample-aligned;
sample alignment is performed on each party's G_pubpri_res, the sample alignment using relation alignment, in which samples with the same relation are treated as one class;
the attributes of each entity V in G_pubpri_res and its corresponding relation are extracted to form a wide table, denoted W_data, W_data being structured as shown in the following table:
where V denotes an entity in G_pubpri_res that satisfied the E_tmp relation in S3; x_1pri, x_2pri, x_3pri ... x_npri are the private features of the corresponding entity in D_private; x_4pub, x_5pub, x_6pub ... x_npub are the public features of that entity in V_public; and, as follows from S3, the relation column is provided by V_public and serves as the criterion for relation alignment;
each party uses the relation column of its W_data as the intersection ID column, and the parties jointly perform secure intersection; each party obtains the intersection data on the common relation column, denoted W_intersection_data, and finally trains the FL_A algorithm on its corresponding W_intersection_data.
9. The federated learning method for knowledge graphs according to claim 8, wherein null value detection in the data quality detection is performed with simple count statistics; detection of noise values and outliers is performed using statistical knowledge such as the normal distribution; and detection of data connectivity is performed with a graph machine algorithm.
CN202310674594.3A 2023-06-07 2023-06-07 Knowledge graph federal learning device and method Active CN116757275B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310674594.3A CN116757275B (en) 2023-06-07 2023-06-07 Knowledge graph federal learning device and method

Publications (2)

Publication Number Publication Date
CN116757275A true CN116757275A (en) 2023-09-15
CN116757275B CN116757275B (en) 2024-06-11

Family

ID=87952586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310674594.3A Active CN116757275B (en) 2023-06-07 2023-06-07 Knowledge graph federal learning device and method

Country Status (1)

Country Link
CN (1) CN116757275B (en)

Also Published As

Publication number Publication date
CN116757275B (en) 2024-06-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant