CN111914038B - Federal computing method, apparatus, device, and storage medium - Google Patents


Info

Publication number
CN111914038B
Authority
CN
China
Prior art keywords
computing
data
node
task
federation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010997997.8A
Other languages
Chinese (zh)
Other versions
CN111914038A (en)
Inventor
吕亮亮 (Lü Liangliang)
冯智 (Feng Zhi)
宋传园 (Song Chuanyuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010997997.8A
Publication of CN111914038A
Priority to US17/189,385 (US20220091891A1)
Priority to KR1020210028427A (KR20220039526A)
Priority to EP21160993.8A (EP3971728A1)
Priority to JP2021057612A (JP2021103588A)
Application granted
Publication of CN111914038B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/256 Integrating or interfacing systems involving database management systems in federated or virtual databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2101 Auditing as a secondary aspect

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Library & Information Science (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure provides a federated computing method and relates to the field of data processing, in particular to federated computing on big data. The federated computing method includes: obtaining a plurality of metadata from a plurality of computing nodes and publishing the plurality of metadata, the metadata representing characteristics of data in the data warehouses of the computing nodes; determining, among the plurality of computing nodes, at least two computing nodes that agree to perform a federated computing task, wherein the at least two computing nodes agree to perform the federated computing task based on the plurality of metadata; receiving a federated computing task submitted by at least one of the at least two computing nodes, and splitting the federated computing task into a plurality of subtasks; and distributing the plurality of subtasks to the at least two computing nodes so that the plurality of subtasks are executed between the at least two computing nodes. The present disclosure also provides a federated computing apparatus, a device, and a storage medium.

Description

Federal computing method, apparatus, device, and storage medium
Technical Field
The present disclosure relates to the field of data processing, in particular to federated computing for big data, and more particularly to a federated computing method, apparatus, device, and storage medium.
Background
Analysis based on mass data is increasingly constrained by barriers to data circulation. In practice, mass data is usually owned by multiple entities, and to obtain more accurate analysis results, federated computation needs to be performed on the data warehouses of these entities. Federated computation relies on data security and privacy protection technologies and is carried out across multiple separate entity data warehouses, so that data can be shared while its privacy and security are preserved. However, considerable preparation is often required before federated computation can be performed. For example, a party must first learn about the data in other entities' data warehouses through investigation, negotiation, and the like, to determine whether that data is what it needs. This costs significant labor, material resources, and time, which degrades the overall efficiency of federated computing.
Disclosure of Invention
In view of this, the present disclosure provides a federated computing method, apparatus, device, and storage medium.
A first aspect of the present disclosure provides a federated computing method, comprising:
obtaining a plurality of metadata from a plurality of computing nodes and publishing the plurality of metadata, the metadata representing characteristics of data in data warehouses of the computing nodes;
determining, among the plurality of computing nodes, at least two computing nodes that agree to perform a federated computing task, wherein the at least two computing nodes agree to perform the federated computing task based on the plurality of metadata;
receiving a federated computing task submitted by at least one of the at least two computing nodes, and splitting the federated computing task into a plurality of subtasks; and
distributing the plurality of subtasks to the at least two computing nodes so that the plurality of subtasks are executed between the at least two computing nodes.
A second aspect of the present disclosure provides a federated computing method, comprising:
obtaining, from a coordinating node, a plurality of metadata uploaded to the coordinating node by a plurality of computing nodes, wherein the metadata of each computing node represents characteristics of data in a data warehouse of that computing node;
determining, according to the plurality of metadata, a computing node among the plurality of computing nodes that is to cooperatively execute a first federated computing task, as a data provider node;
submitting the first federated computing task to the coordinating node, so that the coordinating node splits the first federated computing task into a plurality of first subtasks; and
receiving at least one first subtask of the plurality of first subtasks from the coordinating node, and executing the at least one first subtask in cooperation with the data provider node.
A third aspect of the present disclosure provides a federated computing apparatus, comprising:
a metadata management module configured to obtain a plurality of metadata from a plurality of computing nodes and to publish the plurality of metadata, the metadata representing characteristics of data in data warehouses of the computing nodes;
a node determination module configured to determine, among the plurality of computing nodes, at least two computing nodes that agree to perform a federated computing task, wherein the at least two computing nodes agree to perform the federated computing task based on the plurality of metadata;
a task processing module configured to receive a federated computing task submitted by at least one of the at least two computing nodes and split the federated computing task into a plurality of subtasks; and
a task distribution module configured to distribute the plurality of subtasks to the at least two computing nodes so that the plurality of subtasks are executed between the at least two computing nodes.
A fourth aspect of the present disclosure provides a federated computing device, comprising:
a memory storing program instructions; and
a processor configured to execute the program instructions to perform the federated computing method provided in the first aspect of the present disclosure.
A fifth aspect of the present disclosure provides a federated computing apparatus, comprising:
a metadata query module configured to obtain, from a coordinating node, a plurality of metadata uploaded to the coordinating node by a plurality of computing nodes, wherein the metadata of each computing node represents characteristics of data in a data warehouse of that computing node;
a first node determination module configured to determine, according to the plurality of metadata, a computing node among the plurality of computing nodes that is to cooperatively perform a first federated computing task, as a data provider node;
a task submitting module configured to submit the first federated computing task to the coordinating node, so that the coordinating node splits the first federated computing task into a plurality of first subtasks; and
a first task execution module configured to receive at least one first subtask of the plurality of first subtasks from the coordinating node and execute the at least one first subtask in cooperation with the data provider node.
A sixth aspect of the present disclosure provides a federated computing device, comprising:
a memory storing program instructions; and
a processor configured to execute the program instructions to perform the federated computing method provided in the second aspect of the present disclosure.
A seventh aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions that, when executed, implement the federated computing method described above.
An eighth aspect of the present disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements the above method.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments thereof with reference to the accompanying drawings in which:
FIG. 1 schematically illustrates an implementation environment of a federated computing method according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a federated computing method according to an embodiment of the present disclosure;
FIGS. 3A and 3B schematically illustrate flow charts of a federated computing method according to another embodiment of the present disclosure;
FIG. 4 schematically illustrates an overall architecture diagram of a federated computing method according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates an interaction process of a federated computing method according to an embodiment of the present disclosure;
FIGS. 6A-6F schematically illustrate example interfaces for implementing the federated computing method according to embodiments of the present disclosure;
FIGS. 7 and 8 schematically illustrate block diagrams of federated computing apparatuses according to embodiments of the present disclosure; and
FIG. 9 schematically illustrates a block diagram of a federated computing device adapted to perform federated computing according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where an expression such as "at least one of A, B, and C" is used, it should generally be interpreted according to the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B, and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together).
FIG. 1 schematically illustrates an implementation environment of a federated computing method according to an embodiment of the present disclosure. As shown in FIG. 1, party A, party B, and party C are three entities, each holding a large amount of private data. For example, party A is a financial institution holding a large amount of private financial data, party B is an Internet organization holding a large amount of personal private data, and party C is a medical institution holding a large amount of personal private data. The data of the three parties are stored in local data warehouses A1, B1, and C1, respectively. To further improve the performance of their respective models, parties A and C want to conduct federated learning with party B on the basis of data security and privacy protection. Embodiments of the present disclosure provide a method for coordinating the federated learning process among parties A, B, and C.
In the following embodiments, parties such as A, B, and C, which own data warehouses and wish to protect the security and privacy of their data, are represented by computing nodes. A computing node may be a data consumer node or a data provider node: parties A and C above are examples of data consumer nodes, and party B is an example of a data provider node. In practice, parties A and C may also provide data to others, and party B may also obtain data from others. Therefore, unless otherwise specified, any computing node may act as either a data consumer node or a data provider node. Further, a coordinating node is provided to coordinate the federated learning process among parties A, B, and C. Unlike computing nodes such as parties A, B, and C, the coordinating node neither provides data to others as a data provider nor obtains data from others for computation as a data consumer. The coordinating node only provides overall scheduling for the federated computing process among the computing nodes, so as to optimize that process. The federated computing method according to embodiments of the present disclosure can be applied to both the computing nodes and the coordinating node to implement an overall solution for federated computing.
FIG. 2 schematically illustrates a flow chart of a federated computing method 200 according to an embodiment of the present disclosure, which may be applied to a coordinating node. As shown in FIG. 2, the federated computing method 200 includes the following steps.
In step S210, a plurality of metadata is acquired from a plurality of computing nodes and published, the metadata representing characteristics of data in the data warehouses of the computing nodes.
In step S220, at least two computing nodes that agree to perform a federated computing task are determined among the plurality of computing nodes, wherein the at least two computing nodes agree to perform the federated computing task based on the plurality of metadata.
In step S230, a federated computing task submitted by at least one of the at least two computing nodes is received and split into a plurality of subtasks.
In step S240, the plurality of subtasks are distributed to the at least two computing nodes so that the plurality of subtasks are executed between the at least two computing nodes.
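Steps S210 to S240 can be illustrated with a minimal in-memory sketch of the coordinating node. It assumes nodes are identified by plain strings, that an agreement is recorded as the set of nodes that approved a task, and that the simplest split rule applies (one subtask per participating node). The class names `Coordinator` and `Subtask`, the step label `"local_compute"`, and all method names are hypothetical; the disclosure does not specify a split rule, transport, or data format.

```python
from dataclasses import dataclass, field


@dataclass
class Subtask:
    """One unit of work assigned to a single computing node (hypothetical structure)."""
    task_id: str
    node: str
    step: str


@dataclass
class Coordinator:
    """Sketch of the coordinating node; it schedules work but holds no warehouse data."""
    published: dict = field(default_factory=dict)   # node name -> published metadata
    agreements: set = field(default_factory=set)    # frozensets of nodes that agreed
    outboxes: dict = field(default_factory=dict)    # node name -> subtasks to execute

    def publish_metadata(self, node: str, metadata: dict) -> None:
        # S210: acquire metadata from a computing node and publish it.
        self.published[node] = metadata

    def record_agreement(self, nodes: set) -> None:
        # S220: record a group of nodes that agreed, based on the metadata, to run a task.
        if len(nodes) < 2:
            raise ValueError("a federated task needs at least two computing nodes")
        self.agreements.add(frozenset(nodes))

    def submit(self, task_id: str, submitter: str, nodes: set) -> list:
        # S230: accept a task only from an agreed group, then split it.
        group = frozenset(nodes)
        if group not in self.agreements or submitter not in group:
            raise PermissionError("task was not approved by these nodes")
        # Assumed split rule: one subtask per participating node, in sorted order.
        return [Subtask(task_id, node, "local_compute") for node in sorted(group)]

    def distribute(self, subtasks: list) -> None:
        # S240: deliver each subtask to the node that must execute it.
        for sub in subtasks:
            self.outboxes.setdefault(sub.node, []).append(sub)


if __name__ == "__main__":
    coord = Coordinator()
    coord.publish_metadata("A", {"tables": ["loans"]})
    coord.publish_metadata("B", {"tables": ["user_profile"]})
    coord.record_agreement({"A", "B"})
    coord.distribute(coord.submit("t1", "A", {"A", "B"}))
    print(sorted(coord.outboxes))
```

Rejecting submissions from groups that have not agreed mirrors the requirement in S230 that the task come from one of the at least two agreeing nodes.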
According to an embodiment, metadata is used to characterize the data in the data warehouse of each computing node. The metadata may include, but is not limited to, the name of the data warehouse, the names of the data tables stored in it, and the field names, field types, and row counts of those tables. A computing node that wants to obtain the right to use data in the data warehouses of other computing nodes (i.e., a data consumer node) can therefore learn about that data by querying the metadata, and then confirm whether it is the data it needs; the other computing nodes here act as data provider nodes. According to an embodiment, the plurality of metadata may be published, and the published metadata may constitute a data mart that presents the metadata centrally. Embodiments of the present disclosure can thus omit the complex and time-consuming investigation that would otherwise precede federated computing. Every computing node that joins displays, in a specified format, metadata describing the data in its local data warehouse, which significantly simplifies the preparation required before federated computing.
After the data consumer node finds the data it wants through the metadata, it can negotiate with the data provider node that owns the data to obtain the right to use it.
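The metadata fields listed above (warehouse name, table name, field names and types, row count) and the data mart can be sketched as follows. The record layout, the `DataMart` class, and the `find_by_field` lookup are assumptions made for illustration; the disclosure names the fields but not a schema or query interface. The point shown is that a consumer can locate candidate tables from published metadata alone, without reading any rows.

```python
from dataclasses import dataclass


@dataclass
class TableMetadata:
    """One published metadata record; the field list follows the text, the layout is assumed."""
    node: str           # computing node that owns the table
    warehouse: str      # name of the data warehouse
    table: str          # name of the data table
    field_types: dict   # field name -> field type
    row_count: int      # number of rows


class DataMart:
    """Centralized presentation of the metadata published by all computing nodes."""

    def __init__(self) -> None:
        self._entries: list = []

    def publish(self, meta: TableMetadata) -> None:
        self._entries.append(meta)

    def find_by_field(self, field_name: str) -> list:
        # Return every published table that has a field with this name.
        return [m for m in self._entries if field_name in m.field_types]


if __name__ == "__main__":
    mart = DataMart()
    mart.publish(TableMetadata("B", "B1", "user_profile",
                               {"user_id": "string", "age": "int"}, 1_000_000))
    mart.publish(TableMetadata("C", "C1", "visits", {"patient_id": "string"}, 50_000))
    for m in mart.find_by_field("user_id"):
        print(m.node, m.table, m.row_count)
```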
According to an embodiment, determining, among the plurality of computing nodes, at least two computing nodes that agree to perform the federated computing task may further include: constructing a data approval channel between at least two of the plurality of computing nodes, through which they negotiate data use based on the plurality of metadata; constructing a task approval channel between the at least two computing nodes that have reached a data-use agreement, through which they approve the federated computing task; and taking the at least two computing nodes that have approved the federated computing task as the at least two computing nodes that agree to perform it. In this embodiment, after finding the data it wants through the metadata, the data consumer node first uses the data approval channel to negotiate the right to use the data with the data provider node that owns it. Only after both parties agree on the data usage right can the data consumer node use the task approval channel to negotiate its intention to build a federated model with the data provider node. The data approval channel and the task approval channel are communication channels provided by the coordinating node for interaction between computing nodes. Note that the negotiation of federated modeling intention does not consider the specific configuration or parameters of the model; it considers only matters related to federated modeling and to the use of data in the data warehouses.
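The ordering constraint between the two channels (task approval is only possible after data-use approval) can be modeled as a small state machine. This is a sketch under the assumption that a single consumer/provider pair is tracked and that each channel reduces to a yes/no decision by the provider; the `Stage` and `Negotiation` names are hypothetical, and the messages actually exchanged over the channels are not described in the text.

```python
from enum import Enum


class Stage(Enum):
    PENDING = "pending"              # nothing agreed yet
    DATA_APPROVED = "data_approved"  # data-use agreement reached via the data approval channel
    TASK_APPROVED = "task_approved"  # federated task approved via the task approval channel


class Negotiation:
    """Tracks one consumer/provider pair through the two approval channels."""

    def __init__(self, consumer: str, provider: str) -> None:
        self.consumer = consumer
        self.provider = provider
        self.stage = Stage.PENDING

    def approve_data_use(self, provider_agrees: bool) -> None:
        if self.stage is not Stage.PENDING:
            raise RuntimeError("data use has already been negotiated")
        if provider_agrees:
            self.stage = Stage.DATA_APPROVED
        # A refusal leaves the pair in PENDING, so the consumer may negotiate again.

    def approve_task(self, provider_agrees: bool) -> None:
        # The task approval channel is only usable after a data-use agreement.
        if self.stage is not Stage.DATA_APPROVED:
            raise RuntimeError("task approval requires a prior data-use agreement")
        if provider_agrees:
            self.stage = Stage.TASK_APPROVED

    @property
    def agreed_nodes(self) -> set:
        # Nodes count as "agreeing to perform the task" only after both approvals.
        if self.stage is Stage.TASK_APPROVED:
            return {self.consumer, self.provider}
        return set()
```

Keeping model configuration out of this state machine matches the note above that the negotiation concerns data use and federated modeling intent, not model parameters.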
According to an embodiment, after the data provider node approves the modeling intention proposed by the data consumer node, the data consumer node may configure the model and tune its parameters, and then submit a modeling task based on the configured model to the coordinating node. The coordinating node may split the received modeling task into a plurality of subtasks and distribute them to the data consumer node and the data provider node, respectively, for joint execution. In this process, the data consumer node can configure and tune the model as if it were modeling with its local data warehouse alone, without considering the interaction details of federated modeling across multiple data warehouses; those details are handled by the coordinating node once the modeling task is submitted. For example, the coordinating node may subdivide the modeling task submitted by the computing node into a plurality of finer-grained subtasks.
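To illustrate what "finer-grained subtasks" might look like, the sketch below turns one submitted modeling task into per-round, per-node subtasks. The two actions (`compute_local_intermediate`, `exchange_protected_results`), the round structure, and the `ModelingTask` fields are hypothetical: the disclosure does not say how the coordinating node subdivides a task, and a real split would depend on the learning algorithm and privacy protocol in use. Note that every subtask is assigned to a computing node, consistent with the coordinating node only scheduling.

```python
from dataclasses import dataclass


@dataclass
class ModelingTask:
    """What the data consumer submits: a model configured as if all data were local."""
    task_id: str
    model: str        # illustrative model name, e.g. "logistic_regression"
    rounds: int       # number of training rounds (assumed parameter)
    consumer: str
    providers: list


@dataclass
class Subtask:
    task_id: str
    node: str
    round_no: int
    action: str


def split(task: ModelingTask) -> list:
    """Hypothetical split: in each round, every node computes locally, then all nodes exchange
    privacy-protected intermediate results."""
    nodes = [task.consumer, *task.providers]
    subtasks = []
    for r in range(task.rounds):
        for node in nodes:
            subtasks.append(Subtask(task.task_id, node, r, "compute_local_intermediate"))
        for node in nodes:
            subtasks.append(Subtask(task.task_id, node, r, "exchange_protected_results"))
    return subtasks


if __name__ == "__main__":
    job = ModelingTask("t1", "logistic_regression", rounds=2, consumer="A", providers=["B"])
    for sub in split(job):
        print(sub.round_no, sub.node, sub.action)
```

The consumer only writes the `ModelingTask`; the round-by-round coordination stays inside `split`, which is the division of labor the paragraph above describes.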
FIGS. 3A and 3B schematically illustrate a flow chart of a federated computing method 300 according to another embodiment of the present disclosure, which may be applied to a computing node. As shown in FIG. 3A, the federated computing method 300 includes the following steps.
In step S310, a plurality of metadata uploaded to the coordinating node by a plurality of computing nodes is acquired from the coordinating node, wherein the metadata of each computing node represents characteristics of data in the data warehouse of that computing node.
In step S320, a computing node among the plurality of computing nodes that is to cooperatively perform a first federated computing task is determined as a data provider node according to the plurality of metadata.
In step S330, the first federated computing task is submitted to the coordinating node, so that the coordinating node splits it into a plurality of first subtasks.
In step S340, at least one first subtask of the plurality of first subtasks is received from the coordinating node and executed in cooperation with the data provider node.
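Steps S310 to S340 can be traced from the computing node's side with the sketch below. It uses a stand-in coordinator whose interface (`query_metadata`, `submit`, `fetch`), the split rule (one first subtask per participating node), and the subtask labels are all assumptions for illustration. The approval channels and the cooperative execution in S340 are deliberately not modeled; the function only returns the subtasks the consumer receives.

```python
from dataclasses import dataclass, field


@dataclass
class StubCoordinator:
    """Stand-in for the coordinating node; its interface is hypothetical."""
    metadata: dict                               # node name -> set of published table names
    queues: dict = field(default_factory=dict)   # node name -> pending first subtasks

    def query_metadata(self) -> dict:                     # serves S310
        return dict(self.metadata)

    def submit(self, task_id: str, nodes: list) -> None:  # serves S330
        # Assumed split rule: one first subtask per participating node.
        for node in nodes:
            self.queues.setdefault(node, []).append(f"{task_id}:{node}")

    def fetch(self, node: str) -> list:                   # serves S340
        return self.queues.pop(node, [])


def run_as_consumer(me: str, needed_table: str, coord: StubCoordinator) -> tuple:
    catalog = coord.query_metadata()                                  # S310
    providers = [n for n, tables in catalog.items()                   # S320
                 if n != me and needed_table in tables]
    if not providers:
        raise LookupError(f"no node publishes {needed_table!r}")
    provider = providers[0]
    # Data-use and task approval via the coordinator's channels would happen here (omitted).
    coord.submit("task-1", [me, provider])                            # S330
    return provider, coord.fetch(me)                                  # S340: subtasks to run


if __name__ == "__main__":
    coord = StubCoordinator({"A": {"loans"}, "B": {"user_profile"}, "C": {"visits"}})
    print(run_as_consumer("A", "user_profile", coord))
```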
According to the embodiment, a computing node that wants to perform federated computing, namely a data consumer node, can acquire the plurality of metadata through the coordinating node and learn from them about the data in the data warehouses owned by other computing nodes. As described above, the metadata may include, but is not limited to, the name of the data warehouse, the names of the data tables stored in it, and the field names, field types, and row counts of the tables. For example, a computing node can roughly infer the content of the data in another node's data warehouse from the table names and field names, and roughly infer its type and scale from the field types and row counts.
Next, after the data consumer node finds the data required for federated computing, the computing node that provides the data may be determined as the data provider node that is to cooperatively perform the first federated computing task. Here, the first federated computing task refers to a federated computing task initiated and submitted by the current computing node acting as a data consumer node, and a first subtask refers to a subtask obtained by splitting the first federated computing task. According to an embodiment, determining, according to the plurality of metadata, a computing node among the plurality of computing nodes that is to cooperatively perform the first federated computing task as a data provider node may further include: determining, according to the plurality of metadata, the computing node where the data required for executing the first federated computing task is located; negotiating data use with that computing node via a data approval channel constructed by the coordinating node; approving the federated computing task, via a task approval channel constructed by the coordinating node, with the computing node that has reached a data-use agreement; and taking the computing node that has approved the federated computing task as the data provider node. In this embodiment, the data consumer node first determines the computing node where the data required for federated computing is located, then negotiates the right to use the data with that node through the data approval channel; only after both parties agree on the data usage right can the data consumer node negotiate its federated modeling intention with that node through the task approval channel.
As in the previous embodiment, when negotiating federated modeling intention via the task approval channel with the computing node that has reached a data-use agreement, the data consumer node does not consider the specific configuration or parameters of the model, only matters related to federated modeling and to the use of data in the data warehouses.
Next, after the willingness to perform federated modeling has been agreed and the data provider node has been determined, the data consumer node may further configure the model, e.g., by adjusting the model parameters. According to an embodiment, the data consumer node builds a federated computing model based on the data in its local data warehouse and the data in the data warehouse of the determined data provider node, and may treat the model as a whole during modeling. For example, according to an embodiment, a virtual model may be built on the data in the local data warehouse and the data in the data warehouse of the data provider node. The virtual model performs as well as an optimal model built by physically aggregating the data in the data warehouses of the data consumer node and the data provider node, so fewer execution details need to be considered during model training. After the federated computing model is configured, the data consumer node may submit the federated computing task (the first federated computing task) to the coordinating node for processing. According to an embodiment, the coordinating node splits the federated computing task into a plurality of subtasks (first subtasks). The data consumer node and the data provider node each receive at least one subtask from the coordinating node and cooperatively execute the received subtasks, thereby cooperatively executing the federated computing task. The subtasks refine the execution details of the federated computing task between the data consumer node and the data provider node. Without the coordinating node, the data consumer node and the data provider node would have to work out this refinement together. Because the coordinating node completes it instead, the federated computing task performed at the data consumer node and the data provider node is further simplified.
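One way to picture the splitting step is shown in the sketch below. It is a minimal illustration, not the disclosed implementation. It assumes a hypothetical execution plan given as a list of steps, where each step maps a participating node to the operation that node performs. The names `SubTask`, `split_task`, `distribute`, and the operation strings are invented for this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SubTask:
    task_id: str
    step: int
    node_id: str
    operation: str


def split_task(task_id: str, participants: List[str],
               plan: List[Dict[str, str]]) -> List[SubTask]:
    """Split a federated task into per-node subtasks.

    `plan` is a hypothetical execution plan: each entry is one step that maps
    a participating node to the operation it performs at that step.
    """
    subtasks: List[SubTask] = []
    for step, assignment in enumerate(plan):
        for node_id, operation in assignment.items():
            if node_id not in participants:
                raise ValueError(f"{node_id} is not an approved participant")
            subtasks.append(SubTask(task_id, step, node_id, operation))
    return subtasks


def distribute(subtasks: List[SubTask]) -> Dict[str, List[SubTask]]:
    """Group subtasks by executing node, keeping them in step order."""
    queues: Dict[str, List[SubTask]] = defaultdict(list)
    for subtask in sorted(subtasks, key=lambda s: s.step):
        queues[subtask.node_id].append(subtask)
    return dict(queues)


plan = [
    {"A": "align_ids", "B": "align_ids"},
    {"A": "local_gradient", "B": "local_gradient"},
    {"A": "update_model"},
]
queues = distribute(split_task("task-1", ["A", "B"], plan))
print({node: [s.operation for s in q] for node, q in queues.items()})
# {'A': ['align_ids', 'local_gradient', 'update_model'], 'B': ['align_ids', 'local_gradient']}
```

Only nodes that passed both approvals (the `participants` list) can receive subtasks. Each node gets only its own ordered queue, which matches the idea that the coordinating node, not the nodes themselves, works out the execution details.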
In the federated computing method according to the embodiments of the present disclosure, metadata is used to represent the characteristics of the data in the local data warehouse of each computing node, so that each computing node is informed of the data available in the data warehouses of the other computing nodes. This removes the considerable manpower and time otherwise spent on investigation and negotiation before a federated computing scenario can be built, and reduces modeling cost. In addition, data approval and task approval channels are provided between computing nodes before modeling, and the modeling task is further subdivided after it is submitted. Together, these measures simplify the complex interactions between computing nodes during federated modeling, reduce the difficulty of federated computing modeling, and further reduce the modeling cost.
The federated computing method 300 in the embodiment of Fig. 3A applies to a data consumer node. Some data provider nodes have no data usage requirements of their own and only provide data for use by other computing nodes. For such a node, the federated computing method 300 is as shown in Fig. 3B and includes the following steps:
In step S350, local metadata is uploaded to the coordinating node, the local metadata representing characteristics of the data in the local data warehouse.
In step S360, at least one of the plurality of computing nodes is determined to be a data consumer node to cooperatively perform a second federated computing task.
In step S370, at least one second subtask of the plurality of second subtasks obtained by splitting the second federated computing task is received from the coordinating node.
In step S380, at least one second subtask is performed in conjunction with the data consumer node.
A computing node acting as a data provider node may first connect to (e.g., register with) the coordinating node and then upload metadata representing the data in its local data warehouse to the coordinating node. The metadata may be exposed to other computing nodes by publication. Other computing nodes, i.e., data consumer nodes, may want to use the data in the local data warehouse of the data provider node. In that case, they may interact with the data provider node via the data approval channel and the task approval channel provided by the coordinating node. According to an embodiment, through these interactions, the data provider node determines at least one of the plurality of computing nodes as a data consumer node to cooperatively perform a second federated computing task. Here, the second federated computing task refers to a federated computing task received by the current computing node acting as a data provider node, and the second subtasks are the subtasks obtained by splitting the second federated computing task.
According to an embodiment, determining at least one of the plurality of computing nodes as a data consumer node to cooperatively perform the second federated computing task may further comprise the following steps. First, data use is negotiated with at least one of the plurality of computing nodes via a data approval channel constructed by the coordinating node. Second, for a computing node with which the data use negotiation succeeds, approval of the federated computing task is negotiated via a task approval channel constructed by the coordinating node. Finally, the computing node with which the task approval succeeds is taken as the data consumer node that is to cooperatively execute the second federated computing task. During the negotiation, the data provider node may decide whether to grant the data approval and the task approval according to the negotiation content provided by the data consumer node. If the data provider node rejects the data use request sent by the data consumer node, the data consumer node cannot proceed to the task approval negotiation. If the data provider node accepts the data use request but rejects the modeling task request sent by the data consumer node, the data consumer node cannot proceed with the modeling task. According to an embodiment, the data provider node may approve both the data use request and the modeling task request of the data consumer node, that is, both the data approval negotiation and the task approval negotiation succeed between the two nodes. In that case, the data provider node may receive from the coordinating node at least one second subtask of the plurality of second subtasks obtained by splitting the second federated computing task. It then executes the at least one second subtask in cooperation with the data consumer node, thereby cooperatively executing the federated computing task.
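The gating rules on the data provider side can be expressed as a small state machine. Task approval is only possible after data approval, and subtasks are only accepted for approved tasks. The sketch below assumes that the provider's decisions can be modeled as two policy callbacks. `ProviderNode`, `NotApprovedError`, and the task names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


class NotApprovedError(Exception):
    """Raised when a request skips a required approval stage."""


@dataclass
class ProviderNode:
    data_policy: Callable[[str], bool]        # hypothetical: decides data-use requests
    task_policy: Callable[[str, str], bool]   # hypothetical: decides modeling-task requests
    data_approved: Set[str] = field(default_factory=set)
    task_approved: Set[Tuple[str, str]] = field(default_factory=set)
    received: List[str] = field(default_factory=list)

    def on_data_request(self, consumer: str) -> bool:
        if self.data_policy(consumer):
            self.data_approved.add(consumer)
            return True
        return False

    def on_task_request(self, consumer: str, task: str) -> bool:
        # Task approval can only be negotiated after data use was approved.
        if consumer not in self.data_approved:
            raise NotApprovedError(f"{consumer} has no data-use approval")
        if self.task_policy(consumer, task):
            self.task_approved.add((consumer, task))
            return True
        return False

    def on_subtask(self, consumer: str, task: str, subtask: str) -> None:
        # Subtasks from the coordinating node are accepted only for approved tasks.
        if (consumer, task) not in self.task_approved:
            raise NotApprovedError(f"task {task!r} from {consumer} is not approved")
        self.received.append(subtask)


provider = ProviderNode(data_policy=lambda c: c == "A",
                        task_policy=lambda c, t: t == "train_lr")
print(provider.on_data_request("A"), provider.on_task_request("A", "train_lr"))  # True True
provider.on_subtask("A", "train_lr", "step-0")
```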
In practice, most computing nodes may act as both data consumer nodes and data provider nodes. Therefore, in general, the federated computing method 300 including all of steps S310 to S380 may be applied to a computing node. In addition, it should be understood that the sequence numbers of the steps in the above embodiments do not represent the order in which they are actually performed. In actual execution, steps S310 to S340 and steps S350 to S380 may be executed in parallel or alternately. For example, whether or not the computing node wants to use other nodes' data, it may upload the metadata representing the data in its local data warehouse to the coordinating node right after registering with the coordinating node. In that case, step S350 is performed before step S310.
Fig. 4 schematically illustrates an overall architecture of a federated computing method according to an embodiment of the present disclosure. Fig. 4 shows a coordinating node 410 and four computing nodes 420, 430, 440, and 450. The coordinating node 410 is coupled with each of the computing nodes 420, 430, 440, and 450 for overall scheduling of federated computing tasks among them. The computing nodes 420, 430, 440, and 450 may be coupled to each other and may transmit secure data streams between one another, enabling federated computing that preserves data security and privacy.
As shown in Fig. 4, the coordinating node 410 includes a metadata management unit 411, a participant management unit 412, and a computing task coordination unit 413. The metadata management unit 411 may be used to receive metadata from the computing nodes 420, 430, 440, and 450 and publish the received metadata. The participant management unit 412 may be configured to receive registrations of the computing nodes 420, 430, 440, and 450 and create separate user management records for each of them. At the request of the data consumer nodes among the computing nodes 420, 430, 440, and 450, the participant management unit 412 may also construct data approval channels and task approval channels between the computing nodes, which simplifies the interactions among them. The computing task coordination unit 413 may receive federated computing tasks submitted by data consumer nodes among the computing nodes 420, 430, 440, and 450. It may then split each federated computing task into a plurality of refined subtasks and distribute the subtasks to the corresponding computing nodes. After receiving a federated computing task from a data consumer node and before splitting it, the computing task coordination unit 413 may audit whether the task meets the federated computing security specification, and reject the task if it does not. This further ensures the security of federated computing. The computing task coordination unit 413 may also monitor the status of the computing nodes 420, 430, 440, and 450 to detect node failures in time.
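The three units of the coordinating node can be sketched as methods of one class, with emphasis on the audit-before-split and failure-detection behavior. This is a minimal sketch under loudly stated assumptions. The disclosure does not define the security specification, so it is stood in for by a hypothetical operator allow-list (`ALLOWED_OPERATORS`). Node status monitoring is likewise assumed to be heartbeat-based with an explicit clock value `now`. All names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical stand-in for a federated computing security specification:
# only these operators may appear in a submitted task.
ALLOWED_OPERATORS = {"psi", "secure_sum", "train_lr", "predict"}


@dataclass
class Coordinator:
    heartbeat_timeout: float = 30.0
    registered: Set[str] = field(default_factory=set)
    catalog: Dict[str, List[dict]] = field(default_factory=dict)
    last_seen: Dict[str, float] = field(default_factory=dict)

    def register(self, node_id: str, now: float) -> None:
        """Participant management: record the node and its first heartbeat."""
        self.registered.add(node_id)
        self.last_seen[node_id] = now

    def upload_metadata(self, node_id: str, metadata: List[dict]) -> None:
        """Metadata management: accept metadata only from registered nodes."""
        if node_id not in self.registered:
            raise PermissionError(f"{node_id} is not registered")
        self.catalog[node_id] = list(metadata)

    def published(self) -> Dict[str, List[dict]]:
        return {node: list(items) for node, items in self.catalog.items()}

    def submit(self, operators: List[str]) -> List[str]:
        """Task coordination: audit the task before it is split; reject on failure."""
        violations = [op for op in operators if op not in ALLOWED_OPERATORS]
        if violations:
            raise PermissionError(f"rejected, disallowed operators: {violations}")
        return list(operators)  # a real system would now split and distribute

    def heartbeat(self, node_id: str, now: float) -> None:
        self.last_seen[node_id] = now

    def failed_nodes(self, now: float) -> List[str]:
        """Status monitoring: nodes silent for longer than the timeout."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.heartbeat_timeout)


coord = Coordinator(heartbeat_timeout=10)
coord.register("420", now=0)
coord.register("430", now=0)
coord.upload_metadata("420", [{"table": "loans", "columns": ["id", "amount"]}])
coord.heartbeat("420", now=15)
print(coord.failed_nodes(now=20))  # ['430']
```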
As shown in Fig. 4, taking the computing node 420 as an example, the computing node 420 includes a metadata uploading unit 421, a task management unit 422, a task execution unit 423, and a monitoring alarm unit 424. The metadata uploading unit 421 is configured to extract characteristics of the data in the local data warehouse to generate metadata, and upload the generated metadata to the coordinating node 410. When the data in the local data warehouse changes, the metadata uploading unit 421 may also update the metadata and upload the updated metadata to the coordinating node 410, so that the coordinating node 410 keeps its metadata information up to date. The task management unit 422 may be used to negotiate data use approval and modeling task approval with other computing nodes when a federated computing task is created. The task management unit 422 may further be configured to receive the split subtasks from the coordinating node 410 and pass them to the task execution unit 423 for execution. The task execution unit 423 is mainly responsible for executing the various federated computing tasks. According to embodiments of the present disclosure, federated computing tasks may include, but are not limited to, federated data queries, federated data analysis, and federated model training performed on the data in the data warehouses of at least two computing nodes. They may also include federated prediction using the models obtained by federated model training. The monitoring alarm unit 424 is configured to monitor the status of the computing node locally and in real time, to ensure the security of federated computing tasks.
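The behavior of the metadata uploading unit can be sketched as follows: extract characteristics of a local table and re-upload only when they change. The disclosure does not say which characteristics are extracted. This sketch assumes they are column names, value types, and a row count, and that the upload is a plain callback. `extract_metadata` and `MetadataUploader` are hypothetical names.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Optional


def extract_metadata(table: str, rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize a local table as metadata without copying any raw values."""
    columns = sorted({key for row in rows for key in row})
    types = {c: sorted({type(row[c]).__name__ for row in rows if c in row})
             for c in columns}
    return {"table": table, "columns": columns, "types": types,
            "row_count": len(rows)}


class MetadataUploader:
    """Re-uploads metadata only when the summarized data has changed."""

    def __init__(self, upload: Callable[[Dict[str, Any]], None]):
        self.upload = upload  # hypothetical hook that sends to the coordinating node
        self._last_digest: Optional[str] = None

    def sync(self, table: str, rows: List[Dict[str, Any]]) -> bool:
        meta = extract_metadata(table, rows)
        digest = hashlib.sha256(
            json.dumps(meta, sort_keys=True).encode()).hexdigest()
        if digest == self._last_digest:
            return False  # metadata unchanged, skip the upload
        self.upload(meta)
        self._last_digest = digest
        return True


sent: List[Dict[str, Any]] = []
uploader = MetadataUploader(sent.append)
rows = [{"id": 1, "income": 3.5}, {"id": 2, "income": 4.0}]
print(uploader.sync("loans", rows), uploader.sync("loans", rows))  # True False
```

Comparing a digest of the summarized metadata, rather than of the raw rows, means that only changes visible in the published characteristics trigger an update.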
According to embodiments of the present disclosure, the coordinating node 410 uniformly coordinates and manages the computing nodes 420, 430, 440, and 450. Each computing node can therefore devote more of its resources to executing federated computing tasks with other computing nodes and to keeping those tasks secure. Its interactions with other computing nodes are coordinated by the coordinating node 410, which simplifies the execution of federated computing tasks. In addition, when a new computing node needs to be added, the overall scheme can easily be extended without affecting the other computing nodes.
Fig. 5 schematically illustrates the interaction process of a federated computing method according to an embodiment of the present disclosure. As shown in Fig. 5, in this embodiment, the data consumer node 520 and the data provider node 530 negotiate data usage via a data approval channel and negotiate approval of the federated computing task via a task approval channel. Both channels are established between the data consumer node 520 and the data provider node 530 by the coordinating node 510. This process is described in detail below with reference to Fig. 5.
As shown in Fig. 5, the data consumer node 520 first queries the metadata published by the coordinating node 510 to find the data provider node 530 where the data required to establish the federated computing task is located, as shown in step S5001. Next, the data consumer node 520 sends a data use request to the coordinating node 510, as shown in step S5002. The request includes information on the computing node where the data required for performing the first federated computing task is located. The coordinating node 510 receives the data use request sent by the data consumer node 520, determines from it the data provider node 530 where the required data is located, and forwards the data use request to the data provider node 530, as shown in step S5003. The data provider node 530 receives, via the coordinating node 510, the data use request sent by the data consumer node 520. It then determines according to the relevant conditions whether to allow the data consumer node 520 to use the data in its local data warehouse. If the data consumer node 520 is permitted to use the data, the data provider node 530 sends a use request reply to the coordinating node 510, as shown in step S5004. The coordinating node 510 receives the use request reply from the data provider node 530 and forwards it to the data consumer node 520, as shown in step S5005. The reply indicates that the data provider node 530 allows the data consumer node 520 to use the data in the data warehouse of the data provider node 530. After receiving it, the data consumer node 520 may send a modeling task request to the coordinating node 510, as shown in step S5006. The coordinating node 510 receives the modeling task request from the data consumer node 520 and forwards it to the data provider node 530, as shown in step S5007.
The data provider node 530 receives, via the coordinating node 510, the modeling task request sent by the computing node with which the data use negotiation has succeeded, i.e., the data consumer node 520. It then determines according to the relevant circumstances whether to allow the data consumer node 520 to use the data in its local data warehouse for modeling. If modeling is allowed, the data provider node 530 sends a modeling request reply to the coordinating node 510, as shown in step S5008. The coordinating node 510 receives the modeling request reply from the data provider node 530 and forwards it to the data consumer node 520, as shown in step S5009. After receiving the modeling request reply from the coordinating node 510, the data consumer node 520 may proceed to configure and adjust the federated computing model.
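The message flow of steps S5001 to S5009 can be replayed as a short script, which makes it easy to see that every request and reply passes through the coordinating node. The description only covers the case in which the provider allows each request. This sketch therefore assumes, as labeled in the code, that a refusal simply ends the negotiation without a reply. `approval_handshake` and the node labels are hypothetical.

```python
from typing import Callable, List, Tuple


def approval_handshake(
    consumer: str,
    provider: str,
    allows_data_use: Callable[[str], bool],
    allows_modeling: Callable[[str], bool],
) -> Tuple[bool, List[str]]:
    """Replay the Fig. 5 message flow; every message is relayed by the coordinator."""
    log: List[str] = []

    def send(step: str, src: str, dst: str, message: str) -> None:
        log.append(f"{step}: {src} -> {dst}: {message}")

    log.append(f"S5001: {consumer} queries published metadata, finds {provider}")
    send("S5002", consumer, "coordinator", "data use request")
    send("S5003", "coordinator", provider, "data use request")
    if not allows_data_use(consumer):
        return False, log  # assumption: a refusal ends the negotiation without a reply
    send("S5004", provider, "coordinator", "use request reply")
    send("S5005", "coordinator", consumer, "use request reply")
    send("S5006", consumer, "coordinator", "modeling task request")
    send("S5007", "coordinator", provider, "modeling task request")
    if not allows_modeling(consumer):
        return False, log  # same assumption for the modeling task request
    send("S5008", provider, "coordinator", "modeling request reply")
    send("S5009", "coordinator", consumer, "modeling request reply")
    return True, log


ok, log = approval_handshake("node-520", "node-530",
                             allows_data_use=lambda c: True,
                             allows_modeling=lambda c: True)
print(ok, len(log))  # True 9
```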
According to the embodiment of the present disclosure, the data consumer node 520 and the data provider node 530 perform data use approval and federated modeling approval via the coordinating node 510. They can complete the approval through simple operations without handling further communication details, which simplifies the federated modeling procedure. The solution according to embodiments of the present disclosure can significantly simplify the approval process between computing nodes in three cases in particular:
- one data consumer node 520 wants to establish federated computing tasks with a plurality of data provider nodes 530;
- one data provider node 530 provides data to a plurality of data consumer nodes 520;
- a computing node acts as both a data consumer node 520 and a data provider node 530 while establishing federated computing tasks with a plurality of computing nodes.
Another embodiment of the present disclosure provides an overall solution for the federated computing method based on the foregoing embodiments. The functional steps of the federated computing method of the foregoing embodiments can be visualized by providing easy-to-operate user interfaces at the coordinating node and at each of the plurality of computing nodes. A main program capable of performing the foregoing federated computing method 200 is installed at the coordinating node. An agent program capable of performing the foregoing federated computing method 300 is installed at each of the plurality of computing nodes.
The main program at the coordinating node is responsible for unified coordination, execution, and management of multiparty tasks. It also provides functions such as global metadata management and unified tenant management for the computing nodes. It can receive a user's federated data analysis or federated machine learning task, split the execution plan of the federated computing task into a multiparty joint execution plan, and distribute that plan to the corresponding computing nodes for execution. The main program is also responsible for managing, tracking, and recording task execution, and for providing queries and displays of it. The main program is also responsible for metadata management. After a user's local data warehouse at a computing node joins the federated data warehouse, it manages the metadata information of all federated data warehouses. The main program is further responsible for managing the users at the computing nodes registered with the coordinating node. This includes, but is not limited to, all user information, such as information about the user itself (e.g., the user name), the location of the computing node corresponding to the user, and information related to user rights. The main program may also optimize SQL queries. For example, when performing federated trusted data analysis, it may optimize the query SQL for performance and generate higher-performance SQL statements. These statements are returned to the user's execution module at each computing node for execution.
The agent program at each computing node provides functions such as task execution and querying, and management of the local data warehouse. The agent is responsible for executing federated computing tasks. For example, it executes multiparty computing tasks, using algorithms such as cryptography-based secure multiparty computation protocols and secure fast interaction to execute the operators of the multiparty execution plan. The agent is also responsible for managing local federated computing tasks; for example, it records task execution logs, manages task completion, and controls the number of concurrently executing tasks. The agent is further responsible for managing the local data warehouse; for example, it may configure the local data warehouse and periodically synchronize metadata information to the coordinating node.
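One of the agent duties listed above is controlling the number of concurrently executing tasks while logging them. A bounded semaphore is one common way to do this, and the sketch below uses it. The disclosure does not say how the limit is enforced, so the semaphore is a swapped-in choice. `TaskAgent` and `simulated_task` are hypothetical, with a sleep standing in for real multiparty computation.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TaskAgent:
    """Caps concurrently executing federated tasks and keeps an execution log."""

    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0
        self.log: List[str] = []

    def run(self, task_id: str, work: Callable[[], T]) -> T:
        with self._slots:  # blocks while max_concurrent tasks are running
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
                self.log.append(f"start {task_id}")
            try:
                return work()
            finally:
                with self._lock:
                    self.active -= 1
                    self.log.append(f"finish {task_id}")


def simulated_task(i: int) -> int:
    time.sleep(0.05)  # stands in for a multiparty computation
    return i * i


agent = TaskAgent(max_concurrent=2)
with ThreadPoolExecutor(max_workers=6) as pool:
    results = list(pool.map(lambda i: agent.run(f"job-{i}", lambda: simulated_task(i)),
                            range(6)))
print(results, agent.peak)  # peak never exceeds 2
```

In this sketch, six worker threads are available, but the semaphore admits at most two tasks at a time; the others wait.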
Figs. 6A to 6F schematically illustrate example interfaces of the solution for the federated computing method according to embodiments of the present disclosure, mainly showing the interfaces local to a computing node.
Fig. 6A illustrates the data mart interface. Through the data mart, a user at a computing node can conveniently view, locally, the list and details of the metadata published by all computing nodes, including the metadata published by other computing nodes. Clicking a record on the data mart interface opens a view of the specific metadata information, as shown in Figs. 6B and 6C. This view provides the basic information and field descriptions of the metadata. Fig. 6D illustrates a data management interface on which a user may manage metadata, for example, publishing metadata or withdrawing published metadata. Fig. 6E illustrates a task management interface that supports creating, editing, executing, and deleting tasks, and manages each user's task list uniformly per tenant. The user may view the current user's task list and the corresponding task execution (job) list. Fig. 6F shows an approval management interface. The current user is notified of every request to perform a federated computing task using metadata that the user has published, and can approve each request on the interface with a click. Once approval is granted, the task requester can start executing the task. The interface also provides unified management of the approval list, in which task approvals can be modified or denied.
Embodiments according to the present disclosure may be applied to joint trusted secure modeling. Different users (located at different computing nodes) may jointly perform secure model training and testing, and use the resulting model to verify and predict data. The resulting model can be used for risk prediction, improving risk identification capability and reducing enterprise losses.
Embodiments according to the present disclosure may be applied to federated secure data analysis. Different users can run joint SQL queries and program computations over their data while the data never leaves local storage. Trusted data analysis and computation can thus be performed with security ensured, which improves the effectiveness of federated data computation.
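Additive secret sharing is a standard way to compute a joint statistic without moving raw values between parties. The sketch below uses it to compute a secure sum. The disclosure does not name this protocol, so it is a swapped-in illustration of the idea, not the method's own algorithm. It assumes semi-honest, non-colluding parties and a hypothetical modulus larger than any true total. With only two parties, the total minus one's own input still reveals the other input, which is inherent to publishing a sum.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # hypothetical modulus; must exceed the largest possible total


def share(value: int, n_parties: int) -> List[int]:
    """Split a value into n random additive shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values: List[int]) -> int:
    """Sum the parties' private inputs; each party only ever sees random-looking shares."""
    n = len(private_values)
    # Party i splits its value and sends share j to party j.
    dealt = [share(v, n) for v in private_values]
    # Party j adds the shares it received; each partial sum looks uniformly random.
    partial_sums = [sum(dealt[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Only the partial sums are published, and together they reveal just the total.
    return sum(partial_sums) % MODULUS


print(secure_sum([120, 340, 75]))  # 535
```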
Embodiments according to the present disclosure may be applied to private set intersection. Two users, each holding a set, jointly compute the intersection of the data sets in their respective local data stores. At the end of the interaction, one or both parties obtain the correct intersection but learn nothing about the other party's set beyond the intersection. The privacy of the sets is thus protected while the correct result is obtained, meeting user requirements. The set contents include, but are not limited to, address books, genomes of genetic diagnosis service users, ID numbers, and the like.
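A classic construction for private set intersection uses commutative blinding by modular exponentiation, in the style of Diffie-Hellman: hashing each item to a group element and raising it to both parties' secret exponents gives the same value in either order. The disclosure does not specify which PSI protocol is used, so the toy sketch below is a swapped-in technique. Its parameters are illustrative only and not secure: a Mersenne-prime modulus and a simple hash-to-group mapping. Both parties run in one process for clarity, only party A learns the intersection, and `psi`, `hash_to_group`, and `keygen` are hypothetical names.

```python
import hashlib
import math
import secrets
from typing import List, Set

P = 2**127 - 1  # Mersenne prime; toy modulus, NOT a secure parameter choice


def hash_to_group(item: str) -> int:
    """Map an item to a nonzero element of Z_P^*."""
    digest = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big")
    return digest % (P - 1) + 1


def keygen() -> int:
    # Exponent coprime to P-1, so x -> x^k mod P is a bijection on Z_P^*.
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def psi(set_a: List[str], set_b: List[str]) -> Set[str]:
    """Party A learns the intersection; B only ever sees blinded values."""
    a, b = keygen(), keygen()                                 # each party's secret key
    a_blinded = [pow(hash_to_group(x), a, P) for x in set_a]  # sent A -> B
    b_blinded = [pow(hash_to_group(y), b, P) for y in set_b]  # sent B -> A
    a_double = [pow(v, b, P) for v in a_blinded]              # B re-blinds, returns in order
    b_double = {pow(v, a, P) for v in b_blinded}              # A re-blinds B's values
    # Since (h^a)^b == (h^b)^a mod P, equal items produce equal double-blinded values.
    return {x for x, v in zip(set_a, a_double) if v in b_double}


print(sorted(psi(["alice", "bob", "carol"], ["bob", "dave", "carol"])))  # ['bob', 'carol']
```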
Fig. 7 schematically illustrates a block diagram of a federated computing device 700 according to an embodiment of the present disclosure. As shown in Fig. 7, the federated computing device 700 includes a metadata management module 710, a node determination module 720, a task processing module 730, and a task distribution module 740.
According to an embodiment, the metadata management module 710 is configured to obtain a plurality of metadata from a plurality of computing nodes and publish the plurality of metadata, wherein the metadata represents characteristics of the data in the data warehouses of the computing nodes. The node determination module 720 is configured to determine at least two computing nodes, among the plurality of computing nodes, that agree to perform a federated computing task, wherein the at least two computing nodes agree to perform the federated computing task based on the plurality of metadata. The task processing module 730 is configured to receive the federated computing task submitted by at least one of the at least two computing nodes and split it into a plurality of subtasks. The task distribution module 740 is configured to distribute the plurality of subtasks to the at least two computing nodes so that the subtasks are executed among the at least two computing nodes.
For the specific operations of the above functional modules, reference may be made to the steps of the federated computing method 200 in the foregoing embodiments; they are not repeated here.
Fig. 8 schematically illustrates a block diagram of a federated computing device according to another embodiment of the present disclosure. As shown in Fig. 8, the federated computing device 800 includes a metadata query module 810, a first node determination module 820, a task submission module 830, a first task execution module 840, a metadata upload module 850, a second node determination module 860, a task receiving module 870, and a second task execution module 880.
According to an embodiment, the metadata query module 810 is configured to obtain, from a coordinating node, a plurality of metadata uploaded to the coordinating node by a plurality of computing nodes, wherein the metadata of each computing node represents characteristics of the data in the data warehouse of that computing node. The first node determination module 820 is configured to determine, from the plurality of computing nodes and based on the plurality of metadata, a computing node as a data provider node to cooperatively perform a first federated computing task. The task submission module 830 is configured to submit the first federated computing task to the coordinating node, so that the coordinating node splits the first federated computing task into a plurality of first subtasks. The first task execution module 840 is configured to receive at least one first subtask of the plurality of first subtasks from the coordinating node and execute the at least one first subtask in cooperation with the data provider node. The metadata upload module 850 is configured to upload local metadata to the coordinating node, the local metadata representing characteristics of the data in the local data warehouse. The second node determination module 860 is configured to determine at least one of the plurality of computing nodes as a data consumer node to cooperatively perform a second federated computing task. The task receiving module 870 is configured to receive, from the coordinating node, at least one second subtask of the plurality of second subtasks obtained by splitting the second federated computing task. The second task execution module 880 is configured to execute the at least one second subtask in cooperation with the data consumer node.
For the specific operations of the above functional modules, reference may be made to the steps of the federated computing method 300 in the foregoing embodiments; they are not repeated here.
Fig. 9 schematically illustrates a block diagram of a federated computing device 900 adapted to perform federated computing according to an embodiment of the present disclosure. The federated computing method according to embodiments of the present disclosure may be performed using the federated computing device shown in Fig. 9.
As shown in fig. 9, a federated computing device 900 in accordance with an embodiment of the present disclosure includes a processor 901 and a memory 902. The processor 901 may perform various suitable actions and processes in accordance with programs or instructions stored in the memory 902. The processor 901 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. Processor 901 may also include on-board memory for caching purposes. Processor 901 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
The processor 901 and the memory 902 are connected to each other through a bus. The processor 901 performs various operations of the method flow according to the embodiment of the present disclosure by executing a program in the memory 902. It should be noted that the program may also be stored in one or more storage devices other than the memory 902. The processor 901 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in the one or more storage devices.
According to an embodiment of the present disclosure, the federated computing device 900 may also include an input device 903 and an output device 904, which are also connected to the bus. Furthermore, the federated computing device 900 may also include one or more of the following components: an input section including a keyboard, a mouse, etc.; an output section including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), etc., and a speaker, etc.; a storage section including a hard disk or the like; and a communication section including a network interface card such as a LAN card, a modem, and the like.
According to embodiments of the present disclosure, the method flow according to embodiments of the present disclosure may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program comprising program code for performing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via a communication portion, and/or installed from a removable medium. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 901. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
The present disclosure also provides a computer-readable storage medium and a computer program product. The computer-readable storage medium may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed by the processor 901, implement methods in accordance with embodiments of the present disclosure. The computer program product comprises a computer program which, when executed by a processor, can implement the method of any of the embodiments described above.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or sub-combined in various ways, even if such combinations or sub-combinations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be variously combined and/or sub-combined without departing from the spirit and teachings of the present disclosure. All such combinations and/or sub-combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (21)

1. A federated computing method, comprising:
obtaining a plurality of metadata from a plurality of computing nodes and publishing the plurality of metadata, the metadata representing characteristics of data in the data warehouses of the computing nodes;
determining, from the plurality of computing nodes, at least two computing nodes that agree to perform a federated computing task, wherein the at least two computing nodes agree to perform the federated computing task based on the plurality of metadata;
receiving a federated computing task submitted by at least one of the at least two computing nodes, and splitting the federated computing task into a plurality of subtasks; and
distributing the plurality of subtasks to the at least two computing nodes so that the plurality of subtasks are executed among the at least two computing nodes;
wherein the receiving the federated computing task submitted by at least one of the at least two computing nodes and splitting the federated computing task into a plurality of subtasks comprises:
receiving a federated computing task submitted by at least one of the at least two computing nodes;
auditing whether the federated computing task complies with a federated computing security specification; and
splitting the federated computing task into the plurality of subtasks in response to the federated computing task passing the audit against the federated computing security specification;
wherein the at least two computing nodes include a data consumer node and a data provider node, and the federated computing task is obtained by:
the data consumer node determining, among the plurality of computing nodes, a computing node to cooperatively execute the federated computing task as the data provider node; and
in response to the data provider node approving the modeling intention, the data consumer node configuring a federated computing model according to data in the data warehouse of the data consumer node and data in the data warehouse of the data provider node, adjusting parameters of the federated computing model, and obtaining the federated computing task based on the configured and adjusted federated computing model.
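As an illustration only, the coordinating-node steps of claim 1 (collect and publish metadata, audit a submitted task, split it, and distribute the subtasks) can be sketched in Python as below. The names `Coordinator`, `collect_metadata`, `publish_metadata`, and `submit`, the representation of a task as a per-node list of operations, and the `audit` callable standing in for the federated computing security specification are hypothetical assumptions, not the patented implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Subtask:
    node_id: str    # computing node that executes this piece of work
    operation: str  # hypothetical description of the work


@dataclass
class FederatedTask:
    submitter: str
    # Hypothetical task format (assumption): node_id -> operations for that node.
    plan: Dict[str, List[str]]


@dataclass
class Coordinator:
    # Stand-in for the federated computing security specification (assumption).
    audit: Callable[[FederatedTask], bool]
    metadata: Dict[str, dict] = field(default_factory=dict)
    inboxes: Dict[str, List[Subtask]] = field(default_factory=dict)

    def collect_metadata(self, node_id: str, meta: dict) -> None:
        """Obtain metadata describing the data in a node's data warehouse."""
        self.metadata[node_id] = meta

    def publish_metadata(self) -> Dict[str, dict]:
        """Publish the collected metadata; here simply a copy for all nodes to read."""
        return dict(self.metadata)

    def submit(self, task: FederatedTask, agreed_nodes: List[str]) -> List[Subtask]:
        """Audit a task from an agreed node, split it, and distribute the subtasks."""
        participants = set(task.plan)
        if task.submitter not in agreed_nodes or not participants <= set(agreed_nodes):
            raise PermissionError("task involves nodes that did not agree to it")
        if not self.audit(task):
            raise PermissionError("task fails the security specification audit")
        # Split: one subtask per (node, operation) pair in the plan.
        subtasks = [Subtask(n, op) for n, ops in task.plan.items() for op in ops]
        # Distribute: place each subtask in the inbox of the node that runs it.
        for st in subtasks:
            self.inboxes.setdefault(st.node_id, []).append(st)
        return subtasks


coord = Coordinator(audit=lambda t: len(t.plan) >= 2)
coord.collect_metadata("A", {"tables": ["users"]})
coord.collect_metadata("B", {"tables": ["orders"]})
task = FederatedTask("A", {"A": ["train local part"], "B": ["train local part"]})
print(len(coord.submit(task, agreed_nodes=["A", "B"])))  # 2
```

The audit runs before any splitting, so a non-compliant task never produces subtasks; the toy audit rule used here (at least two participants) is only a placeholder.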
2. The method of claim 1, wherein determining, from the plurality of computing nodes, at least two computing nodes that agree to perform a federated computing task comprises:
constructing a data approval channel between at least two computing nodes of the plurality of computing nodes, through which the at least two computing nodes negotiate data use based on the plurality of metadata;
constructing a task approval channel between the at least two computing nodes that have reached a data use agreement, through which those computing nodes approve the federated computing task; and
taking the at least two computing nodes that have approved the federated computing task as the at least two computing nodes that agree to perform the federated computing task.
3. The method of claim 2, wherein constructing a data approval channel between at least two computing nodes of the plurality of computing nodes comprises:
receiving a data use request sent by at least one of the at least two computing nodes acting as a data consumer node;
determining, according to the data use request, the computing node where the data required for executing the federated computing task is located as a data provider node, and sending the data use request to the data provider node; and
receiving a use request reply from the data provider node and sending the use request reply to the data consumer node.
4. The method of claim 3, wherein constructing a task approval channel between the at least two computing nodes comprises:
receiving a modeling task request from the data consumer node of the at least two computing nodes;
sending the modeling task request to the data provider node; and
receiving a modeling request reply from the data provider node and sending the modeling request reply to the data consumer node.
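The two approval channels of claims 2 to 4 can be pictured as the coordinating node relaying requests and replies between a data consumer node and a data provider node. The sketch below is a hypothetical Python illustration: `ApprovalRelay`, `Request`, and the boolean `Policy` callables that stand in for each provider's decision are assumptions, and a real system would exchange these messages over a network rather than through direct function calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple


@dataclass(frozen=True)
class Request:
    kind: str      # "data_use" or "modeling_task"
    consumer: str  # data consumer node that sent the request
    provider: str  # node holding the required data


# A provider's decision rule: return True to approve a request (assumption).
Policy = Callable[[Request], bool]


class ApprovalRelay:
    """Coordinator-side data and task approval channels (illustrative sketch)."""

    def __init__(self, policies: Dict[str, Policy]) -> None:
        self.policies = policies
        self.data_agreed: Set[Tuple[str, str]] = set()

    def _forward(self, req: Request) -> bool:
        # Forward the request to the provider; its reply goes back to the consumer.
        return self.policies[req.provider](req)

    def data_use(self, consumer: str, provider: str) -> bool:
        """Data approval channel: relay a data use request and its reply."""
        approved = self._forward(Request("data_use", consumer, provider))
        if approved:
            self.data_agreed.add((consumer, provider))
        return approved

    def modeling_task(self, consumer: str, provider: str) -> bool:
        """Task approval channel: only open to pairs that agreed on data use."""
        if (consumer, provider) not in self.data_agreed:
            return False
        return self._forward(Request("modeling_task", consumer, provider))


relay = ApprovalRelay({"provider": lambda req: req.consumer == "consumer"})
print(relay.data_use("consumer", "provider"), relay.modeling_task("consumer", "provider"))
# True True
```

Keeping the set of pairs that passed data use negotiation in the relay reflects the ordering in claim 2: the task approval channel is only built between nodes that already reached a data use agreement.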
5. The method of claim 1, wherein the federated computing task includes federated data query, federated data analysis, and federated model training performed based on data in the data warehouse of each of the at least two computing nodes, and federated prediction using a model obtained from the federated model training.
6. The method of claim 1, further comprising, after receiving the federated computing task submitted by at least one of the at least two computing nodes:
auditing whether the federated computing task complies with the federated computing security specification; and
rejecting the federated computing task in a case where the federated computing task does not comply with the federated computing security specification.
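Claims 1 and 6 do not define the content of the federated computing security specification. Purely as a hypothetical example, the Python sketch below assumes two rules (no raw, non-aggregated column may leave the node that owns it, and aggregates must cover a minimum group size) and shows how a coordinating node could audit a task and either reject it or pass it on for splitting. `TaskSpec`, `audit`, `handle_submission`, and the threshold of 10 are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TaskSpec:
    submitter: str
    # Each output column: (owner node, column name, is it aggregated?)
    outputs: List[Tuple[str, str, bool]] = field(default_factory=list)
    min_group_size: int = 0  # smallest group any aggregate is computed over


def audit(task: TaskSpec, required_group_size: int = 10) -> Tuple[bool, List[str]]:
    """Check a task against hypothetical security rules; return (passes, reasons)."""
    reasons: List[str] = []
    for owner, column, aggregated in task.outputs:
        if owner != task.submitter and not aggregated:
            reasons.append(f"raw column {owner}.{column} would leave its owner")
    if task.min_group_size < required_group_size:
        reasons.append("aggregation groups are too small")
    return not reasons, reasons


def handle_submission(task: TaskSpec) -> str:
    """Reject a non-compliant task; otherwise hand it on for splitting."""
    passes, reasons = audit(task)
    if not passes:
        return "rejected: " + "; ".join(reasons)
    return "accepted for splitting"


print(handle_submission(TaskSpec("C", [("P", "income", False)], min_group_size=50)))
# rejected: raw column P.income would leave its owner
```

Returning the list of reasons, not just a boolean, lets the coordinating node explain a rejection to the submitting node.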
7. The method of claim 1, wherein the metadata includes a name of the data warehouse, a name of a data table stored in the data warehouse, and field names, field types, and a number of rows of the data table.
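The metadata fields listed in claim 7 describe the schema and size of each table without exposing its contents. A minimal Python sketch, assuming a `TableMetadata` record and a hypothetical `nodes_holding` helper, shows how a data consumer node might use published metadata to find candidate data provider nodes; the type names such as `"string"` and the rule of matching on field name and type are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TableMetadata:
    warehouse: str          # name of the data warehouse
    table: str              # name of a data table stored in the warehouse
    fields: Dict[str, str]  # field name -> field type
    row_count: int          # number of rows in the table


def nodes_holding(catalog: Dict[str, List[TableMetadata]],
                  needed: Dict[str, str]) -> List[str]:
    """Return nodes with at least one table containing every needed field and type."""
    matches = []
    for node_id, tables in catalog.items():
        if any(all(t.fields.get(name) == typ for name, typ in needed.items())
               for t in tables):
            matches.append(node_id)
    return matches


catalog = {
    "bank": [TableMetadata("dw_bank", "accounts",
                           {"user_id": "string", "balance": "double"}, 1000)],
    "shop": [TableMetadata("dw_shop", "orders",
                           {"user_id": "string", "amount": "double"}, 5000)],
}
print(nodes_holding(catalog, {"user_id": "string", "amount": "double"}))  # ['shop']
```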
8. A federated computing method, comprising:
obtaining, by a data consumer node from a coordinating node, a plurality of metadata uploaded to the coordinating node by a plurality of computing nodes, wherein the metadata of each computing node represents characteristics of data in the data warehouse of that computing node;
determining, according to the plurality of metadata, a computing node among the plurality of computing nodes to cooperatively execute a first federated computing task as a data provider node;
in response to the data provider node approving the modeling intention, configuring, by the data consumer node, a federated computing model according to data in the data warehouse of the data consumer node and data in the data warehouse of the data provider node, and adjusting parameters of the federated computing model;
submitting the first federated computing task, obtained based on the configured and adjusted federated computing model, to the coordinating node, so that the coordinating node splits the first federated computing task into a plurality of first subtasks; and
receiving at least one first subtask of the plurality of first subtasks from the coordinating node, and executing the at least one first subtask in cooperation with the data provider node;
wherein the plurality of first subtasks are obtained by splitting in response to the first federated computing task passing an audit against the federated computing security specification.
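On the data consumer side (claim 8), configuring and tuning the federated computing model can be pictured as assembling a task description from fields held by both parties once the provider has approved. The sketch below assumes a vertical split in which the consumer holds the label and the provider holds extra features; `ModelConfig`, `build_federated_task`, the task dictionary layout, and the parameter names `learning_rate` and `epochs` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Set


@dataclass
class ModelConfig:
    consumer_features: List[str]  # features from the consumer's data warehouse
    provider_features: List[str]  # features from the provider's data warehouse
    label: str                    # target column, held by the consumer (assumption)
    params: Dict[str, Any]        # adjustable model parameters


def build_federated_task(consumer_fields: Set[str], provider_fields: Set[str],
                         label: str, provider_approved: bool,
                         **param_overrides: Any) -> Dict[str, Any]:
    """Configure a model from both parties' fields and wrap it as a task to submit."""
    if not provider_approved:
        raise PermissionError("data provider node has not approved the modeling intention")
    if label not in consumer_fields:
        raise ValueError("label must come from the consumer's data")
    params: Dict[str, Any] = {"learning_rate": 0.1, "epochs": 10}  # defaults (assumed)
    params.update(param_overrides)  # parameter adjustment by the consumer
    config = ModelConfig(
        consumer_features=sorted(consumer_fields - {label}),
        provider_features=sorted(provider_fields),
        label=label,
        params=params,
    )
    return {"type": "federated_model_training", "model": config}


task = build_federated_task({"age", "churned"}, {"income"}, "churned",
                            provider_approved=True, learning_rate=0.05)
print(task["model"].params)  # {'learning_rate': 0.05, 'epochs': 10}
```

The returned dictionary is what the consumer would then submit to the coordinating node for auditing and splitting.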
9. The method of claim 8, wherein determining, according to the plurality of metadata, a computing node among the plurality of computing nodes to cooperatively execute a first federated computing task as a data provider node comprises:
determining, according to the plurality of metadata, the computing node among the plurality of computing nodes where the data required for executing the first federated computing task is located;
negotiating data use with the computing node where the data required for executing the first federated computing task is located, via a data approval channel constructed by the coordinating node;
approving the federated computing task, via a task approval channel constructed by the coordinating node, with the computing node that has reached a data use agreement; and
taking the computing node that has approved the federated computing task as the data provider node.
10. The method of claim 8, further comprising:
uploading local metadata to the coordinating node, the local metadata representing characteristics of data in a local data warehouse;
determining at least one computing node of the plurality of computing nodes as a data consumer node to cooperatively execute a second federated computing task;
receiving, from the coordinating node, at least one second subtask of a plurality of second subtasks into which the second federated computing task is split; and
executing the at least one second subtask in cooperation with the data consumer node.
11. The method of claim 10, wherein determining at least one computing node of the plurality of computing nodes as a data consumer node to cooperatively execute a second federated computing task comprises:
negotiating data use with at least one computing node of the plurality of computing nodes via a data approval channel constructed by the coordinating node;
approving the federated computing task, via a task approval channel constructed by the coordinating node, with the computing node that has reached a data use agreement; and
taking the computing node that has approved the federated computing task as the data consumer node to cooperatively execute the second federated computing task.
12. The method of claim 9, wherein negotiating data use, via the data approval channel constructed by the coordinating node, with the computing node where the data required for executing the first federated computing task is located comprises:
sending a data use request to the coordinating node, the data use request including information on the computing node where the data required for executing the first federated computing task is located; and
receiving a use request reply from the coordinating node.
13. The method of claim 9, wherein approving the federated computing task, via the task approval channel constructed by the coordinating node, with the computing node that has reached a data use agreement comprises:
sending a modeling task request to the coordinating node; and
receiving a modeling request reply from the coordinating node.
14. The method of claim 11, wherein negotiating data use with at least one computing node of the plurality of computing nodes via the data approval channel constructed by the coordinating node comprises:
receiving, from the coordinating node, a data use request sent by the at least one computing node; and
sending a use request reply to the coordinating node in a case where the at least one computing node is permitted to use data in the local data warehouse.
15. The method of claim 11, wherein approving the federated computing task, via the task approval channel constructed by the coordinating node, with the computing node that has reached a data use agreement comprises:
receiving, from the coordinating node, a modeling task request sent by the computing node that has reached the data use agreement; and
sending a modeling request reply to the coordinating node in a case where that computing node is permitted to perform modeling using the data in the local data warehouse.
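Claims 14 and 15 describe the same approval exchange from the data provider node's side: a reply is sent only when the requesting node is permitted to use, or to model with, the local data. In the hypothetical Python sketch below, `ProviderNode`, its permission sets, and the reply dictionaries are assumptions; returning `None` stands for sending no reply, a case the claims leave open.

```python
from typing import Dict, Iterable, Optional, Set


class ProviderNode:
    """Provider-side handling of relayed approval requests (illustrative sketch)."""

    def __init__(self, may_use_data: Iterable[str], may_model: Iterable[str]) -> None:
        self.may_use_data: Set[str] = set(may_use_data)  # nodes allowed to use local data
        self.may_model: Set[str] = set(may_model)        # nodes allowed to model with it
        self.negotiated: Set[str] = set()                # nodes past data use negotiation

    def on_data_use_request(self, requester: str) -> Optional[Dict[str, str]]:
        """Claim 14: reply to the coordinating node only if data use is permitted."""
        if requester not in self.may_use_data:
            return None
        self.negotiated.add(requester)
        return {"type": "use_request_reply", "to": requester}

    def on_modeling_task_request(self, requester: str) -> Optional[Dict[str, str]]:
        """Claim 15: reply only if the negotiated node may model with local data."""
        if requester in self.negotiated and requester in self.may_model:
            return {"type": "modeling_request_reply", "to": requester}
        return None


provider = ProviderNode(may_use_data={"C1", "C2"}, may_model={"C1"})
provider.on_data_use_request("C2")
print(provider.on_modeling_task_request("C2"))  # None
```

Separating the two permission sets mirrors the two-stage approval: a node may be allowed to query the data without being allowed to train a model on it.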
16. A federated computing device, comprising:
a metadata management module configured to obtain a plurality of metadata from a plurality of computing nodes and publish the plurality of metadata, the metadata representing characteristics of data in the data warehouses of the computing nodes;
a node determination module configured to determine, from the plurality of computing nodes, at least two computing nodes that agree to perform a federated computing task, wherein the at least two computing nodes agree to perform the federated computing task based on the plurality of metadata;
a task processing module configured to receive a federated computing task submitted by at least one of the at least two computing nodes and split the federated computing task into a plurality of subtasks; and
a task distribution module configured to distribute the plurality of subtasks to the at least two computing nodes so that the plurality of subtasks are executed among the at least two computing nodes;
wherein the task processing module is further configured to:
receive a federated computing task submitted by at least one of the at least two computing nodes;
audit whether the federated computing task complies with a federated computing security specification; and
split the federated computing task into the plurality of subtasks in response to the federated computing task passing the audit against the federated computing security specification;
wherein the at least two computing nodes include a data consumer node and a data provider node, and the federated computing task is obtained by:
the data consumer node determining, among the plurality of computing nodes, a computing node to cooperatively execute the federated computing task as the data provider node; and
in response to the data provider node approving the modeling intention, the data consumer node configuring a federated computing model according to data in the data warehouse of the data consumer node and data in the data warehouse of the data provider node, adjusting parameters of the federated computing model, and obtaining the federated computing task based on the configured and adjusted federated computing model.
17. A federated computing device, comprising:
a memory storing program instructions; and
a processor configured to execute the program instructions to perform the federated computing method of any one of claims 1 to 7.
18. A federated computing device, comprising:
a metadata query module configured to obtain, from a coordinating node, a plurality of metadata uploaded to the coordinating node by a plurality of computing nodes, wherein the metadata of each computing node represents characteristics of data in the data warehouse of that computing node;
a first node determination module configured to determine, from the plurality of computing nodes, a computing node to cooperatively execute a first federated computing task as a data provider node;
a module configured to, in response to the data provider node approving the modeling intention, have the data consumer node configure a federated computing model according to data in the data warehouse of the data consumer node and data in the data warehouse of the data provider node, and adjust parameters of the federated computing model;
a task submission module configured to submit the first federated computing task, obtained based on the configured and adjusted federated computing model, to the coordinating node, so that the coordinating node splits the first federated computing task into a plurality of first subtasks; and
a first task execution module configured to receive at least one first subtask of the plurality of first subtasks from the coordinating node and execute the at least one first subtask in cooperation with the data provider node;
wherein the plurality of first subtasks are obtained by splitting in response to the first federated computing task passing an audit against the federated computing security specification.
19. A federated computing device, comprising:
a memory storing program instructions; and
a processor configured to execute the program instructions to perform the federated computing method of any one of claims 8 to 15.
20. A computer-readable storage medium storing computer-executable instructions which, when executed, implement the federated computing method of any one of claims 1 to 7 or the federated computing method of any one of claims 8 to 15.
21. A computer program product comprising a computer program which, when executed by a processor, implements the federated computing method of any one of claims 1 to 7 or the federated computing method of any one of claims 8 to 15.

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN202010997997.8A CN111914038B (en) 2020-09-21 2020-09-21 Federal computing method, apparatus, device, and storage medium
US17/189,385 US20220091891A1 (en) 2020-09-21 2021-03-02 Method, device, apparatus of federated computing, and storage medium
KR1020210028427A KR20220039526A (en) 2020-09-21 2021-03-03 combined computiong method, apparatus, equipment, non transitory computer readable storage medium and computer program
EP21160993.8A EP3971728A1 (en) 2020-09-21 2021-03-05 Method, device, apparatus of federated computing, and storage medium
JP2021057612A JP2021103588A (en) 2020-09-21 2021-03-30 Federal calculation method, federal calculation apparatus, federal calculation device, storage medium, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010997997.8A CN111914038B (en) 2020-09-21 2020-09-21 Federal computing method, apparatus, device, and storage medium

Publications (2)

Publication Number Publication Date
CN111914038A CN111914038A (en) 2020-11-10
CN111914038B true CN111914038B (en) 2024-04-16

Family

ID=73265296

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010997997.8A Active CN111914038B (en) 2020-09-21 2020-09-21 Federal computing method, apparatus, device, and storage medium

Country Status (5)

Country Link
US (1) US20220091891A1 (en)
EP (1) EP3971728A1 (en)
JP (1) JP2021103588A (en)
KR (1) KR20220039526A (en)
CN (1) CN111914038B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860970B (en) * 2021-03-02 2024-03-12 百度在线网络技术(北京)有限公司 Data processing method and device, electronic equipment and storage medium
CN113537508B (en) * 2021-06-18 2024-02-02 百度在线网络技术(北京)有限公司 Processing method and device for federal calculation, electronic equipment and storage medium
CN115567427A (en) * 2021-07-02 2023-01-03 中国移动通信有限公司研究院 Evaluation method, evaluation node and federal learning system for consistency of interaction protocols
CN114221957A (en) * 2021-11-30 2022-03-22 中国电子科技网络信息安全有限公司 Country management system
CN114721804B (en) * 2022-04-15 2024-08-13 支付宝(杭州)信息技术有限公司 Task scheduling method and device and electronic equipment
CN114692209B (en) * 2022-05-31 2022-09-20 蓝象智联(杭州)科技有限公司 Graph federation method and system based on confusion technology
CN115202908B (en) * 2022-09-09 2023-01-03 杭州海康威视数字技术股份有限公司 Privacy computation request response method and device based on dynamic arrangement
CN115577034B (en) * 2022-11-21 2023-04-04 中国电子信息产业集团有限公司 Federal computing system and method based on data system

Citations (4)

Publication number Priority date Publication date Assignee Title
CN1985493A (en) * 2004-07-21 2007-06-20 国际商业机器公司 Method and apparatus for providing federated functionality within a data processing system
CN110955907A (en) * 2019-12-13 2020-04-03 支付宝(杭州)信息技术有限公司 Model training method based on federal learning
CN110990329A (en) * 2019-12-09 2020-04-10 杭州趣链科技有限公司 Method, equipment and medium for high availability of federated computing
CN111212110A (en) * 2019-12-13 2020-05-29 清华大学深圳国际研究生院 Block chain-based federal learning system and method

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
US7469248B2 (en) * 2005-05-17 2008-12-23 International Business Machines Corporation Common interface to access catalog information from heterogeneous databases
US7818783B2 (en) * 2006-03-08 2010-10-19 Davis Russell J System and method for global access control
US8380750B2 (en) * 2011-02-17 2013-02-19 International Business Machines Corporation Searching and displaying data objects residing in data management systems
US9489243B2 (en) * 2012-01-26 2016-11-08 Computenext Inc. Federating computing resources across the web
CA2929825C (en) * 2015-05-17 2018-11-13 Ormuco Inc. Method of and system for managing a federation of cloud computing resources
JP6398944B2 (en) * 2015-10-28 2018-10-03 オムロン株式会社 Data distribution management system
JP6827327B2 (en) * 2017-01-05 2021-02-10 株式会社日立製作所 Distributed computing system
JP2019047334A (en) * 2017-09-01 2019-03-22 学校法人慶應義塾 Data processing unit, data processing method and program for data processing
JP7000259B2 (en) * 2018-06-07 2022-01-19 ヤフー株式会社 Generator, generation method, and generation program
US11494380B2 (en) * 2019-10-18 2022-11-08 Splunk Inc. Management of distributed computing framework components in a data fabric service system

Non-Patent Citations (4)

Title
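From Federated Databases to a Federated Data Warehouse System; Stefan Berger et al.; Proceedings of the 41st Hawaii International Conference on System Sciences; pp. 1-10 *
Managing hot metadata for scientific workflows on multisite clouds; Luis Pineda-Morales et al.; 2016 IEEE International Conference on Big Data (Big Data); 2017-02-06; full text *
Research on data sharing methods in Web-based combat simulation systems (基于Web的作战模拟系统中数据共享方法研究); 陈彬, 鞠儒生, 黄柯棣; Journal of System Simulation (系统仿真学报); 2009-05-05, No. 09; full text *
A review of the development of federated learning technology for data sharing and exchange (面向数据共享交换的联邦学习技术发展综述); 王亚珅; Unmanned Systems Technology (无人系统技术), No. 06; full text *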

Also Published As

Publication number Publication date
US20220091891A1 (en) 2022-03-24
CN111914038A (en) 2020-11-10
EP3971728A1 (en) 2022-03-23
JP2021103588A (en) 2021-07-15
KR20220039526A (en) 2022-03-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant