CN114492846A - Cross-domain federated learning method and system based on trusted execution environment


Info

Publication number: CN114492846A (application CN202210354376.7A; granted as CN114492846B)
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: parameter, information, task, training, cluster
Legal status: Granted; Active
Inventors: 邢炬, 左磊
Current and original assignee: Tianju Dihe Suzhou Technology Co., Ltd.

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F 21/57: Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44: Arrangements for executing specific programs
    • G06F 9/445: Program loading or initiating
    • G06F 9/44505: Configuring for program initiating, e.g. using registry, configuration files
    • G06F 9/4451: User profiles; Roaming

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multi Processors (AREA)
  • Storage Device Security (AREA)

Abstract

The application discloses a cross-domain federated learning method and system based on a trusted execution environment, belonging to the technical field of machine learning. The method comprises the following steps: a task developer generates a federated learning task in a development environment and sends it to a platform controller; the platform controller sends the federated learning task to the participants and sends configuration information generated from the task to a parameter aggregation cluster; the participants perform model training according to the federated learning task and send the intermediate parameters of each training round to the parameter aggregation cluster; the parameter aggregation cluster performs cluster configuration and underlying network configuration according to the configuration information, aggregates the intermediate parameters in a trusted execution environment according to these configurations, and sends the aggregated intermediate parameters back to the participants for further training; and after aggregating the intermediate parameters of the last training round, the parameter aggregation cluster sends them to the task developer as the model parameters. The method and system can improve parameter precision and parameter aggregation efficiency.

Description

Cross-domain federated learning method and system based on trusted execution environment
Technical Field
The application relates to the technical field of machine learning, and in particular to a cross-domain federated learning method and system based on a trusted execution environment.
Background
Data ownership and data security issues are increasingly prominent. Out of respect for data ownership and to guarantee data security, federated learning has been proposed as a new machine learning paradigm and is gradually being applied to cooperation among various data-owning parties. Because the actual training process is completed locally at each participant and the raw data never leaves the participant, this paradigm can effectively guarantee data security and data ownership. In industry, there is substantial demand for data cooperation among different organizations, and the emergence of federated learning provides an effective means for cross-organization data value flow.
To secure parameter aggregation, existing federated learning techniques generally harden the aggregation process with security mechanisms, chiefly secure multi-party computation, differential privacy, and homomorphic encryption.
Secure multi-party computation and homomorphic encryption schemes carry large performance overhead (both online and offline); to keep performance acceptable, data is often truncated in practice, causing precision loss. Differential privacy can prevent parameter leakage in a statistical sense, but the injected noise degrades the precision of the aggregated data. Moreover, all of these hardening methods reduce parameter aggregation efficiency.
Disclosure of Invention
The application provides a cross-domain federated learning method and system based on a trusted execution environment, to solve the problem that hardening the parameter aggregation process with security mechanisms reduces parameter precision and parameter aggregation efficiency. The technical scheme is as follows:
In one aspect, a cross-domain federated learning method based on a trusted execution environment is provided, for use in a training system comprising a task developer, a federated learning system, and participants, the federated learning system comprising a parameter aggregation cluster, a platform controller, and a development environment, and the method comprising:
the task developer generates a federated learning task in the development environment and sends the federated learning task to the platform controller, the federated learning task comprising parameter and metric information, model structure information, data preparation information, and a predetermined data type for the model to be trained, wherein the parameter and metric information indicates the model parameters and training metrics of the model, the model structure information indicates the model structure of the model, the data preparation information indicates a declaration of the model's training data so that each distributed data source prepares training data according to the processing flow corresponding to the declaration, and the predetermined data type indicates a logical data abstraction provided by the federated learning system so that the distributed training data is abstracted into one complete data set;
the platform controller sends the federated learning task to the participants, generates configuration information according to the federated learning task, and sends the configuration information to the parameter aggregation cluster;
the participants perform model training according to the federated learning task and send the intermediate parameters obtained in each training round to the parameter aggregation cluster;
the parameter aggregation cluster performs cluster configuration and underlying network configuration according to the configuration information, aggregates the intermediate parameters in a trusted execution environment according to the cluster configuration and the underlying network configuration, and sends the aggregated intermediate parameters to the participants for further training;
and after aggregating the intermediate parameters of the last training round, the parameter aggregation cluster sends the aggregated intermediate parameters to the task developer as the model parameters.
In one possible implementation, the method further includes:
the platform controller generates a session token according to the federated learning task and sends the session token to the task developer;
the task developer modifies at least one kind of information in the federated learning task and sends the modified federated learning task and the session token to the platform controller, the at least one kind of information being at least one of the parameter and metric information, the model structure information, and the data preparation information;
the platform controller maps each kind of information in the federated learning task, before and after modification, into a state diagram, computes the pre- and post-modification difference of each item in the state diagram, generates updated configuration information according to the calculation result, and sends the updated configuration information to the parameter aggregation cluster;
and the parameter aggregation cluster modifies the cluster configuration and/or the underlying network configuration according to the updated configuration information.
In one possible implementation, the task developer modifying at least one kind of information in the federated learning task includes:
the task developer invoking an incremental semantic interface supported by the predetermined data type and modifying the at least one kind of information in the federated learning task through the incremental semantic interface.
In one possible implementation, the incremental semantic interfaces include operation interfaces, tracking interfaces, and exchange interfaces.
In one possible implementation, the method further includes:
the platform controller generates an information difference according to the calculation result and sends the information difference to the task developer;
and the task developer displays the information difference.
In one possible implementation, when the intermediate parameters are encrypted and partitioned into fixed-size parameter fragments, aggregating the intermediate parameters in a trusted execution environment according to the cluster configuration and the underlying network configuration includes:
the parameter aggregation cluster receiving the encrypted parameter fragments sent by each participant and, in the trusted execution environment, routing the encrypted parameter fragments to the corresponding input queues according to the underlying network configuration;
and the parameter aggregation cluster extracting the parameter fragments from the input queues, decrypting them, aggregating the decrypted parameter fragments according to the cluster configuration, and encrypting the aggregated parameter fragments before sending them to an output queue.
In one possible implementation, the parameter aggregation cluster includes a plurality of servers, and routing the encrypted parameter fragments to the corresponding input queues according to the underlying network configuration in the trusted execution environment includes:
in the trusted execution environment, the parameter aggregation cluster routing the encrypted parameter fragments to different servers according to a first-level route in the underlying network configuration;
and the parameter aggregation cluster routing the encrypted parameter fragments to different input queues in each server's memory according to a second-level route in the underlying network configuration.
In one possible implementation, aggregating the decrypted parameter fragments according to the cluster configuration includes:
the parameter aggregation cluster obtaining the participant information and the participant training weights in the cluster configuration and aggregating the decrypted parameter fragments according to the participant information and the participant training weights.
In one possible implementation, aggregating the decrypted parameter fragments according to the participant information and the participant training weights includes:
the parameter aggregation cluster aggregating the decrypted parameter fragments according to the participant information and the participant training weights, using single-instruction-multiple-data (SIMD) vector dot-product and addition operations.
In one aspect, a training system is provided, the training system comprising a task developer, a federated learning system, and participants, the federated learning system comprising a parameter aggregation cluster, a platform controller, and a development environment;
the task developer is configured to generate a federated learning task in the development environment and send the federated learning task to the platform controller, the federated learning task comprising parameter and metric information, model structure information, data preparation information, and a predetermined data type for the model to be trained, wherein the parameter and metric information indicates the model parameters and training metrics of the model, the model structure information indicates the model structure of the model, the data preparation information indicates a declaration of the model's training data so that each distributed data source prepares training data according to the processing flow corresponding to the declaration, and the predetermined data type indicates a logical data abstraction provided by the federated learning system so that the distributed training data is abstracted into one complete data set;
the platform controller is configured to send the federated learning task to the participants, generate configuration information according to the federated learning task, and send the configuration information to the parameter aggregation cluster;
the participants are configured to perform model training according to the federated learning task and send the intermediate parameters obtained in each training round to the parameter aggregation cluster;
the parameter aggregation cluster is configured to perform cluster configuration and underlying network configuration according to the configuration information, aggregate the intermediate parameters in a trusted execution environment according to the cluster configuration and the underlying network configuration, and send the aggregated intermediate parameters to the participants for further training;
and the parameter aggregation cluster is further configured to, after aggregating the intermediate parameters of the last training round, send the aggregated intermediate parameters to the task developer as the model parameters.
The technical scheme provided by the application has the following beneficial effects:
Because the parameter aggregation cluster contains a trusted execution environment, aggregating the intermediate parameters inside that environment is inherently secure, so the aggregation process need not be hardened with additional security mechanisms. The losses in parameter precision and parameter aggregation efficiency caused by such mechanisms are thus avoided, and better parameter aggregation efficiency can be provided while security and accuracy are guaranteed.
Because the federated learning task comprises three kinds of information (parameter and metric information, model structure information, and data preparation information), the task developer can modify any of them and send the modified task together with the session token to the platform controller; the platform controller then generates updated configuration information and sends it to the parameter aggregation cluster, which modifies the cluster configuration and/or the underlying network configuration accordingly. The task developer can therefore make fine-grained adjustments (covering data sets, model structure, parameters, and metrics) in a lightweight, dynamic, on-demand manner, improving the usability of the system.
Because the task developer invokes an incremental semantic interface supported by the predetermined data type and modifies the information through it, federated learning tasks are built modularly; the task state can be updated incrementally and automatically, and that update is mapped to an incremental update of the underlying configuration, making task adjustment more agile and efficient.
Because the parameter aggregation cluster obtains the participant information and the participant training weights in the cluster configuration and aggregates the decrypted parameter fragments accordingly, participant information is incorporated into the aggregation process, which reduces the cost of building a cross-domain federated learning computing environment and lowers the difficulty of adjustment.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a training system according to some exemplary embodiments;
FIG. 2 is a flowchart of a trusted-execution-environment-based cross-domain federated learning method provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a visual view of a development environment according to some exemplary embodiments;
FIG. 4 is a schematic illustration of the generation of a predetermined data type according to some exemplary embodiments;
FIG. 5 is a flowchart of a trusted-execution-environment-based cross-domain federated learning method provided in an embodiment of the present application;
FIG. 6 is a schematic illustration of parameter aggregation provided in an embodiment of the present application;
FIG. 7 is a flowchart of a federated learning task modification method provided in an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
The invention provides a cross-domain federated learning architecture and task development environment based on a trusted execution environment; through a novel programming abstraction, parameter aggregation inside a trusted execution environment, and a system-state reuse method, a task developer can conveniently use cross-organization data for efficient collaborative training. The composition of the training system of this embodiment is explained first. Referring to FIG. 1, the training system includes a task developer 110, a federated learning system 120, and participants 130, and the federated learning system 120 includes a parameter aggregation cluster 121, a platform controller 122, and a development environment 123. There may be multiple participants 130; this embodiment does not limit their number.
The task developer 110 is an electronic device used by a task developer, and each participant 130 is an electronic device used by a participant; this embodiment does not limit the specific type of these electronic devices.
The development environment 123 is oriented to the task developer 110 and provides a model development, secondary development, and debugging environment. The development environment 123 covers the model structure, data preparation, parameters and metrics, policies, and session tokens: the model structure item configures the structure of the model to be trained, the data preparation item configures the processing flow of the model's training data, the parameters and metrics item configures the model parameters and training metrics, and the policy and session token items configure the training policy and the session token.
The parameter aggregation cluster 121 is responsible for aggregating the federated learning parameters of the participants 130. The parameter aggregation cluster 121 includes a plurality of work units (also called servers), each configured with a trusted execution environment; an input queue, a state tracking module, an output queue, and an aggregation engine are deployed in the trusted execution environment. The aggregation of the federated learning parameters is described in detail below.
The platform controller 122 provides a unified data view, actually issues the federated learning task, and maintains the task state. The platform controller 122 includes a rights management module, an incremental calculation module, a configuration generation module, and a session management module: the rights management module performs data rights verification, the incremental calculation module performs difference calculation on the information in a modified federated learning task, the session manager manages FL_Session, and the configuration generator maps the state in a session to the cluster configuration and the underlying configuration.
The federated learning process of the training system is described below. Referring to FIG. 2, which shows a flowchart of the trusted-execution-environment-based cross-domain federated learning method provided in an embodiment of the present application, the method may be applied to the training system shown in FIG. 1 and may include the following steps:
Step 201: the task developer generates a federated learning task in the development environment and sends it to the platform controller, the federated learning task comprising parameter and metric information, model structure information, data preparation information, and a predetermined data type for the model to be trained, wherein the parameter and metric information indicates the model parameters and training metrics of the model, the model structure information indicates the model structure of the model, the data preparation information indicates a declaration of the model's training data so that each distributed data source prepares training data according to the processing flow corresponding to the declaration, and the predetermined data type indicates a logical data abstraction provided by the federated learning system so that the distributed training data is abstracted into one complete data set.
The training system provides a development environment for the task developer, who can enter the various kinds of information and the predetermined data type in the development environment, generate a federated learning task from them, and send the task to the platform controller.
This embodiment involves three kinds of information: parameter and metric information, model structure information, and data preparation information. Referring to FIG. 3, in the visual view of the development environment, each kind of information corresponds to one area, and the task developer can edit the corresponding information in the corresponding area.
The parameter and metric information indicates the model parameters and training metrics of the model. The model parameters may include, but are not limited to: the gradient descent method, the learning rate, the mini-batch size, and the loss value. A training metric is an indicator the task developer needs to monitor that reflects the training effectiveness or training efficiency of the model, such as the number of training rounds, the training accuracy, or the target loss value in machine learning.
The model structure information indicates the model structure of the model, which may include convolutional layers, pooling layers, fully-connected layers, and the like.
The data preparation information indicates the declaration of the model's training data, so that each distributed data source prepares training data according to the processing flow corresponding to the declaration. Specifically, the data preparation information is a declarative expression, built on the predetermined data type, of the training data the task developer uses for model training, and it is the processing flow each distributed data source follows when actually processing data. For ease of understanding, consider the processing flow of the training data shown in FIG. 4: Data_P1 is extracted from the original Data_RAW by an SQL operation; missing values in Data_P1 are filled with a missing-value filling function (PF1) to obtain Data_P2; and Data_P2 is feature-expanded with a feature expansion function (PF2) to obtain Data_P3, which is used as the prepared training data.
This embodiment also involves the predetermined data type FL_Data, which lets a task developer quickly build a data pipeline. The predetermined data type indicates the logical data abstraction provided by the federated learning system, so that the distributed training data is abstracted into one complete data set; that is, during model development the task developer can treat distributed training data that cannot actually be controlled as a complete, usable data set. As shown in FIG. 4, FL_Data records the lineage relationships and data volumes of the training data set organization and can present the data composition clearly, which, compared with Google TFF (TensorFlow Federated), is of great significance for cross-institution federated learning in specialized fields such as medicine. In addition, since the data in federated learning never leaves the participants, FL_Data provides a channel for exchanging and reusing data sets.
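By way of illustration only, the following minimal Python sketch renders the declarative pipeline of FIG. 4 on top of an FL_Data-like type; the class layout, the helper names (pf1_fill_missing, pf2_expand_features), and the lineage format are assumptions of this sketch, not an interface defined by the embodiment.

```python
# Hypothetical sketch of the declarative data-preparation flow of FIG. 4.
# The API below (FLData, apply, the PF helpers) is an illustrative assumption;
# the patent does not specify a concrete interface.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class FLData:
    """Logical dataset abstraction: records lineage, defers execution."""
    name: str
    lineage: List[str] = field(default_factory=list)    # provenance of each step
    pipeline: List[Callable] = field(default_factory=list)

    def apply(self, step_name: str, fn: Callable) -> "FLData":
        # Declaring a step only records it; each data source runs it locally.
        return FLData(
            name=step_name,
            lineage=self.lineage + [f"{self.name} -> {step_name}"],
            pipeline=self.pipeline + [fn],
        )

def pf1_fill_missing(rows):    # missing-value filling function (PF1)
    return [{k: (v if v is not None else 0) for k, v in r.items()} for r in rows]

def pf2_expand_features(rows):  # feature expansion function (PF2)
    return [dict(r, x_sq=r.get("x", 0) ** 2) for r in rows]

# Declaration mirroring FIG. 4: RAW -> P1 (SQL) -> P2 (PF1) -> P3 (PF2)
data_raw = FLData("Data_RAW")
data_p1 = data_raw.apply("Data_P1", lambda rows: rows)  # stands in for the SQL extraction
data_p2 = data_p1.apply("Data_P2", pf1_fill_missing)
data_p3 = data_p2.apply("Data_P3", pf2_expand_features)
print(data_p3.lineage)  # ['Data_RAW -> Data_P1', 'Data_P1 -> Data_P2', 'Data_P2 -> Data_P3']
```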
Step 202: the platform controller sends the federated learning task to the participants, generates configuration information according to the federated learning task, and sends the configuration information to the parameter aggregation cluster.
The platform controller verifies data access rights with its rights management module, sends the federated learning task to each participant after verification passes, generates the configuration information of the parameter aggregation cluster according to the federated learning task, and sends the configuration information to the parameter aggregation cluster.
In this embodiment, the platform controller reuses and incrementally updates system state through the FL_Session abstraction. The objects managed by FL_Session include at least: (1) the participants and their data usage; (2) the pipelines for constructing the federated learning data sets; (3) the training parameters and hyper-parameters of the model; (4) the cluster configuration of the parameter aggregation cluster; and (5) the underlying network configuration between the participants and the parameter aggregation cluster. The cluster configuration includes at least the participant information, the participant training weights, and the aggregator list and addresses; the underlying network configuration includes at least the firewall rules, the routing rules, and the aggregator list and addresses.
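For illustration, the following sketch lays out the FL_Session state described above as Python dataclasses; all field names are assumptions made for readability, not the embodiment's actual schema.

```python
# Hypothetical layout of FL_Session state; every field name is illustrative.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ClusterConfig:
    participants: List[str]                 # participant information
    training_weights: Dict[str, float]      # per-participant aggregation weight
    aggregators: Dict[str, str]             # aggregator list: name -> address

@dataclass
class UnderlayNetworkConfig:
    firewall_rules: List[str]
    routing_rules: List[str]
    aggregators: Dict[str, str]

@dataclass
class FLSession:
    token: str                              # session token returned to the developer
    data_usage: Dict[str, List[str]]        # (1) participants and their data usage
    data_pipelines: Dict[str, List[str]]    # (2) pipelines building the FL data set
    hyper_params: Dict[str, float]          # (3) training/hyper-parameters
    cluster_config: ClusterConfig           # (4) cluster configuration
    network_config: UnderlayNetworkConfig   # (5) underlying network configuration
```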
Step 203: the participants perform model training according to the federated learning task and send the intermediate parameters obtained in each training round to the parameter aggregation cluster.
After receiving the federated learning task, a participant creates the model according to the model structure information and trains it according to the parameter and metric information, the data preparation information, and the predetermined data type. After each training round, the participant sends the resulting intermediate parameters to the parameter aggregation cluster.
Step 204: the parameter aggregation cluster performs cluster configuration and underlying network configuration according to the configuration information, aggregates the intermediate parameters in the trusted execution environment according to the cluster configuration and the underlying network configuration, and sends the aggregated intermediate parameters to the participants for further training.
The parameter aggregation cluster first performs cluster configuration and underlying network configuration according to the configuration information. Once configured, it receives the intermediate parameters sent by each participant, aggregates them in the trusted execution environment according to the cluster configuration and the underlying network configuration, and sends the aggregated intermediate parameters to the participants, who use them to train the next round of the model.
Step 205: after aggregating the intermediate parameters of the last training round, the parameter aggregation cluster sends the aggregated intermediate parameters to the task developer as the model parameters.
When training finishes, the parameter aggregation cluster aggregates the intermediate parameters of the last round and sends the aggregated intermediate parameters to the task developer as the model parameters, and the task developer receives them.
In summary, in the cross-domain federated learning method based on a trusted execution environment provided by this embodiment, the parameter aggregation cluster contains a trusted execution environment, so aggregating the intermediate parameters inside it is secure and the aggregation process need not be hardened with additional security mechanisms; the losses in parameter precision and parameter aggregation efficiency caused by such mechanisms are avoided, and better parameter aggregation efficiency can be provided while security and accuracy are guaranteed.
Referring to FIG. 5, which shows a flowchart of a trusted-execution-environment-based cross-domain federated learning method provided in an embodiment of the present application, the method may be applied to a training system such as the one shown in FIG. 1 and may include the following steps:
Step 501: the task developer generates a federated learning task in the development environment and sends it to the platform controller, the federated learning task comprising parameter and metric information, model structure information, data preparation information, and a predetermined data type for the model to be trained, wherein the parameter and metric information indicates the model parameters and training metrics of the model, the model structure information indicates the model structure of the model, the data preparation information indicates a declaration of the model's training data so that each distributed data source prepares training data according to the processing flow corresponding to the declaration, and the predetermined data type indicates a logical data abstraction provided by the federated learning system so that the distributed training data is abstracted into one complete data set.
Step 502: the platform controller sends the federated learning task to the participants, generates configuration information according to the federated learning task, and sends the configuration information to the parameter aggregation cluster.
Step 503: the participants perform model training according to the federated learning task and send the intermediate parameters obtained in each training round to the parameter aggregation cluster.
Step 504: the parameter aggregation cluster performs cluster configuration and underlying network configuration according to the configuration information.
Steps 501-504 are implemented in the same way as steps 201-204, which are described in detail above and not repeated here.
Step 505: when the intermediate parameters are encrypted and divided into fixed-size parameter fragments, the parameter aggregation cluster receives the encrypted parameter fragments sent by each participant and, in the trusted execution environment, routes the encrypted parameter fragments to the corresponding input queues according to the underlying network configuration; the parameter aggregation cluster then extracts the parameter fragments from the input queues, decrypts them, aggregates the decrypted parameter fragments according to the cluster configuration, and encrypts the aggregated parameter fragments before sending them to an output queue.
In this embodiment, after obtaining its intermediate parameters, a participant encrypts them, divides the encrypted intermediate parameters into fixed-size parameter fragments, and streams all the encrypted parameter fragments to the parameter aggregation cluster. The fixed size is typically 64 KB.
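As an illustrative sketch of this participant-side step, the following assumes AES-CTR via the `cryptography` package (the embodiment names no cipher); CTR mode is chosen here only because it lets each fixed-size fragment be decrypted independently from its byte offset, which matches the per-fragment decryption described in step 505.

```python
# Hypothetical client-side step: encrypt the serialized intermediate parameters,
# then divide the ciphertext into fixed-size fragments. The cipher choice and
# the framing (seq, nonce, fragment) are assumptions of this sketch.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

FRAGMENT_SIZE = 64 * 1024  # 64 KB, as stated in the embodiment

def encrypt_and_fragment(intermediate_params: bytes, key: bytes):
    nonce = os.urandom(16)  # initial counter block, sent along with each fragment
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    ciphertext = enc.update(intermediate_params) + enc.finalize()
    # Stream (sequence number, nonce, fragment) to the parameter aggregation cluster.
    for seq, off in enumerate(range(0, len(ciphertext), FRAGMENT_SIZE)):
        yield seq, nonce, ciphertext[off:off + FRAGMENT_SIZE]
```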
After receiving the encrypted parameter fragments, the parameter aggregation cluster routes them, in the trusted execution environment, to the corresponding input queues according to the underlying network configuration; an aggregator then extracts the parameter fragments from the input queues, decrypts them, and passes the decrypted fragments to the aggregation engine; after the aggregation engine finishes aggregating, it encrypts the aggregated fragments and sends them to an output queue. The aggregator keeps the latest aggregation result and the accumulated weight.
In this embodiment, the parameter aggregation cluster comprises a plurality of servers. In the trusted execution environment, the cluster routes the encrypted parameter fragments to different servers according to a first-level route in the underlying network configuration, and then routes them to different input queues in each server's memory according to a second-level route in the underlying network configuration. The two levels of routing can be set based on load-balancing and task-affinity rules, which improves the speed and accuracy of parameter aggregation.
As shown in FIG. 6, the parameter aggregation cluster contains n servers (also called parameter aggregators), and each server's memory holds multiple input queues and output queues; a parameter fragment is therefore first routed to one server by the first-level route and then to one input queue by the second-level route, as in the sketch below.
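In this minimal sketch of the two-level route, the hash-based rules stand in for the load-balancing and task-affinity rules, which the embodiment mentions without specifying.

```python
# Hypothetical two-level routing of encrypted fragments; the hashing policy is
# an illustrative stand-in for the embodiment's unspecified routing rules.
from hashlib import blake2b

def route_fragment(task_id: str, participant: str, seq: int,
                   n_servers: int, queues_per_server: int):
    """Return (server index, input-queue index) for one encrypted fragment."""
    # First-level route: task affinity keeps all fragments of one task on the
    # same server, while hashing spreads distinct tasks across the cluster.
    server = int.from_bytes(blake2b(task_id.encode(), digest_size=4).digest(), "big") % n_servers
    # Second-level route: spread a task's fragments over the in-memory input
    # queues of that server, keyed by participant and sequence number.
    key = f"{participant}:{seq}".encode()
    queue = int.from_bytes(blake2b(key, digest_size=4).digest(), "big") % queues_per_server
    return server, queue
```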
In this embodiment, the parameter aggregation cluster may obtain the participant information and the participant training weights in the cluster configuration and aggregate the decrypted parameter fragments according to them, so that participant information is incorporated into the aggregation process, which reduces the cost of building a cross-domain federated learning computing environment and lowers the difficulty of adjustment. Configuration items such as the participant information and the participant training weights are issued by the platform controller.
The aggregation engine is designed around the single-instruction-multiple-data (SIMD) instructions of the central processing unit (CPU); the aggregation engine in the parameter aggregation cluster aggregates the decrypted parameter fragments according to the participant information and the participant training weights, using SIMD vector dot-product and addition operations, so the aggregation of weighted parameters is completed efficiently.
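The following sketch shows a weighted aggregation of one decrypted fragment position across participants; NumPy's vectorized operations stand in for the CPU SIMD kernels (they compile down to SIMD multiply-add), and the weighted-average form is an assumption in line with common federated averaging rather than the embodiment's stated formula.

```python
# Hypothetical weighted aggregation of the same fragment from each participant.
import numpy as np

def aggregate_fragment(fragments: dict, weights: dict) -> np.ndarray:
    """Weighted average of one decrypted fragment across participants."""
    total_weight = sum(weights[p] for p in fragments)
    acc = np.zeros_like(next(iter(fragments.values())), dtype=np.float32)
    for participant, frag in fragments.items():
        acc += np.float32(weights[participant]) * frag  # SIMD multiply-add
    return acc / np.float32(total_weight)

# Example: two participants with training weights from the cluster configuration.
frags = {"A": np.ones(16384, dtype=np.float32),
         "B": np.full(16384, 3.0, dtype=np.float32)}
print(aggregate_fragment(frags, {"A": 0.25, "B": 0.75})[0])  # 2.5
```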
In this embodiment, both the input queues and the output queues reside in memory, so the context-switching interface of the trusted execution environment can be rebuilt on top of memory queues, avoiding the high overhead of enclave context switches.
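A minimal sketch of this memory-queue idea follows, assuming an SGX-style enclave where each boundary crossing (ECALL/OCALL) is expensive; the queue layout and batch size are illustrative assumptions.

```python
# Hypothetical illustration of replacing per-fragment enclave context switches
# with shared in-memory queues; deque stands in for the shared-memory rings,
# and the loop stands in for the in-enclave aggregation worker.
from collections import deque

input_queue: deque = deque()   # filled by the untrusted networking layer
output_queue: deque = deque()  # drained by the untrusted networking layer

def enclave_aggregation_loop(decrypt, aggregate, encrypt, batch=32):
    """Runs inside the TEE: polls the memory queue instead of crossing the
    enclave boundary once per fragment."""
    while input_queue:
        n = min(batch, len(input_queue))
        fragments = [input_queue.popleft() for _ in range(n)]
        result = aggregate([decrypt(f) for f in fragments])
        output_queue.append(encrypt(result))
```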
Step 506: the parameter aggregation cluster sends the aggregated intermediate parameters to the participants for further training.
The parameter aggregation cluster sends the parameter fragments in the output queues to the participants, and the participants continue with the next round of model training using the aggregated intermediate parameters.
Step 507: after aggregating the intermediate parameters of the last training round, the parameter aggregation cluster sends the aggregated intermediate parameters to the task developer as the model parameters.
When training finishes, the parameter aggregation cluster aggregates the intermediate parameters of the last round and sends the aggregated intermediate parameters to the task developer as the model parameters, and the task developer obtains them.
In summary, in the cross-domain federated learning method based on a trusted execution environment provided by this embodiment, the parameter aggregation cluster contains a trusted execution environment, so aggregating the intermediate parameters inside it is secure and the aggregation process need not be hardened with additional security mechanisms; the losses in parameter precision and parameter aggregation efficiency caused by such mechanisms are avoided, and better parameter aggregation efficiency can be provided while security and accuracy are guaranteed.
Because the parameter aggregation cluster obtains the participant information and the participant training weights in the cluster configuration and aggregates the decrypted parameter fragments accordingly, participant information is incorporated into the aggregation process, which reduces the cost of building a cross-domain federated learning computing environment and lowers the difficulty of adjustment.
During training, the task developer can adjust the information in the federated learning task as needed to tune or debug the training. Referring to the flowchart of the federated learning task modification method shown in FIG. 7, the method may include:
Step 701: the platform controller generates a session token according to the federated learning task and sends the session token to the task developer.
Specifically, after receiving the federated learning task, the platform controller returns the FL_Session token, i.e., the session token, corresponding to that task.
Step 702: the task developer modifies at least one kind of information in the federated learning task and sends the modified federated learning task and the session token to the platform controller, the at least one kind of information being at least one of the parameter and metric information, the model structure information, and the data preparation information.
The task developer modifies the information to be changed (parameter and metric information, model structure information, or data preparation information) in the corresponding area of the development environment to obtain a modified federated learning task, and sends the modified task together with the session token to the platform controller.
Specifically, the task developer modifying at least one kind of information in the federated learning task may include: the task developer invoking an incremental semantic interface supported by the predetermined data type and modifying the information through that interface. The incremental semantic interfaces include operation interfaces, tracking interfaces, and exchange interfaces.
In this embodiment, the operation interfaces may include, but are not limited to: X.partitions_mapping(flag), X.pipeline_mapping(pipe_indexes, flag), X.pipeline_substitute(pipe_index, pipe), X.rollback(history_number), and X.event(). The tracking interfaces may include, but are not limited to: X.linkage(indexes), X.diff(Y), X.merge(Y), and X.freeze(indexes, Y). The exchange interfaces may include, but are not limited to: X.export() and X.import(). A usage sketch follows.
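In this self-contained sketch, the method names come from the embodiment, but the semantics shown (a history stack for rollback, a step-wise diff) are inferred from the names and are assumptions of this sketch, not defined behavior.

```python
# Hypothetical behavior for a few incremental semantic interfaces on the
# predetermined data type; everything beyond the method names is illustrative.

class FLData:
    def __init__(self, pipeline):
        self.pipeline = list(pipeline)
        self.history = [list(pipeline)]          # snapshots for rollback

    # --- operation interfaces ---
    def pipeline_substitute(self, pipe_index, pipe):
        self.pipeline[pipe_index] = pipe
        self.history.append(list(self.pipeline))

    def rollback(self, history_number):
        self.pipeline = list(self.history[history_number])

    # --- tracking interfaces ---
    def diff(self, other):
        return [(i, a, b) for i, (a, b) in
                enumerate(zip(self.pipeline, other.pipeline)) if a != b]

    # --- exchange interfaces ---
    def export(self):
        return list(self.pipeline)

X = FLData(["sql_extract", "PF1", "PF2"])
Y = FLData(X.export())
X.pipeline_substitute(2, "PF2_v2")    # swap the feature-expansion step
print(X.diff(Y))                      # [(2, 'PF2_v2', 'PF2')]
X.rollback(0)                         # back to the original pipeline
```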
Step 703: the platform controller maps each kind of information in the federated learning task, before and after modification, into a state diagram, computes the pre- and post-modification difference of each item in the state diagram, generates updated configuration information according to the calculation result, and sends the updated configuration information to the parameter aggregation cluster.
A state diagram may be a state list or a dictionary; both contain multiple items, and the content of each item is modifiable.
The platform controller maps each kind of information in the federated learning task, before and after modification, into a state diagram and computes the pre- and post-modification difference of each item in the state diagram. Specifically, the platform controller compares, item by item, the state diagrams that the same kind of information maps to before and after modification, to obtain the change in the specific content. For example, the model metric before modification is a learning rate of 0.05, and after modification it is a learning rate of 0.01.
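The item-by-item difference can be sketched as follows, representing each state diagram as a dictionary (the embodiment also allows a state list); the function name and return shape are illustrative.

```python
# Minimal sketch of the item-by-item state-diagram diff.

def diff_states(before: dict, after: dict) -> dict:
    """Return {item: (old, new)} for every item whose content changed."""
    changes = {}
    for key in before.keys() | after.keys():
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes

before = {"learning_rate": 0.05, "batch_size": 64}
after = {"learning_rate": 0.01, "batch_size": 64}
print(diff_states(before, after))  # {'learning_rate': (0.05, 0.01)}
```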
Then, according to the session token, the platform controller resolves the state in FL_Session, separates the affected underlying state and cluster configuration from the unaffected ones, regenerates updated configuration information for the affected parts according to preset association rules, and sends the generated updated configuration information to the parameter aggregation cluster.
Step 704: the parameter aggregation cluster modifies the cluster configuration and/or the underlying network configuration according to the updated configuration information.
In this embodiment, the platform controller may further generate an information difference according to the calculation result and send it to the task developer, who displays it; as shown in FIG. 3, the information difference corresponds to an area in the visual view.
Because the federated learning task comprises three kinds of information (parameter and metric information, model structure information, and data preparation information), the task developer can modify any of them and send the modified task together with the session token to the platform controller; the platform controller then generates updated configuration information and sends it to the parameter aggregation cluster, which modifies the cluster configuration and/or the underlying network configuration accordingly. The task developer can therefore make fine-grained adjustments (covering data sets, model structure, parameters, and metrics) in a lightweight, dynamic, on-demand manner, improving the usability of the system.
Because the task developer invokes an incremental semantic interface supported by the predetermined data type and modifies the information through it, federated learning tasks are built modularly; the task state can be updated incrementally and automatically, and that update is mapped to an incremental update of the underlying configuration, making task adjustment more agile and efficient.
As shown in FIG. 1, the training system may include a task developer 110, a federated learning system 120, and participants 130, wherein the federated learning system includes a parameter aggregation cluster 121, a platform controller 122, and a development environment 123;
the task developer 110 is configured to generate a federated learning task in the development environment 123 and send the federated learning task to the platform controller 122, the federated learning task comprising parameter and metric information, model structure information, data preparation information, and a predetermined data type for the model to be trained, wherein the parameter and metric information indicates the model parameters and training metrics of the model, the model structure information indicates the model structure of the model, the data preparation information indicates a declaration of the model's training data so that each distributed data source prepares training data according to the processing flow corresponding to the declaration, and the predetermined data type indicates a logical data abstraction provided by the federated learning system so that the distributed training data is abstracted into one complete data set;
the platform controller 122 is configured to send the federated learning task to the participants 130, generate configuration information according to the federated learning task, and send the configuration information to the parameter aggregation cluster 121;
the participants 130 are configured to perform model training according to the federated learning task and send the intermediate parameters obtained in each training round to the parameter aggregation cluster 121;
the parameter aggregation cluster 121 is configured to perform cluster configuration and underlying network configuration according to the configuration information, aggregate the intermediate parameters in the trusted execution environment according to the cluster configuration and the underlying network configuration, and send the aggregated intermediate parameters to the participants 130 for further training;
and the parameter aggregation cluster 121 is further configured to, after aggregating the intermediate parameters of the last training round, send the aggregated intermediate parameters to the task developer 110 as the model parameters.
In an optional embodiment, the task developer 110 is further configured to modify at least one kind of information in the federated learning task and send the modified federated learning task and the session token to the platform controller 122, the at least one kind of information being at least one of the parameter and metric information, the model structure information, and the data preparation information;
the platform controller 122 is further configured to map each kind of information in the federated learning task, before and after modification, into a state diagram, compute the pre- and post-modification difference of each item in the state diagram, generate updated configuration information according to the calculation result, and send the updated configuration information to the parameter aggregation cluster 121;
and the parameter aggregation cluster 121 is further configured to modify the cluster configuration and/or the underlying network configuration according to the updated configuration information.
In an optional embodiment, the task developer 110 is further configured to invoke an incremental semantic interface supported by the predetermined data type and modify at least one kind of information in the federated learning task through the incremental semantic interface.
In an optional embodiment, the incremental semantic interfaces include operation interfaces, tracking interfaces, and exchange interfaces.
In an optional embodiment, the platform controller 122 is further configured to generate an information difference according to the calculation result and send the information difference to the task developer 110;
and the task developer 110 is further configured to display the information difference.
In an optional embodiment, when the intermediate parameters are encrypted and divided into fixed-size parameter fragments, the parameter aggregation cluster 121 is further configured to receive the encrypted parameter fragments sent by each participant 130 and, in the trusted execution environment, route the encrypted parameter fragments to the corresponding input queues according to the underlying network configuration;
the parameter aggregation cluster 121 is further configured to extract the parameter fragments from the input queues, decrypt them, aggregate the decrypted parameter fragments according to the cluster configuration, and encrypt the aggregated parameter fragments before sending them to the output queue.
In an optional embodiment, the parameter aggregation cluster 121 includes a plurality of servers and, in the trusted execution environment, is further configured to route the encrypted parameter fragments to different servers according to a first-level route in the underlying network configuration;
the parameter aggregation cluster 121 is further configured to route the encrypted parameter fragments to different input queues in each server's memory according to a second-level route in the underlying network configuration.
In an optional embodiment, the parameter aggregation cluster 121 is further configured to obtain the participant information and the participant training weights in the cluster configuration and aggregate the decrypted parameter fragments according to the participant information and the participant training weights.
In an optional embodiment, the parameter aggregation cluster 121 is further configured to aggregate the decrypted parameter fragments according to the participant information and the participant training weights, using SIMD vector dot-product and addition operations.
To sum up, in the training system provided by this embodiment, the parameter aggregation cluster contains a trusted execution environment, so aggregating the intermediate parameters inside it is secure and the aggregation process need not be hardened with additional security mechanisms; the losses in parameter precision and parameter aggregation efficiency caused by such mechanisms are avoided, and better parameter aggregation efficiency can be provided while security and accuracy are guaranteed.
Because the federated learning task comprises three kinds of information (parameter and metric information, model structure information, and data preparation information), the task developer can modify any of them and send the modified task together with the session token to the platform controller; the platform controller then generates updated configuration information and sends it to the parameter aggregation cluster, which modifies the cluster configuration and/or the underlying network configuration accordingly. The task developer can therefore make fine-grained adjustments (covering data sets, model structure, parameters, and metrics) in a lightweight, dynamic, on-demand manner, improving the usability of the system.
Because the task developer invokes an incremental semantic interface supported by the predetermined data type and modifies the information through it, federated learning tasks are built modularly; the task state can be updated incrementally and automatically, and that update is mapped to an incremental update of the underlying configuration, making task adjustment more agile and efficient.
Because the parameter aggregation cluster obtains the participant information and the participant training weights in the cluster configuration and aggregates the decrypted parameter fragments accordingly, participant information is incorporated into the aggregation process, which reduces the cost of building a cross-domain federated learning computing environment and lowers the difficulty of adjustment.
Those skilled in the art will understand that all or part of the steps of the above embodiments may be implemented in hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disk, or the like.
The above description is not intended to limit the embodiments of the present application; any modification, equivalent substitution, or improvement made within the spirit and principles of the embodiments of the present application shall be included in the protection scope of the embodiments of the present application.

Claims (10)

1. A cross-domain federated learning method based on a trusted execution environment is characterized in that the method is used in a training system comprising a task developer, a federated learning system and participants, wherein the federated learning system comprises a parameter aggregation cluster, a platform controller and a development environment, and the method comprises the following steps:
the task developer generates a federated learning task in the development environment and sends the federated learning task to the platform controller, wherein the federated learning task comprises parameter and index information of a model to be trained, model structure information, data preparation information and a preset data type, the parameter and index information is used for indicating model parameters and training indexes of the model, the model structure information is used for indicating a model structure of the model, the data preparation information is used for indicating a statement of training data of the model so that a distributed data source can prepare the training data according to a processing flow corresponding to the statement, and the preset data type is used for indicating a logical data abstraction provided by the federated learning system so that the distributed training data can be abstracted into a complete data set;
the platform controller sends the federated learning task to the participants, generates configuration information according to the federated learning task, and sends the configuration information to the parameter aggregation cluster;
the participants carry out model training according to the federated learning task and send the intermediate parameters obtained in each round of training to the parameter aggregation cluster;
the parameter aggregation cluster performs cluster configuration and underlying network configuration according to the configuration information, aggregates the intermediate parameters in a trusted execution environment according to the cluster configuration and the underlying network configuration, and sends the aggregated intermediate parameters back to the participants for continued training;
and after aggregating the intermediate parameters of the last round of training, the parameter aggregation cluster sends the aggregated intermediate parameters as model parameters to the task developer.
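A condensed, non-normative Python sketch of the flow claimed above follows; every object and method name in it (generate_task, generate_config, tee_aggregate, and so on) is a placeholder assumption, and the loop is written sequentially where the real system is distributed.

```python
# Non-normative sketch of the claimed training flow; all names below are
# placeholder assumptions standing in for the corresponding claimed steps.
def run_federated_training(developer, controller, cluster, participants, rounds):
    task = developer.generate_task()                 # params/indexes, model
    controller.receive(task)                         # structure, data prep,
    for p in participants:                           # preset data type
        p.receive(task)
    cluster.configure(controller.generate_config(task))  # cluster + network

    model = None
    for _ in range(rounds):
        updates = [p.train_one_round(model) for p in participants]
        model = cluster.tee_aggregate(updates)       # runs inside the TEE
    developer.receive_final(model)                   # after the last round the
                                                     # aggregate is the model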
2. The method of claim 1, further comprising:
the platform controller generates a session token according to the federated learning task and sends the session token to the task developer;
the task developer modifies at least one kind of information in the federated learning task and sends the modified federated learning task and the session token to the platform controller, wherein the at least one kind of information is at least one of the parameter and index information, the model structure information and the data preparation information;
the platform controller maps each kind of information in the federated learning task before and after the modification into a state diagram, performs a differential calculation on each content of the state diagram before and after the modification, generates update configuration information according to the calculation result, and sends the update configuration information to the parameter aggregation cluster;
and the parameter aggregation cluster modifies the cluster configuration and/or the underlying network configuration according to the update configuration information.
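Assuming, purely for illustration, that the state diagram of each kind of information can be flattened into a key-to-value mapping (the application does not fix a concrete representation), the before/after differential calculation might be sketched as follows.

```python
# Sketch of the before/after differential calculation of claim 2, under the
# assumption that the state diagram flattens to a key -> value mapping.
def diff_state(before: dict, after: dict) -> dict:
    changed = {k: after[k] for k in after if before.get(k) != after[k]}
    removed = {k: None for k in before if k not in after}
    return {**changed, **removed}  # becomes the update configuration payload


before = {"model.layers": 4, "data.hospital_a": "v1"}
after = {"model.layers": 6, "data.hospital_a": "v1", "index.acc": 0.95}
print(diff_state(before, after))  # {'model.layers': 6, 'index.acc': 0.95}
```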
3. The method of claim 2, wherein the task developer modifying at least one kind of information in the federated learning task comprises:
and the task developer calls an incremental semantic interface supported by the preset data type and modifies at least one kind of information in the federated learning task by using the incremental semantic interface.
4. The method of claim 3, wherein the incremental semantic interface comprises an operation-class interface, a trace-class interface and a switch-class interface.
5. The method of claim 2, further comprising:
the platform controller generates an information difference according to the calculation result and sends the information difference to the task developer;
and the task developer displays the information difference.
6. The method of claim 1, wherein when the intermediate parameters are encrypted and partitioned into fixed-size parameter slices, the aggregating the intermediate parameters in a trusted execution environment according to the cluster configuration and the underlying network configuration comprises:
the parameter aggregation cluster receives the encrypted parameter slices sent by each participant, and routes the encrypted parameter slices to the corresponding input queues in the trusted execution environment according to the underlying network configuration;
and the parameter aggregation cluster extracts the parameter slices from the input queues, decrypts them, aggregates the decrypted parameter slices according to the cluster configuration, and encrypts the aggregated parameter slices before sending them to an output queue.
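The following non-normative Python sketch illustrates this extract-decrypt-aggregate-re-encrypt pipeline. Fernet (from the cryptography package) stands in for the cipher shared with the participants, which this application does not specify; the single shared key and the queue layout are likewise assumptions of the sketch.

```python
# Sketch of the claimed slice pipeline: route encrypted slices to an input
# queue, decrypt, aggregate, re-encrypt to an output queue. The cipher and
# queue layout are stand-in assumptions.
import queue

import numpy as np
from cryptography.fernet import Fernet

key = Fernet.generate_key()
cipher = Fernet(key)                         # assumed shared with participants
input_q, output_q = queue.Queue(), queue.Queue()

# participants send encrypted fixed-size slices (two participants here)
for slc in (np.ones(4, dtype=np.float32), 3 * np.ones(4, dtype=np.float32)):
    input_q.put(cipher.encrypt(slc.tobytes()))

# inside the TEE: extract, decrypt, aggregate, re-encrypt
decrypted = [np.frombuffer(cipher.decrypt(input_q.get()), dtype=np.float32)
             for _ in range(2)]
aggregated = np.mean(decrypted, axis=0)      # unweighted mean for brevity
output_q.put(cipher.encrypt(aggregated.tobytes()))

print(np.frombuffer(cipher.decrypt(output_q.get()), dtype=np.float32))
# -> [2. 2. 2. 2.]
```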
7. The method according to claim 6, wherein the parameter aggregation cluster comprises a plurality of servers, and the routing, in the trusted execution environment, of the encrypted parameter slices to the corresponding input queues according to the underlying network configuration comprises:
in the trusted execution environment, the parameter aggregation cluster routes the encrypted parameter slices to different servers according to a first-level route in the underlying network configuration;
and the parameter aggregation cluster routes the encrypted parameter slices to different input queues in the memory of each server according to a second-level route in the underlying network configuration.
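A minimal sketch of the claimed two-level routing, assuming a hash-based scheme for both levels (the application does not specify how the first-level and second-level routes are computed):

```python
# Sketch of two-level routing: a first-level route picks the server, a
# second-level route picks an in-memory input queue on that server. The
# hash-based scheme is an illustrative assumption.
import queue

NUM_SERVERS, QUEUES_PER_SERVER = 4, 8
servers = [[queue.Queue() for _ in range(QUEUES_PER_SERVER)]
           for _ in range(NUM_SERVERS)]


def route(participant_id: str, slice_index: int, encrypted_slice: bytes) -> None:
    server = hash(participant_id) % NUM_SERVERS                   # first level
    q = hash((participant_id, slice_index)) % QUEUES_PER_SERVER   # second level
    servers[server][q].put(encrypted_slice)


route("hospital_a", 0, b"...encrypted slice bytes...")
```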
8. The method of claim 6, wherein the aggregating the decrypted parameter slices according to the cluster configuration comprises:
and the parameter aggregation cluster acquires the participant information and the participant training weight in the cluster configuration, and aggregates the decrypted parameter slices according to the participant information and the participant training weight.
9. The method of claim 8, wherein the aggregating the decrypted parameter slices according to the participant information and the participant training weights comprises:
and the parameter aggregation cluster aggregates the decrypted parameter slices according to the participant information and the participant training weights, using the vector dot-multiplication and addition operations of single instruction multiple data (SIMD) streams.
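As an illustrative stand-in for SIMD vector dot-multiplication and addition, the sketch below uses NumPy's vectorized operations, which compile down to SIMD instructions on most platforms; the weighted-mean formula is an assumption of the sketch.

```python
# Sketch of claim 9's SIMD-style aggregation: NumPy vectorized multiply-add
# stands in for explicit single-instruction-multiple-data intrinsics.
import numpy as np


def simd_weighted_aggregate(slices: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """slices: (num_participants, slice_len); weights: (num_participants,).
    One vector multiply plus additions across the participant axis."""
    return (weights[:, None] * slices).sum(axis=0) / weights.sum()


slices = np.array([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
weights = np.array([1.0, 3.0])
print(simd_weighted_aggregate(slices, weights))  # -> [2.5 3.5 4.5]
```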
10. A training system, characterized by comprising a task developer, a federated learning system and participants, wherein the federated learning system comprises a parameter aggregation cluster, a platform controller and a development environment;
the task developer is used for generating a federated learning task in the development environment and sending the federated learning task to the platform controller, wherein the federated learning task comprises parameter and index information of a model to be trained, model structure information, data preparation information and a preset data type, the parameter and index information is used for indicating model parameters and training indexes of the model, the model structure information is used for indicating a model structure of the model, the data preparation information is used for indicating a statement of training data of the model so that a distributed data source can prepare the training data according to a processing flow corresponding to the statement, and the preset data type is used for indicating a logical data abstraction provided by the federated learning system so that the distributed training data can be abstracted into a complete data set;
the platform controller is used for sending the federated learning task to the participants, generating configuration information according to the federated learning task, and sending the configuration information to the parameter aggregation cluster;
the participants are used for carrying out model training according to the federated learning task and sending the intermediate parameters obtained in each round of training to the parameter aggregation cluster;
the parameter aggregation cluster is used for performing cluster configuration and underlying network configuration according to the configuration information, aggregating the intermediate parameters in a trusted execution environment according to the cluster configuration and the underlying network configuration, and sending the aggregated intermediate parameters back to the participants for continued training;
and after the intermediate parameters of the last round of training are aggregated, the parameter aggregation cluster is also used for sending the aggregated intermediate parameters as model parameters to the task developer.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210354376.7A CN114492846B (en) 2022-04-06 2022-04-06 Cross-domain federated learning method and system based on trusted execution environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210354376.7A CN114492846B (en) 2022-04-06 2022-04-06 Cross-domain federated learning method and system based on trusted execution environment

Publications (2)

Publication Number Publication Date
CN114492846A 2022-05-13
CN114492846B CN114492846B (en) 2022-08-26

Family

ID=81487273

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210354376.7A Active CN114492846B (en) 2022-04-06 2022-04-06 Cross-domain federated learning method and system based on trusted execution environment

Country Status (1)

Country Link
CN (1) CN114492846B (en)

Patent Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106982206A (en) * 2017-03-10 2017-07-25 中国科学院信息工程研究所 A kind of malice scanning defence method adaptively changed based on IP address and system
CN112732273A (en) * 2017-09-19 2021-04-30 华为技术有限公司 Application deployment method, device and system
CN110601913A (en) * 2018-06-13 2019-12-20 丛林网络公司 Method and system for measuring and monitoring performance of virtual infrastructure underlying network
CN108957493A (en) * 2018-07-09 2018-12-07 北京市燃气集团有限责任公司 A kind of method and system preventing Beidou Differential positioning data leak
CN110730156A (en) * 2018-07-17 2020-01-24 国际商业机器公司 Distributed machine learning for anomaly detection
US11281660B1 (en) * 2018-08-31 2022-03-22 Vitalyx, Inc. Multi-parallel processing of n-dimensional orthogonal splits in transactions and data for a distributed transaction system
CN109445949A (en) * 2018-12-07 2019-03-08 武汉轻工大学 A kind of data collection system and collecting method
CN110674528A (en) * 2019-09-20 2020-01-10 深圳前海微众银行股份有限公司 Federal learning privacy data processing method, device, system and storage medium
CN110633805A (en) * 2019-09-26 2019-12-31 深圳前海微众银行股份有限公司 Longitudinal federated learning system optimization method, device, equipment and readable storage medium
CN113742768A (en) * 2020-05-30 2021-12-03 阿尔法云计算(深圳)有限公司 Privacy protection method, device and system for online application
CN111985649A (en) * 2020-06-22 2020-11-24 华为技术有限公司 Data processing method and device based on federal learning
CN111898137A (en) * 2020-06-30 2020-11-06 深圳致星科技有限公司 Private data processing method, equipment and system for federated learning
CN111782254A (en) * 2020-07-02 2020-10-16 百度在线网络技术(北京)有限公司 Method, device, equipment and storage medium for upgrading object
CN111917587A (en) * 2020-08-07 2020-11-10 中国工商银行股份有限公司 Method for network service management by using service system and service system
US20220051762A1 (en) * 2020-08-12 2022-02-17 Sharecare AI, Inc. Systems and Methods for Virtual Clinical Trials
CN112329940A (en) * 2020-11-02 2021-02-05 北京邮电大学 Personalized model training method and system combining federal learning and user portrait
CN112580821A (en) * 2020-12-10 2021-03-30 深圳前海微众银行股份有限公司 Method, device and equipment for federated learning and storage medium
US11017322B1 (en) * 2021-01-28 2021-05-25 Alipay Labs (singapore) Pte. Ltd. Method and system for federated learning
CN113112029A (en) * 2021-04-22 2021-07-13 中国科学院计算技术研究所 Federal learning system and method applied to heterogeneous computing equipment
CN113033828A (en) * 2021-04-29 2021-06-25 江苏超流信息技术有限公司 Model training method, using method, system, credible node and equipment
CN113344221A (en) * 2021-05-10 2021-09-03 上海大学 Federal learning method and system based on neural network architecture search
CN113469373A (en) * 2021-08-17 2021-10-01 北京神州新桥科技有限公司 Model training method, system, equipment and storage medium based on federal learning
CN114020757A (en) * 2021-09-14 2022-02-08 天聚地合(苏州)数据股份有限公司 Scenic spot passenger dishonest list updating method based on block chain and related device
CN113837761A (en) * 2021-11-26 2021-12-24 北京理工大学 Block chain and trusted execution environment based federated learning method and system

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
FAN MO et al.: "PPFL: Privacy-preserving Federated Learning with Trusted Execution Environments", arXiv:2104.14380v2 *
LIU Guozhi et al.: "Research and Implementation of an Abnormal Traffic Monitoring Method Based on Federated Learning", China Masters' Theses Full-text Database, Information Science and Technology *
LI Zerui: "Privacy Protection Technology for Chinese Input Based on Federated Learning", China Masters' Theses Full-text Database, Information Science and Technology *
MAO Anqi et al.: "Scalability Optimization of a Centralized Cluster Resource Scheduling Framework", Journal of Computer Research and Development *
TIAN Xin et al.: "Research on Key Technologies and Applications of Supervisable Privacy-preserving Computation", Proceedings of the 25th Annual Conference on New Network Technologies and Applications, Network Application Branch of the China Computer Users Association, 2021 *
WEI Lifei et al.: "Security Issues and Privacy Protection in Machine Learning", Journal of Computer Research and Development *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115249074A (en) * 2022-07-28 2022-10-28 上海光之树科技有限公司 Distributed federal learning method based on Spark cluster and Ring-AllReduce architecture
CN115249074B (en) * 2022-07-28 2023-04-14 上海光之树科技有限公司 Distributed federal learning method based on Spark cluster and Ring-AllReduce architecture

Also Published As

Publication number Publication date
CN114492846B (en) 2022-08-26

Similar Documents

Publication Publication Date Title
US20230067777A1 (en) Distributed data nodes for flexible data mesh architectures
Agliamzanov et al. Hydrology@Home: a distributed volunteer computing framework for hydrological research and applications
CN104580349B (en) Secure cloud administration agent
CN106027593B (en) For dynamically maintaining the method and system of data structure
WO2022127474A1 (en) Providing explainable machine learning model results using distributed ledgers
CN115409198A (en) Distributed prediction method and system thereof
Blas et al. Routing structure over discrete event system specification: a DEVS adaptation to develop smart routing in simulation models
CN109726004A (en) A kind of data processing method and device
CN114492846B (en) Cross-domain federated learning method and system based on trusted execution environment
US20220070189A1 (en) Tracking of sensitive data
Shafiee et al. Parallel evolutionary algorithm for designing water distribution networks to minimize background leakage
US11194838B2 (en) Generating a data partitioning strategy for secure and efficient query processing
Xing et al. Jupiter: a modern federated learning platform for regional medical care
CN110347386A (en) A kind of method, apparatus and electronic equipment of the data visualization analysis based on SQL code editor
Nguyen et al. Blockchain empowered federated learning with edge computing for digital twin systems in urban air mobility
US11526791B2 (en) Methods and systems for diverse instance generation in artificial intelligence planning
WO2023124677A1 (en) Data processing method and computing platform
Mikov et al. Some problems of the simulation model efficiency and flexibility
Xi et al. Information flow control on encrypted data for service composition among multiple clouds
US20230394212A1 (en) Equivalence checking of synthesized logic designs using generated synthesis history
Bouheroum et al. Towards a formal approach based on bigraphs for fog security: Case of oil and gas refinery plant
WO2022119929A1 (en) Systems and methods for administrating a federated learning network
Wolter et al. A Time Model for Time‐Varying Visualization
Ganguly et al. Runtime verification of partially-synchronous distributed system
US11163732B2 (en) Linking, deploying, and executing distributed analytics with distributed datasets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant