CN114004363A - Method, device and system for jointly updating model

Method, device and system for jointly updating model

Info

Publication number: CN114004363A
Authority: CN (China)
Prior art keywords: model, synchronized, submodel, parameter, training
Legal status: Pending
Application number: CN202111256451.8A
Other languages: Chinese (zh)
Inventors: 郑龙飞, 陈超超, 王力, 张本宇
Current Assignee: Alipay Hangzhou Information Technology Co Ltd
Original Assignee: Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202111256451.8A
Publication of CN114004363A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning


Abstract

The embodiments of the present specification provide a method, a device, and a system for jointly updating a model. Based on the composite segmentation of data in the joint model-updating process, the data of the training members are partitioned to form a plurality of horizontally segmented subsystems, where a single subsystem may contain training members whose data are vertically segmented. A single subsystem with vertically segmented data iterates over the training samples distributed across its training members in order to update the parameters to be synchronized, and the subsystems synchronize data with one another in synchronization periods triggered by a synchronization condition. The method fully considers the data composition of each training member, provides a solution for jointly updating a model over complex data structures, and helps to extend the applicability of federated learning.

Description

Method, device and system for jointly updating model
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method, an apparatus, and a system for jointly updating a model by multiple data parties.
Background
With the development of artificial intelligence technology, machine learning models have gradually been applied in fields such as risk assessment, speech recognition, face recognition, and natural language processing. Better model performance generally requires more training data. In fields such as medicine and finance, different enterprises or institutions hold different data samples; if these data are jointly used for training through a distributed machine learning algorithm, the model precision can be greatly improved, bringing considerable economic benefits to the enterprises.
In conventional techniques, federated learning is typically used to jointly train better-performing models with data from multiple data parties. Depending on how the data is distributed among the data parties, federated learning can be divided into two broad categories: horizontally segmented data and vertically segmented data. In a horizontal segmentation scenario, the data parties share the same feature space but hold different sample spaces; in a vertical segmentation scenario, the data parties share the same sample space but hold different feature spaces. However, some multi-party joint machine learning cannot simply be regarded as horizontal or vertical segmentation. For example, when federated learning is performed between a financial platform and multiple banks, the relationship between the financial platform and the banks may be a vertical segmentation scenario, while the relationship among the banks themselves is better described as a horizontal segmentation scenario. In other words, more complex segmentation scenarios exist in practice. How to realize joint training of the related business models in such complex segmentation scenarios is a technical problem of significant importance in the field of federated learning.
Disclosure of Invention
One or more embodiments of the present specification describe a method and apparatus for jointly updating a model to address one or more of the problems identified in the background.
According to a first aspect, a system for jointly updating a model is provided. The system comprises a federal service party and a plurality of subsystems that jointly update the model W. A single subsystem i among the plurality of subsystems comprises, among the training members, a first member C_i1 and a second member C_i2; the sample data held by the first member C_i1 and the second member C_i2 form a vertical segmentation, while the sample data held by the respective subsystems form a horizontal segmentation. The single subsystem i corresponds to a local model W_i with the same structure as the model W, and the local model W_i comprises a first submodel W_ci1 arranged on the first member C_i1 and a second submodel W_ci2 arranged on the second member C_i2. The single subsystem i is configured to perform, on the local model W_i, joint training in a vertical segmentation mode using the training samples vertically segmented over the first member C_i1 and the second member C_i2; to provide the federal service party, when a synchronization condition is satisfied, with updated values of the parameters to be synchronized that correspond one-to-one to the parameters to be determined of the local model W_i; and to synchronize the local parameters to be synchronized according to the synchronization values fed back by the federal service party, so as to adjust the corresponding parameters to be determined. The federal service party is configured to perform secure synchronization on the updated values of the parameters to be synchronized received from the subsystems, and to feed back the synchronized values.
According to a second aspect, a method of jointly updating a model is provided. The method is applicable to a system for jointly updating a model W, the system comprising a federal service party and a plurality of subsystems. A single subsystem i among the plurality of subsystems comprises, among the training members, a first member C_i1 and a second member C_i2; the sample data held by the first member C_i1 and the second member C_i2 form a vertical segmentation, and the sample data held by the respective subsystems form a horizontal segmentation. The single subsystem i corresponds to a local model W_i with the same structure as the model W, and the local model W_i comprises a first submodel W_ci1 arranged on the first member C_i1 and a second submodel W_ci2 arranged on the second member C_i2. The method comprises: each subsystem performs, on its corresponding local model, joint training in a vertical segmentation mode using the training samples vertically segmented over its first member and second member, and each training member, when a synchronization condition is satisfied, provides the federal service party with updated values of the parameters to be synchronized that correspond one-to-one to the parameters to be determined in its submodel; the federal service party performs secure synchronization on the updated values of the parameters to be synchronized received from the plurality of subsystems and feeds back the synchronization value of each parameter to be synchronized; and each training member in each subsystem receives the synchronization values of its local parameters to be synchronized, so as to update its local parameters to be determined.
In one embodiment, the single subsystem i further comprises a sub-server S_i, and the joint training performed by the single subsystem i on the local model W_i comprises: for several samples of the current round, the first member C_i1 and the second member C_i2 respectively process their corresponding local sample data through the first submodel W_ci1 and the second submodel W_ci2, obtaining a corresponding first intermediate result R_it1 and second intermediate result R_it2, which are sent to the sub-server S_i; the sub-server S_i processes the first intermediate result R_it1 and the second intermediate result R_it2 based on a third submodel W_si and feeds back to the first member C_i1 and the second member C_i2, respectively, the gradient of the first intermediate result R_it1 and the gradient of the second intermediate result R_it2; the first member C_i1 and the second member C_i2 respectively use the gradients of the first intermediate result R_it1 and the second intermediate result R_it2 to determine the gradients of the parameters to be determined in the first submodel W_ci1 and the second submodel W_ci2, and thereby determine the updated values of the parameters to be synchronized of the first submodel W_ci1 and the second submodel W_ci2.
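As an illustration of this round structure, the following sketch implements one such round in plain numpy under simplifying assumptions: each submodel is a single linear layer, the loss is a squared error computed directly on the sub-server, and the intermediate results are exchanged without encryption. The class and variable names (MemberModel, SubServer, etc.) are illustrative and not part of the claimed system.

```python
import numpy as np

rng = np.random.default_rng(0)

class MemberModel:
    """First/second submodel W_ci1 / W_ci2 held by a training member."""
    def __init__(self, in_dim, out_dim):
        self.w = rng.normal(scale=0.1, size=(in_dim, out_dim))  # parameters to be determined

    def forward(self, x):
        self.x = x                  # cache the local batch for the backward pass
        return x @ self.w           # intermediate result R_it*

    def backward(self, grad_r, lr=0.1):
        grad_w = self.x.T @ grad_r  # gradient of the local parameters to be determined
        self.w -= lr * grad_w       # local update; grad_w or w may later be synchronized
        return grad_w

class SubServer:
    """Third submodel W_si held by the sub-server S_i."""
    def __init__(self, dim1, dim2):
        self.w = rng.normal(scale=0.1, size=(dim1 + dim2, 1))

    def round(self, r1, r2, labels, lr=0.1):
        r = np.concatenate([r1, r2], axis=1)        # combine both intermediate results
        pred = r @ self.w                            # prediction result
        loss = np.mean((pred - labels) ** 2)
        grad_pred = 2 * (pred - labels) / len(labels)
        grad_r = grad_pred @ self.w.T                # gradients of R_it1 and R_it2
        self.w -= lr * (r.T @ grad_pred)             # update W_si's own pending parameters
        return loss, grad_r[:, :r1.shape[1]], grad_r[:, r1.shape[1]:]

# One round: members C_i1 and C_i2 hold vertically segmented features of the same samples.
x1, x2 = rng.normal(size=(8, 3)), rng.normal(size=(8, 2))
y = rng.normal(size=(8, 1))
m1, m2, s_i = MemberModel(3, 4), MemberModel(2, 4), SubServer(4, 4)

r1, r2 = m1.forward(x1), m2.forward(x2)   # members send intermediate results to S_i
loss, g1, g2 = s_i.round(r1, r2, y)       # S_i computes loss and gradients of the results
m1.backward(g1); m2.backward(g2)          # members update their submodels locally
print("round loss:", loss)
```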
In one embodiment, the label holder of the samples of the current round in a single subsystem i is the first member C_i1 or the second member C_i2. In this case, the sub-server S_i processing the first intermediate result R_it1 and the second intermediate result R_it2 based on the third submodel W_si and feeding back their gradients to the first member C_i1 and the second member C_i2 further comprises: the sub-server S_i processes the first intermediate result R_it1 and the second intermediate result R_it2 based on the third submodel W_si to obtain a prediction result, and sends the prediction result to the label holder; the label holder determines the corresponding model loss by comparing the label data of the several samples of the current round with the prediction result, and feeds the model loss back to the sub-server S_i; and the sub-server S_i determines, from the model loss, the gradients for the first intermediate result R_it1 and the second intermediate result R_it2.
In one embodiment, in the case that the third submodel W_si contains parameters to be determined, the sub-server S_i also determines the gradients of the model loss with respect to the parameters to be determined contained in the third submodel W_si.
In one embodiment, the label holder of the several samples of the current round in a single subsystem i is the first member C_i1 or the second member C_i2, and the label holder is provided with a fourth submodel W_ci3. In this case, the sub-server S_i processing the first intermediate result R_it1 and the second intermediate result R_it2 based on the third submodel W_si and feeding back their gradients to the first member C_i1 and the second member C_i2 further comprises: the sub-server S_i processes the first intermediate result R_it1 and the second intermediate result R_it2 based on the third submodel W_si to obtain a third intermediate result R_it3, and sends the third intermediate result R_it3 to the label holder; the label holder processes the third intermediate result R_it3 through the fourth submodel W_ci3 to obtain the corresponding prediction result, and, based on a comparison of the label data of the several samples of the current round with the prediction result, determines the gradient of the model loss with respect to the third intermediate result R_it3 for the current round, which is fed back to the sub-server S_i; and the sub-server S_i determines, according to the gradient of the third intermediate result R_it3, the gradients for the first intermediate result R_it1 and the second intermediate result R_it2.
In one embodiment, the joint training performed by the subsystem i on the local model W_i comprises: the training members in the subsystem i perform multi-party secure computation, so that each training member can determine the gradient of the model loss with respect to its local parameters to be determined; and each training member determines the updated values of its parameters to be synchronized based on the gradients of the parameters to be determined in its corresponding submodel, where the first member C_i1 and the second member C_i2 respectively determine the updated values of the parameters to be synchronized of the first submodel W_ci1 and the second submodel W_ci2.
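For intuition about what each member ends up computing, the sketch below runs vertically partitioned logistic regression in plain numpy. The exchange of the partial logits is shown in the clear; in the described embodiment that exchange would be protected by multi-party secure computation (for example homomorphic encryption or secret sharing), which is omitted here, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Vertically segmented features of the same 6 samples; C_i1 also holds the labels.
x1, x2 = rng.normal(size=(6, 3)), rng.normal(size=(6, 2))   # C_i1's and C_i2's features
y = rng.integers(0, 2, size=(6, 1)).astype(float)
w1, w2 = np.zeros((3, 1)), np.zeros((2, 1))                 # parameters to be determined

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(50):
    # Each member computes its partial logit locally; in the real protocol these partial
    # results would be combined under encryption or secret sharing, not in the clear.
    p1, p2 = x1 @ w1, x2 @ w2
    z = p1 + p2
    grad_z = sigmoid(z) - y                 # gradient of the cross-entropy loss w.r.t. the logit
    # Each member turns grad_z into the gradient of its own pending parameters.
    g1, g2 = x1.T @ grad_z / len(y), x2.T @ grad_z / len(y)
    w1 -= 0.5 * g1
    w2 -= 0.5 * g2                          # g1, g2 (or w1, w2) are the values to be synchronized

pred = sigmoid(x1 @ w1 + x2 @ w2)
print("final training loss:", float(np.mean(-y * np.log(pred + 1e-9)
                                            - (1 - y) * np.log(1 - pred + 1e-9))))
```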
In one embodiment, the synchronization condition includes: each local model having been updated for a predetermined number of rounds, or a predetermined time period having elapsed.
In one embodiment, a single parameter to be synchronized is a single parameter to be determined, or the gradient corresponding to that single parameter to be determined.
In one embodiment, the federal service party performing secure synchronization on the updated values of the parameters to be synchronized from the plurality of subsystems comprises: the federal service party receives the parameters to be synchronized, encrypted in a predetermined encryption mode, sent by the respective training members; and the federal service party fuses the respective updated values of each parameter to be synchronized in at least one of the following ways: summation, weighted averaging, and taking the median, to obtain the corresponding synchronization value.
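A minimal sketch of the fusion step, assuming the updated values arrive as plaintext numpy arrays (in the embodiment they would arrive encrypted) and that weighted averaging uses per-subsystem sample counts; the function name and the weighting choice are assumptions for illustration.

```python
import numpy as np

def fuse(updates, sample_counts=None, mode="weighted_average"):
    """Fuse the updated values of one parameter to be synchronized from several subsystems."""
    updates = np.stack(updates)                          # shape: (n_subsystems, param_size)
    if mode == "sum":
        return updates.sum(axis=0)
    if mode == "weighted_average":
        w = np.asarray(sample_counts, dtype=float)
        return (updates * (w / w.sum())[:, None]).sum(axis=0)
    if mode == "median":
        return np.median(updates, axis=0)
    raise ValueError(mode)

# Updated values of the same parameter vector reported by three subsystems.
u1, u2, u3 = np.array([0.2, -0.1]), np.array([0.3, 0.0]), np.array([0.1, -0.2])
sync_value = fuse([u1, u2, u3], sample_counts=[100, 300, 600])
print("synchronization value:", sync_value)
```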
In one embodiment, the predetermined encryption mode comprises one of the following: adding perturbations that satisfy differential privacy; homomorphic encryption; and secret sharing.
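The sketch below illustrates, under simplified assumptions, two of the listed protections applied to an update vector before upload: clipping plus Gaussian noise in the style of differential privacy, and splitting the vector into additive secret shares. Real deployments would add privacy accounting, key management, and share distribution, all omitted here; the function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def dp_perturb(update, clip_norm=1.0, noise_scale=0.1):
    """Clip the update and add Gaussian noise before uploading (differential-privacy style)."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(scale=noise_scale, size=update.shape)

def secret_share(update, n_shares=3):
    """Split an update into additive shares; only the sum of all shares reveals the value."""
    shares = [rng.normal(size=update.shape) for _ in range(n_shares - 1)]
    shares.append(update - sum(shares))
    return shares

update = np.array([0.4, -0.3, 0.7])
print("perturbed:", dp_perturb(update))
shares = secret_share(update)
print("reconstructed:", sum(shares))   # equals the original update
```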
According to a third aspect, a method of jointly updating a model is provided. The method is applicable to a system for jointly updating a model W, the system comprising a federal service party and a plurality of subsystems. A single subsystem i among the plurality of subsystems comprises, among the training members, a first member C_i1 and a second member C_i2; the sample data held by the first member C_i1 and the second member C_i2 form a vertical segmentation, and the sample data held by the respective subsystems form a horizontal segmentation. The single subsystem i corresponds to a local model W_i with the same structure as the model W, and the local model W_i comprises a first submodel W_ci1 arranged on the first member C_i1 and a second submodel W_ci2 arranged on the second member C_i2. The method is performed by the federal service party and comprises: receiving, from each subsystem when a synchronization condition is satisfied, updated values of the parameters to be synchronized that correspond one-to-one to the parameters to be determined in the corresponding submodels, where the updated values of the parameters to be synchronized provided by a single subsystem i are determined by the subsystem i through joint training on the corresponding local model W_i in a vertical segmentation mode; and performing secure synchronization on the updated values of the parameters to be synchronized from the respective subsystems, and feeding back the synchronization value of each parameter to be synchronized, so that the corresponding training members or sub-servers can complete the update of the parameters to be determined of their local submodels.
According to a fourth aspect, a method of jointly updating a model is provided. The method is applicable to a system for jointly updating a model W, the system comprising a federal service party and a plurality of subsystems. A single subsystem i among the plurality of subsystems comprises, among the training members, a first member C_i1 and a second member C_i2; the sample data held by the first member C_i1 and the second member C_i2 form a vertical segmentation, and the sample data held by the respective subsystems form a horizontal segmentation. The single subsystem i corresponds to a local model W_i with the same structure as the model W, and the local model W_i comprises a first submodel W_ci1 arranged on the first member C_i1 and a second submodel W_ci2 arranged on the second member C_i2. The method is performed by the first member C_i1 and comprises: performing, together with the second member C_i2, joint training on the corresponding local model W_i using the vertically segmented training samples constructed from local data and the data of the second member C_i2, so as to obtain updated values of the parameters to be synchronized that correspond one-to-one to the parameters to be determined in the first submodel W_ci1; when a synchronization condition is satisfied, sending the updated values of the parameters to be synchronized that correspond one-to-one to the parameters to be determined in the first submodel W_ci1 to the federal service party, so that the federal service party can perform secure synchronization of the parameters to be synchronized based on the updated values received from the respective subsystems; and obtaining, from the federal service party, the securely synchronized synchronization values of the parameters to be synchronized of the first submodel W_ci1, so as to update the parameters to be determined in the first submodel W_ci1.
In one embodiment, the subsystem i further comprises a sub-server S_i, and the local model W_i of the single subsystem i further comprises a third submodel W_si arranged on the sub-server S_i. Performing joint training on the corresponding local model W_i using the vertically segmented training samples constructed from local data and the data of the second member C_i2 comprises: for several samples of the current round, processing the corresponding local sample data through the first submodel W_ci1 to obtain the corresponding first intermediate result R_it1 and sending it to the sub-server S_i, so that the sub-server S_i can process the first intermediate result R_it1 and a second intermediate result R_it2 based on the third submodel W_si and feed back the gradient of the first intermediate result R_it1, where the second intermediate result R_it2 is obtained by the second member C_i2 processing its corresponding local sample data through the second submodel W_ci2; and using the gradient of the first intermediate result R_it1 to determine the gradients of the parameters to be determined in the first submodel W_ci1, thereby determining the updated values of the parameters to be synchronized of the first submodel W_ci1.
In one embodiment, performing joint training on the corresponding local model W_i using the vertically segmented training samples constructed from local data and the data of the second member C_i2 comprises: performing multi-party secure computation with the other training members in the subsystem i to determine the gradient of the model loss with respect to the parameters to be determined of the first submodel W_ci1; and determining the corresponding updated values of the parameters to be synchronized based on the gradients of the parameters to be determined of the first submodel W_ci1.
According to a fifth aspect, an apparatus for jointly updating a model is provided. The apparatus is suitable for the federal service party in a system for jointly updating a model W, the system comprising the federal service party and a plurality of subsystems. A single subsystem i among the plurality of subsystems comprises, among the training members, a first member C_i1 and a second member C_i2; the sample data held by the first member C_i1 and the second member C_i2 form a vertical segmentation, and the sample data held by the respective subsystems form a horizontal segmentation. The single subsystem i corresponds to a local model W_i with the same structure as the model W, and the local model W_i comprises a first submodel W_ci1 arranged on the first member C_i1 and a second submodel W_ci2 arranged on the second member C_i2. The apparatus comprises:
an obtaining unit, configured to receive, from each subsystem when a synchronization condition is satisfied, updated values of the parameters to be synchronized that correspond one-to-one to the parameters to be determined in the corresponding local model, where the updated values of the parameters to be synchronized provided by a single subsystem i are determined by the subsystem i through joint training on the corresponding local model W_i in a vertical segmentation mode;
and a synchronization unit, configured to perform secure synchronization on the updated values of the parameters to be synchronized received from the plurality of subsystems and to feed back the synchronization value of each parameter to be synchronized, so that the corresponding training members or sub-servers can complete the update of the parameters to be determined of their local submodels.
According to a sixth aspect, an apparatus for jointly updating a model is provided. The apparatus is suitable for a system for jointly updating a model W, the system comprising a federal service party and a plurality of subsystems. A single subsystem i among the plurality of subsystems comprises, among the training members, a first member C_i1 and a second member C_i2; the sample data held by the first member C_i1 and the second member C_i2 form a vertical segmentation, and the sample data held by the respective subsystems form a horizontal segmentation. The single subsystem i corresponds to a local model W_i with the same structure as the model W, and the local model W_i comprises a first submodel W_ci1 arranged on the first member C_i1 and a second submodel W_ci2 arranged on the second member C_i2. The apparatus is arranged on the first member C_i1 and comprises:
a training unit, configured to perform, together with the second member C_i2, joint training on the corresponding local model W_i using vertically segmented training samples constructed from local data and the data of the second member C_i2, so as to obtain updated values of the parameters to be synchronized that correspond one-to-one to the parameters to be determined in the first submodel W_ci1;
a providing unit, configured to send, when a synchronization condition is satisfied, the updated values of the parameters to be synchronized that correspond one-to-one to the parameters to be determined in the first submodel W_ci1 to the federal service party, so that the federal service party can perform secure synchronization of the parameters to be synchronized based on the updated values received from the respective subsystems;
and a synchronization unit, configured to obtain, from the federal service party, the securely synchronized synchronization values of the parameters to be synchronized of the first submodel W_ci1, so as to update the parameters to be determined in the first submodel W_ci1.
According to a seventh aspect, a computer-readable storage medium is provided, on which a computer program is stored which, when executed in a computer, causes the computer to perform the methods of the third and fourth aspects.
According to an eighth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory stores executable code, and the processor implements the methods of the third and fourth aspects when executing the executable code.
With the method, device, and system provided by the embodiments of the present specification, based on the composite segmentation of data in the joint model-updating process, the data of at least some of the training members are partitioned to form a plurality of horizontally segmented subsystems, where a single subsystem may contain training members whose data are vertically segmented. A single subsystem with vertically segmented data iterates over the training samples distributed across its training members in order to update the parameters to be synchronized, and the subsystems synchronize data with one another in synchronization periods triggered by a synchronization condition. The method fully considers the data composition of each training member, provides a solution for jointly updating a model over complex data structures, and helps to extend the applicability of federated learning.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on them without creative effort.
FIGS. 1a and 1b are schematic diagrams of horizontal segmentation and vertical segmentation of data in conventional federated learning, respectively;
FIGS. 2a and 2b show schematic diagrams of two specific examples of composite data segmentation scenarios;
FIG. 3a is a schematic diagram of one specific architecture of a system for jointly updating a model in a composite data segmentation scenario under the technical concept of the present specification;
FIG. 3b is a schematic diagram of another specific architecture of a system for jointly updating a model in a composite data segmentation scenario under the technical concept of the present specification;
FIG. 4a shows a schematic model architecture of a subsystem corresponding to FIG. 3a;
FIG. 4b shows a schematic model architecture of a subsystem corresponding to FIG. 3b;
FIG. 5 illustrates a flow diagram of a method of jointly updating a model, according to one embodiment;
FIG. 6 illustrates a timing flow diagram for a joint update model according to one embodiment;
FIG. 7 shows a schematic block diagram of an apparatus for jointly updating a model, according to one embodiment;
FIG. 8 shows a schematic block diagram of an apparatus for jointly updating a model, according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
Federated learning, which may also be referred to as federated machine learning, joint learning, or alliance learning, is a distributed machine learning framework that can effectively help multiple organizations use data and build machine learning models while meeting the requirements of user privacy protection, data security, and government regulations.
Specifically, suppose that enterprise A and enterprise B each want to build a task model, where the individual tasks may be classification or prediction, and these tasks were approved by the respective users at the time the data was obtained. However, because the data is incomplete, for example enterprise A lacks label data, enterprise B lacks user feature data, or either party's data and sample size are insufficient to build a good model, the models at each end may not be built or may not work well. The problem federated learning aims to solve is how to build a high-quality model at each of A and B while the data owned by each enterprise is not disclosed to the other parties, that is, how to build a common model without violating data privacy regulations. This common model behaves like the optimal model that would be obtained by aggregating the parties' data together, while the built model serves only each party's own local targets.
Federated learning can include multiple training members, and, if necessary, a trusted third party can serve as the service party to perform some auxiliary operations. Each training member may hold different business data. The business data may be various kinds of data such as text, pictures, speech, animation, and video. Generally, the business data of the training members are correlated.
For example, among multiple business parties involved in financial business, business party 1 may be a bank which provides services such as savings and loans to users and can hold data such as the users' age, sex, account balance, loan amount, and deposit amount; business party 2 may be a P2P platform which can hold data such as users' loan records, investment records, and repayment due dates; and business party 3 may be a shopping website which holds data such as users' shopping habits, payment habits, and payment accounts. The business parties holding bank data, P2P platform data, and shopping-website data can then act as training members to conduct federated learning of a financial risk prediction model. For another example, among multiple business parties involved in medical services, each business party may be a hospital, a physical examination institution, and the like; for instance, business party 1 is hospital A, which uses diagnosis records covering users' age, sex, symptoms, diagnosis results, treatment plans, treatment results, and so on as local business data, while business party 2 may be physical examination institution B, which holds physical examination records covering users' age, sex, symptoms, examination conclusions, and so on. The business parties holding hospital data, physical examination institution data, and the like can then act as training members to conduct federated learning of models such as a disease risk prediction model.
Federated learning generally has two data distribution architectures: a horizontal segmentation architecture and a vertical segmentation architecture. Fig. 1a and Fig. 1b show these two distribution architectures, respectively.
As shown in Fig. 1a, this is a horizontal segmentation architecture. In the horizontal segmentation architecture, a single sample is held completely by a single data party, and the samples held by different data parties are independent of each other. As in Fig. 1a, data party 1 holds the label data of sample 1 (e.g., label 1) and all of its feature data (e.g., features A1 + B1), and data party 2 holds the label data of sample 2 (e.g., label 2) and all of its feature data (e.g., features A2 + B2). Features A and B can be regarded as two types of features; in practice there may be more types of features, and a single data party may also hold more sample data, which is not elaborated here. As shown in Fig. 1a, with one row representing one sample, the samples of the respective data parties are completely independent and can be separated completely along a straight line in the horizontal direction, which is why the architecture is called a horizontal segmentation (or horizontal splitting) architecture. For example, various bank data parties each hold basic feature data of their own users such as age and sex (e.g., type-A data), asset-type feature data such as balance, transaction flow, loans, and repayments (e.g., type-B data), and label data indicating whether a user is a financial risk user. That is, the sample data held by different data parties have different sample spaces and the same feature space.
Fig. 1b shows a vertical segmentation architecture. Under the vertical segmentation architecture, the sample data of a single sample is held by multiple data parties, and a single data party holds only part of the data of each sample. As shown in Fig. 1b, under the vertical segmentation architecture, data party 1 holds the label data (e.g., label 1, label 2, etc.) and part of the feature data (e.g., type-A feature data, recorded as feature A1, feature A2, etc. for the respective samples), and data party 2 holds another part of the feature data (e.g., type-B feature data, recorded as feature B1, feature B2, etc. for the respective samples). As shown in Fig. 1b, a single data party cannot hold a complete sample; with one row representing one sample, the data of the respective data parties combine horizontally to form complete sample data, and can be separated completely along a straight line in the vertical direction, which is why the architecture is called a vertical segmentation (or longitudinal splitting) architecture. For example, data party 1 is a bank and the type-A feature data is asset-type feature data, while data party 2 is a shopping website and the type-B feature data is shopping-type feature data such as users' product browsing records, search records, purchase records, and payment channels. That is, the sample data held by different data parties have the same sample space and different feature spaces. In practice, the sample data of a single sample may also include more feature data held by more data parties; Fig. 1b is merely an example.
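The difference between the two architectures can be made concrete with a small toy dataset; the sketch below partitions the same samples horizontally (by rows) and vertically (by columns). The party and column names are illustrative assumptions.

```python
import numpy as np

# A toy dataset: 4 samples (rows) with label y and two feature groups A and B (columns).
sample_ids = np.array(["u1", "u2", "u3", "u4"])
features_a = np.array([[30, 1], [45, 0], [28, 1], [52, 0]])   # e.g. age, sex
features_b = np.array([[5.0], [2.5], [7.1], [0.3]])           # e.g. account balance
labels = np.array([1, 0, 1, 0])

# Horizontal segmentation (Fig. 1a): each party holds complete rows for different samples.
party1_rows = dict(ids=sample_ids[:2], A=features_a[:2], B=features_b[:2], y=labels[:2])
party2_rows = dict(ids=sample_ids[2:], A=features_a[2:], B=features_b[2:], y=labels[2:])

# Vertical segmentation (Fig. 1b): the parties hold different columns of the same samples.
party1_cols = dict(ids=sample_ids, A=features_a, y=labels)    # same sample space
party2_cols = dict(ids=sample_ids, B=features_b)              # different feature space

print("horizontal: same columns, disjoint samples ->", party1_rows["ids"], party2_rows["ids"])
print("vertical:   same samples, disjoint columns ->", list(party1_cols), list(party2_cols))
```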
For the horizontal and vertical segmentation architectures shown in Fig. 1a and Fig. 1b, the conventional federated learning approaches are generally as follows. Under the horizontal segmentation architecture, a parallel synchronous training method can be adopted: the training members have the same neural network structure and are trained with the assistance of a third party C; during training, the data parties provide data such as gradients or updated values of the parameters to be determined to the third party C, and the synchronized gradients or parameters to be determined are computed with the assistance of the third party C and returned to the training members. In a vertical scenario, MPC (multi-party secure computation) or a split learning model is generally adopted. MPC places the machine learning computation in the encrypted domain. In split learning, the training members hold the first several layers of the overall neural network structure and a server holds the remaining layers; each training member trains locally with its private data to obtain the output of its first layers of the network, transmits that output to the server for forward propagation through the remaining layers, and the gradient data is back-propagated from the model loss to update the model.
However, in real business scenarios, not all federated learning settings can be segmented according to the horizontal or vertical architectures shown in Fig. 1a and Fig. 1b; a setting may involve both horizontal and vertical segmentation.
Fig. 2a shows a simple example of a composite segmented data architecture. It comprises multiple horizontally segmented data parties and one data party that forms a vertical segmentation with those horizontally segmented data parties. In Fig. 2a, data party Ai (i = 1, 2, ..., n) represents a data party holding X-type features and label data, such as a bank, where the X-type feature data is, for example, asset-type feature data; the data parties A1, A2, ..., An constitute a horizontal segmentation among themselves. Data party B represents a data party with Z-type features, such as a shopping website, where the Z features represent shopping-type feature data; data party B forms a vertical segmentation architecture with each data party A.
In fact, the way data is segmented in practice may be even more complicated, as shown in Fig. 2b, which covers many possible situations. Fig. 2b shows the data architecture with samples arranged in rows, where each wire frame represents the data of one data party and vertical correspondence (i.e., by columns) represents the same features. It can be seen that the data relationships between the data parties are complicated, with horizontal and vertical segmentations nested and crossed. For example, a portion of the data on data party 4 and data party 6 forms a vertical segmentation, another portion forms a vertical segmentation with a portion of the data on data party 9, another portion of the data on data party 9 forms a vertical segmentation with a portion of the data on data party 5, while that portion of the data on data party 5 and data party 4 forms a horizontal segmentation, and the above data and another portion of the data on data party 5 further form a horizontal segmentation, and so on.
For such complex data architectures, where the various data parties are training members, it may not be possible to train models jointly using conventional federated learning. In view of this, the present specification proposes a novel distributed federated learning concept to process sample data under this hybrid segmentation architecture. Under the technical concept of the present specification, the data parties of federated learning are first partitioned. In the situation shown in Fig. 2a, sub-servers may correspond one-to-one to the data parties Ai (as first members), and data party B is divided into multiple second members, each containing the sample data of the training samples consistent with the corresponding data party Ai. A single data party Ai and the portion of data party B containing the other data of the corresponding sample bodies can be regarded as one subsystem. In the situation shown in Fig. 2b, the data of the respective data parties is divided into batches by the dashed lines, such as dashed line 201. The data of the respective batches form horizontal segmentations with each other, while a single batch of data may internally present a vertical segmentation, or a data party may individually hold a small amount of complete sample data (such as the sample data held by data parties 11 and 12 in Fig. 2b), and so on. In summary, the partitioned data comprises at least one group of vertically segmented data parties. In the case where the data of one data party forms vertical segmentations with several different data parties, the intersection can be determined by means such as private set intersection (PSI), so as to align and partition the samples. The specific private set intersection method is determined according to the business requirements and is not described in detail here.
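As an illustration of the alignment step only, the following sketch intersects salted hashes of sample identifiers to find the shared sample bodies; an actual private set intersection protocol would additionally hide the non-intersecting identifiers from the other party, which this simplified version does not, and the identifiers below are made up for illustration.

```python
import hashlib

def blind(ids, salt="agreed-salt"):
    """Hash sample identifiers with an agreed salt before exchanging them (illustrative only)."""
    return {hashlib.sha256((salt + i).encode()).hexdigest(): i for i in ids}

ids_a = ["13800000001", "13800000002", "13800000003"]       # e.g. users of data party A1
ids_b = ["13800000002", "13800000003", "13800000004"]       # e.g. users of data party B

blinded_a, blinded_b = blind(ids_a), blind(ids_b)
common = set(blinded_a) & set(blinded_b)
aligned = sorted(blinded_a[h] for h in common)              # the vertically segmented samples
print("aligned sample identifiers:", aligned)
```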
With this idea in mind, please refer to Fig. 3a and Fig. 3b, which are schematic diagrams of two specific distributed federated learning architectures of the system for jointly updating a model described in the present specification. Divided by function, the system for jointly updating the model comprises: a federal service party and subsystems surrounded by dashed boxes such as dashed boxes 3011 and 3012. Each subsystem can be independent in terms of system function and model setup. In other words, the subsystems can be viewed as mutually parallel "training members" that are horizontally segmented from each other. The federal service party can be used to synchronize the parameters to be synchronized of the global model W. A parameter to be synchronized can generally be a parameter to be determined of the model W, or the gradient of a parameter to be determined; the parameters to be synchronized correspond one-to-one to the parameters to be determined. It is easy to understand that, assuming the number of subsystems is n (n is a positive integer greater than 1), any single subsystem i (1 ≤ i ≤ n) can correspond to a local model W_i consistent with the structure of the global model W. With reference to Fig. 2a and Fig. 2b, according to the actual data distribution, there is at least one subsystem that may perform split federated learning on vertically segmented sample data, for example, in Fig. 2b, a subsystem performing split federated learning on the sample data composed of data party 5, data party 9, and data party 10.
The architectures shown in Fig. 3a and Fig. 3b only show the subsystems that perform split federated learning on vertically segmented sample data; in practice, the architecture may also include subsystems formed by a single data party alone, like data party 11 and data party 12 in Fig. 2b, which is not described again here.
As can be seen from the schematic diagrams of Fig. 2a and Fig. 2b, in the system architectures shown in Fig. 3a and Fig. 3b, a single training member may be one data party or a part of a single data party. That is, one data party may be divided among multiple subsystems as the respective training members according to the data it provides. Thus, a single training member in a single subsystem shown in Fig. 3a or Fig. 3b may represent one data party or part of one data party, and multiple training members may come from the same data party. The difference between Fig. 3a and Fig. 3b is that in Fig. 3a a single subsystem comprises at least two training members, whereas in Fig. 3b a single subsystem comprises a sub-server in addition to at least two training members.
In the implementation architecture shown in Fig. 3a, model training can be performed among the training members by means of multi-party secure computation (MPC). Assuming that subsystem i (i = 1, 2, ..., n) performs split federated learning on vertically segmented sample data and has 2 training members with vertically segmented data, denoted for example C_i1 and C_i2, the local model W_i is divided into multiple parts, for example comprising a submodel W_ci1 arranged on training member C_i1 and a submodel W_ci2 arranged on training member C_i2. Taking a neural network as an example of the jointly trained model, the distribution of submodels within a subsystem can be as shown in Fig. 4a, where the features, weight parameters, and neural network parts shown in gray are held by data party C_i1, and those shown in black are held by data party C_i2. Since the data on training member C_i1 and training member C_i2 form a vertical segmentation, and the data of each data party can only be used locally for that single party's computation, training member C_i1 and training member C_i2 can exchange data by means such as homomorphic encryption and secret sharing, and jointly train the local model W_i corresponding to subsystem i without disclosing the privacy of their local data.
In the implementation architecture shown in Fig. 3b, subsystem i may correspond to a sub-server, denoted for example S_i, and at least 2 training members with vertically segmented data, denoted for example C_i1 and C_i2. Accordingly, the local model W_i is divided into multiple parts, for example comprising a submodel W_ci1 arranged on training member C_i1, a submodel W_ci2 arranged on training member C_i2, and a submodel W_si arranged on the sub-server S_i. The architecture of the models in subsystem i can then be as shown in Fig. 4b: the submodels W_ci1 and W_ci2 are connected in parallel, and then connected in series with the submodel W_si. During split federated learning, the submodels W_ci1 and W_ci2 respectively process the parts of the current batch of samples distributed on training member C_i1 and training member C_i2, and then send the intermediate results obtained to the sub-server S_i, where the submodel W_si on the sub-server processes the intermediate results to obtain the prediction result.
For the implementation architecture of Fig. 3b, from the perspective of device and data attribution: the federal service party and the sub-servers may belong to the same trusted third party, may be provided in the same distributed device cluster, or may belong to different trusted third parties, which is not limited here.
It should be noted that Fig. 3a and Fig. 3b are examples, not an exhaustive enumeration, of the system for jointly updating the model in the present specification. In practice, the system for jointly updating the model may also be arranged in other ways. For example, some subsystems may contain a sub-server as shown at 3012, while other subsystems may not, as shown at 3011, and so on.
In addition, in Fig. 3a and Fig. 3b, among the n training members C_i1 (i = 1, 2, ..., n), some or all may be the same data party, or they may belong to different data parties; similarly, among the n training members C_i2 (i = 1, 2, ..., n), some or all may be the same data party, or they may belong to different data parties; this is not limited here. For example, in the scenario shown in Fig. 2a, the training members C_12, C_22, ..., C_n2 may all indicate data party B. It is worth noting that, in this case, data party B can arrange multiple submodels W_ci2, or arrange a single submodel W_c2 as the submodel W_ci2 shared by the subsystems. For clarity of description and greater generality, these are referred to below as the respective submodels W_ci2, where in some alternative examples the respective W_ci2 may indicate the same submodel. In a single subsystem, training member C_i1 may, for example, be referred to as the first member, and training member C_i2 as the second member.
The model W can be determined by negotiation among the training members. The structure of each local model W_i is consistent with that of W, for example in the number of neural network layers, the weight matrices, and so on. Each local model may have slight differences depending on its subsystem. Within a subsystem, the distribution of the submodels of the local model W_i over the members can be determined according to the feature dimensions held by the respective members, which is not described in detail here. In an alternative embodiment, the local model W_i is completely consistent with the global model W to be jointly updated; the federal service party can initialize W and send it to each subsystem to obtain W_i. For example, under the architecture of Fig. 3a, the federal service party can split W_i into multiple submodels for the respective data parties and distribute them accordingly. Under the architecture shown in Fig. 3b, the federal service party initializes W and issues it to each sub-server to obtain W_i; the sub-server S_i splits the model W_i into the submodel W_ci of the training members and the submodel W_si on the sub-server side, and, according to the feature dimensions of the training members, splits W_ci into W_ci1, W_ci2, and so on (see the sketch below). According to one embodiment, the local model W_i can also be modified on the basis of W according to actual business requirements, for example by adding constant-term parameters; the sub-server S_i can also split W_i after such modification.
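A minimal sketch of one way such a split might be performed, assuming the local model consists of a first-layer weight matrix whose rows correspond to feature dimensions plus the remaining layers kept on the sub-server; the helper name split_local_model and the dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def split_local_model(global_w, feature_dims):
    """Split the first layer of W by the members' feature dimensions; keep the rest on S_i."""
    first_layer, server_layers = global_w
    member_parts, start = [], 0
    for d in feature_dims:                     # e.g. C_i1 holds 3 features, C_i2 holds 2
        member_parts.append(first_layer[start:start + d, :])
        start += d
    return member_parts, server_layers

# The federal service party initializes W (here: one hidden layer plus one output layer).
W = (rng.normal(size=(5, 4)), rng.normal(size=(4, 1)))

# The sub-server S_i splits W_i into W_ci1, W_ci2 (members) and W_si (sub-server).
(w_ci1, w_ci2), w_si = split_local_model(W, feature_dims=[3, 2])
print("W_ci1:", w_ci1.shape, "W_ci2:", w_ci2.shape, "W_si:", w_si.shape)
```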
Further, in terms of data interaction inside a subsystem: under the implementation architecture of Fig. 3a, the data parties inside a single subsystem interact with each other securely; under the implementation architecture of Fig. 3b, each member interacts with the sub-server while the members are independent of each other, for example, after each member processes its local data with its local submodel, it can provide the intermediate processing result to the sub-server, and the sub-server can feed back the gradient data of the intermediate result to each member in the subsystem. Overall, each training member in a subsystem in Fig. 3a can exchange the data to be synchronized with the federal service party, and in Fig. 3b each subsystem and each training member can interact with the federal service party. As indicated by the dashed two-way arrows in Fig. 3a and Fig. 3b, each sub-server and/or each training member can send the updated values of its local parameters to be synchronized to the federal service party, and the federal service party feeds the synchronized values of the parameters to be synchronized back to them.
It should be noted that in Fig. 3a or Fig. 3b, when corresponding to the architecture of Fig. 2a, in the case where the multiple submodels W_ci2 are all arranged on a single data party B, or are the same submodel on data party B, data party B can complete the relevant parameter synchronization locally without synchronizing through the federal service party.
Through the distributed arrangement of service parties, this system architecture partitions the data parties of a hybrid architecture according to the training mode, forming a horizontal segmentation architecture among the subsystems and a vertical segmentation architecture inside each subsystem, so that the overall federated learning system and the internal sub-federated-learning systems cooperate with each other. By combining the split learning algorithm with the parallel synchronous learning algorithm, it realizes distributed training of the model in a composite vertical-and-horizontal scenario and provides a corresponding solution for more complex federated learning scenarios.
The technical idea of the present specification will be described in detail below by taking as an example the operation performed by the system of jointly updating models shown in fig. 3a or fig. 3b within one parameter synchronization period.
It is understood that in a system for jointly updating the model, the executed process may include several small loops inside the subsystem and one large loop of the whole system in one parameter synchronization period. FIG. 5 illustrates a flow diagram of a joint update model according to an embodiment of the present description.
As shown in Fig. 5, the process includes the following steps. Step 501: each subsystem performs joint training in a vertical segmentation mode on its corresponding local model using the training samples vertically segmented over its first member and second member, and each training member, when a synchronization condition is satisfied, provides the federal service party with the updated values of the parameters to be synchronized that correspond one-to-one to the parameters to be determined in its submodel. Step 502: the federal service party performs secure synchronization on the updated values of the parameters to be synchronized from the plurality of subsystems, and feeds back the synchronization value of each parameter to be synchronized. Step 503: each training member in each subsystem receives the synchronization values of its local parameters to be synchronized, so as to update its local parameters to be determined.
First, in step 501, each subsystem performs joint training in a vertical segmentation mode on its corresponding local model using the training samples vertically segmented over its first member and second member, and each training member, when a synchronization condition is satisfied, provides the federal service party with the updated values of the parameters to be synchronized that correspond one-to-one to the parameters to be determined in its submodel.
It can be understood that federated learning in the vertical segmentation mode is performed on the corresponding submodels using the vertically segmented training samples. The training architecture in this mode is shown in Fig. 4a or Fig. 4b. Within one training period of a subsystem, for a batch of training samples, the parameters to be determined in the submodels can be updated through forward data transfer and backward propagation among the training members, or between the training members and the sub-server.
As shown in Fig. 4a, under the architecture shown in Fig. 3a, since there is no assistance from a sub-server, forward data transfer and backward gradient propagation are completed among the training members through multi-party secure computation, and each training member obtains the gradients of the parameters to be determined contained in its submodel.
To describe the technical solution of the present specification more clearly, Fig. 6 shows a timing chart of the operations performed, within one parameter synchronization period, by any vertically segmented subsystem (such as the one shown in Fig. 4b) in the system for jointly updating the model corresponding to Fig. 3b. The part within dashed box 601 is the processing flow of federated learning in the vertical segmentation mode within the subsystem. The operation of step 501 is described below in conjunction with the steps shown in dashed box S601 of Fig. 6.
As shown in dashed box S601 of Fig. 6, in S6011, for several samples of the current round, the first member and the second member respectively process their local data through their local submodels and deliver the intermediate results to the sub-server. Taking subsystem i as an example, the first member C_i1 and the second member C_i2 respectively process their corresponding local sample data through the first submodel W_ci1 and the second submodel W_ci2, obtaining the corresponding first intermediate result R_it1 and second intermediate result R_it2, which are sent to the sub-server S_i.
The sample data processed by the first member C_i1 and the second member C_i2 correspond one-to-one, that is, the sample bodies are consistent, for example, the local data are processed in an order in which the same position corresponds to the same sample identifier. The sample body is determined by the specific business scenario; for example, if the sample body is a user, the sample identifier can uniquely mark that sample body, such as an ID card number or a mobile phone number. The first member C_i1 and the second member C_i2 can determine the shared sample data by means such as private set intersection when the subsystems are partitioned. In the current federated learning process, the order of the sample data of the current batch can be determined by the sub-server S_i, or agreed upon in some other way by the first member C_i1 and the second member C_i2; the corresponding data processing is then performed through the first submodel W_ci1 and the second submodel W_ci2 in that order to obtain the respective intermediate results, so as to ensure that the first intermediate result and the second intermediate result of each training sample correspond to each other. The individual data in an intermediate result can be delivered to the sub-server S_i in a form with an agreed order, such as a vector or an array. As can be seen in Fig. 6, the data processing and intermediate-result sending of the first member C_i1 and the second member C_i2 are independent of each other, which ensures that private data is not leaked. To strengthen privacy protection when sending the intermediate results, the first member C_i1 and the second member C_i2 can add perturbations to the intermediate results by means such as differential privacy, or encrypt the intermediate results by means such as homomorphic encryption or secret sharing.
Further, in S6012, the sub-server S_i processes the first intermediate result R_it1 and the second intermediate result R_it2 based on the submodel W_si, and feeds back to the first member C_i1 and the second member C_i2, respectively, the gradient of the first intermediate result R_it1 and the gradient of the second intermediate result R_it2. In order to determine the gradients of the first and second intermediate results, the prediction result for the training samples needs to be determined first. The prediction result is the business result of the business to be predicted, and its accuracy can be checked against the sample labels. The sample labels are usually held by some training member in the current subsystem, but this is not the only possibility: the training member may provide the sample labels to the sub-server S_i, or the sample labels may be held by the sub-server S_i.
In one embodiment, the sub-server S_i can obtain the sample labels, in which case the sub-server can determine the prediction result and then compare the sample labels with the prediction result to determine the model loss.
In another embodiment, the sample label may also be held by a portion of the training members.
In one case, as shown in Fig. 6, the sub-server S_i can send the prediction result to the training member holding the sample labels (such as the first member C_i1); the corresponding training member completes the comparison of the sample labels with the prediction result, thereby determining the model loss, and feeds the model loss back to the sub-server S_i. Fig. 6 shows one of the ways of determining the model loss, and the corresponding timing is therefore represented by a dotted line. In the case where multiple training members each hold part of the sample labels, the sub-server S_i can send the prediction-result data of the corresponding training samples to each of those training members for computing the model loss, which is not described in detail here. The model loss is set according to the specific business scenario and can take various forms such as mean square error, cross entropy, or cosine similarity. Generally, the current model loss is the sum of the model losses caused by the current batch of training samples, where the number of training samples in the current batch may be 1 or more (e.g., 100). The sub-server S_i can further determine the gradient of each intermediate result from the model loss, that is, the partial derivative of the model loss with respect to the intermediate result. By the definition of partial derivatives, gradients are transitive; in order to determine the gradient of each parameter to be determined in the submodel of each training member, the gradient of the intermediate result determined by those parameters can be determined first. Thus, the sub-server S_i can determine the gradients of the first and second intermediate results corresponding to the first member and the second member, respectively, and return them to the corresponding training members. Optionally, in the case where the submodel W_si contains parameters to be determined, the sub-server S_i can also locally determine the gradients of the parameters to be determined of the submodel W_si, so as to update its local parameters.
In another case, the label holder may further be provided with a third sub-model Wci3. The processing that the sub-server Si performs on the first intermediate result Rit1 and the second intermediate result Rit2 based on the sub-model Wsi may then yield a third intermediate result Rit3, which is sent to the label holder; the label holder processes the third intermediate result Rit3 through the third sub-model Wci3 to determine the prediction result for the training samples. The label holder then calculates the model loss from the prediction result, determines the gradient of the third intermediate result, and transmits the model loss and the gradient of the third intermediate result back to the sub-server Si. Further, the sub-server Si determines the gradients of the model loss with respect to the first intermediate result Rit1 and the second intermediate result Rit2.
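The label-holder variant can be sketched in the same style; here the third sub-model is assumed to be a simple linear layer, and every name is a hypothetical placeholder rather than a prescribed interface.

import numpy as np

def sub_server_third_result(R1, R2, W_s):
    Z = np.concatenate([R1, R2], axis=1)
    return Z @ W_s                          # third intermediate result Rit3, sent to the label holder

def label_holder_step(R3, W_c3, labels):
    pred = R3 @ W_c3                        # the third sub-model produces the prediction
    loss = float(np.mean((pred - labels) ** 2))
    d_R3 = (2.0 * (pred - labels) / labels.shape[0]) @ W_c3.T   # gradient of the loss w.r.t. Rit3
    return loss, d_R3                       # both are transmitted back to the sub-server Si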
In other cases, the model may have other architectures, and the gradient may be determined in other ways, which are not described again here. In summary, the gradient has the property of back-propagation, and the gradients of the model loss with respect to the first intermediate result and the second intermediate result can be determined by the corresponding sub-server.
On the other hand, according to S6013, the first member Ci1 and the second member Ci2 each use the gradient of the first intermediate result Rit1 and of the second intermediate result Rit2 to determine the gradients of the pending parameters in the first sub-model Wci1 and the second sub-model Wci2, so as to locally update the parameters of the first sub-model Wci1 and the second sub-model Wci2 respectively.
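Continuing the earlier linear sketch, the local update at a member might look as follows; the learning rate and function name are assumptions for illustration only.

def member_backward(local_features, grad_intermediate, local_weights, lr=0.01):
    # the gradient of the pending parameters follows from the gradient of the
    # intermediate result via the chain rule: dL/dW = X^T (dL/dR)
    grad_W = local_features.T @ grad_intermediate
    local_weights = local_weights - lr * grad_W   # local update of the pending parameters
    return grad_W, local_weights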
It is worth noting that in the update iteration of subsystem i, the parameters updated by each participant (such as each training member in the exemplary architecture of fig. 3a, or the sub-server and each member in the exemplary architecture of fig. 3b) may generally include the gradients of the current pending parameters, and the pending parameters updated according to those gradients. When the subsystem does not need to perform parameter synchronization with other subsystems, each participant can update the gradients and the pending parameters in turn; when parameter synchronization with other subsystems is performed, each participant determines the updated values of the parameters to be synchronized and sends them to the federal service side. A parameter to be synchronized is understood to be a parameter that needs to be synchronized between the subsystems. Here, a parameter to be synchronized may be a gradient or a pending parameter.
In the process of jointly updating the model W, a condition for parameter synchronization may be preset, for example a predetermined number of iteration rounds (such as 5 rounds) or a predetermined time (such as 3 minutes). When the synchronization condition is satisfied, each subsystem can stop iterating and send the updated values of the currently determined pending parameters to the federal service side. Under the architecture shown in fig. 3b, as shown in fig. 6, in the case where the sub-server also has parameters to be synchronized, the sub-server and the training members together send the updated values of the local parameters to be synchronized to the federal service side. Since the sub-server may not have any parameters to be synchronized, the flow of uploading the local parameters to the federal service side in fig. 6 is represented by a dotted line, indicating an optional step depending on the actual business situation. It will be appreciated that under the architecture shown in fig. 3a, the parameter-uploading process is similar to that of fig. 6, except that there is no sub-server, and therefore only the training members upload the parameters to be synchronized to the federal service side. In the parameter-uploading process, each participant only sends local parameters, so data privacy is effectively protected. Optionally, each participant may further encrypt the local parameters, by adding perturbation (e.g., perturbation data satisfying differential privacy) or by homomorphic encryption, before sending them to the federal service side.
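The synchronization condition can be checked locally by every participant; the following sketch uses a round count or elapsed time, with both thresholds only example values corresponding to the "5 rounds" and "3 minutes" mentioned above.

import time

def sync_condition_met(local_round, last_sync_time, round_period=5, time_period=180.0):
    # synchronize every 5 local rounds, or every 3 minutes, whichever comes first;
    # both thresholds are illustrative and would be preset for the business scenario
    return (local_round % round_period == 0) or (time.time() - last_sync_time >= time_period)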
In this way, the parameter synchronization period of each subsystem in the process of jointly updating the model is controlled by using the synchronization condition, and in a single synchronization period, one or more iterations can be performed in the single subsystem, so that the current parameter to be synchronized is fed back to the federal service side when the parameter synchronization period arrives. In a single iteration process, each training member processes data held by the training member, and data privacy is effectively protected.
Next, through step 502, the federal service side performs security synchronization on the updated values of the parameters to be synchronized received from the multiple subsystems, and feeds back the synchronization values of the parameters to be synchronized.
Referring to FIG. 6, which includes the sub-server architecture, in S602 the updated values of the parameters to be synchronized received from each subsystem, which correspond one to one to the pending parameters, are fused. For a single parameter to be synchronized, there may be corresponding updated values sent from the various subsystems. The federal service side can take the mean, median, maximum, minimum or the like of these updated values to obtain the synchronization value of the single parameter to be synchronized (such as the synchronization parameter in fig. 6). In the case where the parameter to be synchronized is a gradient, the synchronization value of the gradient may also be determined by summation. The federal service side can then feed back the synchronization value of each parameter to be synchronized to the corresponding participant. Taking the first member Ci1 as an example, for a first parameter among the parameters to be synchronized corresponding to the first member Ci1, the first member Ci1 sends the updated value of the first parameter to the federal service side; after the federal service side synchronizes the first parameter using the updated values of the first parameter fed back by the participants in the other subsystems to obtain a first synchronization value, it feeds the first synchronization value back to the first member Ci1. In this way, each parameter to be synchronized is only transmitted between the relevant participant and the trusted federal service side, and data privacy is guaranteed.
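A minimal sketch of the fusion step, assuming each subsystem reports a dictionary of updated values keyed by parameter name; averaging is shown, but summation (for gradients) or the median follow the same pattern. The data layout and function name are assumptions for illustration.

import numpy as np

def secure_synchronize(updates_per_subsystem, mode="mean"):
    # updates_per_subsystem: list of {parameter_name: updated_value} dicts, one per subsystem;
    # a given parameter may be reported only by the subsystems that hold it
    sync_values = {}
    names = set().union(*(u.keys() for u in updates_per_subsystem))
    for name in names:
        values = np.stack([u[name] for u in updates_per_subsystem if name in u])
        if mode == "mean":
            sync_values[name] = values.mean(axis=0)
        elif mode == "sum":          # e.g. when the parameter to be synchronized is a gradient
            sync_values[name] = values.sum(axis=0)
        else:                        # median as a further example of fusion
            sync_values[name] = np.median(values, axis=0)
    return sync_values               # each value is fed back only to the participants holding that parameter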
In addition, in the case where the participants of the subsystems encrypt the parameters to be synchronized by a method such as homomorphic encryption, the federal service side can also perform the secure synchronization of the data in the homomorphically encrypted domain. The federal service side may feed back the synchronization values in plaintext (disclosed) form to the participants of each subsystem, or may feed them back in the corresponding encrypted form, which is not limited in this specification.
Then, via step 503, the corresponding sub-server and each member in each subsystem each receive the synchronization values of their locally corresponding parameters to be synchronized, so as to update the local pending parameters. The participants corresponding to a single subsystem i at least include the first member Ci1 and the second member Ci2, and under some architectures may also include the sub-server Si. In this step 503, as shown by S603 in fig. 6, a single participant receives only the synchronization values of the parameters to be synchronized corresponding to its local sub-model. It can be understood that, in the implementation architecture of this specification, a subsystem may or may not include a sub-server, and where a sub-server is included, the sub-server Si may or may not have corresponding parameters to be synchronized. Therefore, in S603 of fig. 6, the flow related to the sub-server Si is indicated by a dotted line and exists only when the subsystem includes the sub-server Si and the sub-server has corresponding parameters to be synchronized.
In the case that the parameter to be synchronized is a parameter to be determined, a single participant may directly update the local parameter to be determined with the synchronization value of the parameter to be synchronized. When the parameter to be synchronized is the gradient of the parameter to be determined, the synchronization value of the parameter to be synchronized can be used as the current gradient, and the corresponding parameter to be determined can be updated by using a gradient-related method such as a gradient descent method and a Newton method.
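The two update modes just described can be sketched as follows; the learning rate and function name are assumptions, and a Newton-type step would replace the plain descent step shown here.

def apply_sync_value(local_param, sync_value, is_gradient, lr=0.01):
    if is_gradient:
        # treat the synchronization value as the current gradient and take a descent step
        return local_param - lr * sync_value
    # otherwise the pending parameter is replaced by the synchronization value directly
    return sync_value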
It should be noted that in the embodiments of fig. 5 and fig. 6, only the case where a single subsystem includes a first member and a second member is taken as an example. Under more complex composite architectures there may be more members, for example a third member, a fourth member, and so on. In practice, for such a system for jointly updating a model over mixed-split data, it suffices that there is at least one subsystem containing at least two training members whose sample data constitute a vertical split. That is, any system for jointly updating a model that can be partitioned into at least one such subsystem can apply the technical solution provided in this specification.
It is to be understood that the systems shown in fig. 3a and fig. 3b perform the joint model-updating operation via the flow described in fig. 5, and the flow described in fig. 5 is a flow using the system shown in fig. 3a or fig. 3b, so the related descriptions of the systems shown in fig. 3a and 3b and of the flow described in fig. 5 are mutually applicable. In particular, fig. 6 is a flowchart corresponding to the architecture of fig. 3b, so the related descriptions of fig. 3b and fig. 6 are also mutually applicable and are not repeated here.
In addition, from the perspective of the federal service side, the flow of jointly updating the model provided in an embodiment of this specification may include: receiving, from each subsystem, the updated values of the parameters to be synchronized that correspond one to one to the pending parameters in the corresponding sub-models, sent when the synchronization condition is satisfied, wherein in a single subsystem i the first member Ci1, the second member Ci2 and, where present, the sub-server Si respectively provide the updated values of the parameters to be synchronized of the first sub-model Wci1, the second sub-model Wci2 and the sub-model Wsi in the local model Wi, and the updated values of the parameters to be synchronized of the local model Wi are determined based on the joint training performed by subsystem i on the corresponding sub-models in a vertically split manner; and securely synchronizing the updated values of the parameters to be synchronized received from the plurality of subsystems, and feeding back the synchronization value of each parameter to be synchronized, so that the corresponding training member or sub-server completes the update of the pending parameters of its local sub-model.
In an implementation architecture including a sub-server, from the execution perspective of the sub-server Si, the process of jointly updating a model according to one embodiment includes: together with the corresponding first member Ci1 and second member Ci2, performing split federated learning in a vertically split manner on the corresponding local model Wi, using the training samples that constitute a vertical split over the corresponding first member Ci1 and second member Ci2; in the case where the synchronization condition is satisfied, sending the updated values of the parameters to be synchronized that correspond one to one to the pending parameters in the sub-model Wsi to the federal service side, so that the federal service side securely synchronizes the parameters to be synchronized based on the updated values received from the plurality of subsystems; and obtaining from the federal service side the securely synchronized synchronization values of the parameters to be synchronized of the sub-model Wsi, so as to update the pending parameters in the sub-model Wsi.
From the execution perspective of the first member Ci1, the process of jointly updating a model according to one embodiment includes: performing joint training on the corresponding local model Wi using the training samples that constitute a vertical split over the local data and the data of the second member Ci2, to obtain the updated values of the parameters to be synchronized that correspond one to one to the pending parameters of the first sub-model Wci1; in the case where the synchronization condition is satisfied, sending the updated values of the parameters to be synchronized that correspond one to one to the pending parameters in the first sub-model Wci1 to the federal service side, so that the federal service side securely synchronizes the parameters to be synchronized based on the updated values received from each subsystem; and obtaining from the federal service side the securely synchronized synchronization values of the parameters to be synchronized of the first sub-model Wci1, so as to update the pending parameters in the first sub-model Wci1.
From the execution perspective of the second member Ci2, the process of jointly updating a model according to one embodiment includes: performing joint training on the corresponding local model Wi using the training samples that constitute a vertical split over the local data and the data of the first member Ci1, to obtain the updated values of the parameters to be synchronized that correspond one to one to the pending parameters of the second sub-model Wci2; in the case where the synchronization condition is satisfied, sending the updated values of the parameters to be synchronized that correspond one to one to the pending parameters in the second sub-model Wci2 to the federal service side, so that the federal service side securely synchronizes the parameters to be synchronized based on the updated values received from the subsystems; and obtaining from the federal service side the securely synchronized synchronization values of the parameters to be synchronized of the second sub-model Wci2, so as to update the pending parameters in the second sub-model Wci2.
It should be noted that, since the operations performed by the first member and the second member are of the same kind, the names "first member" and "second member" in this specification do not substantially distinguish the corresponding training members; the identifiers merely distinguish the operations performed, so a description of an operation performed by one member is equally applicable to the other member.
Reviewing the above flow, this specification provides a concept of jointly updating a model for data compound-splitting scenarios. The compound splitting of the data may comprise both horizontal splitting and vertical splitting, so that conventional federated learning cannot be applied directly. In view of this, this specification contemplates splitting the data of the training members to form a plurality of horizontally split subsystems, where a single subsystem may include training members whose data are vertically split. In this way, a single subsystem whose data are vertically split iterates using the training samples distributed over the plurality of training members in the subsystem, so as to update the parameters to be synchronized, and data synchronization can be performed between the subsystems according to synchronization periods triggered by the synchronization condition. The method fully considers the data composition of each training member, provides a solution for jointly updating a model under a complex data structure, and helps to expand the application range of federated learning.
According to an embodiment of another aspect, corresponding apparatuses adapted to the above system for jointly updating a model are also provided. The apparatuses for jointly updating a model may include an apparatus located at the federal service side, an apparatus located at a sub-server, and an apparatus located at a first member or a second member. Fig. 7 shows a block diagram of the apparatus provided at the federal service side, and fig. 8 shows a block diagram of the apparatus commonly used at a sub-server, a first member or a second member.
As shown in fig. 7, the apparatus 700 includes an obtaining unit 71 and a synchronization unit 72. The obtaining unit 71 is configured to receive, from each subsystem, the updated values of the parameters to be synchronized that correspond one to one to the pending parameters in the corresponding local model, sent when the synchronization condition is satisfied, where the updated values of the parameters to be synchronized provided by a single subsystem i are determined based on the joint training performed by subsystem i on the corresponding sub-models in a vertically split manner. The synchronization unit 72 is configured to securely synchronize the updated values of the parameters to be synchronized received from the plurality of subsystems, and to feed back the synchronization value of each parameter to be synchronized, so that the corresponding training member or sub-server completes the update of the pending parameters of its local sub-model.
As shown in fig. 8, the apparatus 800 includes a training unit 81, a providing unit 82, and a synchronizing unit 83.
In the case where the apparatus 800 is the apparatus for jointly updating a model located at the sub-server Si: the training unit 81 is configured to, together with the corresponding first member Ci1 and second member Ci2, perform joint training (also called federated learning) in a vertically split manner on the corresponding local model Wi, using the training samples that constitute a vertical split over the corresponding first member Ci1 and second member Ci2; the providing unit 82 is configured to, in the case where the synchronization condition is satisfied, send the updated values of the parameters to be synchronized that correspond one to one to the pending parameters in the sub-model Wsi to the federal service side, so that the federal service side securely synchronizes the parameters to be synchronized based on the updated values received from the plurality of subsystems; the synchronization unit 83 is configured to obtain from the federal service side the securely synchronized synchronization values of the parameters to be synchronized of the sub-model Wsi, so as to update the pending parameters in the sub-model Wsi.
In the case where the apparatus 800 is the apparatus for jointly updating a model located at the first member Ci1 (or the second member Ci2): the training unit 81 is configured to perform split federated learning on the corresponding local model Wi using the training samples that constitute a vertical split over the local data and the data of the second member Ci2 (or the first member Ci1); the providing unit 82 is configured to, in the case where the synchronization condition is satisfied, send the updated values of the parameters to be synchronized that correspond one to one to the pending parameters in the first sub-model Wci1 (or the second sub-model Wci2) to the federal service side, so that the federal service side securely synchronizes the parameters to be synchronized based on the updated values received from the subsystems; the synchronization unit 83 is configured to obtain from the federal service side the securely synchronized synchronization values of the parameters to be synchronized of the first sub-model Wci1 (or the second sub-model Wci2), so as to update the pending parameters in the first sub-model Wci1 (or the second sub-model Wci2).
It should be noted that the apparatus 700 shown in fig. 7 and the apparatus 800 shown in fig. 8 are respectively an embodiment of an apparatus provided on a federal service side and a training member in the method embodiment shown in fig. 5, so as to implement functions of a corresponding business side. Therefore, the corresponding description in the method embodiment shown in fig. 5 is also applicable to the apparatus 700 or the apparatus 800, and is not repeated here.
According to an embodiment of another aspect, a computer-readable storage medium is further provided, on which a computer program is stored, which, when executed in a computer, causes the computer to perform operations corresponding to any one of the participants of the method described in conjunction with fig. 5 and 6.
According to an embodiment of still another aspect, a computing device is further provided, which includes a memory and a processor, where the memory stores executable codes, and when the processor executes the executable codes, the processor implements operations corresponding to any one of the participants in the methods described in conjunction with fig. 5 and fig. 6.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of this specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above embodiments are only intended to be specific embodiments of the technical concept of the present disclosure, and should not be used to limit the scope of the technical concept of the present disclosure, and any modification, equivalent replacement, improvement, etc. made on the basis of the technical concept of the embodiments of the present disclosure should be included in the scope of the technical concept of the present disclosure.

Claims (19)

1. A system for jointly updating a model, comprising a federal service side and a plurality of subsystems, wherein the plurality of subsystems are used for jointly updating a model W; a single subsystem i among the plurality of subsystems comprises a first member Ci1 and a second member Ci2 among the training members, the sample data held by the first member Ci1 and the second member Ci2 constitute a vertical split, and the sample data held by the respective subsystems constitute a horizontal split; the single subsystem i corresponds to a local model Wi having the same structure as the model W, and the local model Wi comprises a first sub-model Wci1 arranged on the first member Ci1 and a second sub-model Wci2 arranged on the second member Ci2; wherein:
the single subsystem i is used for performing joint training on the local model Wi in a vertically split manner using the training samples that constitute a vertical split over the first member Ci1 and the second member Ci2, providing the federal service side, when the synchronization condition is satisfied, with the updated values of the parameters to be synchronized that correspond one to one to the pending parameters of the corresponding local model Wi, and synchronizing the local parameters to be synchronized with the parameters to be synchronized in the other subsystems according to the synchronization values of the parameters to be synchronized fed back by the federal service side, so as to adjust the corresponding pending parameters;
and the federal service side is used for carrying out safety synchronization on the updated values of the parameters to be synchronized from the subsystems and feeding back the synchronized values.
2. A method for jointly updating a model, applicable to the process by which a system for jointly updating a model updates a model W, wherein the system comprises a federal service side and a plurality of subsystems; a single subsystem i among the plurality of subsystems comprises a first member Ci1 and a second member Ci2 among the training members, the sample data held by the first member Ci1 and the second member Ci2 constitute a vertical split, and the sample data held by the respective subsystems constitute a horizontal split; the single subsystem i corresponds to a local model Wi having the same structure as the model W, and the local model Wi comprises a first sub-model Wci1 arranged on the first member Ci1 and a second sub-model Wci2 arranged on the second member Ci2; the method comprises the following steps:
each subsystem respectively utilizes the training samples vertically split on the corresponding first member and the second member to carry out combined training in a vertical splitting mode aiming at the corresponding local model, and each training member respectively provides the updated value of each parameter to be synchronized corresponding to each parameter to be determined in the corresponding sub-model to the federal service side under the condition of meeting the synchronization condition;
the federal service party carries out safe synchronization on the updated values of the parameters to be synchronized from a plurality of subsystems and feeds back the synchronization values of the parameters to be synchronized;
and each training member in each subsystem receives the synchronization value of the local parameter to be synchronized so as to update the local parameter to be determined.
3. The method of claim 2, wherein the single subsystem i further comprises a sub-server Si, and the joint training performed by the single subsystem i on the local model Wi comprises:

for several samples of the current round, the first member Ci1 and the second member Ci2 respectively process the corresponding local sample data through the first sub-model Wci1 and the second sub-model Wci2 to obtain the corresponding first intermediate result Rit1 and second intermediate result Rit2, which are sent to the sub-server Si;

the sub-server Si processes the first intermediate result Rit1 and the second intermediate result Rit2 based on a third sub-model Wsi, and feeds back to the first member Ci1 and the second member Ci2 respectively the gradient of the first intermediate result Rit1 and the gradient of the second intermediate result Rit2;

the first member Ci1 and the second member Ci2 each use the gradient of the first intermediate result Rit1 and of the second intermediate result Rit2 to determine the gradients of the pending parameters of the first sub-model Wci1 and of the second sub-model Wci2, so as to determine the updated values of the parameters to be synchronized of the first sub-model Wci1 and of the second sub-model Wci2 respectively.

4. The method of claim 3, wherein the label holder of the several samples of the current round in the single subsystem i is the first member Ci1 or the second member Ci2; the step in which the sub-server Si processes the first intermediate result Rit1 and the second intermediate result Rit2 based on the third sub-model Wsi and feeds back to the first member Ci1 and the second member Ci2 respectively the gradient of the first intermediate result Rit1 and the gradient of the second intermediate result Rit2 further comprises:

the sub-server Si processes the first intermediate result Rit1 and the second intermediate result Rit2 based on the third sub-model Wsi to obtain a prediction result, and sends the prediction result to the label holder;

the label holder determines the corresponding model loss by comparing the label data of the several samples of the current round with the prediction result, and feeds the model loss back to the sub-server Si;

the sub-server Si determines from the model loss the gradients with respect to the first intermediate result Rit1 and the second intermediate result Rit2.

5. The method of claim 4, wherein, in the case where the third sub-model Wsi contains pending parameters, the sub-server Si further determines from the model loss the gradients of the pending parameters included in the third sub-model Wsi.

6. The method of claim 3, wherein the label holder of the several samples of the current round in the single subsystem i is the first member Ci1 or the second member Ci2, and the label holder is provided with a fourth sub-model Wci3; the step in which the sub-server Si processes the first intermediate result Rit1 and the second intermediate result Rit2 based on the third sub-model Wsi and feeds back to the first member Ci1 and the second member Ci2 respectively the gradient of the first intermediate result Rit1 and the gradient of the second intermediate result Rit2 further comprises:

the sub-server Si processes the first intermediate result Rit1 and the second intermediate result Rit2 based on the third sub-model Wsi to obtain a third intermediate result Rit3, and sends the third intermediate result Rit3 to the label holder;

the label holder processes the third intermediate result Rit3 through the fourth sub-model Wci3 to obtain the corresponding prediction result, and, based on the comparison of the label data of the several samples of the current round with the prediction result, determines the gradient of the model loss with respect to the third intermediate result Rit3 of the current round, which is fed back to the sub-server Si;

the sub-server Si determines, from the gradient of the third intermediate result Rit3, the gradients with respect to the first intermediate result Rit1 and the second intermediate result Rit2.

7. The method of claim 2, wherein the joint training performed by subsystem i on the local model Wi comprises:

each training member in subsystem i performs multi-party secure computation, so that each training member determines the gradient of the model loss with respect to its local pending parameters;

each training member determines the updated values of the parameters to be synchronized based on the gradients of the pending parameters in its corresponding sub-model, wherein the first member Ci1 and the second member Ci2 respectively determine the updated values of the parameters to be synchronized of the first sub-model Wci1 and the second sub-model Wci2.
8. The method of claim 2, wherein the synchronization condition comprises: each local model is updated in a predetermined round or a predetermined time period.
9. The method of claim 2, wherein the single parameter to be synchronized is a single parameter to be determined or a single gradient corresponding to the single parameter to be determined.
10. The method of claim 2, wherein the federated server securely synchronizing updated values for parameters to be synchronized from a plurality of subsystems comprises:
the federal service side receives all parameters to be synchronized which are respectively sent by all training members and encrypted in a preset encryption mode;
and the federal service side performs fusion of at least one mode of addition, weighted average and median value solving on the respective updated values of the parameters to be synchronized to obtain corresponding synchronization values.
11. The method of claim 10, wherein the predetermined encryption scheme comprises one of: adding perturbations that satisfy differential privacy; homomorphic encryption; and (4) secret sharing.
12. A method for jointly updating a model, applicable to the process by which a system for jointly updating a model updates a model W, wherein the system comprises a federal service side and a plurality of subsystems; a single subsystem i among the plurality of subsystems comprises a first member Ci1 and a second member Ci2 among the training members, the sample data held by the first member Ci1 and the second member Ci2 constitute a vertical split, and the sample data held by the respective subsystems constitute a horizontal split; the single subsystem i corresponds to a local model Wi having the same structure as the model W, and the local model Wi comprises a first sub-model Wci1 arranged on the first member Ci1 and a second sub-model Wci2 arranged on the second member Ci2; the method is performed by the federal service side and comprises:

receiving, from each subsystem, the updated values of the parameters to be synchronized that correspond one to one to the pending parameters in the corresponding sub-model, sent when the synchronization condition is satisfied, wherein the updated values of the parameters to be synchronized provided by the single subsystem i are determined based on the joint training performed by subsystem i on the corresponding local model Wi in a vertically split manner;

and securely synchronizing the updated values of the parameters to be synchronized from the subsystems, and feeding back the synchronization value of each parameter to be synchronized, so that the corresponding training member or sub-server completes the update of the pending parameters of its local sub-model.
13. A method for jointly updating a model, applicable to the process by which a system for jointly updating a model updates a model W, wherein the system comprises a federal service side and a plurality of subsystems; a single subsystem i among the plurality of subsystems comprises a first member Ci1 and a second member Ci2 among the training members, the sample data held by the first member Ci1 and the second member Ci2 constitute a vertical split, and the sample data held by the respective subsystems constitute a horizontal split; the single subsystem i corresponds to a local model Wi having the same structure as the model W, and the local model Wi comprises a first sub-model Wci1 arranged on the first member Ci1 and a second sub-model Wci2 arranged on the second member Ci2; the method is performed by the first member Ci1 and comprises:

performing joint training on the corresponding local model Wi using the training samples that constitute a vertical split over the local data and the data of the second member Ci2, to obtain the updated values of the parameters to be synchronized that correspond one to one to the pending parameters of the first sub-model Wci1;

in the case where the synchronization condition is satisfied, sending the updated values of the parameters to be synchronized that correspond one to one to the pending parameters in the first sub-model Wci1 to the federal service side, so that the federal service side securely synchronizes the parameters to be synchronized based on the updated values received from the subsystems;

obtaining from the federal service side the securely synchronized synchronization values of the parameters to be synchronized of the first sub-model Wci1, so as to update the pending parameters in the first sub-model Wci1.
14. The method of claim 13, wherein subsystem i further comprises a sub-server Si, and the local model Wi of the single subsystem i further comprises a third sub-model Wsi of the sub-server Si; the performing of joint training on the corresponding local model Wi using the training samples that constitute a vertical split over the local data and the data of the second member Ci2 comprises:

for several samples of the current round, processing the corresponding local sample data through the first sub-model Wci1 to obtain the corresponding first intermediate result Rit1, and sending it to the sub-server Si, so that the sub-server Si processes the first intermediate result Rit1 and the second intermediate result Rit2 based on the third sub-model Wsi and feeds back the gradient of the first intermediate result Rit1, wherein the second intermediate result Rit2 is obtained by the second member Ci2 processing its corresponding local sample data through the second sub-model Wci2;

using the fed-back gradient of the first intermediate result Rit1, determining the gradients of the pending parameters of the first sub-model Wci1, thereby determining the updated values of the parameters to be synchronized of the first sub-model Wci1.
15. The method of claim 13, wherein the performing of joint training on the corresponding local model Wi using the training samples that constitute a vertical split over the local data and the data of the second member Ci2 comprises:

performing multi-party secure computation with the training members in subsystem i to determine the gradient of the model loss with respect to the pending parameters of the first sub-model Wci1;

determining the updated values of the corresponding parameters to be synchronized based on the gradients of the pending parameters of the first sub-model Wci1.
16. An apparatus for jointly updating a model, the apparatus being adapted to the federal service side in a system for jointly updating a model, wherein the system comprises the federal service side and a plurality of subsystems; a single subsystem i among the plurality of subsystems comprises a first member Ci1 and a second member Ci2 among the training members, the sample data held by the first member Ci1 and the second member Ci2 constitute a vertical split, and the sample data held by the respective subsystems constitute a horizontal split; the single subsystem i corresponds to a local model Wi having the same structure as the model W, and the local model Wi comprises a first sub-model Wci1 arranged on the first member Ci1 and a second sub-model Wci2 arranged on the second member Ci2; the apparatus comprises:

an obtaining unit configured to receive, from each subsystem, the updated values of the parameters to be synchronized that correspond one to one to the pending parameters in the corresponding local model, sent when the synchronization condition is satisfied, wherein the updated values of the parameters to be synchronized provided by the single subsystem i are determined based on the joint training performed by subsystem i on the corresponding local model Wi in a vertically split manner;

and a synchronization unit configured to securely synchronize the updated values of the parameters to be synchronized received from the plurality of subsystems and to feed back the synchronization value of each parameter to be synchronized, so that the corresponding training member completes the update of the pending parameters of its local sub-model.
17. An apparatus for jointly updating a model, the apparatus being adapted to the process by which a system for jointly updating a model updates a model W, wherein the system comprises a federal service side and a plurality of subsystems; a single subsystem i among the plurality of subsystems comprises a first member Ci1 and a second member Ci2 among the training members, the sample data held by the first member Ci1 and the second member Ci2 constitute a vertical split, and the sample data held by the respective subsystems constitute a horizontal split; the single subsystem i corresponds to a local model Wi having the same structure as the model W, and the local model Wi comprises a first sub-model Wci1 arranged on the first member Ci1 and a second sub-model Wci2 arranged on the second member Ci2; the apparatus is provided on the first member Ci1 and comprises:

a training unit configured to perform joint training on the corresponding local model Wi using the training samples that constitute a vertical split over the local data and the data of the second member Ci2, to obtain the updated values of the parameters to be synchronized that correspond one to one to the pending parameters of the first sub-model Wci1;

a providing unit configured to, in the case where the synchronization condition is satisfied, send the updated values of the parameters to be synchronized that correspond one to one to the pending parameters in the first sub-model Wci1 to the federal service side, so that the federal service side securely synchronizes the parameters to be synchronized based on the updated values received from the subsystems;

a synchronization unit configured to obtain from the federal service side the securely synchronized synchronization values of the parameters to be synchronized of the first sub-model Wci1, so as to update the pending parameters in the first sub-model Wci1.
18. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 10-15.
19. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any one of claims 10-15.
CN202111256451.8A 2021-10-27 2021-10-27 Method, device and system for jointly updating model Pending CN114004363A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111256451.8A CN114004363A (en) 2021-10-27 2021-10-27 Method, device and system for jointly updating model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111256451.8A CN114004363A (en) 2021-10-27 2021-10-27 Method, device and system for jointly updating model

Publications (1)

Publication Number Publication Date
CN114004363A true CN114004363A (en) 2022-02-01

Family

ID=79924225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111256451.8A Pending CN114004363A (en) 2021-10-27 2021-10-27 Method, device and system for jointly updating model

Country Status (1)

Country Link
CN (1) CN114004363A (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210004718A1 (en) * 2019-07-03 2021-01-07 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and device for training a model based on federated learning
WO2021022707A1 (en) * 2019-08-06 2021-02-11 深圳前海微众银行股份有限公司 Hybrid federated learning method and architecture
CN110874484A (en) * 2019-10-16 2020-03-10 众安信息技术服务有限公司 Data processing method and system based on neural network and federal learning
WO2021119601A1 (en) * 2019-12-13 2021-06-17 Qualcomm Technologies, Inc. Federated mixture models
CN112328617A (en) * 2020-11-19 2021-02-05 杭州趣链科技有限公司 Learning mode parameter updating method for longitudinal federal learning and electronic device
CN112613618A (en) * 2021-01-04 2021-04-06 神谱科技(上海)有限公司 Safe federal learning logistic regression algorithm
CN112580826A (en) * 2021-02-05 2021-03-30 支付宝(杭州)信息技术有限公司 Business model training method, device and system
CN113377797A (en) * 2021-07-02 2021-09-10 支付宝(杭州)信息技术有限公司 Method, device and system for jointly updating model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lu Yunlong (卢云龙): "Research on Data Privacy Protection and Sharing Methods" (数据隐私安全防护及共享方法研究), China Doctoral Dissertations Full-text Database, Information Science and Technology, No. 01, 15 January 2021 (2021-01-15) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114638376A (en) * 2022-03-25 2022-06-17 支付宝(杭州)信息技术有限公司 Multi-party combined model training method and device in composite sample scene
CN114819182A (en) * 2022-04-15 2022-07-29 支付宝(杭州)信息技术有限公司 Method, apparatus and system for training a model via multiple data owners
CN114841373A (en) * 2022-05-24 2022-08-02 中国电信股份有限公司 Parameter processing method, device, system and product applied to mixed federal scene
CN114841373B (en) * 2022-05-24 2024-05-10 中国电信股份有限公司 Parameter processing method, device, system and product applied to mixed federal scene

Similar Documents

Publication Publication Date Title
CN110189192B (en) Information recommendation model generation method and device
US20230078061A1 (en) Model training method and apparatus for federated learning, device, and storage medium
EP4120150A1 (en) Calculation method for vertical federated learning, apparatus, device, and medium
CN112799708B (en) Method and system for jointly updating business model
US11341411B2 (en) Method, apparatus, and system for training neural network model
CN114004363A (en) Method, device and system for jointly updating model
CN110955915A (en) Method and device for processing private data
CN112989399B (en) Data processing system and method
CN112199709A (en) Multi-party based privacy data joint training model method and device
US11743238B2 (en) Systems and methods for blind vertical learning
CN114564641A (en) Personalized multi-view federal recommendation system
CN110610098A (en) Data set generation method and device
CN114547658A (en) Data processing method, device, equipment and computer readable storage medium
CN112507372B (en) Method and device for realizing privacy protection of multi-party collaborative update model
CN114372871A (en) Method and device for determining credit score value, electronic device and storage medium
Yan et al. Multi-participant vertical federated learning based time series prediction
Khan et al. Vertical federated learning: A structured literature review
CN114723068A (en) Federal model training method and device
CN114723012A (en) Computing method and device based on distributed training system
CN114547684A (en) Method and device for protecting multi-party joint training tree model of private data
CN113657611A (en) Method and device for jointly updating model
Arora et al. Privacy-Preserving Financial Anomaly Detection via Federated Learning & Multi-Party Computation
CN113362168A (en) Risk prediction method and device, storage medium and electronic equipment
US20240154942A1 (en) Systems and methods for blind multimodal learning
Xing et al. Distributed Model Interpretation for Vertical Federated Learning with Feature Discrepancy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination