CN114004363B - Method, device and system for jointly updating model - Google Patents

Method, device and system for jointly updating model


Publication number
CN114004363B
Authority
CN
China
Prior art keywords
model
sub
synchronized
parameters
training
Prior art date
Legal status
Active
Application number
CN202111256451.8A
Other languages
Chinese (zh)
Other versions
CN114004363A (en)
Inventor
郑龙飞
陈超超
王力
张本宇
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202111256451.8A
Publication of CN114004363A
Application granted
Publication of CN114004363B
Legal status: Active (current)
Anticipated expiration


Abstract

The embodiments of the present specification provide a method, a device and a system for jointly updating a model. According to the method, device and system provided by the embodiments, in view of the composite way in which data may be split when jointly updating a model, the data of the training members is partitioned so as to form a plurality of horizontally split subsystems, and a single subsystem may contain training members whose data is vertically split. In this way, a subsystem whose data is vertically split iterates internally using training samples distributed over its training members, thereby updating the parameters to be synchronized. Data synchronization is then performed across the subsystems in synchronization periods triggered by a synchronization condition. The method fully takes into account the data composition of each training member, provides a solution for jointly updating a model under a complex data structure, and helps broaden the range of application of federal learning.

Description

Method, device and system for jointly updating model
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technology, and in particular, to a method, an apparatus, and a system for jointly updating a model by multiple data parties.
Background
With the development of artificial intelligence technology, machine learning models have gradually been applied to fields such as risk assessment, speech recognition, face recognition and natural language processing. Better model performance requires more training data. In fields such as healthcare and finance, different enterprises or institutions hold different data samples; if these data are used for joint training with a distributed machine learning algorithm, model accuracy can be greatly improved, bringing considerable economic benefit to the enterprises involved.
In conventional technology, federal learning is generally used to jointly train a better-performing model from the data of multiple data parties. Depending on how the data is distributed across the data parties, federal learning can usually be divided into two main categories: horizontally split data and vertically split data. In a horizontal splitting scenario, the data parties own the same feature space but different sample spaces; in a vertical splitting scenario, the data parties own the same sample space but different feature spaces. However, some multi-party joint machine learning cannot simply be regarded as either horizontal or vertical splitting. For example, in federal learning between a financial platform and multiple banks, the data between the financial platform and the banks may form a vertical split, while the data among the banks is better described as a horizontal split. That is, more complex splitting scenarios exist in practice. How to realize joint training of the relevant business models in such complex splitting scenarios is a significant technical problem in the field of federal learning and deserves research.
Disclosure of Invention
One or more embodiments of the present specification describe a method and apparatus for jointly updating models to solve one or more of the problems mentioned in the background.
According to a first aspect, a system for jointly updating a model is provided, including a federal service side and a plurality of subsystems, for jointly updating the model W, wherein a single subsystem i of the plurality of subsystems includes a first member C i1 and a second member C i2 among the training members, the sample data held by the first member C i1 and the second member C i2 forms a vertical split, the sample data held by the respective subsystems forms a horizontal split, the single subsystem i corresponds to a local model W i consistent in structure with the model W, and the local model W i includes a first sub-model W ci1 provided at the first member C i1 and a second sub-model W ci2 provided at the second member C i2; wherein: the single subsystem i is used for performing joint training in a vertical splitting manner on the local model W i using the training samples vertically split over the first member C i1 and the second member C i2, providing the federal service side, when a synchronization condition is satisfied, with updated values of the parameters to be synchronized that correspond one-to-one to the undetermined parameters in the corresponding local model W i, and synchronizing the local parameters to be synchronized with those of the other subsystems according to the synchronization values of the parameters to be synchronized fed back by the federal service side, thereby adjusting the corresponding undetermined parameters; the federal service side is used for securely synchronizing the updated values of the parameters to be synchronized from the plurality of subsystems and feeding back the synchronization values.
According to a second aspect, a method for jointly updating a model is provided, the method being applicable to a system for jointly updating a model W, the system comprising a federal service side and a plurality of subsystems, a single subsystem i of the plurality of subsystems comprising a first member C i1 and a second member C i2 among the training members, the sample data held by the first member C i1 and the second member C i2 forming a vertical split, the sample data held by the respective subsystems forming a horizontal split, the single subsystem i corresponding to a local model W i consistent in structure with the model W, and the local model W i comprising a first sub-model W ci1 provided at the first member C i1 and a second sub-model W ci2 provided at the second member C i2; the method comprises the following steps: each subsystem performs joint training, in a vertical splitting manner, on its corresponding local model using the training samples vertically split over the corresponding first member and second member, and each training member, when the synchronization condition is satisfied, provides the federal service side with updated values of the parameters to be synchronized that correspond one-to-one to the undetermined parameters in its corresponding sub-model; the federal service side securely synchronizes the updated values of the parameters to be synchronized from the plurality of subsystems and feeds back the synchronization values of the parameters to be synchronized; each training member in each subsystem receives the synchronization values of its local parameters to be synchronized so as to update its local undetermined parameters.
In one embodiment, the single subsystem i further includes a sub-server S i, and the joint training of the local model W i by the single subsystem i includes: for a plurality of samples of the current round, the first member C i1 and the second member C i2 process corresponding local sample data through the first sub-model W ci1 and the second sub-model W ci2 respectively to obtain a corresponding first intermediate result R it1 and a second intermediate result R it2, so as to send the first intermediate result R it1 and the second intermediate result R it2 to the sub-server S i; the sub-service side S i feeds back the gradient of the first intermediate result R it1 and the gradient of the second intermediate result R it2 to the first member C i1 and the second member C i2 respectively based on the processing of the first intermediate result R it1 and the second intermediate result R it2 by the third sub-model W si; the first member C i1 and the second member C i2 each determine gradients of the parameters to be synchronized in the first sub-model W ci1 and the second sub-model W ci2 using the gradients of the first intermediate result R it1 and the gradients of the second intermediate result R it2, thereby determining updated values of the parameters to be synchronized in the first sub-model W ci1 and the second sub-model W ci2, respectively.
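For intuition only, the following is a minimal sketch of one such training round inside subsystem i, using PyTorch and assuming toy fully connected sub-models with a binary classification loss and with the first member holding the labels; all tensor shapes, layer sizes and variable names are illustrative assumptions rather than part of the claimed method.

import torch
import torch.nn as nn

torch.manual_seed(0)

# First member C_i1 holds features x1, second member C_i2 holds features x2.
x1, x2 = torch.randn(8, 4), torch.randn(8, 3)   # vertically split features
y = torch.randint(0, 2, (8, 1)).float()         # labels at the label holder

W_ci1 = nn.Linear(4, 5)    # first sub-model on C_i1
W_ci2 = nn.Linear(3, 5)    # second sub-model on C_i2
W_si = nn.Linear(10, 1)    # third sub-model on sub-server S_i

# Forward: each member computes its intermediate result and sends it to S_i.
R1 = W_ci1(x1)                              # first intermediate result R_it1
R2 = W_ci2(x2)                              # second intermediate result R_it2
R1_srv = R1.detach().requires_grad_(True)   # what S_i actually receives
R2_srv = R2.detach().requires_grad_(True)

# Sub-server S_i processes the intermediate results to get a prediction,
# which is compared with the labels to obtain the model loss.
pred = torch.sigmoid(W_si(torch.cat([R1_srv, R2_srv], dim=1)))
loss = nn.functional.binary_cross_entropy(pred, y)

# Backward on S_i: gradients of the loss w.r.t. the intermediate results
# (this also fills the gradients of W_si's own undetermined parameters).
loss.backward()
g_R1, g_R2 = R1_srv.grad, R2_srv.grad       # fed back to C_i1 and C_i2

# Each member back-propagates locally to obtain the gradients of the
# undetermined parameters in its own sub-model.
R1.backward(g_R1)    # gradients for W_ci1 on C_i1
R2.backward(g_R2)    # gradients for W_ci2 on C_i2

In a real deployment the three parts above run on different parties, and the intermediate results and gradients are exchanged over a network, possibly in encrypted or perturbed form as described in later embodiments.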
In one embodiment, the label holder for several samples of the current round in a single subsystem i is either the first member C i1 or the second member C i2; the processing of the first intermediate result R it1 and the second intermediate result R it2 by the sub-server S i based on the third sub-model W si, feeding back the gradient of the first intermediate result R it1 and the gradient of the second intermediate result R it2 to the first member C i1 and the second member C i2, respectively, further includes: the sub-server S i performs processing on the first intermediate result R it1 and the second intermediate result R it2 based on the third sub-model W si, obtaining a prediction result and sending the prediction result to the label holder; the label holder determines corresponding model loss through the comparison of label data of a plurality of samples of the current round and the prediction result, and feeds the model loss back to the sub-server S i; the sub-server S i determines the gradient for the first intermediate result R it1 and the gradient for the second intermediate result R it2 from the model loss.
In one embodiment, in the case where the third sub-model W si contains undetermined parameters, the sub-server S i also determines the gradient of the model loss with respect to the undetermined parameters contained in the third sub-model W si.
In one embodiment, the label holder for several samples of the current round of single subsystem i is either the first member C i1 or the second member C i2, which is provided with a fourth sub-model W ci3; the processing of the first intermediate result R it1 and the second intermediate result R it2 by the sub-server S i based on the third sub-model W si, feeding back the gradient of the first intermediate result R it1 and the gradient of the second intermediate result R it2 to the first member C i1 and the second member C i2, respectively, further includes: the sub-server S i performs processing on the first intermediate result R it1 and the second intermediate result R it2 based on the third sub-model W si, obtaining a third intermediate result R it3 and sending the third intermediate result R it3 to the label holder; the label holder processes the third intermediate result R it3 through a fourth sub-model W ci3 to obtain a corresponding prediction result, and determines a gradient of model loss aiming at the third intermediate result R it3 based on the comparison of label data of a plurality of samples of the current round and the prediction result, so as to feed back the gradient to the sub-server S i; the sub-server S i determines a gradient for the first intermediate result R it1 and a gradient for the second intermediate result R it2 from the gradient of the third intermediate result R it3.
In one embodiment, the joint training of the local model W i by subsystem i includes: the training members in subsystem i perform multi-party secure computation, so that each training member can determine the gradient of the model loss with respect to its local undetermined parameters; each training member determines the updated values of its parameters to be synchronized based on the gradients of the undetermined parameters in its corresponding sub-model, wherein the first member C i1 and the second member C i2 determine the updated values of the parameters to be synchronized in the first sub-model W ci1 and the second sub-model W ci2, respectively.
In one embodiment, the synchronization conditions include: each local model is updated over a predetermined round or a predetermined period of time.
In one embodiment, the single parameter to be synchronized is a single parameter to be determined, or a single gradient corresponding to the single parameter to be determined.
In one embodiment, the federal service securely synchronizing updated values of parameters to be synchronized from a plurality of subsystems includes: the federal server receives each parameter to be synchronized which is transmitted by each training member and is encrypted in a preset encryption mode; and the federal service side performs at least one of addition, weighted average and median value calculation on the updated values of the parameters to be synchronized to obtain corresponding synchronous values.
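As a simplified sketch of this aggregation step, the snippet below assumes the updated values have already been received and, where applicable, decrypted or recombined into plain arrays; the function name, weighting scheme and shapes are illustrative assumptions only.

import numpy as np

def aggregate(updates, weights=None, mode="weighted_mean"):
    # updates: list of per-subsystem updated values for one parameter to be synchronized
    stacked = np.stack(updates)
    if mode == "sum":
        return stacked.sum(axis=0)
    if mode == "weighted_mean":   # e.g. weighted by per-subsystem sample counts
        w = np.ones(len(updates)) if weights is None else np.asarray(weights, dtype=float)
        return (stacked * (w / w.sum())[:, None]).sum(axis=0)
    if mode == "median":
        return np.median(stacked, axis=0)
    raise ValueError(mode)

# Three subsystems report updates for the same parameter; the returned
# synchronization value is fed back to all of them.
sync_value = aggregate(
    [np.array([0.10, 0.30]), np.array([0.20, 0.10]), np.array([0.00, 0.20])],
    weights=[100, 50, 50],
)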
In one embodiment, the predetermined encryption scheme includes one of: adding a perturbation satisfying the differential privacy; homomorphic encryption; secret sharing.
According to a third aspect, a method for jointly updating a model is provided, the method being applicable to a system for jointly updating a model W, the system comprising a federal service side and a plurality of subsystems, a single subsystem i of the plurality of subsystems comprising a first member C i1 and a second member C i2 among the training members, the sample data held by the first member C i1 and the second member C i2 forming a vertical split, the sample data held by the respective subsystems forming a horizontal split, the single subsystem i corresponding to a local model W i consistent in structure with the model W, and the local model W i comprising a first sub-model W ci1 provided at the first member C i1 and a second sub-model W ci2 provided at the second member C i2; the method is performed by the federal service side and includes: receiving, from the respective subsystems, updated values of the parameters to be synchronized that correspond one-to-one to the undetermined parameters in the corresponding sub-models, provided when the synchronization condition is satisfied, wherein the updated values of the parameters to be synchronized provided by a single subsystem i are determined based on joint training, in a vertical splitting manner, of the corresponding local model W i by the subsystem i; and securely synchronizing the updated values of the parameters to be synchronized from the subsystems, and feeding back the synchronization values of the parameters to be synchronized, so that the corresponding training members or sub-servers can complete the update of the parameters to be synchronized of the local model.
According to a fourth aspect, a method for jointly updating a model is provided, the method being applicable to a system for jointly updating a model W, the system comprising a federal service side and a plurality of subsystems, a single subsystem i of the plurality of subsystems comprising a first member C i1 and a second member C i2 among the training members, the sample data held by the first member C i1 and the second member C i2 forming a vertical split, the sample data held by the respective subsystems forming a horizontal split, the single subsystem i corresponding to a local model W i consistent in structure with the model W, and the local model W i comprising a first sub-model W ci1 provided at the first member C i1 and a second sub-model W ci2 provided at the second member C i2; the method is performed by the first member C i1 and includes: performing joint training on the corresponding local model W i using the training samples that form a vertical split between the local member and the second member C i2, so as to obtain updated values of the parameters to be synchronized that correspond one-to-one to the undetermined parameters in the first sub-model W ci1; when the synchronization condition is satisfied, sending the updated values of the parameters to be synchronized that correspond one-to-one to the undetermined parameters in the first sub-model W ci1 to the federal service side, so that the federal service side can securely synchronize the parameters to be synchronized based on the updated values of the parameters to be synchronized from the respective subsystems; and acquiring from the federal service side the securely synchronized synchronization values of the parameters to be synchronized in the first sub-model W ci1, so as to update the undetermined parameters in the first sub-model W ci1.
In one embodiment, the subsystem i further includes a sub-server S i, and the local model W i corresponding to the single subsystem i further includes a third sub-model W si provided on the sub-server S i; the joint training of the corresponding local model W i using the training samples that form a vertical split between the local member and the second member C i2 includes: for several samples of the current round, processing the corresponding local sample data through the first sub-model W ci1 to obtain a corresponding first intermediate result R it1, and sending the first intermediate result R it1 to the sub-server S i, so that the sub-server S i processes the first intermediate result R it1 and the second intermediate result R it2 based on the third sub-model W si and feeds back the gradient of the first intermediate result R it1, wherein the second intermediate result R it2 is obtained by the second member C i2 processing its corresponding local sample data through the second sub-model W ci2; and using the gradient of the first intermediate result R it1 to determine the gradients of the undetermined parameters in the first sub-model W ci1, thereby determining the updated values of the parameters to be synchronized in the first sub-model W ci1.
In one embodiment, the joint training of the corresponding local model W i using the training samples that form a vertical split between the local member and the second member C i2 includes: performing multi-party secure computation with the other training members in subsystem i, so as to determine the gradient of the model loss with respect to the undetermined parameters in the first sub-model W ci1; and determining the updated values of the corresponding parameters to be synchronized based on the gradients of the undetermined parameters in the first sub-model W ci1.
According to a fifth aspect, an apparatus for jointly updating a model is provided, the apparatus being applicable to the federal service side in a system for jointly updating a model W, the system comprising the federal service side and a plurality of subsystems, a single subsystem i of the plurality of subsystems comprising a first member C i1 and a second member C i2 among the training members, the sample data held by the first member C i1 and the second member C i2 forming a vertical split, the sample data held by the respective subsystems forming a horizontal split, the single subsystem i corresponding to a local model W i consistent in structure with the model W, and the local model W i comprising a first sub-model W ci1 provided at the first member C i1 and a second sub-model W ci2 provided at the second member C i2; the apparatus comprises:
the acquisition unit is configured to receive updated values of all the parameters to be synchronized, which correspond to all the parameters to be determined in the corresponding local model one by one under the condition that the synchronization condition is met, from all the subsystems respectively, wherein the updated values of all the parameters to be synchronized provided by the single subsystem i are determined based on joint training of the subsystem i in a vertical segmentation mode for the corresponding local model W i;
and the synchronization unit is configured to securely synchronize the updated values of the parameters to be synchronized received from the respective subsystems and to feed back the synchronization values of the parameters to be synchronized, so that the corresponding training members or sub-servers can complete the update of the parameters to be synchronized of the local model.
According to a sixth aspect, an apparatus for jointly updating a model is provided, the apparatus being applicable to a system for jointly updating a model W, the system comprising a federal service side and a plurality of subsystems, a single subsystem i of the plurality of subsystems comprising a first member C i1 and a second member C i2 among the training members, the sample data held by the first member C i1 and the second member C i2 forming a vertical split, the sample data held by the respective subsystems forming a horizontal split, the single subsystem i corresponding to a local model W i consistent in structure with the model W, and the local model W i comprising a first sub-model W ci1 provided at the first member C i1 and a second sub-model W ci2 provided at the second member C i2; the apparatus is provided on the first member C i1 and includes:
the training unit is configured to perform joint training on the corresponding local model W i using the training samples that form a vertical split between the local member and the second member C i2, so as to obtain updated values of the parameters to be synchronized that correspond one-to-one to the undetermined parameters in the first sub-model W ci1;
The providing unit is configured to send the updated values of the parameters to be synchronized, which are in one-to-one correspondence with the parameters to be determined in the first sub-model W ci1, to the federal service side so that the federal service side can perform safe synchronization on the parameters to be synchronized based on the updated values of the parameters to be synchronized received from the sub-systems;
and the synchronization unit is configured to acquire the synchronization value of each parameter to be synchronized in the first sub-model W ci1 subjected to the secure synchronization from the federal service side so as to update each parameter to be determined in the first sub-model W ci1.
According to a seventh aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the methods of the third and fourth aspects.
According to an eighth aspect, a computing device is provided, comprising a memory and a processor, characterized in that the memory stores executable code, and the processor, when executing the executable code, implements the method of the third aspect or the fourth aspect.
According to the method, device and system provided by the embodiments of the present specification, in view of the composite way in which data may be split when jointly updating a model, the data of at least some of the training members is partitioned so as to form a plurality of horizontally split subsystems, and a single subsystem may contain training members whose data is vertically split. In this way, a subsystem whose data is vertically split iterates internally using training samples distributed over its training members, thereby updating the parameters to be synchronized. Data synchronization is then performed across the subsystems in synchronization periods triggered by a synchronization condition. The method fully takes into account the data composition of each training member, provides a solution for jointly updating a model under a complex data structure, and helps broaden the range of application of federal learning.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIGS. 1a and 1b are schematic diagrams of horizontal and vertical cuts, respectively, of data in conventional Federal learning;
FIGS. 2a and 2b are schematic diagrams of a data composite cut scene of two specific examples;
FIG. 3a is a schematic diagram of one specific architecture of a system for jointly updating a model in a composite data-splitting scenario under the technical concept of the present specification;
FIG. 3b is a schematic diagram of another specific architecture of a system for jointly updating a model in a composite data-splitting scenario under the technical concept of the present specification;
FIG. 4a shows a schematic diagram of a model architecture of a subsystem corresponding to FIG. 3 a;
FIG. 4b shows a schematic diagram of a model architecture of a subsystem corresponding to FIG. 3 b;
FIG. 5 illustrates a method flow diagram of a joint update model, according to one embodiment;
FIG. 6 illustrates a timing flow diagram for a joint update model, according to one embodiment;
FIG. 7 illustrates a schematic block diagram of an apparatus for joint update of a model, according to one embodiment;
FIG. 8 illustrates a schematic block diagram of an apparatus for joint updating a model, according to one embodiment.
Detailed Description
The following describes the scheme provided in the present specification with reference to the drawings.
Federal learning (federated learning) may also be referred to as federal machine learning, joint learning, federation learning, and the like. Federal machine learning is a distributed machine learning framework that can effectively help multiple institutions use data and perform machine learning modeling while meeting the requirements of user privacy protection, data security and government regulations.
Specifically, assume that enterprise A and enterprise B each want to build a task model, where a single task may be classification or prediction, and these tasks have been approved by the respective users when the data was obtained. However, because the data is incomplete, for example enterprise A lacks label data and enterprise B lacks user feature data, or the data is insufficient and the sample size is too small to build a good model, the model at each end may fail to be built or may perform poorly. The problem federal learning aims to solve is how to build a high-quality model at each of A and B while each enterprise's own data is not disclosed to the others, that is, how to build a common model without violating data privacy regulations. This common model performs as if the parties had aggregated their data to train an optimal model, yet within each party's domain the built model serves only that party's own objectives.
Federal learning may include multiple training members and, if needed, a trusted third party acting as a service party may perform some assistance operations. Each training member may hold different business data. The business data may be various kinds of data such as text, pictures, speech, animation and video. Typically, the business data of the individual training members is correlated.
For example, among multiple business parties involved in medical business, each business party may be a hospital, a physical examination institution, or the like. For instance, business party 1 is hospital A, which takes diagnosis records corresponding to users' age, sex, symptoms, diagnosis results, treatment plans, treatment results and the like as its local business data, and business party 2 may be physical examination institution B, which takes physical examination records corresponding to users' age, sex, symptoms, physical examination conclusions and the like as its local business data. Each business party holding hospital data, physical examination institution data, etc. can act as a training member and perform federal learning of models such as disease risk prediction.
Federal learning typically has two data distribution architectures, a horizontal split architecture and a vertical split architecture. Fig. 1a and 1b show these two distribution architectures, respectively.
Fig. 1a shows a horizontal splitting architecture. In the horizontal splitting architecture, a single sample is held entirely by a single data party, and the samples held by different data parties are mutually independent. As in fig. 1a, data party 1 holds the label data of sample 1 (e.g., label 1) and all of its feature data (e.g., feature A1 + B1), and data party 2 holds the label data of sample 2 (e.g., label 2) and all of its feature data (e.g., feature A2 + B2). Feature A and feature B can be regarded as two types of features; in practice there may be more feature types, and a single data party may hold more sample data, which is not described further here. As shown in fig. 1a, with one row representing one sample, the samples of the data parties are completely independent and can be separated along a straight line in the horizontal direction, hence the name horizontal (or transverse) splitting architecture. For example, each bank data party holds basic feature data such as the age and sex of its users (e.g., type A data), asset-related feature data such as balance, transaction flow, loans and repayments (e.g., type B data), and label data indicating whether the user carries a financial risk. That is, the sample data held by different data parties has different sample spaces and the same feature space.
Fig. 1b shows a vertical splitting architecture. Under the vertical splitting architecture, the sample data of a single sample is held by multiple data parties, and a single data party holds only part of the data of each sample. As shown in fig. 1b, data party 1 holds the label data of each sample (e.g., label 1, label 2, etc. of sample 1, sample 2, etc.) and part of the feature data (e.g., type A feature data, denoted for each sample as feature A1, feature A2, etc.), and data party 2 holds another part of the feature data of each sample (e.g., type B feature data, denoted for each sample as feature B1, feature B2, etc.). As shown in fig. 1b, a single data party cannot hold a complete sample; with one row representing one sample, the data of the data parties combine horizontally to form complete sample data and can be completely separated along a straight line in the vertical direction, hence the name vertical (or longitudinal) splitting architecture. For example, data party 1 is a bank whose type A feature data is asset-related feature data, and data party 2 is a shopping website whose type B feature data is shopping-related feature data such as the user's product browsing records, search records, purchase records and payment channels. That is, the sample data held by different data parties has the same sample space and different feature spaces. In practice, the sample data of a single sample may also include more feature data held by more data parties; fig. 1b is only an example.
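As a toy illustration of these two conventional splits (and not of the composite scenario addressed later), consider a small table with two feature types and a label; the arrays and party assignments below are purely hypothetical.

import numpy as np

# Full (virtual) data: one row per sample, columns = [feature A, feature B, label].
full = np.array([[1.0, 10.0, 0.0],
                 [2.0, 20.0, 1.0],
                 [3.0, 30.0, 0.0],
                 [4.0, 40.0, 1.0]])

# Horizontal split: each party holds complete rows
# (same feature space, different sample spaces).
party1_horizontal, party2_horizontal = full[:2], full[2:]

# Vertical split: each party holds columns of all samples
# (same sample space, different feature spaces).
party1_vertical = full[:, [0, 2]]   # feature A plus the label for every sample
party2_vertical = full[:, [1]]      # feature B for every sample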
For the horizontal and vertical splitting architectures shown in figs. 1a and 1b, conventional federal learning typically proceeds as follows. Under the horizontal splitting architecture, a parallel synchronous training method can be adopted: the training members hold the same neural network structure and train with the assistance of a third party C; during training they provide data such as gradients or updated values of the undetermined parameters to the third party C, and the synchronized gradients or undetermined parameters are computed with the assistance of the third party C and transmitted back to the training members. In a vertical scenario, MPC (multi-party secure computation) or a split learning model is generally adopted. MPC places the machine learning computation in the ciphertext domain; in split learning, the training members hold the first few layers of the overall neural network structure and a server holds the remaining layers. The training members each train locally with their private data to obtain the outputs of the first few layers of the network model, transmit these outputs to the server for forward propagation through the remaining layers, and the model is updated by back-propagating gradient data derived from the model loss.
However, in real business scenarios, not all federal learning data can be split according to the horizontal or vertical splitting architectures shown in figs. 1a and 1b; the data may contain both horizontal and vertical splits.
Fig. 2a shows the data architecture of a simple example of a hybrid split. The data structure comprises a plurality of horizontally split data parties and a further data party that forms a vertical split with them. In fig. 2a, the data parties Ai (i = 1, 2, …, n) represent data parties, such as banks, that hold type X feature data and label data; the type X feature data, such as asset-related feature data, forms a horizontal split among data party A1, data party A2, …, data party An. Data party B represents a data party with type Z features, such as a shopping website, where feature Z represents shopping-related feature data, so that a vertical splitting architecture is formed between data party B and each data party A.
In practice, the way the data is split may be even more complex, as shown in fig. 2b, which involves a number of possible situations. Fig. 2b shows the data architecture with samples arranged in rows, where each frame represents the data of one data party and vertical correspondence (i.e., the same column) represents the same features. It can be seen that the data of the data parties is intricately related, with horizontal and vertical splits nested and crossed. For example, part of data party 4 and part of data party 6 form a vertical split, another part of that data and part of data party 9 form a vertical split, another part of data party 9 and part of data party 5 form a vertical split, while that part of data party 5 and data party 4 form a horizontal split, and the above data and the other part of data party 5 again form a horizontal split, and so on.
For such a complex data architecture, the individual data parties may not be able to co-train a model as training members using conventional federal learning. In view of this, the present specification proposes a novel distributed federal learning concept to handle sample data with such a hybrid splitting architecture. Under the technical concept of the present specification, the data parties of federal learning are first divided. In the case shown in fig. 2a, each sub-service party may correspond one-to-one to a data party Ai (as a first member), while data party B is divided into a plurality of second members, each of which contains the sample data of training samples consistent with the corresponding data party Ai. A single data party Ai, together with the part of data party B containing the other data of the corresponding sample subjects, can be regarded as one subsystem. In the case shown in fig. 2b, the data of the respective data parties is divided into batches by dotted lines such as dotted line 201. The batches of data form horizontal splits with respect to each other, while a single batch of data may internally appear as a vertical split, or a party may hold a small amount of complete sample data alone (e.g., the sample data held by data parties 11 and 12 in fig. 2b), and so on. In short, the partitioned data includes at least one group of vertically split data parties. Where the data of one data party forms vertical splits with the data of several different data parties, the intersections can be determined by means such as private set intersection (PSI), so that the samples are aligned and split accordingly. The specific private intersection method is determined according to the business requirements and is not described further here.
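The sample alignment mentioned above can be pictured with the plaintext sketch below; a real system would use private set intersection (PSI) so that identifiers outside the intersection are not revealed to the other party, so this snippet (with made-up identifiers) only shows what the aligned result looks like, not the privacy-preserving protocol itself.

# IDs held by two data parties that form a vertical split (hypothetical values).
ids_party_a = ["u01", "u02", "u03", "u05"]   # e.g. a bank's user identifiers
ids_party_b = ["u02", "u03", "u04", "u05"]   # e.g. a platform's user identifiers

# Intersection of sample subjects; in practice computed via PSI.
common = sorted(set(ids_party_a) & set(ids_party_b))
print(common)   # ['u02', 'u03', 'u05']

# Each party then orders its local feature rows by `common`, so that the i-th
# row on either side refers to the same sample subject.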
With this idea in mind, please refer to figs. 3a and 3b, which are schematic diagrams of two specific distributed federal learning architectures of the system for jointly updating a model in the present specification. Functionally divided, the system for jointly updating a model described in the present specification comprises: a federal service side, and subsystems surrounded by dashed boxes such as dashed boxes 3011 and 3012. The individual subsystems can be mutually independent in terms of system function and model arrangement. In other words, the individual subsystems can be viewed as parallel, mutually split "training members". The federal service side can be used to synchronize the parameters to be synchronized in the global model W. A parameter to be synchronized may typically be an undetermined parameter of the model W, or the gradient of an undetermined parameter; the parameters to be synchronized and the undetermined parameters correspond one to one. It is readily appreciated that, assuming the number of subsystems is n (n being a positive integer greater than 1), any single subsystem i (1 ≤ i ≤ n) may correspond to a local model W i that is structurally consistent with the global model W. As shown in figs. 2a and 2b, depending on the actual data distribution, at least one of the subsystems performs split federal learning on vertically split sample data, such as a subsystem performing split federal learning on the sample data formed by data party 5, data party 9 and data party 10 in fig. 2b.
In the architectures shown in figs. 3a and 3b, only the subsystems that perform split federal learning on vertically split sample data are shown. In practice, cases similar to data parties 11 and 12 in fig. 2b, where a data party is separately configured to form a subsystem on its own, may also be included and are not described further here.
As can be seen in connection with the schematic of fig. 2a, 2b, in the system architecture shown in fig. 3a, 3b, a single training member may be a data party or may be part of a single data party. That is, a data party may be separated into multiple subsystems as corresponding training members according to the data provided. Thus, a single training member in a single subsystem shown in fig. 3a or 3b may represent one data party, or may represent a portion of one data party, or multiple training members may be from the same data party. As shown in fig. 3a and 3b, the difference is that the single subsystem shown in fig. 3a includes at least two training members, and the single subsystem shown in fig. 3b includes a sub-server in addition to at least two training members.
In the implementation architecture shown in fig. 3a, model training may be performed between the individual training members through multi-party secure computation (MPC). Assuming that subsystem i (i = 1, 2, …, n) is a subsystem that performs split federal learning on vertically split sample data, with two training members holding the vertically split data, denoted for example as C i1 and C i2, the local model W i is split into several parts, for example including a sub-model W ci1 disposed on training member C i1 and a sub-model W ci2 disposed on training member C i2. In this case, taking a neural network as the jointly trained model, the distribution of sub-models inside the subsystem can be as shown in fig. 4a, where the features, weight parameters and neural network of the gray part are held by data party C i1, and those of the black part are held by data party C i2. Because the data on training member C i1 and training member C i2 forms a vertical split, and the computation at a single data party may require data held by the other party, training member C i1 and training member C i2 can exchange data by means such as homomorphic encryption and secret sharing, and jointly train the local model W i corresponding to subsystem i without revealing their local private data.
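As a minimal sketch of the secret-sharing primitive mentioned here (not of the full MPC training protocol), the snippet below shows additive sharing of a private value between two parties; real schemes typically work over a finite ring or field, so the floating-point version here is an illustrative assumption.

import numpy as np

rng = np.random.default_rng(0)

def share(x, n_parties=2):
    # Split x into n additive shares that sum back to x.
    shares = [rng.normal(size=np.shape(x)) for _ in range(n_parties - 1)]
    shares.append(np.asarray(x, dtype=float) - sum(shares))
    return shares

def reconstruct(shares):
    return sum(shares)

secret = np.array([0.5, -1.2])      # e.g. a private intermediate value on C_i1
s1, s2 = share(secret)              # s1 kept locally, s2 sent to C_i2
assert np.allclose(reconstruct([s1, s2]), secret)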
In the implementation architecture shown in fig. 3b, subsystem i may correspond to one sub-server, denoted for example as S i, and at least two training members holding the vertically split data, denoted for example as C i1 and C i2. Accordingly, the local model W i is partitioned into several parts, for example including the sub-model W ci1 set on training member C i1, the sub-model W ci2 set on training member C i2, and the sub-model W si set on the sub-server S i of the subsystem. In this case, the architecture of the models in subsystem i may be as shown in fig. 4b, where the sub-models W ci1 and W ci2 are parallel sub-models connected in series with the sub-model W si. In the split federal learning process, the sub-models W ci1 and W ci2 respectively process the portions of the current batch of samples distributed on training members C i1 and C i2, and then send the resulting intermediate results to the sub-server S i; the sub-model W si on the sub-server processes these intermediate results to obtain the prediction results.
For the implementation architecture of fig. 3b, from the device and data attribution: the federal service party and each sub-service party may belong to the same trusted third party, or may be provided in the same distributed device cluster, or may belong to different trusted third parties, which is not limited herein.
It should be noted that fig. 3a and 3b are examples, but are not exhaustive, of the system for jointly updating models in the present specification. In practice, the system of joint update models may also be set up in other ways. For example, some of the subsystems include a sub-server as shown at 3012, some of the subsystems do not include a sub-server as shown at 3011, and so on.
In figs. 3a and 3b, some or all of the n training members C i1 (i = 1, 2, …, n) may be the same data party or may belong to different data parties; similarly, some or all of the n training members C i2 (i = 1, 2, …, n) may be the same data party or may belong to different data parties; this is not limited here. For example, in the scenario illustrated in fig. 2a, the training members C 12, C 22, …, C n2 may all indicate data party B. It should be noted that, in this case, data party B may deploy a plurality of sub-models W ci2, or may deploy a single sub-model W c2 serving as the sub-model W ci2 shared by the subsystems. For clarity and generality, a number of sub-models W ci2 are still described below, each of which may indicate the same sub-model in some alternative examples. In a single subsystem, training member C i1 may, for example, be referred to as a first member, and training member C i2 as a second member.
The model W may be determined through negotiation among the training members. Each local model W i has a structure consistent with W, e.g., a consistent number of neural network layers, weight matrices, etc. The individual local models may vary slightly depending on the respective subsystems. Inside a subsystem, the distribution of the sub-models of the local model W i over the members may be determined according to the feature dimensions held by each member, which is not described further here. In an alternative embodiment, where the local model W i is identical to the global model W to be jointly updated, W may be initialized by the federal service side and issued to the various subsystems to obtain W i. For example, under the architecture of fig. 3a, the federal service side may split W so that W i is divided into sub-models for the individual data parties. Under the architecture shown in fig. 3b, the federal service side initializes W and issues it to each sub-server to obtain W i. The sub-server S i splits the model W into the sub-model W ci of the training members and the sub-model W si on the sub-server, and splits W ci into W ci1, W ci2 and so on according to the feature dimensions of the training members. According to one embodiment, the local model W i may also be modified on the basis of W according to actual business requirements, for example by adding a constant term parameter, and the sub-server S i may split the modified W i.
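The feature-wise splitting of the member sub-models can be sketched as follows, assuming for illustration that only the first (input) layer of W needs to be split between the two members and that the feature counts are known; the dimensions are made-up.

import numpy as np

n_feat_c1, n_feat_c2, hidden = 4, 3, 5
# Input-layer weights of W as initialized and issued by the federal service side.
W_first_layer = np.random.randn(n_feat_c1 + n_feat_c2, hidden)

# Split along the feature dimension: each member receives the weights
# corresponding to its own features.
W_ci1_part = W_first_layer[:n_feat_c1, :]   # issued to first member C_i1
W_ci2_part = W_first_layer[n_feat_c1:, :]   # issued to second member C_i2
# The remaining layers would form the sub-model W_si kept on sub-server S_i.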
Further, from the data interaction perspective, inside the subsystem: under the implementation architecture of fig. 3a, secure interactions are performed between the data parties within a single subsystem; under the implementation architecture of fig. 3b, each member may interact with the sub-service side respectively, and the members remain independent from each other, for example, after each member processes local data by using a local sub-module, each member may provide an intermediate processing result to the sub-service side, and the sub-service side may feed back gradient data of the corresponding intermediate result to each member in the subsystem. Overall, the method is characterized in that: each training member in the single subsystem in fig. 3a can interact data to be synchronized with the federal server, and each sub-server and each training member in fig. 3b can interact data to be synchronized with the federal server. As indicated by the double-headed arrow dashed lines in fig. 3a, 3b, each sub-server and/or each training member may send a local update value of the parameter to be synchronized to the federal server, which feeds back the synchronization value of the parameter to be synchronized to it.
It should be noted that in fig. 3a or fig. 3B, if the architecture of fig. 2a corresponds, the multiple sub-models W ci2 may all be provided on one data side B, or in the case of the same sub-model on the data side B, the data side B may perform the relevant parameter synchronization locally, without performing the synchronization via the federal service side.
Through the distributed arrangement of the service parties, this system architecture divides the data parties of the hybrid architecture according to the training mode, forming a horizontal splitting architecture among the subsystems and a vertical splitting architecture inside each subsystem. The overall federal learning system and the internal sub-federal-learning systems thus cooperate with each other, and by combining a split learning algorithm with a parallel synchronous learning algorithm, distributed training of the model is realized in scenarios where vertical and horizontal splits are combined, providing a corresponding solution for more complex federal learning scenarios.
The technical idea of the present specification will be described in detail below taking an operation performed in one parameter synchronization period by the system of the joint update model as shown in fig. 3a or 3b as an example.
It will be appreciated that the process performed by the system of joint update models may include several small loops within the subsystem and one large loop of the overall system during one parameter synchronization period. FIG. 5 shows a flow diagram of a joint update model according to an embodiment of the present description.
As shown in fig. 5, the flow includes the steps of: step 501, each subsystem performs joint training in a vertical segmentation mode on a corresponding local model by using training samples vertically segmented on a corresponding first member and a corresponding second member, and each training member provides updated values of each parameter to be synchronized, which are in one-to-one correspondence with each parameter to be determined, in a corresponding sub-model to a federal service side under the condition that synchronization conditions are met; step 502, the federal service side performs secure synchronization on updated values of parameters to be synchronized from a plurality of subsystems, and feeds back synchronization values of the parameters to be synchronized; in step 503, each training member in each subsystem receives the synchronization value of the local pending synchronization parameter to update the local pending parameter.
Firstly, in step 501, each subsystem performs joint training in a vertical segmentation mode for a corresponding local model by using training samples vertically segmented on a corresponding first member and a corresponding second member, and each training member provides an updated value of each parameter to be synchronized corresponding to each parameter to be determined in a corresponding sub-model to a federal service side under the condition that a synchronization condition is satisfied.
It can be appreciated that federal learning in the vertical splitting manner is performed on the corresponding local model using vertically split training samples. The training architecture in this manner is shown in fig. 4a or fig. 4b. In one training period of the subsystem, for a batch of training samples, the undetermined parameters in the sub-models may be updated via forward transfer of data and gradient back-propagation between the training members, or between the training members and the sub-server.
As shown in fig. 4a, under the architecture shown in fig. 3a, since there is no assistance from a sub-server, forward data transfer and gradient back-propagation are completed between the training members through multi-party secure computation, and each training member obtains the gradients of the undetermined parameters contained in its corresponding sub-model.
To more clearly describe the technical solution of the present specification, fig. 6 shows a timing diagram of operations performed by any one of the vertically split subsystems (such as the system shown in fig. 4 b) in the system of the joint update model corresponding to fig. 3b in one parameter synchronization period. The dashed box 601 is a processing flow of federal learning in the vertical segmentation mode of the subsystem. The operation in this step 501 is described below in connection with the step shown in the dashed box S601 of fig. 6.
As shown in dashed box S601 of fig. 6, in S6011, for several samples of the current round, the first member and the second member each process their local data through their local sub-models and pass the intermediate results to the sub-server. Taking subsystem i as an example, the first member C i1 and the second member C i2 process the corresponding local sample data through the first sub-model W ci1 and the second sub-model W ci2 respectively to obtain a corresponding first intermediate result R it1 and second intermediate result R it2, and send the first intermediate result R it1 and the second intermediate result R it2 to the sub-server S i.
The sample data processed by the first member C i1 and the second member C i2 are in one-to-one correspondence, that is, the sample subjects are consistent; for example, local data is processed in an order where the same position corresponds to the same sample identifier. The sample subject is determined according to the specific business scenario, and a sample identifier such as an identity card number or a mobile phone number can be used to uniquely mark the sample subject. The sample data shared between the first member C i1 and the second member C i2 may be determined by means such as private set intersection when the subsystems are partitioned. In the current federal learning process, the order of the sample data of the current batch can be determined by the sub-server S i, or agreed between the first member C i1 and the second member C i2 in some other way; corresponding data processing is carried out through the first sub-model W ci1 and the second sub-model W ci2 in that order to obtain the respective intermediate results, ensuring that the first and second intermediate results of each training sample correspond to each other. The individual data items in an intermediate result may be delivered to the sub-server S i in a form with an agreed order, such as a vector or an array. As can be seen from fig. 6, the data processing and intermediate-result sending of the first member C i1 and the second member C i2 are independent of each other, so that private data is not leaked. When sending the intermediate results, in order to strengthen privacy protection, the first member C i1 and the second member C i2 may further add perturbation to the intermediate results by means of differential privacy, or encrypt the intermediate results by means such as homomorphic encryption or secret sharing.
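A possible way of adding such perturbation before sending an intermediate result is sketched below, assuming Gaussian-mechanism-style noise with per-sample clipping; the clipping bound and noise scale are illustrative and would in practice be chosen from the privacy budget.

import numpy as np

def perturb_intermediate(r, clip_norm=1.0, noise_std=0.1, seed=0):
    rng = np.random.default_rng(seed)
    r = np.asarray(r, dtype=float)
    # Clip each sample's row so that a single sample's contribution is bounded.
    norms = np.maximum(np.linalg.norm(r, axis=1, keepdims=True), 1e-12)
    clipped = r * np.minimum(1.0, clip_norm / norms)
    # Add noise before the intermediate result leaves the training member.
    return clipped + rng.normal(scale=noise_std, size=clipped.shape)

R_it1 = np.random.randn(8, 5)             # first intermediate result on C_i1
R_it1_sent = perturb_intermediate(R_it1)  # what is actually sent to S_i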
Further, in S6012, the sub-server S i feeds back the gradient of the first intermediate result R it1 and the gradient of the second intermediate result R it2 to the first member C i1 and the second member C i2, respectively, based on processing the first intermediate result R it1 and the second intermediate result R it2 with the third sub-model W si. In order to determine the gradients of the first and second intermediate results, the prediction result for the training sample needs to be determined first. The prediction result is the business result of the business to be predicted, and its accuracy can be checked against the sample label. The sample label may typically be held by some of the training members in the current subsystem, but this does not exclude the possibility that the training members provide the sample label to the sub-server S i or that the sample label is held by the sub-server S i.
In one embodiment, the sub-server S i may obtain the sample label, at this time, the sub-server may obtain the prediction result, and complete the comparison between the sample label and the prediction result, so as to determine the model loss.
In another embodiment, the sample tag may also be held by a portion of the training members.
In one case, as shown in fig. 6, the sub-server S i may send the prediction result to the training member (e.g., the first member C i1) that holds the sample label, and that training member compares the sample label with the prediction result to determine the model loss and feeds it back to the sub-server S i. Fig. 6 shows one of the ways of determining the model loss, and the corresponding timing is therefore indicated by broken lines. In the case where several training members each hold part of the sample labels, the sub-server S i may also send the prediction result data of the corresponding training samples to each training member for comparison to determine the model loss, which is not described further here. The model loss is set according to the specific business scenario and may take various forms such as mean square error, cross entropy and cosine similarity. Typically, the current model loss is the sum of the model losses caused by the current batch of training samples, and the number of training samples in the current batch may be 1 or more (e.g., 100). The sub-server S i may further determine the gradient of each intermediate result based on the model loss, i.e., the partial derivative of the model loss with respect to that intermediate result. By the definition of partial derivatives, gradients are transitive; in order to determine the gradient of each undetermined parameter in the sub-models of the training members, the gradient of the intermediate result determined by those undetermined parameters can be determined first. Thus, the sub-server S i may determine the gradients of the first and second intermediate results corresponding to the first and second members, respectively, and transmit each back to the corresponding training member. Optionally, where there are undetermined parameters in the sub-model W si, the sub-server S i may also determine the gradients of the undetermined parameters in the sub-model W si locally so as to update the local parameters.
In another case, the label holder may further be provided with a third sub-model W ci3. The sub-server S i then obtains a third intermediate result R it3 by processing the first intermediate result R it1 and the second intermediate result R it2 with the sub-model W si, and sends the third intermediate result R it3 to the label holder, where it is processed by the third sub-model W ci3 to determine the prediction result for the training sample. The label holder then calculates the model loss from the prediction result and passes the gradient of the third intermediate result back to the sub-server S i. From this, the sub-server S i determines the gradients of the model loss with respect to the first intermediate result R it1 and the second intermediate result R it2.
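For the variant in which the label holder carries a third sub-model W ci3, a correspondingly hedged sketch (again assuming a linear sub-model and a mean-square-error loss, with illustrative names) of the label holder's step is given below; it returns the gradient of the third intermediate result, which is passed back to the sub-server S i.

```python
import numpy as np

def label_holder_backward(r3, w_c3, labels):
    """Apply the third sub-model W_ci3 to R_it3, compute the model loss and
    the gradient of the loss with respect to R_it3."""
    pred = r3 @ w_c3                              # prediction of the third sub-model
    loss = np.mean((pred - labels) ** 2)          # model loss computed at the label holder
    d_pred = 2.0 * (pred - labels) / pred.size
    d_r3 = d_pred @ w_c3.T                        # gradient w.r.t. R_it3, returned to S_i
    return loss, d_r3
```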
In other cases, the model may have other structures and the gradients may be determined in other ways, which are not detailed here. In short, gradients can be propagated backwards, and the gradient of the model loss with respect to the first and second intermediate results can be determined by the respective sub-server.
On the other hand, according to S6013, the first member C i1 and the second member C i2 use the gradient of the first intermediate result R it1 and the gradient of the second intermediate result R it2, respectively, to determine the gradients of the parameters to be determined in the first sub-model W ci1 and the second sub-model W ci2, and thereby locally update the parameters in the first sub-model W ci1 and the second sub-model W ci2.
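Continuing the same simplified linear setting (illustrative names only, not the only possible implementation), a member that receives the gradient of its intermediate result can back-propagate it through its own sub-model and update the local parameters as sketched below.

```python
def member_backward(x, d_r, w_c, lr=0.01):
    """x: local input batch; d_r: gradient of the intermediate result fed back
    by the sub-server; w_c: local sub-model parameters (numpy array)."""
    d_wc = x.T @ d_r          # chain rule: dL/dW_c = x^T . dL/dR
    w_c -= lr * d_wc          # local gradient-descent update of the pending parameters
    return w_c, d_wc
```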
It should be noted that, in an update iteration of subsystem i, the quantities updated by each participant (each training member in the architecture of fig. 3a, or the sub-server and each member in the architecture of fig. 3b) generally include the current gradients of the parameters to be determined, and the parameters to be determined updated according to those gradients. When the subsystem does not need to synchronize parameters with other subsystems, each participant updates the gradients and the parameters to be determined in turn; when parameter synchronization with other subsystems is required, each participant determines the updated values of its parameters to be synchronized and sends them to the federal service side. Parameters to be synchronized are understood as the parameters that require synchronization among the subsystems; a parameter to be synchronized may be either a gradient or a parameter to be determined.
In the process of jointly updating the model W, a parameter synchronization condition may be preset, for example a predetermined number of iteration rounds (e.g. 5 rounds) or a predetermined duration (e.g. 3 minutes). When the synchronization condition is met, each subsystem may pause its iteration and send the updated values of the currently determined parameters to be synchronized to the federal service side. Under the architecture of fig. 3b, as shown in fig. 6, if the sub-server also holds parameters to be synchronized, the sub-server and the training members jointly send the updated values of their local parameters to be synchronized to the federal server. Since the sub-server may not hold any parameters to be synchronized, the flow in fig. 6 in which the sub-server uploads local parameters to the federal service side is drawn with a dotted line, indicating an option that depends on the actual service situation. It will be appreciated that, under the architecture of fig. 3a, the uploading is similar to that of fig. 6, except that there is no sub-server, so only the training members upload the parameters to be synchronized to the federal service side. During the upload, each participant sends only its local parameters, so data privacy is effectively protected. Optionally, each participant may also protect the local parameters before sending them to the federal service side, for example by adding perturbation (such as perturbation data satisfying differential privacy) or by homomorphic encryption.
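A minimal sketch of how a participant might test the synchronization condition and prepare a perturbed copy of its local parameters for upload is given below; the five-round and three-minute thresholds mirror the examples above, while the Gaussian noise and the function names are illustrative assumptions.

```python
import time
import numpy as np

def should_sync(iter_count, start_time, every_n_iters=5, every_seconds=180):
    """Synchronization condition: a predetermined iteration round or duration."""
    return iter_count % every_n_iters == 0 or time.time() - start_time >= every_seconds

def upload_view(params, noise_scale=0.0):
    """Copy of the local parameters to be synchronized, optionally perturbed
    before being sent to the federal service side."""
    out = {name: np.array(value, dtype=float) for name, value in params.items()}
    if noise_scale > 0.0:
        out = {name: v + np.random.normal(0.0, noise_scale, v.shape) for name, v in out.items()}
    return out
```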
In this way, the synchronization condition controls the parameter synchronization period of each subsystem during the joint update of the model: within a single synchronization period, a single subsystem may perform one or more iterations, and when the synchronization period arrives it feeds the current parameters to be synchronized back to the federal service side. In a single iteration, each training member processes only the data it holds, so data privacy is effectively protected.
Next, through step 502, the federal service side performs secure synchronization on the updated values of the parameters to be synchronized received from the plurality of subsystems, and feeds back the synchronization values of the parameters to be synchronized.
Referring to fig. 6, which includes the sub-server architecture, in S602 the parameter synchronization process fuses, for each parameter to be determined, the updated values of the corresponding parameter to be synchronized received from the subsystems. For a single parameter to be synchronized, the updated values sent by the various subsystems correspond to one another, and the federal service side may take their average, median, maximum, minimum or the like as the synchronization value of that parameter (e.g. the synchronization parameter in fig. 6). When the parameter to be synchronized is a gradient, the synchronization value may also be obtained by summation. The federal service side then feeds the synchronization value of each parameter to be synchronized back to the corresponding participants. Taking the first member C i1 as an example: for a first parameter among the parameters to be synchronized that correspond to the first member C i1, the first member C i1 sends the updated value of the first parameter to the federal server; the federal server synchronizes it with the updated values of the first parameter fed back by the participants of the other subsystems to obtain a first synchronized value, and feeds that first synchronized value back to the first member C i1. In this way, each parameter to be synchronized is transferred only between the relevant participant and the trusted federal service party, which safeguards data privacy.
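As a sketch only (parameter values as numpy arrays keyed by name, one dictionary per subsystem; the function name is illustrative), the fusion performed by the federal service side could be implemented as follows, with summation used when the parameters to be synchronized are gradients.

```python
import numpy as np

def aggregate(updates, mode="mean"):
    """updates: list of {parameter_name: np.ndarray}, one entry per subsystem."""
    names = updates[0].keys()
    if mode == "mean":
        return {n: np.mean([u[n] for u in updates], axis=0) for n in names}
    if mode == "median":
        return {n: np.median([u[n] for u in updates], axis=0) for n in names}
    if mode == "sum":                     # e.g. when the synchronized quantity is a gradient
        return {n: np.sum([u[n] for u in updates], axis=0) for n in names}
    raise ValueError("unsupported aggregation mode: " + mode)
```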
In addition, when the participants of a subsystem encrypt their parameters to be synchronized, for example by homomorphic encryption, the federal service side can also perform the secure synchronization of the data in a homomorphic-encryption manner. The federal service side may feed back the synchronization values to the participants of each subsystem in plaintext, or in a corresponding encrypted form, which is not limited in this specification.
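The specification does not prescribe a particular homomorphic scheme; purely as one possible illustration, the additively homomorphic aggregation could be sketched with the third-party python-paillier (phe) library as below, where each subsystem contributes one encrypted scalar and the federal service side operates only on ciphertexts. Where the private key lives depends on the deployment's trust model.

```python
from phe import paillier

# Key generation; key placement is a deployment decision, not fixed by this specification.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

local_values = [0.12, 0.07, 0.10]                      # one updated value per subsystem
ciphertexts = [public_key.encrypt(v) for v in local_values]

encrypted_sum = sum(ciphertexts[1:], ciphertexts[0])   # addition performed on ciphertexts
encrypted_mean = encrypted_sum * (1.0 / len(ciphertexts))
print(private_key.decrypt(encrypted_mean))             # ~0.0967
```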
Then, via step 503, the sub-server and the members of each subsystem each receive the synchronization values of their local parameters to be synchronized, so as to update the local parameters to be determined. The participants of a single subsystem i include at least the first member C i1 and the second member C i2, and under some architectures also the sub-server S i. In this step 503, referring to S603 in fig. 6, a single participant receives only the synchronization values of the parameters to be synchronized in its own local sub-model. It can be understood that, under the implementation architectures of this specification, a subsystem may or may not include a sub-server, and the sub-server S i may or may not hold parameters to be synchronized; therefore, in S603 of fig. 6, the part related to the sub-server S i is drawn with a dotted line and exists only if the subsystem includes the sub-server S i and the sub-server holds parameters to be synchronized.
When a parameter to be synchronized is a parameter to be determined, the participant may directly overwrite the local parameter to be determined with its synchronization value. When the parameter to be synchronized is the gradient of a parameter to be determined, the synchronization value may be taken as the current gradient and the corresponding parameter to be determined updated by a gradient-based method such as gradient descent or Newton's method.
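The two update modes described in this step can be summarized in a short sketch (the function name and the learning rate are illustrative): overwrite the pending parameter when the synchronized value is the parameter itself, or take one gradient-descent step when it is a gradient.

```python
def apply_sync(local_params, sync_values, sync_is_gradient=False, lr=0.01):
    """Update the local pending parameters from the synchronization values."""
    for name, value in sync_values.items():
        if sync_is_gradient:
            local_params[name] -= lr * value   # gradient-based update (e.g. gradient descent)
        else:
            local_params[name] = value         # direct replacement by the synchronized value
    return local_params
```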
It should be noted that the embodiments of fig. 5 and fig. 6 are described using only a first member and a second member in a single subsystem as an example. Under a more complex composite architecture there may be further members, for example third members, fourth members, and so on. In practice, for a system that jointly updates a model over such composite-cut data, there is at least one subsystem of this kind, i.e. a subsystem containing at least two training members whose sample data constitute a vertical segmentation. In other words, any system for jointly updating a model from which at least one such subsystem can be partitioned can apply the technical solution provided in this specification.
It will be appreciated that the systems shown in figs. 3a and 3b perform the joint model update via the flow described in fig. 5, and the flow of fig. 5 uses the system of fig. 3a or 3b, so the descriptions of the systems of figs. 3a and 3b and of the flow of fig. 5 apply to each other. In particular, fig. 6 is a flow chart corresponding to the architecture of fig. 3b, so the descriptions related to fig. 3b and fig. 6 likewise apply to each other, and are not repeated here.
In addition, from the perspective of the federal service side, the process of jointly updating the model provided in one embodiment of this specification may include: receiving, from each subsystem, the updated values of the parameters to be synchronized that correspond one to one to the parameters to be determined in the corresponding sub-models, provided when the synchronization condition is met, where, in a single subsystem i, the participants (such as the first member C i1 and the second member C i2) respectively provide the updated values of the parameters to be synchronized in the sub-model W si, the first sub-model W ci1 and the second sub-model W ci2 of the local model W i, and the updated values of the parameters to be synchronized in the local model W i are determined through the joint training of subsystem i in the vertical segmentation manner for the corresponding sub-models; and securely synchronizing the updated values of the parameters to be synchronized received from the subsystems and feeding back the synchronization values of the parameters to be synchronized, so that the corresponding training members or sub-servers can complete the update of the parameters to be synchronized in their local sub-models.
In an implementation architecture including a sub-server, from the perspective of the sub-server S i, the flow of jointly updating the model according to one embodiment includes: performing, together with the corresponding first member C i1 and second member C i2, joint training in the vertical segmentation manner for the corresponding local model W i, using the training samples that constitute the vertical segmentation on the first member C i1 and the second member C i2; when the synchronization condition is met, sending the updated values of the parameters to be synchronized, which correspond one to one to the parameters to be determined in the sub-model W si, to the federal service side, so that the federal service side can securely synchronize the parameters to be synchronized based on the updated values received from the subsystems; and acquiring, from the federal service side, the synchronization values of the parameters to be synchronized in the sub-model W si after the secure synchronization, so as to update the parameters to be determined in the sub-model W si.
From the perspective of the first member C i1, the flow of jointly updating the model according to one embodiment includes: performing joint training for the corresponding local model W i by using the training samples that constitute the vertical segmentation on the local side and the second member C i2, so as to obtain the updated values of the parameters to be synchronized, which correspond one to one to the parameters to be determined in the first sub-model W ci1; when the synchronization condition is met, sending the updated values of the parameters to be synchronized corresponding to the parameters to be determined in the first sub-model W ci1 to the federal service side, so that the federal service side can securely synchronize the parameters to be synchronized based on the updated values received from the subsystems; and acquiring, from the federal service side, the synchronization values of the parameters to be synchronized in the first sub-model W ci1 after the secure synchronization, so as to update the parameters to be determined in the first sub-model W ci1.
From the perspective of the second member C i2, the flow of jointly updating the model according to one embodiment includes: performing joint training for the corresponding local model W i by using the training samples that constitute the vertical segmentation on the local side and the first member C i1, so as to obtain the updated values of the parameters to be synchronized, which correspond one to one to the parameters to be determined in the second sub-model W ci2; when the synchronization condition is met, sending the updated values of the parameters to be synchronized corresponding to the parameters to be determined in the second sub-model W ci2 to the federal service side, so that the federal service side can securely synchronize the parameters to be synchronized based on the updated values received from the subsystems; and acquiring, from the federal service side, the synchronization values of the parameters to be synchronized in the second sub-model W ci2 after the secure synchronization, so as to update the parameters to be determined in the second sub-model W ci2.
It should be noted that, since the operations performed by the first member and the second member are of the same kind, the designations "first" and "second" in this specification do not substantively distinguish the corresponding training members; the operations differ only in notation, so a description given for one side applies equally to the other.
Reviewing the above flow, this specification provides a concept of jointly updating a model for a scenario in which the data are composite-cut. A composite segmentation of the data may contain horizontal segmentation as well as vertical segmentation, so conventional federated learning cannot be applied directly. In view of this, this specification contemplates splitting the data of the training members to form multiple horizontally split subsystems, where a single subsystem may contain training members whose data are vertically split. In this way, a single subsystem with vertically split data iterates internally over the training samples distributed among its training members, thereby updating its parameters to be synchronized, and the subsystems synchronize data with one another according to the synchronization period triggered by the synchronization condition. This approach fully considers the data composition of each training member, provides a solution for jointly updating a model under a complex data structure, and helps to extend the application range of federated learning.
According to an embodiment of another aspect, there is also provided respective means adapted for use in a system for joint updating a model as described above. The means for jointly updating the model may include means provided on the federal server, means provided on the sub-server, and means provided on the first member or the second member. Fig. 7 shows a block diagram of the structure of a device provided on the federal service side, and fig. 8 shows a block diagram of the structure of a device commonly used for the sub-service side, the first member, or the second member.
As shown in fig. 7, the apparatus 700 includes an acquisition unit 71 and a synchronization unit 72. The acquisition unit 71 is configured to receive, from each subsystem, the updated values of the parameters to be synchronized that correspond one to one to the parameters to be determined in the corresponding local model, provided when the synchronization condition is met, where the updated values of the parameters to be synchronized provided by a single subsystem i are determined based on the joint training of subsystem i in the vertical segmentation manner for the corresponding local model W i; the synchronization unit 72 is configured to securely synchronize the updated values of the parameters to be synchronized received from the multiple subsystems and feed back the synchronization values of the parameters to be synchronized, so that the corresponding training member or sub-server can complete the update of the parameters to be synchronized in its local sub-model.
As shown in fig. 8, the apparatus 800 includes a training unit 81, a providing unit 82, and a synchronizing unit 83.
In the case where the apparatus 800 is an apparatus for jointly updating the model provided on the sub-server S i: the training unit 81 is configured to perform, together with the corresponding first member C i1 and second member C i2, joint training (which may also be referred to as federated learning) in the vertical segmentation manner for the corresponding local model W i, using the training samples that constitute the vertical segmentation on the first member C i1 and the second member C i2; the providing unit 82 is configured to send the updated values of the parameters to be synchronized, which correspond one to one to the parameters to be determined in the sub-model W si, to the federal service side, so that the federal service side performs secure synchronization on the parameters to be synchronized based on the updated values received from the multiple subsystems; and the synchronization unit 83 is configured to acquire, from the federal service side, the synchronization values of the parameters to be synchronized in the sub-model W si after the secure synchronization, so as to update the parameters to be determined in the sub-model W si.
In the case where the apparatus 800 is an apparatus for jointly updating the model provided on the first member C i1 (or the second member C i2): the training unit 81 is configured to perform joint training in the vertical segmentation manner for the corresponding local model W i, using the training samples that constitute the vertical segmentation on the local side and the second member C i2 (or the first member C i1); the providing unit 82 is configured to send the updated values of the parameters to be synchronized, which correspond one to one to the parameters to be determined in the first sub-model W ci1 (or the second sub-model W ci2), to the federal service side, so that the federal service side performs secure synchronization on the parameters to be synchronized based on the updated values received from the subsystems; the synchronization unit 83 is configured to acquire, from the federal service side, the synchronization values of the parameters to be synchronized in the first sub-model W ci1 (or the second sub-model W ci2) after the secure synchronization, so as to update the parameters to be determined in the first sub-model W ci1 (or the second sub-model W ci2).
It should be noted that, the apparatus 700 shown in fig. 7 and the apparatus 800 shown in fig. 8 are respectively embodiments of apparatuses provided on a federal service side and training members in the method embodiment shown in fig. 5, so as to implement functions of corresponding service sides. Accordingly, the corresponding description in the method embodiment shown in fig. 5 applies equally to the apparatus 700 or the apparatus 800, and will not be repeated here.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the operations corresponding to any of the parties in the methods described in connection with fig. 5, 6.
According to an embodiment of still another aspect, there is further provided a computing device including a memory and a processor, where the memory stores executable code, and the processor, when executing the executable code, implements operations corresponding to any one of the parties in the methods described in connection with fig. 5 and 6.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the embodiments of the present disclosure may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The foregoing detailed description has further described the objects, technical solutions and advantageous effects of the technical concept of the present specification, and it should be understood that the foregoing is merely a specific embodiment of the technical concept of the present specification, and is not intended to limit the scope of the technical concept of the present specification, and any modifications, equivalent substitutions, improvements, etc. made on the basis of the technical solutions of the embodiments of the present specification should be included in the scope of the technical concept of the present specification.

Claims (19)

1. A system for jointly updating a model, which comprises a federal service side and a plurality of subsystems, wherein the subsystems are used for jointly updating the model W, a single subsystem i in the subsystems comprises a first member C i1 and a second member C i2 in training members, sample data held by the first member C i1 and the second member C i2 form vertical segmentation, sample data held by each subsystem form horizontal segmentation, the single subsystem i corresponds to a local model W i consistent with the structure of the model W, and the local model W i comprises a first sub-model W ci1 arranged on the first member C i1 and a second sub-model W ci2 arranged on the second member C i2; wherein:
The single subsystem i is used for carrying out joint training in a vertical segmentation mode by utilizing training samples vertically segmented on a first member C i1 and a second member C i2 aiming at a local model W i, providing updated values of all to-be-synchronized parameters corresponding to all to-be-determined parameters in a corresponding local model W i to the federal server under the condition that synchronization conditions are met, and carrying out synchronization of the local to-be-synchronized parameters and the to-be-synchronized parameters in all subsystems according to the synchronization values of all to-be-synchronized parameters fed back by the federal server so as to adjust the corresponding to-be-determined parameters;
The federal service side is used for carrying out safe synchronization on updated values of parameters to be synchronized from a plurality of subsystems and feeding back synchronization values.
2. A method for jointly updating a model, the method is suitable for a process of jointly updating a model W of a system of the model, the system comprises a federal service side and a plurality of subsystems, a single subsystem i in the plurality of subsystems comprises a first member C i1 and a second member C i2 in training members, sample data held by the first member C i1 and the second member C i2 form vertical segmentation, sample data held by each subsystem form horizontal segmentation, the single subsystem i corresponds to a local model W i consistent with the structure of the model W, and the local model W i comprises a first sub-model W ci1 arranged on the first member C i1 and a second sub-model W ci2 arranged on the second member C i2; the method comprises the following steps:
each subsystem performs joint training in a vertical segmentation mode by using training samples vertically segmented on the corresponding first member and the second member for the corresponding local model, and each training member provides updated values of each parameter to be synchronized, which are in one-to-one correspondence with each parameter to be determined, in the corresponding sub model for the federal service side under the condition that the synchronization condition is met;
The federal service side performs safe synchronization on updated values of parameters to be synchronized from a plurality of subsystems and feeds back the synchronized values of the parameters to be synchronized;
Each training member in each subsystem receives the synchronization value of the local pending synchronization parameter to update the local pending parameter.
3. The method of claim 2, the single subsystem i further comprising a sub-server S i, the joint training of the local model W i by the single subsystem i comprising:
For a plurality of samples of the current round, the first member C i1 and the second member C i2 process corresponding local sample data through the first sub-model W ci1 and the second sub-model W ci2 respectively to obtain a corresponding first intermediate result R it1 and a second intermediate result R it2, so as to send the first intermediate result R it1 and the second intermediate result R it2 to the sub-server S i;
The sub-service side S i feeds back the gradient of the first intermediate result R it1 and the gradient of the second intermediate result R it2 to the first member C i1 and the second member C i2 respectively based on the processing of the first intermediate result R it1 and the second intermediate result R it2 by the third sub-model W si;
The first member C i1 and the second member C i2 each determine gradients of the parameters to be synchronized in the first sub-model W ci1 and the second sub-model W ci2 using the gradients of the first intermediate result R it1 and the gradients of the second intermediate result R it2, thereby determining updated values of the parameters to be synchronized in the first sub-model W ci1 and the second sub-model W ci2, respectively.
4. A method according to claim 3, wherein the label holder of several samples of the current round in a single subsystem i is either a first member C i1 or a second member C i2; the processing of the first intermediate result R it1 and the second intermediate result R it2 by the sub-server S i based on the third sub-model W si, feeding back the gradient of the first intermediate result R it1 and the gradient of the second intermediate result R it2 to the first member C i1 and the second member C i2, respectively, further includes:
The sub-server S i performs processing on the first intermediate result R it1 and the second intermediate result R it2 based on the third sub-model W si to obtain a prediction result, and sends the prediction result to the label holder;
The label holder determines corresponding model loss through the comparison of label data of a plurality of samples of the current round and the prediction result, and feeds the model loss back to the sub-server S i;
The sub-server S i determines the gradient for the first intermediate result R it1 and the gradient for the second intermediate result R it2 from the model loss.
5. The method of claim 4, wherein, in the case that the third sub-model W si contains undetermined parameters, the sub-server S i further determines the gradient of the model loss for the undetermined parameters contained in the third sub-model W si.
6. A method according to claim 3, wherein the label holder of several samples of the current round of a single subsystem i is either the first member C i1 or the second member C i2, said label holder being provided with a fourth sub-model W ci3; the processing of the first intermediate result R it1 and the second intermediate result R it2 by the sub-server S i based on the third sub-model W si, feeding back the gradient of the first intermediate result R it1 and the gradient of the second intermediate result R it2 to the first member C i1 and the second member C i2, respectively, further includes:
The sub-server S i performs processing on the first intermediate result R it1 and the second intermediate result R it2 based on the third sub-model W si to obtain a third intermediate result R it3, and sends the third intermediate result R it3 to the label holder;
The label holder processes the third intermediate result R it3 through a fourth sub-model W ci3 to obtain a corresponding prediction result, and determines a gradient of model loss aiming at the third intermediate result R it3 based on the comparison of label data of a plurality of samples of the current round and the prediction result, so as to feed back the gradient to the sub-server S i;
The sub-server S i determines a gradient for the first intermediate result R it1 and a gradient for the second intermediate result R it2 from the gradient of the third intermediate result R it3.
7. The method of claim 2, wherein the joint training of the local model W i by subsystem i comprises:
Each training member in the subsystem i carries out multi-party secure computation, so that each training member can determine the gradient of the model loss with respect to its local undetermined parameters;
Each training member determines an updated value of the parameter to be synchronized based on the gradient of the parameter to be determined in the corresponding sub-model, wherein the first member C i1 and the second member C i2 determine the updated value of the parameter to be synchronized in the first sub-model W ci1 and the second sub-model W ci2, respectively.
8. The method of claim 2, wherein the synchronization condition comprises: each local model is updated over a predetermined round or a predetermined period of time.
9. The method of claim 2, wherein the single parameter to be synchronized is a single parameter to be determined or a single gradient corresponding to the single parameter to be determined.
10. The method of claim 2, wherein the federal service peer securely synchronizing updated values of parameters to be synchronized from a plurality of subsystems comprises:
the federal server receives each parameter to be synchronized which is transmitted by each training member and is encrypted in a preset encryption mode;
And the federal service side performs at least one of addition, weighted average and median value calculation on the updated values of the parameters to be synchronized to obtain corresponding synchronous values.
11. The method of claim 10, wherein the predetermined encryption scheme comprises one of: adding a perturbation satisfying the differential privacy; homomorphic encryption; secret sharing.
12. A method for jointly updating a model, the method is suitable for a process of jointly updating a model W of a system of the model, the system comprises a federal service side and a plurality of subsystems, a single subsystem i in the plurality of subsystems comprises a first member C i1 and a second member C i2 in training members, sample data held by the first member C i1 and the second member C i2 form vertical segmentation, sample data held by each subsystem form horizontal segmentation, the single subsystem i corresponds to a local model W i consistent with the structure of the model W, and the local model W i comprises a first sub-model W ci1 arranged on the first member C i1 and a second sub-model W ci2 arranged on the second member C i2; the method is performed by the federal service party and includes:
Respectively receiving updated values of all to-be-synchronized parameters, which are in one-to-one correspondence with all to-be-determined parameters in the corresponding sub-model under the condition that the synchronization conditions are met, from all sub-systems, wherein the updated values of all to-be-synchronized parameters provided by a single sub-system i are determined based on joint training of the sub-system i in a vertical segmentation mode for the corresponding local model W i;
And carrying out safe synchronization on the updated values of the parameters to be synchronized from the subsystems, and feeding back the synchronized values of the parameters to be synchronized so as to enable corresponding training members or sub-servers to finish the update of the parameters to be synchronized of the local module.
13. A method for jointly updating a model, the method is suitable for a process of jointly updating a model W of a system of the model, the system comprises a federal service side and a plurality of subsystems, a single subsystem i in the plurality of subsystems comprises a first member C i1 and a second member C i2 in training members, sample data held by the first member C i1 and the second member C i2 form vertical segmentation, sample data held by each subsystem form horizontal segmentation, the single subsystem i corresponds to a local model W i consistent with the structure of the model W, and the local model W i comprises a first sub-model W ci1 arranged on the first member C i1 and a second sub-model W ci2 arranged on the second member C i2; the method is performed by a first member C i1, comprising:
Performing joint training on the corresponding local model W i by utilizing a training sample formed by the local and the second member C i2 in a vertical segmentation manner to obtain updated values of all the parameters to be synchronized, which are in one-to-one correspondence with all the parameters to be determined in the first sub-model W ci1;
Under the condition that the synchronization condition is met, the updated values of the parameters to be synchronized, which are in one-to-one correspondence with the parameters to be determined in the first sub-model W ci1, are sent to the federal service side so that the federal service side can safely synchronize the parameters to be synchronized based on the updated values of the parameters to be synchronized from the subsystems;
and acquiring the synchronous value of each parameter to be synchronized in the first sub-model W ci1 subjected to the secure synchronization from the federal service side so as to update each parameter to be determined in the first sub-model W ci1.
14. The method of claim 13, wherein the subsystem i further comprises a sub-server S i, and the local model W i corresponding to the single subsystem i further comprises a third sub-model W si provided on the sub-server S i; and the performing joint training for the corresponding local model W i by using the training samples that constitute the vertical segmentation on the local side and the second member C i2 comprises:
For a plurality of samples of the current round, processing corresponding local sample data through a first sub-model W ci1 to obtain a corresponding first intermediate result R it1, and sending the corresponding first intermediate result R it1 to a sub-server S i for the sub-server S i to feed back gradients of the first intermediate result R it1 based on processing of the first intermediate result R it1 and the second intermediate result R it2 by a third sub-model W si, wherein the second intermediate result R it2 is obtained by processing corresponding local sample data through a second sub-model W ci2 by a second member C i2;
Using the gradient of the first intermediate result R it1 and the gradient of the second intermediate result R it2, the gradient of each pending parameter in the first sub-model W ci1 is determined, thereby determining the updated value of the pending parameter in the first sub-model W ci1.
15. The method of claim 13, wherein the performing joint training for the corresponding local model W i by using the training samples that constitute the vertical segmentation on the local side and the second member C i2 comprises:
Performing multi-party secure computation with each training member in the subsystem i to determine the gradient of the model loss for the undetermined parameters in the first sub-model W ci1;
based on the gradient of the parameter to be determined in the first sub-model W ci1, updated values of the corresponding parameter to be synchronized are determined.
16. A device for jointly updating a model, the device is suitable for a federal service side in a system for jointly updating the model, the system comprises the federal service side and a plurality of subsystems, a single subsystem i in the plurality of subsystems comprises a first member C i1 and a second member C i2 in training members, sample data held by the first member C i1 and the second member C i2 form vertical segmentation, sample data held by each subsystem form horizontal segmentation, the single subsystem i corresponds to a local model W i consistent with the structure of the model W, and the local model W i comprises a first sub-model W ci1 arranged on the first member C i1 and a second sub-model W ci2 arranged on the second member C i2; the device comprises:
the acquisition unit is configured to receive updated values of all the parameters to be synchronized, which correspond to all the parameters to be determined in the corresponding local model one by one under the condition that the synchronization condition is met, from all the subsystems respectively, wherein the updated values of all the parameters to be synchronized provided by the single subsystem i are determined based on joint training of the subsystem i in a vertical segmentation mode for the corresponding local model W i;
And the synchronization unit is configured to perform safe synchronization on the updated values of the parameters to be synchronized received from the subsystems and feed back the synchronized values of the parameters to be synchronized so that the corresponding training members can finish the update of the parameters to be synchronized of the local module.
17. A device for jointly updating a model, the device is suitable for a system for jointly updating a model W, the system comprises a federal service side and a plurality of subsystems, a single subsystem i in the plurality of subsystems comprises a first member C i1 and a second member C i2 in training members, sample data held by the first member C i1 and the second member C i2 form vertical segmentation, sample data held by each subsystem form horizontal segmentation, the single subsystem i corresponds to a local model W i consistent with the structure of the model W, and the local model W i comprises a first sub-model W ci1 arranged on the first member C i1 and a second sub-model W ci2 arranged on the second member C i2; the device is provided on the first member C i1, including:
The training unit is configured to perform joint training on the corresponding local model W i by utilizing training samples which form vertical segmentation on the local side and the second member C i2, so as to obtain updated values of all to-be-synchronized parameters which are in one-to-one correspondence with all to-be-determined parameters in the first sub-model W ci1;
The providing unit is configured to send the updated values of the parameters to be synchronized, which are in one-to-one correspondence with the parameters to be determined in the first sub-model W ci1, to the federal service side so that the federal service side can perform safe synchronization on the parameters to be synchronized based on the updated values of the parameters to be synchronized received from the sub-systems;
and the synchronization unit is configured to acquire the synchronization value of each parameter to be synchronized in the first sub-model W ci1 subjected to the secure synchronization from the federal service side so as to update each parameter to be determined in the first sub-model W ci1.
18. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 10-15.
19. A computing device comprising a memory and a processor, wherein the memory has executable code stored therein, which when executed by the processor, implements the method of any of claims 10-15.
CN202111256451.8A 2021-10-27 Method, device and system for jointly updating model Active CN114004363B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111256451.8A CN114004363B (en) 2021-10-27 Method, device and system for jointly updating model

Publications (2)

Publication Number Publication Date
CN114004363A CN114004363A (en) 2022-02-01
CN114004363B true CN114004363B (en) 2024-05-31

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021022707A1 (en) * 2019-08-06 2021-02-11 深圳前海微众银行股份有限公司 Hybrid federated learning method and architecture
CN110874484A (en) * 2019-10-16 2020-03-10 众安信息技术服务有限公司 Data processing method and system based on neural network and federal learning
WO2021119601A1 (en) * 2019-12-13 2021-06-17 Qualcomm Technologies, Inc. Federated mixture models
CN112328617A (en) * 2020-11-19 2021-02-05 杭州趣链科技有限公司 Learning mode parameter updating method for longitudinal federal learning and electronic device
CN112613618A (en) * 2021-01-04 2021-04-06 神谱科技(上海)有限公司 Safe federal learning logistic regression algorithm
CN112580826A (en) * 2021-02-05 2021-03-30 支付宝(杭州)信息技术有限公司 Business model training method, device and system
CN113377797A (en) * 2021-07-02 2021-09-10 支付宝(杭州)信息技术有限公司 Method, device and system for jointly updating model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on data privacy security protection and sharing methods; Lu Yunlong; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2021-01-15 (No. 01); full text *

Similar Documents

Publication Publication Date Title
CN110929886B (en) Model training and predicting method and system
US20230078061A1 (en) Model training method and apparatus for federated learning, device, and storage medium
CN111723404B (en) Method and device for jointly training business model
EP3903247A1 (en) Method, apparatus and system for secure vertical federated learning
Wan et al. Privacy-preservation for gradient descent methods
US11341411B2 (en) Method, apparatus, and system for training neural network model
CN112799708B (en) Method and system for jointly updating business model
CN110955915A (en) Method and device for processing private data
CN112989399B (en) Data processing system and method
WO2022142060A1 (en) Iris image feature extraction method and system based on federated learning, and apparatus
CN114595835A (en) Model training method and device based on federal learning, equipment and storage medium
US11743238B2 (en) Systems and methods for blind vertical learning
CN114936372A (en) Model protection method based on three-party homomorphic encryption longitudinal federal learning
Ying Shared MF: A privacy-preserving recommendation system
CN112507372B (en) Method and device for realizing privacy protection of multi-party collaborative update model
Khan et al. Vertical federated learning: A structured literature review
CN114004363B (en) Method, device and system for jointly updating model
CN117094412A (en) Federal learning method and device aiming at non-independent co-distributed medical scene
Adams et al. Private text classification with convolutional neural networks
CN114547684A (en) Method and device for protecting multi-party joint training tree model of private data
CN114723068A (en) Federal model training method and device
CN114004363A (en) Method, device and system for jointly updating model
Qiu et al. Efficient Vertical Federated Learning with Secure Aggregation
CN114638376B (en) Multi-party joint model training method and device in composite sample scene
US20230385446A1 (en) Privacy-preserving clustering methods and apparatuses

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant