CN114676838A - Method and device for jointly updating model - Google Patents
- Publication number
- CN114676838A (application number CN202210380007.5A)
- Authority
- CN
- China
- Prior art keywords
- synchronized
- model
- local
- training
- parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
In the federated learning process, each training member uploads only part of its parameters to be synchronized, and the server sends back aggregation values for only part of those parameters, reducing the data communication traffic of joint training. For a single training member, the aggregation values to be sent are selected jointly based on the parameters uploaded by that member and the aggregation values determined by the server, so that both the member's local data characteristics and the global data characteristics are fully considered; the model trained through federated learning thus better matches actual business requirements, improving the effectiveness of federated learning.
Description
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technology, and more particularly, to a method and apparatus for jointly updating a model.
Background
The development of computer technology has made machine learning increasingly common across business scenarios. Federated learning is a method of joint modeling that protects private data. For example, when enterprises need to build a model collaboratively, federated learning lets them train a data processing model jointly on all parties' data while fully protecting each enterprise's data privacy, so that business data are processed more accurately and effectively. In a federated learning scenario, after the parties agree on a machine learning model structure, each party trains locally with its private data, the model parameters are aggregated by a secure and reliable method, and each party finally improves its local model from the aggregated parameters. Federated learning thus achieves multi-party joint modeling on the basis of privacy protection, effectively breaking down data silos.
However, as task complexity and performance requirements grow, the number of network layers in federated learning models tends to increase, and the number of model parameters grows correspondingly. Taking the face recognition model ResNet-50 as an example, the original model has over 20 million parameters and exceeds 100 MB in size. Especially when many training members participate in federated learning, the data received by the server grows geometrically. How to sparsify the parameters each training member exchanges with the server during joint training is therefore an important problem for reducing communication pressure and avoiding communication congestion.
Disclosure of Invention
One or more embodiments of the present specification describe a method and apparatus for jointly updating a model to address one or more of the problems identified in the background.
According to a first aspect, a method for jointly updating a model is provided, applied to a process in which a server and a plurality of training members jointly update a model, where each training member's local model has the same structure as the global model held by the server. The method comprises the following steps: each training member updates, using its local training samples, the values of M parameters to be synchronized corresponding to the model, the parameters to be synchronized corresponding one-to-one to the undetermined parameters of the model; each training member selects several parameters to be synchronized from the M parameters and uploads their values to the server, the number selected by a single training member i being m_i; the server aggregates the values to be synchronized uploaded by the training members to obtain M aggregation values corresponding to the M parameters to be synchronized; the server feeds back a synchronization parameter set to each training member according to the M aggregation values, where the set W_i for a single training member i corresponds to n_i aggregation values, and the n_i parameters to be synchronized corresponding to those aggregation values are determined jointly from the m_i values uploaded by member i and the M aggregation values; and each training member updates the undetermined parameters in its local model using its synchronization parameter set, thereby updating the local model.
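The interaction described in the first aspect can be illustrated with a minimal sketch. All names, the choice of top-m magnitude selection, plain averaging, and feeding back aggregates at the member's own uploaded positions are illustrative assumptions, not the patent's required implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 10          # total number of parameters to be synchronized
TOP_M = 4       # m_i: values each member uploads (here: top-m by magnitude)

# Each training member holds M locally updated values to be synchronized.
members = [rng.normal(size=M) for _ in range(3)]

# Step 1: each member uploads only its top-m_i values (index -> value).
uploads = []
for w in members:
    idx = np.argsort(-np.abs(w))[:TOP_M]
    uploads.append({int(j): float(w[j]) for j in idx})

# Step 2: the server aggregates the uploaded values per parameter index
# (here by simple averaging over the members that reported the index).
agg = np.zeros(M)
for j in range(M):
    vals = [u[j] for u in uploads if j in u]
    if vals:
        agg[j] = sum(vals) / len(vals)

# Step 3: the server feeds member i the aggregation values for n_i
# positions; one simple choice of W_i is the positions member i uploaded.
W_0 = {j: float(agg[j]) for j in uploads[0]}
```

Later embodiments refine step 3 so that the fed-back positions depend on both the member's uploads and the global aggregates.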
In one embodiment, the single parameter to be synchronized is one of a single undetermined parameter, a gradient of the single undetermined parameter, and a difference between a current value and an initial value of the single undetermined parameter.
In one embodiment, the undetermined parameters of each training member's local model are uniformly initialized by the server, and the local training samples of the training members form a horizontal partition.
In one embodiment, the number m_i of parameters to be synchronized selected by a single training member i is determined from the product of a predetermined local activation ratio and the number M of parameters to be synchronized.
In one embodiment, a single training member i determines the number m_i of uploaded parameters to be synchronized through at least one of model pruning and model sparsification.
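A simple way to combine the two embodiments above is magnitude-based sparsification with m_i fixed by an activation ratio. This is a hypothetical sketch; the function name and the ceil/top-k choices are my own:

```python
import math
import numpy as np

def sparsify(values: np.ndarray, activation_ratio: float) -> dict:
    """Keep the m_i = ceil(ratio * M) largest-magnitude values;
    return an index -> value map for upload."""
    m_i = max(1, math.ceil(activation_ratio * values.size))
    keep = np.argsort(-np.abs(values))[:m_i]
    return {int(j): float(values[j]) for j in keep}

w = np.array([0.1, -2.0, 0.0, 0.7, -0.05, 1.5])
upload = sparsify(w, 0.5)   # m_i = ceil(0.5 * 6) = 3
```

Here the three largest-magnitude entries (indices 1, 5, 3) would be uploaded.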
In one embodiment, the server may aggregate the values to be synchronized for a single parameter by at least one of weighted summation, averaging, and taking the median, maximum, or minimum of the values uploaded by the training members for that parameter.
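The aggregation options listed above can be sketched as a single dispatch function; the function name and argument layout are illustrative assumptions:

```python
import statistics

def aggregate(values, weights=None, how="average"):
    """Aggregate one parameter's values reported by several members."""
    if how == "weighted_sum":
        weights = weights or [1.0] * len(values)
        return sum(w * v for w, v in zip(weights, values))
    if how == "average":
        return sum(values) / len(values)
    if how == "median":
        return statistics.median(values)
    if how == "max":
        return max(values)
    if how == "min":
        return min(values)
    raise ValueError(f"unknown aggregation: {how}")

reported = [1.0, 2.0, 3.0]   # values for one parameter, from three members
avg = aggregate(reported, how="average")
med = aggregate(reported, how="median")
```

A weighted sum (e.g. weighting members by local sample count) reduces to plain averaging when all weights equal 1/K.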
In one embodiment, for training member i, the m_i values to be synchronized describe a local sparse value set, and the M aggregation values describe an aggregation value set W_{s,t} of the global model. The server determines the corresponding synchronization parameter set W_i as follows: sparsify the aggregation value set W_{s,t} of the global model to obtain a global sparse value set; then determine the corresponding synchronization parameter set W_i based on the local sparse value set and the global sparse value set.
In a further embodiment, the aggregation value set of the global model is described by a matrix, and the local sparse value set and the global sparse value set are described by a local sparse matrix and a global sparse matrix, respectively.
In a further embodiment, determining the synchronization parameter set W_i based on the local sparse value set and the global sparse value set comprises: separately detecting the non-zero element positions of the local sparse matrix and the global sparse matrix to obtain a local sparse position matrix M_{i,t} and a global sparse position matrix M_{s,t}; determining, based on the union of the non-zero element positions of M_{i,t} and M_{s,t}, a sparse position matrix corresponding to the synchronization parameter set; and selecting, according to the non-zero element positions indicated by that sparse position matrix, the corresponding aggregation values from the M aggregation values to form the synchronization parameter set W_i.
In a further embodiment, the non-zero element positions of this sparse position matrix are one of: all non-zero element positions of the union; a predetermined number of non-zero element positions randomly selected from the union; or a predetermined number of non-zero element positions selected from the union according to predetermined selection probabilities, where the first selection probability of positions in the intersection of M_{i,t} and M_{s,t} is greater than the second selection probability of other positions.
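The third option above, weighted sampling from the union with intersection positions favored, can be sketched as follows. The function name and the specific probability weights (3:1) are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def choose_positions(local_mask, global_mask, k, p_intersect=3.0, p_other=1.0):
    """Pick k positions from the union of two sparse position masks,
    weighting positions present in both masks more heavily."""
    union = np.flatnonzero(local_mask | global_mask)
    inter = local_mask & global_mask
    weights = np.where(inter[union], p_intersect, p_other).astype(float)
    weights /= weights.sum()                    # normalize to a distribution
    k = min(k, union.size)
    return set(rng.choice(union, size=k, replace=False, p=weights).tolist())

local_m  = np.array([1, 1, 0, 0, 1, 0], dtype=bool)   # M_{i,t} as a mask
global_m = np.array([0, 1, 1, 0, 1, 0], dtype=bool)   # M_{s,t} as a mask
picked = choose_positions(local_m, global_m, k=3)
```

Positions 1 and 4 lie in the intersection, so they are three times as likely to be drawn as positions 0 and 2.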
In another further embodiment, determining the synchronization parameter set W_i based on the local sparse value set and the global sparse value set comprises: obtaining the global sparse matrix of the M aggregation values corresponding to the global model; comparing the local sparse matrix with the global sparse matrix to obtain a correlation coefficient β_{i,t}; determining, based on β_{i,t}, the sparse position matrix corresponding to the synchronization parameter set W_i; and determining W_i from that sparse position matrix and the respective aggregation values.
In a further embodiment, comparing the local sparse matrix with the global sparse matrix to obtain the correlation coefficient β_{i,t} comprises: detecting the correlation distance between the local sparse matrix and the global sparse matrix, the correlation distance being one of the Euclidean distance, cosine distance, Manhattan distance, Pearson similarity, Jaccard similarity, and Hamming distance between the local and global sparse matrices, or between the local sparse position matrix M_{i,t} and the global sparse position matrix M_{s,t}; and determining the correlation coefficient β_{i,t} from the normalization result of the correlation distance.
In yet a further embodiment, determining the sparse position matrix corresponding to the synchronization parameter set W_i based on the correlation coefficient β_{i,t} comprises: selecting a first number N1 of non-zero element positions from the intersection position matrix of the local sparse position matrix M_{i,t} and the global sparse position matrix M_{s,t} to obtain a first position matrix; selecting a second number N2 of non-zero element positions from the complement matrix of M_{i,t} with respect to the intersection to obtain a second position matrix; selecting a third number N3 of non-zero element positions from the complement matrix of M_{s,t} with respect to the intersection to obtain a third position matrix; and determining the sparse position matrix based on the first, second, and third position matrices.
In one embodiment, the first number N1 is the number k_{i,t} of non-zero elements in the intersection position matrix, the second number N2 is positively correlated with the correlation coefficient β_{i,t}, and the third number N3 is negatively correlated with β_{i,t}.
In one embodiment, the sum of the first number N1, the second number N2, and the third number N3 is a predetermined value k, and both N2 and N3 are positively correlated with the difference between the predetermined value k and k_{i,t}.
In one embodiment, the first number N1 is 0, the second number N2 is positively correlated with the correlation coefficient β_{i,t}, and the third number N3 is negatively correlated with β_{i,t}.
In one embodiment, the first number N1 is 0, at least one of the second number N2 and the third number N3 is non-zero, and the third number N3 is negatively correlated with the correlation coefficient β_{i,t}.
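The N1/N2/N3 embodiments above can be sketched for the case N1 = k_{i,t} (keep every intersection position), with the remaining budget split between local-only and global-only positions in proportion to β_{i,t}. The exact split rule and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def select_sync_positions(local_mask, global_mask, beta, k):
    """Take all k_it intersection positions (N1 = k_it), then split the
    remaining budget: N2 local-only positions grow with beta,
    N3 global-only positions shrink with beta."""
    inter = np.flatnonzero(local_mask & global_mask)
    local_only = np.flatnonzero(local_mask & ~global_mask)
    global_only = np.flatnonzero(~local_mask & global_mask)

    n1 = inter.size                                      # k_{i,t}
    rest = max(k - n1, 0)
    n2 = min(int(round(beta * rest)), local_only.size)   # positively correlated
    n3 = min(rest - n2, global_only.size)                # negatively correlated

    picked = set(inter.tolist())
    picked |= set(rng.permutation(local_only)[:n2].tolist())
    picked |= set(rng.permutation(global_only)[:n3].tolist())
    return picked

local_m  = np.array([1, 1, 0, 0, 1, 0, 1], dtype=bool)
global_m = np.array([0, 1, 1, 0, 1, 1, 0], dtype=bool)
pos = select_sync_positions(local_m, global_m, beta=0.5, k=5)
```

With β = 1 the fed-back set leans entirely toward the member's own positions; with β = 0 it leans toward the global-only positions, matching the stated correlations.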
In one embodiment, each training member updating the undetermined parameters in its local model using the corresponding synchronization parameter set comprises: a single training member i replaces the corresponding values to be synchronized among its local parameters with the aggregation values in its synchronization parameter set W_i, and then updates the local model using the updated parameters to be synchronized.
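The replace-and-update step can be sketched as follows, treating W_i as an index-to-aggregate-value map (the representation is an assumption of this sketch):

```python
def apply_sync(local_values, sync_set):
    """Overwrite the locally held values at the fed-back positions with
    the server's aggregation values; other positions stay local."""
    updated = list(local_values)
    for pos, agg_val in sync_set.items():
        updated[pos] = agg_val
    return updated

local = [0.5, -0.1, 0.9, 0.0]
W_i = {0: 0.4, 2: 1.1}          # aggregation values fed back for member i
new_local = apply_sync(local, W_i)
```

Positions not covered by W_i keep their locally trained values, which is what makes the synchronized model reflect both local and global data characteristics.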
According to a second aspect, a method for jointly updating a model is provided, applied to a process in which a server and a plurality of training members jointly update a model, each training member's local model having the same structure as the global model held by the server. The method is performed by the server and comprises: receiving the values to be synchronized sent by each training member, where the number sent by a single training member i is m_i, the m_i values being the updated values of m_i of the M parameters to be synchronized updated with local training samples, the parameters to be synchronized corresponding one-to-one to the undetermined parameters of the model; aggregating the values to be synchronized uploaded by the training members to obtain M aggregation values corresponding to the M parameters to be synchronized; and feeding back a synchronization parameter set to each training member according to the M aggregation values, so that each training member updates the undetermined parameters in its local model using the corresponding set, thereby updating the local model, where the set W_i for a single training member i corresponds to n_i aggregation values, and the n_i parameters to be synchronized corresponding to those aggregation values are determined jointly from the m_i values uploaded by member i and the M aggregation values.
According to a third aspect, a method for jointly updating a model is provided, applied to a process in which a server and a plurality of training members jointly update a model, each training member's local model having the same structure as the global model held by the server. The method is applied to a training member i and comprises: updating, using local training samples, the values of M parameters to be synchronized corresponding to the model, the parameters to be synchronized corresponding one-to-one to the undetermined parameters of the model; selecting m_i parameters to be synchronized from the M parameters and uploading the corresponding m_i values to the server, for the server to feed back a synchronization parameter set W_i, where W_i corresponds to n_i aggregation values, and the n_i parameters to be synchronized corresponding to those aggregation values are determined jointly from the m_i uploaded values and the M aggregation values; and updating the undetermined parameters in the local model with W_i, thereby updating the local model.
According to a fourth aspect, there is provided an apparatus for jointly updating a model, which is applied to a process of jointly updating a model by a server and a plurality of training members, wherein a local model of each training member is consistent with a global model structure held by the server, the apparatus is provided on the server, and includes:
a receiving unit configured to receive the values to be synchronized sent by each training member, where the number sent by a single training member i is m_i, the m_i values being the updated values of m_i of the M parameters to be synchronized updated with local training samples, the parameters to be synchronized corresponding one-to-one to the undetermined parameters of the model;
an aggregation unit configured to aggregate the values to be synchronized uploaded by each training member to obtain M aggregation values corresponding to the M parameters to be synchronized;
a feedback unit configured to feed back a synchronization parameter set to each training member according to the M aggregation values, so that each training member updates the undetermined parameters in its local model using the corresponding set, thereby updating the local model, where the set W_i for a single training member i corresponds to n_i aggregation values, and the n_i parameters to be synchronized corresponding to those aggregation values are determined jointly from the m_i values uploaded by member i and the M aggregation values.
According to a fifth aspect, there is provided an apparatus for jointly updating a model, which is applied to a process of jointly updating a model by a server and a plurality of training members, wherein a local model of each training member is consistent with a global model structure held by the server, the apparatus is provided for a training member i, and includes:
the training unit is configured to update M parameters to be synchronized corresponding to the model by using a local training sample, and each parameter to be synchronized corresponds to each parameter to be determined of the model one by one;
an uploading unit configured to upload to the server the m_i values corresponding to m_i of the M parameters to be synchronized, for the server to feed back a synchronization parameter set W_i, where W_i corresponds to n_i aggregation values, and the n_i parameters to be synchronized corresponding to those aggregation values are determined jointly from the m_i uploaded values and the M aggregation values;
a synchronization unit configured to update the undetermined parameters in the local model with the synchronization parameter set W_i, thereby updating the local model.
According to a sixth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the second or third aspect.
According to a seventh aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and the processor, when executing the executable code, implements the method of the second or third aspect.
With the method, apparatus, and system provided by the embodiments of this specification, training members upload only part of the parameters to be synchronized during federated learning, and the server sends back aggregation values for only part of them, reducing the data communication traffic of joint training. For a single training member, the aggregation values to be sent are selected jointly based on the parameters uploaded by that member and the aggregation values determined by the server, so that the member's local data characteristics are fully considered, the model training process is more personalized, and the effectiveness of the model is improved.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a system architecture for federal learning;
FIG. 2 is a schematic diagram of a specific implementation architecture under the technical concept of the present specification;
FIG. 3 is a schematic diagram illustrating the interaction flow between a server and a single training member in the process of jointly training a model according to one embodiment of the present disclosure;
FIG. 4 illustrates a flow diagram of a joint training model performed by a server in one embodiment of the present description;
FIG. 5 is a schematic flow diagram of a joint training model performed by a server according to another embodiment of the present disclosure;
FIG. 6 is a schematic block diagram of an apparatus for a server-side joint training model according to one embodiment of the present disclosure;
FIG. 7 is a schematic block diagram of an apparatus for a joint training model provided to training members according to another embodiment of the present disclosure.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
Federated Learning may also be referred to as federated machine learning, joint learning, or alliance learning. Federated machine learning is a machine learning framework that can effectively help multiple organizations use data and build machine learning models jointly while meeting the requirements of user privacy protection, data security, and government regulation.
Specifically, suppose enterprise A and enterprise B each want to build a task model, where each task may be classification or prediction, and the necessary user approval has been obtained for the data. However, because the data are incomplete — for example, enterprise A lacks label data, enterprise B lacks user feature data, or the data and sample size are insufficient to build a good model — the model at either end may fail to be built or may perform poorly. The problem federated learning solves is how to build a high-quality model at each of A and B, trained on both enterprises' data, while neither party's own data are disclosed to the other; that is, a common model is built without violating data privacy regulations. This common model performs as if the parties had pooled their data, yet it serves only each party's own objectives in its own domain.
The implementation architecture of federated learning is shown in FIG. 1. The organizations participating in federated learning may be called training members, which may also be data holders or data providers. Each training member can hold different business data and participate in joint model training through a device, computer, server, and so on. The business data may take various forms, such as text, pictures, voice, animation, and video. Generally, the business data held by the training members are correlated, and the business parties corresponding to the training members may also be correlated. For example, several banks involved in financial services may serve as business parties: each can independently provide savings, loan, and other services to users, and can hold data such as a user's age, sex, income and expenditure records, loan amount, and deposit amount. As another example, several hospitals involved in medical services may serve as business parties: each may use diagnostic records such as a user's age, sex, symptoms, diagnosis results, treatment plans, and treatment outcomes as local business data.
Under this architecture, the model may be trained jointly by two or more training members. The model processes business data to obtain corresponding business processing results and may also be called a business model. What business data are processed and what processing results are obtained depend on actual requirements. For example, the business data may be a user's financial data, with the result being a financial credit evaluation of the user; or the business data may be a user's customer service dialogue data, with the result being a recommended customer service answer. The business data may take various forms such as text, pictures, animation, audio, and video. Each training member can use the trained business model to process local business data locally.
In the process of jointly training the business model, the server can assist the business parties' joint learning, for example by performing nonlinear computation or aggregating model parameters or gradients. FIG. 1 shows the server as a separate party, such as a trusted third party, provided independently of the training members. In practice, the server's role may also be distributed across the training members, or composed of them, with joint auxiliary computation performed among the members through a secure computation protocol (e.g., secret sharing). This specification does not limit this.
Referring to FIG. 1, under the federated learning framework, the server may initialize a global model and distribute it to the training members. Each training member can locally compute gradients of the model parameters from the global model determined by the server and update the parameters accordingly. The server may aggregate the gradients of the model parameters, or other quantities related to them. The parameters the server needs to aggregate may collectively be called parameters to be synchronized; their corresponding values may be called values to be synchronized; and the parameter values the server aggregates and synchronizes may be called aggregation values. The server may feed the aggregation values back to the training members, and each member updates its local model parameters from the received values. Iterating in this way eventually trains a business model suited to each business party. If a model parameter to be adjusted is called an undetermined parameter, the parameter to be synchronized may be any quantity associated with it, such as the undetermined parameter itself, its gradient, or the difference between the undetermined parameter and its initial value.
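The iterate-aggregate-feedback loop described above can be sketched as a minimal FedAvg-style round, here without sparsification and with a toy least-squares objective; all names and the averaging rule are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def local_step(w, X, y, lr=0.1):
    """One gradient-descent step on a local least-squares objective."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

# The server initializes a global model and distributes it to 3 members.
w_global = np.zeros(2)
datasets = [(rng.normal(size=(8, 2)), rng.normal(size=8)) for _ in range(3)]

for _ in range(5):                 # a few synchronization rounds
    # Each member trains locally from the current global parameters.
    locals_ = [local_step(w_global.copy(), X, y) for X, y in datasets]
    # The server aggregates (here: plain averaging) and feeds back.
    w_global = np.mean(locals_, axis=0)
```

The scheme of this specification modifies both directions of this loop: members upload only m_i of the M values, and the server feeds back only n_i aggregation values per member.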
It is understood that federated learning can be divided into horizontal federated learning (feature alignment), vertical federated learning (sample alignment), and federated transfer learning. The implementation framework provided by this specification can be used with various federated learning frameworks, and is particularly suitable for horizontal federated learning, i.e., each training member provides part of the independent samples, or equivalently, the sample data of the training members form a horizontal partition.
When many training members participate in federated learning, the data received by the server grows geometrically, which easily causes communication congestion and seriously affects overall training efficiency. Therefore, in the multi-party federated learning process, the model is usually compressed, i.e., the number of parameters uploaded to the service party by a single training member is compressed (sparsified), so as to reduce the pressure of communication transmission. Federated learning generally performs model aggregation based on the assumption that the data are IID (independent and identically distributed); specifically, that the features of the training samples have the same distribution and are independent of each other. However, since the samples of a data owner are associated with one or more of the corresponding sample subject (e.g., a user), the region where the sample subject is located, the time window of data acquisition, and the like, the data sets in joint training often have different feature distributions or label distributions, and the features are not independent of each other. This type of dataset is referred to as a Non-IID dataset. For example, a bank's income and deposit features for a user are correlated; one bank may hold only the income features of a certain user and no deposit features (e.g., the salary is quickly transferred to deposits at other banks), while that bank or another bank holds both income and deposit features for a different user. If federated model aggregation is performed under the IID assumption, the resulting federated model may perform poorly on Non-IID datasets.
In order to adapt federated learning to Non-IID datasets, this specification provides a technical concept of jointly updating the model in which the model is aggregated and updated according to the data characteristics of each training member. Fig. 2 is a schematic diagram illustrating the interaction of each training member with a service party (third party) under the technical concept of this specification. As shown in fig. 2, assuming the jointly trained business model is a multi-layer neural network (the layers arranged from top to bottom or bottom to top in fig. 2), the black solid points represent the currently activated nodes. It can be seen that in the stage of updating the parameters to be synchronized using local training data, the model nodes of each data party may all be activated. Under the technical concept of this specification, on the one hand, a training member updates the parameters to be synchronized according to its local data characteristics and then uploads sparsified values to be synchronized to the service party, as shown by the solid points in fig. 2. On the other hand, after aggregating the uploads of the training members, the service party issues aggregate values back to the training members; these may likewise be sparsified, and, in combination with the sparse matrix uploaded by each training member, the service party individually issues a sparse matrix of aggregate values to each training member. As shown in fig. 2, the activated nodes of the sparse matrix uploaded by a single data party and of the sparse matrix issued by the service party may be the same or different, which is not limited here.
The server determines the individual sparse matrix issued to a single training member in combination with the sparse matrix uploaded by that training member (which fuses its local data characteristics). The training member then synchronizes part of the parameters to be synchronized according to the sparse matrix issued by the server. In this way, the communication data volume between the training members and the service party can be effectively reduced, the local model can be updated to fit the characteristics of the local data, and the effectiveness of federated learning is improved.
The technical idea of the present specification is described in detail below.
Referring to fig. 3, an example of an interaction process of a single training member i with a service party in a federal learning (i.e., joint update model) process is shown. The process of jointly updating the model can be realized by a server and a plurality of training members in the federal learning, and the training member i can be any one of the training members participating in the federal learning. The service side can aggregate the parameters to be synchronized uploaded by each training member. The server here may be any device, platform or cluster of devices with computing, processing capabilities.
It is understood that there may be multiple parameter synchronization periods during the process of jointly updating the model. Initially, the server may determine the global model, initialize the model parameters, and issue them to each training member; alternatively, the training members may negotiate the model structure, each training member locally constructing its local model while the server initializes the model parameters. The server may also preset the required hyper-parameters, which may include, for example, one or more of a waiting time T, a parameter compression ratio α, a number m of uploaded parameters, a model parameter update step size, a total number of training periods, and the like. The model parameters (such as weight parameters, constant term parameters, etc.) are the parameters to be adjusted, i.e., the pending parameters. As mentioned above, each pending parameter may correspond to a parameter to be synchronized. In each synchronization period, the training members determine the updated values of the parameters to be synchronized (i.e., the values to be synchronized) using local training samples, upload the values to be synchronized to the server in sparse form, the server aggregates them and feeds the results back, and the training members update the local pending parameters according to the aggregate values.
The process shown in fig. 3 is an example of one synchronization period in an embodiment; the overall process of jointly updating the model is described through the interaction between the server and an arbitrary training member i among the plurality of training members. For convenience of description, the number of pending parameters in the model is assumed to be a positive integer M greater than 1.
First, in step 301, a training member i updates M parameters to be synchronized corresponding to a model by using a local training sample. It will be appreciated that the pending parameters in the model may be, for example: weight parameters for feature aggregation, excitation parameters for excitation layers, truncation parameters, and the like. The process of business model training is the process of determining the undetermined parameters. In the federal learning process, parameters to be synchronized among training members can be determined according to modes such as pre-negotiation and the like, and the parameters to be synchronized correspond to the parameters to be determined one by one.
In the update period of a single pending parameter, each training member can update each pending parameter in its local model according to local training samples. Training member i reads one batch b_i of n_i sample data from its local training samples X_i, performs forward propagation through the model Y_i to obtain the predicted labels corresponding to the b_i training samples, then determines the model loss L_i based on the actual sample labels y_i and the predicted labels, and adjusts each pending parameter according to the model loss L_i. In an optional embodiment, various updating methods such as gradient descent, Newton's method, and the like may be used, so that each pending parameter moves along its gradient direction toward an optimal value. In this case, training member i can compute the gradient of each pending parameter from the model loss L_i by means of a back-propagation algorithm, e.g., recorded as a gradient matrix G_i^0. The update of each pending parameter is then performed on the basis of the corresponding gradient.
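A minimal sketch of this local forward/backward step follows; logistic regression stands in for the business model Y_i, and the batch size, synthetic data, and step size are illustrative assumptions, not values from this specification.

```python
import numpy as np

rng = np.random.default_rng(1)
Xi = rng.normal(size=(32, 5))             # local training samples X_i
yi = (Xi @ np.ones(5) > 0).astype(float)  # actual sample labels y_i
w = np.zeros(5)                           # pending parameters

bi, yb = Xi[:16], yi[:16]                 # one batch b_i of n_i = 16 samples
y_hat = 1.0 / (1.0 + np.exp(-(bi @ w)))   # forward propagation: predicted labels
Li = -np.mean(yb * np.log(y_hat) + (1 - yb) * np.log(1 - y_hat))  # model loss L_i
Gi = bi.T @ (y_hat - yb) / len(yb)        # gradient of the loss (cf. gradient matrix G_i^0)
w = w - 0.1 * Gi                          # adjust pending parameters along the gradient
```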
A single synchronization period may correspond to one update period of the pending parameters, or to multiple update periods. For example, parameter synchronization may be performed every n update periods, where n is a positive integer, n ≥ 1, or the update period at which synchronization occurs may be determined at predetermined time intervals. For a single training member, the parameter synchronization process for the parameters to be synchronized may correspond to a single update period. Assuming the current synchronization period is t, i.e., the t-th parameter synchronization process (assuming N periods of parameter synchronization in total, so t = 0, 1, ..., N), a training member may determine the current values to be synchronized in the (n × t)-th update period, denoted for example W_{i,t}^u.
In one embodiment, the parameter to be synchronized is a parameter to be determined, and in a corresponding single update cycle, a single training member may use the parameter to be determined updated according to the training samples of the current batch as a value to be synchronized of the parameter to be synchronized.
In another embodiment, the parameter to be synchronized is a gradient of the parameter to be determined, and a single training member may determine the gradient of the parameter to be determined as a value to be synchronized of the parameter to be synchronized before the parameter to be determined is updated according to the training samples of the current batch.
In another embodiment, the parameter to be synchronized may also be a difference between the parameter to be determined and an initial value thereof, and a single training member may compare the parameter to be determined updated according to the training samples of the current batch with the initial value initialized by the server, and the obtained difference is used as the value to be synchronized of the parameter to be synchronized.
In other embodiments, the parameter to be synchronized may also be other parameters according to different update modes of the parameter to be determined in the model, which is not described in detail herein. Aiming at the M undetermined parameters in the model, each training member can update the M undetermined parameters in the same mode to obtain corresponding M updated values.
Then, in step 302, training member i selects m_i of the M parameters to be synchronized and uploads the corresponding m_i values to be synchronized to the server. It can be understood that, under the technical idea of this specification, each training member needs to compress the volume of data uploaded to the service party. Training member i can compress the number of parameters to be synchronized from M down to m_i, where m_i can be much smaller than M; for example, M is 100,000 and m_i is 1000. The compression of the parameters to be synchronized can be performed by means of pruning, sparsification, and the like.
In one embodiment, m_i can be determined by the number of samples associated with training member i, such as a preset fixed value, or a fixed value positively correlated with the number of samples of training member i. For example, M/N (where N is the number of training members), or another fixed value that is inversely related to the total sample size of all training members and positively related to the sample size held by training member i, and so on. In this case, in each synchronization period, training member i uploads to the server the values to be synchronized of m_i parameters to be synchronized.
In one embodiment, m_i can be determined by a preset upload proportion coefficient α_i. For example, m_i = α_i × M, or α_i × ‖W‖_0, where ‖W‖_0 is the zero norm of the current update values determined by training member i for the M parameters to be synchronized, whose value is M. Since the parameters to be synchronized tend to converge as the synchronization periods iterate, a single training member may upload the parameters to be synchronized to the server according to a smaller upload proportion. In an alternative embodiment, the upload proportion coefficient α_i may decrease linearly, exponentially, and so on. If an attenuation coefficient ρ is preset, the number m_i of parameters to be synchronized uploaded in the current period t is a decreasing function (e.g., an exponential function, a trigonometric function, etc.) of the attenuation coefficient ρ. Taking the decreasing function as an exponential function, e.g., m_i = α × ρ^t × M, where ρ is a number less than 1, e.g., 0.95, then as the period number t increases, the t-th power of ρ, ρ^t, gradually decreases.
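The exponentially decaying upload count can be sketched as follows; α = 0.01, ρ = 0.95, and M = 100,000 are assumed example values, and rounding down is chosen here purely for illustration.

```python
import math

def upload_count(alpha, rho, t, M):
    # m_i = alpha * rho**t * M, rounded down to an integer count of parameters.
    return math.floor(alpha * rho**t * M)

M = 100_000
counts = [upload_count(0.01, 0.95, t, M) for t in range(5)]
# the number of uploaded parameters shrinks as the synchronization period t grows
```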
In other embodiments, training member i may also determine the number m_i of values to be synchronized uploaded to the server in other reasonable ways, which are not described in detail here. It is noted that when the result of the above calculation is not an integer, m_i can be obtained by rounding up or rounding down. Similarly, other training members can determine the number of parameters to be synchronized to be uploaded locally in a similar manner.
When training member i selects m_i of the M parameters to be synchronized, the selection may be performed randomly, or in descending order of the absolute values of the update values, or contiguously from a preset initial position (e.g., the i-th parameter of the layer-2 neural network), or according to parameters to be synchronized specified in advance for training member i, or by a combination of the above, which is not limited here. After determining the m_i parameters to be synchronized, training member i may take the m_i updated values determined in step 301 as the values to be synchronized and upload them to the server.
The selected m_i parameters to be synchronized can be uploaded to the server after being marked by unique identifiers, or uploaded in the form of a parameter matrix, which is not limited by this specification. For example, a parameter to be synchronized marked by a unique identifier is denoted (w_jk)_i, where (w_jk)_i represents the identifier of the parameter to be synchronized corresponding to the k-th parameter of the j-th neural network layer of training member i, and its value represents the corresponding value to be synchronized. In matrix form, the local sparse matrix corresponding to the m_i values to be synchronized, denoted for example W*_{i,t}, has the same dimensions as the parameter matrix: the positions corresponding to the m_i selected parameters hold the m_i values to be synchronized, and the remaining (M − m_i) positions are all 0. When uploading the values to be synchronized in matrix form, a single value may be transmitted in the form [j, k, value], representing the value to be synchronized of the model parameter in row j, column k of the parameter matrix of the business model. In other words, the upload format is index + value, with (j, k) as the index. Thus, when the parameters to be synchronized are uploaded in matrix form, only m_i values are uploaded; moreover, the rows and columns can be encoded as numeric types occupying fewer bytes, such as int, further reducing the extra data volume of the upload.
It can be understood that pruning, model sparsification, and the like generally select the parameters with larger absolute values. A parameter with a larger absolute value usually has a larger influence on the result and is, for the current period, temporarily of higher importance. Taking the TopK sparsification method as an example, if the model parameters are described in matrix form, training member i can select from the locally updated parameter matrix to be synchronized W_{i,t}^u the m_i = K_i elements with the largest absolute values, set the value 1 at the corresponding positions of a mapping matrix with the same dimensions as the parameter matrix, and set the other positions to 0, obtaining the current local sparse position matrix M_{i,t} of training member i. The local sparse matrix W*_{i,t} may then be the element-wise product of the sparse position matrix M_{i,t} and the parameter matrix to be synchronized: W*_{i,t} = M_{i,t} ⊙ W_{i,t}^u. Training member i can upload W*_{i,t} to the server. It can be appreciated that in the local sparse matrix W*_{i,t}, the elements at the m_i selected positions are the current values to be synchronized and the remaining positions are 0, so only the elements at the m_i positions need to be uploaded.
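The TopK construction just described can be sketched as follows; the matrix values and K are illustrative, and the helper name `topk_sparsify` is an assumption for the sketch.

```python
import numpy as np

def topk_sparsify(W, K):
    # Sparse position matrix: 1 at the K largest-magnitude entries, 0 elsewhere.
    flat = np.abs(W).ravel()
    idx = np.argpartition(flat, -K)[-K:]  # indices of the K largest |values|
    pos = np.zeros(W.size)
    pos[idx] = 1.0
    pos = pos.reshape(W.shape)
    # Local sparse matrix = element-wise product of position matrix and parameters.
    return pos, pos * W

W = np.array([[0.1, -2.0, 0.3],
              [4.0, -0.2, 0.05]])
pos, sparse = topk_sparsify(W, 2)  # keep the K = 2 largest-magnitude values
```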
In an optional implementation, before uploading the parameters to be synchronized, a single training member may add perturbation satisfying differential privacy to the local parameters to be synchronized in order to protect local data privacy. For example, perturbation data satisfying the standard Gaussian distribution with mean 0 and variance 1 can be added to the parameters to be synchronized through the Gaussian mechanism of differential privacy, forming perturbed data to be synchronized. Where the data to be synchronized are represented in matrix form, the added perturbation may be a perturbation matrix satisfying a predetermined Gaussian distribution. A single training member i can add the perturbation after selecting the m_i parameters to be synchronized, or add the perturbation first and then select the m_i parameters to be synchronized according to the rules described above; this specification does not limit this. In addition, the noise added by training member i to the parameters to be synchronized may also satisfy an exponential mechanism, a Laplace mechanism, and the like.
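The Gaussian perturbation step can be sketched as follows. The standard-normal noise scale follows the example in the text; a real deployment would calibrate the noise scale to a privacy budget, which this sketch does not do.

```python
import numpy as np

rng = np.random.default_rng(42)
to_sync = np.array([4.0, -2.0, 0.7])  # selected values to be synchronized
# Gaussian mechanism: add mean-0, variance-1 noise before upload.
noise = rng.normal(loc=0.0, scale=1.0, size=to_sync.shape)
perturbed = to_sync + noise           # perturbed values actually uploaded
```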
In step 303, the server aggregates the values to be synchronized uploaded by the training members to obtain M aggregate values corresponding to the M parameters to be synchronized. In this aggregation process, the aggregation of the values to be synchronized of each parameter is independent. That is, for a single parameter to be synchronized, assuming s training members feed back s corresponding values to be synchronized to the server, the aggregate value of that parameter is an aggregate of those s updated values.
The aggregation of the values to be synchronized of a single parameter may be performed by at least one of weighted summation, averaging, taking the median, taking the maximum, taking the minimum, and the like. For example, for the pending parameter w_jk, if s training members feed back values to be synchronized, the aggregate value may be (1/s) Σ_i (w_jk)_i, where the summation runs over the training members, e.g., i from 1 to s. Where each training member uploads a local sparse matrix, the server can also aggregate the corresponding values to be synchronized through the corresponding matrix elements; for example, the aggregation matrix formed from the aggregate values by averaging is W_{s,t} = (Σ_i W*_{i,t}) / (Σ_i M_{i,t}), where W*_{i,t} and M_{i,t} are the local sparse matrix and local sparse position matrix of member i. The matrix division here is element-wise, i.e., the first element (e.g., row 1, column 1) of the dividend matrix is divided by the first element of the divisor matrix to obtain the first element of the quotient matrix W_{s,t}.
In particular, it should be noted that when the training members feed back parameters to be synchronized to the server without mutual agreement or advance negotiation, it may happen that for some model parameters no value to be synchronized is passed back at all in the current synchronization period t, i.e., the total number of training samples corresponding to a given parameter in the current period is 0. In the calculation above, 0 would then appear as the denominator, yielding an error value (e.g., NaN). In that case, the aggregate value in the synchronization parameter set may be determined according to the actual situation. For example, the aggregate value of the previous period may be used as the aggregate value of the current period. As another example, when the parameter to be synchronized is the pending parameter itself, a special flag may be set so that the values of the corresponding parameter are not synchronized. When the parameter to be synchronized is a gradient value, the value 0 can also be used as the aggregate value of the corresponding parameter in the current update period.
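The element-wise aggregation with a fallback for unreported positions can be sketched as follows. As a simplification, the sketch treats non-zero entries as "reported" (a member that genuinely uploads 0 would be miscounted; a real system would track reported positions separately), and it uses the 0-fallback appropriate to the gradient case.

```python
import numpy as np

uploads = [np.array([[1.0, 0.0], [0.0, 3.0]]),   # local sparse matrix of member 1
           np.array([[2.0, 0.0], [4.0, 0.0]])]   # local sparse matrix of member 2
positions = [(u != 0).astype(float) for u in uploads]  # sparse position matrices

num = sum(uploads)    # element-wise sum of uploaded values
den = sum(positions)  # how many members reported each position
# Element-wise division; positions no member reported fall back to 0 instead of NaN.
agg = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
```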
In step 304, the server feeds back the synchronization parameter set W_i to training member i according to the M aggregate values. In order to further reduce the traffic during model updating, the server may feed back the aggregate values of only part of the parameters to be synchronized. Considering the personalized characteristics of each training member's local training data, the server can determine a different synchronization parameter set for each training member.
Thus, the server can determine the sparsified aggregate values to be issued based on the values to be synchronized uploaded by training member i together with the M aggregate values. For example, the aggregate values issued by the server may correspond to the same parameters to be synchronized as the values uploaded by training member i. Where the values uploaded by training member i are described in matrix form, the server can send training member i the aggregate values of the parameters to be synchronized corresponding to the non-zero elements of its local sparse matrix W*_{i,t}.
In order to consider the characteristics of the global data more comprehensively, in a possible implementation the server may first determine a sparsification result of the M aggregate values corresponding to the global model. The sparsification result may be recorded as a global sparse value set, which, if described in matrix form, may be referred to as the global sparse matrix W*_{s,t}. The sparsification of the M aggregate values can be performed by pruning, model sparsification, and the like. In this way, the server's global sparse value set selects the relatively important parameters to be synchronized in the global model. Taking the TopK sparsification method as an example, the K_s aggregate values with the largest absolute values among the M aggregate values can be taken to construct the global sparse value set. In the matrix case, the K_s selected elements of the global sparse matrix W*_{s,t} are the corresponding K_s aggregate values, and the other positions are 0.
Then, the global sparse value set (e.g., the global sparse matrix W*_{s,t}) and the local sparse value set of each training member (e.g., the local sparse matrix W*_{i,t}) may be used together to determine the corresponding synchronization parameter set. Taking training member i as an example, the global sparse value set and the local sparse value set together determine the corresponding synchronization parameter set, denoted W_i. In particular, W_i may be determined using at least one of the intersection and the union of the parameters to be synchronized corresponding to the two value sets. The determination process of the synchronization parameter set W_i is described below taking the value sets in matrix form as an example.
The non-zero element positions in the global sparse matrix W*_{s,t} and the local sparse matrix W*_{i,t} may determine the non-zero element positions of the synchronization parameter set. For example, the global sparse position matrix M_{s,t} and the local sparse position matrix M_{i,t} are determined separately, and then used to determine the sparse position matrix of the synchronization parameter matrix, denoted M'_{i,t}. The number of non-zero elements in M'_{i,t} is recorded as n_i, representing that the service party feeds back n_i aggregate values to the training member; which aggregate values are fed back is determined by the non-zero element positions in M'_{i,t}. n_i is usually less than M, and may or may not equal m_i. A single synchronization parameter matrix can be regarded as a sparse version of the aggregation matrix W_{s,t} formed by the aggregate values: the elements at the non-zero positions of M'_{i,t} may be set to 1, describing the sparse positions, and the sparse matrix corresponding to the synchronization parameter set of training member i is then M'_{i,t} ⊙ W_{s,t}. It can be seen that, in matrix form, the non-zero element positions of the synchronization parameter set W_i correspond to the synchronization parameters determined by the server for training member i in the current update period.
According to one possible design, the union of the non-zero element positions of the sparse position matrices M_{i,t} and M_{s,t} may be used to determine the sparse position matrix M'_{i,t} of the synchronization parameter set W_i. For example, the union itself may be taken as M'_{i,t}, or a predetermined number of non-zero element positions may be randomly selected from the union as the non-zero element positions of M'_{i,t}.
In an alternative implementation, the importance of each position may also be considered when randomly selecting a predetermined number of non-zero element positions from the union. For example, a larger selection probability is used for the intersection positions of the non-zero elements of M_{i,t} and M_{s,t}. This is because positions that are non-zero in both M_{i,t} and M_{s,t} may be more important positions relative to both the local model of the corresponding training member and the global model of the server, and keeping the intersection positions non-zero helps transfer the more important parameters. The matrix of intersection positions is M_∩ = M_{i,t} ⊙ M_{s,t}, where ⊙ denotes element-wise multiplication, e.g., the element in row 1, column 1 of M_{i,t} multiplied by the element in row 1, column 1 of M_{s,t} gives the element in row 1, column 1 of M_∩. Further, the positions of the union corresponding to the non-zero elements of M_∩ may be given a larger selection probability (e.g., 0.7), the other positions of the union a smaller selection probability (e.g., 0.3), and a predetermined number of non-zero element positions selected according to these probabilities.
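The probability-weighted selection over the union can be sketched as follows; the position matrices are flattened toy examples, and the 0.7/0.3 weights follow the example probabilities above.

```python
import numpy as np

rng = np.random.default_rng(7)
Mi = np.array([1, 1, 0, 1, 0], dtype=float)  # local sparse position matrix (flattened)
Ms = np.array([1, 0, 1, 1, 0], dtype=float)  # global sparse position matrix (flattened)

union = np.flatnonzero(np.maximum(Mi, Ms))   # union of non-zero positions
inter = Mi * Ms                              # element-wise product: intersection positions
weights = np.where(inter[union] > 0, 0.7, 0.3)  # larger weight at intersection positions
probs = weights / weights.sum()
chosen = rng.choice(union, size=2, replace=False, p=probs)  # sampled sync positions
```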
According to another possible design, the correlation of the global sparse matrix W*_{s,t} and the local sparse matrix W*_{i,t} may also be used. The correlation may be determined based on W*_{s,t} and W*_{i,t} themselves, or based on a comparison of the corresponding sparse position matrices M_{i,t} and M_{s,t}. Taking the comparison of M_{i,t} and M_{s,t} as an example, their correlation can be described by a correlation coefficient determined from a correlation distance such as the Euclidean distance, cosine distance, Manhattan distance, Pearson similarity, Jaccard similarity, Hamming distance, and the like, denoted dist(M_{i,t}, M_{s,t}). Those skilled in the art will understand that the correlation distance may have a different range depending on how it is defined; for example, the Euclidean distance may range from 0 to 2, the cosine distance from 0 to 1, the Manhattan distance from 0 to 2k (2k being the sum of the non-zero elements of the two matrices), and so on. In order to keep the correlation coefficient (e.g., β_{i,t}) in the interval from 0 to 1, the correlation distance can be normalized to serve as the correlation coefficient: for example, with the Euclidean distance the correlation coefficient β_{i,t} is dist(M_{i,t}, M_{s,t})/2, with the Manhattan distance it is dist(M_{i,t}, M_{s,t})/2k, and so on. When the global sparse matrix W*_{s,t} and the local sparse matrix W*_{i,t} are used directly to determine the correlation coefficient, the corresponding sparse position matrices are not needed; the determination is similar and is not repeated here.
In an alternative implementation, a first number N_1 of non-zero element positions may be selected from the matrix M_∩ of intersection positions of M_{i,t} and M_{s,t}, recorded for example as a first position matrix; a second number N_2 of non-zero element positions may be selected from the complement matrix of M_∩ within the non-zero elements of M_{i,t}, recorded for example as a second position matrix; and a third number N_3 of non-zero element positions may be selected from the complement matrix of M_∩ within the non-zero elements of M_{s,t}, recorded for example as a third position matrix. The first, second, and third numbers may be determined based on the correlation coefficient described above.
In an alternative implementation, the number of non-zero elements in M_∩ may be recorded as k_{i,t}; it can be understood that k_{i,t} represents the number of positions that are non-zero in both M_{i,t} and M_{s,t}. Then the first number is N_1 = k_{i,t}. The second number may be positively correlated with the correlation coefficient, e.g., N_2 = β_{i,t}(k − k_{i,t}), and the third number may be inversely related to the correlation coefficient, e.g., N_3 = (1 − β_{i,t})(k − k_{i,t}), where k is the number of non-zero elements in the global sparse matrix or the global sparse position matrix. k may be a predetermined value or determined according to a predetermined sparse truncation value (for example, the positions in the global sparse matrix whose values are greater than the truncation value correspond to the positions of non-zero elements in the sparse position matrix). Thus, the higher the correlation between the global sparse matrix W*_{s,t} and the local sparse matrix W*_{i,t} of training member i, the higher the proportion of the non-zero element positions of the synchronization parameter set selected from the non-zero elements of that training member's local sparse matrix, tending to feed back to the training member the more important values to be synchronized that it uploaded.
Further, the server may select, according to the first number N1, the second number N2, and the third number N3, corresponding numbers of non-zero element positions from the respective position matrices, obtaining the first position matrix, the second position matrix, and the third position matrix. The final sparse position matrix determined for training member i in the current update cycle is then the union of these three position matrices.
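The position selection above might be sketched as follows. The random draw of positions within each candidate pool, the rounding of N2, and the function name are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def build_sync_mask(m_i, m_s, beta, k):
    """Keep all N1 = k_it intersection positions, then draw
    N2 = beta*(k - k_it) positions that are non-zero only in the local
    mask and N3 = (1 - beta)*(k - k_it) positions non-zero only in the
    global mask, as in the scheme described above."""
    inter = m_i & m_s            # non-zero in both position matrices
    only_i = m_i & (1 - m_s)     # complement of the intersection in M_i,t
    only_s = m_s & (1 - m_i)     # complement of the intersection in M_s,t
    k_it = int(inter.sum())      # N1
    n2 = int(round(beta * (k - k_it)))
    n3 = (k - k_it) - n2         # N2 + N3 fill the remaining budget
    mask = inter.copy()
    for pool, n in ((only_i, n2), (only_s, n3)):
        idx = np.flatnonzero(pool)
        chosen = rng.choice(idx, size=min(n, idx.size), replace=False)
        mask.flat[chosen] = 1
    return mask

m_i = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 1]])
m_s = np.array([[1, 0, 1], [0, 1, 0], [0, 0, 1]])
mask = build_sync_mask(m_i, m_s, beta=0.5, k=4)
```

With these inputs the intersection contributes two positions and the two complements one each, so the final mask holds k = 4 non-zero positions.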
In another alternative implementation, the first number N1 may be 0, the second number N2 may be βi,t·k, and the third number N3 may be (1 − βi,t)·k. Thus, βi,t·k non-zero element positions may be selected from the matrix associated with Mi,t and (1 − βi,t)·k non-zero element positions from the matrix associated with Ms,t, and the two selections combined to obtain the final sparse position matrix.
In yet another alternative implementation, the first number N1 may be 0, the second number N2 may equal the number of non-zero elements in the matrix associated with Mi,t, and the third number N3 may be (1 − βi,t)·k. The corresponding numbers of non-zero element positions are then selected and combined to obtain the final sparse position matrix.
In other alternative implementations, the server may also determine the synchronization parameter set of the current update period for each training member in other reasonable manners, which are not described here one by one. In the above description, a sparse matrix and its corresponding sparse position matrix have the same non-zero element positions; the difference is that the non-zero elements of the sparse position matrix may be a predetermined value (e.g. 1), while the non-zero elements of the sparse matrix are the real element values of the corresponding matrix before sparsification. It is worth noting that the aggregation value set Ws,t of the global model, the global sparse value set, and the local sparse value set need not be described in matrix form; the individual values may instead be tagged with the identifiers of the corresponding parameters to be determined. In that case the principle for determining the synchronization parameter set Wi is similar to the matrix form, except that intersections, unions, and the like can be determined directly via the parameter identifiers without going through a position matrix, and this is not repeated here.
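For the matrix-free variant just described, the intersection and union can be taken directly over parameter identifiers. The identifier scheme ("w1", "w2", …) and the concrete values below are purely illustrative:

```python
# Each value is tagged with the identifier of its pending parameter,
# so set operations replace the position-matrix comparisons.
local_sparse = {"w1": 0.4, "w3": -0.2, "w7": 1.1}   # member i's uploads
global_sparse = {"w1": 0.5, "w2": 0.9, "w7": 1.0}   # sparsified aggregates

both = local_sparse.keys() & global_sparse.keys()    # intersection of ids
either = local_sparse.keys() | global_sparse.keys()  # union of ids

# One possible synchronization set over the union, preferring the
# aggregation value where it exists (an illustrative choice):
sync_set = {pid: global_sparse.get(pid, local_sparse.get(pid))
            for pid in either}
```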
Then, the server may feed back each synchronization parameter set to the corresponding training member, where the synchronization parameter set fed back to training member i may be Wi. In an alternative embodiment, the synchronization parameter set Wi may be represented by a sparse matrix.
In step 305, training member i updates the undetermined parameters in the local model by using the synchronization parameter set Wi, thereby updating the local model. It can be understood that each training member can perform this update using the synchronization parameter set fed back to it by the server.
For example, when the parameters to be synchronized are the model parameters themselves, the aggregation values in the synchronization parameter set replace the corresponding undetermined parameters of the local model one by one. When the parameters to be synchronized are gradients of the undetermined parameters, the local undetermined parameters are updated one by one with the aggregation values in the synchronization parameter set, using gradient descent, Newton's method, or the like, with the corresponding step sizes. Described in matrix form, the update at training member i amounts to replacing the local values at the non-zero positions of the sparse position matrix determined from the synchronization parameter set Wi with the fed-back values. It can be seen that the training member replaces locally only the elements covered by the sparse set Wi.
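A minimal sketch of this replacement step, assuming the fed-back set is materialized as a dense value matrix plus a 0/1 position mask (the variable names are illustrative):

```python
import numpy as np

# Only the positions selected in the synchronization set (marked in
# sync_mask) are overwritten with the fed-back aggregation values;
# all other local parameters are kept unchanged.
local_params = np.array([[0.1, 0.2], [0.3, 0.4]])
agg_values   = np.array([[0.9, 0.0], [0.0, 0.8]])   # zeros = not fed back
sync_mask    = np.array([[1, 0], [0, 1]])            # positions in W_i

updated = np.where(sync_mask == 1, agg_values, local_params)
```

For the gradient case, `agg_values` would instead be scaled by a step size and subtracted rather than substituted.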
In this way, each training member selectively synchronizes its local parameters with the other training members via the server. This saves communication data volume while fully considering the correlation between the global model and the local models, providing an effective training scheme for federated learning with large data volumes.
The training members can jointly train the model, with the assistance of the server, through multiple iterations of the synchronization period shown in fig. 2. The iteration end condition of the joint training may be, for example: the parameters to be synchronized converge, the model loss converges, or the number of iteration periods reaches a predetermined value. Here convergence may be understood as the amount of change being smaller than a predetermined threshold.
The model update flow of one embodiment of the present specification is described above in connection with the schematic diagram of FIG. 3 from the interaction perspective of a training member and a server. FIG. 4 shows a flow diagram of a server and a plurality of training members jointly updating a model described from the perspective of the server.
As shown in fig. 4, from the perspective of the server, the process of jointly updating the model includes:
FIG. 5 shows a flow diagram of a server and a plurality of training members jointly updating a model described from the perspective of the training members. As shown in FIG. 5, from the perspective of training member i, the process of jointly updating the model includes:
It can be understood that fig. 4 and fig. 5 are specific examples of the flow executed by the server and the training member i in fig. 3 in a single synchronization cycle, respectively, and therefore, the corresponding description of the execution flow of the relevant party in fig. 3 is also applicable to fig. 4 and fig. 5, and is not repeated herein.
Reviewing the above process: in a single synchronization period of the joint training, after each training member updates the M parameters to be synchronized using its local training samples, it selects part of the updated values to upload to the server, and the server selects part of the aggregation values to send back. This reduces the communication pressure between the training members and the server, reduces the computation load of the server, and improves learning efficiency. The server aggregates the values uploaded by the training members into M aggregation values, and then determines a corresponding synchronization parameter set according to the personalized characteristics of each training member. The data characteristics of each member's local samples are thus fully considered, making the federated learning more targeted and personalized.
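The server-side aggregation over partial uploads can be sketched as below, using plain averaging over the members that actually uploaded each parameter — one of the aggregation options named earlier (the text also allows weighted sums, medians, maxima, and minima). The `(indices, values)` upload format is an assumption:

```python
import numpy as np

def aggregate(uploads, num_params):
    """Average, per parameter, the values received from the members that
    uploaded that parameter; parameters nobody uploaded stay at zero."""
    totals = np.zeros(num_params)
    counts = np.zeros(num_params)
    for idx, vals in uploads:        # each member sends only m_i entries
        totals[idx] += vals
        counts[idx] += 1
    agg = np.zeros(num_params)
    seen = counts > 0
    agg[seen] = totals[seen] / counts[seen]
    return agg

uploads = [
    (np.array([0, 2]), np.array([1.0, 2.0])),   # member 1: params 0 and 2
    (np.array([0, 3]), np.array([3.0, 4.0])),   # member 2: params 0 and 3
]
agg = aggregate(uploads, num_params=4)   # param 0 averaged over two members
```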
According to an embodiment of another aspect, a system for jointly updating a model is provided, including a server and a plurality of training members. Denoting any one of the plurality of training members as training member i, the training member and the server may each be provided with a corresponding apparatus for jointly updating the model, cooperating to complete federated learning.
FIG. 6 illustrates an apparatus 600 for a federated update model hosted by a server. As shown in fig. 6, the apparatus 600 includes:
a receiving unit 61 configured to receive the values to be synchronized sent by each training member, where the number of values to be synchronized sent by a single training member i is mi, and the mi values to be synchronized are the update values of mi of the M parameters to be synchronized updated using local training samples, the parameters to be synchronized corresponding one-to-one to the undetermined parameters of the model;
an aggregation unit 62 configured to aggregate the values to be synchronized uploaded by the training members to obtain M aggregation values corresponding to the M parameters to be synchronized respectively;
a feedback unit 63 configured to feed back, according to the M aggregation values, a synchronization parameter set to each training member, so that each training member updates the undetermined parameters in its local model using the corresponding synchronization parameter set, thereby updating the local model, where the synchronization parameter set Wi corresponding to a single training member i corresponds to ni aggregation values, and the ni parameters to be synchronized corresponding to these aggregation values are determined jointly from the mi values to be synchronized uploaded by each training member and the M aggregation values.
FIG. 7 illustrates an apparatus 700 for a joint update model provided to any one of the training members. As shown in fig. 7, the apparatus 700 includes:
a training unit 71 configured to update the M parameters to be synchronized corresponding to the model using local training samples, each parameter to be synchronized corresponding one-to-one to an undetermined parameter of the model;
an uploading unit 72 configured to select mi parameters to be synchronized from the M parameters to be synchronized and upload the corresponding mi values to be synchronized to the server, for the server to feed back the synchronization parameter set Wi, where the synchronization parameter set Wi corresponds to ni aggregation values, and the ni parameters to be synchronized corresponding to these aggregation values are determined jointly from the mi values to be synchronized uploaded by each training member and the M aggregation values;
a synchronization unit 73 configured to update the undetermined parameters in the local model using the synchronization parameter set Wi, thereby updating the local model.
It should be noted that the apparatuses 600 and 700 shown in fig. 6 and fig. 7 correspond to the method embodiments shown in fig. 4 and fig. 5, respectively, and may be applied to the server and the single training member in the method embodiment shown in fig. 3, respectively, to cooperate with the training member to complete the process of jointly updating the business model in fig. 3. Therefore, the description related to the service party and the single training member in fig. 3 can be applied to the apparatuses 600 and 700 shown in fig. 6 and fig. 7, and will not be described herein again.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 4 or fig. 5 or the like.
According to an embodiment of still another aspect, there is also provided a computing device including a memory and a processor, the memory having stored therein executable code, the processor implementing the method described in conjunction with fig. 4 or fig. 5, and so on, when executing the executable code.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of this specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments are intended to explain the technical idea, technical solutions and advantages of the present specification in further detail, and it should be understood that the above-mentioned embodiments are merely specific embodiments of the technical idea of the present specification, and are not intended to limit the scope of the technical idea of the present specification, and any modification, equivalent replacement, improvement, etc. made on the basis of the technical solutions of the embodiments of the present specification should be included in the scope of the technical idea of the present specification.
Claims (24)
1. A method for jointly updating a model, applied to a process in which a server and a plurality of training members jointly update the model, wherein the local model of each training member is consistent in structure with the global model held by the server, the method comprising:
each training member updates M parameters to be synchronized corresponding to the model by using a local training sample, and each parameter to be synchronized corresponds to each parameter to be determined of the model one by one;
each training member selects a number of parameters to be synchronized from the M parameters to be synchronized and uploads the corresponding values to be synchronized to the server, where the number of parameters to be synchronized selected by a single training member i is mi;
The server side aggregates the values to be synchronized uploaded by each training member to obtain M aggregated values corresponding to M parameters to be synchronized respectively;
the server feeds back a synchronization parameter set to each training member according to the M aggregation values, where the synchronization parameter set Wi corresponding to a single training member i corresponds to ni aggregation values, and the ni parameters to be synchronized corresponding to the ni aggregation values are determined jointly from the mi values to be synchronized uploaded by each training member and the M aggregation values;
and each training member updates the undetermined parameters in the local model by using the corresponding synchronous parameter set, so that the local model is updated.
2. The method of claim 1, wherein the single parameter to be synchronized is one of a single pending parameter, a gradient of the single pending parameter, a difference between a current value of the single pending parameter and an initial value.
3. The method of claim 1, wherein the pending parameters of the local model of each training member are uniformly initialized by the server, and a horizontal split is formed between the local training samples of each training member.
4. The method of claim 1, wherein the number mi of parameters to be synchronized selected by a single training member i is determined according to the product of a predetermined local activation ratio and the number M of parameters to be synchronized.
5. The method of claim 1, wherein the number mi of uploaded parameters to be synchronized is determined by a single training member i through at least one of pruning and sparsifying the model.
6. The method of claim 1, wherein the server aggregates the values to be synchronized of a single parameter to be synchronized by at least one of weighted summation, averaging, taking the median, taking the maximum, and taking the minimum of the values to be synchronized uploaded by the training members for that parameter.
7. The method of claim 1, wherein, for a training member i, the mi values to be synchronized describe a local sparse value set and the M aggregation values describe an aggregation value set Ws,t of the global model, the server determining the corresponding synchronization parameter set Wi in the following way:
sparsifying the aggregation value set Ws,t of the global model to obtain a global sparse value set;
determining the corresponding synchronization parameter set Wi based on the local sparse value set and the global sparse value set.
8. The method of claim 7, wherein the aggregate value set of the global model is described by a matrix, and the local sparse value set and the global sparse value set are described by a local sparse matrix and a global sparse matrix, respectively.
9. The method of claim 8, wherein determining the corresponding synchronization parameter set Wi based on the local sparse value set and the global sparse value set comprises:
detecting the non-zero element positions of the local sparse matrix and of the global sparse matrix respectively, obtaining a local sparse position matrix Mi,t and a global sparse position matrix Ms,t;
determining the sparse position matrix corresponding to the synchronization parameter set based on the union of the non-zero element positions of the sparse position matrices Mi,t and Ms,t.
10. The method of claim 9, wherein the non-zero element positions in the sparse position matrix are:
11. The method of claim 8, wherein determining the corresponding synchronization parameter set Wi based on the local sparse value set and the global sparse value set comprises:
comparing the local sparse matrix and the global sparse matrix to obtain a correlation coefficient βi,t;
determining the sparse position matrix corresponding to the synchronization parameter set Wi based on the correlation coefficient βi,t.
12. The method of claim 11, wherein comparing the local sparse matrix and the global sparse matrix to obtain the correlation coefficient βi,t comprises:
detecting a correlation distance between the local sparse matrix and the global sparse matrix, the correlation distance being one of the Euclidean distance, cosine distance, Manhattan distance, Pearson similarity, Jaccard similarity, and Hamming distance between the local sparse matrix and the global sparse matrix, or between the local sparse position matrix Mi,t and the global sparse position matrix Ms,t;
determining the correlation coefficient βi,t according to the normalization result of the correlation distance.
13. The method of claim 11, wherein determining the sparse position matrix corresponding to the synchronization parameter set Wi based on the correlation coefficient βi,t comprises:
selecting a first number N1 of non-zero elements from the intersection position matrix of the local sparse position matrix Mi,t and the global sparse position matrix Ms,t, obtaining a first position matrix;
selecting a second number N2 of non-zero elements from the complement matrix of the non-zero elements of Mi,t with respect to the intersection, obtaining a second position matrix;
selecting a third number N3 of non-zero elements from the complement matrix of the non-zero elements of Ms,t with respect to the intersection, obtaining a third position matrix.
15. The method of claim 14, wherein the sum of the first number N1, the second number N2, and the third number N3 is a predetermined value k, and the second number N2 and the third number N3 are each positively correlated with the difference between the predetermined value k and ki,t.
16. The method of claim 13, wherein the first number N1 is 0, the second number N2 is positively correlated with the correlation coefficient βi,t, and the third number N3 is negatively correlated with the correlation coefficient βi,t.
18. The method of claim 1, wherein each training member updating the undetermined parameters in the local model with its corresponding synchronization parameter set, thereby updating the local model, comprises:
a single training member i replacing, with each aggregation value in the corresponding synchronization parameter set Wi, the corresponding value to be synchronized among its local parameters to be synchronized;
updating the local model with the updated parameters to be synchronized.
19. A method for jointly updating a model, applied to a process in which a server and a plurality of training members jointly update the model, wherein the local model of each training member is consistent in structure with the global model held by the server, the method being executed by the server and comprising:
receiving the values to be synchronized sent by each training member, where the number of values to be synchronized sent by a single training member i is mi, and the mi values to be synchronized are the update values of mi of the M parameters to be synchronized updated using local training samples, each parameter to be synchronized corresponding one-to-one to an undetermined parameter of the model;
aggregating the values to be synchronized uploaded by each training member to obtain M aggregation values corresponding to the M parameters to be synchronized respectively;
feeding back, according to the M aggregation values, a synchronization parameter set to each training member, so that each training member updates the undetermined parameters in its local model using the corresponding synchronization parameter set, thereby updating the local model, where the synchronization parameter set Wi corresponding to a single training member i corresponds to ni aggregation values, and the ni parameters to be synchronized corresponding to these aggregation values are determined jointly from the mi values to be synchronized uploaded by each training member and the M aggregation values.
20. A method for jointly updating a model, applied to a process in which a server and a plurality of training members jointly update the model, wherein the local model of each training member is consistent in structure with the global model held by the server, the method being executed by a training member i and comprising:
updating the M parameters to be synchronized corresponding to the model using local training samples, each parameter to be synchronized corresponding one-to-one to an undetermined parameter of the model;
selecting mi parameters to be synchronized from the M parameters to be synchronized and uploading the corresponding mi values to be synchronized to the server, for the server to feed back a synchronization parameter set Wi, where the synchronization parameter set Wi corresponds to ni aggregation values, and the ni parameters to be synchronized corresponding to these aggregation values are determined jointly from the mi values to be synchronized uploaded by each training member and the M aggregation values;
updating the undetermined parameters in the local model using the synchronization parameter set Wi, thereby updating the local model.
21. An apparatus for jointly updating a model, applied to a process in which a server and a plurality of training members jointly update the model, wherein the local model of each training member is consistent in structure with the global model held by the server, the apparatus being provided at the server side and comprising:
a receiving unit configured to receive the values to be synchronized sent by each training member, where the number of values to be synchronized sent by a single training member i is mi, and the mi values to be synchronized are the update values of mi of the M parameters to be synchronized updated using local training samples, each parameter to be synchronized corresponding one-to-one to an undetermined parameter of the model;
an aggregation unit configured to aggregate the values to be synchronized uploaded by each training member to obtain M aggregation values corresponding to the M parameters to be synchronized respectively;
a feedback unit configured to feed back, according to the M aggregation values, a synchronization parameter set to each training member, so that each training member updates the undetermined parameters in its local model using the corresponding synchronization parameter set, thereby updating the local model, where the synchronization parameter set Wi corresponding to a single training member i corresponds to ni aggregation values, and the ni parameters to be synchronized corresponding to these aggregation values are determined jointly from the mi values to be synchronized uploaded by each training member and the M aggregation values.
22. An apparatus for jointly updating a model, applied to a process in which a server and a plurality of training members jointly update the model, wherein the local model of each training member is consistent in structure with the global model held by the server, the apparatus being provided at a training member i and comprising:
a training unit configured to update the M parameters to be synchronized corresponding to the model using local training samples, each parameter to be synchronized corresponding one-to-one to an undetermined parameter of the model;
an uploading unit configured to select mi parameters to be synchronized from the M parameters to be synchronized and upload the corresponding mi values to be synchronized to the server, for the server to feed back the synchronization parameter set Wi, where the synchronization parameter set Wi corresponds to ni aggregation values, and the ni parameters to be synchronized corresponding to these aggregation values are determined jointly from the mi values to be synchronized uploaded by each training member and the M aggregation values;
a synchronization unit configured to update the undetermined parameters in the local model using the synchronization parameter set Wi, thereby updating the local model.
23. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of claim 19 or 20.
24. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of claim 19 or 20.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210380007.5A CN114676838A (en) | 2022-04-12 | 2022-04-12 | Method and device for jointly updating model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210380007.5A CN114676838A (en) | 2022-04-12 | 2022-04-12 | Method and device for jointly updating model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114676838A true CN114676838A (en) | 2022-06-28 |
Family
ID=82078238
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210380007.5A Pending CN114676838A (en) | 2022-04-12 | 2022-04-12 | Method and device for jointly updating model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114676838A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115186937A (en) * | 2022-09-09 | 2022-10-14 | 闪捷信息科技有限公司 | Prediction model training and data prediction method and device based on multi-party data cooperation |
CN115909746A (en) * | 2023-01-04 | 2023-04-04 | 中南大学 | Traffic flow prediction method, system and medium based on federal learning |
CN116935143A (en) * | 2023-08-16 | 2023-10-24 | 中国人民解放军总医院 | DFU medical image classification method and system based on personalized federal learning |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111898767A (en) * | 2020-08-06 | 2020-11-06 | 深圳前海微众银行股份有限公司 | Data processing method, device, equipment and medium |
CN112288100A (en) * | 2020-12-29 | 2021-01-29 | 支付宝(杭州)信息技术有限公司 | Method, system and device for updating model parameters based on federal learning |
CN113221105A (en) * | 2021-06-07 | 2021-08-06 | 南开大学 | Robustness federated learning algorithm based on partial parameter aggregation |
CN113360514A (en) * | 2021-07-02 | 2021-09-07 | 支付宝(杭州)信息技术有限公司 | Method, device and system for jointly updating model |
CN113377797A (en) * | 2021-07-02 | 2021-09-10 | 支付宝(杭州)信息技术有限公司 | Method, device and system for jointly updating model |
CN113469367A (en) * | 2021-05-25 | 2021-10-01 | 华为技术有限公司 | Method, device and system for federated learning |
- 2022-04-12 CN CN202210380007.5A patent/CN114676838A/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111898767A (en) * | 2020-08-06 | 2020-11-06 | 深圳前海微众银行股份有限公司 | Data processing method, device, equipment and medium |
CN112288100A (en) * | 2020-12-29 | 2021-01-29 | 支付宝(杭州)信息技术有限公司 | Method, system and device for updating model parameters based on federal learning |
CN113469367A (en) * | 2021-05-25 | 2021-10-01 | 华为技术有限公司 | Method, device and system for federated learning |
CN113221105A (en) * | 2021-06-07 | 2021-08-06 | 南开大学 | Robustness federated learning algorithm based on partial parameter aggregation |
CN113360514A (en) * | 2021-07-02 | 2021-09-07 | 支付宝(杭州)信息技术有限公司 | Method, device and system for jointly updating model |
CN113377797A (en) * | 2021-07-02 | 2021-09-10 | 支付宝(杭州)信息技术有限公司 | Method, device and system for jointly updating model |
Non-Patent Citations (2)
Title |
---|
IREM ERGÜN: "Sparsified Secure Aggregation for Privacy-Preserving Federated Learning", arXiv:2112.12872v1, 23 December 2021, pages 1 - 28 *
毛耀如: "Research on Attack Models and Privacy Countermeasures for Distributed Federated Deep Learning", China Master's Theses Full-text Database (Information Science and Technology), 30 April 2021, pages 138 - 114 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115186937A (en) * | 2022-09-09 | 2022-10-14 | 闪捷信息科技有限公司 | Prediction model training and data prediction method and device based on multi-party data cooperation |
CN115909746A (en) * | 2023-01-04 | 2023-04-04 | 中南大学 | Traffic flow prediction method, system and medium based on federal learning |
CN116935143A (en) * | 2023-08-16 | 2023-10-24 | 中国人民解放军总医院 | DFU medical image classification method and system based on personalized federal learning |
CN116935143B (en) * | 2023-08-16 | 2024-05-07 | 中国人民解放军总医院 | DFU medical image classification method and system based on personalized federal learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhu et al. | Federated learning on non-IID data: A survey | |
US11170395B2 (en) | Digital banking platform and architecture | |
CN111931950B (en) | Method and system for updating model parameters based on federal learning | |
CN110084377B (en) | Method and device for constructing decision tree | |
CN114676838A (en) | Method and device for jointly updating model | |
CN113377797B (en) | Method, device and system for jointly updating model | |
CN112085159B (en) | User tag data prediction system, method and device and electronic equipment | |
US11238364B2 (en) | Learning from distributed data | |
CN112799708B (en) | Method and system for jointly updating business model | |
US11410644B2 (en) | Generating training datasets for a supervised learning topic model from outputs of a discovery topic model | |
WO2023174036A1 (en) | Federated learning model training method, electronic device and storage medium | |
CN112068866B (en) | Method and device for updating business model | |
CN111460528A (en) | Multi-party combined training method and system based on Adam optimization algorithm | |
CN113360514B (en) | Method, device and system for jointly updating model | |
US11843587B2 (en) | Systems and methods for tree-based model inference using multi-party computation | |
US20210342744A1 (en) | Recommendation method and system and method and system for improving a machine learning system | |
WO2024114640A1 (en) | User portrait-based user service system and method, and electronic device | |
JP7404504B2 (en) | Interpretable tabular data learning using sequential sparse attention | |
CN115049011A (en) | Method and device for determining contribution degree of training member model of federal learning | |
US11521601B2 (en) | Detecting extraneous topic information using artificial intelligence models | |
CN115345298A (en) | Method and device for jointly training models | |
US20220358366A1 (en) | Generation and implementation of dedicated feature-based techniques to optimize inference performance in neural networks | |
CN112307334B (en) | Information recommendation method, information recommendation device, storage medium and electronic equipment | |
CN114386583A (en) | Longitudinal federal neural network model learning method for protecting label information | |
CN113887740A (en) | Method, device and system for jointly updating model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||