CN114676838A - Method and device for jointly updating model - Google Patents
- Publication number
- CN114676838A (application number CN202210380007.5A)
- Authority
- CN
- China
- Prior art keywords
- synchronized
- model
- local
- training
- parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
In the federated learning process, each training member uploads only part of its parameters to be synchronized, and the server sends back aggregation values for only part of those parameters, reducing the data communication traffic of joint training. For a single training member, the aggregation values to be sent are selected jointly based on the parameters uploaded by that member and the aggregation values determined by the server, so that both the member's local data characteristics and the global data characteristics are fully considered; the model trained through federated learning thus better matches actual business requirements, improving the effectiveness of federated learning.
Description
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technology, and more particularly, to a method and apparatus for jointly updating a model.
Background
The development of computer technology has made machine learning increasingly common across business scenarios. Federated learning is a method of joint modeling that protects private data. For example, when enterprises need to build a model collaboratively, federated learning lets them train a data processing model jointly on all parties' data while fully protecting each enterprise's data privacy, so that business data are processed more accurately and effectively. In a federated learning scenario, after the parties agree on a machine learning model structure, each party trains locally with its private data, the model parameters are aggregated by a secure and reliable method, and each party finally improves its local model from the aggregated parameters. Federated learning thus achieves multi-party joint modeling on the basis of privacy protection, effectively breaking down data silos.
However, as task complexity and performance requirements grow, the number of network layers in federated learning models tends to increase, and the number of model parameters grows correspondingly. Taking the face recognition model ResNet-50 as an example, the original model has over 20 million parameters and exceeds 100 MB in size. Especially when many training members participate in federated learning, the data received by the server grows geometrically. How to sparsify the parameters each training member exchanges with the server during joint training is therefore an important problem for reducing communication pressure and avoiding communication congestion.
Disclosure of Invention
One or more embodiments of the present specification describe a method and apparatus for jointly updating a model to address one or more of the problems identified in the background.
According to a first aspect, a method for jointly updating a model is provided, applied to a process in which a server and a plurality of training members jointly update a model, where each training member's local model has the same structure as the global model held by the server. The method comprises the following steps: each training member updates, using its local training samples, the values of M parameters to be synchronized corresponding to the model, the parameters to be synchronized corresponding one-to-one to the undetermined parameters of the model; each training member selects several parameters to be synchronized from the M parameters and uploads their values to the server, the number selected by a single training member i being m_i; the server aggregates the values to be synchronized uploaded by the training members to obtain M aggregation values corresponding to the M parameters to be synchronized; the server feeds back a synchronization parameter set to each training member according to the M aggregation values, where the set W_i for a single training member i corresponds to n_i aggregation values, and the n_i parameters to be synchronized corresponding to those aggregation values are determined jointly from the m_i values uploaded by member i and the M aggregation values; and each training member updates the undetermined parameters in its local model using its synchronization parameter set, thereby updating the local model.
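The interaction described in the first aspect can be illustrated with a minimal sketch. All names, the choice of top-m magnitude selection, plain averaging, and feeding back aggregates at the member's own uploaded positions are illustrative assumptions, not the patent's required implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 10          # total number of parameters to be synchronized
TOP_M = 4       # m_i: values each member uploads (here: top-m by magnitude)

# Each training member holds M locally updated values to be synchronized.
members = [rng.normal(size=M) for _ in range(3)]

# Step 1: each member uploads only its top-m_i values (index -> value).
uploads = []
for w in members:
    idx = np.argsort(-np.abs(w))[:TOP_M]
    uploads.append({int(j): float(w[j]) for j in idx})

# Step 2: the server aggregates the uploaded values per parameter index
# (here by simple averaging over the members that reported the index).
agg = np.zeros(M)
for j in range(M):
    vals = [u[j] for u in uploads if j in u]
    if vals:
        agg[j] = sum(vals) / len(vals)

# Step 3: the server feeds member i the aggregation values for n_i
# positions; one simple choice of W_i is the positions member i uploaded.
W_0 = {j: float(agg[j]) for j in uploads[0]}
```

Later embodiments refine step 3 so that the fed-back positions depend on both the member's uploads and the global aggregates.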
In one embodiment, the single parameter to be synchronized is one of a single undetermined parameter, a gradient of the single undetermined parameter, and a difference between a current value and an initial value of the single undetermined parameter.
In one embodiment, the undetermined parameters of each training member's local model are uniformly initialized by the server, and the local training samples of the training members form a horizontal partition.
In one embodiment, the number m_i of parameters to be synchronized selected by a single training member i is determined from the product of a predetermined local activation ratio and the number M of parameters to be synchronized.
In one embodiment, a single training member i determines the number m_i of uploaded parameters to be synchronized through at least one of model pruning and model sparsification.
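A simple way to combine the two embodiments above is magnitude-based sparsification with m_i fixed by an activation ratio. This is a hypothetical sketch; the function name and the ceil/top-k choices are my own:

```python
import math
import numpy as np

def sparsify(values: np.ndarray, activation_ratio: float) -> dict:
    """Keep the m_i = ceil(ratio * M) largest-magnitude values;
    return an index -> value map for upload."""
    m_i = max(1, math.ceil(activation_ratio * values.size))
    keep = np.argsort(-np.abs(values))[:m_i]
    return {int(j): float(values[j]) for j in keep}

w = np.array([0.1, -2.0, 0.0, 0.7, -0.05, 1.5])
upload = sparsify(w, 0.5)   # m_i = ceil(0.5 * 6) = 3
```

Here the three largest-magnitude entries (indices 1, 5, 3) would be uploaded.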
In one embodiment, the server may aggregate the values to be synchronized for a single parameter by at least one of weighted summation, averaging, and taking the median, maximum, or minimum of the values uploaded by the training members for that parameter.
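The aggregation options listed above can be sketched as a single dispatch function; the function name and argument layout are illustrative assumptions:

```python
import statistics

def aggregate(values, weights=None, how="average"):
    """Aggregate one parameter's values reported by several members."""
    if how == "weighted_sum":
        weights = weights or [1.0] * len(values)
        return sum(w * v for w, v in zip(weights, values))
    if how == "average":
        return sum(values) / len(values)
    if how == "median":
        return statistics.median(values)
    if how == "max":
        return max(values)
    if how == "min":
        return min(values)
    raise ValueError(f"unknown aggregation: {how}")

reported = [1.0, 2.0, 3.0]   # values for one parameter, from three members
avg = aggregate(reported, how="average")
med = aggregate(reported, how="median")
```

A weighted sum (e.g. weighting members by local sample count) reduces to plain averaging when all weights equal 1/K.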
In one embodiment, for training member i, the m_i values to be synchronized describe a local sparse value set, and the M aggregation values describe an aggregation value set W_{s,t} of the global model. The server determines the corresponding synchronization parameter set W_i as follows: sparsify the aggregation value set W_{s,t} of the global model to obtain a global sparse value set; then determine the corresponding synchronization parameter set W_i based on the local sparse value set and the global sparse value set.
In a further embodiment, the aggregation value set of the global model is described by a matrix, and the local sparse value set and the global sparse value set are described by a local sparse matrix and a global sparse matrix, respectively.
In a further embodiment, determining the synchronization parameter set W_i based on the local sparse value set and the global sparse value set comprises: separately detecting the non-zero element positions of the local sparse matrix and the global sparse matrix to obtain a local sparse position matrix M_{i,t} and a global sparse position matrix M_{s,t}; determining, based on the union of the non-zero element positions of M_{i,t} and M_{s,t}, a sparse position matrix corresponding to the synchronization parameter set; and selecting, according to the non-zero element positions indicated by that sparse position matrix, the corresponding aggregation values from the M aggregation values to form the synchronization parameter set W_i.
In a further embodiment, the non-zero element positions of this sparse position matrix are one of: all non-zero element positions of the union; a predetermined number of non-zero element positions randomly selected from the union; or a predetermined number of non-zero element positions selected from the union according to predetermined selection probabilities, where the first selection probability of positions in the intersection of M_{i,t} and M_{s,t} is greater than the second selection probability of other positions.
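The third option above, weighted sampling from the union with intersection positions favored, can be sketched as follows. The function name and the specific probability weights (3:1) are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def choose_positions(local_mask, global_mask, k, p_intersect=3.0, p_other=1.0):
    """Pick k positions from the union of two sparse position masks,
    weighting positions present in both masks more heavily."""
    union = np.flatnonzero(local_mask | global_mask)
    inter = local_mask & global_mask
    weights = np.where(inter[union], p_intersect, p_other).astype(float)
    weights /= weights.sum()                    # normalize to a distribution
    k = min(k, union.size)
    return set(rng.choice(union, size=k, replace=False, p=weights).tolist())

local_m  = np.array([1, 1, 0, 0, 1, 0], dtype=bool)   # M_{i,t} as a mask
global_m = np.array([0, 1, 1, 0, 1, 0], dtype=bool)   # M_{s,t} as a mask
picked = choose_positions(local_m, global_m, k=3)
```

Positions 1 and 4 lie in the intersection, so they are three times as likely to be drawn as positions 0 and 2.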
In another further embodiment, determining the synchronization parameter set W_i based on the local sparse value set and the global sparse value set comprises: obtaining the global sparse matrix of the M aggregation values corresponding to the global model; comparing the local sparse matrix with the global sparse matrix to obtain a correlation coefficient β_{i,t}; determining, based on β_{i,t}, the sparse position matrix corresponding to the synchronization parameter set W_i; and determining W_i from that sparse position matrix and the respective aggregation values.
In a further embodiment, comparing the local sparse matrix with the global sparse matrix to obtain the correlation coefficient β_{i,t} comprises: detecting the correlation distance between the local sparse matrix and the global sparse matrix, the correlation distance being one of the Euclidean distance, cosine distance, Manhattan distance, Pearson similarity, Jaccard similarity, and Hamming distance between the local and global sparse matrices, or between the local sparse position matrix M_{i,t} and the global sparse position matrix M_{s,t}; and determining the correlation coefficient β_{i,t} from the normalization result of the correlation distance.
In yet a further embodiment, determining the sparse position matrix corresponding to the synchronization parameter set W_i based on the correlation coefficient β_{i,t} comprises: selecting a first number N1 of non-zero element positions from the intersection position matrix of the local sparse position matrix M_{i,t} and the global sparse position matrix M_{s,t} to obtain a first position matrix; selecting a second number N2 of non-zero element positions from the complement matrix of M_{i,t} with respect to the intersection to obtain a second position matrix; selecting a third number N3 of non-zero element positions from the complement matrix of M_{s,t} with respect to the intersection to obtain a third position matrix; and determining the sparse position matrix based on the first, second, and third position matrices.
In one embodiment, the first number N1 is the number k_{i,t} of non-zero elements in the intersection position matrix, the second number N2 is positively correlated with the correlation coefficient β_{i,t}, and the third number N3 is negatively correlated with β_{i,t}.
In one embodiment, the sum of the first number N1, the second number N2, and the third number N3 is a predetermined value k, and both N2 and N3 are positively correlated with the difference between the predetermined value k and k_{i,t}.
In one embodiment, the first number N1 is 0, the second number N2 is positively correlated with the correlation coefficient β_{i,t}, and the third number N3 is negatively correlated with β_{i,t}.
In one embodiment, the first number N1 is 0, at least one of the second number N2 and the third number N3 is non-zero, and the third number N3 is negatively correlated with the correlation coefficient β_{i,t}.
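The N1/N2/N3 embodiments above can be sketched for the case N1 = k_{i,t} (keep every intersection position), with the remaining budget split between local-only and global-only positions in proportion to β_{i,t}. The exact split rule and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def select_sync_positions(local_mask, global_mask, beta, k):
    """Take all k_it intersection positions (N1 = k_it), then split the
    remaining budget: N2 local-only positions grow with beta,
    N3 global-only positions shrink with beta."""
    inter = np.flatnonzero(local_mask & global_mask)
    local_only = np.flatnonzero(local_mask & ~global_mask)
    global_only = np.flatnonzero(~local_mask & global_mask)

    n1 = inter.size                                      # k_{i,t}
    rest = max(k - n1, 0)
    n2 = min(int(round(beta * rest)), local_only.size)   # positively correlated
    n3 = min(rest - n2, global_only.size)                # negatively correlated

    picked = set(inter.tolist())
    picked |= set(rng.permutation(local_only)[:n2].tolist())
    picked |= set(rng.permutation(global_only)[:n3].tolist())
    return picked

local_m  = np.array([1, 1, 0, 0, 1, 0, 1], dtype=bool)
global_m = np.array([0, 1, 1, 0, 1, 1, 0], dtype=bool)
pos = select_sync_positions(local_m, global_m, beta=0.5, k=5)
```

With β = 1 the fed-back set leans entirely toward the member's own positions; with β = 0 it leans toward the global-only positions, matching the stated correlations.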
In one embodiment, each training member updating the undetermined parameters in its local model using the corresponding synchronization parameter set comprises: a single training member i replaces the corresponding values to be synchronized among its local parameters with the aggregation values in its synchronization parameter set W_i, and then updates the local model using the updated parameters to be synchronized.
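The replace-and-update step can be sketched as follows, treating W_i as an index-to-aggregate-value map (the representation is an assumption of this sketch):

```python
def apply_sync(local_values, sync_set):
    """Overwrite the locally held values at the fed-back positions with
    the server's aggregation values; other positions stay local."""
    updated = list(local_values)
    for pos, agg_val in sync_set.items():
        updated[pos] = agg_val
    return updated

local = [0.5, -0.1, 0.9, 0.0]
W_i = {0: 0.4, 2: 1.1}          # aggregation values fed back for member i
new_local = apply_sync(local, W_i)
```

Positions not covered by W_i keep their locally trained values, which is what makes the synchronized model reflect both local and global data characteristics.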
According to a second aspect, a method for jointly updating a model is provided, applied to a process in which a server and a plurality of training members jointly update a model, each training member's local model having the same structure as the global model held by the server. The method is performed by the server and comprises: receiving the values to be synchronized sent by each training member, where the number sent by a single training member i is m_i, the m_i values being the updated values of m_i of the M parameters to be synchronized updated with local training samples, the parameters to be synchronized corresponding one-to-one to the undetermined parameters of the model; aggregating the values to be synchronized uploaded by the training members to obtain M aggregation values corresponding to the M parameters to be synchronized; and feeding back a synchronization parameter set to each training member according to the M aggregation values, so that each training member updates the undetermined parameters in its local model using the corresponding set, thereby updating the local model, where the set W_i for a single training member i corresponds to n_i aggregation values, and the n_i parameters to be synchronized corresponding to those aggregation values are determined jointly from the m_i values uploaded by member i and the M aggregation values.
According to a third aspect, a method for jointly updating a model is provided, applied to a process in which a server and a plurality of training members jointly update a model, each training member's local model having the same structure as the global model held by the server. The method is applied to a training member i and comprises: updating, using local training samples, the values of M parameters to be synchronized corresponding to the model, the parameters to be synchronized corresponding one-to-one to the undetermined parameters of the model; selecting m_i parameters to be synchronized from the M parameters and uploading the corresponding m_i values to the server, for the server to feed back a synchronization parameter set W_i, where W_i corresponds to n_i aggregation values, and the n_i parameters to be synchronized corresponding to those aggregation values are determined jointly from the m_i uploaded values and the M aggregation values; and updating the undetermined parameters in the local model with W_i, thereby updating the local model.
According to a fourth aspect, there is provided an apparatus for jointly updating a model, which is applied to a process of jointly updating a model by a server and a plurality of training members, wherein a local model of each training member is consistent with a global model structure held by the server, the apparatus is provided on the server, and includes:
a receiving unit configured to receive the values to be synchronized sent by each training member, where the number sent by a single training member i is m_i, the m_i values being the updated values of m_i of the M parameters to be synchronized updated with local training samples, the parameters to be synchronized corresponding one-to-one to the undetermined parameters of the model;
an aggregation unit configured to aggregate the values to be synchronized uploaded by each training member to obtain M aggregation values corresponding to the M parameters to be synchronized;
a feedback unit configured to feed back a synchronization parameter set to each training member according to the M aggregation values, so that each training member updates the undetermined parameters in its local model using the corresponding set, thereby updating the local model, where the set W_i for a single training member i corresponds to n_i aggregation values, and the n_i parameters to be synchronized corresponding to those aggregation values are determined jointly from the m_i values uploaded by member i and the M aggregation values.
According to a fifth aspect, there is provided an apparatus for jointly updating a model, which is applied to a process of jointly updating a model by a server and a plurality of training members, wherein a local model of each training member is consistent with a global model structure held by the server, the apparatus is provided for a training member i, and includes:
the training unit is configured to update M parameters to be synchronized corresponding to the model by using a local training sample, and each parameter to be synchronized corresponds to each parameter to be determined of the model one by one;
an uploading unit configured to upload to the server the m_i values corresponding to m_i of the M parameters to be synchronized, for the server to feed back a synchronization parameter set W_i, where W_i corresponds to n_i aggregation values, and the n_i parameters to be synchronized corresponding to those aggregation values are determined jointly from the m_i uploaded values and the M aggregation values;
a synchronization unit configured to update the undetermined parameters in the local model with the synchronization parameter set W_i, thereby updating the local model.
According to a sixth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the second or third aspect.
According to a seventh aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and the processor, when executing the executable code, implements the method of the second or third aspect.
With the method, apparatus, and system provided by the embodiments of this specification, training members upload only part of the parameters to be synchronized during federated learning, and the server sends back aggregation values for only part of them, reducing the data communication traffic of joint training. For a single training member, the aggregation values to be sent are selected jointly based on the parameters uploaded by that member and the aggregation values determined by the server, so that the member's local data characteristics are fully considered, the model training process is more personalized, and the effectiveness of the model is improved.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a system architecture for federal learning;
FIG. 2 is a schematic diagram of a specific implementation architecture under the technical concept of the present specification;
FIG. 3 is a schematic diagram illustrating the interaction flow between a server and a single training member in the process of jointly training a model according to one embodiment of the present disclosure;
FIG. 4 illustrates a flow diagram of a joint training model performed by a server in one embodiment of the present description;
FIG. 5 is a schematic flow diagram of a joint training model performed by a server according to another embodiment of the present disclosure;
FIG. 6 is a schematic block diagram of an apparatus for a server-side joint training model according to one embodiment of the present disclosure;
FIG. 7 is a schematic block diagram of an apparatus for a joint training model provided to training members according to another embodiment of the present disclosure.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
Federated Learning may also be referred to as federated machine learning, joint learning, or alliance learning. Federated machine learning is a machine learning framework that can effectively help multiple organizations use data and build machine learning models jointly while meeting the requirements of user privacy protection, data security, and government regulation.
Specifically, suppose enterprise A and enterprise B each want to build a task model, where each task may be classification or prediction, and the necessary user approval has been obtained for the data. However, because the data are incomplete — for example, enterprise A lacks label data, enterprise B lacks user feature data, or the data and sample size are insufficient to build a good model — the model at either end may fail to be built or may perform poorly. The problem federated learning solves is how to build a high-quality model at each of A and B, trained on both enterprises' data, while neither party's own data are disclosed to the other; that is, a common model is built without violating data privacy regulations. This common model performs as if the parties had pooled their data, yet it serves only each party's own objectives in its own domain.
The implementation architecture of federated learning is shown in FIG. 1. The organizations participating in federated learning may be called training members, which may also be data holders or data providers. Each training member can hold different business data and participate in joint model training through a device, computer, server, and so on. The business data may take various forms, such as text, pictures, voice, animation, and video. Generally, the business data held by the training members are correlated, and the business parties corresponding to the training members may also be correlated. For example, several banks involved in financial services may serve as business parties: each can independently provide savings, loan, and other services to users, and can hold data such as a user's age, sex, income and expenditure records, loan amount, and deposit amount. As another example, several hospitals involved in medical services may serve as business parties: each may use diagnostic records such as a user's age, sex, symptoms, diagnosis results, treatment plans, and treatment outcomes as local business data.
Under this architecture, the model may be trained jointly by two or more training members. The model processes business data to obtain corresponding business processing results and may also be called a business model. What business data are processed and what processing results are obtained depend on actual requirements. For example, the business data may be a user's financial data, with the result being a financial credit evaluation of the user; or the business data may be a user's customer service dialogue data, with the result being a recommended customer service answer. The business data may take various forms such as text, pictures, animation, audio, and video. Each training member can use the trained business model to process local business data locally.
In the process of jointly training the business model, the server can assist the business parties' joint learning, for example by performing nonlinear computation or aggregating model parameters or gradients. FIG. 1 shows the server as a separate party, such as a trusted third party, provided independently of the training members. In practice, the server's role may also be distributed across the training members, or composed of them, with joint auxiliary computation performed among the members through a secure computation protocol (e.g., secret sharing). This specification does not limit this.
Referring to FIG. 1, under the federated learning framework, the server may initialize a global model and distribute it to the training members. Each training member can locally compute gradients of the model parameters from the global model determined by the server and update the parameters accordingly. The server may aggregate the gradients of the model parameters, or other quantities related to them. The parameters the server needs to aggregate may collectively be called parameters to be synchronized; their corresponding values may be called values to be synchronized; and the parameter values the server aggregates and synchronizes may be called aggregation values. The server may feed the aggregation values back to the training members, and each member updates its local model parameters from the received values. Iterating in this way eventually trains a business model suited to each business party. If a model parameter to be adjusted is called an undetermined parameter, the parameter to be synchronized may be any quantity associated with it, such as the undetermined parameter itself, its gradient, or the difference between the undetermined parameter and its initial value.
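The iterate-aggregate-feedback loop described above can be sketched as a minimal FedAvg-style round, here without sparsification and with a toy least-squares objective; all names and the averaging rule are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def local_step(w, X, y, lr=0.1):
    """One gradient-descent step on a local least-squares objective."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

# The server initializes a global model and distributes it to 3 members.
w_global = np.zeros(2)
datasets = [(rng.normal(size=(8, 2)), rng.normal(size=8)) for _ in range(3)]

for _ in range(5):                 # a few synchronization rounds
    # Each member trains locally from the current global parameters.
    locals_ = [local_step(w_global.copy(), X, y) for X, y in datasets]
    # The server aggregates (here: plain averaging) and feeds back.
    w_global = np.mean(locals_, axis=0)
```

The scheme of this specification modifies both directions of this loop: members upload only m_i of the M values, and the server feeds back only n_i aggregation values per member.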
It is understood that federated learning can be divided into horizontal federated learning (feature alignment), vertical federated learning (sample alignment), and federated transfer learning. The implementation framework provided by this specification can be used with various federated learning frameworks, and is particularly suitable for horizontal federated learning, i.e., each training member provides part of the independent samples, or equivalently, the sample data of the training members form a horizontal partition.
When many training members participate in federated learning, the data received by the server grows geometrically, which easily causes communication congestion and seriously affects overall training efficiency. Therefore, in the multi-party federated learning process, the model is usually compressed, i.e., the number of parameters uploaded to the service party by a single training member is compressed (sparsified), so as to reduce the pressure of communication transmission. Federated learning generally performs model aggregation based on the assumption that the data are IID (independent and identically distributed); specifically, that the features of the training samples have the same distribution and are independent of each other. However, since the samples of a data owner are associated with one or more of the corresponding sample subject (e.g., a user), the region where the sample subject is located, the time window of data acquisition, and the like, the data sets in joint training often have different feature distributions or label distributions, and the features are not independent of each other. This type of dataset is referred to as a Non-IID dataset. For example, a bank's income and deposit features for a user are correlated; one bank may hold only the income features of a certain user and no deposit features (e.g., the salary is quickly transferred to deposits at other banks), while that bank or another bank holds both income and deposit features for a different user. If federated model aggregation is performed under the IID assumption, the resulting federated model may perform poorly on Non-IID datasets.
In order to adapt federated learning to Non-IID datasets, this specification provides a technical concept of jointly updating the model in which the model is aggregated and updated according to the data characteristics of each training member. Fig. 2 is a schematic diagram illustrating the interaction of each training member with a service party (third party) under the technical concept of this specification. As shown in fig. 2, assuming the jointly trained business model is a multi-layer neural network (the layers arranged from top to bottom or bottom to top in fig. 2), the black solid points represent the currently activated nodes. It can be seen that in the stage of updating the parameters to be synchronized using local training data, the model nodes of each data party may all be activated. Under the technical concept of this specification, on the one hand, a training member updates the parameters to be synchronized according to its local data characteristics and then uploads sparsified values to be synchronized to the service party, as shown by the solid points in fig. 2. On the other hand, after aggregating the uploads of the training members, the service party issues aggregate values back to the training members; these may likewise be sparsified, and, in combination with the sparse matrix uploaded by each training member, the service party individually issues a sparse matrix of aggregate values to each training member. As shown in fig. 2, the activated nodes of the sparse matrix uploaded by a single data party and of the sparse matrix issued by the service party may be the same or different, which is not limited here.
The server determines the individual sparse matrix issued to a single training member in combination with the sparse matrix uploaded by that training member (which fuses its local data characteristics). The training member then synchronizes part of the parameters to be synchronized according to the sparse matrix issued by the server. In this way, the communication data volume between the training members and the service party can be effectively reduced, the local model can be updated to fit the characteristics of the local data, and the effectiveness of federated learning is improved.
The technical idea of the present specification is described in detail below.
Referring to fig. 3, an example of an interaction process of a single training member i with a service party in a federal learning (i.e., joint update model) process is shown. The process of jointly updating the model can be realized by a server and a plurality of training members in the federal learning, and the training member i can be any one of the training members participating in the federal learning. The service side can aggregate the parameters to be synchronized uploaded by each training member. The server here may be any device, platform or cluster of devices with computing, processing capabilities.
It is understood that there may be multiple parameter synchronization periods during the process of jointly updating the model. Initially, the server may determine the global model, initialize the model parameters, and issue them to each training member; alternatively, the training members may negotiate the model structure, each training member locally constructing its local model while the server initializes the model parameters. The server may also preset the required hyper-parameters, which may include, for example, one or more of a waiting time T, a parameter compression ratio α, a number m of uploaded parameters, a model parameter update step size, a total number of training periods, and the like. The model parameters (such as weight parameters, constant term parameters, etc.) are the parameters to be adjusted, i.e., the pending parameters. As mentioned above, each pending parameter may correspond to a parameter to be synchronized. In each synchronization period, the training members determine the updated values of the parameters to be synchronized (i.e., the values to be synchronized) using local training samples, upload the values to be synchronized to the server in sparse form, the server aggregates them and feeds the results back, and the training members update the local pending parameters according to the aggregate values.
The process shown in fig. 3 is an example of one synchronization period in an embodiment; the overall process of jointly updating the model is described through the interaction between the server and an arbitrary training member i among the plurality of training members. For convenience of description, the number of pending parameters in the model is assumed to be a positive integer M greater than 1.
First, in step 301, a training member i updates M parameters to be synchronized corresponding to a model by using a local training sample. It will be appreciated that the pending parameters in the model may be, for example: weight parameters for feature aggregation, excitation parameters for excitation layers, truncation parameters, and the like. The process of business model training is the process of determining the undetermined parameters. In the federal learning process, parameters to be synchronized among training members can be determined according to modes such as pre-negotiation and the like, and the parameters to be synchronized correspond to the parameters to be determined one by one.
In the update period of a single pending parameter, each training member can update each pending parameter in its local model according to local training samples. Training member i reads one batch b_i of n_i sample data from its local training samples X_i, performs forward propagation through the model Y_i to obtain the predicted labels corresponding to the b_i training samples, then determines the model loss L_i based on the actual sample labels y_i and the predicted labels, and adjusts each pending parameter according to the model loss L_i. In an optional embodiment, various updating methods such as gradient descent, Newton's method, and the like may be used, so that each pending parameter moves along its gradient direction toward an optimal value. In this case, training member i can compute the gradient of each pending parameter from the model loss L_i by means of a back-propagation algorithm, e.g., recorded as a gradient matrix G_i^0. The update of each pending parameter is then performed on the basis of the corresponding gradient.
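A minimal sketch of this local forward/backward step follows; logistic regression stands in for the business model Y_i, and the batch size, synthetic data, and step size are illustrative assumptions, not values from this specification.

```python
import numpy as np

rng = np.random.default_rng(1)
Xi = rng.normal(size=(32, 5))             # local training samples X_i
yi = (Xi @ np.ones(5) > 0).astype(float)  # actual sample labels y_i
w = np.zeros(5)                           # pending parameters

bi, yb = Xi[:16], yi[:16]                 # one batch b_i of n_i = 16 samples
y_hat = 1.0 / (1.0 + np.exp(-(bi @ w)))   # forward propagation: predicted labels
Li = -np.mean(yb * np.log(y_hat) + (1 - yb) * np.log(1 - y_hat))  # model loss L_i
Gi = bi.T @ (y_hat - yb) / len(yb)        # gradient of the loss (cf. gradient matrix G_i^0)
w = w - 0.1 * Gi                          # adjust pending parameters along the gradient
```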
A single synchronization period may correspond to one update period of the pending parameters, or to multiple update periods. For example, parameter synchronization may be performed every n update periods, where n is a positive integer, n ≥ 1, or the update period at which synchronization occurs may be determined at predetermined time intervals. For a single training member, the parameter synchronization process for the parameters to be synchronized may correspond to a single update period. Assuming the current synchronization period is t, i.e., the t-th parameter synchronization process (assuming N periods of parameter synchronization in total, so t = 0, 1, ..., N), a training member may determine the current values to be synchronized in the (n × t)-th update period, denoted for example W_{i,t}^u.
In one embodiment, the parameter to be synchronized is a parameter to be determined, and in a corresponding single update cycle, a single training member may use the parameter to be determined updated according to the training samples of the current batch as a value to be synchronized of the parameter to be synchronized.
In another embodiment, the parameter to be synchronized is a gradient of the parameter to be determined, and a single training member may determine the gradient of the parameter to be determined as a value to be synchronized of the parameter to be synchronized before the parameter to be determined is updated according to the training samples of the current batch.
In another embodiment, the parameter to be synchronized may also be a difference between the parameter to be determined and an initial value thereof, and a single training member may compare the parameter to be determined updated according to the training samples of the current batch with the initial value initialized by the server, and the obtained difference is used as the value to be synchronized of the parameter to be synchronized.
In other embodiments, the parameter to be synchronized may also be other parameters according to different update modes of the parameter to be determined in the model, which is not described in detail herein. Aiming at the M undetermined parameters in the model, each training member can update the M undetermined parameters in the same mode to obtain corresponding M updated values.
Then, in step 302, training member i selects m_i of the M parameters to be synchronized and uploads the corresponding m_i values to be synchronized to the server. It can be understood that, under the technical idea of this specification, each training member needs to compress the volume of data uploaded to the service party. Training member i can compress the number of parameters to be synchronized from M down to m_i, where m_i can be much smaller than M; for example, M is 100,000 and m_i is 1000. The compression of the parameters to be synchronized can be performed by means of pruning, sparsification, and the like.
In one embodiment, m_i can be determined by the number of samples associated with training member i, such as a preset fixed value, or a fixed value positively correlated with the number of samples of training member i. For example, M/N (where N is the number of training members), or another fixed value that is inversely related to the total sample size of all training members and positively related to the sample size held by training member i, and so on. In this case, in each synchronization period, training member i uploads to the server the values to be synchronized of m_i parameters to be synchronized.
In one embodiment, m_i can be determined by a preset upload proportion coefficient α_i. For example, m_i = α_i × M, or α_i × ‖W‖_0, where ‖W‖_0 is the zero norm of the current update values determined by training member i for the M parameters to be synchronized, whose value is M. Since the parameters to be synchronized tend to converge as the synchronization periods iterate, a single training member may upload the parameters to be synchronized to the server according to a smaller upload proportion. In an alternative embodiment, the upload proportion coefficient α_i may decrease linearly, exponentially, and so on. If an attenuation coefficient ρ is preset, the number m_i of parameters to be synchronized uploaded in the current period t is a decreasing function (e.g., an exponential function, a trigonometric function, etc.) of the attenuation coefficient ρ. Taking the decreasing function as an exponential function, e.g., m_i = α × ρ^t × M, where ρ is a number less than 1, e.g., 0.95, then as the period number t increases, the t-th power of ρ, ρ^t, gradually decreases.
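The exponentially decaying upload count can be sketched as follows; α = 0.01, ρ = 0.95, and M = 100,000 are assumed example values, and rounding down is chosen here purely for illustration.

```python
import math

def upload_count(alpha, rho, t, M):
    # m_i = alpha * rho**t * M, rounded down to an integer count of parameters.
    return math.floor(alpha * rho**t * M)

M = 100_000
counts = [upload_count(0.01, 0.95, t, M) for t in range(5)]
# the number of uploaded parameters shrinks as the synchronization period t grows
```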
In other embodiments, training member i may also determine the number m_i of values to be synchronized uploaded to the server in other reasonable ways, which are not described in detail here. It is noted that when the result of the above calculation is not an integer, m_i can be obtained by rounding up or rounding down. Similarly, other training members can determine the number of parameters to be synchronized to be uploaded locally in a similar manner.
When training member i selects m_i of the M parameters to be synchronized, the selection may be performed randomly, or in descending order of the absolute values of the update values, or contiguously from a preset initial position (e.g., the i-th parameter of the layer-2 neural network), or according to parameters to be synchronized specified in advance for training member i, or by a combination of the above, which is not limited here. After determining the m_i parameters to be synchronized, training member i may take the m_i updated values determined in step 301 as the values to be synchronized and upload them to the server.
The selected m_i parameters to be synchronized can be uploaded to the server after being marked by unique identifiers, or uploaded in the form of a parameter matrix, which is not limited by this specification. For example, a parameter to be synchronized marked by a unique identifier is denoted (w_jk)_i, where (w_jk)_i represents the identifier of the parameter to be synchronized corresponding to the k-th parameter of the j-th neural network layer of training member i, and its value represents the corresponding value to be synchronized. In matrix form, the local sparse matrix corresponding to the m_i values to be synchronized, denoted for example W*_{i,t}, has the same dimensions as the parameter matrix: the positions corresponding to the m_i selected parameters hold the m_i values to be synchronized, and the remaining (M − m_i) positions are all 0. When uploading the values to be synchronized in matrix form, a single value may be transmitted in the form [j, k, value], representing the value to be synchronized of the model parameter in row j, column k of the parameter matrix of the business model. In other words, the upload format is index + value, with (j, k) as the index. Thus, when the parameters to be synchronized are uploaded in matrix form, only m_i values are uploaded; moreover, the rows and columns can be encoded as numeric types occupying fewer bytes, such as int, further reducing the extra data volume of the upload.
It can be understood that pruning, model sparsification, and the like generally select the parameters with larger absolute values. A parameter with a larger absolute value usually has a larger influence on the result and is, for the current period, temporarily of higher importance. Taking the TopK sparsification method as an example, if the model parameters are described in matrix form, training member i can select from the locally updated parameter matrix to be synchronized W_{i,t}^u the m_i = K_i elements with the largest absolute values, set the value 1 at the corresponding positions of a mapping matrix with the same dimensions as the parameter matrix, and set the other positions to 0, obtaining the current local sparse position matrix M_{i,t} of training member i. The local sparse matrix W*_{i,t} may then be the element-wise product of the sparse position matrix M_{i,t} and the parameter matrix to be synchronized: W*_{i,t} = M_{i,t} ⊙ W_{i,t}^u. Training member i can upload W*_{i,t} to the server. It can be appreciated that in the local sparse matrix W*_{i,t}, the elements at the m_i selected positions are the current values to be synchronized and the remaining positions are 0, so only the elements at the m_i positions need to be uploaded.
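The TopK construction just described can be sketched as follows; the matrix values and K are illustrative, and the helper name `topk_sparsify` is an assumption for the sketch.

```python
import numpy as np

def topk_sparsify(W, K):
    # Sparse position matrix: 1 at the K largest-magnitude entries, 0 elsewhere.
    flat = np.abs(W).ravel()
    idx = np.argpartition(flat, -K)[-K:]  # indices of the K largest |values|
    pos = np.zeros(W.size)
    pos[idx] = 1.0
    pos = pos.reshape(W.shape)
    # Local sparse matrix = element-wise product of position matrix and parameters.
    return pos, pos * W

W = np.array([[0.1, -2.0, 0.3],
              [4.0, -0.2, 0.05]])
pos, sparse = topk_sparsify(W, 2)  # keep the K = 2 largest-magnitude values
```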
In an optional implementation, before uploading the parameters to be synchronized, a single training member may add perturbation satisfying differential privacy to the local parameters to be synchronized in order to protect local data privacy. For example, perturbation data satisfying the standard Gaussian distribution with mean 0 and variance 1 can be added to the parameters to be synchronized through the Gaussian mechanism of differential privacy, forming perturbed data to be synchronized. Where the data to be synchronized are represented in matrix form, the added perturbation may be a perturbation matrix satisfying a predetermined Gaussian distribution. A single training member i can add the perturbation after selecting the m_i parameters to be synchronized, or add the perturbation first and then select the m_i parameters to be synchronized according to the rules described above; this specification does not limit this. In addition, the noise added by training member i to the parameters to be synchronized may also satisfy an exponential mechanism, a Laplace mechanism, and the like.
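The Gaussian perturbation step can be sketched as follows. The standard-normal noise scale follows the example in the text; a real deployment would calibrate the noise scale to a privacy budget, which this sketch does not do.

```python
import numpy as np

rng = np.random.default_rng(42)
to_sync = np.array([4.0, -2.0, 0.7])  # selected values to be synchronized
# Gaussian mechanism: add mean-0, variance-1 noise before upload.
noise = rng.normal(loc=0.0, scale=1.0, size=to_sync.shape)
perturbed = to_sync + noise           # perturbed values actually uploaded
```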
In step 303, the server aggregates the values to be synchronized uploaded by the training members to obtain M aggregate values corresponding to the M parameters to be synchronized. In this aggregation process, the aggregation of the values to be synchronized of each parameter is independent. That is, for a single parameter to be synchronized, assuming s training members feed back s corresponding values to be synchronized to the server, the aggregate value of that parameter is an aggregate of those s updated values.
The aggregation of the values to be synchronized of a single parameter may be performed by at least one of weighted summation, averaging, taking the median, taking the maximum, taking the minimum, and the like. For example, for the pending parameter w_jk, if s training members feed back values to be synchronized, the aggregate value may be (1/s) Σ_i (w_jk)_i, where the summation runs over the training members, e.g., i from 1 to s. Where each training member uploads a local sparse matrix, the server can also aggregate the corresponding values to be synchronized through the corresponding matrix elements; for example, the aggregation matrix formed from the aggregate values by averaging is W_{s,t} = (Σ_i W*_{i,t}) / (Σ_i M_{i,t}), where W*_{i,t} and M_{i,t} are the local sparse matrix and local sparse position matrix of member i. The matrix division here is element-wise, i.e., the first element (e.g., row 1, column 1) of the dividend matrix is divided by the first element of the divisor matrix to obtain the first element of the quotient matrix W_{s,t}.
In particular, it should be noted that when the training members feed back parameters to be synchronized to the server without mutual agreement or advance negotiation, it may happen that for some model parameters no value to be synchronized is passed back at all in the current synchronization period t, i.e., the total number of training samples corresponding to a given parameter in the current period is 0. In the calculation above, 0 would then appear as the denominator, yielding an error value (e.g., NaN). In that case, the aggregate value in the synchronization parameter set may be determined according to the actual situation. For example, the aggregate value of the previous period may be used as the aggregate value of the current period. As another example, when the parameter to be synchronized is the pending parameter itself, a special flag may be set so that the values of the corresponding parameter are not synchronized. When the parameter to be synchronized is a gradient value, the value 0 can also be used as the aggregate value of the corresponding parameter in the current update period.
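The element-wise aggregation with a fallback for unreported positions can be sketched as follows. As a simplification, the sketch treats non-zero entries as "reported" (a member that genuinely uploads 0 would be miscounted; a real system would track reported positions separately), and it uses the 0-fallback appropriate to the gradient case.

```python
import numpy as np

uploads = [np.array([[1.0, 0.0], [0.0, 3.0]]),   # local sparse matrix of member 1
           np.array([[2.0, 0.0], [4.0, 0.0]])]   # local sparse matrix of member 2
positions = [(u != 0).astype(float) for u in uploads]  # sparse position matrices

num = sum(uploads)    # element-wise sum of uploaded values
den = sum(positions)  # how many members reported each position
# Element-wise division; positions no member reported fall back to 0 instead of NaN.
agg = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
```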
In step 304, the server feeds back the synchronization parameter set W_i to training member i according to the M aggregate values. In order to further reduce the traffic during model updating, the server may feed back the aggregate values of only part of the parameters to be synchronized. Considering the personalized characteristics of each training member's local training data, the server can determine a different synchronization parameter set for each training member.
Thus, the server can determine the sparsified aggregate values to be issued based on the values to be synchronized uploaded by training member i together with the M aggregate values. For example, the aggregate values issued by the server may correspond to the same parameters to be synchronized as the values uploaded by training member i. Where the values uploaded by training member i are described in matrix form, the server can send training member i the aggregate values of the parameters to be synchronized corresponding to the non-zero elements of its local sparse matrix W*_{i,t}.
In order to consider the characteristics of the global data more comprehensively, in a possible implementation the server may first determine a sparsification result of the M aggregate values corresponding to the global model. The sparsification result may be recorded as a global sparse value set, which, if described in matrix form, may be referred to as the global sparse matrix W*_{s,t}. The sparsification of the M aggregate values can be performed by pruning, model sparsification, and the like. In this way, the server's global sparse value set selects the relatively important parameters to be synchronized in the global model. Taking the TopK sparsification method as an example, the K_s aggregate values with the largest absolute values among the M aggregate values can be taken to construct the global sparse value set. In the matrix case, the K_s selected elements of the global sparse matrix W*_{s,t} are the corresponding K_s aggregate values, and the other positions are 0.
Then, the global sparse value set (e.g., the global sparse matrix W*_{s,t}) and the local sparse value set of each training member (e.g., the local sparse matrix W*_{i,t}) may be used together to determine the corresponding synchronization parameter set. Taking training member i as an example, the global sparse value set and the local sparse value set together determine the corresponding synchronization parameter set, denoted W_i. In particular, W_i may be determined using at least one of the intersection and the union of the parameters to be synchronized corresponding to the two value sets. The determination process of the synchronization parameter set W_i is described below taking the value sets in matrix form as an example.
The non-zero element positions in the global sparse matrix W*_{s,t} and the local sparse matrix W*_{i,t} may determine the non-zero element positions of the synchronization parameter set. For example, the global sparse position matrix M_{s,t} and the local sparse position matrix M_{i,t} are determined separately, and then used to determine the sparse position matrix of the synchronization parameter matrix, denoted M'_{i,t}. The number of non-zero elements in M'_{i,t} is recorded as n_i, representing that the service party feeds back n_i aggregate values to the training member; which aggregate values are fed back is determined by the non-zero element positions in M'_{i,t}. n_i is usually less than M, and may or may not equal m_i. A single synchronization parameter matrix can be regarded as a sparse version of the aggregation matrix W_{s,t} formed by the aggregate values: the elements at the non-zero positions of M'_{i,t} may be set to 1, describing the sparse positions, and the sparse matrix corresponding to the synchronization parameter set of training member i is then M'_{i,t} ⊙ W_{s,t}. It can be seen that, in matrix form, the non-zero element positions of the synchronization parameter set W_i correspond to the synchronization parameters determined by the server for training member i in the current update period.
According to one possible design, the union of the non-zero element positions of the sparse position matrices M_{i,t} and M_{s,t} may be used to determine the sparse position matrix M'_{i,t} of the synchronization parameter set W_i. For example, the union itself may be taken as M'_{i,t}, or a predetermined number of non-zero element positions may be randomly selected from the union as the non-zero element positions of M'_{i,t}.
In an alternative implementation, the importance of each position may also be considered when randomly selecting a predetermined number of non-zero element positions from the union. For example, a larger selection probability is used for the intersection positions of the non-zero elements of M_{i,t} and M_{s,t}. This is because positions that are non-zero in both M_{i,t} and M_{s,t} may be more important positions relative to both the local model of the corresponding training member and the global model of the server, and keeping the intersection positions non-zero helps transfer the more important parameters. The matrix of intersection positions is M_∩ = M_{i,t} ⊙ M_{s,t}, where ⊙ denotes element-wise multiplication, e.g., the element in row 1, column 1 of M_{i,t} multiplied by the element in row 1, column 1 of M_{s,t} gives the element in row 1, column 1 of M_∩. Further, the positions of the union corresponding to the non-zero elements of M_∩ may be given a larger selection probability (e.g., 0.7), the other positions of the union a smaller selection probability (e.g., 0.3), and a predetermined number of non-zero element positions selected according to these probabilities.
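The probability-weighted selection over the union can be sketched as follows; the position matrices are flattened toy examples, and the 0.7/0.3 weights follow the example probabilities above.

```python
import numpy as np

rng = np.random.default_rng(7)
Mi = np.array([1, 1, 0, 1, 0], dtype=float)  # local sparse position matrix (flattened)
Ms = np.array([1, 0, 1, 1, 0], dtype=float)  # global sparse position matrix (flattened)

union = np.flatnonzero(np.maximum(Mi, Ms))   # union of non-zero positions
inter = Mi * Ms                              # element-wise product: intersection positions
weights = np.where(inter[union] > 0, 0.7, 0.3)  # larger weight at intersection positions
probs = weights / weights.sum()
chosen = rng.choice(union, size=2, replace=False, p=probs)  # sampled sync positions
```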
According to another possible design, the correlation of the global sparse matrix W*_{s,t} and the local sparse matrix W*_{i,t} may also be used. The correlation may be determined based on W*_{s,t} and W*_{i,t} themselves, or based on a comparison of the corresponding sparse position matrices M_{i,t} and M_{s,t}. Taking the comparison of M_{i,t} and M_{s,t} as an example, their correlation can be described by a correlation coefficient determined from a correlation distance such as the Euclidean distance, cosine distance, Manhattan distance, Pearson similarity, Jaccard similarity, Hamming distance, and the like, denoted dist(M_{i,t}, M_{s,t}). Those skilled in the art will understand that the correlation distance may have a different range depending on how it is defined; for example, the Euclidean distance may range from 0 to 2, the cosine distance from 0 to 1, the Manhattan distance from 0 to 2k (2k being the sum of the non-zero elements of the two matrices), and so on. In order to keep the correlation coefficient (e.g., β_{i,t}) in the interval from 0 to 1, the correlation distance can be normalized to serve as the correlation coefficient: for example, with the Euclidean distance the correlation coefficient β_{i,t} is dist(M_{i,t}, M_{s,t})/2, with the Manhattan distance it is dist(M_{i,t}, M_{s,t})/2k, and so on. When the global sparse matrix W*_{s,t} and the local sparse matrix W*_{i,t} are used directly to determine the correlation coefficient, the corresponding sparse position matrices are not needed; the determination is similar and is not repeated here.
In an alternative implementation, a first number N_1 of non-zero element positions may be selected from the matrix M_∩ of intersection positions of M_{i,t} and M_{s,t}, recorded for example as a first position matrix; a second number N_2 of non-zero element positions may be selected from the complement matrix of M_∩ within the non-zero elements of M_{i,t}, recorded for example as a second position matrix; and a third number N_3 of non-zero element positions may be selected from the complement matrix of M_∩ within the non-zero elements of M_{s,t}, recorded for example as a third position matrix. The first, second, and third numbers may be determined based on the correlation coefficient described above.
In an alternative implementation, the number of non-zero elements in M_∩ may be recorded as k_{i,t}; it can be understood that k_{i,t} represents the number of positions that are non-zero in both M_{i,t} and M_{s,t}. Then the first number is N_1 = k_{i,t}. The second number may be positively correlated with the correlation coefficient, e.g., N_2 = β_{i,t}(k − k_{i,t}), and the third number may be inversely related to the correlation coefficient, e.g., N_3 = (1 − β_{i,t})(k − k_{i,t}), where k is the number of non-zero elements in the global sparse matrix or the global sparse position matrix. k may be a predetermined value or determined according to a predetermined sparse truncation value (for example, the positions in the global sparse matrix whose values are greater than the truncation value correspond to the positions of non-zero elements in the sparse position matrix). Thus, the higher the correlation between the global sparse matrix W*_{s,t} and the local sparse matrix W*_{i,t} of training member i, the higher the proportion of the non-zero element positions of the synchronization parameter set selected from the non-zero elements of that training member's local sparse matrix, tending to feed back to the training member the more important values to be synchronized that it uploaded.
Further, the server may select, according to the first number N1, the second number N2, and the third number N3, corresponding numbers of non-zero element positions from the respective position matrices, obtaining the first position matrix, the second position matrix, and the third position matrix. The final sparse position matrix determined for training member i in the current update cycle is then the union of these three position matrices.
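The position selection above might be sketched as follows. The random draw of positions within each candidate pool, the rounding of N2, and the function name are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def build_sync_mask(m_i, m_s, beta, k):
    """Keep all N1 = k_it intersection positions, then draw
    N2 = beta*(k - k_it) positions that are non-zero only in the local
    mask and N3 = (1 - beta)*(k - k_it) positions non-zero only in the
    global mask, as in the scheme described above."""
    inter = m_i & m_s            # non-zero in both position matrices
    only_i = m_i & (1 - m_s)     # complement of the intersection in M_i,t
    only_s = m_s & (1 - m_i)     # complement of the intersection in M_s,t
    k_it = int(inter.sum())      # N1
    n2 = int(round(beta * (k - k_it)))
    n3 = (k - k_it) - n2         # N2 + N3 fill the remaining budget
    mask = inter.copy()
    for pool, n in ((only_i, n2), (only_s, n3)):
        idx = np.flatnonzero(pool)
        chosen = rng.choice(idx, size=min(n, idx.size), replace=False)
        mask.flat[chosen] = 1
    return mask

m_i = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 1]])
m_s = np.array([[1, 0, 1], [0, 1, 0], [0, 0, 1]])
mask = build_sync_mask(m_i, m_s, beta=0.5, k=4)
```

With these inputs the intersection contributes two positions and the two complements one each, so the final mask holds k = 4 non-zero positions.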
In another alternative implementation, the first number N1 may be 0, the second number N2 may be βi,t·k, and the third number N3 may be (1 − βi,t)·k. Thus, βi,t·k non-zero element positions may be selected from the matrix associated with Mi,t and (1 − βi,t)·k non-zero element positions from the matrix associated with Ms,t, and the two selections combined to obtain the final sparse position matrix.
In yet another alternative implementation, the first number N1 may be 0, the second number N2 may equal the number of non-zero elements in the matrix associated with Mi,t, and the third number N3 may be (1 − βi,t)·k. The corresponding numbers of non-zero element positions are then selected and combined to obtain the final sparse position matrix.
In other alternative implementations, the server may also determine the synchronization parameter set of the current update period for each training member in other reasonable manners, which are not described here one by one. In the above description, a sparse matrix and its corresponding sparse position matrix have the same non-zero element positions; the difference is that the non-zero elements of the sparse position matrix may be a predetermined value (e.g. 1), while the non-zero elements of the sparse matrix are the real element values of the corresponding matrix before sparsification. It is worth noting that the aggregation value set Ws,t of the global model, the global sparse value set, and the local sparse value set need not be described in matrix form; the individual values may instead be tagged with the identifiers of the corresponding parameters to be determined. In that case the principle for determining the synchronization parameter set Wi is similar to the matrix form, except that intersections, unions, and the like can be determined directly via the parameter identifiers without going through a position matrix, and this is not repeated here.
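For the matrix-free variant just described, the intersection and union can be taken directly over parameter identifiers. The identifier scheme ("w1", "w2", …) and the concrete values below are purely illustrative:

```python
# Each value is tagged with the identifier of its pending parameter,
# so set operations replace the position-matrix comparisons.
local_sparse = {"w1": 0.4, "w3": -0.2, "w7": 1.1}   # member i's uploads
global_sparse = {"w1": 0.5, "w2": 0.9, "w7": 1.0}   # sparsified aggregates

both = local_sparse.keys() & global_sparse.keys()    # intersection of ids
either = local_sparse.keys() | global_sparse.keys()  # union of ids

# One possible synchronization set over the union, preferring the
# aggregation value where it exists (an illustrative choice):
sync_set = {pid: global_sparse.get(pid, local_sparse.get(pid))
            for pid in either}
```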
Then, the server may feed back each synchronization parameter set to the corresponding training member, where the synchronization parameter set fed back to training member i may be Wi. In an alternative embodiment, the synchronization parameter set Wi may be represented by a sparse matrix.
In step 305, training member i updates the undetermined parameters in the local model by using the synchronization parameter set Wi, thereby updating the local model. It can be understood that each training member can perform this update using the synchronization parameter set fed back to it by the server.
For example, when the parameters to be synchronized are the model parameters themselves, the aggregation values in the synchronization parameter set replace the corresponding undetermined parameters of the local model one by one. When the parameters to be synchronized are gradients of the undetermined parameters, the local undetermined parameters are updated one by one with the aggregation values in the synchronization parameter set, using gradient descent, Newton's method, or the like, with the corresponding step sizes. Described in matrix form, the update at training member i amounts to replacing the local values at the non-zero positions of the sparse position matrix determined from the synchronization parameter set Wi with the fed-back values. It can be seen that the training member replaces locally only the elements covered by the sparse set Wi.
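A minimal sketch of this replacement step, assuming the fed-back set is materialized as a dense value matrix plus a 0/1 position mask (the variable names are illustrative):

```python
import numpy as np

# Only the positions selected in the synchronization set (marked in
# sync_mask) are overwritten with the fed-back aggregation values;
# all other local parameters are kept unchanged.
local_params = np.array([[0.1, 0.2], [0.3, 0.4]])
agg_values   = np.array([[0.9, 0.0], [0.0, 0.8]])   # zeros = not fed back
sync_mask    = np.array([[1, 0], [0, 1]])            # positions in W_i

updated = np.where(sync_mask == 1, agg_values, local_params)
```

For the gradient case, `agg_values` would instead be scaled by a step size and subtracted rather than substituted.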
In this way, each training member selectively synchronizes its local parameters with the other training members via the server. This saves communication data volume while fully considering the correlation between the global model and the local models, providing an effective training scheme for federated learning with large data volumes.
The training members can jointly train the model, with the assistance of the server, through multiple iterations of the synchronization period shown in fig. 2. The iteration end condition of the joint training may be, for example: the parameters to be synchronized converge, the model loss converges, or the number of iteration periods reaches a predetermined value. Here convergence may be understood as the amount of change being smaller than a predetermined threshold.
The model update flow of one embodiment of the present specification is described above in connection with the schematic diagram of FIG. 3 from the interaction perspective of a training member and a server. FIG. 4 shows a flow diagram of a server and a plurality of training members jointly updating a model described from the perspective of the server.
As shown in fig. 4, from the perspective of the server, the process of jointly updating the model includes:
FIG. 5 shows a flow diagram of a server and a plurality of training members jointly updating a model described from the perspective of the training members. As shown in FIG. 5, from the perspective of training member i, the process of jointly updating the model includes:
It can be understood that fig. 4 and fig. 5 are specific examples of the flow executed by the server and the training member i in fig. 3 in a single synchronization cycle, respectively, and therefore, the corresponding description of the execution flow of the relevant party in fig. 3 is also applicable to fig. 4 and fig. 5, and is not repeated herein.
Reviewing the above process: in a single synchronization period of the joint training, after each training member updates the M parameters to be synchronized using its local training samples, it selects part of the updated values to upload to the server, and the server selects part of the aggregation values to send back. This reduces the communication pressure between the training members and the server, reduces the computation load of the server, and improves learning efficiency. The server aggregates the values uploaded by the training members into M aggregation values, and then determines a corresponding synchronization parameter set according to the personalized characteristics of each training member. The data characteristics of each member's local samples are thus fully considered, making the federated learning more targeted and personalized.
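The server-side aggregation over partial uploads can be sketched as below, using plain averaging over the members that actually uploaded each parameter — one of the aggregation options named earlier (the text also allows weighted sums, medians, maxima, and minima). The `(indices, values)` upload format is an assumption:

```python
import numpy as np

def aggregate(uploads, num_params):
    """Average, per parameter, the values received from the members that
    uploaded that parameter; parameters nobody uploaded stay at zero."""
    totals = np.zeros(num_params)
    counts = np.zeros(num_params)
    for idx, vals in uploads:        # each member sends only m_i entries
        totals[idx] += vals
        counts[idx] += 1
    agg = np.zeros(num_params)
    seen = counts > 0
    agg[seen] = totals[seen] / counts[seen]
    return agg

uploads = [
    (np.array([0, 2]), np.array([1.0, 2.0])),   # member 1: params 0 and 2
    (np.array([0, 3]), np.array([3.0, 4.0])),   # member 2: params 0 and 3
]
agg = aggregate(uploads, num_params=4)   # param 0 averaged over two members
```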
According to an embodiment of another aspect, a system for jointly updating a model is provided, including a server and a plurality of training members. Denoting any one of the plurality of training members as training member i, the training member and the server may each be provided with a corresponding apparatus for jointly updating the model, cooperating to complete federated learning.
FIG. 6 illustrates an apparatus 600 for a federated update model hosted by a server. As shown in fig. 6, the apparatus 600 includes:
a receiving unit 61 configured to receive the values to be synchronized sent by each training member, where the number of values to be synchronized sent by a single training member i is mi, and the mi values to be synchronized are the update values of mi of the M parameters to be synchronized updated using local training samples, the parameters to be synchronized corresponding one-to-one to the undetermined parameters of the model;
an aggregation unit 62 configured to aggregate the values to be synchronized uploaded by the training members to obtain M aggregation values corresponding to the M parameters to be synchronized respectively;
a feedback unit 63 configured to feed back, according to the M aggregation values, a synchronization parameter set to each training member, so that each training member updates the undetermined parameters in its local model using the corresponding synchronization parameter set, thereby updating the local model, where the synchronization parameter set Wi corresponding to a single training member i corresponds to ni aggregation values, and the ni parameters to be synchronized corresponding to these aggregation values are determined jointly from the mi values to be synchronized uploaded by each training member and the M aggregation values.
FIG. 7 illustrates an apparatus 700 for a joint update model provided to any one of the training members. As shown in fig. 7, the apparatus 700 includes:
a training unit 71 configured to update the M parameters to be synchronized corresponding to the model using local training samples, each parameter to be synchronized corresponding one-to-one to an undetermined parameter of the model;
an uploading unit 72 configured to select mi parameters to be synchronized from the M parameters to be synchronized and upload the corresponding mi values to be synchronized to the server, for the server to feed back the synchronization parameter set Wi, where the synchronization parameter set Wi corresponds to ni aggregation values, and the ni parameters to be synchronized corresponding to these aggregation values are determined jointly from the mi values to be synchronized uploaded by each training member and the M aggregation values;
a synchronization unit 73 configured to update the undetermined parameters in the local model using the synchronization parameter set Wi, thereby updating the local model.
It should be noted that the apparatuses 600 and 700 shown in fig. 6 and fig. 7 correspond to the method embodiments shown in fig. 4 and fig. 5, respectively, and may be applied to the server and the single training member in the method embodiment shown in fig. 3, respectively, to cooperate with the training member to complete the process of jointly updating the business model in fig. 3. Therefore, the description related to the service party and the single training member in fig. 3 can be applied to the apparatuses 600 and 700 shown in fig. 6 and fig. 7, and will not be described herein again.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 4 or fig. 5 or the like.
According to an embodiment of still another aspect, there is also provided a computing device including a memory and a processor, the memory having stored therein executable code, the processor implementing the method described in conjunction with fig. 4 or fig. 5, and so on, when executing the executable code.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of this specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments are intended to explain the technical idea, technical solutions and advantages of the present specification in further detail, and it should be understood that the above-mentioned embodiments are merely specific embodiments of the technical idea of the present specification, and are not intended to limit the scope of the technical idea of the present specification, and any modification, equivalent replacement, improvement, etc. made on the basis of the technical solutions of the embodiments of the present specification should be included in the scope of the technical idea of the present specification.
Claims (24)
1. A method for jointly updating a model, applied to a process in which a server and a plurality of training members jointly update the model, wherein the local model of each training member is consistent in structure with the global model held by the server, the method comprising:
each training member updates M parameters to be synchronized corresponding to the model by using a local training sample, and each parameter to be synchronized corresponds to each parameter to be determined of the model one by one;
each training member selects a number of parameters to be synchronized from the M parameters to be synchronized and uploads the corresponding values to be synchronized to the server, where the number of parameters to be synchronized selected by a single training member i is mi;
The server side aggregates the values to be synchronized uploaded by each training member to obtain M aggregated values corresponding to M parameters to be synchronized respectively;
the server feeds back a synchronization parameter set to each training member according to the M aggregation values, where the synchronization parameter set Wi corresponding to a single training member i corresponds to ni aggregation values, and the ni parameters to be synchronized corresponding to the ni aggregation values are determined jointly from the mi values to be synchronized uploaded by each training member and the M aggregation values;
and each training member updates the undetermined parameters in the local model by using the corresponding synchronous parameter set, so that the local model is updated.
2. The method of claim 1, wherein the single parameter to be synchronized is one of a single pending parameter, a gradient of the single pending parameter, a difference between a current value of the single pending parameter and an initial value.
3. The method of claim 1, wherein the pending parameters of the local model of each training member are uniformly initialized by the server, and a horizontal split is formed between the local training samples of each training member.
4. The method of claim 1, wherein the number mi of parameters to be synchronized selected by a single training member i is determined according to the product of a predetermined local activation ratio and the number M of parameters to be synchronized.
5. The method of claim 1, wherein the number mi of uploaded parameters to be synchronized is determined by a single training member i through at least one of pruning and sparsifying the model.
6. The method of claim 1, wherein the server aggregates the values to be synchronized of a single parameter to be synchronized by at least one of weighted summation, averaging, taking the median, taking the maximum, and taking the minimum of the values to be synchronized uploaded by the training members for that parameter.
7. The method of claim 1, wherein, for a training member i, the mi values to be synchronized describe a local sparse value set and the M aggregation values describe an aggregation value set Ws,t of the global model, the server determining the corresponding synchronization parameter set Wi in the following way:
sparsifying the aggregation value set Ws,t of the global model to obtain a global sparse value set;
determining the corresponding synchronization parameter set Wi based on the local sparse value set and the global sparse value set.
8. The method of claim 7, wherein the aggregate value set of the global model is described by a matrix, and the local sparse value set and the global sparse value set are described by a local sparse matrix and a global sparse matrix, respectively.
9. The method of claim 8, wherein determining the corresponding synchronization parameter set Wi based on the local sparse value set and the global sparse value set comprises:
detecting the non-zero element positions of the local sparse matrix and of the global sparse matrix respectively, obtaining a local sparse position matrix Mi,t and a global sparse position matrix Ms,t;
determining the sparse position matrix corresponding to the synchronization parameter set based on the union of the non-zero element positions of the sparse position matrices Mi,t and Ms,t.
10. The method of claim 9, wherein the non-zero element positions in the sparse position matrix are:
11. The method of claim 8, wherein determining the corresponding synchronization parameter set Wi based on the local sparse value set and the global sparse value set comprises:
comparing the local sparse matrix and the global sparse matrix to obtain a correlation coefficient βi,t;
determining the sparse position matrix corresponding to the synchronization parameter set Wi based on the correlation coefficient βi,t.
12. The method of claim 11, wherein comparing the local sparse matrix and the global sparse matrix to obtain the correlation coefficient βi,t comprises:
detecting a correlation distance between the local sparse matrix and the global sparse matrix, the correlation distance being one of the Euclidean distance, cosine distance, Manhattan distance, Pearson similarity, Jaccard similarity, and Hamming distance between the local sparse matrix and the global sparse matrix, or between the local sparse position matrix Mi,t and the global sparse position matrix Ms,t;
determining the correlation coefficient βi,t according to the normalization result of the correlation distance.
13. The method of claim 11, wherein determining the sparse position matrix corresponding to the synchronization parameter set Wi based on the correlation coefficient βi,t comprises:
selecting a first number N1 of non-zero elements from the intersection position matrix of the local sparse position matrix Mi,t and the global sparse position matrix Ms,t, obtaining a first position matrix;
selecting a second number N2 of non-zero elements from the complement matrix of the non-zero elements of Mi,t with respect to the intersection, obtaining a second position matrix;
selecting a third number N3 of non-zero elements from the complement matrix of the non-zero elements of Ms,t with respect to the intersection, obtaining a third position matrix.
15. The method of claim 14, wherein the sum of the first number N1, the second number N2, and the third number N3 is a predetermined value k, and the second number N2 and the third number N3 are each positively correlated with the difference between the predetermined value k and ki,t.
16. The method of claim 13, wherein the first number N1 is 0, the second number N2 is positively correlated with the correlation coefficient βi,t, and the third number N3 is negatively correlated with the correlation coefficient βi,t.
18. The method of claim 1, wherein each training member updating the undetermined parameters in the local model with its corresponding synchronization parameter set, thereby updating the local model, comprises:
a single training member i replacing, with each aggregation value in the corresponding synchronization parameter set Wi, the corresponding value to be synchronized among its local parameters to be synchronized;
updating the local model with the updated parameters to be synchronized.
19. A method for jointly updating a model, applied to a process in which a server and a plurality of training members jointly update the model, wherein the local model of each training member is consistent in structure with the global model held by the server, the method being executed by the server and comprising:
receiving the values to be synchronized sent by each training member, where the number of values to be synchronized sent by a single training member i is mi, and the mi values to be synchronized are the update values of mi of the M parameters to be synchronized updated using local training samples, each parameter to be synchronized corresponding one-to-one to an undetermined parameter of the model;
aggregating the values to be synchronized uploaded by each training member to obtain M aggregation values corresponding to the M parameters to be synchronized respectively;
feeding back, according to the M aggregation values, a synchronization parameter set to each training member, so that each training member updates the undetermined parameters in its local model using the corresponding synchronization parameter set, thereby updating the local model, where the synchronization parameter set Wi corresponding to a single training member i corresponds to ni aggregation values, and the ni parameters to be synchronized corresponding to these aggregation values are determined jointly from the mi values to be synchronized uploaded by each training member and the M aggregation values.
20. A method for jointly updating a model, applied to a process in which a server and a plurality of training members jointly update the model, wherein the local model of each training member is consistent in structure with the global model held by the server, the method being executed by a training member i and comprising:
updating the M parameters to be synchronized corresponding to the model using local training samples, each parameter to be synchronized corresponding one-to-one to an undetermined parameter of the model;
selecting mi parameters to be synchronized from the M parameters to be synchronized and uploading the corresponding mi values to be synchronized to the server, for the server to feed back a synchronization parameter set Wi, where the synchronization parameter set Wi corresponds to ni aggregation values, and the ni parameters to be synchronized corresponding to these aggregation values are determined jointly from the mi values to be synchronized uploaded by each training member and the M aggregation values;
updating the undetermined parameters in the local model using the synchronization parameter set Wi, thereby updating the local model.
21. An apparatus for jointly updating a model, applied to a process in which a server and a plurality of training members jointly update the model, wherein the local model of each training member is consistent in structure with the global model held by the server, the apparatus being provided at the server side and comprising:
a receiving unit configured to receive the values to be synchronized sent by each training member, where the number of values to be synchronized sent by a single training member i is mi, and the mi values to be synchronized are the update values of mi of the M parameters to be synchronized updated using local training samples, each parameter to be synchronized corresponding one-to-one to an undetermined parameter of the model;
an aggregation unit configured to aggregate the values to be synchronized uploaded by each training member to obtain M aggregation values corresponding to the M parameters to be synchronized respectively;
a feedback unit configured to feed back, according to the M aggregation values, a synchronization parameter set to each training member, so that each training member updates the undetermined parameters in its local model using the corresponding synchronization parameter set, thereby updating the local model, where the synchronization parameter set Wi corresponding to a single training member i corresponds to ni aggregation values, and the ni parameters to be synchronized corresponding to these aggregation values are determined jointly from the mi values to be synchronized uploaded by each training member and the M aggregation values.
22. An apparatus for jointly updating a model, applied to a process in which a server and a plurality of training members jointly update the model, wherein the local model of each training member is consistent in structure with the global model held by the server, the apparatus being provided at a training member i and comprising:
a training unit configured to update the M parameters to be synchronized corresponding to the model using local training samples, each parameter to be synchronized corresponding one-to-one to an undetermined parameter of the model;
an uploading unit configured to select mi parameters to be synchronized from the M parameters to be synchronized and upload the corresponding mi values to be synchronized to the server, for the server to feed back the synchronization parameter set Wi, where the synchronization parameter set Wi corresponds to ni aggregation values, and the ni parameters to be synchronized corresponding to these aggregation values are determined jointly from the mi values to be synchronized uploaded by each training member and the M aggregation values;
a synchronization unit configured to update the undetermined parameters in the local model using the synchronization parameter set Wi, thereby updating the local model.
23. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of claim 19 or 20.
24. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of claim 19 or 20.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210380007.5A CN114676838A (en) | 2022-04-12 | 2022-04-12 | Method and device for jointly updating model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210380007.5A CN114676838A (en) | 2022-04-12 | 2022-04-12 | Method and device for jointly updating model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114676838A true CN114676838A (en) | 2022-06-28 |
Family
ID=82078238
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210380007.5A Pending CN114676838A (en) | 2022-04-12 | 2022-04-12 | Method and device for jointly updating model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114676838A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115186937A (en) * | 2022-09-09 | 2022-10-14 | 闪捷信息科技有限公司 | Prediction model training and data prediction method and device based on multi-party data cooperation |
CN115909746A (en) * | 2023-01-04 | 2023-04-04 | 中南大学 | Traffic flow prediction method, system and medium based on federal learning |
CN116935143A (en) * | 2023-08-16 | 2023-10-24 | 中国人民解放军总医院 | DFU medical image classification method and system based on personalized federal learning |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111898767A (en) * | 2020-08-06 | 2020-11-06 | 深圳前海微众银行股份有限公司 | Data processing method, device, equipment and medium |
CN112288100A (en) * | 2020-12-29 | 2021-01-29 | 支付宝(杭州)信息技术有限公司 | Method, system and device for updating model parameters based on federal learning |
CN113221105A (en) * | 2021-06-07 | 2021-08-06 | 南开大学 | Robustness federated learning algorithm based on partial parameter aggregation |
CN113360514A (en) * | 2021-07-02 | 2021-09-07 | 支付宝(杭州)信息技术有限公司 | Method, device and system for jointly updating model |
CN113377797A (en) * | 2021-07-02 | 2021-09-10 | 支付宝(杭州)信息技术有限公司 | Method, device and system for jointly updating model |
CN113469367A (en) * | 2021-05-25 | 2021-10-01 | 华为技术有限公司 | Method, device and system for federated learning |
- 2022-04-12 CN CN202210380007.5A patent/CN114676838A/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111898767A (en) * | 2020-08-06 | 2020-11-06 | 深圳前海微众银行股份有限公司 | Data processing method, device, equipment and medium |
CN112288100A (en) * | 2020-12-29 | 2021-01-29 | 支付宝(杭州)信息技术有限公司 | Method, system and device for updating model parameters based on federal learning |
CN113469367A (en) * | 2021-05-25 | 2021-10-01 | 华为技术有限公司 | Method, device and system for federated learning |
CN113221105A (en) * | 2021-06-07 | 2021-08-06 | 南开大学 | Robustness federated learning algorithm based on partial parameter aggregation |
CN113360514A (en) * | 2021-07-02 | 2021-09-07 | 支付宝(杭州)信息技术有限公司 | Method, device and system for jointly updating model |
CN113377797A (en) * | 2021-07-02 | 2021-09-10 | 支付宝(杭州)信息技术有限公司 | Method, device and system for jointly updating model |
Non-Patent Citations (2)
Title |
---|
IREM ERGÜN: "Sparsified Secure Aggregation for Privacy-Preserving Federated Learning", arXiv:2112.12872v1, 23 December 2021, pages 1 - 28 *
毛耀如: "Research on Attack Models and Privacy Countermeasures for Distributed Federated Deep Learning", China Master's Theses Full-text Database (Information Science and Technology), 30 April 2021, pages 138 - 114 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115186937A (en) * | 2022-09-09 | 2022-10-14 | 闪捷信息科技有限公司 | Prediction model training and data prediction method and device based on multi-party data cooperation |
CN115909746A (en) * | 2023-01-04 | 2023-04-04 | 中南大学 | Traffic flow prediction method, system and medium based on federal learning |
CN116935143A (en) * | 2023-08-16 | 2023-10-24 | 中国人民解放军总医院 | DFU medical image classification method and system based on personalized federal learning |
CN116935143B (en) * | 2023-08-16 | 2024-05-07 | 中国人民解放军总医院 | DFU medical image classification method and system based on personalized federal learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhu et al. | Federated learning on non-IID data: A survey | |
US11170395B2 (en) | Digital banking platform and architecture | |
CN111931950B (en) | Method and system for updating model parameters based on federal learning | |
CN110084377B (en) | Method and device for constructing decision tree | |
CN114676838A (en) | Method and device for jointly updating model | |
CN113377797B (en) | Method, device and system for jointly updating model | |
CN112085159B (en) | User tag data prediction system, method and device and electronic equipment | |
US11238364B2 (en) | Learning from distributed data | |
CN112799708B (en) | Method and system for jointly updating business model | |
US11410644B2 (en) | Generating training datasets for a supervised learning topic model from outputs of a discovery topic model | |
WO2023174036A1 (en) | Federated learning model training method, electronic device and storage medium | |
CN112068866B (en) | Method and device for updating business model | |
CN111460528A (en) | Multi-party combined training method and system based on Adam optimization algorithm | |
CN113360514B (en) | Method, device and system for jointly updating model | |
US11843587B2 (en) | Systems and methods for tree-based model inference using multi-party computation | |
US20210342744A1 (en) | Recommendation method and system and method and system for improving a machine learning system | |
WO2024114640A1 (en) | User portrait-based user service system and method, and electronic device | |
JP7404504B2 (en) | Interpretable tabular data learning using sequential sparse attention | |
CN115049011A (en) | Method and device for determining contribution degree of training member model of federal learning | |
US11521601B2 (en) | Detecting extraneous topic information using artificial intelligence models | |
CN115345298A (en) | Method and device for jointly training models | |
US20220358366A1 (en) | Generation and implementation of dedicated feature-based techniques to optimize inference performance in neural networks | |
CN112307334B (en) | Information recommendation method, information recommendation device, storage medium and electronic equipment | |
CN114386583A (en) | Longitudinal federal neural network model learning method for protecting label information | |
CN113887740A (en) | Method, device and system for jointly updating model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||