CN114676838A - Method and device for jointly updating model - Google Patents

Method and device for jointly updating model

Info

Publication number
CN114676838A
Authority
CN
China
Prior art keywords
synchronized
model
local
training
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210380007.5A
Other languages
Chinese (zh)
Inventor
郑龙飞
王磊
王力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202210380007.5A
Publication of CN114676838A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioethics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Complex Calculations (AREA)

Abstract

In the federated learning process, a training member uploads only some of the parameters to be synchronized, and the server issues the aggregation values of only some of the parameters to be synchronized to the training member, so that the data traffic of the joint training process is reduced. For a single training member, the aggregation values to be issued are selected jointly based on the parameters to be synchronized uploaded by that training member and the aggregation values determined by the server, so that both the local data characteristics of the training member and the global data characteristics are fully considered, the model trained through federated learning better meets actual business requirements, and the effectiveness of federated learning is improved.

Description

Method and device for jointly updating model
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technology, and more particularly, to a method and apparatus for jointly updating a model.
Background
The development of computer technology has enabled machine learning to be applied more and more widely in various business scenarios. Federated learning is a method of joint modeling that protects private data. For example, when enterprises need to perform collaborative security modeling, federated learning allows the parties to collaboratively train a data processing model using everyone's data while fully protecting the privacy of enterprise data, so that business data can be processed more accurately and effectively. In a federated learning scenario, after the parties negotiate a machine learning model structure (or agree on a model), each party can train locally with its private data, the model parameters are aggregated by a safe and reliable method, and finally each party improves its local model according to the aggregated model parameters. Federated learning thus effectively breaks data silos on the basis of privacy protection and realizes multi-party joint modeling.
However, as task complexity and performance requirements gradually increase, the number of layers of model networks in federated learning tends to grow, and the number of model parameters increases correspondingly. Taking the face recognition model ResNet-50 as an example, the original model has more than 20 million parameters, and the model size exceeds 100 MB. Especially in scenarios where many training members participate in federated learning, the data received by the server grows geometrically. Therefore, how to sparsify the parameters that each training member exchanges with the server while jointly training the model is an important problem for reducing communication pressure and avoiding communication congestion.
Disclosure of Invention
One or more embodiments of the present specification describe a method and apparatus for jointly updating a model to address one or more of the problems identified in the background.
According to a first aspect, a method for jointly updating a model is provided, applied to a process in which a server and a plurality of training members jointly update a model, where the local model of each training member has the same structure as the global model held by the server. The method comprises: each training member updating, using its local training samples, M parameters to be synchronized corresponding to the model, the parameters to be synchronized corresponding one-to-one to the undetermined parameters of the model; each training member selecting several parameters to be synchronized from the M parameters to be synchronized and uploading them to the server, where the number selected by a single training member i is mi; the server aggregating the values to be synchronized uploaded by the training members to obtain M aggregation values corresponding to the M parameters to be synchronized; the server feeding back a synchronization parameter set to each training member according to the M aggregation values, where the synchronization parameter set Wi corresponding to a single training member i contains ni aggregation values, and the ni parameters to be synchronized corresponding to the ni aggregation values are determined jointly from the mi values to be synchronized uploaded by training member i and the M aggregation values; and each training member updating the undetermined parameters in its local model using the corresponding synchronization parameter set, thereby updating the local model.
In one embodiment, the single parameter to be synchronized is one of a single undetermined parameter, a gradient of the single undetermined parameter, and a difference between a current value and an initial value of the single undetermined parameter.
In one embodiment, the undetermined parameters of the local model of each training member are uniformly initialized by the server, and horizontal segmentation is formed between the local training samples of each training member.
In one embodiment, the number mi of parameters to be synchronized selected by a single training member i is determined according to the product of a predetermined local activation ratio and the number M of parameters to be synchronized.
In one embodiment, a single training member i determines the mi parameters to be synchronized to upload by at least one of pruning and model sparsification.
In one embodiment, the server may aggregate the values to be synchronized of a single parameter to be synchronized by at least one of weighted summation, averaging, taking the median, taking the maximum, and taking the minimum of the values to be synchronized uploaded by the training members for that parameter.
In one embodiment, for training member i, the mi values to be synchronized describe a local sparse value set W̃i,t, and the M aggregation values describe an aggregation value set Ws,t of the global model. The server determines the corresponding synchronization parameter set Wi as follows: sparsifying the aggregation value set Ws,t of the global model to obtain a global sparse value set W̃s,t; and determining the corresponding synchronization parameter set Wi based on the local sparse value set W̃i,t and the global sparse value set W̃s,t.
In a further embodiment, the aggregate value set of the global model is described by a matrix, and the local sparse value set and the global sparse value set are described by a local sparse matrix and a global sparse matrix, respectively.
In a further embodiment, determining the corresponding synchronization parameter set Wi based on the local sparse value set W̃i,t and the global sparse value set W̃s,t comprises: separately detecting the non-zero element positions of the local sparse matrix W̃i,t and the global sparse matrix W̃s,t to obtain a local sparse position matrix Mi,t and a global sparse position matrix Ms,t; determining, based on the union Mi,t∪Ms,t of the non-zero element positions of the sparse position matrices Mi,t and Ms,t, a sparse position matrix M̃i,t corresponding to the synchronization parameter set; and selecting, according to the non-zero element positions indicated by the sparse position matrix M̃i,t, the corresponding aggregation values from the M aggregation values to form the synchronization parameter set Wi.
In a further embodiment, the non-zero element positions in the sparse position matrix M̃i,t are: the non-zero element positions of the union Mi,t∪Ms,t; a predetermined number of non-zero element positions randomly selected from the union Mi,t∪Ms,t; or a predetermined number of non-zero element positions selected from the union Mi,t∪Ms,t according to predetermined selection probabilities, where a first selection probability used for the intersection of the non-zero element positions of the sparse position matrices Mi,t and Ms,t is greater than a second selection probability used for the other positions.
In another further embodiment, determining the corresponding synchronization parameter set Wi based on the local sparse value set W̃i,t and the global sparse value set W̃s,t comprises: obtaining the global sparse matrix W̃s,t corresponding to the M aggregation values of the global model; comparing the local sparse matrix W̃i,t and the global sparse matrix W̃s,t to obtain a correlation coefficient βi,t; determining, based on the correlation coefficient βi,t, the sparse position matrix M̃i,t corresponding to the synchronization parameter set Wi; and determining the synchronization parameter set Wi from the sparse position matrix M̃i,t and the aggregation values.
In a further embodiment, comparing the local sparse matrix W̃i,t and the global sparse matrix W̃s,t to obtain the correlation coefficient βi,t comprises: detecting a correlation distance between the local sparse matrix W̃i,t and the global sparse matrix W̃s,t, the correlation distance being one of the Euclidean distance, cosine distance, Manhattan distance, Pearson similarity, Jaccard similarity, and Hamming distance between the local sparse matrix W̃i,t and the global sparse matrix W̃s,t, or between the local sparse position matrix Mi,t and the global sparse position matrix Ms,t; and determining the correlation coefficient βi,t according to a normalization result of the correlation distance.
In yet a further embodiment, determining, based on the correlation coefficient βi,t, the sparse position matrix M̃i,t corresponding to the synchronization parameter set Wi comprises: selecting a first number N1 of non-zero element positions from the intersection position matrix Mi,t∩Ms,t of the local sparse position matrix Mi,t and the global sparse position matrix Ms,t, to obtain a first position matrix; selecting a second number N2 of non-zero element positions from the non-zero element complement matrix of Mi,t with respect to Mi,t∩Ms,t, to obtain a second position matrix; selecting a third number N3 of non-zero element positions from the non-zero element complement matrix of Ms,t with respect to Mi,t∩Ms,t, to obtain a third position matrix; and determining the sparse position matrix M̃i,t based on the first position matrix, the second position matrix, and the third position matrix.
In one embodiment, the first number N1 is the number ki,t of non-zero elements in the intersection position matrix Mi,t∩Ms,t, the second number N2 is positively correlated with the correlation coefficient βi,t, and the third number N3 is negatively correlated with the correlation coefficient βi,t.
In one embodiment, the sum of the first number N1, the second number N2, and the third number N3 is a predetermined value k, and the second number N2 and the third number N3 are each positively correlated with the difference between the predetermined value k and ki,t.
In one embodiment, the first number N1 is 0, the second number N2 is positively correlated with the correlation coefficient βi,t, and the third number N3 is negatively correlated with the correlation coefficient βi,t.
In one embodiment, the first number N1 is 0, the second number N2 is consistent with the number of non-zero elements in the intersection position matrix Mi,t∩Ms,t, and the third number N3 is negatively correlated with the correlation coefficient βi,t.
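For illustration only, the following Python sketch (not part of the claimed embodiments; the function name, the choice of cosine similarity as the correlation distance, and the N1/N2/N3 policy of keeping all intersection positions and splitting the remainder by βi,t are all assumptions) shows one way such a correlation-based download position matrix could be computed:

    import numpy as np

    def download_mask_by_correlation(M_local, M_global, k, rng=None):
        # M_local, M_global: 0/1 local and global sparse position matrices.
        # k: total number of positions to feed back (N1 + N2 + N3 = k).
        rng = rng or np.random.default_rng(0)
        a, b = M_local.ravel().astype(float), M_global.ravel().astype(float)
        # Cosine similarity of two 0/1 masks is already in [0, 1], so it serves
        # directly as the normalized correlation coefficient beta_{i,t}.
        beta = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

        inter = (M_local * M_global).astype(bool)      # intersection positions
        only_local = M_local.astype(bool) & ~inter     # complement of M_i,t w.r.t. the intersection
        only_global = M_global.astype(bool) & ~inter   # complement of M_s,t w.r.t. the intersection

        n1 = int(inter.sum())                          # N1: keep every intersection position
        rest = max(k - n1, 0)
        n2 = int(round(beta * rest))                   # N2 grows with the correlation coefficient
        n3 = rest - n2                                 # N3 shrinks as the correlation grows

        mask = inter.copy()
        for pool, n in ((only_local, n2), (only_global, n3)):
            idx = np.flatnonzero(pool)
            if n > 0 and idx.size:
                mask.flat[rng.choice(idx, size=min(n, idx.size), replace=False)] = True
        return mask.astype(np.int8), beta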
In one embodiment, each training member updating the undetermined parameters in its local model using the corresponding synchronization parameter set, thereby updating the local model, comprises: a single training member i replacing the corresponding values to be synchronized among its local parameters to be synchronized with the aggregation values in the corresponding synchronization parameter set Wi; and updating the local model using the updated parameters to be synchronized.
According to a second aspect, a method for jointly updating a model is provided, applied to a process in which a server and a plurality of training members jointly update a model, where the local model of each training member has the same structure as the global model held by the server. The method is performed by the server and comprises: receiving the values to be synchronized sent by the training members, where the number of values to be synchronized sent by a single training member i is mi, the mi values to be synchronized are the updated values of mi out of M parameters to be synchronized updated using the local training samples, and the parameters to be synchronized correspond one-to-one to the undetermined parameters of the model; aggregating the values to be synchronized uploaded by the training members to obtain M aggregation values corresponding to the M parameters to be synchronized; and feeding back a synchronization parameter set to each training member according to the M aggregation values, so that each training member updates the undetermined parameters in its local model using the corresponding synchronization parameter set, thereby updating the local model, where the synchronization parameter set Wi corresponding to a single training member i contains ni aggregation values, and the ni parameters to be synchronized corresponding to the ni aggregation values are determined jointly from the mi values to be synchronized uploaded by training member i and the M aggregation values.
According to a third aspect, a method for jointly updating a model is provided, applied to a process in which a server and a plurality of training members jointly update a model, where the local model of each training member has the same structure as the global model held by the server. The method is performed by a training member i and comprises: updating, using the local training samples, M parameters to be synchronized corresponding to the model, the parameters to be synchronized corresponding one-to-one to the undetermined parameters of the model; selecting mi parameters to be synchronized from the M parameters to be synchronized and uploading the corresponding mi values to be synchronized to the server, for the server to feed back a synchronization parameter set Wi, where the synchronization parameter set Wi contains ni aggregation values, and the ni parameters to be synchronized corresponding to the ni aggregation values are determined jointly from the uploaded mi values to be synchronized and the M aggregation values; and updating the undetermined parameters in the local model using the synchronization parameter set Wi, thereby updating the local model.
According to a fourth aspect, there is provided an apparatus for jointly updating a model, which is applied to a process of jointly updating a model by a server and a plurality of training members, wherein a local model of each training member is consistent with a global model structure held by the server, the apparatus is provided on the server, and includes:
a receiving unit configured to receive the values to be synchronized sent by the training members, where the number of values to be synchronized sent by a single training member i is mi, the mi values to be synchronized are the updated values of mi out of M parameters to be synchronized updated using the local training samples, and the parameters to be synchronized correspond one-to-one to the undetermined parameters of the model;
the aggregation unit is configured to aggregate the to-be-synchronized values uploaded by each training member to obtain M aggregation values corresponding to M to-be-synchronized parameters respectively;
a feedback unit configured to feed back a synchronization parameter set to each training member according to the M aggregation values, so that each training member updates the undetermined parameters in its local model using the corresponding synchronization parameter set, thereby updating the local model, where the synchronization parameter set Wi corresponding to a single training member i contains ni aggregation values, and the ni parameters to be synchronized corresponding to the ni aggregation values are determined jointly from the mi values to be synchronized uploaded by training member i and the M aggregation values.
According to a fifth aspect, there is provided an apparatus for jointly updating a model, which is applied to a process of jointly updating a model by a server and a plurality of training members, wherein a local model of each training member is consistent with a global model structure held by the server, the apparatus is provided for a training member i, and includes:
the training unit is configured to update M parameters to be synchronized corresponding to the model by using a local training sample, and each parameter to be synchronized corresponds to each parameter to be determined of the model one by one;
an uploading unit configured to upload, for mi parameters to be synchronized selected from the M parameters to be synchronized, the corresponding mi values to be synchronized to the server, for the server to feed back a synchronization parameter set Wi, where the synchronization parameter set Wi contains ni aggregation values, and the ni parameters to be synchronized corresponding to the ni aggregation values are determined jointly from the uploaded mi values to be synchronized and the M aggregation values;
a synchronization unit configured to update the undetermined parameters in the local model using the synchronization parameter set Wi, thereby updating the local model.
According to a sixth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the second or third aspect.
According to a seventh aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and the processor, when executing the executable code, implements the method of the second or third aspect.
With the method and apparatus provided by the embodiments of this specification, during federated learning a training member uploads only some of the parameters to be synchronized, and the server issues the aggregation values of only some of the parameters to be synchronized to the training member, so that the data traffic of the joint training process is reduced. For a single training member, the aggregation values to be issued are selected jointly based on the parameters to be synchronized uploaded by that training member and the aggregation values determined by the server, so that the local data characteristics of the training member are fully considered, the model training process is more personalized, and the effectiveness of the model is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of a system architecture for federated learning;
FIG. 2 is a schematic diagram of a specific implementation architecture under the technical concept of the present specification;
FIG. 3 is a schematic diagram illustrating the interaction flow between a server and a single training member in the process of jointly training a model according to one embodiment of the present disclosure;
FIG. 4 illustrates a flow diagram of a joint training model performed by a server in one embodiment of the present description;
FIG. 5 is a schematic flow diagram of a joint training model performed by a server according to another embodiment of the present disclosure;
FIG. 6 is a schematic block diagram of an apparatus for a server-side joint training model according to one embodiment of the present disclosure;
FIG. 7 is a schematic block diagram of an apparatus for a joint training model provided to training members according to another embodiment of the present disclosure.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
Federated learning, which may also be referred to as federated machine learning, joint learning, alliance learning, and the like, is a machine learning framework that can effectively help multiple organizations use data and build machine learning models while meeting the requirements of user privacy protection, data security, and government regulations.
Specifically, suppose enterprise A and enterprise B each want to build a task model, where the individual tasks may be classification or prediction, and these tasks have been approved by the respective users when the data was obtained. However, because the data are incomplete, for example enterprise A lacks label data and enterprise B lacks user profile data, or the data are insufficient and the sample size is too small, the model at each end may fail to be built or may not work well. The problem federated learning aims to solve is how to build a high-quality model at each of A and B, where the model is trained on the data of each enterprise, such as A and B, while no enterprise's own data is revealed to the other parties; that is, a common model is built without violating data privacy regulations. This common model behaves as if the parties had aggregated their data and trained the optimal model together, while the built model serves only each party's own goals within its own domain.
The implementation architecture of federated learning is shown in FIG. 1. The various organizations for federal learning may be referred to as training members, which may also be data holders or data providers. Each training member can hold different business data, and can also participate in the joint training of the model through equipment, a computer, a server and the like. The service data may be various forms of data such as characters, pictures, voice, animation, video, and the like. Generally, the business data held by each training member has correlation, and the business party corresponding to each training member may also have correlation. For example, a plurality of banks relating to financial services are provided as business parties, and each business party can individually provide businesses such as savings and loan to a user, and can hold data such as the age, sex, income and expenditure lines, loan amount, and deposit amount of the user. For another example, a plurality of hospitals related to medical services are used as service parties, and each service party may use diagnosis records such as age, sex, symptoms, diagnosis results, treatment schemes, treatment results, and the like of a user as local service data.
Under this implementation architecture, the model may be trained jointly by two or more training members. The model may be any of various models that process business data to obtain corresponding business processing results, and may also be called a business model. What kind of business data is processed and what kind of business processing result is obtained depends on actual requirements. For example, the business data may be data related to user finance, with the business processing result being a financial credit evaluation of the user; or the business data may be a user's customer service dialogue data, with the business processing result being a recommended customer service answer, and so on. The business data may take various forms such as text, pictures, animation, audio, and video. Each training member can use the trained business model to perform local business processing on its local business data.
In the process of jointly training the business model, the server can assist the joint learning of the business parties, for example by assisting with nonlinear computation, aggregating model parameters or gradients, and the like. FIG. 1 shows the server as a separate party, such as a trusted third party, provided independently of the individual training members. In practice, the role of the server may also be distributed to, or composed of, the training members themselves, with joint auxiliary computation performed between the training members using a secure computation protocol (e.g., secret sharing). This is not limited in this specification.
Referring to FIG. 1, under the implementation framework of federated learning, the server may initialize a global model and distribute it to the training members. Each training member can locally calculate the gradients of the model parameters according to the global model determined by the server, and update the model parameters according to the gradients. The server may aggregate the gradients of the model parameters, or other quantities related to the model parameters. The parameters that the server needs to aggregate may be collectively called parameters to be synchronized, the values corresponding to the parameters to be synchronized may be called values to be synchronized, and the parameter values aggregated and synchronized by the server may be called aggregation values. The server may feed the aggregation values back to the individual training members, and each training member updates its local model parameters according to the received aggregation values. This cycle repeats until a business model suitable for each business party is trained. It is to be understood that, if a model parameter to be adjusted in the model is referred to as an undetermined parameter, then the parameter to be synchronized may be a quantity associated with that undetermined parameter, such as the undetermined parameter itself, its gradient, or the difference between the undetermined parameter and its initial value.
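As a minimal illustration of this generic cycle (not yet the sparsified scheme of this specification), the following Python sketch assumes each training member exposes a hypothetical local_gradient method and that the server simply averages the gradients:

    import numpy as np

    def federated_round(global_params, members, lr=0.1):
        # One plain synchronization round: every member computes a local gradient
        # on the current global parameters, the server averages the gradients,
        # and the averaged gradient is applied as the synchronized update.
        grads = [m.local_gradient(global_params) for m in members]
        avg_grad = np.mean(grads, axis=0)
        return global_params - lr * avg_grad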
It is understood that federated learning can be divided into horizontal federated learning (feature alignment), vertical federated learning (sample alignment), and federated transfer learning. The implementation framework provided in this specification can be used with various federated learning frameworks, and is particularly suitable for horizontal federated learning, in which each training member provides a portion of independent samples, or in other words the sample data of the training members forms a horizontal partition.
When many training members participate in federated learning, the data received by the server grows geometrically, which easily causes communication congestion and seriously affects the efficiency of the overall training. Therefore, in multi-party federated learning, the model is usually compressed, that is, the number of parameters uploaded to the server by a single training member is compressed (i.e., sparsified) so as to reduce the pressure on communication transmission. Federated learning generally performs model aggregation under the assumption that the data are IID (independent and identically distributed); specifically, the features of the training samples have the same distribution and are independent of each other. However, since the samples of a data owner are associated with one or more of the corresponding sample subjects (e.g., users), the regions where the sample subjects are located, the time windows of data acquisition, and so on, the data sets involved in joint training often have different feature distributions or label distributions, and the features are not independent of each other. Data sets of this type are referred to as non-IID data sets. For example, for a given user there is an association between the income features and the deposit features held by banks; one bank may only have the income features of a certain user and no deposit features (e.g., the salary is quickly transferred out to deposits at other banks), while that bank or another bank has both income and deposit features for a different user. If model aggregation is performed under the IID assumption, the federated model obtained on a non-IID data set may perform poorly.
In order to adapt to federated learning on non-IID data sets, this specification proposes a technical concept of jointly updating the model in which the model is aggregated and updated according to the data characteristics of each training member. FIG. 2 is a schematic diagram of the interaction of each training member with the server (a third party) under the technical concept of this specification. As shown in FIG. 2, assuming that the jointly trained business model is a multi-layer neural network (the layers are arranged from top to bottom or from bottom to top in FIG. 2), the solid black points represent the nodes that are currently activated. It can be seen that, in the stage of updating the parameters to be synchronized using the local training data, the model nodes of each data party may all be activated. Further, under the technical concept of this specification, on the one hand, a training member updates the parameters to be synchronized according to its local data characteristics and then uploads the values to be synchronized of the sparsified parameters to be synchronized to the server, as shown by the solid points in FIG. 2. On the other hand, after aggregating the uploads of the training members, the server issues the aggregation values to the training members; these may also be sparsified aggregation values, and, in combination with the sparse matrix uploaded by each training member, the server individually issues a sparse matrix of aggregation values to each training member. As shown in FIG. 2, the activated nodes of the sparse matrix uploaded by a single data party and of the sparse matrix issued by the server may be the same or different, which is not limited here.
The server determines the sparse matrix issued to a single training member in combination with the sparse matrix uploaded by that training member (which fuses its local data characteristics), and the training member synchronizes some of its parameters to be synchronized according to the sparse matrix issued by the server. In this way, the volume of communication data between the training members and the server can be effectively reduced, the local model can be updated in a way that adapts to the characteristics of the local data, and the effectiveness of federated learning is improved.
The technical idea of the present specification is described in detail below.
Referring to FIG. 3, an example of the interaction between a single training member i and the server during federated learning (i.e., jointly updating the model) is shown. The process of jointly updating the model can be carried out by the server and a plurality of training members, and training member i can be any one of the training members participating in federated learning. The server aggregates the parameters to be synchronized uploaded by the training members, and may be any device, platform, or device cluster with computing and processing capabilities.
It is understood that the process of jointly updating the model may contain multiple parameter synchronization periods. Initially, the server may determine the global model, initialize the model parameters, and issue them to the training members; alternatively, the training members may negotiate the model structure, each training member constructs the local model locally, and the server initializes the model parameters. The server may also preset the required hyper-parameters, which may include, for example, one or more of a waiting time T, a parameter compression ratio α, the number m of parameters to upload, the model parameter update step size, the total number of training periods, and the like. The model parameters in the model (such as weight parameters and constant-term parameters) are the parameters to be adjusted, i.e., the undetermined parameters. As mentioned above, each undetermined parameter may correspond to a parameter to be synchronized. In each synchronization period, the training members determine the updated values of the parameters to be synchronized (i.e., the values to be synchronized) using their local training samples and upload them to the server in sparsified form; the server aggregates and synchronizes the values and feeds them back to the training members; and the training members update their local undetermined parameters according to the aggregation values.
The process shown in fig. 3 is an example of a synchronization cycle in an embodiment, and an overall process of the joint update model is described in terms of interaction between a training member and a server in combination with any training member i in a plurality of training members. For convenience of description, the number of undetermined parameters in the model is assumed to be a positive integer M greater than 1.
First, in step 301, a training member i updates M parameters to be synchronized corresponding to a model by using a local training sample. It will be appreciated that the pending parameters in the model may be, for example: weight parameters for feature aggregation, excitation parameters for excitation layers, truncation parameters, and the like. The process of business model training is the process of determining the undetermined parameters. In the federal learning process, parameters to be synchronized among training members can be determined according to modes such as pre-negotiation and the like, and the parameters to be synchronized correspond to the parameters to be determined one by one.
In an update period of the undetermined parameters, each training member can update the undetermined parameters in its local model according to its local training samples. Training member i reads from its local training samples Xi one batch bi containing Ni pieces of sample data, performs forward propagation through the model Yi, and obtains the prediction labels corresponding to the training samples in batch bi, denoted ŷi. Then, based on the actual sample labels yi and the prediction labels ŷi, the model loss Li is determined, and the undetermined parameters are adjusted according to the model loss Li. In an optional embodiment, the undetermined parameters may be updated by various methods such as gradient descent or Newton's method, so that the undetermined parameters move along the gradient direction toward their optimal values. In this case, training member i can compute the gradients of the undetermined parameters from the model loss Li by means of a back-propagation algorithm, for example as a gradient matrix Gi0, and the undetermined parameters are then updated on the basis of the corresponding gradients.
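A minimal Python sketch of this local update step, assuming for illustration a linear model with squared loss in place of the business model Yi (all names hypothetical):

    import numpy as np

    def local_update(X_batch, y_batch, params, lr=0.01):
        # Forward propagation of the batch through a stand-in linear model.
        y_pred = X_batch @ params
        residual = y_pred - y_batch
        loss = 0.5 * float(np.mean(residual ** 2))      # model loss L_i
        grad = X_batch.T @ residual / len(y_batch)      # gradients of the undetermined parameters
        new_params = params - lr * grad                 # gradient-descent update
        return new_params, grad, loss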
A single synchronization period may correspond to one update period of the undetermined parameters, or to several update periods. For example, parameter synchronization may be performed every n update periods, where n is a positive integer ≥ 1, or the update period at which synchronization takes place may be determined at predetermined time intervals. For a single training member, the parameter synchronization process for the parameters to be synchronized may correspond to a single update period. Assuming the current synchronization period is t, that is, the t-th parameter synchronization (assuming N periods of parameter synchronization in total, t = 0, 1, ..., N), the training member may determine the current parameters to be synchronized in the n × t-th update period, denoted for example Wi,tu.
In one embodiment, the parameter to be synchronized is a parameter to be determined, and in a corresponding single update cycle, a single training member may use the parameter to be determined updated according to the training samples of the current batch as a value to be synchronized of the parameter to be synchronized.
In another embodiment, the parameter to be synchronized is a gradient of the parameter to be determined, and a single training member may determine the gradient of the parameter to be determined as a value to be synchronized of the parameter to be synchronized before the parameter to be determined is updated according to the training samples of the current batch.
In another embodiment, the parameter to be synchronized may also be a difference between the parameter to be determined and an initial value thereof, and a single training member may compare the parameter to be determined updated according to the training samples of the current batch with the initial value initialized by the server, and the obtained difference is used as the value to be synchronized of the parameter to be synchronized.
In other embodiments, the parameter to be synchronized may also be other parameters according to different update modes of the parameter to be determined in the model, which is not described in detail herein. Aiming at the M undetermined parameters in the model, each training member can update the M undetermined parameters in the same mode to obtain corresponding M updated values.
Then, in step 302, training member i selects mi parameters to be synchronized from the M parameters to be synchronized and uploads the corresponding mi values to be synchronized to the server. It can be understood that, under the technical concept of this specification, each training member needs to compress the amount of data uploaded to the server. Training member i can compress the number of parameters to be synchronized from M down to mi, where mi can be much smaller than M; for example, M is 100,000 and mi is 1,000. The compression of the parameters to be synchronized can be performed by pruning, sparsification, and the like.
In one embodiment, mi can be a fixed value, such as a preset fixed value, or a fixed value positively correlated with the number of samples held by training member i; for example, M/N (where N is the number of training members), or another fixed value that is negatively correlated with the total sample size of all training members and positively correlated with the sample size held by training member i, and so on. In this case, in each synchronization period training member i can upload to the server the values to be synchronized of mi parameters to be synchronized.
In one embodiment, mi can be determined by a preset upload proportion coefficient αi, for example mi = αi × M, or mi = αi × |Wi,tu|0, where |Wi,tu|0 is the zero-order norm of the matrix of current updated values determined by training member i for the M parameters to be synchronized, whose value is M. Since the parameters to be synchronized tend to converge as the synchronization periods iterate, a single training member may upload the parameters to be synchronized to the server with a gradually smaller upload proportion. In an alternative embodiment, the upload proportion coefficient αi may decrease linearly, exponentially, or in another form. If an attenuation coefficient ρ is preset, the number mi of parameters to be synchronized uploaded in the current period t is a decreasing function (e.g., an exponential function, a trigonometric function, etc.) of the attenuation coefficient ρ. Taking an exponential decreasing function as an example, mi = α × ρt × M, where ρ is a number less than 1, e.g., 0.95; as the number of periods t increases, ρt, i.e., ρ raised to the power of t, gradually decreases.
In other embodiments, training member i may also determine the number mi of values to be synchronized uploaded to the server in other reasonable ways, which are not described in detail here. It is noted that, when the result of the above calculation is not an integer, mi can be obtained by rounding up or rounding down. Similarly, the other training members can determine the number of parameters to be synchronized to upload locally in a similar manner.
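For illustration, a small Python sketch of the exponentially decaying upload count described above, assuming rounding down as the integer-taking rule:

    import math

    def upload_count(alpha, rho, t, M):
        # m_i = alpha * rho**t * M, rounded down (rounding up would also be valid).
        return max(1, math.floor(alpha * (rho ** t) * M))

    # e.g. alpha=0.1, rho=0.95, M=100_000 gives 10_000 values in period 0
    # and about 3_584 values in period 20.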
When selecting mi parameters to be synchronized from the M parameters to be synchronized, training member i may select them randomly, in descending order of the absolute values of the updated values, continuously from a preset initial position (e.g., the i-th parameter of the layer-2 neural network), according to parameters to be synchronized specified in advance for training member i, or by a combination of these manners, which is not limited here. After determining the mi parameters to be synchronized, training member i may upload the mi updated values determined in step 301 to the server as the values to be synchronized.
The selected mi parameters to be synchronized may be uploaded to the server after being marked with unique identifiers, or may be uploaded in the form of a parameter matrix, which is not limited in this specification. For example, a parameter to be synchronized marked with a unique identifier may be denoted (wjk)i, where (wjk)i identifies the parameter to be synchronized corresponding to the k-th parameter of the j-th layer of the neural network of training member i, and the associated value is the corresponding value to be synchronized. In matrix form, on the other hand, the local sparse matrix corresponding to the mi values to be synchronized may be denoted W̃i,t; in the local sparse matrix W̃i,t, the mi positions corresponding to the mi selected parameters to be synchronized hold the mi values to be synchronized, and the remaining (M-mi) positions are all 0. When the values to be synchronized are uploaded in matrix form, a single value to be synchronized may be transmitted, for example, in the form [j, k, value], representing the value to be synchronized of the model parameter in row j and column k of the parameter matrix of the business model. In other words, the upload format is index + value, with (j, k) as the index. Thus, when the parameters to be synchronized are uploaded in matrix form, only mi values are uploaded; furthermore, the row and column indices can be encoded in numerical formats that occupy fewer bytes, such as integer (int) types, so that the extra data volume during upload is reduced.
It can be understood that pruning, model sparsification, and the like generally select the parameters with larger absolute values. A parameter with a larger absolute value usually has a larger influence on the result and is, for the current period, a temporarily more important parameter. Taking the TopK sparsification method as an example, if the model parameters are described in matrix form, training member i may select, based on the locally updated parameter matrix to be synchronized Wi,tu, the mi = Ki elements with the largest absolute values, set the corresponding positions of a mapping matrix with the same dimensions as the parameter matrix to be synchronized to 1 and the other positions to 0, and thereby obtain the current local sparse position matrix Mi,t of training member i. The local sparse matrix W̃i,t may be the element-wise product of the sparse position matrix Mi,t and the parameter matrix to be synchronized Wi,tu: W̃i,t = Mi,t ⊙ Wi,tu. Training member i can then upload W̃i,t to the server. It can be appreciated that, in the local sparse matrix W̃i,t, the elements at the selected mi positions are the current values to be synchronized and the remaining positions are 0, so that only the elements at the mi positions need to be uploaded when uploading W̃i,t.
In an optional implementation, before uploading the parameters to be synchronized, a single training member may add perturbation satisfying differential privacy to the local parameters to be synchronized, so as to protect local data privacy. For example, perturbation data following a standard Gaussian distribution with mean 0 and variance 1 can be added to the parameters to be synchronized through the Gaussian mechanism of differential privacy, forming perturbed data to be synchronized. When the data to be synchronized is represented in matrix form, the added perturbation data may be a perturbation matrix satisfying a predetermined Gaussian distribution. A single training member i may add the perturbation to the selected parameters to be synchronized after selecting the mi parameters to be synchronized, or may add the perturbation data first and then select the mi parameters to be synchronized from the perturbed parameters according to the rules described above; this is not limited in this specification. In addition, the noise added by training member i to the parameters to be synchronized may also satisfy an exponential mechanism, a Laplace mechanism, or the like.
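A minimal sketch of the Gaussian-mechanism perturbation mentioned above, assuming the noise scale is simply a standard deviation sigma (calibrating sigma to a concrete privacy budget is outside this sketch):

    import numpy as np

    def perturb_update(W_update, sigma=1.0, rng=None):
        # Add zero-mean Gaussian perturbation (a standard Gaussian when sigma=1.0)
        # to the values to be synchronized before they are uploaded.
        rng = rng or np.random.default_rng()
        return W_update + rng.normal(0.0, sigma, size=W_update.shape)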
In step 303, the server aggregates the values to be synchronized uploaded by the training members to obtain M aggregation values corresponding to the M parameters to be synchronized. In this aggregation process, the aggregation of the values to be synchronized of different parameters to be synchronized is independent. That is, for a single parameter to be synchronized, assuming that s training members feed back s corresponding values to be synchronized to the server, the aggregation value of this parameter to be synchronized is an aggregate of those s updated values.
The aggregation of the values to be synchronized of a single parameter to be synchronized may be performed by at least one of weighted summation, averaging, taking the median, taking the maximum, taking the minimum, and the like. For example, for an undetermined parameter wjk, if s training members feed back values to be synchronized, the aggregation value obtained by averaging may be (1/s)·Σi (wjk)i, where the summation runs over the training members, i.e., i from 1 to s. When the training members upload their local sparse matrices, the server can also aggregate the corresponding values to be synchronized through the corresponding matrix elements; for example, the aggregation matrix formed by averaging may be Ws,t = (Σi W̃i,t) / (Σi Mi,t), where the matrix division is element-wise, i.e., the first element (e.g., first row, first column) of the dividend matrix is divided by the first element of the divisor matrix to obtain the first element of the quotient matrix Ws,t.
It should be particularly noted that, when the training members feed back parameters to be synchronized to the server without mutual agreement or advance negotiation, there may be model parameters whose corresponding parameters to be synchronized are not returned by any training member in the current synchronization period t; that is, the total number of contributions corresponding to such a parameter to be synchronized in the current period is 0. In the calculation, 0 then appears as a denominator and yields an error value (e.g., NaN). In this case, the aggregation value in the synchronization parameter set may be determined according to the actual situation. For example, the aggregation value of the previous period may be used as the aggregation value of the current period. For another example, when the parameter to be synchronized is the undetermined parameter itself, a special flag may be set so that the value of the corresponding parameter to be synchronized is not synchronized. When the parameter to be synchronized is a gradient value, the value 0 can also be used as the aggregation value of the corresponding parameter to be synchronized in the current update period.
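A Python sketch of the element-wise averaging in step 303, including a fallback for positions that no training member reported (assumed here to reuse the previous aggregation value, one of the options described above):

    import numpy as np

    def aggregate(sparse_values, sparse_masks, prev_agg=None):
        # sparse_values: list of uploaded local sparse matrices; sparse_masks: the
        # corresponding 0/1 position matrices. Average element-wise over the members
        # that actually reported each position.
        value_sum = np.sum(sparse_values, axis=0)
        count = np.sum(sparse_masks, axis=0)
        fallback = prev_agg if prev_agg is not None else np.zeros_like(value_sum)
        # Avoid a 0 denominator (NaN): unreported positions keep the fallback value.
        return np.where(count > 0, value_sum / np.maximum(count, 1), fallback)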
In step 304, the server feeds back the synchronization parameter set Wi to training member i according to the M aggregation values. In order to further reduce the traffic of the model updating process, the server may feed back the aggregation values of only some of the parameters to be synchronized to the training members. Considering the personalized characteristics of each training member's local training data, the server can determine a different synchronization parameter set for each training member.
Therefore, the server can determine the sparse aggregation values it issues based on the values to be synchronized uploaded by training member i and the M aggregation values. For example, the aggregation values sent by the server may correspond to the same parameters to be synchronized as the values to be synchronized uploaded by training member i. When the values to be synchronized uploaded by training member i are described in matrix form, the server can send to training member i the aggregation values of the parameters to be synchronized corresponding to the non-zero elements of the local sparse matrix W̃i,t.
In order to consider the characteristics of the global data more comprehensively, in one possible implementation the server may first determine a sparsification result of the M aggregation values corresponding to the global model. The sparsification result may be denoted as a global sparse value set, for example W̃s,t, which, if described in matrix form, may be referred to as the global sparse matrix W̃s,t. The sparsification of the M aggregation values can be performed by pruning, model sparsification, and the like. In this way, the global sparse value set of the server selects the relatively important parameters to be synchronized in the global model. Taking the TopK sparsification method as an example, the Ks aggregation values with the largest absolute values among the M aggregation values can be taken to construct the global sparse value set W̃s,t; in the matrix case, the Ks selected positions of the global sparse matrix W̃s,t hold the corresponding Ks aggregation values, and the other positions are 0.
Then, the global sparse value set W̃s,t and the local sparse value set of each training member (e.g., W̃i,t) may together determine the corresponding synchronization parameter set. Taking training member i as an example, the global sparse value set W̃s,t and the local sparse value set W̃i,t may together determine the corresponding synchronization parameter set, denoted Wi. Specifically, the corresponding synchronization parameter set Wi may be determined by at least one of the intersection and the union of the parameters to be synchronized corresponding to the global sparse value set W̃s,t and the local sparse value set W̃i,t. The determination process of the synchronization parameter set Wi is described below, taking the case where the above value sets are in matrix form as an example.
The non-zero element positions of the global sparse matrix W~s,t and the local sparse matrix W~i,t may determine the non-zero element positions of the synchronization parameter set. For example, a global sparse position matrix Ms,t and a local sparse position matrix Mi,t are determined separately, and then the global sparse position matrix Ms,t and the local sparse position matrix Mi,t are used to determine the sparse position matrix of the synchronization parameter matrix W~i,t^s, denoted Mi,t^s. The number of non-zero elements in Mi,t^s is recorded as ni, which means that the server feeds back ni aggregation values to the training member; which aggregation values are fed back is determined by the non-zero element positions of Mi,t^s. ni is usually less than M, and may be equal to or different from mi. The single synchronization parameter matrix W~i,t^s can be regarded as a sparse matrix of the aggregation matrix composed of the aggregation values. In Mi,t^s, the element at each non-zero element position may be set to 1, so as to describe a sparse position, and the sparse matrix W~i,t^s corresponding to the synchronization parameter set of training member i is then obtained, which can be regarded as the synchronization parameter set Wi in matrix form. The non-zero element positions of W~i,t^s correspond to the synchronization parameters determined by the server for training member i in the current update period.
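For illustration only, the following sketch assumes the global and local sparse matrices are NumPy arrays and builds the synchronization parameter matrix from the union of their non-zero element positions; the helper name and interface are assumptions made for this example:

```python
import numpy as np

def sync_set_by_union(global_sparse: np.ndarray, local_sparse: np.ndarray,
                      aggregates: np.ndarray) -> np.ndarray:
    """Build the synchronization parameter matrix for one training member.

    global_sparse / local_sparse: sparsified global and local matrices (zeros elsewhere)
    aggregates: full matrix of the M aggregation values
    """
    m_s = (global_sparse != 0)                # global sparse position matrix Ms,t
    m_i = (local_sparse != 0)                 # local sparse position matrix Mi,t
    m_sync = m_s | m_i                        # union of non-zero element positions
    return np.where(m_sync, aggregates, 0.0)  # aggregation values kept only at those positions
```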
According to one possible design, the union of the non-zero element positions of the sparse position matrices Mi,t and Ms,t may be used to determine the sparse position matrix Mi,t^s of the synchronization parameter set Wi. For example, the union of the non-zero element positions of Mi,t and Ms,t is recorded as Mi,t^∪; Mi,t^∪ may be directly determined as the sparse position matrix Mi,t^s, or a predetermined number of non-zero element positions may be randomly selected from Mi,t^∪ as the non-zero element positions of Mi,t^s.
In an alternative implementation, when randomly selecting a predetermined number of non-zero element positions from Mi,t^∪ as the non-zero element positions of Mi,t^s, the importance of each position may also be considered. For example, a greater selection probability is used for the intersection positions of the non-zero element positions of the sparse position matrices Mi,t and Ms,t. This is because the positions where both Mi,t and Ms,t are non-zero may be more important positions with respect to both the local model of the corresponding training member and the global model of the server, and keeping the intersection positions non-zero helps to transfer the more important parameters. The matrix of intersection positions may be recorded as Mi,t^∩ = Mi,t ⊙ Ms,t, where the circle product ⊙ represents element-wise multiplication; for example, the element in the first row and first column of Mi,t multiplied by the element in the first row and first column of Ms,t gives the element in the first row and first column of Mi,t^∩. Further, the positions of Mi,t^∪ corresponding to non-zero elements of Mi,t^∩ may be given a greater selection probability (e.g., 0.7), and the other positions of Mi,t^∪ a smaller selection probability (e.g., 0.3), and a predetermined number of non-zero element positions are then selected from Mi,t^∪ according to these probabilities.
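A hedged sketch of this probability-weighted selection, under the assumption that the position matrices are boolean NumPy arrays and that the two probabilities (0.7 / 0.3) are merely example weights before normalization:

```python
import numpy as np

def pick_positions(m_i, m_s, n_pick, p_intersect=0.7, p_other=0.3, rng=None):
    """Randomly pick n_pick positions from the union of non-zero positions,
    favouring positions that are non-zero in both Mi,t and Ms,t."""
    if rng is None:
        rng = np.random.default_rng()
    union = np.argwhere(m_i | m_s)                    # candidate positions (assumed non-empty)
    both = (m_i & m_s)[tuple(union.T)]                # candidates lying in the intersection
    weights = np.where(both, p_intersect, p_other).astype(float)
    weights /= weights.sum()                          # normalise to a probability distribution
    n_pick = min(n_pick, len(union))
    chosen = rng.choice(len(union), size=n_pick, replace=False, p=weights)
    mask = np.zeros(m_i.shape, dtype=bool)
    mask[tuple(union[chosen].T)] = True               # sparse position matrix Mi,t^s
    return mask
```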
According to another possible design, the correlation between the global sparse matrix W~s,t and the local sparse matrix W~i,t may also be utilized. The correlation may be determined based on W~s,t and W~i,t themselves, or based on a comparison of the corresponding sparse position matrices Mi,t and Ms,t. Taking the comparison of Mi,t and Ms,t as an example, the correlation of Mi,t and Ms,t can be described by a correlation coefficient determined from a correlation distance such as the Euler distance, cosine distance, Manhattan distance, Pearson similarity, Jaccard similarity, Hamming distance, and the like, recorded as dist(Mi,t, Ms,t). Those skilled in the art will understand that the correlation distance may have different ranges depending on how it is defined: for example, the Euler distance may range from 0 to 2, the cosine distance from 0 to 1, the Manhattan distance from 0 to 2k (where 2k is the total number of non-zero elements of the two matrices), and so on. In order to keep the correlation coefficient (denoted βi,t) in the interval from 0 to 1, the normalized correlation distance may be used as the correlation coefficient. For example, with the Euler distance the correlation coefficient βi,t is dist(Mi,t, Ms,t)/2, with the Manhattan distance βi,t is dist(Mi,t, Ms,t)/2k, and so on. When the global sparse matrix W~s,t and the local sparse matrix W~i,t are used directly to determine the correlation coefficient, the corresponding sparse position matrices are not needed; the determination of the correlation coefficient is similar to that based on the sparse position matrices and is not repeated here.
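As one concrete, non-authoritative example of the normalization described above, the Manhattan distance between two 0/1 sparse position matrices can be divided by 2k (the total number of non-zero elements of the two matrices) to obtain a coefficient in the interval from 0 to 1:

```python
import numpy as np

def correlation_coefficient(m_i: np.ndarray, m_s: np.ndarray) -> float:
    """Normalised coefficient beta_i,t derived from the Manhattan distance between
    two 0/1 (or boolean) sparse position matrices, one of the options listed above."""
    two_k = int(m_i.sum() + m_s.sum())          # 2k: total number of non-zero elements
    manhattan = int(np.abs(m_i.astype(int) - m_s.astype(int)).sum())
    return manhattan / two_k if two_k else 0.0  # kept within the interval [0, 1]
```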
In an alternative implementation, based on the correlation coefficient, a first number N1 of non-zero element positions may be selected from the intersection position matrix Mi,t^∩ of Mi,t and Ms,t, recorded for example as a first position matrix Mi,t^1; a second number N2 of non-zero element positions may be selected from the non-zero element complement matrix of Mi,t with respect to Mi,t^∩, recorded for example as a second position matrix Mi,t^2; and a third number N3 of non-zero element positions may be selected from the non-zero element complement matrix of Ms,t with respect to Mi,t^∩, recorded for example as a third position matrix Mi,t^3. The first number, the second number, and the third number may be determined based on the above correlation coefficient.
In an alternative implementation, the number of non-zero elements in Mi,t^∩ is recorded as ki,t; it can be understood that ki,t indicates the number of positions at which both Mi,t and Ms,t are non-zero. Then the first number may be N1 = ki,t. The second number may be positively correlated with the correlation coefficient, e.g. N2 = βi,t(k−ki,t), and the third number may be negatively correlated with the correlation coefficient, e.g. N3 = (1−βi,t)(k−ki,t). Here k is the number of non-zero elements in the global sparse matrix or the global sparse position matrix; k may be a predetermined value or may be determined according to a predetermined sparse truncation value (for example, positions of the global sparse matrix whose values are greater than the truncation value correspond to the non-zero element positions of the sparse position matrix). In this way, for training member i, the higher the correlation between the global sparse matrix W~s,t and the local sparse matrix W~i,t, the higher the proportion of non-zero element positions of the synchronization parameter set that are determined from the non-zero elements of the training member's local sparse matrix, which tends to feed back to the training member the more important values to be synchronized that it uploaded.
Further, the server may select, according to the first number N1, the second number N2 and the third number N3, corresponding numbers of non-zero element positions from the respective position matrices, thereby obtaining the corresponding first position matrix Mi,t^1, second position matrix Mi,t^2 and third position matrix Mi,t^3. The final sparse position matrix Mi,t^s determined for training member i in the current update period is then obtained by combining the non-zero element positions of the first, second and third position matrices.
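The following sketch illustrates one possible composition of the sparse position matrix from the three groups of positions (all intersection positions, plus β- and (1−β)-weighted shares of the two complements); the rounding rule and helper names are assumptions of this example:

```python
import numpy as np

def sparse_positions_by_correlation(m_i, m_s, beta, k, rng=None):
    """Compose Mi,t^s from: all k_it intersection positions, about beta*(k-k_it)
    positions of Mi,t outside the intersection, and the remaining (1-beta)*(k-k_it)
    positions of Ms,t outside the intersection (illustrative only)."""
    if rng is None:
        rng = np.random.default_rng()
    inter = m_i & m_s                           # intersection position matrix Mi,t^∩
    k_it = int(inter.sum())
    rest = max(k - k_it, 0)
    n2 = int(round(beta * rest))                # second number N2
    n3 = rest - n2                              # third number N3 (up to rounding)

    def sample(mask, n):
        out = np.zeros(mask.shape, dtype=bool)
        cand = np.argwhere(mask)
        n = min(n, len(cand))
        if n > 0:
            picked = cand[rng.choice(len(cand), size=n, replace=False)]
            out[tuple(picked.T)] = True
        return out

    return inter | sample(m_i & ~inter, n2) | sample(m_s & ~inter, n3)
```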
In another alternative implementation, the first number N1 may be 0, the second number N2 may be βi,t·k, and the third number N3 may be (1−βi,t)·k. In this case, βi,t·k non-zero element positions may be selected from Mi,t to obtain the second position matrix Mi,t^2, and (1−βi,t)·k non-zero element positions may be selected from Ms,t to obtain the third position matrix Mi,t^3, which are then combined into the sparse position matrix Mi,t^s.
In yet another alternative implementation, the first number N1 may be 0, the second number N2 may be consistent with the number of non-zero elements in Mi,t (for example, all non-zero element positions of Mi,t are taken to obtain the second position matrix Mi,t^2), and the third number N3 may be (1−βi,t)·k. In this case, (1−βi,t)·k non-zero element positions may be selected from Ms,t to obtain the third position matrix Mi,t^3, and the position matrices are then combined into the sparse position matrix Mi,t^s.
In other optional implementations, the server may also determine the synchronization parameter set of the current update period for each training member in other reasonable ways, which are not described here one by one. In the above description, a sparse matrix and its corresponding sparse position matrix have consistent non-zero element positions; the difference is that the non-zero elements of the sparse position matrix may be a predetermined value (e.g., 1), while the non-zero elements of the sparse matrix are the real element values, at the corresponding positions, of the matrix before sparsification. It is worth noting that the aggregation value set Ws,t of the global model, the global sparse value set W~s,t and the local sparse value set W~i,t may also not be described in matrix form; instead, the individual values may be identified by identifiers of the corresponding undetermined parameters. In that case, the determination principle of the synchronization parameter set Wi is similar to the matrix form, except that the intersection, union, and the like can be determined directly from the parameter identifiers without going through a position matrix, which is not described here again.
Then, the server may feed back each synchronization parameter set to the corresponding training member. The synchronization parameter set fed back to training member i may be Wi. In an alternative embodiment, the synchronization parameter set Wi may be represented by means of the sparse matrix W~i,t^s.
In step 305, training member i updates the undetermined parameters in the local model by using the synchronization parameter set Wi, thereby updating the local model. It can be understood that each training member updates its local model with the synchronization parameter set fed back to it by the server.
For example, in the case that the parameters to be synchronized are the model parameters, the aggregation values in the synchronization parameter set replace the corresponding undetermined parameters of the local model one by one. In the case that the parameters to be synchronized are gradients of the undetermined parameters, the corresponding local undetermined parameters are updated one by one, according to the corresponding step size, with the aggregation values in the synchronization parameter set, using a gradient descent method, Newton's method, or the like. Described in matrix form, the update process of the parameters to be synchronized at training member i is, for example, Wi,t ← (E − Mi,t^s) ⊙ Wi,t + W~i,t^s, where Wi,t denotes the local values to be synchronized of training member i in matrix form, E is an all-ones matrix, and Mi,t^s is the sparse position matrix determined based on the synchronization parameter set Wi. It can be seen that the training member locally replaces only the elements covered by the sparse matrix of Wi.
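A minimal sketch of this replacement step for the model-parameter case, assuming the fed-back synchronization parameter set is materialised as a sparse matrix of aggregation values (positions whose aggregation value happens to be exactly 0 would in practice need a separately transmitted position mask):

```python
import numpy as np

def apply_sync_set(local_params: np.ndarray, sync_sparse: np.ndarray) -> np.ndarray:
    """Replace only the positions covered by the fed-back synchronization parameter set;
    all other local parameter values are kept unchanged."""
    m_sync = (sync_sparse != 0)                    # sparse position matrix of the sync set
    return np.where(m_sync, sync_sparse, local_params)
```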
In this way, each training member selectively synchronizes its local parameters with the other training members via the server. This saves communication data volume, fully considers the correlation between the global model and the local models, and provides an effective training scheme for federated learning with large data volumes.
The training members can jointly train the model through a plurality of iterations of the synchronization period shown in fig. 2 with the assistance of the server. The iteration end condition of the joint training model may be: the parameters to be synchronized tend to converge, the model loss tends to converge, the iteration period reaches a predetermined period, and the like. Wherein convergence may be understood as the amount of change being smaller than a predetermined threshold.
The model update flow of one embodiment of the present specification is described above in connection with the schematic diagram of FIG. 3 from the interaction perspective of a training member and a server. FIG. 4 shows a flow diagram of a server and a plurality of training members jointly updating a model described from the perspective of the server.
As shown in fig. 4, from the perspective of the server, the process of jointly updating the model includes:
step 401, receiving the values to be synchronized sent by each training member, wherein the number of values to be synchronized sent by a single training member i is mi, and the mi values to be synchronized are updated values of mi of the M parameters to be synchronized updated by using local training samples, each parameter to be synchronized corresponding one to one to an undetermined parameter of the model;
step 402, aggregating the values to be synchronized uploaded by each training member to obtain M aggregated values corresponding to M parameters to be synchronized respectively;
step 403, feeding back a synchronization parameter set to each training member according to the M aggregation values, so that each training member updates the undetermined parameters in its local model by using the corresponding synchronization parameter set, thereby updating the local model, wherein the synchronization parameter set Wi corresponding to a single training member i contains ni aggregation values, and the ni parameters to be synchronized corresponding to the ni aggregation values are determined jointly by the mi values to be synchronized uploaded by training member i and the M aggregation values.
FIG. 5 shows a flow diagram of a server and a plurality of training members jointly updating a model described from the perspective of the training members. As shown in FIG. 5, from the perspective of training member i, the process of jointly updating the model includes:
step 501, updating M parameters to be synchronized corresponding to a model by using a local training sample, wherein each parameter to be synchronized corresponds to each parameter to be determined of the model one by one;
step 502, selecting mi parameters to be synchronized from the M parameters to be synchronized, and uploading the corresponding mi values to be synchronized to the server, for the server to feed back the synchronization parameter set Wi, wherein the synchronization parameter set Wi contains ni aggregation values, and the ni parameters to be synchronized corresponding to the ni aggregation values are determined jointly by the mi uploaded values to be synchronized and the M aggregation values;
step 503, updating the undetermined parameters in the local model by using the synchronization parameter set Wi, thereby updating the local model.
It can be understood that fig. 4 and fig. 5 are specific examples of the flow executed by the server and the training member i in fig. 3 in a single synchronization cycle, respectively, and therefore, the corresponding description of the execution flow of the relevant party in fig. 3 is also applicable to fig. 4 and fig. 5, and is not repeated herein.
Reviewing the above process: in a single synchronization period of jointly training the model, after each training member updates the M parameters to be synchronized with its local training samples, it selects part of the updated values of the parameters to be synchronized to upload to the server, and the server selects part of the aggregation values to issue, which reduces the communication pressure between the training members and the server, reduces the computation load of the server, and improves learning efficiency. The server aggregates the updated values of the parameters to be synchronized from the training members to obtain M aggregation values, and then determines a corresponding synchronization parameter set according to the personalized characteristics of each training member. The data characteristics of each training member's local samples are thus fully considered, making the federated learning more targeted and personalized.
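To tie the steps together, the following deliberately simplified sketch runs one synchronization period over flat parameter vectors; it assumes a TOP-K upload rule on the member side and feeds back to each member only the aggregation values at its own uploaded positions, which is one of the feedback options described above rather than the full range of designs:

```python
import numpy as np

def synchronization_period(server_prev, members, local_ratio=0.1):
    """One illustrative synchronization period (simplified).

    server_prev: previous aggregation values, shape (M,)
    members: list of dicts with 'params' (updated values, shape (M,)) and 'n_samples'
    """
    M = server_prev.shape[0]
    num, den = np.zeros(M), np.zeros(M)
    masks = []
    for m in members:
        k = max(1, int(local_ratio * M))
        idx = np.argsort(np.abs(m['params']))[-k:]   # member-side TOP-K selection
        mask = np.zeros(M, dtype=bool)
        mask[idx] = True
        masks.append(mask)
        num[idx] += m['params'][idx] * m['n_samples']
        den[idx] += m['n_samples']
    # sample-weighted aggregation with a zero-sample guard (previous value kept)
    agg = np.where(den > 0, num / np.maximum(den, 1), server_prev)
    # each member replaces only the positions it uploaded with the fed-back aggregates
    updated = [np.where(mask, agg, m['params']) for mask, m in zip(masks, members)]
    return agg, updated
```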
According to an embodiment of another aspect, a system for jointly updating a model is provided, which includes a server and a plurality of training members. Any one of the plurality of training members is denoted as training member i; the training member i and the server may each be provided with a corresponding apparatus for jointly updating the model, so as to cooperatively complete federated learning.
FIG. 6 illustrates an apparatus 600 for jointly updating a model, provided at the server. As shown in fig. 6, the apparatus 600 includes:
a receiving unit 61, configured to receive the values to be synchronized sent by each training member, wherein the number of values to be synchronized sent by a single training member i is mi, and the mi values to be synchronized are updated values of mi of the M parameters to be synchronized updated by using local training samples, each parameter to be synchronized corresponding one to one to an undetermined parameter of the model;
an aggregation unit 62, configured to aggregate the values to be synchronized uploaded by the training members to obtain M aggregation values corresponding to the M parameters to be synchronized respectively;
a feedback unit 63, configured to feed back a synchronization parameter set to each training member according to the M aggregation values, so that each training member updates the undetermined parameters in its local model by using the corresponding synchronization parameter set, thereby updating the local model, wherein the synchronization parameter set Wi corresponding to a single training member i contains ni aggregation values, and the ni parameters to be synchronized corresponding to the ni aggregation values are determined jointly by the mi values to be synchronized uploaded by training member i and the M aggregation values.
FIG. 7 illustrates an apparatus 700 for jointly updating a model, provided at any one of the training members. As shown in fig. 7, the apparatus 700 includes:
a training unit 71, configured to update M parameters to be synchronized corresponding to the model by using local training samples, each parameter to be synchronized corresponding one to one to an undetermined parameter of the model;
an uploading unit 72, configured to select mi parameters to be synchronized from the M parameters to be synchronized and upload the corresponding mi values to be synchronized to the server, for the server to feed back the synchronization parameter set Wi, wherein the synchronization parameter set Wi contains ni aggregation values, and the ni parameters to be synchronized corresponding to the ni aggregation values are determined jointly by the mi uploaded values to be synchronized and the M aggregation values;
a synchronization unit 73, configured to update the undetermined parameters in the local model by using the synchronization parameter set Wi, thereby updating the local model.
It should be noted that the apparatuses 600 and 700 shown in fig. 6 and fig. 7 correspond to the method embodiments shown in fig. 4 and fig. 5, respectively, and may be applied to the server and a single training member in the method embodiment shown in fig. 3, respectively, so as to cooperatively complete the process of jointly updating the business model in fig. 3. Therefore, the descriptions related to the server and a single training member in fig. 3 also apply to the apparatuses 600 and 700 shown in fig. 6 and fig. 7 and are not repeated here.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 4 or fig. 5 or the like.
According to an embodiment of still another aspect, there is also provided a computing device including a memory and a processor, the memory having stored therein executable code, the processor implementing the method described in conjunction with fig. 4 or fig. 5, and so on, when executing the executable code.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of this specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments are intended to explain the technical idea, technical solutions and advantages of the present specification in further detail, and it should be understood that the above-mentioned embodiments are merely specific embodiments of the technical idea of the present specification, and are not intended to limit the scope of the technical idea of the present specification, and any modification, equivalent replacement, improvement, etc. made on the basis of the technical solutions of the embodiments of the present specification should be included in the scope of the technical idea of the present specification.

Claims (24)

1. A method for jointly updating a model is applied to a process of jointly updating the model by a server and a plurality of training members, wherein a local model of each training member is consistent with a global model structure held by the server, and the method comprises the following steps:
each training member updates M parameters to be synchronized corresponding to the model by using a local training sample, and each parameter to be synchronized corresponds to each parameter to be determined of the model one by one;
each training member selects a plurality of parameters to be synchronized from M parameters to be synchronized and uploads the parameters to the corresponding server, wherein the number of the parameters to be synchronized selected by a single training member i is Mi
The server side aggregates the values to be synchronized uploaded by each training member to obtain M aggregated values corresponding to M parameters to be synchronized respectively;
the server side feeds back each synchronization parameter set for each training member according to the M aggregation values, wherein the synchronization parameter set W corresponding to a single training member iiThere is a correspondence of niAn aggregate value of niN corresponding to each aggregate valueiM via which the parameter to be synchronized is uploadediDetermining the values to be synchronized and the M aggregation values together;
and each training member updates the undetermined parameters in the local model by using the corresponding synchronous parameter set, so that the local model is updated.
2. The method of claim 1, wherein the single parameter to be synchronized is one of a single pending parameter, a gradient of the single pending parameter, a difference between a current value of the single pending parameter and an initial value.
3. The method of claim 1, wherein the pending parameters of the local model of each training member are uniformly initialized by the server, and a horizontal split is formed between the local training samples of each training member.
4. The method of claim 1, wherein the number mi of parameters to be synchronized selected by a single training member i is determined according to the product of a predetermined local activation ratio and the number M of parameters to be synchronized.
5. The method of claim 1, wherein a single training member i determines the number mi of parameters to be synchronized to be uploaded through at least one of pruning and model sparsification.
6. The method of claim 1, wherein the server aggregates the values to be synchronized of a single parameter to be synchronized through at least one of weighted summation, averaging, taking the median, taking the maximum, and taking the minimum of the values to be synchronized uploaded by the training members for that single parameter to be synchronized.
7. The method of claim 1, wherein, for training member i, the mi values to be synchronized are described by a local sparse value set W~i,t and the M aggregation values describe an aggregation value set Ws,t of the global model, and the server determines the corresponding synchronization parameter set Wi in the following way:
sparsifying the aggregation value set Ws,t of the global model to obtain a global sparse value set W~s,t;
determining the corresponding synchronization parameter set Wi based on the local sparse value set W~i,t and the global sparse value set W~s,t.
8. The method of claim 7, wherein the aggregate value set of the global model is described by a matrix, and the local sparse value set and the global sparse value set are described by a local sparse matrix and a global sparse matrix, respectively.
9. The method of claim 8, wherein the determining the corresponding synchronization parameter set Wi based on the local sparse value set W~i,t and the global sparse value set W~s,t comprises:
separately detecting the non-zero element positions of the local sparse matrix W~i,t and the global sparse matrix W~s,t to obtain a local sparse position matrix Mi,t and a global sparse position matrix Ms,t;
determining a sparse position matrix Mi,t^s corresponding to the synchronization parameter set based on the union Mi,t^∪ of the non-zero element positions of the sparse position matrices Mi,t and Ms,t;
selecting, from the aggregation values, the aggregation values corresponding to the non-zero element positions indicated by the sparse position matrix Mi,t^s to form the synchronization parameter set Wi.
10. The method of claim 9, wherein the non-zero element positions in the sparse position matrix Mi,t^s are:
the non-zero element positions of the union Mi,t^∪; or
a predetermined number of non-zero element positions randomly selected from the union Mi,t^∪; or
a predetermined number of non-zero element positions selected from the union Mi,t^∪ according to predetermined selection probabilities, wherein a first selection probability used for the intersection positions of the non-zero element positions of the sparse position matrices Mi,t and Ms,t is greater than a second selection probability used for the other positions.
11. The method of claim 8, wherein the determining the corresponding synchronization parameter set Wi based on the local sparse value set W~i,t and the global sparse value set W~s,t comprises:
obtaining the global sparse matrix W~s,t of the M aggregation values corresponding to the global model;
comparing the local sparse matrix W~i,t and the global sparse matrix W~s,t to obtain a correlation coefficient βi,t;
determining a sparse position matrix Mi,t^s corresponding to the synchronization parameter set Wi based on the correlation coefficient βi,t;
determining the synchronization parameter set Wi from the sparse position matrix Mi,t^s and the respective aggregation values.
12. The method of claim 11, wherein the comparing the local sparse matrix W~i,t and the global sparse matrix W~s,t to obtain a correlation coefficient βi,t comprises:
detecting a correlation distance between the local sparse matrix W~i,t and the global sparse matrix W~s,t, the correlation distance being one of the Euler distance, cosine distance, Manhattan distance, Pearson similarity, Jaccard similarity and Hamming distance between the local sparse matrix W~i,t and the global sparse matrix W~s,t, or between the local sparse position matrix Mi,t and the global sparse position matrix Ms,t;
determining the correlation coefficient βi,t according to the normalization result of the correlation distance.
13. The method of claim 11, wherein the determining a sparse position matrix Mi,t^s corresponding to the synchronization parameter set Wi based on the correlation coefficient βi,t comprises:
selecting a first number N1 of non-zero element positions from the intersection position matrix Mi,t^∩ of the local sparse position matrix Mi,t and the global sparse position matrix Ms,t to obtain a first position matrix Mi,t^1;
selecting a second number N2 of non-zero element positions from the non-zero element complement matrix of Mi,t with respect to Mi,t^∩ to obtain a second position matrix Mi,t^2;
selecting a third number N3 of non-zero element positions from the non-zero element complement matrix of Ms,t with respect to Mi,t^∩ to obtain a third position matrix Mi,t^3;
determining the sparse position matrix Mi,t^s based on the first position matrix Mi,t^1, the second position matrix Mi,t^2 and the third position matrix Mi,t^3.
14. The method of claim 13, wherein the first number N1 is the number ki,t of non-zero elements in the intersection position matrix Mi,t^∩, the second number N2 is positively correlated with the correlation coefficient βi,t, and the third number N3 is negatively correlated with the correlation coefficient βi,t.
15. The method of claim 14, wherein the sum of the first number N1, the second number N2 and the third number N3 is a predetermined value k, and the second number N2 and the third number N3 are both positively correlated with the difference between the predetermined value k and ki,t.
16. The method of claim 13, wherein the first number N1 is 0, the second number N2 is positively correlated with the correlation coefficient βi,t, and the third number N3 is negatively correlated with the correlation coefficient βi,t.
17. The method of claim 13, wherein the first number N1 is 0, the second number N2 is consistent with the number of non-zero elements in the local sparse position matrix Mi,t, and the third number N3 is negatively correlated with the correlation coefficient βi,t.
18. The method of claim 1, wherein the training members each updating the undetermined parameters in the local model with the respective synchronization parameter set, thereby updating the local model, comprises:
a single training member i replacing, with the respective aggregation values in the corresponding synchronization parameter set Wi, the values to be synchronized of the corresponding local parameters to be synchronized;
updating the local model with the updated parameters to be synchronized.
19. A method for jointly updating a model, which is applied to a process of jointly updating the model by a server and a plurality of training members, wherein a local model of each training member is consistent with a global model structure held by the server, and the method is executed by the server and comprises the following steps:
receiving the values to be synchronized sent by each training member, wherein the number of values to be synchronized sent by a single training member i is mi, and the mi values to be synchronized are updated values of mi of the M parameters to be synchronized updated by using local training samples, each parameter to be synchronized corresponding one to one to an undetermined parameter of the model;
aggregating the values to be synchronized uploaded by each training member to obtain M aggregated values corresponding to M parameters to be synchronized respectively;
feeding back a synchronization parameter set to each training member according to the M aggregation values, so that each training member updates the undetermined parameters in its local model by using the corresponding synchronization parameter set, thereby updating the local model, wherein the synchronization parameter set Wi corresponding to a single training member i contains ni aggregation values, and the ni parameters to be synchronized corresponding to the ni aggregation values are determined jointly by the mi values to be synchronized uploaded by training member i and the M aggregation values.
20. A method for jointly updating a model is applied to a process of jointly updating the model by a server and a plurality of training members, wherein a local model of each training member is consistent with a global model structure held by the server, and the method is suitable for training a member i and comprises the following steps:
updating M parameters to be synchronized corresponding to the model by using a local training sample, wherein each parameter to be synchronized corresponds to each parameter to be determined of the model one by one;
selecting mi parameters to be synchronized from the M parameters to be synchronized, and uploading the corresponding mi values to be synchronized to the server, for the server to feed back a synchronization parameter set Wi, wherein the synchronization parameter set Wi contains ni aggregation values, and the ni parameters to be synchronized corresponding to the ni aggregation values are determined jointly by the mi uploaded values to be synchronized and the M aggregation values;
updating the undetermined parameters in the local model by using the synchronization parameter set Wi, thereby updating the local model.
21. An apparatus for jointly updating a model, applied to a process in which a server and a plurality of training members jointly update the model, wherein the local model of each training member is consistent with the global model structure held by the server, the apparatus being provided at the server and comprising:
a receiving unit, configured to receive the values to be synchronized sent by each training member, wherein the number of values to be synchronized sent by a single training member i is mi, and the mi values to be synchronized are updated values of mi of the M parameters to be synchronized updated by using local training samples, each parameter to be synchronized corresponding one to one to an undetermined parameter of the model;
an aggregation unit, configured to aggregate the values to be synchronized uploaded by the training members to obtain M aggregation values corresponding to the M parameters to be synchronized respectively;
a feedback unit, configured to feed back a synchronization parameter set to each training member according to the M aggregation values, so that each training member updates the undetermined parameters in its local model by using the corresponding synchronization parameter set, thereby updating the local model, wherein the synchronization parameter set Wi corresponding to a single training member i contains ni aggregation values, and the ni parameters to be synchronized corresponding to the ni aggregation values are determined jointly by the mi values to be synchronized uploaded by training member i and the M aggregation values.
22. An apparatus for jointly updating a model, applied to a process in which a server and a plurality of training members jointly update the model, wherein the local model of each training member is consistent with the global model structure held by the server, the apparatus being provided at a training member i and comprising:
a training unit, configured to update M parameters to be synchronized corresponding to the model by using local training samples, each parameter to be synchronized corresponding one to one to an undetermined parameter of the model;
an uploading unit, configured to select mi parameters to be synchronized from the M parameters to be synchronized and upload the corresponding mi values to be synchronized to the server, for the server to feed back a synchronization parameter set Wi, wherein the synchronization parameter set Wi contains ni aggregation values, and the ni parameters to be synchronized corresponding to the ni aggregation values are determined jointly by the mi uploaded values to be synchronized and the M aggregation values;
a synchronization unit, configured to update the undetermined parameters in the local model by using the synchronization parameter set Wi, thereby updating the local model.
23. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of claim 19 or 20.
24. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of claim 19 or 20.
CN202210380007.5A 2022-04-12 2022-04-12 Method and device for jointly updating model Pending CN114676838A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210380007.5A CN114676838A (en) 2022-04-12 2022-04-12 Method and device for jointly updating model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210380007.5A CN114676838A (en) 2022-04-12 2022-04-12 Method and device for jointly updating model

Publications (1)

Publication Number Publication Date
CN114676838A true CN114676838A (en) 2022-06-28

Family

ID=82078238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210380007.5A Pending CN114676838A (en) 2022-04-12 2022-04-12 Method and device for jointly updating model

Country Status (1)

Country Link
CN (1) CN114676838A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898767A (en) * 2020-08-06 2020-11-06 深圳前海微众银行股份有限公司 Data processing method, device, equipment and medium
CN112288100A (en) * 2020-12-29 2021-01-29 支付宝(杭州)信息技术有限公司 Method, system and device for updating model parameters based on federal learning
CN113469367A (en) * 2021-05-25 2021-10-01 华为技术有限公司 Method, device and system for federated learning
CN113221105A (en) * 2021-06-07 2021-08-06 南开大学 Robustness federated learning algorithm based on partial parameter aggregation
CN113360514A (en) * 2021-07-02 2021-09-07 支付宝(杭州)信息技术有限公司 Method, device and system for jointly updating model
CN113377797A (en) * 2021-07-02 2021-09-10 支付宝(杭州)信息技术有限公司 Method, device and system for jointly updating model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
IREM ERGÜN: "SPARSIFIED SECURE AGGREGATION FOR PRIVACY-PRESERVING FEDERATED LEARNING", 《ARXIV:2112.12872V1》, 23 December 2021 (2021-12-23), pages 1 - 28 *
毛耀如: "针对分布式联邦深度学习的攻击模型及隐私对策研究", 《中国优秀硕士学位论文全文数据库(信息科技辑)》, 30 April 2021 (2021-04-30), pages 138 - 114 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115186937A (en) * 2022-09-09 2022-10-14 闪捷信息科技有限公司 Prediction model training and data prediction method and device based on multi-party data cooperation
CN115909746A (en) * 2023-01-04 2023-04-04 中南大学 Traffic flow prediction method, system and medium based on federal learning
CN116935143A (en) * 2023-08-16 2023-10-24 中国人民解放军总医院 DFU medical image classification method and system based on personalized federal learning
CN116935143B (en) * 2023-08-16 2024-05-07 中国人民解放军总医院 DFU medical image classification method and system based on personalized federal learning

Similar Documents

Publication Publication Date Title
Zhu et al. Federated learning on non-IID data: A survey
US11170395B2 (en) Digital banking platform and architecture
CN111931950B (en) Method and system for updating model parameters based on federal learning
CN110084377B (en) Method and device for constructing decision tree
CN114676838A (en) Method and device for jointly updating model
CN113377797B (en) Method, device and system for jointly updating model
CN112085159B (en) User tag data prediction system, method and device and electronic equipment
US11238364B2 (en) Learning from distributed data
CN112799708B (en) Method and system for jointly updating business model
US11410644B2 (en) Generating training datasets for a supervised learning topic model from outputs of a discovery topic model
WO2023174036A1 (en) Federated learning model training method, electronic device and storage medium
CN112068866B (en) Method and device for updating business model
CN111460528A (en) Multi-party combined training method and system based on Adam optimization algorithm
CN113360514B (en) Method, device and system for jointly updating model
US11843587B2 (en) Systems and methods for tree-based model inference using multi-party computation
US20210342744A1 (en) Recommendation method and system and method and system for improving a machine learning system
WO2024114640A1 (en) User portrait-based user service system and method, and electronic device
JP7404504B2 (en) Interpretable tabular data learning using sequential sparse attention
CN115049011A (en) Method and device for determining contribution degree of training member model of federal learning
US11521601B2 (en) Detecting extraneous topic information using artificial intelligence models
CN115345298A (en) Method and device for jointly training models
US20220358366A1 (en) Generation and implementation of dedicated feature-based techniques to optimize inference performance in neural networks
CN112307334B (en) Information recommendation method, information recommendation device, storage medium and electronic equipment
CN114386583A (en) Longitudinal federal neural network model learning method for protecting label information
CN113887740A (en) Method, device and system for jointly updating model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination