WO2023036184A1 - Methods and systems for quantifying client contribution in federated learning - Google Patents

Methods and systems for quantifying client contribution in federated learning Download PDF

Info

Publication number
WO2023036184A1
Authority
WO
WIPO (PCT)
Prior art keywords
clients
training
utility
client
matrix
Prior art date
Application number
PCT/CN2022/117577
Other languages
English (en)
French (fr)
Inventor
Zhenan FAN
Huang FANG
Zirui ZHOU
Yong Zhang
Original Assignee
Huawei Cloud Computing Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Cloud Computing Technologies Co., Ltd. filed Critical Huawei Cloud Computing Technologies Co., Ltd.
Priority to CN202280060951.0A priority Critical patent/CN117999562A/zh
Publication of WO2023036184A1 publication Critical patent/WO2023036184A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/098Distributed learning, e.g. federated learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]

Definitions

  • the present disclosure relates to methods and systems for federated learning, including methods and systems for quantifying client contributions in federated learning.
  • Federated learning is a machine learning technique in which multiple computing systems (also referred to as clients) with different data owners participate in training a machine learning algorithm to learn a global model (maintained at a central server) without sharing their data with the central server.
  • the local data of each client may be private or proprietary in nature (e.g., photos, health data, social media data, banking data, retail data, etc. ) .
  • Federated learning thus helps with preserving the privacy of such local data by enabling the global model to be trained (i.e., enabling the learnable parameters (e.g., weights and biases) of the global model to be set to values that result in satisfactory performance of the global model at inference) without requiring the clients to share their local data with the central server or with other clients.
  • a client performs localized training of a local model using a machine learning algorithm and its respective set of local data (also referred to as a local dataset) to learn values of the learnable parameters of the local model.
  • the client transmits information to the central server about its learned values (e.g., in the form of gradients) to be used to adjust the values of the learnable parameters of the global model.
  • the central server aggregates the information received from multiple clients and uses the aggregated information to adjust the values of the learnable parameters of the global model.
  • the present disclosure describes methods and systems that enable a more fair and efficient quantification of the contribution of individual clients to a federated learning system, while ensuring that data privacy of individual clients is preserved.
  • the contribution scores are based on the Shapley value and are computed using a utility matrix storing utility function values.
  • the present disclosure describes the use of a utility function, defined for use with federated learning, to compute the contribution scores for each client in the federated learning system in a manner that avoids unfairly penalizing clients due to random selection. Examples of the present disclosure thus provide the technical advantage that clients’ contribution to the federated learning system may be quantified, which may help to promote more efficient use of resources in the federated learning system.
  • the present disclosure describes a computing system including a processing unit configured to execute instructions to cause the computing system to: conduct multiple rounds of training with a plurality of clients, wherein the processing unit is further configured to cause the computing system to conduct each given round of training by: receiving updates from selected clients of the plurality of clients, the selected clients being selected for the given round of training; computing one or more utility function values for the given round of training using the received updates; and storing the computed one or more utility function values in a utility matrix.
  • the utility matrix, after the multiple rounds of training, is a sparse matrix with missing entries.
  • the processing unit is also configured to execute instructions to cause the computing system to: complete the missing entries of the utility matrix, after the multiple rounds of training, by computing a completed utility matrix; and compute a contribution score for each of the plurality of clients using the completed utility matrix.
  • the processing unit may be further configured to cause the computing system to conduct each given round of training by: transmitting global parameters of a global model to the plurality of clients; computing an aggregated update using the received updates from the selected clients; and updating the global parameters using the aggregated update.
  • computing the one or more utility function values in each given round of training may include: identifying one or more subsets of clients included in the selected clients; for each given identified subset of clients: compute a first test loss using the global model prior to updating the global parameters and a second test loss using the global model after updating the global parameters; and compute a difference between the first test loss and the second test loss, the computed difference being the utility function value for the given identified subset of clients in the given round of training.
  • the completed utility matrix may contain utility function values for all possible subsets of clients in the plurality of clients for all rounds of training.
  • the processing unit may be further configured to cause the computing system to: identify, among all possible subsets of clients in the plurality of clients, a subsample of subsets; where, during the multiple rounds of training, the one or more utility function values may be computed for only subsets of clients belonging to the identified subsample of subsets.
  • the contribution score for a given client may be computed according to:

$$s_i = \frac{1}{T}\sum_{t=1}^{T}\frac{1}{N}\sum_{S \subseteq \{1, \dots, N\} \setminus \{i\}} \frac{U_t^{\,S \cup \{i\}} - U_t^{\,S}}{\binom{N-1}{|S|}}$$

  • where:
  • i denotes an index of the given client
  • N denotes a total number of the plurality of clients
  • T denotes a total number of the multiple rounds of training
  • S denotes a subset of clients of the plurality of clients
  • U_t^S denotes the (t, S) entry of the utility matrix.
  • the completed utility matrix may be computed as a pair of decomposition matrices that together form the completed utility matrix.
  • the decomposition matrices may be substituted for the completed utility matrix in computation of the contribution score for each of the plurality of clients.
  • the processing unit may be further configured to cause the computing system to: exclude any client having a low contribution score from one or more future rounds of training.
  • the processing unit may be further configured to cause the computing system to: provide each client with resources proportionate to the respective contribution score.
  • the present disclosure describes a method including: conducting multiple rounds of training with a plurality of clients, each given round of training including: receiving updates from selected clients of the plurality of clients, the selected clients being selected for the given round of training; computing one or more utility function values for the given round of training using the received updates; and storing the computed one or more utility function values in a utility matrix.
  • the utility matrix, after the multiple rounds of training, is a sparse matrix with missing entries.
  • the method may also include: completing the missing entries of the utility matrix, after the multiple rounds of training, by computing a completed utility matrix; and computing a contribution score for each of the plurality of clients using the completed utility matrix.
  • each given round of training may include: transmitting global parameters of a global model to the plurality of clients; computing an aggregated update using the received updates from the selected clients; and updating the global parameters using the aggregated update.
  • computing the one or more utility function values in each given round of training may include: identifying one or more subsets of clients included in the selected clients; for each given identified subset of clients: computing a first test loss using the global model prior to updating the global parameters and a second test loss using the global model after updating the global parameters; and computing a difference between the first test loss and the second test loss, the computed difference being the utility function value for the given identified subset of clients in the given round of training.
  • the completed utility matrix may contain utility function values for all possible subsets of clients in the plurality of clients for all rounds of training.
  • the method may include: identifying, among all possible subsets of clients in the plurality of clients, a subsample of subsets; where, during the multiple rounds of training, the one or more utility function values are computed for only subsets of clients belonging to the identified subsample of subsets.
  • the contribution score for a given client may be computed according to:

$$s_i = \frac{1}{T}\sum_{t=1}^{T}\frac{1}{N}\sum_{S \subseteq \{1, \dots, N\} \setminus \{i\}} \frac{U_t^{\,S \cup \{i\}} - U_t^{\,S}}{\binom{N-1}{|S|}}$$

  • where:
  • i denotes an index of the given client
  • N denotes a total number of the plurality of clients
  • T denotes a total number of the multiple rounds of training
  • S denotes a subset of clients of the plurality of clients
  • U_t^S denotes the (t, S) entry of the utility matrix.
  • the completed utility matrix may be computed as a pair of decomposition matrices that together form the completed utility matrix.
  • the decomposition matrices may be substituted for the completed utility matrix in computation of the contribution score for each of the plurality of clients.
  • the method may include: excluding any client having a low contribution score from one or more future rounds of training.
  • the method may include: providing each client with resources proportionate to the respective contribution score.
  • the present disclosure describes a non-transitory computer readable medium having machine-executable instructions stored thereon, where the instructions, when executed by a processing unit of an apparatus, cause the apparatus to perform the method of any one of the preceding example aspects of the method.
  • FIG. 1 is a block diagram of a simplified example system that may be used to implement federated learning, in accordance with examples of the present disclosure
  • FIG. 2A is a block diagram of an example server that may be used to implement examples described herein;
  • FIG. 2B is a block diagram of an example client that may be used as part of examples described herein;
  • FIG. 3 is a block diagram illustrating an example implementation of a round of training in the federated learning system of FIG. 1;
  • FIG. 4 is a flowchart illustrating an example method for computing and assigning contribution scores to clients in a federated learning system, in accordance with examples of the present disclosure
  • FIG. 5 illustrates an example output of ranked contribution scores, in accordance with examples of the present disclosure
  • FIG. 6 shows example pseudocode representing operations of the federated system for quantifying client contributions, in accordance with examples of the present disclosure
  • FIG. 7 is a block diagram illustrating an example implementation of a vertical federated learning system.
  • FIG. 8 is a flowchart illustrating an example method for computing and assigning contribution scores to clients in a vertical federated learning system, in accordance with examples of the present disclosure.
  • The present disclosure describes methods and systems for training a model related to a task (hereinafter referred to as the “model” ) using federated learning. Examples are described to enable evaluation of the contribution of clients participating in federated learning. In particular, examples of the present disclosure enable the contribution of all participating clients to be fairly evaluated and quantified. Such evaluation may enable a central server to quantify the value of each participating client, for example to help select clients to participate in a round of training and/or to exclude clients from a round of training. Additionally, on the basis of this evaluation, clients may be assigned credits or other forms of incentive to promote participation in the federated learning system.
  • FIG. 1 is first discussed.
  • FIG. 1 illustrates an example system 100 that may be used to implement examples of federated learning, as disclosed herein.
  • the system 100 has been simplified in this example for ease of understanding; generally, there may be more entities and components in the system 100 than those shown in FIG. 1.
  • the system 100 includes a plurality of clients 102 (e.g., client-1 102 to client-N 102, where client-i 102 generically represents the ith client 102) , which may also be referred to as computing systems, client devices, data owners, users, user devices, terminals or nodes, for example. That is, the term “client” is not intended to limit implementation in a particular type of computing system or in a particular context.
  • Each client 102 communicates with a central server 110, which may also be referred to as a central node.
  • a client 102 may also communicate directly with another client 102.
  • Communications between a client 102 and the central server 110 may be via any suitable network 104 (e.g., the Internet, a P2P network, a wide area network (WAN) and/or a local area network (LAN) ) , and may include wireless or wired communications.
  • Different clients 102 may use different networks to communicate with the central server 110, however only a single network 104 is illustrated for simplicity.
  • the central server 110 may be implemented using one or multiple servers.
  • the central server 110 may be implemented as a server, a server cluster, a distributed computing system, a virtual machine, or a container (also referred to as a docker container or a docker) running on an infrastructure of a datacenter, or infrastructure (e.g., virtual machines) provided as a service by a cloud service provider, among other possibilities.
  • the central server 110 may be implemented using any suitable combination of hardware and software, and may be embodied as a single physical apparatus (e.g., a server) or as a plurality of physical apparatuses (e.g., multiple machines sharing pooled resources such as in the case of a cloud service provider) .
  • the central server 110 may also generally be referred to as a computing system or processing system.
  • the central server 110 may implement techniques and methods to learn values of the learnable parameters of the global model, using federated learning techniques as disclosed herein.
  • Each client 102 may independently be an end user device, a server, a collection of servers, an edge device, a network device, a private network, or other singular or plural entity that stores a local dataset (which may be considered private data) and a local model.
  • the client 102 may be or may include such devices as a client device/terminal, user equipment/device (UE) , wireless transmit/receive unit (WTRU) , mobile station, fixed or mobile subscriber unit, cellular telephone, station (STA) , personal digital assistant (PDA) , smartphone, laptop, computer, tablet, wireless sensor, wearable device, smart device, machine type communications device, smart (or connected) vehicles, or consumer electronics device, among other possibilities.
  • the client 102 may be or may include a base station (BS) (e.g., eNodeB or gNodeB) , router, access point (AP) , personal basic service set (PBSS) coordinate point (PCP) , among other possibilities.
  • the client 102 may be or may include a private network of an institute (e.g., a hospital or financial institute) , a retailer or retail platform, a company’s intranet, etc.
  • Each client 102 stores (or has access to) a respective local dataset (e.g., stored as data in the memory of the client 102, or accessible from a private database) .
  • the local dataset of each client 102 may be unique and distinctive from the local dataset of each other client 102.
  • the local dataset at a given client 102 may include private or proprietary data of that given client 102, which should not be accessible or identifiable by any other client 102 or the central server 110.
  • the local dataset may include local data that is collected or generated in the course of real-life use by user (s) of the client 102 (e.g., captured images/videos, captured sensor data, captured tracking data, etc. ) .
  • the local data included in the local dataset may include data that is collected from end user devices that are associated with or served by the network device.
  • a client 102 that is a BS may collect data from a plurality of user devices (e.g., tracking data, network usage data, traffic data, etc. ) and this may be stored in the local dataset on the BS.
  • FIG. 2A is a block diagram illustrating a simplified computing system that may be an example implementation of the central server 110.
  • Other example computing systems suitable for implementing embodiments described in the present disclosure may be used, which may include components different from those discussed below.
  • Although FIG. 2A shows a single instance of each component, there may be multiple instances of each component in the central server 110.
  • the central server 110 may include one or more processing devices 114, such as a processor, a microprocessor, a digital signal processor, an application-specific integrated circuit (ASIC) , a field-programmable gate array (FPGA) , a dedicated logic circuitry, a dedicated artificial intelligence processor unit, a tensor processing unit, a neural processing unit, a hardware accelerator, or combinations thereof.
  • Each processing device 114 may include one or more processing cores.
  • the central server 110 may include one or more network interfaces 122 for wired or wireless communication (e.g., with the network 104, the clients 102 or other entities of the system 100) .
  • the network interface (s) 122 may include wired links (e.g., Ethernet cable) and/or wireless links (e.g., one or more antennas) for intra-network and/or inter-network communications.
  • the central server 110 may also include one or more storage units 124, which may include a mass storage unit such as a solid state drive, a hard disk drive, a magnetic disk drive and/or an optical disk drive.
  • the central server 110 may include one or more memories 128, which may include a volatile or non-volatile memory (e.g., a flash memory, a random access memory (RAM) , and/or a read-only memory (ROM) ) .
  • the non-transitory memory (ies) 128 may store processor executable instructions 129 for execution by the processing device (s) 114, such as to carry out examples described in the present disclosure.
  • the memory (ies) 128 may include other software stored as processor executable instructions 129, such as for implementing an operating system and other applications/functions.
  • the memory (ies) 128 may include processor executable instructions 129 for execution by the processing device 114 to implement a federated learning module 200 for performing methods related to federated learning, as discussed further below.
  • the central server 110 may additionally or alternatively execute instructions from an external memory (e.g., an external drive in wired or wireless communication with the server) or may be provided processor executable instructions by a transitory or non-transitory computer-readable medium.
  • Examples of non-transitory computer readable media include a RAM, a ROM, an erasable programmable ROM (EPROM) , an electrically erasable programmable ROM (EEPROM) , a flash memory, a CD-ROM, or other portable memory storage.
  • the memory (ies) 128 may also store a global model 126 trained to perform a task.
  • the global model 126 includes a plurality of learnable parameters 127 (also referred to as global parameters 127) , such as learned weights and biases of a neural network, whose values may be adjusted during the training process until the global model 126 converges on a set of global parameter values representing a solution to the task which the global model 126 is being trained to perform.
  • the global model 126 may also include other data, such as hyperparameters, which may be defined by an architect or designer of the global model 126 (or by an automatic process) prior to training, such as at the time the global model 126 is designed or initialized.
  • hyperparameters are parameters of a model that are used to control the learning process; hyperparameters are defined in contrast to learnable parameters, such as weights and biases of a neural network, whose values are adjusted during training.
  • FIG. 2B is a block diagram illustrating a simplified computing system that may be an example implementation of a client 102.
  • Other example computing systems suitable for implementing embodiments described in the present disclosure may be used, which may include components different from those discussed below.
  • Although FIG. 2B shows a single instance of each component, there may be multiple instances of each component in the client 102.
  • the client 102 may include one or more processing devices 130, one or more network interfaces 132, one or more storage units 134, and one or more non-transitory memories 138, which may each be implemented using any suitable technology such as those described in the context of the central server 110 above.
  • the memory (ies) 138 of the client 102 may store processor executable instructions 139 for execution by the processing device (s) 130, such as to carry out examples described in the present disclosure.
  • the memory (ies) 138 may include other software stored as processor executable instructions 139, such as for implementing an operating system and other applications/functions.
  • the memory (ies) 138 may include processor executable instructions 139 for execution by the processing device 130 to implement client-side operations of a federated learning system in conjunction with the federated learning module 200 executed by the central server 110, as discussed further below.
  • the memory (ies) 138 may also store a local model 136 trained to perform the same task as the global model 126 of the central server 110.
  • a local dataset 140 (comprising local data, which may be private to the client 102) is also stored in the memory (ies) 138.
  • the local dataset 140 may be stored in an external memory that is accessible by the client 102.
  • the local model 136 includes a plurality of learnable parameters 137 (also referred to as local parameters 137) , such as learned weights and biases of a neural network, whose values may be adjusted during a local training process based on the local dataset 140 until the local model 136 converges on a set of local learned parameter values representing a solution to the task which the local model 136 is being trained to perform.
  • the local model 136 may also include other data, such as hyperparameters matching those of the global model 126 of the central server 110, such that the local model 136 has the same architecture and operational hyperparameters as the global model 126, and differs from the global model 126 only in the values of its local parameters 137 (i.e., the values of the local learnable parameters stored in the memory 138 after local training that are stored as the learned values of the local parameters 137) .
  • Federated learning is a machine learning technique that enables the clients 102 to participate in learning a model related to a task (e.g., a global model or a collaborative model) without having to share their local dataset with the central server 110 or with other clients 102.
  • a global model 126 is stored at the central server 110 and the values of the global parameters 127 of the global model 126 are learned via collaboration with the clients 102.
  • Each client 102 may use the global model 126 as a basis for their own local model 136 or may use the collaboratively learned global model 126 as-is (in which case the global model 126 is adopted as the local model 136 for that client 102) .
  • federated learning may help to ensure privacy of the local dataset 140 (which may contain privacy-sensitive information or proprietary information) while providing clients 102 with the benefits of training using large amounts of data.
  • Federated learning may be characterized by certain features that differentiate federated learning from distributed optimization methods.
  • One differentiating feature is that the number of clients 102 (or nodes) that participate in federated learning is typically much higher than the number of clients (or nodes) that participate in distributed optimization (e.g., hundreds of clients 102 in federated learning compared to tens of clients in distributed optimization) .
  • Other differentiating features include a larger number of “straggler” clients 102 (i.e., clients 102 that are significantly slower to communicate with the central server 110 as compared to other clients 102) and a larger variation in the size of the local dataset 140 at each client 102 (e.g., differing by several orders of magnitude) compared to distributed optimization.
  • the local datasets 140 are typically non-IID (IID meaning “independent and identically-distributed” ) , meaning the local data of different clients 102 are unique and distinct from each other, and it may not be possible to infer the characteristics or a distribution of the local dataset 140 at any one client 102 based on the local dataset 140 of any other client 102.
  • the non-IID nature of the local dataset 140 means that many (or most) of the methods that have been developed for distributed optimization are ineffective in federated learning.
  • FIG. 3 illustrates an example of how a round of collaborative training may be carried out in the federated learning system 100.
  • the network 104 has been omitted from FIG. 3 and details are shown only for one client 102. It should be understood that each client 102 in the system 100 may maintain or have access to a respective local dataset 140 and may implement a respective local model 136 having local parameters 137.
  • the global model 126 may be any machine learning model implemented using any suitable neural network, such as a multilayer perceptron (MLP) , a convolutional neural network (CNN) , etc.
  • the global model 126 has global parameters 127 (e.g., the values of the weights in the neural network) denoted w.
  • the local datasets 140 stored or accessible by the N clients 102 may be denoted as local datasets D_1, D_2, ..., D_i, ..., D_N.
  • the local model 136 may be machine learning model having the same architecture as the global model 126, but implemented using different local parameters 137 (e.g., different values of the weights in the neural network) .
  • the local parameters 137 stored by each client 102 may be denoted as w_i.
  • the goal of the central server 110 is to solve the following distributed optimization problem:

$$\min_{w} \; F(w) := \frac{1}{N}\sum_{i=1}^{N} F_i(w)$$
  • where F (w) denotes the loss function at the central server 110 and F_i (w) denotes the loss at the ith client 102.
  • the central server 110 trains the global model 126 over T rounds of training (e.g., until a termination condition is met, such as convergence of the global parameters 127 or a maximum number of rounds have been performed) .
  • the central server 110 broadcasts the latest values of the global parameters 127 (denoted as w^t, where the superscript t denotes the tth round of training) to all the clients 102 (also referred to as the client computing systems of the data owners) .
  • the subscript i is used to indicate any arbitrary client 102 and i ∈ {1, 2, ..., N} unless otherwise indicated.
  • Each client 102 then performs local training of the respective local model 136, using the respective local dataset 140, and updates its respective local parameters 137. Mathematically, this may be expressed as:

$$w_i^{t} = w^{t} - \eta^{t}\,\nabla F_i\!\left(w^{t}\right)$$

  • where η^t is a hyperparameter (e.g., the learning rate) and w_i^t denotes the updated local parameters 137.
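  • For illustration only, a minimal Python sketch of this client-side update might look as follows, assuming a single gradient step per round; the `local_gradient` helper is hypothetical and stands in for whatever local training procedure a client 102 actually uses on its local dataset 140:

```python
# Minimal sketch of the client-side local update, assuming a single gradient
# step; `local_gradient` is a hypothetical callable returning the gradient of
# the local loss F_i evaluated at the broadcast global parameters.
import numpy as np

def local_update(w_global: np.ndarray, local_dataset, learning_rate: float,
                 local_gradient) -> np.ndarray:
    grad = local_gradient(w_global, local_dataset)   # gradient of F_i at w^t
    return w_global - learning_rate * grad           # updated local parameters w_i^t
```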
  • the central server 110 may select a subset of clients 102, denoted I_t, to participate in training the global model 126 and may receive updates from the selected subset I_t.
  • the received updates may be in the form of the updated local parameters 137. In other examples, the received updates may be in the form of gradients.
  • the clients 102 in the selected subset I_t may differ from round to round, and the subset I_t may be selected by the central server 110 using any method, for example using random selection with uniform probability.
  • the central server 110 may communicate the global parameters 127 only to the selected subset I_t and may receive updates only from the selected subset I_t.
  • the central server 110 may select all N clients 102 to participate in a round of training (e.g., if N is a relatively small number, such as in the range of 5 to 10) .
  • the central server 110 After receiving updates from the selected clients 102, the central server 110 aggregates the received updates to update the global parameters 127.
  • the central server 110 may use any suitable federated learning algorithm to aggregate the updates received from the clients 102 and update the global parameters 127.
  • An algorithm that may be used is commonly referred to as “FederatedAveraging” or FedAvg (e.g., as described by McMahan et al. “Communication-efficient learning of deep networks from decentralized data” AISTATS, 2017) , although it should be understood that the present disclosure is not limited to the FedAvg approach.
  • the central server 110 may aggregate the received updates using averaging:

$$w^{t+1} = \frac{1}{|I_t|}\sum_{i \in I_t} w_i^{t}$$
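  • A minimal Python sketch of this server-side aggregation step, assuming the received updates are the locally updated parameters w_i^t and that a simple (unweighted) average is used:

```python
# Sketch of the server-side aggregation, assuming unweighted averaging of the
# locally updated parameters received from the selected clients I_t.
import numpy as np

def aggregate_updates(client_updates: list[np.ndarray]) -> np.ndarray:
    # e.g. w_next = aggregate_updates([w_1, w_3, w_4]) for I_t = {1, 3, 4}
    return np.mean(np.stack(client_updates), axis=0)  # new global parameters w^{t+1}
```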
  • the updated global parameters 127 may be transmitted to the clients 102 at the start of the next round of training.
  • Multiple rounds of training may take place (with possibly different clients 102 participating in each round of training) until a termination condition is met (e.g., a maximum number of rounds has been reached, or the global parameters have converged) .
  • the values of the global parameters 127 after the last round of training (i.e., when the termination condition has been met) are the trained values of the global parameters 127.
  • Each client 102 may use the trained global parameters 127 to update their own respective local parameters 137 and use the now-trained local model 136 to perform inference.
  • the central server 110 may enable clients 102 to use their local datasets 140 to collaboratively train a global model 126 without having to explicitly share the data or data characteristics of their local datasets 140.
  • data privacy of all clients 102 can be preserved.
  • a challenge in federated learning is how to ensure fairness among all clients 102 participating in collaborative training and how to ensure quality of local datasets 140.
  • a malicious actor with no local data or spurious local data may pose as a client in order to gain the benefit of collaborative training with other clients 102 (who have good quality local datasets 140) , without any meaningful contribution from the malicious actor.
  • Such a malicious actor may use up resources (e.g., communication bandwidth) of the federated learning system 100 without contributing any benefit to the system 100 as a whole.
  • Because the central server 110 does not explicitly access any local dataset 140 and should not be able to derive the contents of any local dataset 140 from the updates shared by clients 102 (in order to preserve data privacy) , conventionally the central server 110 is not able to evaluate individual clients 102 based on their individual contribution to the collaborative learning. Thus, conventionally, the central server 110 may continue to select the malicious actor to participate in rounds of training and the malicious actor may continue to be a drain on the resources of the system 100.
  • a sophisticated client 102 with a rich and large local dataset may greatly contribute to training the global model 126.
  • the central server 110 is not able to identify the sophisticated client 102 as strongly contributing to the collaborative training and may select the sophisticated client 102 to participate in a round of training with equal probability as other clients 102 (or even the malicious actor) .
  • Such unfairness among contributions by different clients 102 may result in inefficient use of resources in the federated learning system 100, and may result in wasted rounds of training. Such unfairness may also result in clients 102 being discouraged from participation, which is also undesirable.
  • the present disclosure describes methods and systems that enable the central server 110 to fairly evaluate the quality of local data at different clients 102 based on their respective contribution to the federated learning process.
  • the central server 110 is able to compute a respective score for each client 102, which may be used to identify malicious actors or other clients 102 that have little or no contribution to the collaborative training.
  • the central server 110 may, based on the computed score, determine that low-scoring clients 102 should no longer participate in subsequent rounds of training, which may help to improve the efficiency of the overall federated learning system 100.
  • the computed score may be used by the central server 110 to identify high-scoring clients 102 that have positive contributions to the collaborative training, in order to select such clients 102 for subsequent rounds of training, for training future models and/or to reward such clients 102 (e.g., provide some monetary reward in order to encourage continued future participation from such clients 102) .
  • Shapley value originates from cooperative game theory where it can be used to distribute credit to each participant fairly.
  • the Shapley value of a given participant within a group of participants is defined as the average marginal contribution of the given participant to all the possible subsets of the remaining participants.
  • the Shapley value may be thought of as a way to measure the contribution of the given participant by comparing the outcome achieved by all possible subsets of participants when the given participant is absent with the outcome achieved by all possible subsets when the given participant is present.
  • Shapley value is a measure that satisfies requirements of fairness, such as balance, symmetry, zero element and additivity.
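  • As a small illustration of this definition (a generic cooperative-game example, not specific to federated learning), the Shapley value of each participant can be computed directly from a value function over subsets:

```python
# Sketch: classic Shapley value of each participant, i.e. the average marginal
# contribution of the participant over all subsets of the remaining participants.
from itertools import combinations
from math import comb

def shapley_values(participants: list, value) -> dict:
    n = len(participants)
    scores = {}
    for p in participants:
        others = [q for q in participants if q != p]
        total = 0.0
        for size in range(n):
            for S in combinations(others, size):
                marginal = value(set(S) | {p}) - value(set(S))  # with p minus without p
                total += marginal / comb(n - 1, size)
        scores[p] = total / n
    return scores

# Example game where the value of a coalition is simply its size:
print(shapley_values(["a", "b", "c"], value=lambda S: len(S)))  # each gets 1.0
```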
  • Although the Shapley value has many desired properties, computing the Shapley value in the context of federated learning requires exhaustively retraining and evaluating the global model with every possible subset of clients 102.
  • In practical federated learning systems, the number of clients 102 is very large (e.g., several hundred clients) , such that the amount of resources (e.g., communication costs, time requirements, etc. ) required to retrain and evaluate the global model with every possible subset of clients 102, in order to compute the Shapley value, becomes excessive and impractical.
  • A measure referred to as the federated Shapley value (FedSV) has been proposed to reduce this cost. The definition of FedSV is based on computation of the Shapley values for the clients 102 in each round of training and then summing the values over all the rounds at the end of training.
  • FedSV does not require retraining of the global model 126 for all possible subsets of clients 102.
  • a subset of clients 102 is selected to provide updates in each round of training (e.g., in order to reduce communication cost) .
  • the unselected clients 102 in a given round of training are assigned a score of zero for that round, which may cause unfairness.
  • a concept of fairness is that if two clients 102 have the same (or similar) local dataset 140 (and thus contribute equally to learning the global model) , they should have the same score.
  • two clients 102 with the same (or similar) local dataset 140 may receive different scores from the central server 110 due to the random selection of clients 102 in each round of training.
  • one client 102 may be randomly selected to participate in more rounds of training, thus receiving a higher total score, while the other client 102 may be randomly selected to participate in fewer rounds of training, thus receiving a lower total score, despite both clients 102 having the same (or similar) local dataset 140.
  • the present disclosure provides a solution to the problem of how to fairly evaluate the contribution of each client 102 in a federated learning system 100, while ensuring data privacy and without potential unfairness due to random selection of clients 102 in each round of training.
  • two clients 102 having the same or similar local datasets 140 will be assigned the same score by the central server 110, regardless of how the central server 110 selects clients 102 for each round of training.
  • the present disclosure describes how a fair utility matrix may be constructed, without requiring participation of all clients 102 in every round of training and without penalizing clients 102 that are not selected for a round of training.
  • the utility matrix is an approximately low-rank matrix that can be completed using existing matrix completion methods, based on the assumption that the contribution of each client 102 does not change significantly from round to round (this assumption may generally hold true when the local models 136 are Lipschitz continuous and smooth, which is typical for most existing machine learning based models) .
  • utility information can be obtained for all clients 102, to enable computation of a contribution score based on the Shapley value (also referred to as a completed federated Shapley value (ComFedSV) ) for each individual client 102.
  • the contribution score satisfies the characteristics of the Shapley value, including being fair for all clients 102, regardless of whether or not a client 102 is selected for a round of training.
  • a Monte-Carlo type subsampling technique may be used to reduce both space and time complexity when the size of the utility matrix is very large (e.g., when there are hundreds of clients 102) .
  • the present disclosure describes a utility matrix, which is a matrix containing utility information representing the utility of all possible subsets of clients 102 in each and every round of training.
  • the utility matrix may be denoted U, where U ∈ ℝ^(T × 2^N) (i.e., the utility matrix contains real values and has dimension T × 2^N ) and where T denotes the number of rounds of training and N denotes the total number of clients 102.
  • the (t, S) entry of U (i.e., the entry in the tth row and Sth column of U) is equal to the utility computed for a subset S of clients 102 in the tth round of training.
  • Utility is quantified by a utility function, which is defined herein as a quantification of the progress made in training the global model 126 between the start and the end of a round of training. More specifically, the utility function is defined as the change in the test loss computed at the central server 110 using updates from the selected subset S of clients 102 in the tth round of training. The test loss is the loss function computed by the central server 110 when the global model 126 (using the values of the global parameters 127 in the tth round of training) is applied to a test dataset.
  • the utility function may be defined as:

$$U_t(S) := L\!\left(w^{t}\right) - L\!\left(w_{S}^{t}\right)$$

  • where L ( ·) denotes the test loss and w_S^t denotes the global parameters 127 after applying the aggregated update from the subset S in the tth round of training.
  • the utility matrix U is the matrix that stores all values of U_t (S) for all t and all S (i.e., stores the utility function values computed for each and every round of training and for each and every possible subset of clients 102) .
  • the utility matrix U can be expected to be very large (e.g., a typical federated learning scenario involves hundreds of rounds of training with hundreds of clients 102) . Further, because not all possible subsets of clients 102 are selected for every round of training, the central server 110 is not able to compute the utility function for all possible subsets for every round of training. Accordingly, the utility matrix U that can be constructed during rounds of training is initially sparse (i.e., missing some entries) . The following discussion further describes how the utility matrix U may be constructed by the central server 110 and used to assign contribution scores to each client 102.
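  • By way of a hedged sketch, the per-round utility computation and the sparse utility matrix might be represented as follows in Python, assuming the received updates are locally updated parameters, the subset update is their simple average, and `test_loss` is a hypothetical helper that evaluates the global model on the test dataset; a dictionary keyed by (round, subset) stands in for the sparse T × 2^N matrix:

```python
# Sketch: U_t(S) is the decrease in test loss when only subset S's updates are
# applied; missing entries of the utility matrix are simply absent from the dict.
import numpy as np

sparse_utility: dict[tuple[int, frozenset], float] = {}

def utility_value(w_global: np.ndarray, updates_by_client: dict,
                  subset: frozenset, test_loss) -> float:
    w_subset = np.mean([updates_by_client[i] for i in subset], axis=0)  # apply only S's updates
    return test_loss(w_global) - test_loss(w_subset)  # first test loss minus second test loss

def record_utility(t: int, subset: frozenset, value: float) -> None:
    sparse_utility[(t, subset)] = value  # the (t, S) entry of the utility matrix
```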
  • FIG. 4 is a flowchart showing an example method 400, which may be performed by the central server 110, to assign a contribution score to each client 102 in the federated learning system 100.
  • the method 400 may be executed by a processing device 114 of the central server 110 executing instructions stored in the memory 128, for example.
  • Initialization may include initializing the utility matrix with all zero entries and may also include initializing the global parameters 127 (e.g., initialized with random values) .
  • This initialization may, in some examples, be omitted from the method 400 (e.g., initialization may be performed ahead of time; the utility matrix may already be populated with utility information from a prior instance of the method 400; or the global parameters 127 may already have pre-trained values) .
  • each round of federated training involves the central server 110 communicating the latest global parameters 127 to all clients 102 and receiving updates from selected clients 102 (or in some cases receiving updates from all clients 102) .
  • Step 404 may be performed for all rounds of training (i.e., until the termination condition is satisfied and training terminates) before the method 400 proceeds to step 412.
  • step 404 may be performed for only some rounds of training (i.e., without the termination condition being met) , for example for only 100 rounds of training instead of several hundred rounds of training.
  • the method 400 may proceed to steps 412 and 414 to compute interim contribution scores for the clients 102 before the method 400 returns to step 404 to continue more rounds of training.
  • the interim contribution scores may be expected to be a relatively accurate reflection of the contribution of individual clients 102 to the federated learning system 100 so long as the utility function has been computed for at least a minimum subsample of client subsets (e.g., according to the Monte Carlo method, discussed further below) .
  • the interim contribution scores may be useful for the central server 110 to identify any low-scoring client (s) 102 that should be excluded from further participation in the rounds of training, as discussed further below.
  • the sparse utility matrix is constructed over the round (s) of training by performing steps 408 and 410.
  • Optional step 406 may be performed to identify a subsample of client subsets for which the utility function is computed (and used to construct the sparse utility matrix) .
  • Optional step 406 may be performed in order to help reduce the amount of computation required to construct the sparse utility matrix. Based on the Monte-Carlo method, it may be sufficient to compute the utility function for only a subsample of possible subsets. It can be proven that, using the Monte-Carlo method, a subsample of size M (where M is an integer on the order of N log (N) , N being the number of clients 102) is sufficient to achieve a good approximation of the distribution of the utility functions for all possible subsets of clients 102.
  • a subsample of subsets can be identified by randomly identifying M subsets from among the 2^N possible subsets. Then the following steps 408 and 410 may only need to be performed if the clients 102 selected for a round of training include a subset that is part of the identified subsample.
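  • A sketch of this subsampling step, assuming M ≈ N log (N) subsets are drawn uniformly at random from the 2^N possible (non-empty) subsets:

```python
# Sketch: identify a Monte-Carlo subsample of roughly N*log(N) client subsets;
# utility values are then only computed for subsets in this subsample.
import math
import random

def sample_subsets(num_clients: int, seed: int = 0) -> set[frozenset]:
    rng = random.Random(seed)
    m = max(1, round(num_clients * math.log(max(num_clients, 2))))
    m = min(m, 2 ** num_clients - 1)          # cannot exceed the number of non-empty subsets
    subsample: set[frozenset] = set()
    while len(subsample) < m:
        # including each client independently with probability 1/2 gives a
        # uniformly random draw from the 2^N possible subsets
        subset = frozenset(i for i in range(1, num_clients + 1) if rng.random() < 0.5)
        if subset:
            subsample.add(subset)
    return subsample
```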
  • step 406 may be omitted. For example, if the number of clients N is not overly large (e.g., only 5 to 10 clients, which may be the case in some industrial applications) the Monte-Carlo subsampling may not significantly reduce the amount of computations.
  • utility function values are computed using updates received from the clients 102 (denoted I_t ) that have been selected in the tth round of training.
  • Computing values of the utility function includes the central server 110 receiving, from the selected clients 102 I_t, updates (e.g., in the form of locally updated local parameters 137) which the central server 110 aggregates into an aggregated update.
  • the central server 110 updates the global parameters 127 using the aggregated update, as described above.
  • the central server 110 computes a subset update by aggregating updates from a subset of clients 102 (where the subset is within the selected clients I_t ) .
  • the central server 110 then computes a first test loss using the global model 126 having global parameters 127 prior to applying the subset update and a second test loss using the global model 126 having global parameters 127 after applying the subset update.
  • the difference between the first test loss and second test loss is the utility function value for the subset S in the tth round.
  • the formal expression of this utility function is provided above.
  • the computed utility function value is stored as the (t, S) entry in the utility matrix. This is performed for all subsets S included in the selected clients 102 I_t.
  • For example, if clients 1, 3 and 4 are selected in the 5th round of training, the selected clients 102 can be denoted as I_5 and the subsets {1} , {3} , {4} , {1, 3} , {1, 4} , {3, 4} and {1, 3, 4} are included in the selected clients I_5.
  • the utility function values for these subsets are computed by the central server 110 and stored in the 5th row of the utility matrix, in the columns corresponding to these subsets.
  • the central server 110 may conduct another round of training.
  • the steps 408 and 410 may be repeated for each round of training within step 404. If optional step 406 is performed, the steps 408 and 410 may only be performed if the clients 102 selected for a round of training within step 404 include a subset that is part of the identified subsample.
  • the utility function values computed at step 408 have been stored in the utility matrix (at step 410) .
  • the utility matrix thus stores utility function values computed for subsets of clients that are included in the selected clients over each of the rounds of training. Entries in the utility matrix may be indexed by the round of training (t) and by the particular subset of clients (S) .
  • the utility matrix resulting from the utility function values computed over the rounds of training is a sparse matrix. This means that the utility matrix has some entries missing. This is expected because not all possible subsets of clients 102 would be expected to be included among the clients selected over all rounds of training.
  • the utility matrix is completed. It can be shown that the utility matrix is approximately low-rank.
  • a low-rank matrix is a matrix in which the number of linearly independent columns is small relative to the total number of columns.
  • a low-rank matrix can be approximated as an inner product of two decomposition matrices.
  • Existing matrix completion techniques can be used to compute the decomposition matrices (by solving a well-known minimization problem) , which can be used to compute the missing entries of the utility matrix. It is expected that the utility matrix is approximately low-rank because it is expected that there will be some clients 102 having similar local datasets 140 and hence similar utility, resulting in similarity between columns of the utility matrix. As well, it is expected that changes to the local parameters 137 at an individual client 102 would be gradual, such that the utility of that client 102 should be similar between successive rounds of training, resulting in similarity between adjacent rows of the utility matrix.
  • the matrix completion problem may be formally expressed as follows:

$$\min_{W,\,H} \; \sum_{(t,\,S) \in \Omega} \left( \langle W_t, H_S \rangle - U_t^{\,S} \right)^2 + \lambda \left( \|W\|_F^2 + \|H\|_F^2 \right)$$

  • where W and H are decomposition matrices of the completed utility matrix (i.e., the completed utility matrix can be obtained by taking the inner product of W and H) , Ω denotes the set of observed (i.e., non-missing) entries of the utility matrix, λ is a regularization parameter, and the notation ‖·‖_F^2 denotes the square of the Frobenius norm.
  • the above matrix completion problem can be solved using existing matrix completion solvers such as the Python packages LRIPy (which stands for Low-Rank Inducing Norms in Python) or LIBPMF (which stands for Library for large-scale Parallel Matrix Factorization) .
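  • The following is a self-contained sketch of one way such a problem could be solved with regularized alternating least squares, standing in for dedicated solvers such as LRIPy or LIBPMF; `observed` maps (round index, subset column index) to the observed utility value, and the rank, regularization weight and iteration count are illustrative assumptions (in practice, only the sampled subset columns need be represented):

```python
# Sketch: rank-r matrix completion by regularized alternating least squares.
import numpy as np

def complete_matrix(observed: dict, num_rows: int, num_cols: int,
                    rank: int = 5, lam: float = 0.1, iters: int = 50):
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(num_rows, rank))   # round factors
    H = rng.normal(scale=0.1, size=(num_cols, rank))   # subset factors
    rows = {t: [] for t in range(num_rows)}
    cols = {s: [] for s in range(num_cols)}
    for (t, s), u in observed.items():
        rows[t].append((s, u))
        cols[s].append((t, u))
    eye = lam * np.eye(rank)
    for _ in range(iters):
        for t, entries in rows.items():          # update each row factor W_t
            if entries:
                Hs = np.stack([H[s] for s, _ in entries])
                us = np.array([u for _, u in entries])
                W[t] = np.linalg.solve(Hs.T @ Hs + eye, Hs.T @ us)
        for s, entries in cols.items():          # update each column factor H_S
            if entries:
                Wt = np.stack([W[t] for t, _ in entries])
                us = np.array([u for _, u in entries])
                H[s] = np.linalg.solve(Wt.T @ Wt + eye, Wt.T @ us)
    return W, H   # completed (t, S) entry is approximated by W[t] @ H[s]
```

  • The completed (t, S) entry is then the inner product of the corresponding rows of W and H, so the decomposition matrices can be used directly in the score computation without materializing the full T × 2^N matrix.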
  • the contribution score of each client 102 is computed using the completed utility matrix.
  • the completed utility matrix can be used in the computation of the Shapley value, without having to assign zero values to any client 102 that has not been selected for a round of training. This is because the completed utility matrix contains utility function values for all possible subsets of clients 102 over all rounds of training (some of the utility function values are computed directly by the central server 110 at step 408 and others of the utility function values are computed by completing the utility matrix at step 412) .
  • the Shapley value-based contribution score (also referred to herein as the completed federated Shapley value, or ComFedSV) for each client-i 102 (where i ∈ {1, 2, ..., N} ) can be defined as follows:

$$s_i := \frac{1}{T}\sum_{t=1}^{T}\frac{1}{N}\sum_{S \subseteq \{1, \dots, N\} \setminus \{i\}} \frac{\langle W_t, H_{S \cup \{i\}} \rangle - \langle W_t, H_S \rangle}{C(N-1, |S|)}$$

  • where W_t and H_S are the tth and Sth row vectors of the decomposition matrices W and H, respectively
  • T denotes the number of rounds of training
  • s_i denotes the contribution score assigned to the ith client 102
  • C is “choose” notation (i.e., C (N-1, |S|) denotes the binomial coefficient) .
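  • A hedged sketch of this computation for a small number of clients, assuming the completed entry for round t and subset S is the inner product of the corresponding rows of W and H, that the empty subset has zero utility, that the hypothetical `col_of` mapping assigns every non-empty subset to its column of H, and that the overall 1/(T·N) normalization shown above is used:

```python
# Sketch: Shapley-value-based contribution scores from the completed utility matrix.
from itertools import combinations
from math import comb
import numpy as np

def contribution_scores(W: np.ndarray, H: np.ndarray,
                        col_of: dict[frozenset, int], num_clients: int) -> np.ndarray:
    T = W.shape[0]

    def U(t: int, S: frozenset) -> float:
        return 0.0 if not S else float(W[t] @ H[col_of[S]])   # empty subset: zero utility

    scores = np.zeros(num_clients)
    clients = range(1, num_clients + 1)
    for i in clients:
        others = [j for j in clients if j != i]
        total = 0.0
        for t in range(T):
            for size in range(len(others) + 1):
                for S in combinations(others, size):
                    S = frozenset(S)
                    marginal = U(t, S | {i}) - U(t, S)       # marginal utility of client i
                    total += marginal / comb(num_clients - 1, size)
        scores[i - 1] = total / (T * num_clients)
    return scores
```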
  • the central server 110 may identify one or more low-scoring clients 102 (e.g., any clients 102 having a contribution score below a predefined threshold) and may exclude such low-scoring clients 102 from subsequent rounds of training. For example, if the contribution score computed at 414 is an interim score (i.e., training has not yet terminated) , the interim score may be used to identify any clients 102 that are not positively contributing to the collaborative training and may exclude such clients 102 from the remaining rounds of training. The method 400 may, after excluding any low-scoring clients 102, return to step 404 to continue the federated training.
  • step 416 may be used to exclude low-scoring clients 102 from future participation in federated learning.
  • the central server 110 may store the identity (e.g., unique device address or device identifier) of any low-scoring clients 102 so that such clients 102 can be excluded from participation if the global parameters 127 need to be retrained in the future.
  • the central server 110 may credit or rank all the clients 102 based on their respective contribution scores.
  • the central server 110 may provide (or credit) each client 102 with resources (e.g., computing resources, communication bandwidth, monetary resources, etc. ) proportionate to their respective contribution score such that higher-scoring clients 102 receive more resources than lower-scoring clients 102.
  • the method 400 may help to promote further participation in the federated learning system 100 by enabling good contributors (i.e., higher-scoring clients 102) to more efficiently participate in the federated learning system 100 (by providing such higher-scoring clients 102 with more computing resources, communication bandwidth, etc. ) . This may result in overall improved use of resources in the federated learning system 100.
  • Ranking clients 102 based on their contribution scores may also enable the central server 110 to more easily identify which clients 102 should be invited to participate in future federated training.
  • If the contribution score computed at 414 is an interim score (i.e., training has not yet terminated) , the interim score may be used to credit or rank clients 102 on an interim basis. This may allow the central server 110 to credit clients 102 prior to completion of training. This may also enable the central server 110 to identify higher-scoring clients 102 that may be preferentially selected for subsequent rounds of training.
  • the method 400 may, after crediting or ranking clients based on contribution scores, return to step 404 to continue the federated training.
  • the interim contribution score or the final contribution score may be outputted to all clients 102.
  • the interim contribution score or final contribution score may be outputted after normalization and anonymization, to enable the clients 102 to know their relative contribution to the federated learning system 100 while maintaining privacy. This may help to improve transparency and trust between the clients 102 and the central server 110.
  • FIG. 5 is a table 500 illustrating an example of how the contribution score may be outputted to be viewed by each client 102.
  • the table 500 (also referred to as a scoreboard) ranks all clients 102 (from 1 to N) according to descending contribution score, which has been normalized to a maximum of 1.00.
  • Each client 102 in the table 500 may be identified using an anonymized ID that is known only to each client 102.
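  • The normalization and anonymization described above might be sketched as follows; the salted-hash anonymization scheme and the example scores are assumptions for illustration, not part of the disclosure.

```python
import hashlib

# Hypothetical sketch of producing a scoreboard like table 500: scores are
# normalized to a maximum of 1.00 and clients are listed under anonymized IDs.

def build_scoreboard(contribution_scores, salt="per-deployment-secret"):
    max_score = max(contribution_scores.values())
    rows = []
    for cid, score in contribution_scores.items():
        # Assumed anonymization: a salted hash the server shares with each client.
        anon_id = hashlib.sha256(f"{salt}:{cid}".encode()).hexdigest()[:8]
        rows.append((anon_id, round(score / max_score, 2)))
    # Rank in descending order of normalized score.
    return sorted(rows, key=lambda row: row[1], reverse=True)

for anon_id, normalized in build_scoreboard({"c1": 4.2, "c2": 2.1, "c3": 3.6}):
    print(anon_id, normalized)
```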
  • FIG. 6 illustrates an example pseudocode 600 representing instructions that may be executed by the central server 110 and clients 102 to implement an example of the method 400.
  • the pseudocode 600 represents the operation of the overall federated learning system 100 and may not reflect the operation of any individual client 102 or central server 110 alone. Further, although the following discussion explains the pseudocode 600 in the context of the method 400, it should be understood that the method 400 may be implemented using different sets of instructions.
  • the pseudocode 600 includes code 602 for initialization (e.g., at step 402 of the method 400) , and code 604 for identifying a subsample of subsets (e.g., at step 406 of the method 400) .
  • Code 606 represents operations at each client 102 to update the local parameters 137.
  • Code 608 represents operations at the central server 110 to ensure that all clients 102 are selected for at least one round of training (in this example, the first round of training) .
  • Code 610 represents operations at the central server 110 to aggregate updates received from selected clients 102 and update the global parameters 127.
  • Code 612 represents operations by the central server 110 to compute the utility value functions for the subsamples (identified at code 604) and to store the computed utility value functions in the utility matrix (e.g., at steps 408 and 410 of the method 400) .
  • Code 614 represents operations by the central server 110 to complete the utility matrix (e.g., at step 412)
  • code 616 represents operations by the central server 110 to compute the Shapley value-based contribution score of each client 102 using the completed utility matrix (e.g., at step 414)
  • code 618 represents operations by the central server 110 to output the contribution scores.
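  • To make the overall flow concrete, the following is a much-simplified, self-contained sketch of the sequence that pseudocode 600 represents (subsample client subsets, record utility values in a sparse utility matrix, complete the matrix, then compute Shapley value-based scores). The toy loss, the subsampling size and the column-mean "completion" used here are placeholders assumed only for illustration; they are not the disclosed update rules or the disclosed matrix-completion solver.

```python
import math
import random
from itertools import combinations

import numpy as np

M, T = 4, 5                                       # clients, rounds of training
all_subsets = [frozenset(c) for r in range(1, M + 1)
               for c in combinations(range(M), r)]

def toy_loss(subset, t):
    # Placeholder for the global loss after round t when only the clients in
    # `subset` contribute; a real system would evaluate the aggregated model.
    return 1.0 / (t + 1) / (1.0 + 0.1 * len(subset))

rng = random.Random(0)
utility = np.full((T, len(all_subsets)), np.nan)  # sparse utility matrix
for t in range(T):
    sampled = set(rng.sample(range(len(all_subsets)), k=6))  # Monte Carlo subsample
    for j in sampled:
        s = all_subsets[j]
        # Utility = training progress attributable to subset s during round t.
        utility[t, j] = toy_loss(s, t) - toy_loss(s, t + 1)

# Stand-in for the matrix-completion step: fill missing entries per column.
col_means = np.array([
    0.0 if np.all(np.isnan(utility[:, j])) else np.nanmean(utility[:, j])
    for j in range(utility.shape[1])
])
completed = np.where(np.isnan(utility), col_means, utility)

# Shapley value-based contribution score per client from the completed matrix.
value = {s: completed[:, j].mean() for j, s in enumerate(all_subsets)}
value[frozenset()] = 0.0
scores = []
for m in range(M):
    others = [c for c in range(M) if c != m]
    total = 0.0
    for r in range(len(others) + 1):
        for c in combinations(others, r):
            s = frozenset(c)
            weight = (math.factorial(len(s)) * math.factorial(M - len(s) - 1)
                      / math.factorial(M))
            total += weight * (value[s | {m}] - value[s])
    scores.append(total)
print([round(x, 4) for x in scores])
```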
  • examples of the present disclosure may also be used to quantify clients’ contributions in the context of vertical federated learning (also referred to as heterogeneous federated learning) .
  • the overall federated system 100 of FIG. 1 may be used for horizontal federated learning or vertical federated learning, although the details of implementation may be different.
  • the difference between horizontal federated learning and vertical federated learning lies in the local datasets 140 and local models 136 at respective clients 102.
  • In horizontal federated learning, the local datasets 140 at the different clients 102 have the same feature space but different sample spaces and can be processed by the same model architecture (although each client 102 may have its own local model 136 with different values for the local parameters 137, the local models 136 all share the same architecture, including expected inputs and outputs, as the global model 126) .
  • An example of horizontal federated learning may be different banking institutions (which all have local models operating in the same feature space, for example related to customers’ loans) in different geographic locations (and therefore different customer sample spaces) that collaborate together to improve their local models, while maintaining the privacy of their customer data.
  • In vertical federated learning, different clients 102 have respective local datasets 140 with the same sample space but different feature spaces.
  • the local models 136 of the clients 102 are embedding models, which generate embeddings from the features of the respective local datasets 140. Since the local datasets 140 store different features, the local embedding models generate different embeddings and thus may have different model architectures.
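  • A toy illustration of the two data partitions (assumed values, for clarity only): horizontal clients hold different rows of the same table, while vertical clients hold different columns for the same rows.

```python
import numpy as np

rng = np.random.default_rng(0)
full_table = rng.normal(size=(6, 4))   # 6 samples (rows) x 4 features (columns)

# Horizontal federated learning: same feature space, different sample spaces.
horizontal_client_a = full_table[:3, :]    # samples 0-2, all features
horizontal_client_b = full_table[3:, :]    # samples 3-5, all features

# Vertical federated learning: same sample space, different feature spaces.
vertical_client_a = full_table[:, :2]      # all samples, features 0-1
vertical_client_b = full_table[:, 2:]      # all samples, features 2-3
```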
  • the clients 102 collaborate by sharing the locally generated embeddings with the central server 110, which aggregates the embeddings in order to learn the global parameters 127 of a global prediction model 126.
  • An example of vertical federated learning may be a banking institution and a retailer in the same geographic location (and therefore the same customer sample space) .
  • the banking institution may have a local model that generates embeddings from a feature space related to customers’ loans, while the retailer may have a local model that generates embeddings from a different feature space related to customers’ purchases.
  • the banking institution and retailer may wish to collaborate together to collaboratively learn a prediction model (e.g., to predict customer classification) , while maintaining the privacy of their customer data.
  • FIG. 7 illustrates an example of how a round of collaborative training may be carried out in a vertical federated learning system 150.
  • the network 104 has been omitted from FIG. 7 and details are shown only for one client 102.
  • the goal of vertical federated learning is for M clients 102 to collaboratively train a prediction model using a set of N aligned data samples.
  • the set of aligned data samples refers to the data samples contained in the local datasets 140 that correspond to the same set of common identifiers (e.g., customer identifiers) . That is, the local datasets 140 store different features corresponding to the same data source (e.g., store different features about the same group of customers) .
  • the feature vectors are distributed across the M clients 102 such that each client 102 stores a portion of the features. Formally, this may be expressed as $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,M}) \in \mathbb{R}^d$, where $x_{i,m} \in \mathbb{R}^{d_m}$ is the portion of the ith feature vector stored at the mth client 102, $d_m$ denotes the feature dimension stored at the mth client 102 (such that $\sum_{m \in [M]} d_m = d$), and $[M]$ denotes the set of all M clients 102.
  • Each client 102 has a respective local dataset 140, which may be denoted $\{x_{i,m}\}_{i \in [N]}$ for the mth client 102, where $[N]$ denotes the set of data sample indices.
  • Each client 102 processes the respective local dataset 140 using the respective local model 136 (which has local parameters 137, denoted $\theta_m$) to generate a respective set of embeddings, denoted $\{h_{i,m}\}_{i \in [N]}$, where $h_{i,m}$ is the embedding generated by the mth client 102 for the ith data sample.
  • the labels for the feature vectors are stored by the central server 110 in a global dataset 142.
  • the goal of the central server 110 is to train a predictive global model 126 such that the global model 126 processes the embeddings from the clients 102 to accurately predict the label y i for a given x i .
  • a round of training starts with the central server 110 selecting a batch of data indices from among the N possible data indices.
  • Each client 102 processes the selected batch from the respective local dataset 140 using the respective local model 136 to generate respective local embeddings; that is, the mth client 102 computes the embedding $h_{i,m}$ for each data sample index i in the selected batch.
  • the embeddings are aggregated (e.g., summed) and the global loss is computed by processing the aggregated embeddings using the global model 126 and comparing the predicted label with the ground-truth label in the global dataset 142.
  • the gradient of the global loss (referred to as the global gradient) is computed and communicated back to each client 102.
  • Each client 102 uses the received global gradient to update the respective local parameters 137.
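  • A minimal numpy sketch of one such synchronous round, assuming linear embedding models, a linear global model and a squared-error loss (modelling choices that the disclosure does not prescribe), may look as follows.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, lr = 8, 3, 0.1                       # aligned samples, embedding size, step size
feature_dims = [2, 4]                      # d_m for each of M = 2 clients
X = [rng.normal(size=(N, d)) for d in feature_dims]            # local datasets 140
y = rng.normal(size=N)                                          # labels in global dataset 142
theta = [rng.normal(size=(d, k)) * 0.1 for d in feature_dims]   # local parameters 137
w = rng.normal(size=k) * 0.1                                     # global parameters 127

# 1. Central server selects a batch of data indices.
batch = rng.choice(N, size=4, replace=False)

# 2. Each client generates local embeddings for the selected batch.
embeddings = [X[m][batch] @ theta[m] for m in range(len(X))]

# 3. Server aggregates (sums) embeddings and computes the global loss.
aggregated = sum(embeddings)
pred = aggregated @ w
residual = pred - y[batch]
loss = 0.5 * np.mean(residual ** 2)

# 4. Server computes the global gradient w.r.t. the aggregated embeddings
#    (and w.r.t. its own parameters) and sends the former back to the clients.
grad_embed = residual[:, None] * w[None, :] / len(batch)
w -= lr * (aggregated.T @ residual) / len(batch)

# 5. Each client uses the received global gradient to update local parameters.
for m in range(len(X)):
    theta[m] -= lr * (X[m][batch].T @ grad_embed)

print(f"round loss: {loss:.4f}")
```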
  • the training process described above may be referred to as synchronous vertical federated learning, because all clients 102 communicate their local embeddings to the central server 110 at the same (or approximately the same) time in each round of training.
  • Asynchronous vertical federated learning differs in that each client 102 communicates their local embeddings at their own timing (i.e., without the central server 110 coordinating all clients 102 to start each round of training) .
  • the central server 110 may aggregate embeddings received over a defined time period and communicate global gradients to the clients 102 periodically.
  • the Shapley value may be used to quantify contribution of each client 102 in vertical federated learning, similar to horizontal federated learning, in both synchronous and asynchronous scenarios.
  • the contribution score is computed based on the Shapley value using the utility function, which as mentioned above is defined herein as a quantification of the progress made in training the global model 126 between the start and the end of a round of training –and more generally between the start and the end of a defined time period –based on the difference in the loss computed at the start and at the end of the time period.
  • the difference between the definition of the utility function in horizontal federated learning compared to vertical federated learning lies in how the loss is defined.
  • In vertical federated learning, the loss function is based on how successful the global model 126 is in predicting the label for a feature vector, given the embeddings generated by the clients 102.
  • the present disclosure presents a utility function, adapted for vertical federated learning, which may be expressed as $U_t(S) = \ell_{t-1}(S) - \ell_t(S)$, where $\ell_t(S)$ denotes the global loss computed at time point t over all N data samples using only the embeddings generated by the clients 102 in the subset S.
  • t may be used to denote the time-point rather than the round of training (where a round of training may be defined as the time period [t-1, t] ) .
  • the utility function requires that the loss be computed based on embeddings generated over all N data points.
  • the Shapley value-based contribution score is computed for each mth client 102 as $s_m = \frac{1}{T} \sum_{t=1}^{T} \sum_{S \subseteq [M] \setminus \{m\}} \frac{|S|!\,(M-|S|-1)!}{M!} \left( U_t(S \cup \{m\}) - U_t(S) \right)$, where
  • T denotes a predefined time period (or a number of rounds of training) and S is a subset of clients.
  • an embedding matrix is defined for each mth client, which contains all embeddings for all data samples over all time points.
  • the (t, i) entry in the embedding matrix for the mth client 102 is defined as the embedding $h_{i,m}$ that was generated by the mth client 102 for the ith data sample at time point t.
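  • As a sketch (with assumed shapes and storage layout), the central server 110 might record received embeddings into per-client arrays indexed by (time point, data sample), with missing entries marked for later completion.

```python
import numpy as np

# Assumed dimensions for illustration: T time points, N samples, k-dimensional
# embeddings, M clients. NaN marks (time, sample) entries never received.
T, N, k, M = 5, 8, 3, 2
embedding_matrix = [np.full((T, N, k), np.nan) for _ in range(M)]

def record_embeddings(m, t, batch_indices, batch_embeddings):
    """Store embeddings received from client m at time point t."""
    embedding_matrix[m][t, batch_indices] = batch_embeddings

# e.g. client 0 sends embeddings for samples [1, 4, 6] at time point 2:
record_embeddings(0, 2, np.array([1, 4, 6]), np.zeros((3, k)))
missing = np.isnan(embedding_matrix[0][..., 0]).sum()
print(f"missing (time, sample) entries for client 0: {missing}")
```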
  • FIG. 8 is a flowchart showing an example method 800, which may be performed by the central server 110, to assign a contribution score to each client 102 in the vertical federated learning system 150.
  • the method 800 may be executed by a processing device 114 of the central server 110 executing instructions stored in the memory 128, for example. It should be understood that the method 800 is an adaptation of the previously-discussed method 400, hence certain details need not be repeated here.
  • the central server 110 may perform initialization similar to step 402 of the method 400 described above.
  • the central server 110 may initialize the global parameters 127 as well as the embedding matrix.
  • the central server 110 conducts federated training over multiple rounds of training (in the case of synchronous vertical federated learning) or over a series of time points (in the case of asynchronous vertical federated learning) .
  • the central server 110 receives the local embeddings from each client 102, and stores them in the embedding matrix at 810.
  • the local embeddings correspond only to the batch indices selected by the central server 110.
  • the embedding matrix is a sparse matrix with missing entries.
  • After training is complete, or after a predefined number of rounds (or a predefined time period), the central server 110 computes a completed embedding matrix at 812.
  • the embedding matrix is an approximately low-rank matrix and can be completed by solving a matrix completion problem (similar to step 412 of the method 400) .
  • the matrix completion problem may be expressed as finding a low-rank matrix that agrees with the observed entries of the embedding matrix (e.g., by minimizing a rank surrogate, such as the nuclear norm, subject to consistency with the observed entries).
  • This matrix completion problem may be solved using existing matrix completion solvers, as described above.
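  • As an example of one such existing solver, a soft-impute style iterative SVD-thresholding heuristic is sketched below; the disclosure does not mandate this particular solver, and the threshold and iteration count are assumptions.

```python
import numpy as np

def soft_impute(matrix_with_nans, threshold=1.0, num_iters=100):
    observed = ~np.isnan(matrix_with_nans)
    filled = np.where(observed, matrix_with_nans, 0.0)
    for _ in range(num_iters):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - threshold, 0.0)          # shrink singular values
        low_rank = (u * s) @ vt
        # Keep observed entries fixed; update only the missing ones.
        filled = np.where(observed, matrix_with_nans, low_rank)
    return filled

# Toy usage: a rank-1 matrix with a few missing entries is recovered closely.
truth = np.outer(np.arange(1, 5, dtype=float), np.arange(1, 6, dtype=float))
noisy = truth.copy()
noisy[0, 1] = noisy[2, 3] = np.nan
completed = soft_impute(noisy, threshold=0.5)
print(np.round(completed, 1))
```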
  • the contribution score of each client 102 may then be computed using the completed entries of the embedding matrix.
  • the embedding values stored in the completed embedding matrix may be used to compute the utility function values (as defined above) and the computed utility function values may be used to compute the contribution scores (as defined above) for all the clients 102.
  • the Monte Carlo method may be used to identify a subsample of client subsets. Then the contribution scores may be computed based on the utility function values computed based on the identified subsample of subsets, as described previously in the context of horizontal federated learning.
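  • A sketch of the Monte Carlo idea, using a stand-in utility function (in the system described above, the utility would instead be computed from the completed utility or embedding matrix):

```python
import random

# Monte Carlo estimation of Shapley value-based contribution scores: instead of
# enumerating all 2^M client subsets, random permutations are sampled and
# marginal utility gains are averaged. The additive toy utility is an assumption.

def monte_carlo_shapley(num_clients, utility_fn, num_permutations=200, seed=0):
    rng = random.Random(seed)
    scores = [0.0] * num_clients
    for _ in range(num_permutations):
        order = list(range(num_clients))
        rng.shuffle(order)
        coalition, prev_utility = set(), utility_fn(frozenset())
        for m in order:
            coalition.add(m)
            current = utility_fn(frozenset(coalition))
            scores[m] += current - prev_utility     # marginal contribution of m
            prev_utility = current
    return [s / num_permutations for s in scores]

# Toy utility: each client contributes a fixed amount, client 2 twice as much.
weights = [1.0, 1.0, 2.0, 0.5]
print(monte_carlo_shapley(4, lambda s: sum(weights[m] for m in s)))
# approximately [1.0, 1.0, 2.0, 0.5]
```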
  • matrix completion (at step 812) may be omitted, in which case the contribution score at step 814 may be computed using the embedding matrix without matrix completion having been performed.
  • the central server 110 stores the most recently received local embeddings from each client 102 (regardless of when the local embeddings were received) .
  • Although the local embeddings may not be received at the same time, this is accounted for by the definition of the Shapley value-based contribution score (which is defined over a time period rather than a specific round of training).
  • The contribution score may be higher for a client 102 that updates its local embeddings more frequently; thus, the contribution score may reflect not only the quality of a client's contributions but also how frequently contributions are made.
  • the central server 110 may identify any low-scoring clients 102 (e.g., any clients 102 having a contribution score below a predefined threshold) and exclude such low-scoring clients 102 from future collaborative training.
  • the central server 110 may credit or rank all the clients 102 based on their respective contribution scores.
  • the central server 110 may credit each client 102 with resources (e.g., computing resources, communication bandwidth, monetary resources, etc. ) proportionate to their respective contribution score such that higher-scoring clients 102 receive more credit than lower-scoring clients 102.
  • the present disclosure has described methods and systems for fairly quantifying the contribution of all clients in a federated learning system, including horizontal federated learning systems as well as vertical federated learning systems (synchronous or asynchronous) .
  • the present disclosure enables fairness in scoring the contribution from clients. In horizontal federated learning, this means that clients having similar local datasets (and hence similar utility) should have similar contribution scores (not dependent on random selection of clients in each round of training) . In vertical federated learning, fairness means that clients having similar utility should have similar contribution scores (not dependent on random selection of batch indices) .
  • the disclosed methods and systems also enable evaluation of clients’ contribution without requiring access to the clients’ local datasets or the clients’ local models. This helps to preserve data privacy, which is an important aspect of federated learning.
  • the Monte-Carlo subsampling method may be used to reduce the amount of computation required to compute the contribution scores of all clients. This may help to improve efficiency, particularly when there is a large number of clients (e.g., several thousands or millions of participating clients) .
  • Although the present disclosure is described, at least in part, in terms of methods, a person of ordinary skill in the art will understand that the present disclosure is also directed to the various components for performing at least some of the aspects and features of the described methods, be it by way of hardware components, software or any combination of the two. Accordingly, the technical solution of the present disclosure may be embodied in the form of a software product.
  • a suitable software product may be stored in a pre-recorded storage device or other similar non-volatile or non-transitory computer readable medium, including DVDs, CD-ROMs, USB flash disk, a removable hard disk, or other storage media, for example.
  • the software product includes instructions tangibly stored thereon that enable a processing device (e.g., a personal computer, a server, or a network device) to execute examples of the methods disclosed herein.
  • the machine-executable instructions may be in the form of code sequences, configuration information, or other data, which, when executed, cause a machine (e.g., a processor or other processing device) to perform steps in a method according to examples of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
PCT/CN2022/117577 2021-09-08 2022-09-07 Methods and systems for quantifying client contribution in federated learning WO2023036184A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202280060951.0A CN117999562A (zh) 2021-09-08 2022-09-07 Methods and systems for quantifying client contribution in federated learning

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163242015P 2021-09-08 2021-09-08
US63/242,015 2021-09-08

Publications (1)

Publication Number Publication Date
WO2023036184A1 true WO2023036184A1 (en) 2023-03-16

Family

ID=85506092

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/117577 WO2023036184A1 (en) 2021-09-08 2022-09-07 Methods and systems for quantifying client contribution in federated learning

Country Status (2)

Country Link
CN (1) CN117999562A (zh)
WO (1) WO2023036184A1 (zh)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210241167A1 (en) * 2020-02-03 2021-08-05 Microsoft Technology Licensing, Llc Computing system for training, deploying, executing, and updating machine learning models
CN111553484A (zh) * 2020-04-30 2020-08-18 Tongdun Holdings Co., Ltd. Federated learning method, apparatus and system
CN113222179A (zh) * 2021-03-18 2021-08-06 Beijing University of Posts and Telecommunications Federated learning model compression method based on model sparsification and weight quantization
CN113011587A (zh) * 2021-03-24 2021-06-22 Alipay (Hangzhou) Information Technology Co., Ltd. Privacy-preserving model training method and system
CN112926897A (zh) * 2021-04-12 2021-06-08 Ping An Technology (Shenzhen) Co., Ltd. Client contribution computing method and apparatus based on federated learning
CN113191484A (zh) * 2021-04-25 2021-07-30 Tsinghua University Intelligent federated learning client selection method and system based on deep reinforcement learning

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116205313A (zh) * 2023-04-27 2023-06-02 Digital Zhejiang Technology Operation Co., Ltd. Method and apparatus for selecting federated learning participants, and electronic device
CN116205313B (zh) * 2023-04-27 2023-08-11 Digital Zhejiang Technology Operation Co., Ltd. Method and apparatus for selecting federated learning participants, and electronic device
CN117521783A (zh) * 2023-11-23 2024-02-06 Beijing Topsec Network Security Technology Co., Ltd. Federated machine learning method and apparatus, storage medium, and processor
CN117557870A (zh) * 2024-01-08 2024-02-13 Zhejiang Lab Classification model training method and system based on federated learning client selection
CN117557870B (zh) * 2024-01-08 2024-04-23 Zhejiang Lab Classification model training method and system based on federated learning client selection

Also Published As

Publication number Publication date
CN117999562A (zh) 2024-05-07

Similar Documents

Publication Publication Date Title
WO2023036184A1 (en) Methods and systems for quantifying client contribution in federated learning
Dong et al. Federated class-incremental learning
Zhu et al. Federated learning on non-IID data: A survey
WO2021233030A1 (en) Methods and apparatuses for federated learning
US11715044B2 (en) Methods and systems for horizontal federated learning using non-IID data
WO2022063151A1 (en) Method and system for relation learning by multi-hop attention graph neural network
CN107688605B (zh) Cross-platform data matching method and apparatus, computer device, and storage medium
CN110503531A (zh) Temporal-aware dynamic social scenario recommendation method
WO2015103964A1 (en) Method, apparatus, and device for determining target user
CN110222838B (zh) Document ranking method and apparatus, electronic device, and storage medium
US20230006980A1 (en) Systems and methods for securely training a decision tree
CN112799708A (zh) Method and system for jointly updating a business model
WO2023061500A1 (en) Methods and systems for updating parameters of a parameterized optimization algorithm in federated learning
Arafeh et al. Data independent warmup scheme for non-IID federated learning
Long et al. Fedsiam: Towards adaptive federated semi-supervised learning
Mehrizi et al. A Bayesian Poisson–Gaussian process model for popularity learning in edge-caching networks
Qi et al. Graph neural bandits
Marnissi et al. Client selection in federated learning based on gradients importance
CN117216382A (zh) Interaction processing method, model training method, and related apparatus
Singhal et al. Greedy Shapley Client Selection for Communication-Efficient Federated Learning
CN115879564A (zh) Adaptive aggregation for federated learning
Nawaz et al. K-DUMBs IoRT: Knowledge Driven Unified Model Block Sharing in the Internet of Robotic Things
Houdou et al. NIFL: A statistical measures-based method for client selection in federated learning
CN111079003A (zh) Technical solution of a latent preference association prediction model supported by social circles
WO2024031564A1 (en) Methods and systems for federated learning with local predictors

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22866647

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202280060951.0

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE