CN117131951A - Federated learning method and electronic device - Google Patents

Federated learning method and electronic device

Info

Publication number
CN117131951A
Authority
CN
China
Prior art keywords
model
local
round
electronic device
models
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310165522.6A
Other languages
Chinese (zh)
Inventor
Su Xinduo (苏新铎)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202310165522.6A
Publication of CN117131951A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a federated learning method and an electronic device, and relates to artificial intelligence technology. In the federated learning method, a central party determines, from the local models uploaded by the participants, the first models to be used in the aggregation operation; it then estimates the influence of each first model on the current round of aggregation, determines a first aggregation weight for each first model, and aggregates the first models according to these weights to obtain the global model of the round. With this method, the aggregation weight of each model can be adjusted dynamically according to how much each participant's local model influences the current round of aggregation, which reduces the impact of poorly performing models on the aggregated global model, improves the performance of the global model determined in each round, and does so without increasing the cost borne by the central party.

Description

Federated learning method and electronic device
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to a federated learning method and an electronic device.
Background
Federated learning is a distributed machine learning technique in which model training is distributed across multiple data sources that hold local data. Without exchanging local individual or sample data, the parties exchange only model parameters or intermediate results to build a global model over the virtually fused data, thereby balancing data privacy protection against shared data computation. Horizontal federated learning (also called sample-partitioned federated learning) is one type of federated learning; it involves clients (i.e., participants) and a server (i.e., a central party) and applies to scenarios where the data sets of the participants share the same feature space but have different sample spaces. Each client trains the global model issued by the server on its local training data and returns the trained local model to the central party. The central party aggregates the local models to obtain a new global model, and the process repeats until a preset training termination condition is met, yielding the trained federated learning model.
However, when the central party aggregates the local models sent by the participants, two problems arise. If all local models are aggregated with the same weight, well-performing models and poorly performing models carry equal weight during aggregation, so the trained global model performs poorly. If, instead, the central party aggregates the local models by knowledge distillation, the computational complexity at the central party increases and its cost rises sharply.
Disclosure of Invention
To solve the above technical problems, the application provides a federated learning method and an electronic device, which can dynamically adjust the weight of each model used in aggregation according to how much each participant's local model influences the current round of aggregation, thereby reducing the impact of poorly performing models on the global model aggregated in the round and improving the performance of the global model determined in each round, without increasing the cost of the central party.
In a first aspect, the present application provides a federated learning method, applied to a first electronic device communicatively coupled to at least two second electronic devices. The method comprises: issuing global model information of the i-th round of federated learning to each second electronic device, so that each second electronic device performs local model training based on the i-th round global model information and uploads its i-th round local model information to the first electronic device, where, if i is an integer greater than 1, the global model information comprises the global model or gradient information of the global model, if i = 1 the global model information is a preset initial model, and the local model information comprises the local model or gradient information of the local model; determining, from each piece of uploaded local model information, the first models for the (i+1)-th round of aggregation operation; obtaining a first influence estimate for each first model, the first influence estimate of a first model being used to evaluate the influence of that first model on the (i+1)-th round of aggregation; determining a first aggregation weight for each first model based on its first influence estimate; aggregating the first models according to their first aggregation weights to obtain the global model of the (i+1)-th round; and, according to the global model of the (i+1)-th round, issuing the (i+1)-th round global model information to each second electronic device for the next round of federated learning, until a preset training termination condition is met and a trained federated learning model is obtained.
In this way, during the federated learning process, the first electronic device (i.e., the central party) determines the first models of the (i+1)-th round from the local model information uploaded by each second electronic device (i.e., the participants) and dynamically determines the first aggregation weight of each first model in the (i+1)-th round. Because the first aggregation weights are not all the same, well-performing and poorly performing first models no longer contribute equally to the aggregation, which prevents poorly performing models from unduly influencing the aggregation. Moreover, the first electronic device adjusts each first aggregation weight according to how much the corresponding first model influences the (i+1)-th round aggregation, so each weight matches that model's influence; this improves the performance of the (i+1)-th round global model (for example, more accurate predictions) and can reduce the number of iterations of the global model. Since no knowledge distillation is required, the amount of computation at the central party does not increase and its cost does not rise.
According to the first aspect, obtaining the first influence estimate of each first model comprises: aggregating the first models to generate the first reference model of the (i+1)-th round; obtaining the second model corresponding to each first model, where the second model corresponding to the j-th first model is obtained by aggregating the first models other than the j-th first model, j is an integer greater than or equal to 1, and the maximum value of j equals the number of first models; and obtaining the first influence estimate of each first model from the first reference model and the second model corresponding to that first model.
Thus, the first electronic device obtains the first reference model of the (i+1)-th round by aggregating all first models, so the first reference model contains characteristic information of every first model, while the second model corresponding to a first model is generated by aggregation after excluding that first model, so it contains no characteristic information of that first model. The difference between the first reference model and a first model's second model therefore reflects how much that first model influences the (i+1)-th round global model during aggregation. The first influence estimate determined in this way accurately reflects that influence, and the way it is determined is simple and does not increase the computational complexity of the central party.
According to the first aspect, obtaining the first influence estimate of each first model from the first reference model and the second model corresponding to each first model comprises: obtaining an evaluation index of the first reference model; obtaining an evaluation index of the second model corresponding to each first model; and, for each first model, obtaining the difference between the evaluation index of the first reference model and the evaluation index of that first model's second model as the first influence estimate of that first model. The evaluation index of a model can typically be its accuracy, recall, F1 score, or the like, so the influence of a first model on the (i+1)-th round aggregation can be quantified quickly.
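As a small, hedged illustration of the quantity defined above (not part of the claimed method text), suppose the evaluation scores of the first reference model and of each second model have already been computed; the numbers below are hypothetical placeholders:

```python
def first_influence_estimates(reference_score, second_model_scores):
    """Influence_j = score - score_{-j}: how much the evaluation index of the
    full aggregate exceeds that of the aggregate built without model j."""
    return [reference_score - s for s in second_model_scores]

# Hypothetical example: reference accuracy 0.90; leaving model 2 out raises
# accuracy to 0.92, so model 2 receives a negative influence estimate.
estimates = first_influence_estimates(0.90, [0.88, 0.92, 0.89])
```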
According to the first aspect, aggregating the first models to generate the first reference model of the (i+1)-th round comprises: taking the reciprocal of the number of first models in the i-th round (i.e., 1/n for n first models) as the second aggregation weight of each first model; and aggregating the first models according to their second aggregation weights to generate the first reference model. In this way, the first electronic device can determine the second aggregation weights without acquiring any additional data uploaded by the second electronic devices, which avoids inaccurate second aggregation weights caused by false data uploaded by a second electronic device.
According to the first aspect, aggregating the first models to generate the first reference model of the (i+1)-th round comprises: obtaining model information of each first model in the i-th round, the model information comprising the number of training samples used to obtain the first model or a training log of the first model; and determining the second aggregation weight of each first model from its model information. Because the model information reflects how well a first model performs, the second aggregation weight of a poorly performing first model can be reduced, which improves the accuracy of the first reference model.
According to the first aspect, if the model information of a first model includes the number of training samples used to obtain that first model, determining the second aggregation weight of each first model from its model information comprises: obtaining the total number of training samples used to generate all first models of the i-th round; and taking the quotient of each first model's number of training samples and this total as that first model's second aggregation weight. Since a model trained on more samples is generally more accurate, determining the second aggregation weights from the sample counts gives well-performing first models higher weights and poorly performing ones lower weights, improving the accuracy of the first reference model.
According to the first aspect, determining the first aggregation weight of each first model from its first influence estimate comprises: obtaining the average of the first influence estimates of all first models in the i-th round as the average influence estimate; and performing the following for each first model: obtaining the difference between that first model's first influence estimate and the average influence estimate as a second difference, and obtaining the sum of that first model's second aggregation weight and the second difference as its first aggregation weight. In this way the second aggregation weight is fine-tuned by the first influence estimate and the fine-tuned value is used as the first aggregation weight, which makes the first aggregation weight of each first model more reasonable and improves the accuracy of the (i+1)-th round global model.
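A minimal sketch of this weight computation, assuming the second aggregation weights come from per-model training-sample counts as described above; the sample counts and influence values are illustrative placeholders, and numpy is used only for convenience:

```python
import numpy as np

def first_aggregation_weights(sample_counts, influence_estimates):
    """Second weights from training-sample counts, fine-tuned by each model's
    deviation from the average influence estimate."""
    counts = np.asarray(sample_counts, dtype=float)
    influences = np.asarray(influence_estimates, dtype=float)

    second_weights = counts / counts.sum()               # w2_j = num_j / num_sum
    second_difference = influences - influences.mean()   # Influence_j - average
    return second_weights + second_difference            # w_j = w2_j + second difference

# Hypothetical example with three first models
weights = first_aggregation_weights([100, 300, 600], [0.02, -0.01, 0.05])
```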
According to the first aspect, determining the first models for the (i+1)-th round of aggregation operation from each piece of uploaded local model information comprises: determining, from the uploaded local model information, the local model generated by each second electronic device in the i-th round; dividing the local models of the i-th round into N classes, N being an integer greater than 1; and aggregating the local models within each class to obtain N first models. In this way, the first electronic device classifies the local models of the second electronic devices and aggregates each class to obtain N first models, so fewer first models take part in the subsequent aggregation operation and the computation at the central party is reduced.
According to the first aspect, aggregating the local models within each class to obtain N first models comprises, for the local models in each class: taking the reciprocal of the number of local models in the class as the third aggregation weight of each local model in the class; and aggregating the local models in the class according to their third aggregation weights to generate a first model. Because this aggregation relies only on the number of local models in the class and not on other information uploaded by the second electronic devices, the influence of malicious data on model aggregation is effectively avoided and the accuracy of each first model is improved.
According to the first aspect, dividing the local models of the i-th round into N classes comprises dividing them into N classes by K-means clustering. When there are many local models, classifying them with K-means clustering is fast and accurate; a sketch of such a clustering step is given below.
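An illustrative sketch of this clustering-and-aggregation step, under the assumption that each local model is flattened into a parameter vector and that scikit-learn's KMeans is acceptable as the K-means implementation (the application does not prescribe a particular one):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_and_aggregate(local_models, n_classes):
    """Divide local models (flattened parameter vectors) into N classes and
    aggregate each class with equal weights to obtain at most N first models."""
    X = np.stack(local_models)
    labels = KMeans(n_clusters=n_classes, n_init=10).fit_predict(X)

    first_models = []
    for c in range(n_classes):
        members = X[labels == c]
        if len(members) > 0:                       # third weight = 1 / class size
            first_models.append(members.mean(axis=0))
    return first_models
```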
According to the first aspect, dividing the local models of the i-th round into N classes comprises: aggregating the local models generated by the second electronic devices in the i-th round to obtain a second reference model; obtaining the distance between each local model and the second reference model; and determining the class to which each local model belongs from the obtained distances and N preset distance-threshold ranges. The distance between a local model and the second reference model accurately reflects their similarity, so the classification is based on the similarity between each local model and the second reference model.
According to the first aspect, dividing the local models of the i-th round into N classes comprises: obtaining the change between the local model uploaded by each second electronic device in the i-th round and the global model issued in the i-th round; and determining the class to which each local model belongs from the obtained changes and N preset change-threshold ranges. By comparing each local model with the i-th round global model, the first electronic device obtains their similarity, and similar models can be grouped into one class, so the first model of each class accurately reflects the characteristics of its member models. A sketch covering both threshold-based classification variants follows.
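A sketch of the two threshold-based classifications above, assuming models are flattened parameter vectors, Euclidean distance is used, and numpy's digitize maps a value into one of the N preset ranges; these specifics are illustrative assumptions rather than requirements of the application:

```python
import numpy as np

def classify_by_distance(local_models, second_reference_model, thresholds):
    """Class index for each local model from its distance to the second
    reference model; `thresholds` holds the boundaries of the N preset ranges."""
    distances = [np.linalg.norm(m - second_reference_model) for m in local_models]
    return np.digitize(distances, thresholds)            # values in 0..N-1

def classify_by_change(local_models, issued_global_model, thresholds):
    """Class index for each local model from how much it changed relative to
    the global model issued in the i-th round."""
    changes = [np.linalg.norm(m - issued_global_model) for m in local_models]
    return np.digitize(changes, thresholds)
```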
According to the first aspect, the evaluation index comprises the accuracy, recall, or F1 score of the model.
According to the first aspect, before dividing the local models of the i-th round into N classes, the method further comprises: detecting that the number of local models in the i-th round is greater than a preset threshold. The local models of the i-th round are divided only when their number exceeds the preset threshold, which reduces the number of subsequent first models and the computation at the central party.
According to the first aspect, determining the first models for the (i+1)-th round of aggregation operation from each piece of uploaded local model information comprises: determining, from the uploaded local model information, the local model generated by each second electronic device in the i-th round; and, when the number of local models in the i-th round is detected to be less than or equal to the preset threshold, taking the local models of the i-th round as the first models of the i-th round. When the number of local models does not exceed the preset threshold, no classification is needed, which avoids unnecessary computational overhead.
According to the first aspect, before issuing the global model information of the i-th round of federated learning to each second electronic device, the method further comprises: obtaining, in response to model training requests sent by the at least two second electronic devices, the initial model matching the model training requests; and, after obtaining the trained federated learning model, the method further comprises: issuing the federated learning model to each second electronic device. In this way a second electronic device can trigger the first electronic device to start federated learning of the target model, meeting the model requirements of different second electronic devices.
In a second aspect, the present application provides an electronic device comprising a memory and a processor, the memory being coupled to the processor; the memory stores program instructions that, when executed by the processor, cause the electronic device to perform the federated learning method of the first aspect or of any implementation of the first aspect.
The second aspect and each of its implementations correspond to the first aspect and its respective implementations. For the technical effects of the second aspect and any of its implementations, reference may be made to the technical effects of the first aspect and the corresponding implementation, which are not repeated here.
In a third aspect, the present application provides a computer-readable medium storing a computer program which, when run on an electronic device, causes the electronic device to perform the federated learning method of the first aspect or of any implementation of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an exemplary horizontal federated learning system;
FIG. 2 is a schematic diagram of an exemplary structure of the central party;
FIG. 3 is a flow chart of the central party training the global model, according to an example;
FIG. 4 is a schematic diagram of an exemplary federated learning system;
FIG. 5 is a schematic diagram of an exemplary federated learning system;
FIG. 6 is a schematic diagram of an exemplary federated learning system determining the global model in an i-th round;
FIG. 7 is a flow chart of the central party training the global model, according to an example;
FIG. 8 is a schematic diagram of an exemplary federated learning system determining the global model in an i-th round;
FIG. 9 is a schematic structural diagram of an exemplary second electronic device;
FIG. 10 is a software structure diagram of an exemplary second electronic device.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone.
The terms first and second and the like in the description and in the claims of embodiments of the application, are used for distinguishing between different objects and not necessarily for describing a particular sequential order of objects. For example, the first target object and the second target object, etc., are used to distinguish between different target objects, and are not used to describe a particular order of target objects.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present application, unless otherwise indicated, the meaning of "a plurality" means two or more. For example, the plurality of processing units refers to two or more processing units; the plurality of systems means two or more systems.
Before describing the technical solution of the embodiments of the present application, a horizontal federated learning system according to an embodiment of the present application is first described with reference to fig. 1. The horizontal federated learning system includes a parameter server (i.e., a central party) and at least one client (i.e., a participant). In this example, the horizontal federated learning system includes G clients, G being an integer greater than 2. The process of horizontal federated learning is described below in conjunction with fig. 1:
s1, initializing global model parameters by a central party.
Specifically, the central party stores the structure of the global model for federated learning in advance and can initialize the parameters of the global model to obtain an initial global model (also called the initial model).
S2, the central party transmits the initial global model to each participant.
And S3, each participant trains a model by using the local data, and sends the trained model (or gradient) to the central party.
S4, the central party receives the local model (or gradient) sent by each participant to obtain each participant's local model, and performs an aggregation operation on the local models to obtain the global model of the round.
Specifically, each participant may send the local model obtained in this round of training to the central party, or may send the gradient of the locally trained model; in the latter case the central party determines the local model trained by each participant in this round from the received gradient and the issued global model.
S5, repeating the steps S2 to S4 until the global model converges or reaches the preset training round number.
When aggregating, the central party typically uses a federated averaging algorithm to determine the weight of each local model. For example, if there are 5 participants, the local models they train are denoted m1, m2, m3, m4, and m5 respectively. After the central party obtains these 5 local models, the federated averaging algorithm assigns each local model a weight of 1/5; that is, the global model of the round obtained by the central party is M = 1/5 * (m1 + m2 + m3 + m4 + m5). A global model obtained with this algorithm takes no account of the amount or distribution of each participant's training data when the local models were trained, nor of the performance differences between the local models, so the resulting global model has low accuracy. A sketch of this plain averaging is given below for contrast.
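A minimal sketch of the plain federated averaging described above, assuming each local model is represented as a flat numpy parameter vector; this is the equal-weight baseline that the application improves upon:

```python
import numpy as np

def federated_average(local_models):
    """Equal-weight aggregation: every local model gets weight 1/n."""
    stacked = np.stack(local_models)        # shape (n, num_params)
    return stacked.mean(axis=0)             # M = (1/n) * (m1 + ... + mn)

# Hypothetical example with 5 participants' models m1..m5
m1, m2, m3, m4, m5 = (np.random.rand(10) for _ in range(5))
M = federated_average([m1, m2, m3, m4, m5])
```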
In some embodiments, when a client reports its local model to the central party, it may also report a performance score for the model, for example the model loss, accuracy rate, and other such information. The central party may then determine the aggregation weight of each local model from its reported performance and aggregate the local models according to those weights to obtain the global model of the round. However, this approach requires the client to perform additional operations on the trained local model, which increases the client's cost, and it is also difficult to defend against clients maliciously reporting false performance scores.
In some embodiments, the central party may generate the global model by deep learning, for example by performing knowledge distillation on the participants' models at the central party to aggregate them, or by optimizing the aggregated model with an attention mechanism. However, this increases the computational complexity at the central party, causing its cost to rise sharply, so the method is difficult to apply widely.
The embodiment of the application provides a federated learning method applied to the central party. During federated learning, the central party can determine, from the local model information returned by the participants, the first models for the (i+1)-th round of aggregation, and dynamically determine a first aggregation weight for each first model in the (i+1)-th round. In this example, the first aggregation weight of each first model depends on how much that first model influences the aggregation operation, so well-performing local models receive high weights and poorly performing local models receive reduced weights, improving the accuracy of the aggregated global model and reducing the number of federated learning iterations. Since the central party does not need knowledge distillation to perform the aggregation, its computational complexity is reduced and the cost of model aggregation at the central party is lowered.
Fig. 2 is a schematic diagram of an exemplary structure of the central party. In this example, the central party may be a server and comprises: a first model determining module, a first influence estimate determining module, a first aggregation weight determining module, and an aggregation module. The first model determining module determines the first models for the (i+1)-th round of aggregation from the local model information reported by each participant in the i-th round. The first influence estimate determining module determines a first influence estimate for each first model. The first aggregation weight determining module determines a first aggregation weight for each first model based on its first influence estimate. The aggregation module aggregates the first models according to their first aggregation weights to obtain the global model of the (i+1)-th round; when it detects that the global model does not meet the training termination condition, it may also issue the (i+1)-th round global model to each participant.
The process of the central party training the global model is described in detail below in connection with fig. 3. In this example, the first electronic device, which may be a server, acts as the central party. A second electronic device, acting as a participant, may be any device with training capability, for example a server, a mobile phone, or a personal computer.
Step 301: the first electronic device issues the global model information of the i-th round of federated learning to each second electronic device.
In some embodiments, the first electronic device generates, according to a model training instruction entered by a user, an initial model matching that instruction. For example, if model training instruction A entered by the user indicates training a face recognition model, the first electronic device, after receiving instruction A, may obtain an initial face recognition model matching it. Optionally, the initial model includes the model structure and each parameter of the initial model.
Optionally, the first electronic device may also receive model training requests sent by a plurality of second electronic devices and determine, according to the received requests, an initial model matching them. For example, as shown in fig. 4, mobile phone A and mobile phone B send model training request 1 to the server, and the initial model corresponding to model training request 1 is assumed to be model A. When the server receives model training request 1, it finds that the initial model matching request 1 is model A, and the server can send model A to mobile phone A and mobile phone B as the global model information. That is, the server acts as the central party, and mobile phone A and mobile phone B act as the participants.
It should be noted that the first electronic device may receive a plurality of different model training requests and may obtain, for each request, the initial model matching it. The first electronic device then issues each initial model to the senders of the model training requests that match it.
For example, as shown in fig. 5, in a smart-services scenario, mobile phone A and mobile phone B send model training request 1 (for a face recognition function) to the server, and mobile phone C and mobile phone D send model training request 2 (for a road safety identification function) to the server. Assume the initial model corresponding to model training request 1 is model A and the initial model corresponding to model training request 2 is model B. When the server receives the requests, it obtains model A corresponding to model training request 1 and model B corresponding to model training request 2. The server sends model A to mobile phone A and mobile phone B, and sends model B to mobile phone C and mobile phone D. It can be appreciated that the server, mobile phone A, and mobile phone B form a first federated learning system, while the server, mobile phone C, and mobile phone D form a second federated learning system. The first federated learning system trains a face recognition model, and the second federated learning system trains a road safety identification model.
The first electronic device takes the obtained initial model as the global model of the first round of federated learning and may issue this first-round global model, as the global model information, to each second electronic device.
Each second electronic device receives the global model information issued by the first electronic device for the first time and obtains the initial model from the first-round global model information. Each second electronic device can then train the initial model on its local training data and update the parameters of the initial model to obtain its first-round local model. The training data in a second electronic device may be data stored in that device in advance, or data it obtains from other electronic devices. For example, if the federated learning trains a face recognition model, the training data in mobile phone A (a second electronic device) may be images containing faces downloaded from the network, or images containing faces stored in mobile phone A's gallery.
A second electronic device may upload its local model to the server as the first-round local model information, or may upload the parameters or gradient of its first-round local model as the first-round local model information.
The first electronic device may execute steps 302 to 306 after receiving the local model of the first round reported by each second electronic device. After the first electronic device determines the global model of the second round, the global model of the second round may be used as global model information of the second round, or gradients of the global model of the second round may be used as global model information of the second round. Similarly, when i is an integer greater than 2, the global model information of the ith round may be the global model of the ith round or the gradient of the global model of the ith round.
Step 302: the first electronic device determines a first model for the (i+1) th round of aggregation operation according to each piece of uploaded local model information.
In some embodiments, the following is done for each piece of i-th round local model information: when the first electronic device detects that the current local model information includes a gradient, it determines, from the i-th round global model and the gradient, the local model uploaded by that second electronic device in the i-th round and takes that local model as a first model; when it detects that the current local model information includes the local model itself, it takes that local model directly as a first model.
For example, the participants include mobile phone A and mobile phone B. After receiving the initial model, mobile phone A trains on its local training data to obtain the local model M_a1, takes the corresponding information as its round-1 local model information A, and uploads local model information A to the server. After receiving the initial model, mobile phone B trains on its local training data to obtain the local model M_b1, takes the local model M_b1 as its round-1 local model information B, and uploads local model information B to the server. The server receives local model information A uploaded by mobile phone A and local model information B uploaded by mobile phone B in round 1. When the server detects that local model information A includes a gradient, it can determine, from the initial model and the gradient, the local model M_a1 obtained by mobile phone A after its round-1 local training, and take M_a1 as a first model of round 1. When the server detects that local model information B includes the local model M_b1, it can take M_b1 directly as a second first model of round 1.
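As an illustration of recovering a local model from an uploaded gradient, the sketch below assumes the participant reports the update relative to the issued global model and that one gradient step with a known learning rate reproduces the local model; the application does not fix this exact relationship, so the recovery rule here is an assumption:

```python
import numpy as np

def recover_local_model(issued_global_model, uploaded, is_gradient, learning_rate=1.0):
    """Reconstruct a participant's local model from its uploaded information.

    If the upload is the model itself, use it directly; if it is a gradient
    (assumed to be relative to the issued global model), apply it."""
    if not is_gradient:
        return uploaded                                         # model uploaded directly
    return issued_global_model - learning_rate * uploaded       # assumed recovery rule

# Round-1 example: phone A uploads a gradient, phone B uploads its model
initial_model = np.zeros(10)
m_a1 = recover_local_model(initial_model, np.random.rand(10), is_gradient=True)
m_b1 = recover_local_model(initial_model, np.random.rand(10), is_gradient=False)
```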
Step 303: the first electronic device obtains a first influence estimate for each first model.
After the first electronic device obtains each first model, a first influence estimation value of each first model may be determined according to the first model, where the first influence estimation value of the first model is used to evaluate an influence of the first model on an aggregation operation of the i+1st round, that is, an influence of the first model on a global model of the i+1st round to be determined.
In some embodiments, the first electronic device may aggregate each first model to generate a first reference model for the (i+1) -th round; respectively obtaining second models corresponding to the first models, wherein the second model corresponding to the j-th first model is a model obtained by aggregating the first models except the j-th first model, j is an integer greater than or equal to 1, and the maximum value of j is equal to the number of the first models; and acquiring a first influence estimated value of each first model according to the first reference model and the second model corresponding to each first model.
Specifically, the first electronic device may construct the first reference model of the (i+1)-th round from all first models of the current i-th round. The first electronic device may aggregate the first models with a federated averaging algorithm and take the resulting model as the first reference model; for example, it takes the reciprocal of the number of first models in the i-th round as the second aggregation weight of each first model and aggregates the first models accordingly to generate the first reference model. Denoting each first model as m_j and the number of first models in the i-th round as n, the first reference model is expressed as M_avg = (1/n) * Σ_{j=1..n} m_j.
the process of generating the second model corresponding to each first model by the first electronic equipment is as follows: the first electronic equipment acquires a first model in the ith round, aggregates the first models except for the jth first model, and acquires a second model corresponding to the jth first model. Each first model is denoted as m j And if the number of the first models of the ith round is n, the second model corresponding to the jth first model is expressed as:
For example, round 1 has 5 first models: m_1, m_2, m_3, m_4, and m_5. The first electronic device aggregates m_2, m_3, m_4, and m_5 with the federated averaging algorithm to obtain the model M_{-1} = 1/4 * (m_2 + m_3 + m_4 + m_5), which is the second model corresponding to m_1. It aggregates m_1, m_3, m_4, and m_5 to obtain M_{-2} = 1/4 * (m_1 + m_3 + m_4 + m_5), the second model corresponding to m_2; aggregates m_1, m_2, m_4, and m_5 to obtain M_{-3} = 1/4 * (m_1 + m_2 + m_4 + m_5), the second model corresponding to m_3; aggregates m_1, m_2, m_3, and m_5 to obtain M_{-4} = 1/4 * (m_1 + m_2 + m_3 + m_5), the second model corresponding to m_4; and aggregates m_1, m_2, m_3, and m_4 to obtain M_{-5} = 1/4 * (m_1 + m_2 + m_3 + m_4), the second model corresponding to m_5.
After the first reference model and the second model corresponding to each first model are determined, the first electronic device can obtain the evaluation index of the first reference model; the evaluation index of a model may be its accuracy, precision, recall, F1 score, or the like. The first electronic device likewise obtains the evaluation index of the second model corresponding to each first model. It can then determine the first influence estimate of each first model from the evaluation index of the first reference model and that of the corresponding second model: the difference between the evaluation index of the first reference model and the evaluation index of the second model corresponding to the j-th first model is taken as the first influence estimate of the j-th first model, denoted Influence_j = score - score_{-j}, where score is the evaluation index of the first reference model and score_{-j} is the evaluation index of the second model corresponding to the j-th first model.
FIG. 6 is a schematic diagram of the federated learning system in this example determining the global model in one round. As shown in FIG. 6, there are n participants; suppose each participant is sending local model information to the central party for the second time. The central party determines each participant's local model from its local model information, obtaining n local models. The central party aggregates the n local models with the federated averaging algorithm to obtain a reference model (i.e., the first reference model). The central party also aggregates, with the federated averaging algorithm, all local models except local model 1 to obtain the model M_{-1} corresponding to local model 1 (i.e., the second model corresponding to local model 1). Similarly, it aggregates all local models except local model n to obtain the model M_{-n} corresponding to local model n (i.e., the second model corresponding to local model n); the second models corresponding to the other local models are determined in the same way and are not listed one by one.
As shown in FIG. 6, the central party can evaluate the reference model M_avg on its own test data and take the result as the evaluation score, denoted score (i.e., the evaluation index of the first reference model). In the same way, the central party evaluates the model M_{-1} to obtain score_{-1} (i.e., the evaluation index of the second model corresponding to the first local model), and likewise obtains the evaluation scores of all n second models. The first influence estimate of local model 1 is Influence_1 = score - score_{-1}, the first influence estimate of local model 2 is Influence_2 = score - score_{-2}, ..., and the first influence estimate of local model n is Influence_n = score - score_{-n}.
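A sketch of the evaluation flow in FIG. 6, assuming the central party holds a small labeled test set, uses plain classification accuracy as the evaluation score, and aggregates with equal weights; `predict` stands in for however a model actually produces predictions and is an assumption of this sketch:

```python
import numpy as np

def accuracy(model_params, predict, x_test, y_test):
    """Evaluation index used here: accuracy on the central party's test data."""
    return float(np.mean(predict(model_params, x_test) == y_test))

def influence_scores(local_models, predict, x_test, y_test):
    """score and score_{-j} as in FIG. 6, with Influence_j = score - score_{-j}."""
    models = np.stack(local_models)
    m_avg = models.mean(axis=0)                                  # reference model M_avg
    score = accuracy(m_avg, predict, x_test, y_test)             # score of M_avg
    influences = []
    for j in range(len(local_models)):
        m_minus_j = np.delete(models, j, axis=0).mean(axis=0)    # M_{-j}
        influences.append(score - accuracy(m_minus_j, predict, x_test, y_test))
    return influences
```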
In some embodiments, the local model information uploaded by a second electronic device may further include the number of training samples used to train the local model. After the first electronic device receives the local model information, it may determine a first model from that information and determine the number of training samples of the first model from the number reported for the local model. The first electronic device then determines the second aggregation weight of each first model from its number of training samples and aggregates the first models according to their second aggregation weights to obtain the first reference model.
The process of determining the first reference model and each second model in another aggregation manner is described below with reference to fig. 6:
As shown in fig. 6, the local model information uploaded by participant 1 in the i-th round includes the gradient of local model 1 and the number of training samples num_1 used to train local model 1; the local model information uploaded by participant 2 includes the gradient of local model 2 and the number of training samples num_2 used to train local model 2; the local model information uploaded by participant 3 includes the gradient of local model 3 and the number of training samples num_3 used to train local model 3; ...; and the local model information uploaded by participant n includes the gradient of local model n and the number of training samples num_n used to train local model n.
The central party can determine the local model obtained by training each participant in the ith round according to the local model information uploaded by each participant in the ith round. For example, the central party may determine the local model 1 based on the gradient of the local model 1 and the global model determined in the previous round.
The central party obtains the total number of training samples used to generate all first models of the i-th round, and takes the quotient of each first model's number of training samples and this total as that first model's second aggregation weight. The central party then aggregates the first models according to their second aggregation weights to generate the first reference model.
For example, the central party obtains the total number of training samples of all first models of the i-th round, num_sum = num_1 + num_2 + ... + num_n. The second weight of local model 1 (i.e., a first model) is w_21 = num_1/num_sum, the second weight of local model 2 is w_22 = num_2/num_sum, ..., and the second weight of local model n (i.e., a first model) is w_2n = num_n/num_sum. The central party aggregates the first models according to their second aggregation weights to obtain the first reference model M_avg = Σ_{j=1..n} w_2j * m_j.
The central party generates the second model corresponding to each first model as follows: the first electronic device takes the first models of the i-th round and aggregates all of them except the j-th first model to obtain the second model corresponding to the j-th first model. With each first model denoted m_j and n first models in the i-th round, the second model corresponding to the j-th first model is a weighted sum of the remaining first models, each remaining model m_k being weighted by its share of the training samples among the remaining models (e.g., num_k / (num_sum - num_j)).
for example, the first model of round 1 has 5, m1, m2, m3, m4, and m5, respectively. Training samples for m1 are num1, m2 are num2, m3 are num3, m4 are num4 and m5 are num5. The total number of training samples for all the first models of this 1 st round is noted as sum (i.e., sum=num 1+num2+num3+num4+num 5).
According to each first model's number of training samples, the first electronic device aggregates m2, m3, m4, and m5 to obtain the model M_{-1}, i.e., the second model corresponding to m1. The central party aggregates m1, m3, m4, and m5 to obtain M_{-2}, the second model corresponding to m2; aggregates m1, m2, m4, and m5 to obtain M_{-3}, the second model corresponding to m3; aggregates m1, m2, m3, and m5 to obtain M_{-4}, the second model corresponding to m4; and aggregates m1, m2, m3, and m4 to obtain M_{-5}, the second model corresponding to m5.
In this example, when the second aggregation weight of each first model is determined from the number of first models, no additional information uploaded by the second electronic devices is needed to compute the weight, so false information reported by malicious participants is effectively defended against, poorly performing models are prevented from taking a high weight in the (i+1)-th round of aggregation, and the risk of degrading the global model's performance is avoided.
In some embodiments, the local model information uploaded by a second electronic device may further include a training log of the training that produced the local model, and the training log may include evaluation indexes of the local model trained by the second electronic device (such as accuracy, recall, and F1 score). The first electronic device may determine each first model from the local model information; the process of determining a first model is not described again. In this example, taking a training log that includes accuracy as an example, the process of determining the first reference model and the second model corresponding to each first model is described.
Specifically, the first electronic device obtains the accuracy of each first model and may use it as that first model's second aggregation weight. The first electronic device aggregates the first models according to their second aggregation weights to generate the first reference model. Similarly, the first electronic device may aggregate the first models other than the j-th first model according to their second aggregation weights to obtain the second model corresponding to the j-th first model.
Step 304: the first electronic device determines a first aggregation weight for each first model based on its first influence estimate.
In this example, the first electronic device adjusts the second aggregation weight of a first model based on that model's first influence estimate and uses the adjusted second aggregation weight as the model's first aggregation weight.
Optionally, the first electronic device may obtain the average of the first influence estimates of all first models in the i-th round as the average influence estimate, and then perform the following for each first model: the first electronic device obtains the difference between that first model's first influence estimate and the average influence estimate as a second difference, and obtains the sum of that first model's second aggregation weight and the second difference as its first aggregation weight.
For example, each first model is denoted m_j, where j starts at 1 and its maximum value is n, and the number of first models in the i-th round is n. The first influence estimate of the j-th first model is written Influence_j = score - score_{-j}. The first electronic device obtains the average of the first influence estimates of all first models in the i-th round; this average influence estimate is expressed as Influence_avg = (1/n) * Σ_{j=1..n} Influence_j. The second difference of the j-th first model is Influence_j - Influence_avg. Denoting the second aggregation weight of the j-th first model as w_2j, the first electronic device determines the first aggregation weight of the j-th first model from its second aggregation weight, its first influence estimate, and the average influence estimate: w_j = w_2j + (Influence_j - Influence_avg).
If the second aggregation weight of the j-th first model is 1/n (the equal-weight case), the first aggregation weight of the j-th first model is w_j = 1/n + (Influence_j - Influence_avg).
If the second aggregation weight of the j-th first model is num_j/num_sum (the sample-count case), the first aggregation weight of the j-th first model is w_j = num_j/num_sum + (Influence_j - Influence_avg).
Step 305: the first electronic equipment aggregates each first model according to the first aggregation weight of each first model to obtain a global model of the (i+1) th round.
Illustratively, after determining the first aggregation weight of each first model, the first electronic device may aggregate the first models according to these weights to obtain the global model of the (i+1)-th round. As shown in fig. 6, the first electronic device determines that the aggregation weight of local model 1 is w_1 (i.e., its first aggregation weight), that of local model 2 is w_2, ..., and that of local model n is w_n. The first electronic device aggregates the local models according to their aggregation weights to obtain the global model M_weighted = Σ_j w_j * m_j, where m_j denotes the j-th local model and w_j its first aggregation weight.
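A short sketch of this weighted aggregation, assuming each first model is a flat numpy parameter vector and the first aggregation weights have already been computed as described above:

```python
import numpy as np

def weighted_aggregate(first_models, first_weights):
    """Global model of round i+1: M_weighted = sum_j w_j * m_j."""
    stacked = np.stack(first_models)                  # shape (n, num_params)
    weights = np.asarray(first_weights, dtype=float)
    return np.tensordot(weights, stacked, axes=1)
```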
Step 306: the first electronic device issues the (i+1)-th round global model information, obtained from the (i+1)-th round global model, to each second electronic device for the next round of federated learning, until a preset training termination condition is met and the trained federated learning model is obtained.
For example, after determining the global model of the (i+1)-th round, the first electronic device may detect whether a preset training termination condition is currently satisfied. The training termination condition may be that the global model is detected to have converged, or that the number of rounds has reached a preset number, for example 10 or 20 rounds.
When the first electronic device detects that the (i+1)-th round global model has converged, or that the current round count i+1 has reached the preset number, it takes the (i+1)-th round global model as the trained federated learning model.
When the first electronic device detects that the preset training termination condition is not yet met, it generates the (i+1)-th round global model information, which comprises the (i+1)-th round global model or the gradient of that global model. The first electronic device issues the (i+1)-th round global model information to each second electronic device and continues with the next round of training, i.e., steps 301 to 306 are executed again.
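The outer loop implied by steps 301 to 306 might look like the following sketch; `broadcast`, `collect`, and `aggregate_round` are hypothetical placeholders for the per-round operations described above, and the convergence test is one possible reading of the termination condition:

```python
import numpy as np

def train_federated(initial_model, broadcast, collect, aggregate_round,
                    max_rounds=20, tolerance=1e-4):
    """Repeat rounds (steps 301-306) until the global model converges or the
    preset number of rounds is reached."""
    global_model = initial_model
    for _ in range(max_rounds):
        broadcast(global_model)                                    # issue global model info
        local_infos = collect()                                    # local model info from participants
        new_global = aggregate_round(global_model, local_infos)    # steps 302-305
        if np.linalg.norm(new_global - global_model) < tolerance:  # convergence check
            return new_global
        global_model = new_global
    return global_model
```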
In the federated learning process, the first electronic device (the central party) can aggregate the i-th round local models of the participants to obtain the first reference model and obtain its evaluation index. It also aggregates all local models other than the j-th one to obtain the second model of the j-th local model and obtains that model's evaluation index. Because the j-th local model is not included when its second model is aggregated, while the first reference model contains information from every local model of the i-th round, the difference between the evaluation index of the first reference model and that of the j-th local model's second model indicates how much the j-th local model influences the (i+1)-th round aggregation of the local models. The first electronic device can use this influence to adjust the second aggregation weight of the j-th local model, so that in the (i+1)-th round the local models are aggregated with the adjusted weights; this raises the aggregation weights of well-performing local models and lowers those of poorly performing ones, providing a good defense against bad local models (for example, models with random parameters). Meanwhile, if an averaging aggregation algorithm is used when building the first reference model, no information fed back by the participants is relied upon, so dummy data fed back by participants cannot affect the (i+1)-th round aggregated global model, which reduces the risk of model performance degradation.
In some embodiments, if there are many participants, for example more than a preset value, the central party may, after acquiring the local model trained by each participant in the ith round, classify the acquired local models to obtain an aggregated model for each class. The central party can take the aggregated model of each class as a first model; the classification operation reduces the number of first models taking part in the (i+1)th round of aggregation, thereby reducing the amount of computation for determining the global model of the (i+1)th round.
The federal learning process is described in detail below in conjunction with fig. 7 and 8.
Step 701: the first electronic device transmits global model information of the i-th round federal learning to the second electronic device.
This step is similar to step 301; reference may be made to the related description of step 301, which will not be repeated here.
Step 702: and the first electronic equipment determines a local model generated by each second electronic equipment in the ith round according to the uploaded local model information.
For example, the first electronic device may determine, based on each local model information, a local model generated by each second electronic device in the ith round of training. Alternatively, when the local model (the local model includes the structure of the model and each parameter in the model) is included in the local model information, the first electronic device may directly acquire the local model.
Optionally, the local model information includes a gradient of the local model, and the first electronic device may determine, according to the global model of the ith round and the gradient of the local model, a local model generated by training the second electronic device of the ith round.
Alternatively, the types of information contained in the local model information uploaded by the different second electronic devices may be different. For example, the local model information uploaded by participant 1 includes the local model 1 trained on the ith round; the local model information uploaded by participant 2 includes the gradient of the local model 2 trained on the ith round.
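As an illustrative sketch only: when a participant uploads the gradient of its local model rather than the model itself, the central party can reconstruct the local model from the global model of the ith round. The sign and step-size convention below are assumptions, since the description does not fix how the uploaded gradient is defined.

```python
import numpy as np

def reconstruct_local_model(global_model_i, uploaded_gradient, lr=1.0):
    """Recover a participant's round-i local model from the issued global model and
    the uploaded gradient. Here the gradient is assumed to be applied as a single
    descent step of size lr on each parameter array; other conventions (for example,
    uploading the accumulated update local_model - global_model) work analogously."""
    return [g - lr * grad for g, grad in zip(global_model_i, uploaded_gradient)]
```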
In some embodiments, the first electronic device obtains a number of local models in an i-th round, and detects whether the number of local models is greater than a preset threshold; if the first electronic device detects that the number of local models is greater than the preset threshold, determining to execute step 703; if the first electronic device detects that the number of local models is less than or equal to the preset threshold, determining that the local model in the ith round is used as the first model in the ith round, and not executing step 703 and step 704. The preset threshold may be set according to the actual application, and for example, the preset threshold may be 10, 30, 50, 100, or the like.
Step 703: the first electronic device classifies the local model in the ith round into N classes.
In some embodiments, the first electronic device obtains a local model generated by training each second electronic device in the ith round, and the first electronic device may classify all local models in the ith round into N classes. Optionally, the first electronic device may divide the local model of the ith round into N classes by adopting a K-means clustering manner, and the process of the K-means clustering method is not described herein.
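One possible realization of this clustering step, assuming scikit-learn is available and that each local model is represented as a list of parameter arrays, is sketched below; the K-means settings are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_local_models(local_models, n_classes):
    """Cluster the ith-round local models into n_classes groups by K-means over
    their flattened parameter vectors (one possible realization of step 703)."""
    vectors = np.stack([np.concatenate([np.ravel(p) for p in m]) for m in local_models])
    labels = KMeans(n_clusters=n_classes, n_init=10).fit_predict(vectors)
    # Group model indices by assigned class.
    classes = {c: [] for c in range(n_classes)}
    for idx, c in enumerate(labels):
        classes[int(c)].append(idx)
    return classes
```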
In some embodiments, the first electronic device may further divide the local models of the ith round into N classes according to the similarity of the local models. The first electronic device may use the distance between each local model and the second reference model as the basis for determining similarity among the local models, or may use the variation between each local model of the current round and the global model of the current round as that basis.
Optionally, the first electronic device may further aggregate the local model generated by each second electronic device in the ith round to obtain a second reference model; respectively acquiring the distance between each local model and the second reference model; and determining the class to which each local model belongs according to each acquired distance and N preset distance threshold ranges.
Specifically, the first electronic device may aggregate each local model in the ith round by using a federal averaging algorithm to obtain the second reference model. The first electronic device may store in advance distance threshold ranges corresponding to the N classes, for example, the first electronic device stores distance threshold ranges corresponding to 3 classes, where the 3 distance threshold ranges do not intersect with each other. The first electronic equipment respectively acquires the distance between each local model and the second reference model, acquires the distance threshold range where each distance is located, and classifies the local models corresponding to the distances in the same distance threshold range into one class.
For example, suppose the local models are divided into 3 classes, where the distance threshold range corresponding to the first class is Th1, that corresponding to the second class is Th2, and that corresponding to the third class is Th3. The second reference model is denoted M_avg2, and there are 10 local models in the ith round, denoted m1, m2, m3, m4, m5, m6, m7, m8, m9 and m10. The first electronic device calculates the distance between each local model and the second reference model: the distance between m1 and the second reference model is L1 = ||m1 - M_avg2||, the distance between m2 and the second reference model is L2 = ||m2 - M_avg2||, and similarly the first electronic device obtains the distances between the other local models and the second reference model, denoted L3, L4, ..., L10. The first electronic device detects that the distances L1, L4 and L7 are within the range of Th1, and classifies m1, m4 and m7 into the first class; it detects that the distances L2, L5, L6 and L10 are within the range of Th2, and classifies m2, m5, m6 and m10 into the second class; and it detects that the distances L3, L8 and L9 are within the range of Th3, and classifies m3, m8 and m9 into the third class.
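The threshold-range classification in this example might be sketched as follows; the interval values (Th1, Th2, Th3) and the use of the Euclidean norm as the distance are assumptions made for illustration. The same structure also applies to the variation-based classification described next, with the second reference model replaced by the global model M_i issued in the ith round.

```python
import numpy as np

def classify_by_distance(local_models, reference_model, thresholds):
    """Assign each local model to the class whose distance interval contains its
    distance to the second reference model. `thresholds` is a list of (low, high)
    intervals, e.g. Th1, Th2, Th3 in the example above (values assumed).
    Models whose distance falls in no interval are left unassigned."""
    classes = {k: [] for k in range(len(thresholds))}
    ref = np.concatenate([np.ravel(p) for p in reference_model])
    for idx, model in enumerate(local_models):
        vec = np.concatenate([np.ravel(p) for p in model])
        dist = float(np.linalg.norm(vec - ref))
        for k, (low, high) in enumerate(thresholds):
            if low <= dist < high:
                classes[k].append(idx)
                break
    return classes
```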
The first electronic device may also divide the local models of the ith round into N classes in other classification manners. Optionally, the first electronic device may obtain, for each second electronic device, the variation between the local model uploaded in the ith round and the global model issued in the ith round, and determine the class to which each local model belongs according to each acquired variation and N preset variation threshold ranges.
Specifically, the first electronic device may store in advance the variation threshold range corresponding to each class, obtain the variation threshold range within which the variation between the jth local model of the ith round and the global model issued in the ith round falls, and assign the jth local model to the class corresponding to that variation threshold range.
For example, suppose the local models are divided into 3 classes, where the variation threshold range corresponding to the first class is Var1, that corresponding to the second class is Var2, and that corresponding to the third class is Var3. The global model issued in the ith round is denoted M_i, and there are 10 local models in the ith round, denoted m1, m2, m3, m4, m5, m6, m7, m8, m9 and m10. The first electronic device calculates the variation between each local model and M_i: the variation between m1 and M_i is D1 = m1 - M_i, the variation between m2 and M_i is D2 = m2 - M_i, and similarly the first electronic device obtains the variations between the other local models and M_i, denoted D3, D4, ..., D10. The first electronic device detects that the variations D1, D4 and D7 are within the range of Var1, and classifies m1, m4 and m7 into the first class; it detects that the variations D2, D5, D6 and D10 are within the range of Var2, and classifies m2, m5, m6 and m10 into the second class; and it detects that the variations D3, D8 and D9 are within the range of Var3, and classifies m3, m8 and m9 into the third class.
Step 704: the first electronic equipment aggregates the local models in each class to obtain N first models.
Specifically, the first electronic device may aggregate the local models within each class to obtain N first models. The first electronic device may aggregate the local models in each class by average aggregation. Optionally, the first electronic device may instead determine a third aggregation weight for each local model according to the number of training samples of that local model, and aggregate the local models in the class according to the third aggregation weight of each local model in the class to obtain the first model of that class.
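A small sketch of this within-class aggregation, assuming each local model is a list of NumPy arrays and that training-sample counts may or may not have been uploaded, could look like this:

```python
import numpy as np

def aggregate_class(class_models, sample_counts=None):
    """Aggregate the local models of one class into that class's first model.
    If sample counts were uploaded, weight each local model by its share of the
    class's training samples (the third aggregation weight); otherwise average
    all local models in the class equally."""
    n = len(class_models)
    if sample_counts is None:
        weights = [1.0 / n] * n
    else:
        total = float(sum(sample_counts))
        weights = [c / total for c in sample_counts]
    return [sum(w * layer for w, layer in zip(weights, layers))
            for layers in zip(*class_models)]
```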
For example, suppose 7 participants each upload to the central party a local model trained in the ith round. In this example, the preset threshold is set to 6; the first electronic device detects that the number of local models is greater than 6 and determines to divide the 7 local models into 4 classes. The first electronic device stores the 4 classes and the variation threshold range corresponding to each class. As shown in fig. 8, participant 1 uploaded local model 1, participant 2 uploaded local model 2, ..., and participant 7 uploaded local model 7. The central party may obtain the variation between each local model and the global model of the ith round and divide the 7 local models into 4 classes according to each variation and the variation threshold ranges corresponding to the 4 classes; for the classification process, reference may be made to step 703, which is not repeated here. The first class includes local model 1, local model 2 and local model 3; the second class includes local model 4 and local model 5; the third class includes local model 6; the fourth class includes local model 7. The central party can aggregate the three local models of the first class using the federal averaging algorithm to obtain merge model 1 (i.e., the first model corresponding to the first class), and aggregate the two local models of the second class in the same way to obtain merge model 2. The local model of the third class is aggregated to obtain merge model 3, and the local model of the fourth class is aggregated to obtain merge model 4. It can be understood that the third class contains only one local model, so its aggregated merge model is that local model itself.
In this example, the central party takes the merge model 1, the merge model 2, the merge model 3, and the merge model 4 as 4 first models.
Optionally, if the second electronic devices upload the number of training samples of each local model, the number of training samples of a first model may be determined from the local models of its class; for example, the first electronic device may use the sum of the training-sample counts of all local models in the class as the number of training samples of that class's first model.
For example, as shown in fig. 8, the number of training samples of local model 1 is num1, the number of training samples of local model 2 is num2, and the number of training samples of local model 3 is num3. The number of training samples of merge model 1 is then num_1 = num1 + num2 + num3.
Step 705: the first electronic device obtains a first impact estimate for each first model.
This step is similar to step 303, and reference may be made to the description related to step 303, which will not be repeated here.
Step 706: the first electronic device determines a first aggregate weight for each first model based on the first impact estimate for each first model.
This step is similar to step 304 and reference may be made to the description of step 304, which will not be repeated here.
Step 707: the first electronic device aggregates each first model according to the first aggregation weight of each first model to obtain the global model of the (i+1)th round.
This step is similar to step 305 and reference may be made to the description of step 305, which will not be repeated here.
Step 708: the first electronic device issues global model information of the (i+1)th round to each second electronic device according to the global model of the (i+1)th round, so as to perform federal learning of the next round until a preset training termination condition is met and a trained federal learning model is obtained.
This step is similar to step 306 and reference may be made to the description of step 306, which will not be repeated here.
In this example, when the number of participants is large (i.e., the number of participants exceeds a preset threshold such as 10 or 20), the central party may classify the local models of the current round and aggregate the local models within each class to obtain the first models used for determining the global model of the next round. Through the classification and aggregation operations, the number of first models is reduced, thereby reducing the computation required to subsequently determine the global model of the next round.
In the embodiments of the present application, the second electronic device serves as a participant, and may be a smartphone, a tablet computer, a notebook computer, or the like.
Fig. 9 is a schematic structural diagram of a second electronic device 100 according to an embodiment of the present application. It should be understood that fig. 9 illustrates the second electronic device 100 as only one example of a second electronic device, and that the electronic device 100 may have more or fewer components than shown in the figures, may combine two or more components, or may have a different configuration of components. The various components shown in fig. 9 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
The second electronic device 100 may include: processor 110, external memory interface 120, internal memory 121, universal serial bus (universal serial bus, USB) interface 130, charge management module 140, power management module 141, battery 142, antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, headset interface 170D, sensor module 180, keys 190, motor 191, indicator 192, camera 193, display 194, and subscriber identity module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor, a gyroscope sensor, a barometric sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
Fig. 10 is a software structural block diagram of the second electronic device 100 according to the embodiment of the present application.
The layered architecture of the second electronic device 100 divides the software into several layers, each with a distinct role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into three layers, namely an application layer, an application framework layer and a kernel layer, from top to bottom. It will be appreciated that the layers and the components contained in the layers in the software structure of fig. 10 do not constitute a specific limitation on the electronic device 100. In other embodiments of the application, the electronic device 100 may include more or fewer layers than shown, and each layer may include more or fewer components; the application is not limited in this respect.
As shown in fig. 10, the application layer may include a series of application packages. The application packages may include model training applications, maps, WLANs, calendars, short messages, gallery, conversations, navigation, bluetooth, camera, etc. applications. The application framework layer provides an application programming interface (application programming interface, API) and programming framework for application programs of the application layer. The application framework layer includes a number of predefined functions.
The model training application may be configured to send model training instructions to the first electronic device based on the model training instructions entered by the user. The model training application can acquire training data according to model training instructions, wherein the training data can be data in a gallery or can be training data downloaded from a network. After the model training application receives the global model issued by the first electronic device, model training can be performed according to the global model and training data, and the trained local model is uploaded to the first electronic device.
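For illustration only, the participant-side behaviour of the model training application might be sketched in PyTorch as follows; the model architecture, loss function, optimizer, and hyper-parameters are assumptions, and the actual application is not limited to this form.

```python
import torch

def local_training_round(global_state_dict, model, train_loader, epochs=1, lr=0.01):
    """One local training round on the participant: load the issued global model,
    train on local training data, and return the updated local model's state for
    upload to the first electronic device. The loss and optimizer are assumptions."""
    model.load_state_dict(global_state_dict)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
    return model.state_dict()
```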
As shown in fig. 10, the application framework layer may include a window manager, a resource manager, a content provider, a view system, a phone manager, a notification manager, and the like.
The window manager is used for managing window programs. The window manager can acquire the size of the display screen, determine whether a status bar exists, lock the screen, capture the screen (take screenshots), and the like.
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc.
The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.
The telephony manager is used to provide the communication functions of the electronic device 100. Such as the management of call status (including on, hung-up, etc.).
The notification manager allows an application to display notification information in the status bar and can be used to convey notification-type messages; a notification can automatically disappear after a short stay without requiring user interaction. For example, the notification manager is used to indicate that a download is complete, to present message alerts, and so on. The notification manager may also present notifications in the form of a chart or scroll-bar text in the system top status bar, such as notifications of applications running in the background, or notifications that appear on the screen in the form of a dialog window. For example, text information is prompted in the status bar, an alert sound is emitted, the electronic device vibrates, or an indicator light blinks.
The kernel layer is a layer between hardware and software. The inner core layer at least comprises a display driver, a camera driver, an audio driver and a sensor driver.
It will be appreciated that the electronic device, in order to achieve the above-described functions, includes corresponding hardware and/or software modules that perform the respective functions. The present application can be implemented in hardware or a combination of hardware and computer software, in conjunction with the example algorithm steps described in connection with the embodiments disclosed herein. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application in conjunction with the embodiments, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The present embodiment also provides a computer storage medium having stored therein computer instructions which, when executed on an electronic device, cause the electronic device to perform the above-described related method steps to implement the federal learning method in the above embodiments. The storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The present embodiment also provides a computer program product which, when run on a computer, causes the computer to perform the above-described related steps to implement the method of federal learning in the above-described embodiments.
The electronic device, the computer storage medium, the computer program product, or the chip provided in this embodiment are used to execute the corresponding methods provided above, so that the beneficial effects thereof can be referred to the beneficial effects in the corresponding methods provided above, and will not be described herein.
Any of the various embodiments of the present application, and any features within the same embodiment, may be freely combined. Any such combination falls within the scope of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.

Claims (18)

1. A method of federal learning, applied to a first electronic device communicatively coupled to at least two second electronic devices, the method comprising:
issuing global model information of i-th round federal learning to each second electronic device, so that each second electronic device performs local model training based on the global model information of the i-th round, and uploading the local model information of the i-th round to the first electronic device, wherein if i is an integer greater than 1, the global model information comprises global model or gradient information of the global model, and if i=1, the global model information is a preset initial model, and the local model information comprises local model or gradient information of the local model;
determining a first model for the (i+1) -th round of aggregation operation according to each piece of uploaded local model information;
acquiring a first influence estimation value of each first model, wherein the first influence estimation value of each first model is used for evaluating the influence of the first model on the aggregation operation of the (i+1) th round;
determining a first aggregate weight for each first model based on the first impact estimate for each first model;
aggregating each first model according to the first aggregation weight of each first model to obtain a global model of the (i+1)th round;
and according to the global model of the (i+1)th round, issuing the global model information of the (i+1)th round to each second electronic device to perform federal learning of the next round until a preset training termination condition is met, and obtaining a trained federal learning model.
2. The method of claim 1, wherein the obtaining a first impact estimate for each first model comprises:
aggregating each first model to generate a first reference model of the (i+1) th round;
respectively obtaining second models corresponding to the first models, wherein the second model corresponding to the j-th first model is a model obtained by aggregating the first models except the j-th first model, j is an integer greater than or equal to 1, and the maximum value of j is equal to the number of the first models;
and acquiring a first influence estimated value of each first model according to the first reference model and the second model corresponding to each first model.
3. The method of claim 2, wherein the obtaining a first impact estimate for each first model based on the first reference model and the second model corresponding to each first model comprises:
acquiring an evaluation index of the first reference model;
acquiring an evaluation index of the second model corresponding to each first model;
and according to the evaluation index of the first reference model and the evaluation index of the second model corresponding to each first model, respectively acquiring the difference value between the evaluation index of the first reference model and the evaluation index of the second model corresponding to each first model as the first influence estimated value of each first model.
4. The method of claim 2, wherein the aggregating each first model to generate a first reference model for the (i+1) th round comprises:
acquiring an average value of the number of the first models in the ith round as a second aggregation weight of each first model;
and aggregating each first model according to the second aggregation weight of each first model to generate a first reference model.
5. The method of claim 4, wherein the aggregating each first model to generate a first reference model of the (i+1)th round comprises:
obtaining model information of each first model in the ith round, wherein the model information comprises information indicating the number of training samples used for obtaining the first model, or a training log of the first model;
and determining a second aggregation weight of each first model according to the model information of each first model.
6. The method of claim 5, wherein if the model information of the first model includes information indicating a number of training samples for obtaining the first model;
the determining the second aggregation weight of each first model according to the model information of each first model comprises the following steps:
obtaining the total number of training samples for generating all first models of the ith round;
the quotient between the number of training samples per first model and the total number is obtained as a second aggregate weight per first model.
7. The method of claim 5, wherein determining a first aggregate weight for each first model based on the first impact estimate for each first model comprises:
acquiring the average value of the first influence estimated values of all the first models in the ith round as an average influence estimated value;
performing the following processing on the first influence estimated value of each first model: obtaining a difference value between the first influence estimated value of the first model and the average influence estimated value as a second difference value; and obtaining the sum of the second aggregation weight of the first model and the second difference value as the first aggregation weight of the first model.
8. The method of claim 1, wherein determining a first model for an i+1 th round of aggregation operation based on each local model information uploaded comprises:
according to the uploaded local model information, determining a local model generated by each second electronic device in the ith round;
dividing each local model in the ith round into N classes, wherein N is an integer greater than 1;
the local models in each class are aggregated to obtain N first models.
9. The method of claim 8, wherein aggregating the local models in each class to obtain N first models comprises:
performing the following processing for the local models in each class:
acquiring an average value of the number of the local models in the class as a third aggregation weight of each local model in the class;
and aggregating each local model in the class according to the third aggregation weight of each local model in the class to generate a first model.
10. The method of claim 8, wherein classifying each local model in the ith round into N classes comprises:
and dividing the local model of the ith round into N classes by adopting a K-means clustering mode.
11. The method of claim 8, wherein classifying each local model in the ith round into N classes comprises:
aggregating the local models generated by each second electronic device in the ith round to obtain a second reference model;
respectively acquiring the distance between each local model and the second reference model;
and determining the class to which each local model belongs according to each acquired distance and N preset distance threshold ranges.
12. The method of claim 8, wherein classifying each local model in the ith round into N classes comprises:
acquiring, for each second electronic device, the variation between the local model uploaded in the ith round and the global model issued in the ith round;
and determining the class to which each local model belongs according to each acquired variation and N preset variation threshold ranges.
13. A method according to claim 3, wherein the evaluation index comprises: accuracy, recall, or F1 score of the model.
14. The method of claim 8, wherein prior to classifying each local model in the ith round into N classes, the method further comprises:
the number of local models in the ith round is detected to be greater than a preset threshold.
15. The method of claim 1, wherein determining a first model for an i+1 th round of aggregation operation based on each local model information uploaded comprises:
according to the uploaded local model information, determining a local model generated by each second electronic device in the ith round;
and when the number of the local models in the ith round is detected to be smaller than or equal to a preset threshold value, taking the local models in the ith round as the first models in the ith round.
16. The method of claim 1, wherein prior to issuing global model information for the ith round of federal learning to each of the second electronic devices, the method further comprises:
in response to model training requests sent by the at least two second electronic devices, acquiring the initial model matching the model training requests;
after obtaining the trained federal learning model, the method further comprises:
and respectively issuing the federal learning model to each second electronic device.
17. An electronic device, comprising:
a memory and a processor, the memory coupled with the processor;
the memory stores program instructions that, when executed by the processor, cause the electronic device to perform the federal learning method of any one of claims 1-16.
18. A computer readable storage medium comprising a computer program, characterized in that the computer program, when run on an electronic device, causes the electronic device to perform the method of federal learning according to any one of claims 1-16.
CN202310165522.6A 2023-02-16 2023-02-16 Federal learning method and electronic equipment Pending CN117131951A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310165522.6A CN117131951A (en) 2023-02-16 2023-02-16 Federal learning method and electronic equipment


Publications (1)

Publication Number Publication Date
CN117131951A true CN117131951A (en) 2023-11-28

Family

ID=88857042


Country Status (1)

Country Link
CN (1) CN117131951A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112027A (en) * 2021-04-06 2021-07-13 杭州电子科技大学 Federal learning method based on dynamic adjustment model aggregation weight
CN113095512A (en) * 2021-04-23 2021-07-09 深圳前海微众银行股份有限公司 Federal learning modeling optimization method, apparatus, medium, and computer program product
US20220366220A1 (en) * 2021-04-29 2022-11-17 Nvidia Corporation Dynamic weight updates for neural networks
CN113965359A (en) * 2021-09-29 2022-01-21 哈尔滨工业大学(深圳) Defense method and device for federal learning data virus attack
CN114398634A (en) * 2022-01-18 2022-04-26 北京工业大学 Federal learning participant weight calculation method based on information entropy
CN114626547A (en) * 2022-02-08 2022-06-14 天津大学 Group collaborative learning method based on block chain
CN114564746A (en) * 2022-02-28 2022-05-31 浙江大学 Federal learning method and system based on client weight evaluation
CN114863092A (en) * 2022-04-29 2022-08-05 广州广电运通金融电子股份有限公司 Knowledge distillation-based federal target detection method and system
CN115511103A (en) * 2022-10-20 2022-12-23 抖音视界有限公司 Method, apparatus, device and medium for federal learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JUNBIN CHEN et al.: "Federated Learning for Bearing Fault Diagnosis with Dynamic Weighted Averaging", 《ICSMD》, 11 January 2022 (2022-01-11) *
YING ZUOBIN (应作斌) et al.: "Privacy-preserving federated learning framework with dynamic aggregation weights" (动态聚合权重的隐私保护联邦学习框架), 《Chinese Journal of Network and Information Security (网络与信息安全学报)》, vol. 8, no. 5, 15 October 2022 (2022-10-15) *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination