CN113947214A - Client knowledge distillation-based federal learning implementation method - Google Patents

Client knowledge distillation-based federal learning implementation method

Info

Publication number
CN113947214A
CN113947214A (Application CN202111410414.8A)
Authority
CN
China
Prior art keywords
client
model
server
training
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111410414.8A
Other languages
Chinese (zh)
Inventor
王建新
王殊
刘渊
张德文
聂璇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Sanxiang Bank Co Ltd
Original Assignee
Hunan Sanxiang Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Sanxiang Bank Co Ltd filed Critical Hunan Sanxiang Bank Co Ltd
Priority to CN202111410414.8A priority Critical patent/CN113947214A/en
Publication of CN113947214A publication Critical patent/CN113947214A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 20/00 Payment architectures, schemes or protocols
    • G06Q 20/38 Payment protocols; Details thereof
    • G06Q 20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q 20/401 Transaction verification
    • G06Q 20/4016 Transaction verification involving fraud or risk level assessment in transaction processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a federated learning implementation method based on client-side knowledge distillation. The method comprises the following steps: when a new communication round starts, the server sends a model parameter set to each client, the set being formed from the model parameters submitted to the server by all clients in the previous communication round; after each client receives the model parameter set, it starts the local model parameter training of the current communication round, during which feature information from the other clients is migrated into its local model; when a client finishes the training of the current communication round, it sends the parameters of its local model to the server; after all clients have sent back their new model parameters, the server generates a new set from them for the training of the next communication round. The method reduces the number of communication rounds required for federated modeling without reducing model performance, and improves the efficiency of joint modeling with multi-institution data.

Description

Client knowledge distillation-based federal learning implementation method
Technical Field
The invention relates to the technical field of federated learning, and in particular to a federated learning implementation method based on client-side knowledge distillation.
Background
With the further development of big-data technology, data privacy and security have become a global hot issue. In 2018, the well-known social media company Facebook was penalized by regulators for allowing certain data-analysis companies to collect large amounts of user data for group analysis without user permission, which caused Facebook's stock price to drop and triggered large-scale protests. Also in 2018, the European Union passed the General Data Protection Regulation (GDPR) to return control over personal data to citizens and residents and to simplify the regulatory environment for international business within the EU. Relevant departments in China have likewise issued related laws, such as the Personal Information Protection Law and the Data Security Law, to ensure that citizens retain control over their personal information. These laws and regulations raise users' requirements for data security and privacy to a new level, making it more difficult to collect and use users' private data.
Beyond the difficulty of collecting users' personal data, competition and privacy-protection concerns across industries mean that it is no longer practical to gather the independent data scattered across different organizations into one complete data set and train a model on it. Because such widely distributed data cannot be used in a centralized manner, these isolated, distributed data sets are called data islands. Faced with individual data islands, how to use widely distributed data safely and legally for joint modeling has become a key step in deploying artificial intelligence systems.
To address the data-island problem, research institutions have proposed various approaches for indirectly accessing data islands. The federated learning approach proposed by McMahan et al. attempts to perform joint modeling with decentralized data under privacy protection. Google's cross-device federated learning method uses input-method data stored on users' mobile devices around the world to learn a word-prediction model while protecting user privacy, and Sheller et al. treat the medical data of multiple institutions as multiple data islands and obtain a semantic segmentation model by joint modeling with federated learning. The core steps of these methods are as follows: clients log in to the server; the server selects the set of clients that will run training in the current communication round; the server sends the latest global model parameters of the current round to each selected client; each selected client then trains in parallel on its own local private data, obtains new model parameters, and sends them back to the server; finally, the server computes a weighted average of the model parameters sent back by the selected clients and updates the global model parameters. Fig. 1 shows the workflow of the classical federated learning approach.
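For concreteness, the weighted-average aggregation step of the classical workflow described above can be sketched as follows (a minimal NumPy illustration of federated averaging in the spirit of McMahan et al., not the method of this invention; the per-layer parameter layout and the use of local sample counts as weights are illustrative assumptions):

    import numpy as np

    def fedavg_aggregate(client_params, client_sizes):
        """Weighted average of client model parameters (classical federated averaging).

        client_params: list of dicts mapping layer name -> np.ndarray
        client_sizes:  number of local samples per client, used as aggregation weights
        """
        total = float(sum(client_sizes))
        global_params = {}
        for name in client_params[0]:
            # weight each client's tensor by its share of the total data
            global_params[name] = sum(
                (n / total) * params[name]
                for params, n in zip(client_params, client_sizes)
            )
        return global_params

    # usage: two clients with a single-layer model
    clients = [{"w": np.ones((2, 2))}, {"w": 3 * np.ones((2, 2))}]
    print(fedavg_aggregate(clients, client_sizes=[100, 300])["w"])  # every entry is 2.5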
Federated learning makes joint modeling across data islands possible, but when facing multi-institution data sets the classical federated learning method needs many communication rounds to reach the expected performance, and more communication rounds mean more data exchanged between the server and the clients, i.e. a larger communication burden. How to increase the convergence speed of the model and reduce the number of communication rounds, without sacrificing the performance of the global model, is therefore one of the goals of federated learning methods.
Disclosure of Invention
The invention overcomes the shortcomings of existing federated learning methods when facing multi-organization data (each organization is regarded as a client participating in the federated training). It implements a federated learning method that performs knowledge distillation at the client: each client uses the models sent by the other clients in the previous communication round and extracts the knowledge information they contain in order to improve its own local model. The invention reduces the number of communication rounds required for federated modeling without reducing model performance, and improves the efficiency of joint modeling with multi-institution data.
The core of the invention involves a server side and a client side; the procedure is as follows:
Step 1: clients connect to the server; once the number of connected clients meets the requirement, the learning process starts. Let t denote the current communication round, initialized to t = 0.
Step 2: each connected client uploads the initial parameters of its local model to the server, and the server stores these initial parameters as a parameter set. Let θ_i denote the parameters sent to the server by client i.
Step 3: the server increments the communication round, t ← t + 1, and sends to each client k the parameter set of the previous round with client k's own parameters removed,
θ^{t-1}_{-k} = θ^{t-1} \ {θ_k},
where θ^{t-1} = {θ_1, θ_2, …, θ_K} is the set of parameters submitted by all K clients in round (t - 1). The clients then run the following steps in parallel:
S1: client k receives the parameter set θ^{t-1}_{-k} sent by the server, then uses its local data X_i to obtain the output of its local model, f(X_i; θ_k).
S2: the client evaluates each model in the received parameter set on its local data and averages the outputs, obtaining the distribution E_k(X_i, t), defined as
E_k(X_i, t) = (1/(N - 1)) Σ_{j≠k} σ(f(X_i; θ_j))    (1)
where N is the number of clients and σ is the softmax activation function.
S3: the client uses the KL divergence (Kullback-Leibler divergence) D_KL(p || q) to measure the similarity between two distributions p and q. The similarity between the predicted distribution and the averaged distribution, measured with the KL divergence, is defined as
L_d = D_KL(σ(f(X_i; θ_k)) || E_k(X_i, t))    (2)
where σ is the activation function. The loss L_d forms the first part of the overall loss function of client k; reducing this KL divergence helps migrate the information of the other clients' data, contained in the parameter set, into the current client.
S4: the client computes the task-related loss, which forms the second part of the overall loss function. For a semantic segmentation task, the Dice coefficient between the predicted distribution and the data labels is used as the task loss; for a classification task, the cross-entropy loss is used.
S5: the client minimizes the loss function to update the parameters θ_k of its local model.
S6: the client sends the updated model parameters θ_k back to the server.
After all clients have sent back their new model parameters, the server updates the parameter set θ^{t-1} by replacing each client's old parameters with its new ones; the updated set is used in the next communication round.
Step 4: if the maximum number of communication rounds is reached, training ends; otherwise, return to step 3.
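A minimal sketch of the server-side loop in steps 1-4 is given below. It assumes a synchronous, in-process simulation; the client objects and their init_params/local_train interface are hypothetical placeholders introduced only for illustration and are not part of the disclosure. In a real deployment the inner loop would be replaced by network communication with the clients.

    def run_server(clients, max_rounds):
        """Steps 1-4: maintain the parameter set and drive the communication rounds."""
        # steps 1-2: collect the initial parameters theta_k of every connected client
        theta = [c.init_params() for c in clients]
        for t in range(1, max_rounds + 1):
            new_theta = list(theta)
            for k, client in enumerate(clients):
                # step 3: send client k the previous-round set with its own entry removed
                others = [p for j, p in enumerate(theta) if j != k]
                # S1-S6 run on the client, which returns its updated parameters
                new_theta[k] = client.local_train(others)
            # the server replaces each client's old parameters with the new ones
            theta = new_theta
        return theta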
The invention has the following beneficial effects. The federated learning implementation method based on client knowledge distillation exploits the information contained in the model parameters that the clients send to the server, and uses this information during each client's local training, thereby improving the performance of the client's local model, effectively reducing the number of communication rounds required by the federated learning method and accelerating the training process. The information is migrated into the local model with a knowledge distillation technique: the models coming from the other clients act as teacher models, and the client's own local model acts as the student model. Whereas traditional federated learning improves the performance of the global model on each client by aggregating models at the server, here each client directly improves the performance of its local model through local knowledge distillation; this direct update of the local model further reduces the time required for joint modeling while improving model performance.
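As background to the teacher/student transfer just mentioned, standard knowledge distillation matches the student's softened output distribution to the teacher's. The following generic NumPy sketch illustrates only that background idea; the temperature value and logits are illustrative, and this is not the client-side loss of the invention, which is given by formulas (1)-(4) below.

    import numpy as np

    def softmax(logits, temperature=1.0):
        z = np.asarray(logits, dtype=float) / temperature
        z = z - z.max()                      # for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def kl_divergence(p, q, eps=1e-12):
        """D_KL(p || q) between two discrete probability distributions."""
        p, q = np.asarray(p), np.asarray(q)
        return float(np.sum(p * np.log((p + eps) / (q + eps))))

    # distillation loss: push the student's softened distribution toward the teacher's
    teacher_logits = [2.0, 1.0, 0.1]
    student_logits = [1.5, 0.8, 0.3]
    T = 2.0  # softening temperature (illustrative)
    print(kl_divergence(softmax(teacher_logits, T), softmax(student_logits, T)))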
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings needed in the embodiments are briefly described below; it will be apparent to those skilled in the art that other drawings can be derived from these drawings without inventive effort.
FIG. 1 is a schematic workflow diagram of a classical federated learning approach;
FIG. 2 is a schematic diagram of the federated learning implementation of the present invention;
FIG. 3 is a diagram of a server sending a set of parameters to a client;
FIG. 4 is a schematic illustration of a client performing local knowledge distillation;
FIG. 5 is a schematic diagram of an application example of the method under financial wind control.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention are described clearly and completely below with reference to specific embodiments and the accompanying drawings. It is to be understood that the described embodiments are merely exemplary of the invention and not restrictive of its full scope. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention. The technical solutions provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic workflow diagram of the classical federated learning method. One of the goals of federated learning is to build a general global model that performs well on every client, so the global model needs to exploit the information that each client extracts from its own data. How to exploit such client data information has been studied extensively in traditional centralized learning, and one of the resulting techniques is knowledge distillation. In knowledge distillation there are two models: a teacher model with a huge number of parameters and a student model with relatively few parameters. The goal is to use the strongly expressive teacher model to transfer the knowledge it contains to the student model, so that the student can reach performance similar to the teacher's with a much smaller number of parameters. In the present invention, before performing its local computation, each client uses the parameter information of the other clients obtained from the server to migrate the knowledge contained therein into its local model, thereby improving the performance of the local model.
The invention provides a new federated learning implementation to address the slow convergence (i.e., the large number of communication rounds required) of federated learning methods when training on cross-silo, multi-institution data sets. In the invention, each client obtains from the server the models that the other clients have sent to the server, regards them as teacher models, and then uses knowledge distillation to migrate the information contained in those models into its own local model. To help readers better understand the objective and workflow of the invention, its technical details are described below with reference to the accompanying drawings. In Fig. 2 there are K clients with their local models; in the initial stage each client transmits its model to the server, and the server saves the model parameters of these clients as a set. Assume the current task is image semantic segmentation: the local data of client k consists of image data X_i and corresponding label data Y_i, stored as the data set D_k, and during training the client reads data-label pairs (X_i, Y_i) from D_k in order. The models used by the clients all share the same structure, and each client k has an independent local model with parameters θ_k.
Step 1: the server receives the initialized parameters from the clients and stores them as θ^0; set t = 0.
Step 2: set t ← t + 1. The server sends to each client k the parameter set of the other clients, θ^{t-1}_{-k} = θ^{t-1} \ {θ_k}; this process is shown in Fig. 3. The clients then run training in parallel; for any client k, the local model θ_k is updated as follows:
S1: from the data set D_k of the current client k, read a data-label pair (X_i, Y_i), where i is the index of the data, i ∈ {1, …, |D_k|};
S2: after the client receives the parameter set sent by the server, two loss terms are computed. With reference to Fig. 3, the steps are as follows:
A: based on X_i, obtain the predicted output of the current client's local model, f(X_i; θ_k);
B: based on X_i, obtain the average output E_k(X_i, t) of the other clients' models using formula (1), where σ is the Softmax activation function;
C: compute the KL divergence L_d using formula (2) as one part of the overall loss function L;
D: the second part of the loss is the task loss L_task; in the image segmentation task the Dice coefficient is commonly used as the loss, defined as
L_task = 1 - (2 Σ p·y) / (Σ p + Σ y)    (3)
where p = f(X_i; θ_k) is the predicted distribution, y = Y_i is the corresponding label, and the sums run over all pixels;
E: the overall loss function is defined as
L = α L_task + (1 - α) L_d    (4)
where α is a weighting factor for the loss function;
F: update the client's local model θ_k using the loss function, as follows:
θ_k ← θ_k - η ∇_{θ_k} L    (5)
where η is the learning rate.
S3: repeat steps A to F until i = |D_k|;
S4: when client k finishes training, send the new θ_k back to the server.
Step 3: when the server receives the new parameters θ_k from client k, it replaces the old θ_k in θ^{t-1}. Once the parameters sent by all clients have been received and θ^{t-1} has been updated, jump to step 2.
Consider financial risk control based on horizontal federated learning as an example. Financial institutions face a shortage of samples in anti-fraud scenarios, and horizontal federated learning can be applied to anti-fraud across the banking industry. The invention therefore provides a federated learning implementation that can build a high-quality financial risk-control model within the banking industry while protecting user privacy. For example, bank A holds the transaction data and anti-fraud data of users in one region, while bank B holds the transaction data and anti-fraud data of users in another region. To optimize the anti-fraud models of both parties while protecting the privacy of each bank's data, cross-institution modeling can be performed with the present method; the workflow is shown in Fig. 5. Legend: banks A and B each obtain the latest model of the other bank (B and A, respectively) stored on the server, train with their local data, update their local models, and upload the latest model parameters to the server.
Unlike the traditional federated learning algorithm, which constructs a general global model by computing a weighted average of model parameters on the server side, the proposed method improves model performance by performing knowledge distillation locally at each client. Feedback from this example shows that the method effectively improves learning efficiency and reduces the number of communication rounds required for joint modeling with multi-institution data, without reducing model performance.
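Putting formulas (1)-(4) together, the per-sample loss computed during a client's local knowledge distillation can be sketched as follows. This is a NumPy illustration under the assumption of a generic model function f(x, theta) that returns class scores; the toy linear model, the KL direction as reconstructed in formula (2), and the weighting alpha = 0.5 are illustrative, and the gradient step of formula (5) would in practice be performed by the training framework.

    import numpy as np

    def softmax(z):
        z = np.asarray(z, dtype=float)
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def kl(p, q, eps=1e-12):
        """D_KL(p || q) between two discrete distributions."""
        return float(np.sum(p * np.log((p + eps) / (q + eps))))

    def soft_dice_loss(pred, target, eps=1e-6):
        """1 minus the (soft) Dice coefficient, as in formula (3)."""
        inter = np.sum(pred * target)
        return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

    def client_loss(x, y, theta_k, other_thetas, f, alpha=0.5):
        """Distillation term plus task term for one sample, formulas (1)-(4)."""
        pred = softmax(f(x, theta_k))                            # sigma(f(X_i; theta_k))
        # formula (1): average softened output of the other clients' models
        ensemble = np.mean([softmax(f(x, th)) for th in other_thetas], axis=0)
        loss_d = kl(pred, ensemble)                              # formula (2)
        loss_task = soft_dice_loss(pred, y)                      # formula (3)
        return alpha * loss_task + (1.0 - alpha) * loss_d        # formula (4)

    # toy usage: a linear "model" with 3 classes and 2 input features
    f = lambda x, theta: theta @ x
    x = np.array([1.0, 2.0])
    y = np.array([0.0, 1.0, 0.0])          # one-hot label
    theta_k = np.random.randn(3, 2)
    others = [np.random.randn(3, 2) for _ in range(3)]
    print(client_loss(x, y, theta_k, others, f))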
An embodiment of the invention also provides a storage medium storing a computer program which, when executed by a processor, implements some or all of the steps of the embodiments of the client knowledge distillation-based federated learning implementation method provided by the invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a random access memory (RAM).
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention.

Claims (6)

1. A federated learning implementation method based on client knowledge distillation, characterized by comprising the following steps:
when a new communication round starts, the server sends a model parameter set to each client, the model parameter set being formed from the model parameters submitted to the server by all clients in the previous communication round;
after each client receives the model parameter set sent by the server, it starts the local model parameter training of the current communication round, the local model being the model stored locally at the client; during local model parameter training, feature information from other clients is migrated into the local model;
when a client finishes the training of the current communication round, it sends the parameters of its local model to the server;
after all clients have sent back their new model parameters, the server generates a new set from the new model parameters for training in the next communication round.
2. The method of claim 1, wherein the server sending a model parameter set to each client at the start of a new communication round comprises:
before the new communication round t begins, the server sends to each client k the parameter set formed from the model parameters sent to the server by the other clients in the previous communication round (t - 1):
θ^{t-1}_{-k} = θ^{t-1} \ {θ_k}
wherein θ_k is the model parameter sent by client k to the server in communication round (t - 1), and θ^{t-1} is the set of model parameters sent by all clients to the server after the previous communication round (t - 1).
3. The method of claim 2, wherein, after each client receives the model parameter set sent by the server, starting the local model parameter training of the current communication round comprises:
after receiving the model parameter set sent by the server, any client k in communication round t defines the feature information E_k(X_i, t) coming from the other clients as:
E_k(X_i, t) = (1/(N - 1)) Σ_{j≠k} σ(f(X_i; θ_j))
wherein N is the number of all clients, X_i is a data sample owned by the client, σ is the activation function, and f(X_i; θ_k) is the predicted distribution output by the local model.
4. The method of claim 1, wherein migrating the feature information from other clients into the local model during local model parameter training comprises:
at any client k in communication round t, the defined feature information E_k(X_i, t) from the other clients is migrated into the local model parameters θ_k by measuring, with the KL divergence, the similarity between the predicted distribution and the averaged distribution:
L_d = D_KL(σ(f(X_i; θ_k)) || E_k(X_i, t))
wherein σ is the activation function and D_KL is the KL divergence.
5. The method of claim 4, wherein sending the parameters of the local model to the server when the client finishes the training of the current communication round comprises:
at any client k in communication round t, sending the updated parameters θ_k to the server.
6. The method of claim 5, wherein, after all clients have sent back new model parameters, the server generating a new set from the new model parameters for training of the next communication round comprises:
the server uses the parameters θ_1, θ_2, …, θ_N sent back by the clients to replace the corresponding entries of θ^{t-1} and starts the training of the next communication round, wherein θ_i denotes the parameters sent to the server by client i.
CN202111410414.8A 2021-11-23 2021-11-23 Client knowledge distillation-based federal learning implementation method Pending CN113947214A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111410414.8A CN113947214A (en) 2021-11-23 2021-11-23 Client knowledge distillation-based federal learning implementation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111410414.8A CN113947214A (en) 2021-11-23 2021-11-23 Client knowledge distillation-based federal learning implementation method

Publications (1)

Publication Number Publication Date
CN113947214A true CN113947214A (en) 2022-01-18

Family

ID=79338692

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111410414.8A Pending CN113947214A (en) 2021-11-23 2021-11-23 Client knowledge distillation-based federal learning implementation method

Country Status (1)

Country Link
CN (1) CN113947214A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115423540A (en) * 2022-11-04 2022-12-02 中邮消费金融有限公司 Financial model knowledge distillation method and device based on reinforcement learning
CN115423540B (en) * 2022-11-04 2023-02-03 中邮消费金融有限公司 Financial model knowledge distillation method and device based on reinforcement learning
CN117437039A (en) * 2023-12-21 2024-01-23 湖南三湘银行股份有限公司 Commercial bank loan wind control method based on longitudinal federal learning
CN117437039B (en) * 2023-12-21 2024-04-30 湖南三湘银行股份有限公司 Commercial bank loan wind control method based on longitudinal federal learning

Similar Documents

Publication Publication Date Title
Chen et al. Asynchronous online federated learning for edge devices with non-iid data
Liu et al. Competing bandits in matching markets
CN111931062B (en) Training method and related device of information recommendation model
CN110347932B (en) Cross-network user alignment method based on deep learning
CN111428147A (en) Social recommendation method of heterogeneous graph volume network combining social and interest information
CN113947214A (en) Client knowledge distillation-based federal learning implementation method
CN110852447A (en) Meta learning method and apparatus, initialization method, computing device, and storage medium
EP4350572A1 (en) Method, apparatus and system for generating neural network model, devices, medium and program product
US11514368B2 (en) Methods, apparatuses, and computing devices for trainings of learning models
CN112436992B (en) Virtual network mapping method and device based on graph convolution network
CN115344883A (en) Personalized federal learning method and device for processing unbalanced data
CN111428127A (en) Personalized event recommendation method and system integrating topic matching and two-way preference
Ma et al. Adaptive distillation for decentralized learning from heterogeneous clients
CN112085158A (en) Book recommendation method based on stack noise reduction self-encoder
CN117236421B (en) Large model training method based on federal knowledge distillation
CN112487305B (en) GCN-based dynamic social user alignment method
CN116757262B (en) Training method, classifying method, device, equipment and medium of graph neural network
Ostonov et al. Rlss: A deep reinforcement learning algorithm for sequential scene generation
CN116992151A (en) Online course recommendation method based on double-tower graph convolution neural network
CN116957106A (en) Federal learning model training method based on dynamic attention mechanism
WO2023169167A1 (en) Model training method and apparatus, and device and storage medium
CN117009674A (en) Cloud native API recommendation method integrating data enhancement and contrast learning
WO2020151017A1 (en) Scalable field human-machine dialogue system state tracking method and device
CN116431915A (en) Cross-domain recommendation method and device based on federal learning and attention mechanism
CN115470520A (en) Differential privacy and denoising data protection method under vertical federal framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination