CN116259057A - Method for solving data heterogeneity problem in federal learning based on alliance game - Google Patents

Method for solving data heterogeneity problem in federal learning based on alliance game Download PDF

Info

Publication number
CN116259057A
Authority
CN
China
Prior art keywords
client
alliance
neural network
convolutional neural
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310167065.4A
Other languages
Chinese (zh)
Inventor
Hui Yilong
Hu Jie
Zhao Gaosheng
Ma Xiaoqing
Yang Yushen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202310167065.4A priority Critical patent/CN116259057A/en
Publication of CN116259057A publication Critical patent/CN116259057A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/18 - Extraction of features or characteristics of the image
    • G06V30/1801 - Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/04 - Inference or reasoning models
    • G06N5/042 - Backward inferencing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/94 - Hardware or software architectures specially adapted for image or video understanding
    • G06V10/95 - Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/19 - Recognition using electronic means
    • G06V30/191 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 - Classification techniques
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L67/01 - Protocols
    • H04L67/10 - Protocols in which an application is distributed across nodes in the network
    • H04L67/104 - Peer-to-peer [P2P] networks
    • H04L67/1059 - Inter-group management mechanisms, e.g. splitting, merging or interconnection of groups

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for solving the data heterogeneity problem in federated learning based on coalition games, which comprises the following steps: 1) constructing a convolutional neural network; 2) generating a training set for each client; 3) each client transmitting its label class set and iteration time to a server via base-station communication; 4) the server obtaining the optimal coalition partition; 5) the server grouping the clients within each coalition of the optimal coalition partition; 6) cooperatively training the convolutional neural network using federated learning. The method mitigates the excessive weight divergence between local and global models caused by data heterogeneity in federated learning and improves the performance of the federated learning model; it also exploits the differentiation of client computing resources to improve the federated learning algorithm and accelerate its convergence. It can be used to solve the data heterogeneity problem in federated learning.

Description

Method for solving data heterogeneity problem in federal learning based on alliance game
Technical Field
The invention belongs to the technical field of edge computing, and further relates to a method for solving the data heterogeneity problem in federated learning based on coalition games within the technical field of data processing.
Background
Mobile phones and tablet computers are increasingly the primary computing devices of many people, and their powerful sensors (including cameras, microphones, and GPS) can capture data that was previously unobtainable. Such data is valuable for machine learning training, but transmitting it to a server for centralized training, as in traditional machine learning, raises several problems: excessive communication overhead, limited server computing resources, and privacy and security risks. The federated learning framework was proposed to address these problems.
In conventional machine learning, the data resides on one machine and is assumed to be sampled independently from the same distribution, i.e., it is independent and identically distributed (IID). Federated learning is a form of machine learning involving multiple devices: each client trains a local model on its own data set, and the server builds a mixed (global) model from these local model parameters, with the goal of making the mixed model perform better than any single client's model. Because each device belongs to a particular user, enterprise, or scenario, the data distributions often differ greatly, i.e., the data is non-independent and identically distributed (Non-IID). The idiosyncrasy of each client's personal data causes local model updates that fit only that client's own data and generalize poorly to other data sets, harming the performance of the mixed model. Non-IID data makes the parameter update directions on the clients diverge sharply, widening the gap between the finally aggregated global model parameters and the local model parameters. This marked drop in federated learning model accuracy caused by Non-IID data is the data heterogeneity problem. The difference between the global model parameters obtained by federated aggregation and the parameters obtained by updating the local model with stochastic gradient descent is defined as the weight divergence: the larger the weight divergence, the larger the gap between the global and local models, and the worse the federated learning performance.
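As a point of reference, one common formalization of weight divergence from the federated learning literature (the notation below is illustrative, not taken from the patent) is

\mathrm{WD} \;=\; \frac{\lVert w^{\mathrm{FedAvg}} - w^{\mathrm{SGD}} \rVert}{\lVert w^{\mathrm{SGD}} \rVert},

where w^{\mathrm{FedAvg}} denotes the global model weights produced by federated aggregation and w^{\mathrm{SGD}} the weights obtained by centralized stochastic gradient descent on the pooled data.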
Brendan McMahan et al., in their paper "Communication-Efficient Learning of Deep Networks from Decentralized Data" (Artificial Intelligence and Statistics, 2017: 1273-1282), propose a method of deep-network joint learning based on iterative model averaging. The method is implemented as follows: first, a certain proportion of the clients is selected; second, the loss gradient over all data held by each selected client is computed; third, each selected client updates its local model parameters by gradient descent; fourth, the server takes a weighted average of the selected clients' local models to form the global model. The method obtains relatively good results on independent and identically distributed data, but it still has a drawback: when facing Non-IID data, it cannot reduce the divergence between local model parameters and global model weights caused by data heterogeneity.
In the patent document "A method for solving the problem of data imbalance in federal learning based on a second derivative" (application number 202110917450.7, application publication number CN113691594B) filed by Hangzhou Dianzi University, a method for solving the data heterogeneity problem in federated learning based on a second derivative is proposed. The method is implemented as follows: first, the cloud server initializes the global model and a proxy data set. Second, on the first round of global iteration, or when the test accuracy of the current round falls below that of the previous round by a certain threshold, the cloud server obtains the importance weights of the global model parameters by calculating the second derivative of the loss function with respect to them. Third, the cloud server transmits the global model, the global model parameter importance weights, and the global data imbalance information to the edge clients. Fourth, each edge client builds a regularization term from the received global model, parameter importance weights, and data imbalance information, and adds it to the preset optimization objective to form a new objective, thereby reducing the difference between the local and global models and the contribution of the majority classes to the global model; it then trains the model locally on its own data and uploads the trained local model to the cloud server. Fifth, the cloud server updates the global model with the received local models. Sixth, the cloud server judges whether the accuracy of the global model has reached a preset value; if not, it returns to the second step, and if so, training ends. This method alleviates the impact on global model training of the divergence between local and global models caused by Non-IID data. However, it still has a drawback: the clients' computing resources are not fully utilized, and the model converges slowly.
Disclosure of Invention
Aiming at the deficiencies of the prior art, the invention provides a method for solving the data heterogeneity problem in federated learning based on coalition games, addressing the data heterogeneity challenge in edge computing scenarios. It is used to solve the excessive weight divergence between local and global models caused by data heterogeneity in federated learning, to improve the performance of the federated learning model, and to improve the federated learning algorithm by exploiting the differentiation of client computing resources, thereby accelerating the convergence of federated learning.
The idea for realizing the purpose of the invention is as follows: the invention uses the earth mover's distance (EMD) to measure the Non-IID degree of client data. The EMD is the minimum cost of transforming one probability distribution into another. The Non-IID degree refers to the degree of difference between the probability distribution over the data classes of a client's data set and that of an ideal (balanced) data set. Accordingly, the EMD is defined here as the distance between the per-class probability distributions.
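In the discrete form commonly used for label distributions (the notation below is illustrative, not taken from the patent), the EMD of client k with label distribution p_k against the ideal balanced distribution q over the 10 digit classes is

\mathrm{EMD}_k \;=\; \sum_{m=0}^{9} \bigl\lvert\, p_k(y=m) - q(y=m) \,\bigr\rvert, \qquad q(y=m) = \tfrac{1}{10},

so a client whose pictures cover only a few digit classes has a large EMD and hence a high Non-IID degree.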
Non-IID data causes weight divergence between the local and global models, so the accuracy of the federated learning model suffers from data distribution bias. The invention performs a coalition game based on the EMD, gathering the clients into several coalitions. The Non-IID degree of candidate coalitions formed by different clients is computed iteratively, and coalitions with a high Non-IID degree are eliminated, so that when the game terminates, the pooled data set of every coalition has a very low Non-IID degree; the weight divergence between each coalition's local and global models is therefore small, which improves the accuracy of the federated learning model. The invention further divides the clients of each coalition into several subgroups according to the differentiation of their computing resources: the clients of each subgroup have similar computing resources, the subgroups perform different numbers of local training rounds within one global round, and the number of local rounds of a subgroup with more computing resources is an integer multiple of that of a subgroup with fewer resources. Clients with more computing resources can therefore use them fully instead of wasting time waiting for clients with fewer resources to finish local training, which accelerates the convergence of the federated learning model.
In order to achieve the above purpose, the specific implementation steps of the present invention include the following:
step 1, constructing a convolutional neural network:
step 1.1, building a convolutional neural network with the same structure for each client, whose layers are connected in series in the following order: input layer, first convolutional layer, first pooling layer, second convolutional layer, second pooling layer, fully connected layer;
step 1.2, setting the hyperparameters of the convolutional neural network: the number of input-layer neurons is set to 28×28; the convolution kernels of the first and second convolutional layers are set to 5×5 and 3×3 respectively, the number of kernels to 64, and the sliding stride to 1; the pooling windows of the first and second pooling layers are set to 2×2 with a sliding stride of 2; all activation functions are ReLU; the number of output neurons of the fully connected layer is set to 10, with a Softmax activation;
step 2, generating a training set for each client:
step 2.1, forming a sample set for each client from that client's handwritten digit pictures, and labeling every handwritten digit picture of each client's sample set;
step 2.2, applying mean-variance normalization to each labeled picture of every sample set so that the processed data follows a standard normal distribution, the normalized pictures of each sample set forming that client's training set;
step 3, each client transmits its label class set and iteration time to the server via base-station communication;
step 4, the server obtains the optimal coalition partition:
step 4.1, treating each client as a coalition, all coalitions forming a coalition partition;
step 4.2, generating the benefit of each coalition in the coalition partition according to the following formula:
[formula given as an image in the original: the benefit V_j is computed from the label-class proportions n_m/D_j]
wherein V_j represents the benefit of the j-th coalition in the coalition partition, log(·) denotes the base-10 logarithm, |·| denotes the absolute-value operation, n_m denotes the number of handwritten digit pictures labeled with class m across the training sets of all clients in the j-th coalition, and D_j denotes the total number of handwritten digit pictures in the training sets of all clients in the j-th coalition;
step 4.3, using a coalition-game formation algorithm, clients repeatedly leave their current coalition and join another; whenever the sum of the two affected coalitions' benefits increases, all coalitions form a new coalition partition;
step 4.4, when no client can join any coalition in a way that increases the combined benefit of its original coalition and the newly joined coalition, so that no new coalition partition is generated, the iteration of the coalition-game formation algorithm stops; empty coalitions are deleted from the partition, and the remaining coalitions form the optimal coalition partition;
step 5, the server groups the clients of each coalition in the optimal coalition partition:
step 5.1, according to the iteration times uploaded by the clients, the server selects the client with the smallest iteration time in each coalition as that coalition's leader;
step 5.2, the server computes the local training rounds of each client;
step 5.3, the clients of each coalition with the same number of local training rounds form a subgroup;
step 6, cooperatively training the convolutional neural network with federated learning:
step 6.1, the server transmits an identical convolutional neural network parameter matrix to every client in the optimal coalition partition;
step 6.2, each client updates its own convolutional neural network with the received parameter matrix;
step 6.3, each client inputs its training set into its convolutional neural network, computes the parameter matrix obtained after 10 iterative updates of its network using the SGD gradient descent algorithm, and uploads the parameter matrix to its coalition leader;
step 6.4, the coalition leader of each coalition receives the clients' convolutional neural network parameter matrices and averages the received matrices;
step 6.5, the coalition leader of each coalition judges whether the parameter matrices of the clients of all subgroups have been received; if so, it sends the average parameter matrix to the server and step 6.6 is executed; otherwise, it sends the average parameter matrix back to the clients and step 6.2 is executed;
step 6.6, the server averages the parameter matrices received from all coalitions and transmits the average to each coalition leader, and each coalition leader forwards the average parameter matrix to the clients of its coalition;
step 6.7, judging whether the server has executed the averaging operation of step 6.6 500 times; if so, the cooperative training ends, the server's convolutional neural network is updated with the average parameter matrix, and step 7.1 is executed; otherwise, step 6.2 is executed;
step 7, predicting the classes of the server's handwritten digit pictures:
step 7.1, the server's handwritten digit pictures are processed with the same preprocessing method as in step 2 to obtain the server's test set;
step 7.2, all images of the server's test set are input into the server's convolutional neural network, which outputs the predicted handwritten digit recognition results.
Compared with the prior art, the invention has the following advantages:
First, the invention uses the EMD to run a coalition game over clients with a high Non-IID degree, reducing the Non-IID degree of every coalition so that the weight divergence between each coalition's local and global models is small. This overcomes the large weight divergence between local and global models caused by data heterogeneity in the prior art and gives the invention the advantage of high federated learning model accuracy.
Second, the invention exploits the differentiation of computing resources to group the clients and assigns each client its number of local training rounds according to its subgroup, making full use of the clients' computing resources. This overcomes the waste of client computing resources in the prior art and gives the invention the advantage of fast convergence of the federated learning model.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a simulation diagram of the present invention.
Detailed Description
The steps for implementing the present invention are further described below with reference to FIG. 1 and the embodiment.
The embodiment of the invention comprises 30 clients and a server. Each client trains a convolutional neural network for recognizing handwritten digits on a training set generated from its own handwritten digit pictures, each of which contains exactly one digit. The trained convolutional neural network can recognize not only the handwritten digits in the 30 training sets but also handwritten digit pictures outside the 30 training sets.
Convolutional neural networks are particularly suited to processing data such as pictures, video, audio, and natural language, and are the neural network architecture with the most pronounced advantages in the field of image recognition. A convolutional neural network with the same structure is constructed at each client; every picture of the training set is input into the network so that it learns the features of each picture, and the network parameters are updated, so the parameters of each client's network differ. By learning the features of every picture of its training set, each client's convolutional neural network achieves better performance. The features learned by the 30 clients are aggregated at the server, so that the convolutional neural network constructed at the server finally learns the features of the handwritten digit pictures of all clients' training sets; the server's network can therefore recognize not only the handwritten digits in the 30 training sets but can also accurately recognize handwritten digit pictures in general.
Step 1, constructing a convolutional neural network.
Step 1.1, a convolutional neural network with the same structure is constructed at each client, its layers connected in series in the following order: input layer, first convolutional layer, first pooling layer, second convolutional layer, second pooling layer, fully connected layer.
Step 1.2, setting the hyperparameters of the convolutional neural network: the number of input-layer neurons is set to 28×28. The convolution kernels of the first and second convolutional layers are set to 5×5 and 3×3 respectively, the number of kernels to 64, and the sliding stride to 1. The pooling windows of the first and second pooling layers are set to 2×2 with a sliding stride of 2. All activation functions are ReLU. The number of output neurons of the fully connected layer is set to 10, with a Softmax activation. The hyperparameters of every layer are the same at every client; here, hyperparameters refer to the number of neurons or the size of each layer. The parameters of each layer differ from client to client; parameters refer to the weight matrix of each layer.
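As an illustration, the following is a minimal sketch of the per-client network consistent with the hyperparameters of step 1.2; the use of PyTorch, a single input channel, and zero padding are assumptions, since the patent does not specify them.

import torch
import torch.nn as nn

class ClientCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=5, stride=1),   # first convolutional layer: 5x5, 64 kernels, stride 1
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),       # first pooling layer: 2x2 window, stride 2
            nn.Conv2d(64, 64, kernel_size=3, stride=1),  # second convolutional layer: 3x3, 64 kernels, stride 1
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),       # second pooling layer: 2x2 window, stride 2
        )
        self.classifier = nn.Linear(64 * 5 * 5, 10)      # fully connected layer: 10 output neurons

    def forward(self, x):                                # x: (batch, 1, 28, 28)
        x = self.features(x)                             # -> (batch, 64, 5, 5)
        return torch.softmax(self.classifier(x.flatten(1)), dim=1)  # Softmax activation

model = ClientCNN()
print(model(torch.randn(2, 1, 28, 28)).shape)            # torch.Size([2, 10])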
Step 2, generating a training set for each client.
Step 2.1, a sample set is formed for each client from that client's handwritten digit pictures; the embodiment of the invention forms 30 sample sets. Because each client holds not only handwritten digit pictures but also other pictures, such as pictures of flowers and portraits, the invention selects only each client's handwritten digit pictures to form the sample set, and each handwritten digit picture contains exactly one handwritten digit. Every handwritten digit picture of each client's sample set is labeled; since the sample set consists of handwritten digit pictures, the picture labels are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. For example, if a client holds 3 pictures and the first contains only the handwritten digit 0, its label is 0; the second contains only the handwritten digit 1, so its label is 1; the third contains only the handwritten digit 2, so its label is 2.
Step 2.2, mean-variance normalization is applied to each labeled picture of every sample set so that the processed data follows a standard normal distribution, and the normalized pictures of each sample set form that client's training set.
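A minimal sketch of this mean-variance normalization, assuming the pictures are NumPy arrays (the patent does not name a library):

import numpy as np

def normalize(picture: np.ndarray) -> np.ndarray:
    # shift to zero mean and scale to unit variance
    return (picture - picture.mean()) / picture.std()

sample_set = [np.random.randint(0, 256, (28, 28)).astype(np.float32) for _ in range(3)]
training_set = [normalize(p) for p in sample_set]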
Step 3, each client transmits its label class set and iteration time to the server.
Each client communicates its own label class set and iteration time to the server via the base station. The label class set records how many handwritten digit pictures of each label class the training set contains, and the iteration time is the time a client needs to perform one convolution pass over every training sample in its training set to obtain the sample's predicted label. In the embodiment, the 1st client holds 300 handwritten digit pictures: 100 with label 0, 100 with label 1, and 100 with label 5, so the label class set of the 1st client is {100, 100, 0, 0, 0, 100, 0, 0, 0, 0}. The 2nd client holds 300 handwritten digit pictures: 100 with label 7, 100 with label 8, and 100 with label 9, so the label class set of the 2nd client is {0, 0, 0, 0, 0, 0, 0, 100, 100, 100}.
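A small sketch of how the 1st client's label class set could be built by counting the pictures of each of the 10 label classes (NumPy assumed):

import numpy as np

labels_client1 = np.array([0] * 100 + [1] * 100 + [5] * 100)   # labels of the 1st client's 300 pictures
label_class_set = np.bincount(labels_client1, minlength=10)    # count per label class 0..9
print(label_class_set.tolist())                                # [100, 100, 0, 0, 0, 100, 0, 0, 0, 0]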
Step 4, the server obtains the optimal coalition partition.
Step 4.1, each client is treated as a coalition, and all coalitions form a coalition partition. In the embodiment of the invention, the 30 clients form 30 coalitions: the first coalition contains the 1st client, the second coalition contains the 2nd client, and so on up to the 30th coalition, which contains the 30th client; the 30 coalitions form a coalition partition.
Step 4.2, the benefit of each coalition in the coalition partition is generated according to the following formula:
[formula given as an image in the original: the benefit V_j is computed from the label-class proportions n_m/D_j]
wherein V_j represents the benefit of the j-th coalition in the coalition partition, log(·) denotes the base-10 logarithm, |·| denotes the absolute-value operation, n_m denotes the number of handwritten digit pictures labeled with class m across the training sets of all clients in the j-th coalition, and D_j denotes the total number of handwritten digit pictures in the training sets of all clients in the j-th coalition. In the embodiment of the invention, the benefit V_1 of the first coalition is approximately 0.35, and the benefit V_2 of the second coalition is approximately 0.35.
Step 4.3, using the coalition-game formation algorithm, clients repeatedly leave their current coalition and join another; whenever the sum of the two affected coalitions' benefits increases, all coalitions form a new coalition partition. In the embodiment, the sum of the benefits of the first and second coalitions is 0.7. When the 1st client leaves the first coalition and joins the second, the benefit of the first coalition becomes V_1', the benefit of the second coalition becomes V_2', and the sum of the two benefits becomes 0.88; since the sum increased, a new coalition partition of 30 coalitions is formed, in which the first coalition is the empty set, the second coalition contains the 1st and 2nd clients, and the 30th coalition contains the 30th client.
The coalition-game formation algorithm is the one proposed by Hui Yilong et al. in the paper "A Game Theoretic Scheme for Optimal Access Control in Heterogeneous Vehicular Networks" (IEEE Transactions on Intelligent Transportation Systems, 2019, 20(12): 4590-4603).
Step 4.4, when no client can join any coalition in a way that increases the combined benefit of its original coalition and the newly joined coalition, so that no new coalition partition is generated, the iteration of the coalition-game formation algorithm stops. When the algorithm stops, the coalition partition in the embodiment contains 30 coalitions: the second coalition contains the 1st through 5th clients; the third coalition contains the 6th through 11th clients; the ninth coalition contains the 12th through 21st clients; the tenth coalition contains the 22nd through 30th clients; the remaining coalitions are empty sets.
Step 4.5, the empty coalitions are deleted from the coalition partition, and the remaining coalitions form the optimal coalition partition. The optimal coalition partition in the embodiment contains 4 coalitions: the second, third, ninth, and tenth coalitions.
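The following sketch illustrates the switch dynamics of steps 4.1 through 4.5. The patent's benefit formula is given only as an image, so the benefit function below is a stand-in that rewards coalitions whose pooled label distribution is close to uniform; the move rule (a client switches whenever the total benefit of the two affected coalitions increases) follows the text.

import numpy as np

def benefit(coalition, label_sets):
    # stand-in benefit: negative L1 distance of the pooled label distribution
    # to the uniform distribution (empty coalitions earn zero benefit)
    if not coalition:
        return 0.0
    counts = sum(label_sets[c] for c in coalition)
    p = counts / counts.sum()
    return -np.abs(p - 0.1).sum()

def form_coalitions(label_sets):
    partition = [{i} for i in range(len(label_sets))]       # step 4.1: one coalition per client
    changed = True
    while changed:                                          # step 4.4: stop when no switch helps
        changed = False
        for i in range(len(label_sets)):
            src = next(c for c in partition if i in c)
            for dst in partition:
                if dst is src:
                    continue
                before = benefit(src, label_sets) + benefit(dst, label_sets)
                after = benefit(src - {i}, label_sets) + benefit(dst | {i}, label_sets)
                if after > before:                          # step 4.3: switch increases the benefit sum
                    src.remove(i)
                    dst.add(i)
                    changed = True
                    break
    return [c for c in partition if c]                      # step 4.5: delete empty coalitions

# two complementary clients end up in one coalition
label_sets = [np.bincount([0] * 100 + [1] * 100 + [5] * 100, minlength=10),
              np.bincount([7] * 100 + [8] * 100 + [9] * 100, minlength=10)]
print(form_coalitions(label_sets))                          # [{0, 1}]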
Step 5, the server groups the clients of each coalition in the optimal coalition partition.
Step 5.1, according to the iteration times uploaded by the clients, the server selects the client with the smallest iteration time in each coalition as that coalition's leader. In the embodiment, the iteration times of the 1st through 5th clients of the second coalition are 0.2 s, 0.25 s, 0.3 s, 0.35 s, and 0.6 s respectively, so the leader of the second coalition is the 1st client.
Step 5.2, the server calculates the local training rounds of each client according to the following formula:
\mu_i = \lfloor t_{\max} / t_i \rfloor
wherein \mu_i represents the local training round multiple of the i-th client, \lfloor\cdot\rfloor denotes the floor operation, t_{\max} denotes the iteration time of the client with the largest iteration time within the coalition, and t_i denotes the iteration time of the i-th client. In the embodiment of the invention, with t_{\max} = 0.6 s, the multiples of the 1st through 5th clients of the second coalition are 3, 2, 2, 1, and 1, i.e., local training rounds of 30, 20, 20, 10, and 10 iterations respectively, each multiple corresponding to one block of the 10 iterative updates of step 6.3.
Step 5.3, the clients of each coalition with the same number of local training rounds form a subgroup. In the embodiment, the second coalition forms 3 subgroups: the first subgroup comprises the 1st client, the second subgroup comprises the 2nd and 3rd clients, and the third subgroup comprises the 4th and 5th clients.
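A small sketch of steps 5.1 through 5.3 on the embodiment's numbers; iteration times are expressed in milliseconds so that the floor division is exact:

from collections import defaultdict

iteration_time_ms = {1: 200, 2: 250, 3: 300, 4: 350, 5: 600}  # second coalition, in milliseconds

leader = min(iteration_time_ms, key=iteration_time_ms.get)     # step 5.1: smallest iteration time -> client 1
t_max = max(iteration_time_ms.values())                        # largest iteration time in the coalition

subgroups = defaultdict(list)
for client, t in iteration_time_ms.items():
    mu = t_max // t                                            # step 5.2: floor(t_max / t_i)
    subgroups[mu].append(client)                               # step 5.3: equal rounds -> same subgroup

print(leader, dict(subgroups))                                 # 1 {3: [1], 2: [2, 3], 1: [4, 5]}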
Step 6, cooperatively training the convolutional neural network with federated learning.
Step 6.1, the server transmits an identical convolutional neural network parameter matrix to every client in the optimal coalition partition.
Step 6.2, each client updates its convolutional neural network with the received parameter matrix.
Step 6.3, each client inputs its training set into its convolutional neural network, computes the parameter matrix obtained after 10 iterative updates of its network using the SGD gradient descent algorithm, and uploads the parameter matrix to its coalition leader.
Step 6.4, the coalition leader of each coalition receives the clients' convolutional neural network parameter matrices and averages the received matrices.
Step 6.5, the coalition leader of each coalition judges whether the convolutional neural network parameter matrices of the clients of all subgroups have been received; if so, it sends the average parameter matrix to the server and step 6.6 is executed; otherwise, it sends the average parameter matrix back to the clients that uploaded their matrices in step 6.4, and step 6.2 is executed. In the embodiment, the leader of the second coalition receives the parameter matrices of the 1st subgroup at 2 s, averages them, and sends the average to the 1st client; it receives the parameter matrices of the 2nd subgroup at 3 s, averages them, and sends the average to the 2nd and 3rd clients; at 4 s it again receives the parameter matrices of the 1st subgroup and sends their average to the 1st client; at 6 s it has received the parameter matrices of the 1st, 2nd, and 3rd subgroups, averages them all, and sends the average to the server.
Step 6.6, the server averages the parameter matrices received from all coalitions and transmits the average to each coalition leader, and each coalition leader forwards the average parameter matrix to the clients of its coalition.
Step 6.7, it is judged whether the server has executed the averaging operation of step 6.6 500 times; if so, the cooperative training ends, the server's convolutional neural network is updated with the average parameter matrix, and step 7.1 is executed; otherwise, step 6.2 is executed.
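A minimal sketch of the averaging in steps 6.4 and 6.6 above, reusing the ClientCNN sketch from step 1 (PyTorch assumed): the coalition leader averages the parameter matrices of its clients, and the server averages those of the coalition leaders in the same way.

import torch

def average_parameters(state_dicts):
    # element-wise mean over a list of model state_dicts
    return {name: torch.stack([sd[name].float() for sd in state_dicts]).mean(dim=0)
            for name in state_dicts[0]}

client_matrices = [ClientCNN().state_dict() for _ in range(3)]   # matrices uploaded by one subgroup
coalition_average = average_parameters(client_matrices)          # step 6.4: the leader averages

model = ClientCNN()
model.load_state_dict(coalition_average)                         # step 6.2: a client updates with the average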
Step 7, predicting the classes of the server's handwritten digit pictures.
Step 7.1, the server's handwritten digit pictures are processed with the same preprocessing method as in step 2 to obtain the server's test set.
Step 7.2, all images of the server's test set are input into the server's convolutional neural network, which outputs the predicted handwritten digit recognition results.
The effects of the present invention are further described below in conjunction with simulation experiments:
1. Simulation experiment conditions:
The platform of the simulation experiment is the Windows 11 operating system with PyCharm 2021.
The training set and test set used in the simulation experiment are the training set and test set of the MNIST data set.
2. Simulation content and result analysis:
The simulation experiment uses the invention and two prior-art methods (the federated averaging method and the second-derivative federated learning method) to recognize handwritten digit pictures.
In the simulation experiments, the two prior-art methods are:
The prior-art federated averaging method: the method of deep-network joint learning based on iterative model averaging proposed by Brendan McMahan et al. in "Communication-Efficient Learning of Deep Networks from Decentralized Data" (Artificial Intelligence and Statistics, 2017: 1273-1282), abbreviated as the federated averaging method.
The prior-art second-derivative federated learning method: the method for solving the data heterogeneity problem in federated learning based on a second derivative proposed by Hangzhou Dianzi University in the patent document "A method for solving the problem of data imbalance in federal learning based on a second derivative" (application number 202110917450.7, application publication number CN113691594B), abbreviated as the second-derivative federated learning method.
The effects of the present invention are further described below in conjunction with the simulation diagram of FIG. 2.
FIG. 2 compares the recognition accuracy of the three methods of the simulation experiment on the handwritten digit pictures of the test set. The recognition accuracy is the proportion of the 10000 handwritten digit pictures of the test set that are correctly recognized.
The abscissa of FIG. 2 is the number of training rounds in which the server and clients cooperatively train the convolutional neural network with federated learning, and the ordinate is the accuracy of the trained convolutional neural network in recognizing the handwritten digit classes of the test set. The solid line is the accuracy curve of the method of the invention over the training rounds, the dash-dotted line is that of the federated averaging method, and the dotted line is that of the second-derivative federated learning method.
As can be seen from FIG. 2, at any fixed training round the method of the invention always yields the highest recognition accuracy for the server's convolutional neural network among the three methods. This is mainly because the federated averaging method, when facing Non-IID data, cannot reduce the divergence between local model parameters and global model weights caused by data heterogeneity, while the second-derivative federated learning method does not fully utilize the clients' computing resources, so its model converges more slowly at a fixed training round.
The simulation experiments show that the invention can reduce the data heterogeneity among the client training sets through the coalition game, solve the problem of excessive weight divergence between the local and global models, improve the performance of the federated learning model, and, by exploiting the differentiation of client computing resources, improve the federated learning algorithm and accelerate the convergence of federated learning; it is thus a highly practical method for solving the data heterogeneity problem in federated learning.

Claims (4)

1. A method for solving the data heterogeneity problem in federated learning based on coalition games, characterized in that a server and clients cooperatively train a convolutional neural network using federated learning; the method comprises the following specific steps:
step 1, constructing a convolutional neural network:
step 1.1, building a convolutional neural network with the same structure for each client, whose layers are connected in series in the following order: input layer, first convolutional layer, first pooling layer, second convolutional layer, second pooling layer, fully connected layer;
step 1.2, setting the hyperparameters of the convolutional neural network: the number of input-layer neurons is set to 28×28; the convolution kernels of the first and second convolutional layers are set to 5×5 and 3×3 respectively, the number of kernels to 64, and the sliding stride to 1; the pooling windows of the first and second pooling layers are set to 2×2 with a sliding stride of 2; all activation functions are ReLU; the number of output neurons of the fully connected layer is set to 10, with a Softmax activation;
step 2, generating a training set for each client:
step 2.1, forming a sample set for each client from that client's handwritten digit pictures, and labeling every handwritten digit picture of each client's sample set;
step 2.2, applying mean-variance normalization to each labeled picture of every sample set so that the processed data follows a standard normal distribution, the normalized pictures of each sample set forming that client's training set;
step 3, each client transmits its label class set and iteration time to the server via base-station communication;
step 4, the server obtains the optimal coalition partition:
step 4.1, treating each client as a coalition, all coalitions forming a coalition partition;
step 4.2, calculating the benefit of each coalition in the coalition partition;
step 4.3, using a coalition-game formation algorithm, clients repeatedly leave their current coalition and join another; whenever the sum of the two affected coalitions' benefits increases, all coalitions form a new coalition partition;
step 4.4, when no client can join any coalition in a way that increases the combined benefit of its original coalition and the newly joined coalition, so that no new coalition partition is generated, the iteration of the coalition-game formation algorithm stops; empty coalitions are deleted from the partition, and the remaining coalitions form the optimal coalition partition;
step 5, the server groups the clients of each coalition in the optimal coalition partition:
step 5.1, according to the iteration times uploaded by the clients, the server selects the client with the shortest iteration time in each coalition as that coalition's leader;
step 5.2, the server computes the local training rounds of each client;
step 5.3, the clients of each coalition with the same number of local training rounds form a subgroup;
step 6, cooperatively training the convolutional neural network with federated learning:
step 6.1, the server transmits an identical convolutional neural network parameter matrix to every client in the optimal coalition partition;
step 6.2, each client updates its own convolutional neural network with the received parameter matrix;
step 6.3, each client inputs its training set into its convolutional neural network, computes the parameter matrix obtained after 10 iterative updates of its network using the SGD gradient descent algorithm, and uploads the parameter matrix to its coalition leader;
step 6.4, the coalition leader of each coalition receives the clients' convolutional neural network parameter matrices and averages the received matrices;
step 6.5, the coalition leader of each coalition judges whether the parameter matrices of the clients of all subgroups have been received; if so, it sends the average parameter matrix to the server and step 6.6 is executed; otherwise, it sends the average parameter matrix back to the clients and step 6.2 is executed;
step 6.6, the server averages the parameter matrices received from all coalitions and transmits the average to each coalition leader, and each coalition leader forwards the average parameter matrix to the clients of its coalition;
step 6.7, judging whether the server has executed the averaging operation of step 6.6 5000 times; if so, the cooperative training ends, the server's convolutional neural network is updated with the average parameter matrix, and step 7.1 is executed; otherwise, step 6.2 is executed;
step 7, predicting the classes of the server's handwritten digit pictures:
step 7.1, the server's handwritten digit pictures are processed with the same preprocessing method as in step 2 to obtain the server's test set;
step 7.2, all images of the server's test set are input into the server's convolutional neural network, which outputs the predicted handwritten digit recognition results.
2. The method for solving the data heterogeneity problem in federated learning based on coalition games according to claim 1, characterized in that the labels of step 2.1 comprise: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
3. The method according to claim 1, characterized in that the benefit of each coalition in the coalition partition in step 4.2 is calculated according to the following formula:
[formula given as an image in the original: the benefit V_j is computed from the label-class proportions n_m/D_j]
wherein V_j represents the benefit of the j-th coalition in the coalition partition, log(·) denotes the base-10 logarithm, |·| denotes the absolute-value operation, n_m denotes the number of handwritten digit pictures labeled with class m across the training sets of all clients in the j-th coalition, and D_j denotes the total number of handwritten digit pictures in the training sets of all clients in the j-th coalition.
4. The method for solving the data heterogeneity problem in federated learning based on coalition games according to claim 1, characterized in that the local training rounds of each client in step 5.2 are calculated according to the following formula:
\mu_i = \lfloor t_{\max} / t_i \rfloor
wherein \mu_i represents the local training rounds of the i-th client, \lfloor\cdot\rfloor denotes the floor operation, t_{\max} denotes the iteration time of the client with the largest iteration time within each coalition, and t_i denotes the iteration time of the i-th client.
CN202310167065.4A 2023-02-27 2023-02-27 Method for solving data heterogeneity problem in federal learning based on alliance game Pending CN116259057A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310167065.4A CN116259057A (en) 2023-02-27 2023-02-27 Method for solving data heterogeneity problem in federal learning based on alliance game

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310167065.4A CN116259057A (en) 2023-02-27 2023-02-27 Method for solving data heterogeneity problem in federal learning based on alliance game

Publications (1)

Publication Number Publication Date
CN116259057A true CN116259057A (en) 2023-06-13

Family

ID=86680547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310167065.4A Pending CN116259057A (en) 2023-02-27 2023-02-27 Method for solving data heterogeneity problem in federal learning based on alliance game

Country Status (1)

Country Link
CN (1) CN116259057A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116502709A (en) * 2023-06-26 2023-07-28 浙江大学滨江研究院 Heterogeneous federal learning method and device



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination