CN114881229B - Personalized collaborative learning method and device based on parameter gradual freezing - Google Patents

Personalized collaborative learning method and device based on parameter gradual freezing

Info

Publication number
CN114881229B
Authority
CN
China
Prior art keywords
communication
model
training
local
round
Prior art date
Legal status
Active
Application number
CN202210792509.9A
Other languages
Chinese (zh)
Other versions
CN114881229A (en)
Inventor
徐恪 (Xu Ke)
刘泱 (Liu Yang)
赵乙 (Zhao Yi)
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202210792509.9A priority Critical patent/CN114881229B/en
Publication of CN114881229A publication Critical patent/CN114881229A/en
Application granted granted Critical
Publication of CN114881229B publication Critical patent/CN114881229B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention provides a personalized collaborative learning method and device based on gradual parameter freezing. The method comprises the following steps: at the beginning of each communication round, receiving the global model of the current communication round sent by a central server, and splicing the global model with the local model of the previous communication round according to a mask matrix to obtain the local initial model of the current communication round; determining the number of training rounds for the current communication round according to a variable parameter, and training the local initial model of the current communication round for that number of rounds; and sending the trained local model of the current communication round, and updating the mask matrix according to the frozen parameters corresponding to the trained local model. The invention can reduce the number of communication rounds required by collaborative learning and reduce the communication overhead of the model training process.

Description

Personalized collaborative learning method and device based on parameter gradual freezing
Technical Field
The invention relates to the technical field of edge computing, in particular to a personalized collaborative learning method and device based on parameter gradual freezing.
Background
With the advent of the era of the Internet of Everything, more and more edge devices are connected to the network. The data generated by these devices, together with the corresponding computing, storage, and communication demands, have all grown dramatically. Scenarios such as malicious traffic detection, the industrial Internet, smart homes, autonomous driving, and the Internet of Vehicles need to transmit, process, store, and act on large amounts of data generated within a short time; that is, they require a computing paradigm with ultra-low latency and sufficient intelligence to handle business demands in a variety of complex scenarios. For edge computing scenarios, the existing cloud computing model suffers from weak real-time performance, insufficient bandwidth, high energy consumption, and data privacy risks. Edge computing has therefore been widely applied in various edge scenarios as a powerful complement to cloud computing, greatly enhancing the ability of edge devices to handle complex problems.
In recent years, edge computing has shown trends toward diversified scenarios, increasingly complex problems, and large-scale device deployment, and single-point edge computing can no longer keep up with these trends. For example, in an Internet of Vehicles scenario, a vehicle needs to communicate and make decisions cooperatively with roadside schedulers, other vehicles, and cloud servers; the computing scenario is very complex and requires multiple computing nodes to operate cooperatively. For edge mobile networks, problems such as dynamic scheduling of computing and storage resources and malicious traffic detection likewise require the cooperation of multiple edge computing nodes. These problems can be addressed by introducing artificial intelligence methods such as deep neural networks; collaborative learning has therefore become an important technology in edge computing, enabling edge computing to advance toward edge intelligence.
There are two issues to consider when edge devices (synonymous here with edge nodes) train neural network models. First, the computing and communication capabilities of edge devices are limited, so the computing and communication overhead on the edge device side needs to be reduced as much as possible during collaborative learning. Second, the amount of data on each edge device is small and very unevenly distributed; that is, the data across edge devices are non-independent and non-identically distributed (non-IID). How to effectively address the communication efficiency and non-IID data problems of collaborative learning in edge intelligence scenarios therefore urgently needs to be solved.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, the invention provides a personalized collaborative learning method and device based on gradual parameter freezing, which address the communication-efficiency and non-IID data problems of collaborative learning in edge intelligence scenarios, reduce the number of communication rounds required by collaborative learning, and reduce the communication overhead of the training process.
The embodiment of the invention provides a personalized cooperative learning method based on parameter gradual freezing, which comprises the following steps:
when each communication turn starts, receiving a global model of the communication turn sent by a central server, and splicing the global model and a local model of a previous communication turn according to a mask matrix to obtain a local initial model of the communication turn;
determining the number of training rounds of the communication round according to the variable parameters, and training the local initial model of the communication round according to the number of the training rounds to obtain the trained local model of the communication round;
and sending the local model of the communication turn after the training is finished, and updating the mask matrix according to the freezing parameters corresponding to the local model of the communication turn after the training is finished.
With the personalized collaborative learning method based on gradual parameter freezing according to the embodiment of the present invention, even when the data distributions of the edge devices differ greatly (i.e., in a non-IID data environment), each device can train a personalized model suited to its own data, further improving the prediction accuracy of the models. Moreover, the model converges quickly, reducing the number of communication rounds required by collaborative learning and the communication overhead of training. In addition, the method is insensitive to the introduced hyper-parameters, so the model can be deployed quickly and reliably in a variety of real, complex environments.
In addition, the personalized collaborative learning method based on parameter gradual freezing according to the above embodiment of the present invention may have the following additional technical features:
further, in an embodiment of the present invention, before the beginning of each communication turn, the method further includes: in an initialization stage, an initial global model issued by the central server is received, and a mask matrix and variable parameters corresponding to the initial global model are generated, wherein the variable parameters are used for recording the number of communication rounds of edge devices participating in collaborative learning.
Further, in an embodiment of the present invention, the method further includes: before the communication round is finished, the central server is used for carrying out parameter aggregation on all the collected local models so as to update the global model in the central server, and whether the next communication round is carried out is determined according to a training cut-off condition.
Further, in an embodiment of the present invention, the determining, according to the variable parameter, the number of training rounds of the communication round, and training the local initial model of the communication round according to the number of training rounds to obtain the trained local model of the communication round includes: after the local initial model of the communication round is obtained, updating the variable parameters, and calculating the number of training rounds of the local initial model of the communication round by using the updated variable parameters through a first formula; and training the local initial model of the communication round by using a small batch gradient descent method based on the number of the training rounds obtained by calculation to obtain the trained local model of the communication round.
Further, in one embodiment of the present invention, the first formula is:

$$E_k^{(rcnt_k)} = \left\lceil \frac{E_0}{rcnt_k} \right\rceil$$ (1)

wherein $\lceil \cdot \rceil$ denotes the round-up (ceiling) function, E_0 denotes the initial value of E_k^{(rcnt_k)}, and E_k^{(rcnt_k)} is the number of training rounds of the K-th user C_K in the rcnt_k-th communication round.
Further, in an embodiment of the present invention, updating the mask matrix according to the frozen parameters corresponding to the trained local model of the current communication round includes: the K-th user C_K tests the trained local model of the current communication round to generate a first test accuracy; when the first test accuracy is greater than the second test accuracy of the local model trained in the previous round, the proportion of newly added frozen parameters in the current communication round is calculated, and the mask matrix is updated according to this proportion.
Further, in an embodiment of the present invention, calculating the proportion of newly added frozen parameters in the current communication round includes: calculating, using a third formula, the proportion of not-yet-frozen weights to the total number of weights; and calculating, using a fourth formula, the proportion of newly added frozen weights in the current communication round based on that proportion.
Further, in an embodiment of the present invention, the third formula and the fourth formula are respectively:

$$\mathrm{free\_ratio}(rcnt_k) = B_{low} + \frac{1 - B_{low}}{rcnt_k + 1}$$ (3)

$$R_{fix} = 1 - \frac{\mathrm{free\_ratio}(rcnt_k)}{\mathrm{free\_ratio}(rcnt_k - 1)}$$ (4)

wherein, in formula (3), free_ratio(rcnt_k) represents the proportion of not-yet-frozen weights to the total number of weights, and B_low represents the lower bound of free_ratio(rcnt_k); in formula (4), free_ratio(rcnt_k) represents the value of free_ratio in the rcnt_k-th communication round, free_ratio(rcnt_k - 1) represents its value in the (rcnt_k - 1)-th communication round, and R_fix represents the proportion of newly added frozen weights in each communication round.
Further, in an embodiment of the present invention, updating the mask matrix according to the proportion of newly added frozen parameters includes: acquiring the element values of the mask matrix; sorting the not-yet-frozen parameter values by the absolute values of the parameter values corresponding to the element values; and, according to the sorting result, setting to a preset value the mask-matrix elements corresponding to the largest-absolute-value weights, in the proportion R_fix of newly added frozen weights for the current communication round.
The embodiment of the invention provides an individualized collaborative learning device based on parameter gradual freezing, which comprises:
the model splicing module is used for receiving a global model of the communication turn sent by the central server when each communication turn starts, and splicing the global model and a local model of the previous communication turn according to a mask matrix to obtain a local initial model of the communication turn;
the model training module is used for determining the number of training rounds of the communication round according to the variable parameters, and training the local initial model of the communication round according to the number of the training rounds to obtain the trained local model of the communication round;
and the mask updating module is used for sending the local model of the communication turn after the training is finished and updating the mask matrix according to the freezing parameters corresponding to the local model of the communication turn after the training is finished.
With the personalized collaborative learning device based on gradual parameter freezing according to the embodiment of the present invention, even when the data distributions of the edge devices differ greatly (i.e., in a non-IID data environment), each device trains a personalized model suited to its own data, further improving the prediction accuracy of the models. Moreover, the model converges quickly, reducing the number of communication rounds required by collaborative learning and the communication overhead of training. In addition, the device is insensitive to the introduced hyper-parameters, so the model can be deployed quickly and reliably in a variety of real, complex environments.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a personalized cooperative learning method based on parameter gradual freezing according to an embodiment of the present invention;
fig. 2 is a schematic diagram of the weight-splicing flow for a 5 × 5 filter according to an embodiment of the present invention;
fig. 3 is a schematic diagram of the mask-matrix update flow for a 5 × 5 filter according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating the calculation of the number of training rounds of the communication round to train the local model of the communication round according to the embodiment of the present invention;
fig. 5 is a flowchart illustrating updating of a mask matrix according to a freezing parameter corresponding to a local model of the current communication turn according to the embodiment of the present invention;
fig. 6 is a flowchart of calculating a ratio of newly added frozen parameters of the current communication turn according to the embodiment of the present invention;
FIG. 7 is an experimental schematic of an adaptive tuning method according to an embodiment of the present invention;
fig. 8 is a flowchart of updating a mask matrix according to the ratio of newly added frozen parameters according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an apparatus for personalized collaborative learning based on gradual parameter freezing according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a model training module according to an embodiment of the present invention;
FIG. 11 is a block diagram of a mask update module according to an embodiment of the present invention;
FIG. 12 is a diagram illustrating a second update sub-module according to an embodiment of the present invention;
fig. 13 is a schematic structural diagram of a global aggregation module according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative and intended to explain the present invention and should not be construed as limiting the present invention.
The following describes a personalized collaborative learning method and apparatus based on parameter gradual freezing according to an embodiment of the present invention with reference to the accompanying drawings.
Fig. 1 is a flowchart of a personalized collaborative learning method based on parameter gradual freezing according to an embodiment of the present invention.
And step S1, when each communication turn starts, receiving the global model of the communication turn sent by the central server, and splicing the global model and the local model of the previous communication turn according to the mask matrix to obtain the local initial model of the communication turn.
It can be understood that, in the initialization stage, the initial global model issued by the central server is received, and a mask matrix and variable parameters corresponding to the initial global model are generated, where the variable parameters are used to record the number of communication rounds in which the edge device participates in the collaborative learning.
Specifically, the central server initializes the parameters of the initial global model w_g^0 and sends w_g^0 to each edge device C_k. According to the initial global model issued by the central server, each edge device C_k locally maintains a mask matrix M_k of the same scale as the model, and initializes a variable rcnt_k to 0; this variable records the number of communication rounds in which C_k has participated in collaborative learning. In one embodiment of the present invention, each edge device uses the same network structure as the global model, which is likewise issued by the central server.
And the edge equipment receives the initial global model issued by the central server and generates a mask matrix and variable parameters corresponding to the initial global model.
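As an illustrative sketch of this initialization step (plain Python with NumPy; the function name and state layout are assumptions for illustration, not taken from the patent):

    import numpy as np

    def init_client_state(w_global_init):
        # Each edge device keeps a mask of the same scale as the model
        # (all ones: nothing frozen yet) and a round counter rcnt_k = 0.
        mask = np.ones_like(w_global_init)
        rcnt = 0
        return mask, rcnt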
Further, the edge devices are randomly sampled. For N edge devices and a random sampling rate K, N × K randomly sampled edge devices participate in each round of collaborative learning; that is, the number of edge devices participating in the r-th round is S = max(N × K, 1), and these edge devices form the set S_r = {C_1, ..., C_S}.
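A minimal sketch of this sampling step (names are illustrative):

    import random

    def sample_clients(clients, sampling_rate):
        # S = max(N * K, 1) devices join this communication round.
        s = max(int(len(clients) * sampling_rate), 1)
        return random.sample(clients, s)

    # e.g. 100 devices at K = 0.1 -> 10 devices form S_r = {C_1, ..., C_S}
    participants = sample_clients(list(range(100)), 0.1)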
Further, at the beginning of each communication round, the central server sends the global model w_g^r to each edge device in the sampled set S_r. After receiving the global model w_g^r, edge device C_k splices it with the local model of the previous communication round w_k^{r-1}, i.e.,

$$w_k^r = w_g^r \odot M_k + w_k^{r-1} \odot \overline{M_k}$$

as shown in fig. 2.
In one embodiment, fig. 2 is a schematic diagram of the weight-splicing flow for a 5 × 5 filter, which introduces the weight splicing. The filter to the left of the plus sign represents the global model w_g^r, the filter to the right of the plus sign represents the local model of the previous round w_k^{r-1}, and the filter to the right of the equals sign represents the local initial model of the current round w_k^r. The numbers in the circles represent the element values of the matrices M_k and \overline{M_k}, where \overline{M_k} is the matrix obtained by inverting the elements of the mask matrix M_k, and ⊙ denotes element-wise multiplication. For w_g^r and w_k^{r-1}, only the weight values whose corresponding mask value is 1 are assigned to the local initial model of the current communication round. It can be understood that the mask matrix M_k ensures that the weight values already frozen in the local initial model are not overwritten by the weight values at the corresponding positions of the global model in this communication round.
And splicing the global model and the local model of the previous communication turn according to the mask matrix to obtain the local initial model of the current communication turn.
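The splicing rule can be sketched as follows (a toy illustration, assuming mask value 1 = not frozen and 0 = frozen as in fig. 3; names are illustrative):

    import numpy as np

    def splice_models(w_global, w_local_prev, mask):
        # Free positions (mask == 1) take the global weight; frozen
        # positions (mask == 0) keep the local weight, cf. Fig. 2.
        return w_global * mask + w_local_prev * (1 - mask)

    # Toy usage with a 5 x 5 filter, mirroring the figure:
    w_g = np.random.randn(5, 5)           # global model w_g^r
    w_prev = np.random.randn(5, 5)        # local model w_k^{r-1}
    m = np.ones((5, 5))
    m[0, 0] = 0                           # the weight at (0, 0) is frozen
    w_init = splice_models(w_g, w_prev, m)
    assert w_init[0, 0] == w_prev[0, 0]   # frozen weight is not overwritten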
And step S2, determining the number of training rounds of the communication round according to the variable parameters, and training the local initial model of the communication round according to the number of the training rounds to obtain the trained local model of the communication round.
It can be understood that the variable rcnt_k records the number of communication rounds in which edge device C_k has participated in collaborative learning. The number of training rounds for the current communication round is computed from it by formula (1), and the local initial model of the current communication round is trained for that number of rounds to obtain the trained local model.
And step S3, sending the local model of the communication turn after the training is finished, and updating the mask matrix according to the freezing parameters corresponding to the local model of the communication turn after the training is finished.
Specifically, after edge device C_k completes the local training process and uploads the model w_k^r to the central server, it updates the locally maintained mask matrix M_k. In one embodiment, fig. 3 is a schematic diagram of the mask-matrix update flow for a 5 × 5 filter, illustrating the mask-matrix update. In fig. 3, a mask-matrix element with value 1 indicates that the corresponding parameter is not frozen, an element with value 0 and a black border indicates that the corresponding parameter is frozen, and the number of frozen parameters gradually increases over the communication rounds. In each communication round of training, the model updates only the unfrozen weights. Weights that have been frozen are still used by the model but are no longer updated.
The embodiment of the invention can be widely applied to the edge computing nodes with limited resources, namely edge equipment, and model training and parameter updating are carried out at the edge end. The scheme of the invention enables the model trained by each edge device to enhance generalization capability through cooperative learning and maintain sensitivity to local data distribution of the edge device. Meanwhile, under the condition that the data distribution of the edge devices is extremely uneven, each edge device can train an individualized model, and the prediction capability of the edge devices on local data is enhanced. The method not only can lead the model to be fast converged and save communication and calculation cost required by collaborative learning training, but also is insensitive to the introduced hyper-parameters, and has the advantages of easy adjustment and deployment and high practical application value.
Fig. 4 is a flowchart of calculating the number of training rounds of the current communication round to train its local model. As shown in fig. 4, in one embodiment, determining the number of training rounds according to the variable parameter and training the local initial model of the current communication round accordingly to obtain the trained local model comprises the following sub-steps:
step 401, after the local initial model of the communication round is obtained, updating the variable parameters, and calculating the number of training rounds of the local initial model of the communication round by using the updated variable parameters through a first formula.
Specifically, after the local initial model of the current communication round is obtained, edge device C_k updates its locally maintained variable rcnt_k, i.e., rcnt_k ← rcnt_k + 1. The number of training rounds E_k^{(rcnt_k)} of C_k in the rcnt_k-th communication round is then calculated according to equation (1):
$$E_k^{(rcnt_k)} = \left\lceil \frac{E_0}{rcnt_k} \right\rceil$$ (1)

wherein $\lceil \cdot \rceil$ denotes the round-up (ceiling) function. The hyper-parameter E_0 denotes E_k^{(1)}, i.e., the value of E_k^{(rcnt_k)} in the first communication round. In particular, when rcnt_k takes the value 1, the number of training rounds E_k^{(1)} is exactly E_0. As the value of rcnt_k increases, E_k^{(rcnt_k)} gradually decreases and eventually converges to 1.
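A minimal sketch of this schedule, assuming the ceiling form of equation (1) reconstructed above (names are illustrative):

    import math

    def local_epochs(e0, rcnt):
        # Equation (1): equals E_0 in the first round, then decreases
        # and converges to 1 as the round counter grows.
        return math.ceil(e0 / rcnt)

    print([local_epochs(50, r) for r in (1, 2, 5, 50)])  # [50, 25, 10, 1]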
And step 402, based on the number of training rounds of the communication round obtained by calculation, training the local initial model of the communication round by using a small batch gradient descent method to obtain the trained local model of the communication round.
Specifically, edge device C_k trains the model w_k^r using mini-batch gradient descent, as shown in formula (2):

$$w_k^r \leftarrow w_k^r - \eta \, \nabla F(w_k^r) \odot M_k$$ (2)

wherein η represents the learning rate in the gradient descent algorithm, F is the objective function being optimized, and ⊙ represents element-wise multiplication at corresponding positions.
Optionally, the model w_k^r may also be trained using batch gradient descent, stochastic gradient descent, or similar methods.
Through this model training, the trained local model of the current communication round is obtained.
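The masked update of equation (2) can be sketched as follows (a toy example on a quadratic objective; names are illustrative):

    import numpy as np

    def masked_sgd_step(w, grad, mask, lr):
        # The gradient is multiplied element-wise by the mask (1 = free,
        # 0 = frozen), so frozen weights are used but never updated.
        return w - lr * grad * mask

    # Toy usage: minimise ||w||^2 while keeping the third weight frozen.
    w = np.array([1.0, -2.0, 3.0])
    mask = np.array([1.0, 1.0, 0.0])
    for _ in range(100):
        w = masked_sgd_step(w, 2 * w, mask, lr=0.1)  # grad of ||w||^2 is 2w
    print(w)  # the free weights approach 0, the frozen weight stays 3.0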
The method according to the embodiment of the present invention further includes, after step S3:
Step S4, before the communication round ends, using the central server to perform parameter aggregation on all the collected local models to update the global model on the central server, and determining whether to proceed to the next communication round according to a training cutoff condition.
Specifically, once the central server has collected the models of all the edge devices in the set S_r, or a preset waiting time has been exceeded, it performs parameter aggregation on the collected models, i.e., a weighted average of the models according to the training data volume of each edge device.
It can be understood that the central server performs parameter aggregation on all received local models, updates the global model on the central server, determines whether to proceed to the next communication round according to the training cutoff condition, and sends the updated model to the users.
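A minimal sketch of this weighted aggregation (a FedAvg-style average; names are illustrative):

    import numpy as np

    def aggregate(local_models, num_samples):
        # Weighted average of the uploaded models, weighted by each
        # edge device's training data volume.
        total = float(sum(num_samples))
        return sum(w * (n / total) for w, n in zip(local_models, num_samples))

    models = [np.full(3, 1.0), np.full(3, 4.0)]
    print(aggregate(models, [100, 300]))  # -> [3.25 3.25 3.25]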
Fig. 5 is a flowchart illustrating updating a mask matrix according to a freezing parameter corresponding to a local model of the current communication turn according to an embodiment of the present invention, where as shown in fig. 5, in an embodiment, updating a mask matrix according to a freezing parameter includes the following sub-steps:
step 501, testing the local model of the communication turn after training is completed, and then generating a first testing accuracy.
And 502, when the first test accuracy is greater than the second test accuracy of the local model trained in the previous round, calculating the proportion of newly added frozen parameters in the communication round, and updating a mask matrix according to the proportion of the newly added frozen parameters.
Specifically, the K-th user C_K tests the local model w_k^r obtained by the current round of training after training is finished, obtaining the test accuracy acc. If acc is greater than the test accuracy last_acc of the local model obtained in the previous round, the mask matrix M_k is updated. Otherwise, the update is not performed and the subsequent steps are not executed.
Fig. 6 is a flowchart of calculating a ratio of newly added frozen parameters of the current communication round according to an embodiment of the present invention, and as shown in fig. 6, in an embodiment, calculating a ratio of newly added frozen parameters of the current communication round includes the following sub-steps:
step 601, calculating the proportion of the weight of the unfrozen parameter to the weight number of all the parameters by using a third formula.
Specifically, formula (3) is used in calculating the proportion of newly added frozen parameters in the rcnt_k-th round. In order to freeze important parameters and preserve the information the model has encoded about the data as early as possible, and to match the decreasing number of training rounds E_k^{(rcnt_k)}, the following two formulas determine the proportion of newly added frozen weights in each communication round.

$$\mathrm{free\_ratio}(rcnt_k) = B_{low} + \frac{1 - B_{low}}{rcnt_k + 1}$$ (3)

In equation (3), free_ratio(rcnt_k) represents the proportion of free weights, i.e., weights that have not yet been frozen, to the total number of weights. The hyper-parameter B_low represents the lower bound of free_ratio(rcnt_k), and its value determines the final proportion of free weights in the model. Specifically, when the value of rcnt_k is 0, free_ratio(rcnt_k) is 1, which corresponds to the situation before the first mask-matrix update. As rcnt_k increases, free_ratio(rcnt_k) gradually decreases and eventually converges to B_low.
Step 602, based on the proportion of unfrozen weights to the total number of weights, the proportion of newly added frozen weights in the current communication round is calculated using a fourth formula.

Specifically, as in equation (4), free_ratio(rcnt_k) represents the value of free_ratio in the rcnt_k-th communication round, and free_ratio(rcnt_k - 1) represents its value in the (rcnt_k - 1)-th communication round. From formula (3), free_ratio decreases monotonically as rcnt_k increases, so the fraction in equation (4) is always less than 1 and the value of R_fix is always greater than 0. R_fix indicates the proportion of newly added frozen weights in each communication round, which also determines the number of mask-matrix elements whose value changes from 1 to 0 at each update.

$$R_{fix} = 1 - \frac{\mathrm{free\_ratio}(rcnt_k)}{\mathrm{free\_ratio}(rcnt_k - 1)}$$ (4)
The proportion of newly added frozen weights in the current communication round is then obtained by calculation using formula (4).
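The two schedules can be sketched as follows; note that the closed form used for equation (3) is reconstructed from its stated properties (value 1 at rcnt_k = 0, monotone decrease, convergence to B_low) and is an assumption:

    def free_ratio(rcnt, b_low):
        # Equation (3): 1 at rcnt = 0, decreasing, converging to B_low.
        return b_low + (1.0 - b_low) / (rcnt + 1)

    def new_freeze_ratio(rcnt, b_low):
        # Equation (4): always > 0 because free_ratio is decreasing;
        # meaningful from the first mask update onwards (rcnt >= 1).
        return 1.0 - free_ratio(rcnt, b_low) / free_ratio(rcnt - 1, b_low)

    print(new_freeze_ratio(1, 0.8))  # ~0.1: freeze 10% of the free weights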
Further, fig. 7 shows an experimental schematic of the adaptive tuning method, i.e., how the number of training rounds E_k^{(rcnt_k)} and the frozen-parameter proportion (freezing rate) change with the communication rounds. In fig. 7, E_0 is set to 50 and B_low is set to 0.8, which means the frozen-parameter proportion is at most 20%.
Fig. 8 is a flowchart of updating a mask matrix according to a ratio of a newly added frozen parameter according to an embodiment of the present invention, and as shown in fig. 8, in an embodiment, updating the mask matrix according to a ratio of a newly added frozen parameter includes the following sub-steps:
step 801, acquiring element values of a mask matrix;
step 802, sorting the parameter values which are not frozen according to the absolute values of the parameter values corresponding to the element values;
step 803, according to the sorting result, setting to a preset value the mask-matrix elements corresponding to the largest-absolute-value weights, in the proportion R_fix of newly added frozen weights for the current communication round.
Specifically, the embodiment of the invention updates the values of the mask matrix M_k layer by layer. For each layer of the neural network model, the not-yet-frozen parameter values are sorted by the absolute values of the parameter values corresponding to the matrix elements, and the mask-matrix elements corresponding to the R_fix proportion of weights with the largest absolute values have their values set to 0.
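A sketch of this layer-wise update (assuming R_fix is the fraction of still-free weights to freeze, consistent with equation (4); names are illustrative):

    import numpy as np

    def update_mask(weights, mask, r_fix):
        # Among the still-free positions (mask == 1), freeze the r_fix
        # fraction with the largest absolute weight values (cf. Fig. 3).
        flat_mask = mask.copy().ravel()
        free_idx = np.flatnonzero(flat_mask)
        k = int(len(free_idx) * r_fix)
        if k > 0:
            order = np.argsort(-np.abs(weights.ravel()[free_idx]))
            flat_mask[free_idx[order[:k]]] = 0.0
        return flat_mask.reshape(mask.shape)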
By implementing the personalized collaborative learning method based on parameter gradual freezing, the following beneficial effects can be obtained:
the method is also suitable for a scene of cooperatively learning an intelligent model by a plurality of nodes (or devices) similar to edge intelligence in the field of cooperative learning under the edge intelligence scene. Existing methods are mainly dedicated to training the same global model for all edge devices. Under the condition that the data distribution of each edge device is greatly different, all the edge devices do not need to apply the same global model, the practical effect of the model is not good, and sometimes the prediction accuracy of the model cannot even exceed that of the model which is independently trained by each edge device. In addition, frequent and massive model transmission between the central server and each edge device also hinders the landing and practical application of collaborative learning, and becomes a great bottleneck for the development of collaborative learning. The embodiment of the invention effectively solves the problems by freezing the parameters of the local model of the edge device by communication turns.
1) The personalized collaborative learning method provided by the invention continuously freezes important parameters in the local model by means of a mask matrix; the training process of the local model is also a process of finding and freezing the model's important parameters. Neural network models are typically over-parameterized, and the invention encodes the edge device's local data in the model by freezing important parameters, effectively avoiding the problem that the parameters of the edge device's local model are damaged or even completely overwritten by parameter aggregation at the server side.
2) By tuning the hyper-parameters, the method can leave enough free parameters so that the model can participate in collaborative learning with those parameters, further enhancing its generalization capability. The invention also designs a method for adaptively adjusting the number of training rounds per communication round and the proportion of newly added frozen parameters, further enhancing the ability to find and freeze important parameters while reducing the required computational overhead. In terms of communication efficiency, the personalized collaborative learning method provided by the invention not only enables the model to converge rapidly, but also allows it to reach the expected test accuracy with few communication rounds, greatly reducing the communication and computation overhead of collaborative learning and realizing communication-efficient collaborative learning.
3) The method is likewise suitable for application scenarios of multi-node collaborative learning similar to edge intelligence; it enables personalized model training while maintaining efficient communication and saving network bandwidth. In addition, the method has few hyper-parameters and is easy to tune, making it convenient and quick to deploy and use in various real, complex scenarios, and giving it high practical value.
Next, a personalized cooperative learning apparatus based on parameter gradual freezing according to an embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 9 is a schematic structural diagram of a personalized cooperative learning apparatus based on parameter gradual freezing according to an embodiment of the present invention.
As shown in fig. 9, the apparatus 10 includes: a model stitching module 100, a model training module 200, and a mask update module 300.
The model splicing module 100 is configured to receive, when each communication turn starts, a global model of the communication turn sent by the central server, and splice the global model and a local model of a previous communication turn according to a mask matrix to obtain a local initial model of the communication turn;
the model training module 200 is configured to determine the number of training rounds of the communication round according to the variable parameters, and train the local initial model of the communication round according to the number of training rounds to obtain a trained local model of the communication round;
a mask updating module 300, configured to send the trained local model of the current communication round, and to update the mask matrix according to the frozen parameters corresponding to the trained local model of the current communication round.
Further, the apparatus 10 further includes, upstream of the model splicing module 100:
and the global model initial module is used for receiving the initial global model issued by the central server in an initialization stage and generating a mask matrix and variable parameters corresponding to the initial global model, wherein the variable parameters are used for recording the number of communication rounds of the edge device participating in the collaborative learning.
Further, as shown in fig. 10, the model training module 200 includes:
the number calculating unit 201 is configured to update the variable parameter after the local initial model of the communication round is obtained, and calculate the number of training rounds of the local initial model of the communication round through a first formula by using the updated variable parameter;
and the gradient training unit 202 is configured to train the local model of the communication round by using a small batch gradient descent method based on the calculated number of training rounds of the communication round to obtain the trained local model of the communication round.
Further, as shown in fig. 11, the mask updating module 300 includes:
the first updating subunit 301 is configured to generate a first test accuracy after testing the trained local model of the communication turn;
and a second updating subunit 302, configured to calculate a ratio of newly added frozen parameters in the communication round of this time when the first test accuracy is greater than a second test accuracy of the local model trained in the previous round, and update the mask matrix according to the ratio of the newly added frozen parameters.
Further, as shown in fig. 12, the second updating subunit 302 includes:
an element value acquiring subunit 3021 configured to acquire an element value of the mask matrix;
a parameter value sorting subunit 3022, configured to sort the non-frozen parameter values according to the absolute values of the parameter values corresponding to the element values;
and the element value setting subunit 3023 is configured to, according to the sorting result, use the element value of the mask matrix corresponding to the weight value with the largest absolute value as the weight value of the newly added freezing parameter weight value proportion in the current communication round as a preset value.
Further, as shown in fig. 13, the apparatus 10 further includes:
and a global aggregation module 400, configured to perform parameter aggregation on all the collected local models by using the central server before the communication round is finished, so as to update the global model in the central server, and determine whether to perform a next communication round according to a training deadline.
Further, the first formula is:
$$E_k^{(rcnt_k)} = \left\lceil \frac{E_0}{rcnt_k} \right\rceil$$ (1)

wherein $\lceil \cdot \rceil$ denotes the round-up (ceiling) function, E_0 denotes the initial value of E_k^{(rcnt_k)}, and E_k^{(rcnt_k)} is the number of training rounds of the K-th user C_K in the rcnt_k-th communication round.
Further, the third formula and the fourth formula are respectively:
$$\mathrm{free\_ratio}(rcnt_k) = B_{low} + \frac{1 - B_{low}}{rcnt_k + 1}$$ (3)

$$R_{fix} = 1 - \frac{\mathrm{free\_ratio}(rcnt_k)}{\mathrm{free\_ratio}(rcnt_k - 1)}$$ (4)

wherein, in formula (3), free_ratio(rcnt_k) represents the proportion of not-yet-frozen weights to the total number of weights, and B_low represents the lower bound of free_ratio(rcnt_k); in formula (4), free_ratio(rcnt_k) represents the value of free_ratio in the rcnt_k-th communication round, free_ratio(rcnt_k - 1) represents its value in the (rcnt_k - 1)-th communication round, and R_fix represents the proportion of newly added frozen weights in each communication round.
Further, the second updating subunit 302 is further configured to calculate, by using a third formula, a ratio of the weight of the non-frozen parameter to the weight of all the parameters, and calculate, by using a fourth formula, a ratio of the weight of the newly added frozen parameter in the current communication turn, based on the ratio of the weight of the non-frozen parameter to the weight of all the parameters.
The individualized collaborative learning device based on parameter gradual freezing provided by the embodiment of the invention can be widely applied to edge computing nodes with limited resources, namely edge equipment, and model training and parameter updating are carried out at the edge end. The scheme of the invention enables the model trained by each edge device to enhance generalization capability through cooperative learning and maintain sensitivity to local data distribution of the edge device. Meanwhile, under the condition that the data distribution of the edge devices is extremely uneven, each edge device can train an individualized model, and the prediction capability of the edge devices on local data is enhanced. The method not only can lead the model to be fast converged and save communication and calculation cost required by collaborative learning training, but also is insensitive to the introduced hyper-parameters, and has the advantages of easy adjustment and deployment and high practical application value.

Claims (12)

1. A personalized cooperative learning method based on parameter gradual freezing is characterized by comprising the following steps:
when each communication turn starts, receiving a global model of the communication turn sent by a central server, and splicing the global model and a local model of the previous communication turn according to a mask matrix to obtain a local initial model of the communication turn;
determining the number of training rounds of the communication round according to the variable parameters, and training the local initial model of the communication round according to the number of the training rounds to obtain the trained local model of the communication round;
sending the local model of the communication turn after the training is finished, and updating the mask matrix according to the freezing parameters corresponding to the local model of the communication turn after the training is finished;
the updating the mask matrix according to the freezing parameters corresponding to the local model of the communication turn after the training is completed includes:
testing the local model of the communication turn after the training is finished to generate a first testing accuracy rate;
when the first test accuracy is higher than the second test accuracy of the local model trained in the previous round, calculating the proportion of newly added frozen parameters in the communication round, and updating the mask matrix according to the proportion of the newly added frozen parameters;
the calculating the proportion of the newly added frozen parameters of the communication turn comprises the following steps:
calculating the proportion of the weight of the unfrozen parameters to the weight number of all the parameters by using a third formula;
calculating the proportion of the weight values of the newly added freezing parameters in the communication turn by using a fourth formula based on the proportion of the weight of the unfrozen parameters in the weight quantity of all the parameters;
the third formula and the fourth formula are respectively:
$$\mathrm{free\_ratio}(rcnt_k) = B_{low} + \frac{1 - B_{low}}{rcnt_k + 1}$$ (3)

$$R_{fix} = 1 - \frac{\mathrm{free\_ratio}(rcnt_k)}{\mathrm{free\_ratio}(rcnt_k - 1)}$$ (4)

wherein, in formula (3), free_ratio(rcnt_k) represents the proportion of not-yet-frozen weights to the total number of weights, and B_low represents the lower bound of free_ratio(rcnt_k); in formula (4), free_ratio(rcnt_k) represents the value of free_ratio in the rcnt_k-th communication round, free_ratio(rcnt_k - 1) represents its value in the (rcnt_k - 1)-th communication round, and R_fix represents the proportion of newly added frozen weights in each communication round.
2. The method of claim 1, wherein before the beginning of each communication round, further comprising: in an initialization stage, an initial global model issued by the central server is received, and a mask matrix and variable parameters corresponding to the initial global model are generated, wherein the variable parameters are used for recording the number of communication rounds of edge devices participating in collaborative learning.
3. The method of claim 1, further comprising: before the communication round is finished, the central server is used for carrying out parameter aggregation on all the collected local models so as to update the global model in the central server, and whether the next communication round is carried out is determined according to a training cut-off condition.
4. The method according to claim 1, wherein the determining the number of training rounds of the communication round according to the variable parameters, and training the local initial model of the communication round according to the number of training rounds to obtain the trained local model of the communication round comprises:
after the local initial model of the communication round is obtained, updating the variable parameters, and calculating the number of training rounds of the local initial model of the communication round by using the updated variable parameters through a first formula;
and training the local initial model of the communication round by using a small batch gradient descent method based on the number of the training rounds obtained by calculation to obtain the trained local model of the communication round.
5. The method of claim 4, wherein the first formula is:
Figure 777766DEST_PATH_IMAGE012
(1)
wherein,
Figure 13576DEST_PATH_IMAGE013
meaning that the function operation is rounded up in the near sense,
Figure 24257DEST_PATH_IMAGE014
to represent
Figure 992213DEST_PATH_IMAGE015
Of the initial value of (a) is,
Figure 88345DEST_PATH_IMAGE016
for the Kth user C K In the first place
Figure 799949DEST_PATH_IMAGE017
Number of training rounds in a communication round.
6. The method of claim 1, wherein the updating the mask matrix according to the ratio of the newly added frozen parameters comprises:
acquiring element values of a mask matrix;
sorting the parameter values which are not frozen according to the absolute value of the parameter values corresponding to the element values;
according to the sorting result, setting to 0 the mask-matrix elements corresponding to the weights with the largest absolute values, in the proportion R_fix of newly added frozen weights for the current communication round.
7. An apparatus for personalized collaborative learning based on parameter gradual freezing, comprising:
the model splicing module is used for receiving a global model of the communication turn sent by the central server when each communication turn starts, and splicing the global model and a local model of the previous communication turn according to a mask matrix to obtain a local initial model of the communication turn;
the model training module is used for determining the number of training rounds of the communication round according to the variable parameters, and training the local initial model of the communication round according to the number of the training rounds to obtain the trained local model of the communication round;
the mask updating module is used for sending the local model of the communication turn after the training is finished and updating the mask matrix according to the freezing parameters corresponding to the local model of the communication turn after the training is finished;
the mask update module includes:
the first updating subunit is used for testing the local model of the communication turn after the training is finished and generating a first test accuracy rate;
the second updating subunit is used for calculating the proportion of newly added frozen parameters of the communication round when the first test accuracy is greater than the second test accuracy of the local model trained in the previous round, and updating the mask matrix according to the proportion of the newly added frozen parameters;
the second updating subunit is further configured to calculate, by using a third formula, a ratio of the weight of the unfrozen parameter to the weight of all the parameters, and calculate, based on the ratio of the weight of the unfrozen parameter to the weight of all the parameters, a ratio of newly added frozen parameter weight values in the current communication turn by using a fourth formula;
the third formula and the fourth formula are respectively:
$$\mathrm{free\_ratio}(rcnt_k) = B_{low} + \frac{1 - B_{low}}{rcnt_k + 1}$$ (3)

$$R_{fix} = 1 - \frac{\mathrm{free\_ratio}(rcnt_k)}{\mathrm{free\_ratio}(rcnt_k - 1)}$$ (4)

wherein, in formula (3), free_ratio(rcnt_k) represents the proportion of not-yet-frozen weights to the total number of weights, and B_low represents the lower bound of free_ratio(rcnt_k); in formula (4), free_ratio(rcnt_k) represents the value of free_ratio in the rcnt_k-th communication round, free_ratio(rcnt_k - 1) represents its value in the (rcnt_k - 1)-th communication round, and R_fix represents the proportion of newly added frozen weights in each communication round.
8. The apparatus of claim 7, further comprising, prior to the model stitching module:
and the data initialization module is used for receiving an initial global model issued by the central server in an initialization stage and generating a mask matrix and variable parameters corresponding to the initial global model, wherein the variable parameters are used for recording the number of communication rounds of the edge device participating in the collaborative learning.
9. The apparatus of claim 7, further comprising:
and the global aggregation module is used for performing parameter aggregation on all the collected local models by using the central server before the communication round is finished so as to update the global model in the central server, and simultaneously determining whether to perform the next communication round according to a training cutoff condition.
10. The apparatus of claim 7, wherein the model training module comprises:
the number calculation unit is used for updating the variable parameters after the local initial model of the communication round is obtained, and calculating the number of training rounds of the local initial model of the communication round by using the updated variable parameters through a first formula;
and the gradient training unit is used for training the local initial model of the communication round by using a small batch gradient descent method based on the number of the training rounds obtained by calculation so as to obtain the trained local model of the communication round.
11. The apparatus of claim 10, wherein the first formula is:
$$E_k^{(rcnt_k)} = \left\lceil \frac{E_0}{rcnt_k} \right\rceil$$ (1)

wherein $\lceil \cdot \rceil$ denotes the round-up (ceiling) function, E_0 denotes the initial value of E_k^{(rcnt_k)}, and E_k^{(rcnt_k)} is the number of training rounds of the K-th user C_K in the rcnt_k-th communication round.
12. The apparatus of claim 7, wherein the second updating subunit comprises:
an element value obtaining subunit, configured to obtain an element value of the mask matrix;
the parameter value sorting subunit is used for sorting the parameter values which are not frozen according to the absolute value of the parameter values corresponding to the element values;
an element value setting subunit, configured to set to 0, according to the sorting result, the mask-matrix elements corresponding to the weights with the largest absolute values, in the proportion R_fix of newly added frozen weights for the current communication round.
CN202210792509.9A 2022-07-07 2022-07-07 Personalized collaborative learning method and device based on parameter gradual freezing Active CN114881229B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210792509.9A CN114881229B (en) 2022-07-07 2022-07-07 Personalized collaborative learning method and device based on parameter gradual freezing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210792509.9A CN114881229B (en) 2022-07-07 2022-07-07 Personalized collaborative learning method and device based on parameter gradual freezing

Publications (2)

Publication Number Publication Date
CN114881229A CN114881229A (en) 2022-08-09
CN114881229B (en) 2022-09-20

Family

ID=82683465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210792509.9A Active CN114881229B (en) 2022-07-07 2022-07-07 Personalized collaborative learning method and device based on parameter gradual freezing

Country Status (1)

Country Link
CN (1) CN114881229B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10140977B1 (en) * 2018-07-31 2018-11-27 botbotbotbot Inc. Generating additional training data for a natural language understanding engine
CN114257395A (en) * 2021-11-01 2022-03-29 清华大学 Customized network security situation perception method and device based on collaborative learning
CN114297382A (en) * 2021-12-28 2022-04-08 杭州电子科技大学 Controllable text generation method based on parameter fine adjustment of generative pre-training model
CN114372146A (en) * 2022-01-07 2022-04-19 清华大学 Method and apparatus for training concept recognition models and recognizing concepts
CN114418085A (en) * 2021-12-01 2022-04-29 清华大学 Personalized collaborative learning method and device based on neural network model pruning


Also Published As

Publication number Publication date
CN114881229A (en) 2022-08-09

Similar Documents

Publication Publication Date Title
CN111160525B (en) Task unloading intelligent decision-making method based on unmanned aerial vehicle group in edge computing environment
CN112367109A (en) Incentive method for digital twin-driven federal learning in air-ground network
CN110968426B (en) Edge cloud collaborative k-means clustering model optimization method based on online learning
EP4350572A1 (en) Method, apparatus and system for generating neural network model, devices, medium and program product
Li et al. Adaptive traffic signal control model on intersections based on deep reinforcement learning
CN110956202A (en) Image training method, system, medium and intelligent device based on distributed learning
CN116523079A (en) Reinforced learning-based federal learning optimization method and system
CN109787699B (en) Wireless sensor network routing link state prediction method based on mixed depth model
CN114169543B (en) Federal learning method based on model staleness and user participation perception
CN111224905B (en) Multi-user detection method based on convolution residual error network in large-scale Internet of things
CN116450312A (en) Scheduling strategy determination method and system for pipeline parallel training
Yang et al. Deep reinforcement learning based wireless network optimization: A comparative study
CN109548044B (en) DDPG (distributed data group pg) -based bit rate optimization method for energy-collectable communication
CN115374853A (en) Asynchronous federal learning method and system based on T-Step polymerization algorithm
CN116610434A (en) Resource optimization method for hierarchical federal learning system
CN113919483A (en) Method and system for constructing and positioning radio map in wireless communication network
CN112836822A (en) Federal learning strategy optimization method and device based on width learning
CN116542296A (en) Model training method and device based on federal learning and electronic equipment
CN113516163B (en) Vehicle classification model compression method, device and storage medium based on network pruning
CN113094180B (en) Wireless federal learning scheduling optimization method and device
CN110929885A (en) Smart campus-oriented distributed machine learning model parameter aggregation method
CN114881229B (en) Personalized collaborative learning method and device based on parameter gradual freezing
CN117392483A (en) Album classification model training acceleration method, system and medium based on reinforcement learning
CN113747450A (en) Service deployment method and device in mobile network and electronic equipment
CN112241295A (en) Cloud edge cooperative computing unloading method and system based on deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant