CN114881229B - Personalized collaborative learning method and device based on parameter gradual freezing
- Publication number: CN114881229B (application CN202210792509.9A)
- Authority: CN (China)
- Prior art keywords: communication, model, training, local, round
- Prior art date: 2022-07-07
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a personalized collaborative learning method and device based on gradual parameter freezing. The method comprises: at the beginning of each communication round, receiving the global model for the round sent by a central server, and splicing the global model with the local model of the previous communication round according to a mask matrix to obtain the local initial model of the current round; determining the number of training rounds for the current communication round according to a variable parameter, and training the local initial model accordingly; and sending the trained local model of the current round, and updating the mask matrix according to the parameters frozen in the trained local model. The invention reduces the number of communication rounds required for collaborative learning and the communication overhead of the model training process.
Description
Technical Field
The invention relates to the technical field of edge computing, and in particular to a personalized collaborative learning method and device based on gradual parameter freezing.
Background
With the advent of the Internet of Everything, more and more edge devices are connected to the network. The data generated by these edge devices, and the corresponding computing, storage, and communication demands, have all grown dramatically. Scenarios such as malicious traffic detection, the industrial Internet, smart homes, autonomous driving, and the Internet of Vehicles require that the large volumes of data generated in a short time be transmitted, computed, stored, and acted upon quickly; that is, they require a computing paradigm with ultra-low latency and sufficient intelligence to handle business demands in a variety of complex scenarios. For edge computing scenarios, the existing cloud computing model suffers from weak real-time performance, insufficient bandwidth, high energy consumption, and data privacy risks. Edge computing has therefore been widely applied in various edge scenarios as a powerful supplement to cloud computing, greatly enhancing the ability of edge devices to handle complex problems.
In recent years, edge computing has been characterized by diversified scenarios, increasingly complex problems, and growing device scale, and the trend away from single-point edge computing toward multi-node collaboration has become increasingly evident. For example, in an Internet of Vehicles scenario, a vehicle needs to communicate and make decisions cooperatively with roadside schedulers, other vehicles, and cloud servers; the computing scenario is very complex and requires multiple computing nodes to operate collaboratively. For edge mobile networks, problems such as dynamic scheduling of computing and storage resources and malicious traffic detection likewise require the cooperation of multiple edge computing nodes. These problems can be addressed by introducing artificial intelligence methods such as deep neural networks, and collaborative learning has therefore become an important technology in edge computing, allowing edge computing to advance toward edge intelligence.
Two issues must be considered when edge devices (the term is used interchangeably with edge nodes) train neural network models. First, the computing and communication capabilities of edge devices are limited, so the computation and communication overhead on the edge-device side must be reduced as much as possible during collaborative learning. Second, the amount of data on each edge device is small and its distribution is very uneven; that is, the data across edge devices are non-IID (not independent and identically distributed). How to effectively address the communication-efficiency and non-IID data problems of collaborative learning in edge-intelligence scenarios therefore urgently needs to be solved.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
To this end, the invention provides a personalized collaborative learning method and device based on gradual parameter freezing, which address the communication-efficiency and non-IID data problems of collaborative learning in edge-intelligence scenarios, reduce the number of communication rounds required for collaborative learning, and reduce the communication overhead of the training process.
An embodiment of the invention provides a personalized collaborative learning method based on gradual parameter freezing, comprising the following steps:
at the beginning of each communication round, receiving the global model for the round sent by a central server, and splicing the global model with the local model of the previous communication round according to a mask matrix to obtain the local initial model of the current round;
determining the number of training rounds for the current communication round according to a variable parameter, and training the local initial model of the current round for that number of rounds to obtain the trained local model of the current round;
and sending the trained local model of the current communication round, and updating the mask matrix according to the freezing parameters corresponding to the trained local model.
According to the personalized collaborative learning method based on gradual parameter freezing, even when the data distributions of the edge devices differ greatly (i.e., in non-IID data environments), each device can train a personalized model suited to its own data, further improving the prediction accuracy of the models. The model also converges quickly, reducing the number of communication rounds required for collaborative learning and the communication overhead of training. In addition, the method is insensitive to the introduced hyper-parameters, so the model can be deployed quickly and reliably in a variety of real, complex environments.
In addition, the personalized collaborative learning method based on parameter gradual freezing according to the above embodiment of the present invention may have the following additional technical features:
further, in an embodiment of the present invention, before the beginning of each communication turn, the method further includes: in an initialization stage, an initial global model issued by the central server is received, and a mask matrix and variable parameters corresponding to the initial global model are generated, wherein the variable parameters are used for recording the number of communication rounds of edge devices participating in collaborative learning.
Further, in an embodiment of the present invention, the method further includes: before the communication round is finished, the central server is used for carrying out parameter aggregation on all the collected local models so as to update the global model in the central server, and whether the next communication round is carried out is determined according to a training cut-off condition.
Further, in an embodiment of the present invention, the determining, according to the variable parameter, the number of training rounds of the communication round, and training the local initial model of the communication round according to the number of training rounds to obtain the trained local model of the communication round includes: after the local initial model of the communication round is obtained, updating the variable parameters, and calculating the number of training rounds of the local initial model of the communication round by using the updated variable parameters through a first formula; and training the local initial model of the communication round by using a small batch gradient descent method based on the number of the training rounds obtained by calculation to obtain the trained local model of the communication round.
Further, in one embodiment of the present invention, the first formula is:

$$E_k^{(r)} = \mathrm{round}\left(1 + \frac{E_0 - 1}{rcnt_k}\right) \qquad (1)$$

wherein round(·) denotes rounding to the nearest integer, E_0 represents the initial value of E_k^{(r)}, and E_k^{(r)} is the number of training rounds of the K-th user C_K in the r-th communication round.
Further, in an embodiment of the present invention, updating the mask matrix according to the freezing parameters corresponding to the local model of the current communication round includes: testing, by the K-th user C_K, the trained local model of the current communication round to generate a first test accuracy; and, when the first test accuracy is greater than the second test accuracy of the local model trained in the previous round, calculating the proportion of parameters newly frozen in the current communication round, and updating the mask matrix according to that proportion.
Further, in an embodiment of the present invention, calculating the proportion of parameters newly frozen in the current communication round includes: calculating, using a third formula, the proportion of unfrozen parameter weights among all parameter weights; and calculating, using a fourth formula, the proportion of weight values newly frozen in the current communication round based on that proportion.
Further, in an embodiment of the present invention, the third formula and the fourth formula are respectively:

$$\mathrm{free\_ratio}(rcnt_k) = B_{low} + \frac{1 - B_{low}}{rcnt_k + 1} \qquad (3)$$

$$R_{fix} = 1 - \frac{\mathrm{free\_ratio}(rcnt_k)}{\mathrm{free\_ratio}(rcnt_k - 1)} \qquad (4)$$

wherein, in formula (3), free_ratio(rcnt_k) represents the proportion of not-yet-frozen weights among all weights, and B_low represents the lower bound of free_ratio(rcnt_k); in formula (4), free_ratio(rcnt_k) represents the value of free_ratio in the rcnt_k-th communication round, free_ratio(rcnt_k − 1) represents the value of free_ratio in the (rcnt_k − 1)-th communication round, and R_fix represents the proportion of weight values newly frozen in each communication round.
Further, in an embodiment of the present invention, updating the mask matrix according to the proportion of newly frozen parameters includes: acquiring the element values of the mask matrix; sorting the unfrozen parameter values according to the absolute values of the parameters corresponding to those element values; and setting, according to the sorting result, to a preset value the element values of the mask matrix corresponding to the weights with the largest absolute values, in the proportion of weight values newly frozen in the current communication round.
An embodiment of the invention provides a personalized collaborative learning device based on gradual parameter freezing, comprising:
the model splicing module is used for receiving a global model of the communication turn sent by the central server when each communication turn starts, and splicing the global model and a local model of the previous communication turn according to a mask matrix to obtain a local initial model of the communication turn;
the model training module is used for determining the number of training rounds of the communication round according to the variable parameters, and training the local initial model of the communication round according to the number of the training rounds to obtain the trained local model of the communication round;
and the mask updating module is used for sending the local model of the communication turn after the training is finished and updating the mask matrix according to the freezing parameters corresponding to the local model of the communication turn after the training is finished.
According to the personalized collaborative learning device based on gradual parameter freezing, even when the data distributions of the edge devices differ greatly (i.e., in non-IID data environments), each device trains a personalized model suited to its own data, further improving the prediction accuracy of the models. The model also converges quickly, reducing the number of communication rounds required for collaborative learning and the communication overhead of training. In addition, the device is insensitive to the introduced hyper-parameters, so the model can be deployed quickly and reliably in a variety of real, complex environments.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a personalized collaborative learning method based on parameter gradual freezing according to an embodiment of the present invention;
fig. 2 is a schematic diagram of the weight-splicing process for a 5 × 5 filter according to an embodiment of the present invention;
fig. 3 is a schematic diagram of the mask-matrix update process for a 5 × 5 filter according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating the calculation of the number of training rounds of the communication round to train the local model of the communication round according to the embodiment of the present invention;
fig. 5 is a flowchart illustrating updating of a mask matrix according to a freezing parameter corresponding to a local model of the current communication turn according to the embodiment of the present invention;
fig. 6 is a flowchart of calculating a ratio of newly added frozen parameters of the current communication turn according to the embodiment of the present invention;
FIG. 7 is an experimental schematic of an adaptive tuning method according to an embodiment of the present invention;
fig. 8 is a flowchart of updating a mask matrix according to the ratio of newly added frozen parameters according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an apparatus for personalized collaborative learning based on gradual parameter freezing according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a model training module according to an embodiment of the present invention;
FIG. 11 is a block diagram of a mask update module according to an embodiment of the present invention;
FIG. 12 is a diagram illustrating a second update sub-module according to an embodiment of the present invention;
fig. 13 is a schematic structural diagram of a global aggregation module according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative and intended to explain the present invention and should not be construed as limiting the present invention.
The following describes a personalized collaborative learning method and apparatus based on parameter gradual freezing according to an embodiment of the present invention with reference to the accompanying drawings.
Fig. 1 is a flowchart of a personalized collaborative learning method based on parameter gradual freezing according to an embodiment of the present invention.
Step S1: at the beginning of each communication round, receive the global model for the round sent by the central server, and splice the global model with the local model of the previous communication round according to the mask matrix to obtain the local initial model of the current round.
It can be understood that, in the initialization stage, the initial global model issued by the central server is received, and a mask matrix and variable parameters corresponding to the initial global model are generated, where the variable parameters are used to record the number of communication rounds in which the edge device participates in the collaborative learning.
Specifically, the central server initializes the parameters of the initial global model and sends it to each edge device.
Specifically, according to the initial global model issued by the central server, each edge device locally maintains a mask matrix of the same scale as the model and initializes a variable rcnt_k to 0; this variable records the number of communication rounds in which the device has participated in collaborative learning. In one embodiment of the present invention, each edge device uses the same network structure as the global model, which is likewise issued by the central server.
The edge device thus receives the initial global model issued by the central server and generates the mask matrix and variable parameter corresponding to the initial global model.
Further, the edge devices are randomly sampled. Given N edge devices and a random sampling rate K, N × K randomly sampled edge devices participate in each round of collaborative learning; that is, the number of edge devices participating in the r-th round of collaborative learning is S = max(N × K, 1), and these devices form the set S_r = {C_1, ……, C_S}.
Further, at the beginning of each communication round, the central server sends the global model to each edge device in the sampled set. After receiving the global model, an edge device splices it with its local model from the previous communication round, as shown in fig. 2.
In one embodiment, fig. 2 is a schematic diagram of the weight-splicing process for a 5 × 5 filter. The filter to the left of the plus sign represents the global model, the filter to the right of the plus sign represents the local model of the previous round, and the filter to the right of the equals sign represents the local initial model of the current round. The numbers in the circles represent the elements of the mask matrix and of the matrix obtained by inverting its elements, and ⊙ denotes element-wise multiplication. Only weight values whose corresponding mask value is 1 are assigned from the global model to the local initial model of the current round. It can be understood that the mask matrix ensures that weight values already frozen in the local initial model are not overwritten by the weight values at the corresponding positions of the global model in the current communication round.
The global model and the local model of the previous communication round are thus spliced according to the mask matrix to obtain the local initial model of the current round.
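Expressed as element-wise arithmetic on arrays of equal shape, the splice of fig. 2 can be sketched as follows. This is an illustrative sketch rather than the patent's reference implementation; the function and variable names are chosen for exposition only.

```python
import numpy as np

def splice(w_global: np.ndarray, w_local: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Build the local initial model for the current communication round.

    A mask value of 1 marks a free (unfrozen) weight, which takes the
    freshly downloaded global value; a mask value of 0 marks a frozen
    weight, which keeps its local value and is shielded from the global model.
    """
    return mask * w_global + (1 - mask) * w_local
```

For example, with mask = np.array([1, 0, 1]), the first and third weights come from the global model while the second keeps its frozen local value.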
Step S2: determine the number of training rounds for the current communication round according to the variable parameter, and train the local initial model of the current round for that number of rounds to obtain the trained local model.
It can be understood that the variable rcnt_k records the number of communication rounds in which the device has participated in collaborative learning; the number of training rounds for the current round is computed from it by the calculation formula, and the local initial model of the current round is trained accordingly to obtain the trained local model.
Step S3: send the trained local model of the current communication round, and update the mask matrix according to the freezing parameters corresponding to the trained local model.
Specifically, after an edge device completes the local training process and uploads the trained model to the central server, it updates the locally maintained mask matrix.
In one embodiment, fig. 3 is a schematic diagram of the mask-matrix update process for a 5 × 5 filter. In fig. 3, a mask element with value 1 indicates that the corresponding parameter is not frozen, and an element with value 0 and a black border indicates that the corresponding parameter is frozen; the number of frozen parameters gradually increases as the communication rounds progress. In each communication round of training, the model updates only the unfrozen weights. Weights that have been frozen are still used by the model, but are no longer updated.
The embodiment of the present invention can be widely applied to resource-constrained edge computing nodes, i.e., edge devices, with model training and parameter updating performed at the edge. The scheme of the present invention allows the model trained by each edge device to gain generalization capability through collaborative learning while remaining sensitive to the device's local data distribution. Meanwhile, even when the data distribution across edge devices is extremely uneven, each edge device can train a personalized model, enhancing its prediction capability on local data. The method not only enables the model to converge quickly, saving the communication and computation cost required for collaborative-learning training, but is also insensitive to the introduced hyper-parameters, making it easy to tune and deploy, with high practical value.
Fig. 4 is a flowchart of calculating the number of training rounds for the communication round and training the local model accordingly. As shown in fig. 4, in one embodiment, determining the number of training rounds according to the variable parameter and training the local initial model of the current round to obtain the trained local model includes the following sub-steps:
Step 401: after the local initial model of the current communication round is obtained, the edge device updates its locally maintained variable, i.e., rcnt_k ← rcnt_k + 1, and calculates the number of training rounds E_k^{(r)} in the r-th communication round according to equation (1):

$$E_k^{(r)} = \mathrm{round}\left(1 + \frac{E_0 - 1}{rcnt_k}\right) \qquad (1)$$

wherein round(·) denotes rounding to the nearest integer, and the hyper-parameter E_0 represents the initial value of E_k^{(r)}, i.e., its value in the first communication round. In particular, when rcnt_k takes the value 1, the number of training rounds is exactly E_0. As rcnt_k increases, E_k^{(r)} gradually decreases and eventually converges to 1.
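The following minimal Python sketch computes the per-round training count. It implements the closed form reconstructed above for equation (1); the exact expression in the original filing may differ, and the function name is illustrative.

```python
def local_epochs(e0: int, rcnt: int) -> int:
    """Per-round local training count.

    Assumed closed form consistent with equation (1): returns e0 when
    rcnt == 1 and decays monotonically toward 1 as rcnt grows.
    """
    assert rcnt >= 1, "rcnt counts participations, starting at 1"
    return round(1 + (e0 - 1) / rcnt)
```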
Step 402: based on the calculated number of training rounds, train the local initial model of the current communication round using a mini-batch gradient descent method to obtain the trained local model of the current round.
Specifically, the edge device trains the model using mini-batch gradient descent, as shown in formula (2):

$$w_k \leftarrow w_k - \eta\left(\nabla F(w_k) \odot m_k\right) \qquad (2)$$

wherein η represents the learning rate in the gradient descent algorithm, F is the objective function being optimized, m_k is the locally maintained mask matrix, and ⊙ represents element-wise multiplication. Because frozen positions of m_k are 0, frozen weights receive no update.
Optionally, the model may also be trained using batch gradient descent, stochastic gradient descent, or similar methods.
The trained local model of the current communication round is obtained through the above model training.
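A minimal sketch of one masked update step of formula (2) follows, assuming weights, gradients, and mask are NumPy arrays of the same shape; the names are illustrative.

```python
import numpy as np

def masked_sgd_step(weights: np.ndarray, grads: np.ndarray,
                    mask: np.ndarray, lr: float = 0.01) -> np.ndarray:
    """One mini-batch gradient step under the freezing mask.

    Frozen weights (mask == 0) still participate in the forward pass
    that produced `grads`, but the element-wise product zeroes their
    gradient so they are never changed.
    """
    return weights - lr * (grads * mask)
```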
The method according to the embodiment of the present invention, after step S4, further includes:
and step S5, before the communication round is finished, the central server is used for carrying out parameter aggregation on all the collected local models so as to update the global model in the central server, and whether the next communication round is carried out is determined according to the training cutoff condition.
Specifically, once the central server has collected the models of all edge devices in the set S_r, or once a preset waiting time has been exceeded, it performs parameter aggregation on the collected models, i.e., a weighted average of the models according to the scale of each edge device's training data.
It can be understood that the central server performs parameter aggregation on all received local models of the users, updates the global model on the server, determines whether to proceed to the next communication round according to the training cutoff condition, and distributes the result to the users.
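The aggregation step can be sketched as a data-size-weighted average. This illustrative Python assumes each model is a NumPy array and that data_sizes holds each device's number of training samples; both names are chosen for exposition.

```python
import numpy as np

def aggregate(models: list, data_sizes: list) -> np.ndarray:
    """Server-side parameter aggregation: average the collected local
    models, weighting each by the scale of its device's training data."""
    total = float(sum(data_sizes))
    return sum((n / total) * w for w, n in zip(models, data_sizes))
```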
Fig. 5 is a flowchart of updating the mask matrix according to the freezing parameters corresponding to the local model of the current communication round according to an embodiment of the present invention. As shown in fig. 5, in one embodiment, updating the mask matrix according to the freezing parameters includes the following sub-steps:
Step 501: test the trained local model of the current communication round to generate a first test accuracy.
Step 502: when the first test accuracy is greater than the second test accuracy of the local model trained in the previous round, calculate the proportion of parameters newly frozen in the current communication round, and update the mask matrix according to that proportion.
Specifically, the K-th user tests the local model obtained from the current round of training to obtain the test accuracy acc. If acc is greater than the test accuracy last_acc of the local model obtained in the previous round, the mask matrix is updated; otherwise, no update is performed and the subsequent steps are not executed.
Fig. 6 is a flowchart of calculating the proportion of parameters newly frozen in the current communication round according to an embodiment of the present invention. As shown in fig. 6, in one embodiment, this calculation includes the following sub-steps:
Step 601: calculate, using formula (3), the proportion of unfrozen parameter weights among all parameter weights.
Step 602: calculate, using formula (4), the proportion of weight values newly frozen in the current communication round based on that proportion.
Specifically, in order to freeze important parameters, and thereby preserve the information the model has encoded about the data, as early as possible, and to match the decreasing number of training rounds, the following two formulas determine the proportion of weight values newly frozen in each communication round:

$$\mathrm{free\_ratio}(rcnt_k) = B_{low} + \frac{1 - B_{low}}{rcnt_k + 1} \qquad (3)$$

$$R_{fix} = 1 - \frac{\mathrm{free\_ratio}(rcnt_k)}{\mathrm{free\_ratio}(rcnt_k - 1)} \qquad (4)$$
In formula (3), free_ratio(rcnt_k) represents the proportion of free weights, i.e., weights that have not yet been frozen, among all weights. The hyper-parameter B_low represents the lower bound of free_ratio(rcnt_k); its value determines the final proportion of free weights in the model. Specifically, when rcnt_k is 0, free_ratio(rcnt_k) is 1, which corresponds to the situation before the first mask-matrix update. As rcnt_k increases, free_ratio(rcnt_k) gradually decreases and eventually converges to B_low.
Specifically, in formula (4), free_ratio(rcnt_k) represents the value of free_ratio in the rcnt_k-th communication round, and free_ratio(rcnt_k − 1) represents its value in the (rcnt_k − 1)-th communication round. From formula (3), free_ratio decreases monotonically as rcnt_k increases, so the fraction in formula (4) is always less than 1 and the value of R_fix is always greater than 0. R_fix indicates the proportion of weight values newly frozen in each communication round, which also determines the number of elements whose value changes from 1 to 0 each time the mask matrix is updated.
The proportion of weight values newly frozen in the current communication round is thus obtained by calculation with formula (4).
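A sketch of this schedule follows. The decay in free_ratio is the assumed form reconstructed for formula (3) (starting at 1 and converging to B_low); formula (4) is taken directly from the relation described above. Function names are illustrative.

```python
def free_ratio(rcnt: int, b_low: float) -> float:
    """Fraction of weights still unfrozen after rcnt communication rounds.

    Assumed decay law consistent with formula (3): equals 1 at
    rcnt == 0 and decreases monotonically toward b_low.
    """
    return b_low + (1.0 - b_low) / (rcnt + 1)

def new_freeze_ratio(rcnt: int, b_low: float) -> float:
    """Formula (4): share of the previously free weights that are
    newly frozen in the rcnt-th communication round."""
    assert rcnt >= 1
    return 1.0 - free_ratio(rcnt, b_low) / free_ratio(rcnt - 1, b_low)
```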
Further, fig. 7 shows an experimental schematic of the adaptive tuning method, i.e., how the number of training rounds and the frozen-parameter proportion (freezing rate) change with the communication rounds. In fig. 7, E_0 is set to 50 and B_low to 0.8, meaning that at most 20% of the parameters are frozen.
Fig. 8 is a flowchart of updating the mask matrix according to the proportion of newly frozen parameters according to an embodiment of the present invention. As shown in fig. 8, in one embodiment, updating the mask matrix according to this proportion includes the following sub-steps:
Step 801: acquire the element values of the mask matrix.
Step 802: sort the unfrozen parameter values according to the absolute values of the parameters corresponding to those element values.
Step 803: according to the sorting result, set to a preset value the element values of the mask matrix corresponding to the weights with the largest absolute values, in the proportion of weight values newly frozen in the current communication round.
Specifically, the embodiment of the present invention updates the values of the mask matrix layer by layer. For each layer of the neural network model, the unfrozen parameter values are sorted according to the absolute values of the parameters corresponding to the matrix elements, and the mask elements corresponding to the fraction R_fix of weights with the largest absolute values are set to 0.
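The layer-wise update can be sketched as follows. The test-accuracy gate of steps 501 and 502 (acc > last_acc) is assumed to have passed before this function is called, and the names are illustrative.

```python
import numpy as np

def update_mask(weights: np.ndarray, mask: np.ndarray, r_fix: float) -> np.ndarray:
    """Freeze the largest-magnitude free weights of one layer.

    Among the entries whose mask value is still 1, the fraction r_fix
    with the largest absolute weight values have their mask entries
    set to 0 (frozen).
    """
    flat_w, flat_m = weights.ravel(), mask.ravel().copy()
    free_idx = np.flatnonzero(flat_m == 1)
    n_freeze = int(round(r_fix * free_idx.size))
    if n_freeze > 0:
        order = np.argsort(-np.abs(flat_w[free_idx]))  # descending |w|
        flat_m[free_idx[order[:n_freeze]]] = 0
    return flat_m.reshape(mask.shape)
```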
By implementing the personalized collaborative learning method based on parameter gradual freezing, the following beneficial effects can be obtained:
the method is also suitable for a scene of cooperatively learning an intelligent model by a plurality of nodes (or devices) similar to edge intelligence in the field of cooperative learning under the edge intelligence scene. Existing methods are mainly dedicated to training the same global model for all edge devices. Under the condition that the data distribution of each edge device is greatly different, all the edge devices do not need to apply the same global model, the practical effect of the model is not good, and sometimes the prediction accuracy of the model cannot even exceed that of the model which is independently trained by each edge device. In addition, frequent and massive model transmission between the central server and each edge device also hinders the landing and practical application of collaborative learning, and becomes a great bottleneck for the development of collaborative learning. The embodiment of the invention effectively solves the problems by freezing the parameters of the local model of the edge device by communication turns.
1) The personalized collaborative learning method provided by the invention continuously freezes important parameters in the local model by means of a mask matrix; the training process of the local model is thus also a process of finding and freezing the model's important parameters. Neural network models are typically over-parameterized, and the invention encodes each edge device's local data into the model by freezing important parameters, effectively avoiding the problem that parameters of the edge device's local model are damaged or even completely overwritten by parameter aggregation at the server.
2) By adjusting the hyper-parameters, the method leaves enough free parameters for the model to participate in collaborative learning, further enhancing its generalization capability. The invention also designs a method for adaptively adjusting the number of training rounds per communication round and the proportion of newly frozen parameters, further strengthening the ability to find and freeze important parameters while reducing the required computation overhead. In terms of communication efficiency, the personalized collaborative learning method provided by the invention not only allows the model to converge quickly but also lets it reach the expected test accuracy within few communication rounds, greatly reducing the communication and computation overhead of collaborative learning and achieving communication-efficient collaborative learning.
3) The method is likewise suitable for application scenarios in which multiple nodes collaboratively learn a model, as in edge intelligence; it realizes personalized model training while maintaining efficient communication and saving network bandwidth. In addition, with few hyper-parameters that are easy to tune, it can be deployed conveniently and quickly in a variety of real, complex scenarios and has high practical value.
Next, a personalized cooperative learning apparatus based on parameter gradual freezing according to an embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 9 is a schematic structural diagram of a personalized cooperative learning apparatus based on parameter gradual freezing according to an embodiment of the present invention.
As shown in fig. 9, the apparatus 10 includes: a model stitching module 100, a model training module 200, and a mask update module 300.
The model splicing module 100 is configured to receive, when each communication turn starts, a global model of the communication turn sent by the central server, and splice the global model and a local model of a previous communication turn according to a mask matrix to obtain a local initial model of the communication turn;
the model training module 200 is configured to determine the number of training rounds of the communication round according to the variable parameters, and train the local initial model of the communication round according to the number of training rounds to obtain a trained local model of the communication round;
a mask updating module 300, configured to send the trained local model of the communication round, and to update the mask matrix according to the freezing parameters corresponding to the trained local model of the communication round.
Further, the apparatus further includes, before the model splicing module 100:
and the global model initial module is used for receiving the initial global model issued by the central server in an initialization stage and generating a mask matrix and variable parameters corresponding to the initial global model, wherein the variable parameters are used for recording the number of communication rounds of the edge device participating in the collaborative learning.
Further, as shown in fig. 10, the model training module 200 includes:
the number calculating unit 201 is configured to update the variable parameter after the local initial model of the communication round is obtained, and calculate the number of training rounds of the local initial model of the communication round through a first formula by using the updated variable parameter;
and the gradient training unit 202 is configured to train the local initial model of the communication round using a mini-batch gradient descent method, based on the calculated number of training rounds, to obtain the trained local model of the communication round.
Further, as shown in fig. 11, the mask updating module 300 includes:
the first updating subunit 301 is configured to generate a first test accuracy after testing the trained local model of the communication turn;
and a second updating subunit 302, configured to calculate a ratio of newly added frozen parameters in the communication round of this time when the first test accuracy is greater than a second test accuracy of the local model trained in the previous round, and update the mask matrix according to the ratio of the newly added frozen parameters.
Further, as shown in fig. 12, the second updating subunit 302 includes:
an element value acquiring subunit 3021 configured to acquire an element value of the mask matrix;
a parameter value sorting subunit 3022, configured to sort the non-frozen parameter values according to the absolute values of the parameter values corresponding to the element values;
and the element value setting subunit 3023 is configured to, according to the sorting result, set to a preset value the element values of the mask matrix corresponding to the weights with the largest absolute values, in the proportion of weight values newly frozen in the current communication round.
Further, as shown in fig. 13, the apparatus 10 further includes:
and a global aggregation module 400, configured to perform, before the communication round ends, parameter aggregation on all collected local models by using the central server to update the global model on the central server, and to determine whether to proceed to the next communication round according to a training cutoff condition.
Further, the first formula is:

$$E_k^{(r)} = \mathrm{round}\left(1 + \frac{E_0 - 1}{rcnt_k}\right) \qquad (1)$$

wherein round(·) denotes rounding to the nearest integer, E_0 represents the initial value of E_k^{(r)}, and E_k^{(r)} is the number of training rounds of the K-th user C_K in the r-th communication round.
Further, the third formula and the fourth formula are respectively:

$$\mathrm{free\_ratio}(rcnt_k) = B_{low} + \frac{1 - B_{low}}{rcnt_k + 1} \qquad (3)$$

$$R_{fix} = 1 - \frac{\mathrm{free\_ratio}(rcnt_k)}{\mathrm{free\_ratio}(rcnt_k - 1)} \qquad (4)$$

wherein, in formula (3), free_ratio(rcnt_k) represents the proportion of not-yet-frozen weights among all weights, and B_low represents the lower bound of free_ratio(rcnt_k); in formula (4), free_ratio(rcnt_k) represents the value of free_ratio in the rcnt_k-th communication round, free_ratio(rcnt_k − 1) represents the value of free_ratio in the (rcnt_k − 1)-th communication round, and R_fix represents the proportion of weight values newly frozen in each communication round.
Further, the second updating subunit 302 is further configured to calculate, using the third formula, the proportion of unfrozen parameter weights among all parameter weights, and to calculate, using the fourth formula, the proportion of weight values newly frozen in the current communication round based on that proportion.
The personalized collaborative learning device based on gradual parameter freezing provided by the embodiment of the invention can be widely applied to resource-constrained edge computing nodes, i.e., edge devices, with model training and parameter updating performed at the edge. The scheme of the invention allows the model trained by each edge device to gain generalization capability through collaborative learning while remaining sensitive to the device's local data distribution. Meanwhile, even when the data distribution across edge devices is extremely uneven, each edge device can train a personalized model, enhancing its prediction capability on local data. The device not only enables the model to converge quickly, saving the communication and computation cost required for collaborative-learning training, but is also insensitive to the introduced hyper-parameters, making it easy to tune and deploy, with high practical value.
Claims (12)
1. A personalized cooperative learning method based on parameter gradual freezing is characterized by comprising the following steps:
when each communication turn starts, receiving a global model of the communication turn sent by a central server, and splicing the global model and a local model of the previous communication turn according to a mask matrix to obtain a local initial model of the communication turn;
determining the number of training rounds of the communication round according to the variable parameters, and training the local initial model of the communication round according to the number of the training rounds to obtain the trained local model of the communication round;
sending the local model of the communication turn after the training is finished, and updating the mask matrix according to the freezing parameters corresponding to the local model of the communication turn after the training is finished;
the updating the mask matrix according to the freezing parameters corresponding to the local model of the communication turn after the training is completed includes:
testing the local model of the communication turn after the training is finished to generate a first testing accuracy rate;
when the first test accuracy is higher than the second test accuracy of the local model trained in the previous round, calculating the proportion of newly added frozen parameters in the communication round, and updating the mask matrix according to the proportion of the newly added frozen parameters;
the calculating the proportion of the newly added frozen parameters of the communication turn comprises the following steps:
calculating the proportion of the weight of the unfrozen parameters to the weight number of all the parameters by using a third formula;
calculating the proportion of the weight values of the newly added freezing parameters in the communication turn by using a fourth formula based on the proportion of the weight of the unfrozen parameters in the weight quantity of all the parameters;
the third formula and the fourth formula are respectively:
wherein, in the formula (3)Representing the proportion of the weights that have not been frozen to the number of all weights,to representThe lower bound of (c); in the formula (4)Is shown asTime of communication turnThe value of (a) is selected,is shown asTime of communication turnThe value of (a) is selected,the ratio of newly added freezing weight values in each communication turn is represented.
2. The method of claim 1, wherein before the beginning of each communication round, further comprising: in an initialization stage, an initial global model issued by the central server is received, and a mask matrix and variable parameters corresponding to the initial global model are generated, wherein the variable parameters are used for recording the number of communication rounds of edge devices participating in collaborative learning.
3. The method of claim 1, further comprising: before the communication round is finished, the central server is used for carrying out parameter aggregation on all the collected local models so as to update the global model in the central server, and whether the next communication round is carried out is determined according to a training cut-off condition.
4. The method according to claim 1, wherein the determining the number of training rounds of the communication round according to the variable parameters, and training the local initial model of the communication round according to the number of training rounds to obtain the trained local model of the communication round comprises:
after the local initial model of the communication round is obtained, updating the variable parameters, and calculating the number of training rounds of the local initial model of the communication round by using the updated variable parameters through a first formula;
and training the local initial model of the communication round by using a mini-batch gradient descent method based on the calculated number of training rounds, to obtain the trained local model of the communication round.

5. The method according to claim 4, wherein the first formula is:

$$E_k^{(r)} = \mathrm{round}\left(1 + \frac{E_0 - 1}{rcnt_k}\right) \qquad (1)$$

wherein round(·) denotes rounding to the nearest integer, E_0 represents the initial value of E_k^{(r)}, and E_k^{(r)} is the number of training rounds of the K-th user C_K in the r-th communication round.
6. The method of claim 1, wherein the updating the mask matrix according to the ratio of the newly added frozen parameters comprises:
acquiring element values of a mask matrix;
sorting the parameter values which are not frozen according to the absolute values of the parameter values corresponding to the element values; and
setting, according to the sorting result, to a preset value the element values of the mask matrix corresponding to the weights with the largest absolute values, in the proportion of weight values newly frozen in the current communication round.
7. An apparatus for personalized collaborative learning based on parameter gradual freezing, comprising:
the model splicing module is used for receiving a global model of the communication turn sent by the central server when each communication turn starts, and splicing the global model and a local model of the previous communication turn according to a mask matrix to obtain a local initial model of the communication turn;
the model training module is used for determining the number of training rounds of the communication round according to the variable parameters, and training the local initial model of the communication round according to the number of the training rounds to obtain the trained local model of the communication round;
the mask updating module is used for sending the local model of the communication turn after the training is finished and updating the mask matrix according to the freezing parameters corresponding to the local model of the communication turn after the training is finished;
the mask update module includes:
the first updating subunit is used for testing the local model of the communication turn after the training is finished and generating a first test accuracy rate;
the second updating subunit is used for calculating the proportion of newly added frozen parameters of the communication round when the first test accuracy is greater than the second test accuracy of the local model trained in the previous round, and updating the mask matrix according to the proportion of the newly added frozen parameters;
the second updating subunit is further configured to calculate, by using a third formula, a ratio of the weight of the unfrozen parameter to the weight of all the parameters, and calculate, based on the ratio of the weight of the unfrozen parameter to the weight of all the parameters, a ratio of newly added frozen parameter weight values in the current communication turn by using a fourth formula;
the third formula and the fourth formula are respectively:
wherein, in the formula (3)Representing the proportion of the weights that have not been frozen to the number of all weights,to representThe lower bound of (c); in the formula (4)Is shown asTime of communication turnThe value of (a) is selected,denotes the firstTime of communication turnThe value of (a) is selected,the ratio of newly added freezing weight values in each communication turn is represented.
8. The apparatus of claim 7, further comprising, prior to the model stitching module:
and the data initialization module is used for receiving an initial global model issued by the central server in an initialization stage and generating a mask matrix and variable parameters corresponding to the initial global model, wherein the variable parameters are used for recording the number of communication rounds of the edge device participating in the collaborative learning.
9. The apparatus of claim 7, further comprising:
and the global aggregation module is used for performing parameter aggregation on all the collected local models by using the central server before the communication round is finished so as to update the global model in the central server, and simultaneously determining whether to perform the next communication round according to a training cutoff condition.
10. The apparatus of claim 7, wherein the model training module comprises:
the number calculation unit is used for updating the variable parameters after the local initial model of the communication round is obtained, and calculating the number of training rounds of the local initial model of the communication round by using the updated variable parameters through a first formula;
and the gradient training unit is configured to train the local initial model of the communication round by using a mini-batch gradient descent method based on the calculated number of training rounds, to obtain the trained local model of the communication round.

11. The apparatus according to claim 10, wherein the first formula is:

$$E_k^{(r)} = \mathrm{round}\left(1 + \frac{E_0 - 1}{rcnt_k}\right) \qquad (1)$$

wherein round(·) denotes rounding to the nearest integer, E_0 represents the initial value of E_k^{(r)}, and E_k^{(r)} is the number of training rounds of the K-th user C_K in the r-th communication round.
12. The apparatus of claim 7, wherein the second updating subunit comprises:
an element value obtaining subunit, configured to obtain an element value of the mask matrix;
the parameter value sorting subunit is used for sorting the parameter values which are not frozen according to the absolute values of the parameter values corresponding to the element values; and
an element value setting subunit, configured to set, according to the sorting result, to a preset value the element values of the mask matrix corresponding to the weights with the largest absolute values, in the proportion of weight values newly frozen in the current communication round.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210792509.9A | 2022-07-07 | 2022-07-07 | Personalized collaborative learning method and device based on parameter gradual freezing |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN114881229A | 2022-08-09 |
| CN114881229B | 2022-09-20 |
Family
- ID=82683465

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210792509.9A | Personalized collaborative learning method and device based on parameter gradual freezing | 2022-07-07 | 2022-07-07 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN114881229B (en) |
Citations (5)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10140977B1 | 2018-07-31 | 2018-11-27 | botbotbotbot Inc. | Generating additional training data for a natural language understanding engine |
| CN114257395A | 2021-11-01 | 2022-03-29 | | Customized network security situation perception method and device based on collaborative learning |
| CN114297382A | 2021-12-28 | 2022-04-08 | | Controllable text generation method based on parameter fine adjustment of generative pre-training model |
| CN114372146A | 2022-01-07 | 2022-04-19 | | Method and apparatus for training concept recognition models and recognizing concepts |
| CN114418085A | 2021-12-01 | 2022-04-29 | | Personalized collaborative learning method and device based on neural network model pruning |
Also Published As
Publication number | Publication date |
---|---|
CN114881229A (en) | 2022-08-09 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |