CN114881229B

CN114881229B - Personalized collaborative learning method and device based on parameter gradual freezing

Info

Publication number: CN114881229B
Application number: CN202210792509.9A
Authority: CN
Inventors: 徐恪; 刘泱; 赵乙
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2022-07-07
Filing date: 2022-07-07
Publication date: 2022-09-20
Anticipated expiration: 2042-07-07
Also published as: CN114881229A

Abstract

The invention provides a personalized cooperative learning method and device based on parameter gradual freezing, wherein the method comprises the following steps: receiving a global model of the communication round sent by a central server at the beginning of each communication round, and splicing the global model and a local model of the previous communication round according to a mask matrix to obtain a local initial model of the communication round; determining the number of training rounds of the communication round according to the variable parameters, and training the local initial model of the communication round according to the number of the training rounds; and sending the local model of the communication turn after the training is finished, and updating the mask matrix according to the freezing parameters corresponding to the local model of the communication turn after the training is finished. The invention can reduce the number of communication rounds required by collaborative learning and reduce the communication overhead in the model training process.

Description

Personalized collaborative learning method and device based on parameter gradual freezing

Technical Field

The invention relates to the technical field of edge computing, in particular to a personalized collaborative learning method and device based on parameter gradual freezing.

Background

With the advent of the world of everything interconnection, more and more edge devices have access to the network. The data generated by various edge devices and the corresponding computing, storage, and communication requirements have all grown dramatically. Scenes such as malicious flow detection, industrial internet, smart home, automatic driving and vehicle networking and the like need to transmit, calculate, store and decide a large amount of data generated in a short time quickly and rapidly, namely a calculation paradigm with ultra-low time delay and enough intellectualization is needed to process business requirements under various complex scenes. The existing cloud computing mode has the problems of weak real-time performance, insufficient bandwidth, high energy consumption, data privacy risks and the like for edge computing scenes, edge computing is widely applied to various edge scenes as powerful supplement of cloud computing, and the capability of edge equipment for processing complex problems is greatly enhanced.

In recent years, edge calculation has the characteristics of scene diversification, problem complication, equipment scale and the like, and single-point edge calculation has no apprehension on the significance of the change trend. For example, in an internet of vehicles scenario, a vehicle needs to cooperatively communicate and make decisions with a road side scheduler, other vehicles, and a cloud server, and a computing scenario is very complex and requires multiple computing nodes to cooperatively operate. For the edge mobile network, problems such as dynamic scheduling of computing and storage resources and malicious traffic detection also require cooperation of a plurality of edge computing nodes. These problems can be solved by introducing artificial intelligence methods such as a deep neural network, and therefore collaborative learning has become an important technology in edge computation, so that edge computation can advance to edge intelligence.

There are two issues that need to be considered when the edge device (which has the same meaning as the edge node) trains the neural network model. First, the computing and communication capabilities of the edge devices are limited, and the computing and communication overhead of the edge device side needs to be reduced as much as possible during collaborative learning. Secondly, the data amount on the edge devices is small and the distribution is very uneven, i.e. the problem of non-independent and same distribution of data exists between the edge devices. Therefore, how to effectively solve the problems of communication efficiency and data non-independent and same distribution of cooperative learning in the edge intelligent scene needs to be solved urgently.

Disclosure of Invention

The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.

Therefore, the invention provides a personalized cooperative learning method and device based on parameter gradual freezing, which are used for solving the problems of communication efficiency and data non-independent and same distribution of cooperative learning in a marginal intelligent scene, reducing the number of communication rounds required by the cooperative learning and reducing the communication overhead in the training process.

The embodiment of the invention provides a personalized cooperative learning method based on parameter gradual freezing, which comprises the following steps:

when each communication turn starts, receiving a global model of the communication turn sent by a central server, and splicing the global model and a local model of a previous communication turn according to a mask matrix to obtain a local initial model of the communication turn;

determining the number of training rounds of the communication round according to the variable parameters, and training the local initial model of the communication round according to the number of the training rounds to obtain the trained local model of the communication round;

and sending the local model of the communication turn after the training is finished, and updating the mask matrix according to the freezing parameters corresponding to the local model of the communication turn after the training is finished.

According to the personalized collaborative learning method based on parameter gradual freezing (namely, the data environments which are not independently and uniformly distributed), the personalized models which are suitable for respective data can be obtained by training under the condition that the data distribution difference of each edge device is large, and the prediction accuracy of the models is further improved. And the model can be converged quickly, the number of communication rounds required by cooperative learning is reduced, and the communication overhead of training is reduced. In addition, the method is insensitive to the introduced hyper-parameters, so that the model can be rapidly and reliably deployed in various real complex environments.

In addition, the personalized collaborative learning method based on parameter gradual freezing according to the above embodiment of the present invention may have the following additional technical features:

further, in an embodiment of the present invention, before the beginning of each communication turn, the method further includes: in an initialization stage, an initial global model issued by the central server is received, and a mask matrix and variable parameters corresponding to the initial global model are generated, wherein the variable parameters are used for recording the number of communication rounds of edge devices participating in collaborative learning.

Further, in an embodiment of the present invention, the method further includes: before the communication round is finished, the central server is used for carrying out parameter aggregation on all the collected local models so as to update the global model in the central server, and whether the next communication round is carried out is determined according to a training cut-off condition.

Further, in an embodiment of the present invention, the determining, according to the variable parameter, the number of training rounds of the communication round, and training the local initial model of the communication round according to the number of training rounds to obtain the trained local model of the communication round includes: after the local initial model of the communication round is obtained, updating the variable parameters, and calculating the number of training rounds of the local initial model of the communication round by using the updated variable parameters through a first formula; and training the local initial model of the communication round by using a small batch gradient descent method based on the number of the training rounds obtained by calculation to obtain the trained local model of the communication round.

Further, in one embodiment of the present invention, the first formula is:

（1）

wherein,

meaning that the function operation is rounded up in the near sense,

to represent

Is set to the initial value of (a),

for the Kth user C _K In the first place

The number of training rounds in a communication round.

Further, in an embodiment of the present invention, the updating the mask matrix according to the freezing parameter corresponding to the local model of the current communication turn includes: through the first step

Individual user

Testing the local model of the communication turn after the training is finished to generate a first testing accuracy rate; when the first test accuracy rate is greater than the last round of trainingAnd when the second test accuracy of the local model is good, calculating the proportion of newly added frozen parameters in the communication round, and updating the mask matrix according to the proportion of the newly added frozen parameters.

Further, in an embodiment of the present invention, the calculating a ratio of newly added frozen parameters in the communication round includes: calculating the proportion of the weight of the unfrozen parameter to the weight quantity of all the parameters by using a third formula; and calculating by using a fourth formula to obtain the proportion of the weight values of the newly added freezing parameters in the communication turn based on the proportion of the weight of the unfrozen parameters in the weight quantity of all the parameters.

Further, in an embodiment of the present invention, the third formula and the fourth formula are respectively:

（3）

（4）

wherein, free _ ratio (rcnt) in the formula (3) _k ) Representing the proportion of the not yet frozen weight to the total number of weights, B _low Represents free _ ratio (rcnt) _k ) The lower bound of (c); free _ ratio (rcnt) in formula (4) _k ) Represents the rcnt _k Value of free _ ratio in communication turn, free _ ratio (rcnt) _k -1) represents the second rcnt _k -1 value of free _ ratio in communication round, R _fix The ratio of newly added freezing weight values in each communication turn is represented.

Further, in an embodiment of the present invention, the updating the mask matrix according to the ratio of the newly added frozen parameter includes: acquiring element values of a mask matrix; sorting the parameter values which are not frozen according to the absolute value of the parameter values corresponding to the element values; and setting the element value of the mask matrix corresponding to the weight value with the maximum absolute value proportion as the weight value proportion of the newly added freezing parameter weight value in the communication turn as a preset value according to the sorting result.

The embodiment of the invention provides an individualized collaborative learning device based on parameter gradual freezing, which comprises:

the model splicing module is used for receiving a global model of the communication turn sent by the central server when each communication turn starts, and splicing the global model and a local model of the previous communication turn according to a mask matrix to obtain a local initial model of the communication turn;

the model training module is used for determining the number of training rounds of the communication round according to the variable parameters, and training the local initial model of the communication round according to the number of the training rounds to obtain the trained local model of the communication round;

and the mask updating module is used for sending the local model of the communication turn after the training is finished and updating the mask matrix according to the freezing parameters corresponding to the local model of the communication turn after the training is finished.

According to the individualized collaborative learning device based on parameter gradual freezing, under the condition that the data distribution difference of each edge device is large (namely, the data environment with non-independent same distribution), individualized models suitable for respective data are obtained through training, and the prediction accuracy of the models is further improved. And the model can be converged quickly, the number of communication rounds required by cooperative learning is reduced, and the communication overhead of training is reduced. In addition, the method is insensitive to the introduced hyper-parameters, so that the model can be rapidly and reliably deployed in various real complex environments.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

fig. 1 is a flowchart of a personalized cooperative learning method based on parameter gradual freezing according to an embodiment of the present invention;

fig. 2 is a schematic diagram of a filter weight value concatenation flow of 5 × 5 according to an embodiment of the present invention;

fig. 3 is a schematic diagram illustrating a 5 × 5 filter mask matrix updating process according to an embodiment of the present invention;

FIG. 4 is a flowchart illustrating the calculation of the number of training rounds of the communication round to train the local model of the communication round according to the embodiment of the present invention;

fig. 5 is a flowchart illustrating updating of a mask matrix according to a freezing parameter corresponding to a local model of the current communication turn according to the embodiment of the present invention;

fig. 6 is a flowchart of calculating a ratio of newly added frozen parameters of the current communication turn according to the embodiment of the present invention;

FIG. 7 is an experimental schematic of an adaptive tuning method according to an embodiment of the present invention;

fig. 8 is a flowchart of updating a mask matrix according to the ratio of newly added frozen parameters according to an embodiment of the present invention;

fig. 9 is a schematic structural diagram of an apparatus for personalized collaborative learning based on gradual parameter freezing according to an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of a model training module according to an embodiment of the present invention;

FIG. 11 is a block diagram of a mask update module according to an embodiment of the present invention;

FIG. 12 is a diagram illustrating a second update sub-module according to an embodiment of the present invention;

fig. 13 is a schematic structural diagram of a global aggregation module according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative and intended to explain the present invention and should not be construed as limiting the present invention.

The following describes a personalized collaborative learning method and apparatus based on parameter gradual freezing according to an embodiment of the present invention with reference to the accompanying drawings.

Fig. 1 is a flowchart of a personalized collaborative learning method based on parameter gradual freezing according to an embodiment of the present invention.

And step S1, when each communication turn starts, receiving the global model of the communication turn sent by the central server, and splicing the global model and the local model of the previous communication turn according to the mask matrix to obtain the local initial model of the communication turn.

It can be understood that, in the initialization stage, the initial global model issued by the central server is received, and a mask matrix and variable parameters corresponding to the initial global model are generated, where the variable parameters are used to record the number of communication rounds in which the edge device participates in the collaborative learning.

In particular, the central server pair is the initial global model

Carry out parameter initialization and will

Is sent to each edge device

。

Specifically, according to the initial global model issued by the central server,

locally maintaining a mask matrix on the same scale as the model

And transforming the variables

Is initialized to 0, and the variable records

In communication rounds involving cooperative learningNumber of the cells. In one embodiment of the present invention, each edge device will use the same network as the global model, and is also issued by the central server.

And the edge equipment receives the initial global model issued by the central server and generates a mask matrix and variable parameters corresponding to the initial global model.

Further, the edge devices are randomly sampled. For N edge devices and the random sampling rate K, N × K randomly sampled edge devices participate in the round of cooperative learning, that is, the number of edge devices participating in the r-th round of cooperative learning is S = max (N × K,1), and these edge devices form the set S _r ={C ₁ ,……，C _S }。

Further, at the beginning of each communication turn, the central server will model the global model

Sending the data to the edge device set obtained by sampling

Each edge device of (1). Edge devices

Receiving a global model

Then, the global model is used

Local model of previous communication round

Make a splice, i.e.

As shown in fig. 2.

In one embodiment, fig. 2 is a schematic diagram of a filter weight value concatenation flow of 5 × 5, which introduces weight value concatenation. Left plus signEdge filter characterization global model

The filter to the right of the plus sign represents the local model of the previous round

The filter on the right of the equal sign represents the local initial model of the round

. The numbers in the circles respectively represent matrices

And

the value of the medium element(s),

is a mask matrix

Obtaining a matrix after the element inversion,

meaning multiplication by element. For the

And

only the weight value with a corresponding mask value of 1 will be assigned to the local initial model for that communication turn. It can be understood that the mask matrix

The method and the device ensure that the weight value which is frozen in the local initial model is not covered by the weight value of the corresponding position of the global model in the communication turn.

And splicing the global model and the local model of the previous communication turn according to the mask matrix to obtain the local initial model of the current communication turn.

And step S2, determining the number of training rounds of the communication round according to the variable parameters, and training the local initial model of the communication round according to the number of the training rounds to obtain the trained local model of the communication round.

It will be understood that the variables are

Record up

The number of communication rounds participating in cooperative learning is calculated by the calculation formula

And training the local initial model of the communication round according to the number of the training rounds to obtain the trained local model of the communication round.

And step S3, sending the local model of the communication turn after the training is finished, and updating the mask matrix according to the freezing parameters corresponding to the local model of the communication turn after the training is finished.

In particular, edge devices

Complete the local training process and model

After uploading to the central server, the mask matrix maintained locally is updated

。

In one embodiment, fig. 3 is a schematic diagram of a filter mask matrix update flow of 5 × 5, and the mask matrix update is schematically illustrated. The element of the mask matrix with a value of 1 in fig. 3 indicates that the corresponding parameter value is not frozen, the element with a value of 0 and a black border line indicates that the corresponding parameter value is frozen, and the number of frozen parameters gradually increases with the change of communication turns. In each communication round of training, the model will only update the unfrozen weights. For weights that have been frozen, the model will still use them, but will not update them any more.

The embodiment of the invention can be widely applied to the edge computing nodes with limited resources, namely edge equipment, and model training and parameter updating are carried out at the edge end. The scheme of the invention enables the model trained by each edge device to enhance generalization capability through cooperative learning and maintain sensitivity to local data distribution of the edge device. Meanwhile, under the condition that the data distribution of the edge devices is extremely uneven, each edge device can train an individualized model, and the prediction capability of the edge devices on local data is enhanced. The method not only can lead the model to be fast converged and save communication and calculation cost required by collaborative learning training, but also is insensitive to the introduced hyper-parameters, and has the advantages of easy adjustment and deployment and high practical application value.

Fig. 4 is a flowchart for calculating the number of training rounds of the communication round to train the local model of the communication round, as shown in fig. 4, in an embodiment, the number of training rounds is determined according to variable parameters, and the local initial model of the communication round is trained according to the number of training rounds to obtain the trained local model of the communication round, which includes the following sub-steps:

step 401, after the local initial model of the communication round is obtained, updating the variable parameters, and calculating the number of training rounds of the local initial model of the communication round by using the updated variable parameters through a first formula.

Specifically, after the local initial model of the communication turn is obtained, the edge device is updated

Locally maintained variables

I.e. by

. Is calculated according to equation (1)

In the first place

Number of training rounds in a communication round

：

（1）

Wherein,

representing a rounding function operation in proximity. Hyper-parameter

To represent

I.e. in the first communication round

The value of (a). In particular when

When the value of (1) is obtained, training round

Is exactly as

. With following

The value of (a) is increased by,

will gradually decrease and eventually converge to 1.

And step 402, based on the number of training rounds of the communication round obtained by calculation, training the local initial model of the communication round by using a small batch gradient descent method to obtain the trained local model of the communication round.

In particular, an edge device

For model

Training is carried out by using a small batch gradient descent method, as shown in formula (2):

（2）

wherein,

representing the learning rate in the gradient descent algorithm,

in order to perform the objective function of the optimization,

representing multiplications by element correspondence position.

Optionally, for the model

Training, can also use batch ladderTraining is carried out by a degree descent method, a random gradient descent method and the like.

And obtaining the local model of the communication turn after training through the model training.

The method according to the embodiment of the present invention, after step S4, further includes:

and step S5, before the communication round is finished, the central server is used for carrying out parameter aggregation on all the collected local models so as to update the global model in the central server, and whether the next communication round is carried out is determined according to the training cutoff condition.

Specifically, the central server collects

And (3) performing parameter aggregation on the collected models of all the edge devices in the set or exceeding a preset waiting time, namely performing weighted average on the models according to the scale of the training data volume of each edge device.

It can be understood that the central server performs parameter aggregation on all received local models of the users, and updates the global model in the central server, so as to determine whether to perform the next communication turn according to the training cutoff condition and send the next communication turn to the users.

Fig. 5 is a flowchart illustrating updating a mask matrix according to a freezing parameter corresponding to a local model of the current communication turn according to an embodiment of the present invention, where as shown in fig. 5, in an embodiment, updating a mask matrix according to a freezing parameter includes the following sub-steps:

step 501, testing the local model of the communication turn after training is completed, and then generating a first testing accuracy.

And 502, when the first test accuracy is greater than the second test accuracy of the local model trained in the previous round, calculating the proportion of newly added frozen parameters in the communication round, and updating a mask matrix according to the proportion of the newly added frozen parameters.

In particular, the Kth user

Local models obtained by the training of the current round after the training is finished

And testing to obtain the testing accuracy acc. If acc is larger than the test accuracy rate last _ acc of the local model obtained in the previous round, updating the mask matrix

. Otherwise, the updating is not carried out, and the subsequent steps are not executed.

Fig. 6 is a flowchart of calculating a ratio of newly added frozen parameters of the current communication round according to an embodiment of the present invention, and as shown in fig. 6, in an embodiment, calculating a ratio of newly added frozen parameters of the current communication round includes the following sub-steps:

step 601, calculating the proportion of the weight of the unfrozen parameter to the weight number of all the parameters by using a third formula.

Specifically, the second is calculated using formula (3)

The proportion of the frozen parameters is newly added in the turn

. For the purpose of freezing important parameters and preserving the information encoded by the model on the data as early as possible, and for the purpose of reducing the number of training rounds

Suitably, the following two equations are used to determine the proportion of the newly added freezing weight value in each communication turn.

（3）

In equation (3), free _ ratio (rcnt) _k ) Represents the free weight, i.e. the weight that has not been frozen, accounts forThere is a proportion of the number of weights. B ultrasonic parameter _low Represents free _ ratio (rcnt) _k ) Lower boundary of (B) _low The value of (d) determines the final proportion of the free weights in the model. Specifically, when rcnt _k When the value of (c) is 0, free _ ratio (rcnt) _k ) Is 1, which corresponds to the situation before the first mask matrix update. With rcnt _k Is increased, free _ ratio (rcnt) _k ) Will gradually decrease and eventually converge to B _low 。

Step 602, based on the proportion of the weight of the unfrozen parameter to the weight of all the parameters, the proportion of the weight value of the newly added freezing parameter in the current communication turn is calculated by using a fourth formula.

Specifically, as in equation (4), free _ ratio (rcnt) _k ) Represents the rcnt _k Value of free _ ratio in communication turn, free _ ratio (rcnt) _k -1) represents the second rcnt _k -1 value of free _ ratio at communication turn. From the formula (3), free _ ratio follows rcnt _k The value increases but monotonically decreases, so the fraction in equation (4) is always less than 1, R _fix The value of (b) is always greater than 0. R is _fix Indicating the proportion of the newly added frozen weight value in each communication turn, which also determines the number of elements whose value changes from 1 to 0 each time the mask matrix is updated.

（4）

And (4) calculating by using a formula (4) to obtain the proportion of the weight value of the newly added freezing parameter in the communication turn.

Further, fig. 7 shows an experimental schematic diagram of the adaptive tuning method, i.e. the number of training rounds

And the freezing parameter ratio (freezing rate) with the change of the communication turns. In FIG. 7, setting

The value of which is 50 or more,

the value is 0.8, which means that the freezing parameter proportion is at most 20%.

Fig. 8 is a flowchart of updating a mask matrix according to a ratio of a newly added frozen parameter according to an embodiment of the present invention, and as shown in fig. 8, in an embodiment, updating the mask matrix according to a ratio of a newly added frozen parameter includes the following sub-steps:

step 801, acquiring element values of a mask matrix;

step 802, sorting the parameter values which are not frozen according to the absolute values of the parameter values corresponding to the element values;

and 803, according to the sorting result, taking the element value of the mask matrix corresponding to the weight value with the maximum absolute value proportion as the weight value of the newly added freezing parameter weight value proportion in the communication turn as a preset numerical value.

Specifically, the embodiment of the invention updates the mask matrix layer by layer

The value of (2). For each layer of the neural network model, sorting the parameter values which are not frozen according to the absolute values of the parameter values corresponding to the matrix elements, and setting the ratio of the maximum absolute value as

Corresponding to the weight value of

The element value is set to 0.

By implementing the personalized collaborative learning method based on parameter gradual freezing, the following beneficial effects can be obtained:

the method is also suitable for a scene of cooperatively learning an intelligent model by a plurality of nodes (or devices) similar to edge intelligence in the field of cooperative learning under the edge intelligence scene. Existing methods are mainly dedicated to training the same global model for all edge devices. Under the condition that the data distribution of each edge device is greatly different, all the edge devices do not need to apply the same global model, the practical effect of the model is not good, and sometimes the prediction accuracy of the model cannot even exceed that of the model which is independently trained by each edge device. In addition, frequent and massive model transmission between the central server and each edge device also hinders the landing and practical application of collaborative learning, and becomes a great bottleneck for the development of collaborative learning. The embodiment of the invention effectively solves the problems by freezing the parameters of the local model of the edge device by communication turns.

1) The personalized cooperative learning method provided by the invention continuously freezes important parameters in the local model by using a mask matrix mode, and the training process of the local model is also a process for finding and freezing the important parameters of the model. The neural network model is always over-parameterized, and the invention codes the edge device local data in the model by freezing important parameters, thereby effectively avoiding the problem that the parameters of the edge device local model are damaged or even completely covered because of parameter aggregation at the server end.

2) According to the method, enough free parameters can be left by adjusting the hyper-parameters, so that the model can participate in collaborative learning by using the parameters, and the generalization capability of the model is further enhanced. The invention also designs a method for adaptively adjusting the training turn number of each communication turn and the proportion of the newly added freezing parameters, further enhances the capability of finding and freezing important parameters, and reduces the calculation overhead required by the invention. In terms of communication efficiency, the personalized collaborative learning method provided by the invention not only can enable the model to be rapidly converged, but also enables the model to be trained by using few communication rounds so as to achieve the expected test accuracy, greatly reduces the communication and calculation overhead of collaborative learning, and realizes the collaborative learning of efficient communication.

3) The method is also suitable for the application scene of the multi-node collaborative learning model similar to the edge intelligence, can realize personalized model training, also keeps high-efficiency communication, and saves network bandwidth. In addition, the method has the characteristics of less number of hyper-parameters and easiness in tuning, is convenient and quick to deploy and use in various actual complex scenes, and has high practical value.

Next, a personalized cooperative learning apparatus based on parameter gradual freezing according to an embodiment of the present invention will be described with reference to the accompanying drawings.

Fig. 9 is a schematic structural diagram of a personalized cooperative learning apparatus based on parameter gradual freezing according to an embodiment of the present invention.

As shown in fig. 9, the apparatus 10 includes: a model stitching module 100, a model training module 200, and a mask update module 300.

The model splicing module 100 is configured to receive, when each communication turn starts, a global model of the communication turn sent by the central server, and splice the global model and a local model of a previous communication turn according to a mask matrix to obtain a local initial model of the communication turn;

the model training module 200 is configured to determine the number of training rounds of the communication round according to the variable parameters, and train the local initial model of the communication round according to the number of training rounds to obtain a trained local model of the communication round;

a mask updating module 300, configured to send the local model of the communication turn after the training is completed, and according to the freezing parameter pair corresponding to the local model of the communication turn after the training is completed

The mask matrix is updated.

Further, before the model splicing module 100, the method further includes:

and the global model initial module is used for receiving the initial global model issued by the central server in an initialization stage and generating a mask matrix and variable parameters corresponding to the initial global model, wherein the variable parameters are used for recording the number of communication rounds of the edge device participating in the collaborative learning.

Further, as shown in fig. 10, the model training module 200 includes:

the number calculating unit 201 is configured to update the variable parameter after the local initial model of the communication round is obtained, and calculate the number of training rounds of the local initial model of the communication round through a first formula by using the updated variable parameter;

and the gradient training unit 202 is configured to train the local model of the communication round by using a small batch gradient descent method based on the calculated number of training rounds of the communication round to obtain the trained local model of the communication round.

Further, as shown in fig. 11, the mask updating module 300 includes:

the first updating subunit 301 is configured to generate a first test accuracy after testing the trained local model of the communication turn;

and a second updating subunit 302, configured to calculate a ratio of newly added frozen parameters in the communication round of this time when the first test accuracy is greater than a second test accuracy of the local model trained in the previous round, and update the mask matrix according to the ratio of the newly added frozen parameters.

Further, as shown in fig. 12, the second updating subunit 302 includes:

an element value acquiring subunit 3021 configured to acquire an element value of the mask matrix;

a parameter value sorting subunit 3022, configured to sort the non-frozen parameter values according to the absolute values of the parameter values corresponding to the element values;

and the element value setting subunit 3023 is configured to, according to the sorting result, use the element value of the mask matrix corresponding to the weight value with the largest absolute value as the weight value of the newly added freezing parameter weight value proportion in the current communication round as a preset value.

Further, as shown in fig. 13, the apparatus 10 further includes:

and a global aggregation module 400, configured to perform parameter aggregation on all the collected local models by using the central server before the communication round is finished, so as to update the global model in the central server, and determine whether to perform a next communication round according to a training deadline.

Further, the first formula is:

（1）

wherein,

meaning that the function operation is rounded up in the near sense,

to represent

Is set to the initial value of (a),

for the Kth user CK at

The number of training rounds in a communication round.

Further, the third formula and the fourth formula are respectively:

（3）

（4）

Further, the second updating subunit 302 is further configured to calculate, by using a third formula, a ratio of the weight of the non-frozen parameter to the weight of all the parameters, and calculate, by using a fourth formula, a ratio of the weight of the newly added frozen parameter in the current communication turn, based on the ratio of the weight of the non-frozen parameter to the weight of all the parameters.

The individualized collaborative learning device based on parameter gradual freezing provided by the embodiment of the invention can be widely applied to edge computing nodes with limited resources, namely edge equipment, and model training and parameter updating are carried out at the edge end. The scheme of the invention enables the model trained by each edge device to enhance generalization capability through cooperative learning and maintain sensitivity to local data distribution of the edge device. Meanwhile, under the condition that the data distribution of the edge devices is extremely uneven, each edge device can train an individualized model, and the prediction capability of the edge devices on local data is enhanced. The method not only can lead the model to be fast converged and save communication and calculation cost required by collaborative learning training, but also is insensitive to the introduced hyper-parameters, and has the advantages of easy adjustment and deployment and high practical application value.

Claims

1. A personalized cooperative learning method based on parameter gradual freezing is characterized by comprising the following steps:

when each communication turn starts, receiving a global model of the communication turn sent by a central server, and splicing the global model and a local model of the previous communication turn according to a mask matrix to obtain a local initial model of the communication turn;

sending the local model of the communication turn after the training is finished, and updating the mask matrix according to the freezing parameters corresponding to the local model of the communication turn after the training is finished;

the updating the mask matrix according to the freezing parameters corresponding to the local model of the communication turn after the training is completed includes:

testing the local model of the communication turn after the training is finished to generate a first testing accuracy rate;

when the first test accuracy is higher than the second test accuracy of the local model trained in the previous round, calculating the proportion of newly added frozen parameters in the communication round, and updating the mask matrix according to the proportion of the newly added frozen parameters;

the calculating the proportion of the newly added frozen parameters of the communication turn comprises the following steps:

calculating the proportion of the weight of the unfrozen parameters to the weight number of all the parameters by using a third formula;

calculating the proportion of the weight values of the newly added freezing parameters in the communication turn by using a fourth formula based on the proportion of the weight of the unfrozen parameters in the weight quantity of all the parameters;

the third formula and the fourth formula are respectively:

（3）

（4）

wherein, in the formula (3)

Representing the proportion of the weights that have not been frozen to the number of all weights,

to represent

The lower bound of (c); in the formula (4)

Is shown as

Time of communication turn

The value of (a) is selected,

is shown as

Time of communication turn

The value of (a) is selected,

the ratio of newly added freezing weight values in each communication turn is represented.

2. The method of claim 1, wherein before the beginning of each communication round, further comprising: in an initialization stage, an initial global model issued by the central server is received, and a mask matrix and variable parameters corresponding to the initial global model are generated, wherein the variable parameters are used for recording the number of communication rounds of edge devices participating in collaborative learning.

3. The method of claim 1, further comprising: before the communication round is finished, the central server is used for carrying out parameter aggregation on all the collected local models so as to update the global model in the central server, and whether the next communication round is carried out is determined according to a training cut-off condition.

4. The method according to claim 1, wherein the determining the number of training rounds of the communication round according to the variable parameters, and training the local initial model of the communication round according to the number of training rounds to obtain the trained local model of the communication round comprises:

after the local initial model of the communication round is obtained, updating the variable parameters, and calculating the number of training rounds of the local initial model of the communication round by using the updated variable parameters through a first formula;

and training the local initial model of the communication round by using a small batch gradient descent method based on the number of the training rounds obtained by calculation to obtain the trained local model of the communication round.

5. The method of claim 4, wherein the first formula is:

（1）

wherein,

meaning that the function operation is rounded up in the near sense,

to represent

Of the initial value of (a) is,

for the Kth user C _K In the first place

Number of training rounds in a communication round.

6. The method of claim 1, wherein the updating the mask matrix according to the ratio of the newly added frozen parameters comprises:

acquiring element values of a mask matrix;

sorting the parameter values which are not frozen according to the absolute value of the parameter values corresponding to the element values;

according to the result of the sorting, the ratio with the maximum absolute value is

The element value of the mask matrix corresponding to the weight value of (1) is set to 0.

7. An apparatus for personalized collaborative learning based on parameter gradual freezing, comprising:

the mask updating module is used for sending the local model of the communication turn after the training is finished and updating the mask matrix according to the freezing parameters corresponding to the local model of the communication turn after the training is finished;

the mask update module includes:

the first updating subunit is used for testing the local model of the communication turn after the training is finished and generating a first test accuracy rate;

the second updating subunit is used for calculating the proportion of newly added frozen parameters of the communication round when the first test accuracy is greater than the second test accuracy of the local model trained in the previous round, and updating the mask matrix according to the proportion of the newly added frozen parameters;

the second updating subunit is further configured to calculate, by using a third formula, a ratio of the weight of the unfrozen parameter to the weight of all the parameters, and calculate, based on the ratio of the weight of the unfrozen parameter to the weight of all the parameters, a ratio of newly added frozen parameter weight values in the current communication turn by using a fourth formula;

the third formula and the fourth formula are respectively:

（3）

（4）

wherein, in the formula (3)

to represent

The lower bound of (c); in the formula (4)

Is shown as

Time of communication turn

The value of (a) is selected,

denotes the first

Time of communication turn

The value of (a) is selected,

8. The apparatus of claim 7, further comprising, prior to the model stitching module:

and the data initialization module is used for receiving an initial global model issued by the central server in an initialization stage and generating a mask matrix and variable parameters corresponding to the initial global model, wherein the variable parameters are used for recording the number of communication rounds of the edge device participating in the collaborative learning.

9. The apparatus of claim 7, further comprising:

and the global aggregation module is used for performing parameter aggregation on all the collected local models by using the central server before the communication round is finished so as to update the global model in the central server, and simultaneously determining whether to perform the next communication round according to a training cutoff condition.

10. The apparatus of claim 7, wherein the model training module comprises:

the number calculation unit is used for updating the variable parameters after the local initial model of the communication round is obtained, and calculating the number of training rounds of the local initial model of the communication round by using the updated variable parameters through a first formula;

and the gradient training unit is used for training the local initial model of the communication round by using a small batch gradient descent method based on the number of the training rounds obtained by calculation so as to obtain the trained local model of the communication round.

11. The apparatus of claim 10, wherein the first formula is:

（1）

wherein,

meaning that the function operation is rounded up in the near sense,

to represent

Is set to the initial value of (a),

for the Kth user C _K In the first place

The number of training rounds in a communication round.

12. The apparatus of claim 7, wherein the second updating subunit comprises:

an element value obtaining subunit, configured to obtain an element value of the mask matrix;

the parameter value sorting subunit is used for sorting the parameter values which are not frozen according to the absolute value of the parameter values corresponding to the element values;

an element value setting subunit, configured to set, according to the sorting result, the ratio with the largest absolute value as