CN114781654A - Federated transfer learning method, device, computer equipment and medium - Google Patents

Federated transfer learning method, device, computer equipment and medium

Info

Publication number
CN114781654A
Authority
CN
China
Prior art keywords
parameters
adaptation
model
local
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210702928.9A
Other languages
Chinese (zh)
Inventor
司世景
王健宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202210702928.9A
Publication of CN114781654A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Abstract

The application relates to a federated transfer learning method, apparatus, computer device and medium. In the method, the local training model of a client comprises at least two convolution layers, with an adaptation layer arranged between two adjacent convolution layers. The model parameters of the local training model are fixed, a loss function is calculated using the local training set, the local adaptation parameters of each adaptation layer are updated from the loss function by an optimizer algorithm, and the updated local adaptation parameters are sent to the server. The global adaptation parameters of each adaptation layer calculated by the server are then obtained and used to update the local adaptation parameters of the corresponding adaptation layers, the model parameters are updated with a preset adjustment algorithm, and training is repeated until convergence. By adding adaptation layers, the model parameters of the convolution layers are converted into adaptation parameters communicated with the server, which reduces communication overhead; the model parameters are fixed during training and then fine-tuned backwards using the global adaptation parameters, so the difference between the model parameters and the adaptation parameters is small and the use of the trained model in the target scene is not affected.

Description

Federated transfer learning method, device, computer equipment and medium
Technical Field
The application is applicable to the technical field of artificial intelligence, and particularly relates to a federated transfer learning method and apparatus, a computer device, and a medium.
Background
Currently, federated learning is a learning paradigm that jointly models with data from different clients while protecting data privacy, and a model relying on federated learning can be trained in a decentralized and private manner. The federally trained model is distributed over a plurality of clients, and each client performs distributed training on its own local data, so that the model can benefit from every client without that client's data being accessed directly. The training results are then passed to a central server for global training, or data is shared among the clients for joint training, and the trained model is finally obtained.
In each training round, the parameters of the model must be exchanged between the clients and the central server. The communication overhead of sending so many parameters may exceed the power budget of the clients, and considerations such as efficiency greatly reduce the utility of federated learning in real scenes. To reduce the communication overhead of the parameters, the amount of parameter data needs to be reduced, which leads to inaccurate model training. For this reason, using transfer learning to compress the amount of model parameter data has become a main way to solve this problem. However, the model is trained and verified with data of the target scene and generally cannot generalize well to scenes outside the training data set, so the transferred parameters differ greatly from the original model parameters and the applicability of the trained model to the target use scene is poor. Therefore, how to migrate the model parameters reasonably, so as to ensure the applicability of the trained model to the target scene while reducing the communication overhead of the model parameters, is an urgent problem to be solved.
Disclosure of Invention
In view of this, embodiments of the present application provide a federated transfer learning method, apparatus, computer device and medium, so as to solve the problem of how to migrate model parameters reasonably and ensure the applicability of the trained model to the target scene while reducing the communication overhead of the model parameters.
In a first aspect, an embodiment of the present application provides a federated transfer learning method, which is applied to a distributed training system, where the distributed training system includes at least one client and one server, a local training model of each client includes at least two convolution layers, and an adaptation layer is disposed between two adjacent convolution layers. The federated transfer learning method includes:
for any client, fixing the initial model parameters of the corresponding local training model, and calculating the loss function of the local training model with the fixed model parameters by using the corresponding local training set;
updating local adaptive parameters of each adaptive layer through an optimizer algorithm according to the loss function, and sending the updated local adaptive parameters to the server;
the server calculates and obtains global adaptation parameters of each adaptation layer according to the obtained updated local adaptation parameters sent by all the clients, and sends the global adaptation parameters to all the clients;
for any client, updating the local adaptation parameters of the corresponding adaptation layers by using the global adaptation parameters, and updating the initial model parameters by using a preset adjustment algorithm to obtain updated model parameters;
and taking the updated model parameters as the initial model parameters, returning to the step of fixing the initial model parameters of the corresponding local training model and calculating the loss function of the local training model with the fixed model parameters by using the corresponding local training set, until convergence, to obtain the trained local training model.
In one embodiment, before fixing the initial model parameters of the corresponding local training model, the method further includes:
loading a local training model at a client, and initializing model parameters of the local training model to obtain initialized model parameters of the local training model;
calculating a loss function for the local training model of the fixed model parameters using the corresponding local training set comprises:
inputting the local training set of the client into a local training model with fixed model parameters, and outputting a true value and a predicted value of each training data in the local training set after passing through each convolution layer and each adaptation layer;
and calculating to obtain a corresponding loss function according to the real values and the predicted values of all the training data and by combining a preset loss function formula.
In one embodiment, the optimizer algorithm comprises a gradient descent algorithm, updating local adaptation parameters of each adaptation layer through the optimizer algorithm according to the loss function, and sending the updated local adaptation parameters to the server comprises:
acquiring the levels and the number of layers of adaptation layers in the local training model, wherein the levels of the adaptation layers close to the model output are higher;
performing gradient descent on the loss function by using the gradient descent algorithm to obtain updating loss corresponding to the number of adaptation layers;
updating local adaptation parameters from the adaptation layer at the highest level to the adaptation layer at the lowest level in sequence according to the updating loss from large to small;
and sending the updated local adaptation parameters of each adaptation layer to the server.
In an embodiment, after the server obtains the updated local adaptation parameters sent by all the clients, the method further includes:
classifying the updated local adaptation parameters of the adaptation layers with the same hierarchy into one class to obtain a group of local adaptation parameter sets corresponding to each hierarchy;
calculating to obtain a global adaptation parameter of each adaptation layer, and sending the global adaptation parameter to all clients includes:
for any level, carrying out weighted averaging on a group of corresponding local adaptation parameter sets, and determining the average value as a global adaptation parameter of the corresponding level;
and sending the global adaptation parameters of each level to the adaptation layer of the corresponding level of each client.
In one embodiment, updating the initial model parameters by using the preset adjustment algorithm to obtain the updated model parameters includes:
and carrying out fine tuning processing on the initial model parameters by using a fine tuning algorithm, and determining the fine tuned model parameters as updated model parameters.
In one embodiment, the adaptation layer includes a down-projection layer, an activation layer and an up-projection layer, and the data processing between any two adjacent convolution layers includes:
inputting the original data output by the lower convolution layer into the down-projection layer for dimension-reduction mapping, and inputting the result of the dimension-reduction mapping into the activation layer for excitation to obtain an excitation result;
and inputting the excitation result into the up-projection layer for dimension-raising mapping to obtain updated data with the same dimension as the original data, and inputting the updated data into the upper convolution layer.
In a second aspect, an embodiment of the present application provides a federated transfer learning apparatus, which is applied to a distributed training system, where the distributed training system includes at least one client and one server, a local training model of each client includes at least two convolution layers, and an adaptation layer is disposed between two adjacent convolution layers. The federated transfer learning apparatus includes:
the fixed training module is used for fixing the initial model parameters of the corresponding local training model aiming at any client and calculating the loss function of the local training model with the fixed model parameters by using the corresponding local training set;
the adaptive parameter updating module is used for updating the local adaptive parameters of each adaptive layer through an optimizer algorithm according to the loss function and sending the updated local adaptive parameters to the server;
the global training module is used for the server to calculate and obtain the global adaptation parameters of each adaptation layer according to the obtained updated local adaptation parameters sent by all the clients and send the global adaptation parameters to all the clients;
the model parameter updating module is used for updating the local adaptation parameters of the corresponding adaptation layers by using the global adaptation parameters and updating the initial model parameters by using a preset adjusting algorithm aiming at any client to obtain updated model parameters;
and the return execution training module is used for taking the updated model parameters as the initial model parameters, returning to the step of fixing the initial model parameters of the corresponding local training model and calculating the loss function of the local training model with the fixed model parameters by using the corresponding local training set, until convergence, to obtain the trained local training model.
In one embodiment, the federated transfer learning apparatus further includes:
the model initialization module is used for loading the local training model at the client before fixing the initial model parameters of the corresponding local training model, and initializing the model parameters of the local training model to obtain the initial model parameters of the local training model;
the stationary training module includes:
the model operation unit is used for inputting the local training set of the client into a local training model with fixed model parameters, and outputting a true value and a predicted value of each training data in the local training set after passing through each convolution layer and each adaptation layer;
and the loss calculation unit is used for calculating to obtain a corresponding loss function according to the real values and the predicted values of all the training data and by combining a preset loss function formula.
In one embodiment, the optimizer algorithm comprises a gradient descent algorithm, and the adaptation parameter update module comprises:
the adaptation layer information acquisition unit is used for acquiring the hierarchy and the number of layers of the adaptation layer in the local training model, wherein the hierarchy of the adaptation layer close to the model output is higher;
the gradient descent unit is used for performing gradient descent on the loss function by using the gradient descent algorithm to obtain updating loss corresponding to the number of adaptation layers;
the adaptive parameter updating unit is used for sequentially updating local adaptive parameters from the adaptive layer at the highest level to the adaptive layer at the lowest level according to the updating loss from large to small;
and the adaptation parameter sending unit is used for sending the updated local adaptation parameters of each adaptation layer to the server.
In one embodiment, the federated transfer learning apparatus further includes:
the adaptation parameter classification module is used for classifying the updated local adaptation parameters of the adaptation layers with the same hierarchy into one class after the server acquires the updated local adaptation parameters sent by all the clients, so as to obtain a group of local adaptation parameter sets corresponding to each hierarchy;
the global training module comprises:
the global parameter calculation unit is used for carrying out weighting and averaging on a group of corresponding local adaptation parameter sets aiming at any hierarchy, and determining the average value as the global adaptation parameter of the corresponding hierarchy;
and the global parameter sending unit is used for sending the global adaptation parameters of each hierarchy to the adaptation layer of the corresponding hierarchy of each client.
In one embodiment, the model parameter update module comprises:
and the model parameter updating unit is used for carrying out fine adjustment processing on the initial model parameters by using a fine adjustment algorithm and determining the fine-adjusted model parameters as updated model parameters.
In one embodiment, the adaptation layer includes a down-projection layer, an activation layer and an up-projection layer, and the data processing between any two adjacent convolution layers includes:
the adaptation layer dimension-reduction processing module, configured to input the original data output by the lower convolution layer into the down-projection layer for dimension-reduction mapping, and to input the result of the dimension-reduction mapping into the activation layer for excitation to obtain an excitation result;
and the adaptation layer dimension-raising processing module, configured to input the excitation result into the up-projection layer for dimension-raising mapping to obtain updated data with the same dimension as the original data, and to input the updated data into the upper convolution layer.
In a third aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor, a memory, and a computer program stored in the memory and executable on the processor, and the processor, when executing the computer program, implements the federated transfer learning method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the federated transfer learning method according to the first aspect.
Compared with the prior art, the embodiments of the present application have the following advantages. The application is applied to a distributed training system comprising at least one client and one server, where the local training model of each client comprises at least two convolution layers with an adaptation layer arranged between two adjacent convolution layers. For any client, the initial model parameters of the corresponding local training model are fixed, and the loss function of the local training model with the fixed model parameters is calculated using the corresponding local training set; the local adaptation parameters of each adaptation layer are updated by an optimizer algorithm according to the loss function, and the updated local adaptation parameters are sent to the server; the server calculates the global adaptation parameters of each adaptation layer from the updated local adaptation parameters sent by all the clients and sends them to all the clients; for any client, the global adaptation parameters are used to update the local adaptation parameters of the corresponding adaptation layers, the initial model parameters are updated by a preset adjustment algorithm to obtain updated model parameters, and training is repeated until convergence to obtain a trained local training model. By adding adaptation layers, the model parameters of the convolution layers are converted into adaptation parameters communicated with the server, so the model parameters are transferred and learned with reduced communication overhead; meanwhile, the model parameters are fixed during each training round and only the adaptation layer parameters are trained, after which the model parameters are fine-tuned backwards using the global adaptation parameters, so the difference between the model parameters and the adaptation parameters is small and the use of the trained model in the target scene is not affected.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required to be used in the embodiments or the prior art description will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings may be obtained according to these drawings without inventive labor.
Fig. 1 is a schematic diagram of an application environment of a federated transfer learning method provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a federated transfer learning method provided in the second embodiment of the present application;
fig. 3 is a schematic flowchart of a federated transfer learning method provided in the third embodiment of the present application;
fig. 4 is a schematic structural diagram of a federated transfer learning apparatus provided in the fourth embodiment of the present application;
fig. 5 is a schematic structural diagram of a computer device provided in the fifth embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The embodiments of the present application can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
It should be understood that, the sequence numbers of the steps in the following embodiments do not mean the execution sequence, and the execution sequence of each process should be determined by the function and the inherent logic of the process, and should not constitute any limitation to the implementation process of the embodiments of the present application.
In order to explain the technical means of the present application, the following description will be given by way of specific examples.
The federated transfer learning method provided in the embodiments of the present application can be applied to an application environment as shown in fig. 1, where clients communicate with a server to form a distributed training system. The client includes, but is not limited to, a palmtop computer, a desktop computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cloud computing device, a personal digital assistant (PDA), and other computer devices. The server can be implemented by an independent server or by a server cluster composed of a plurality of servers.
Referring to fig. 2, a schematic flowchart of a federated transfer learning method provided in the second embodiment of the present application is shown. The federated transfer learning method is applied to the distributed training system shown in fig. 1; the distributed training system includes at least one client and one server, the local training model of each client includes at least two convolution layers, and an adaptation layer is disposed between two adjacent convolution layers. The server communicates with the clients, the clients do not communicate with each other, each client is provided with a local training model and a local training set for local training, and the server is provided with a global model for global calculation. As shown in fig. 2, the federated transfer learning method may include the following steps:
step S201, for any client, fixing the initial model parameters of the corresponding local training model, and calculating the loss function of the local training model with the fixed model parameters by using the corresponding local training set.
In the present application, if the server is connected to only one client, the distributed training system is a system that trains through a single client; generally, however, the server is connected to two or more clients. Each client has the same local training model, but the training set of each client is local, that is, it cannot be shared with other clients, so the local training model of a client must be trained with the local training set.
The local training model comprises at least two convolution layers, an adaptation layer is arranged between every two adjacent convolution layers, and the data output by a lower convolution layer passes through the adaptation layer before reaching the upper convolution layer. In one embodiment, the local training model of client i is a ResNet network having 5 convolutional layers (conv1, conv2_x, conv3_x, conv4_x, conv5_x), each convolutional layer being composed of a certain number of BasicBlocks. Correspondingly, there are 4 adaptation layers, whose parameters can be denoted φ_i^j, where φ_i^j represents the adaptation parameters of the j-th adaptation layer of client i; these adaptation parameters are all the parameters to be trained in the model.
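For illustration only, the following is a minimal sketch of this structure, assuming PyTorch; the class name AdaptedResNet, its constructor arguments and the adaptation_parameters helper are illustrative and not part of the original disclosure. It simply wires 5 convolutional stages and 4 adaptation layers together in the order described above.

```python
import torch.nn as nn

class AdaptedResNet(nn.Module):
    """Backbone of client i: 5 convolutional stages (conv1, conv2_x, conv3_x,
    conv4_x, conv5_x) with one adaptation layer between each pair of adjacent
    stages (4 adaptation layers in total)."""
    def __init__(self, stages, adapters, head=None):
        super().__init__()
        assert len(adapters) == len(stages) - 1
        self.stages = nn.ModuleList(stages)      # model parameters (fixed during a round)
        self.adapters = nn.ModuleList(adapters)  # adaptation parameters phi_i^j (trained)
        self.head = head or nn.Identity()

    def forward(self, x):
        x = self.stages[0](x)
        for adapter, stage in zip(self.adapters, self.stages[1:]):
            # data output by the lower stage passes through the adaptation
            # layer before reaching the upper stage
            x = stage(adapter(x))
        return self.head(x)

    def adaptation_parameters(self):
        # phi_i^j: the parameters of the j-th adaptation layer of this client
        return [list(a.parameters()) for a in self.adapters]
```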
Fixing the initial model parameters of the local training model means that, after the local training set is input into the local training model, the initial model parameters are used in the calculation of each convolution layer. For example, each piece of training data is encoded with the initial encoding parameters and decoded with the initial decoding parameters.
Calculating the loss function means processing the local training set with the local training model and then combining the processing result with the loss function calculation formula to obtain the loss function; the loss function at this point may be a numeric loss value or an expression that has not yet been evaluated to a number.
In step S201, the local training model is not trained, but is used as an initial model to perform a calculation to obtain a loss function, and the training of the local training model is performed in combination with subsequent steps.
Optionally, before fixing the initial model parameters of the corresponding local training model, the method further includes:
loading a local training model at a client, and initializing model parameters of the local training model to obtain initialized model parameters of the local training model;
calculating a loss function for the local training model of the fixed model parameters using the corresponding local training set comprises:
inputting a local training set of a client into a local training model with fixed model parameters, and outputting a true value and a predicted value of each training data in the local training set after passing through each convolution layer and each adaptation layer;
and calculating to obtain a corresponding loss function according to the real values and the predicted values of all the training data and by combining a preset loss function formula.
Loading may refer to placing the model into a cache, a container, or an operator to set up a process, so as to start the model; after the model is started, the loaded model is initialized. The initialization processes used by different clients may be the same or different, so the initial model parameters of different clients after initialization of the local training model may differ; even if the clients use the same initialization process, the initialized model parameters may still differ. The model parameters thus have more variety and randomness, which helps improve the convergence rate of the model during training.
The local training set is input into the local training model, and after the operation of each convolution layer and each adaptation layer, the true value and the predicted value corresponding to each piece of training data are output and used to calculate the loss function in combination with a preset loss function formula. The preset loss function may be a cross-entropy loss function, a mean-square-error loss function, or the like; it is related to the training algorithm of the local training model and can be set according to the requirements of the actual scene.
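As an illustration of step S201, the sketch below (PyTorch assumed, with a model shaped like the AdaptedResNet sketch above; the function name and signature are hypothetical) fixes the backbone parameters and accumulates a cross-entropy loss over the local training set; any other preset loss formula could be substituted.

```python
import torch
import torch.nn as nn

def local_loss(model, train_loader, loss_fn=None):
    # Step S201: fix the model (backbone) parameters of the local training
    # model; only the adaptation parameters remain trainable in the next step.
    loss_fn = loss_fn or nn.CrossEntropyLoss()   # e.g. cross-entropy; MSE is also possible
    for stage in model.stages:
        for p in stage.parameters():
            p.requires_grad_(False)
    for adapter in model.adapters:
        for p in adapter.parameters():
            p.requires_grad_(True)
    losses = []
    for inputs, labels in train_loader:
        preds = model(inputs)                    # predicted values
        losses.append(loss_fn(preds, labels))    # compared with the true values
    return torch.stack(losses).mean()
```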
Optionally, the adaptation layer includes a down-projection layer, an activation layer and an up-projection layer, and the data processing between any two adjacent convolution layers includes:
inputting the original data output by the lower convolution layer into the down-projection layer for dimension-reduction mapping, and inputting the result of the dimension-reduction mapping into the activation layer for excitation to obtain an excitation result;
and inputting the excitation result into the up-projection layer for dimension-raising mapping to obtain updated data with the same dimension as the original data, and inputting the updated data into the upper convolution layer.
The adaptation layer maps an input hidden vector z from dimension d down to dimension m, where m is smaller than d, and then maps it back from dimension m to dimension d; here m is the hidden-layer size of the adaptation layer, and quantities such as m and d can serve as local adaptation parameters of the adaptation layer.
A residual connection is used in the adaptation layer, and the activation layer uses the activation function tanh(·). Thus, given an input hidden vector z, the corresponding output vector z0 is computed as:

z0 = f2(tanh(f1(z))) + z

where f1 is the projection transformation function of the down-projection layer and f2 is the projection transformation function of the up-projection layer.
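A minimal sketch of this adaptation layer, assuming PyTorch. Linear projections are used for the hidden vector z exactly as in the formula; for convolutional feature maps, 1x1 convolutions would play the same role of f1 and f2.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Down-projection f1 (d -> m, m < d), tanh activation, up-projection
    f2 (m -> d), with a residual connection: z0 = f2(tanh(f1(z))) + z."""
    def __init__(self, d, m):
        super().__init__()
        assert m < d, "the adapter hidden size m must be smaller than the input size d"
        self.down = nn.Linear(d, m)  # f1: dimension-reduction mapping
        self.up = nn.Linear(m, d)    # f2: dimension-raising mapping

    def forward(self, z):
        return self.up(torch.tanh(self.down(z))) + z
```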
And step S202, updating the local adaptive parameters of each adaptive layer through an optimizer algorithm according to the loss function, and sending the updated local adaptive parameters to a server.
In the present application, optimizer algorithms include basic gradient descent algorithms, momentum optimization algorithms and adaptive-learning-rate optimization algorithms. The basic gradient descent algorithms include standard Gradient Descent (GD), Stochastic Gradient Descent (SGD), Batch Gradient Descent (BGD) and Mini-Batch Gradient Descent (MBGD); the momentum optimization algorithms include the standard Momentum Optimizer and Nesterov Accelerated Gradient (NAG); the adaptive-learning-rate optimization algorithms include AdaGrad, AdaDelta, RMSProp, Adam, and the like.
The loss function is optimized with the optimizer algorithm, and the optimization result is used to replace the corresponding parameters for retraining; here the result of optimizing the loss function replaces the local adaptation parameters of the adaptation layers, each adaptation layer corresponds to its own optimized loss function, and the optimized loss functions of different adaptation layers are different.
And after the local adaptation parameters of the adaptation layer are replaced and updated, sending the corresponding updated local adaptation parameters to the server, wherein if the number of the adaptation layers is 4, the number of the local adaptation parameters is also 4, and each group of the local adaptation parameters needs to be sent to the server.
Optionally, the optimizer algorithm includes a gradient descent algorithm, and updating the local adaptation parameter of each adaptation layer through the optimizer algorithm according to the loss function, and sending the updated local adaptation parameter to the server includes:
acquiring the levels and the number of adaptation layers in a local training model, wherein the higher the level of the adaptation layer close to the model output is;
performing gradient descent on the loss function by using a gradient descent algorithm to obtain update loss corresponding to the number of adaptation layers;
updating the local adaptation parameters of the adaptation layer at the highest level to the adaptation layer at the lowest level in sequence from the maximum update loss to the minimum update loss;
and sending the updated local adaptation parameters of each adaptation layer to the server.
A gradient descent algorithm is selected as the method for updating the local adaptation parameters. Before the loss function is optimized with the gradient descent algorithm, the optimization target must be determined, so the number of adaptation layers in the local training model and the level of each adaptation layer need to be obtained. An adaptation layer closer to the output of the local training model is specified to have a higher level; that is, the levels of the adaptation layers are arranged in order from the input to the output.
The number of adaptation layers determines how many update losses are generated. For example, with 4 adaptation layers, performing gradient descent on the loss function with the gradient descent algorithm yields 4 loss functions after gradient descent, that is, 4 update losses.
The update losses obtained by gradient descent have an order of magnitude: the largest update loss is used to update the local adaptation parameters of the adaptation layer at the highest level, the smallest update loss is used to update those of the adaptation layer at the lowest level, and the intermediate update losses are matched one-to-one with the intermediate adaptation layers according to level and magnitude, with higher levels corresponding to larger update losses.
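One possible reading of this ordering is sketched below, under the assumption that each adaptation layer's "update loss" is derived from the gradient of the loss with respect to that layer's parameters, and that the correspondence between larger update losses and higher levels determines the order in which plain gradient-descent steps are applied; the function name, the norm-based update loss and the learning rate are all illustrative, not taken from the original disclosure.

```python
import torch

def update_adapters_by_level(loss, adapters, lr=1e-2):
    # Hypothetical sketch: `adapters` is ordered from the lowest level (near
    # the input) to the highest level (near the output). One update loss is
    # derived per adaptation layer from the gradients of the loss, and updates
    # are applied from the largest update loss / highest level downwards.
    per_layer = []
    for level, adapter in enumerate(adapters):
        params = list(adapter.parameters())
        grads = torch.autograd.grad(loss, params, retain_graph=True)
        update_loss = sum(g.norm().item() for g in grads)
        per_layer.append((update_loss, level, params, grads))
    for _, _, params, grads in sorted(per_layer, key=lambda t: t[0], reverse=True):
        with torch.no_grad():
            for p, g in zip(params, grads):
                p -= lr * g  # gradient-descent step on the local adaptation parameters
```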
Step S203, the server calculates the global adaptation parameters of each adaptation layer according to the obtained updated local adaptation parameters sent by all the clients, and sends the global adaptation parameters to all the clients.
According to the application, the server can obtain updated local adaptation parameters sent by the client and perform global operation on the local adaptation parameters, wherein the global operation can be model training by using a global model preset in the server to obtain the global adaptation parameters, and the global operation can also be calculation by using a calculation model preset in the server without training to obtain the global adaptation parameters. It should be noted that the global adaptation parameter is to fuse local adaptation parameters sent by different clients, and the application does not limit which fusion method is used to obtain the global adaptation parameter.
Since each adaptation layer has a corresponding local adaptation parameter, at this time, the number of global adaptation parameters may be identical to the number of layers of the adaptation layer.
And S204, for any client, updating the local adaptation parameters of the corresponding adaptation layers by using the global adaptation parameters, and updating the initial model parameters by using a preset adjustment algorithm to obtain updated model parameters.
In the present application, the global adaptation parameters update the local adaptation parameters of the corresponding adaptation layers, and then the initial model parameters are updated through a preset adjustment algorithm, so as to obtain a model with new model parameters and adaptation parameters, and step S201 is executed for the model.
The preset adjustment algorithm is an algorithm set according to requirements for adjusting the model parameters, such as a gradient descent algorithm or a fine-tuning algorithm; an adjustment algorithm that helps improve training efficiency and accuracy can be selected, and no further description is given here.
Optionally, updating the initial model parameters by using the preset adjustment algorithm to obtain the updated model parameters includes:
and carrying out fine tuning processing on the initial model parameters by using a fine tuning algorithm, and determining the fine-tuned model parameters as updated model parameters.
The initial model parameters are adjusted with a fine-tuning algorithm, and the adjusted model parameters replace the initial model parameters; for example, a first-order differential term is added to the initial model parameters in a differential manner, thereby realizing fine-tuning of the parameters.
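The description of this fine-tuning step is brief, so the following is only one hedged interpretation, not the method itself: a single small first-order (gradient) correction is applied to the previously fixed backbone parameters, with a learning rate much smaller than that used for the adaptation parameters; the function name and arguments are hypothetical.

```python
import torch

def fine_tune_backbone(model, batch, loss_fn, lr=1e-4):
    # Hypothetical sketch of the parameter adjustment in step S204: add a small
    # first-order differential (gradient) term to the initial model parameters.
    backbone = [p for stage in model.stages for p in stage.parameters()]
    for p in backbone:
        p.requires_grad_(True)
    inputs, labels = batch
    loss = loss_fn(model(inputs), labels)
    grads = torch.autograd.grad(loss, backbone, allow_unused=True)
    with torch.no_grad():
        for p, g in zip(backbone, grads):
            if g is not None:
                p -= lr * g       # fine-tuned parameters become the new initial parameters
    for p in backbone:
        p.requires_grad_(False)   # fixed again at the start of the next round
```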
And S205, taking the updated model parameters as the initial model parameters, returning to the step of fixing the initial model parameters of the corresponding local training model and calculating the loss function of the local training model with the fixed model parameters by using the corresponding local training set, until convergence, to obtain the trained local training model.
In the present application, for the local training model of any client, the training needs to repeatedly perform steps S201 to S204 until convergence, where the convergence may refer to convergence of a loss function, and may also refer to reaching a preset number of iterations. Through repeated adjustment, updating and calculation, the corresponding local training model is finally determined to be well trained during convergence, namely, the model parameters and the adaptive parameters can be used in an actual scene.
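Putting the steps together, the following is a hedged end-to-end sketch of one client's participation (steps S201 to S205). It reuses the helper sketches given earlier in this section; the `server` object with send()/receive() methods and the loss-based convergence test are placeholders, not part of the original disclosure.

```python
def train_client(model, train_loader, loss_fn, server, client_id,
                 max_rounds=100, tol=1e-4):
    # Sketch only: assumes local_loss, update_adapters_by_level and
    # fine_tune_backbone from the earlier sketches, and a hypothetical
    # `server` that exchanges the adaptation parameters.
    prev = float("inf")
    for _ in range(max_rounds):
        loss = local_loss(model, train_loader, loss_fn)                # step S201
        update_adapters_by_level(loss, model.adapters)                 # step S202
        server.send(client_id, [a.state_dict() for a in model.adapters])
        global_states = server.receive(client_id)                      # step S203
        for adapter, state in zip(model.adapters, global_states):
            adapter.load_state_dict(state)                             # step S204 (adaptation params)
        fine_tune_backbone(model, next(iter(train_loader)), loss_fn)   # step S204 (model params)
        if abs(prev - loss.item()) < tol:                              # step S205: convergence test
            break
        prev = loss.item()
    return model
```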
The embodiment of the present application is applied to a distributed training system comprising at least one client and one server, where the local training model of each client comprises at least two convolution layers with an adaptation layer arranged between every two adjacent convolution layers. For any client, the initial model parameters of the corresponding local training model are fixed, and the loss function of the local training model with the fixed model parameters is calculated using the corresponding local training set; the local adaptation parameters of each adaptation layer are updated by an optimizer algorithm according to the loss function, and the updated local adaptation parameters are sent to the server; the server calculates the global adaptation parameters of each adaptation layer from the updated local adaptation parameters sent by all the clients and sends them to all the clients; for any client, the global adaptation parameters are used to update the local adaptation parameters of the corresponding adaptation layers, the initial model parameters are updated by a preset adjustment algorithm to obtain updated model parameters, and training is repeated until convergence to obtain a trained local training model. By adding adaptation layers, the model parameters of the convolution layers are converted into adaptation parameters communicated with the server, so the model parameters are transferred and learned with reduced communication overhead; meanwhile, the model parameters are fixed during each training round and only the adaptation layer parameters are trained, after which the model parameters are fine-tuned backwards using the global adaptation parameters, so the difference between the model parameters and the adaptation parameters is small and the use of the trained model in the target scene is not affected.
Referring to fig. 3, which is a schematic flowchart of a federated transfer learning method provided in the third embodiment of the present application, as shown in fig. 3, the federated transfer learning method may include the following steps:
step S301, aiming at any client, fixing the initial model parameters of the corresponding local training model, and calculating the loss function of the local training model with the fixed model parameters by using the corresponding local training set.
Step S302, according to the loss function, local adaptation parameters of each adaptation layer are updated through an optimizer algorithm, and the updated local adaptation parameters are sent to a server.
The contents of the steps S301 to S302 are the same as the contents of the steps S201 to S202 in the second embodiment, and reference may be made to the description of the steps S201 and S202, which are not repeated herein.
Step S303, the server classifies the updated local adaptation parameters of adaptation layers having the same hierarchy into a class according to the obtained updated local adaptation parameters sent by all the clients, so as to obtain a group of local adaptation parameter sets corresponding to each hierarchy.
In the application, in order to facilitate the adaptation layer of each level to be provided with the corresponding global adaptation parameters, the server classifies the local adaptation parameters after obtaining the local adaptation parameters, i.e. the local adaptation parameters of the same level are classified into one class for subsequent processing.
Step S304, aiming at any hierarchy, a group of corresponding local adaptation parameter sets are weighted and averaged, and the average value is determined to be the global adaptation parameter of the corresponding hierarchy.
In the present application, the weight in the weighted averaging may be set according to actual requirements, for example, for a client, the weight may be distributed according to the importance degree, the data volume, and the like of the client, where the higher the importance degree of the client is, or the larger the data volume is, the larger the corresponding weight is.
In an embodiment, the weights of all the clients are set to be 1, so that the fairness of the distributed training can be embodied to a certain extent.
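A server-side sketch of steps S303 and S304 follows, under the assumption that each client uploads one state dict per adaptation-layer level, as in the client sketches above; the function name and argument layout are illustrative. With all weights equal to 1 this reduces to a plain average.

```python
def aggregate_adaptation_parameters(client_states, weights=None):
    # client_states[i][j]: state dict of client i's adaptation layer at level j.
    # Local adaptation parameters of the same level are grouped into one set
    # and weighted-averaged into the global adaptation parameters of that level.
    n_clients = len(client_states)
    n_levels = len(client_states[0])
    weights = weights or [1.0] * n_clients           # equal weights: plain average
    total = sum(weights)
    global_states = []
    for level in range(n_levels):
        avg = {}
        for key in client_states[0][level]:
            avg[key] = sum(w * client_states[i][level][key]
                           for i, w in enumerate(weights)) / total
        global_states.append(avg)                    # one global parameter set per level
    return global_states
```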
Step S305, sending the global adaptation parameter of each tier to the adaptation layer of the corresponding tier of each client.
In the present application, the number of global adaptation parameters is equal to the number of adaptation layers, so that an adaptation layer of one level corresponds to a global adaptation parameter of one level.
Step S306, for any client, updating the local adaptation parameters of the corresponding adaptation layers by using the global adaptation parameters, and updating the initial model parameters by using a preset adjustment algorithm to obtain updated model parameters.
And S307, taking the updated model parameters as the initial model parameters, returning to the step of fixing the initial model parameters of the corresponding local training model and calculating the loss function of the local training model with the fixed model parameters by using the corresponding local training set, until convergence, to obtain the trained local training model.
The contents of the steps S306 to S307 are the same as the contents of the steps S204 to S205 in the second embodiment, and reference may be made to the description of the steps S204 and S205, which are not repeated herein.
The embodiment of the present application is applied to a distributed training system comprising at least one client and one server, where the local training model of each client comprises at least two convolution layers with an adaptation layer arranged between every two adjacent convolution layers. The model parameters of the local training model are fixed, a loss function is calculated using the local training set, the local adaptation parameters of each adaptation layer are updated from the loss function by an optimizer algorithm, and the updated local adaptation parameters are sent to the server. The updated local adaptation parameters of adaptation layers at the same level are grouped into one class to obtain a set of local adaptation parameters for each level; for any level, the corresponding set of local adaptation parameters is weighted and averaged, the average is determined as the global adaptation parameter of that level, and the global adaptation parameter of each level is sent to the adaptation layer at the corresponding level of each client to update its local adaptation parameters. The model parameters are then updated with the preset adjustment algorithm, and training is repeated until convergence. There is no need to train a global model in the server, and the global analysis is carried out by weighted fusion, so the clients can learn the corresponding content, the working time of the server is shortened, and the efficiency of model training is improved. In addition, the essence of the method is still to add adaptation layers so as to convert the model parameters of the convolution layers into adaptation parameters communicated with the server, so the model parameters are transferred and learned with reduced communication overhead; meanwhile, the model parameters are fixed during each training round, only the adaptation layer parameters are trained, and the model parameters are then fine-tuned backwards with the global adaptation parameters, so the difference between the model parameters and the adaptation parameters is small and the use of the trained model in the target scene is not affected.
Corresponding to the federated transfer learning method of the foregoing embodiments, fig. 4 shows a structural block diagram of a federated transfer learning apparatus provided in the fourth embodiment of the present application. The apparatus is applied to the distributed training system shown in fig. 1; the distributed training system includes at least one client and one server, the local training model of each client includes at least two convolution layers, and an adaptation layer is disposed between two adjacent convolution layers. The server communicates with the clients, the clients do not communicate with each other, each client is provided with a local training model and a local training set for local training, and the server is provided with a global model for global calculation. For convenience of explanation, only the portions related to the embodiments of the present application are shown.
Referring to fig. 4, the federated transfer learning apparatus includes:
a fixed training module 41, configured to fix an initial model parameter of a corresponding local training model for any client, and calculate a loss function of the local training model with the fixed model parameter by using a corresponding local training set;
an adaptation parameter updating module 42, configured to update the local adaptation parameter of each adaptation layer through an optimizer algorithm according to the loss function, and send the updated local adaptation parameter to the server;
the global training module 43 is configured to calculate, by the server, a global adaptation parameter of each adaptation layer according to the obtained updated local adaptation parameters sent by all the clients, and send the global adaptation parameters to all the clients;
the model parameter updating module 44, configured to update, for any client, the local adaptation parameters of the corresponding adaptation layers using the global adaptation parameters, and to update the initial model parameters using a preset adjustment algorithm to obtain updated model parameters;
and the return execution training module 45, configured to take the updated model parameters as the initial model parameters, return to the step of fixing the initial model parameters of the corresponding local training model and calculating the loss function of the local training model with the fixed model parameters using the corresponding local training set, until convergence, to obtain the trained local training model.
Optionally, the federated transfer learning apparatus further includes:
the model initialization module is used for loading the local training model at the client before fixing the initial model parameters of the corresponding local training model, and initializing the model parameters of the local training model to obtain the initialized model parameters of the local training model;
the fixed training module 41 includes:
the model operation unit is used for inputting the local training set of the client into the local training model with fixed model parameters, and outputting the true value and the predicted value of each training data in the local training set after passing through each convolution layer and each adaptation layer;
and the loss calculation unit is used for calculating to obtain a corresponding loss function according to the real values and the predicted values of all the training data and by combining a preset loss function formula.
Optionally, the optimizer algorithm includes a gradient descent algorithm, and the adaptive parameter updating module 42 includes:
the adaptive layer information acquisition unit is used for acquiring the levels and the number of layers of adaptive layers in the local training model, wherein the higher the level of the adaptive layer close to the model output is;
the gradient descending unit is used for carrying out gradient descending on the loss function by using a gradient descending algorithm to obtain updating loss corresponding to the number of the adaptation layers;
the adaptive parameter updating unit is used for sequentially updating local adaptive parameters from the adaptive layer at the highest level to the adaptive layer at the lowest level according to the updating loss from large to small;
and the adaptation parameter sending unit is used for sending the updated local adaptation parameters of each adaptation layer to the server.
Optionally, the federated transfer learning apparatus further includes:
the adaptation parameter classification module is used for classifying the updated local adaptation parameters of the adaptation layers with the same hierarchy into one class after the server acquires the updated local adaptation parameters sent by all the clients, so as to obtain a group of local adaptation parameter sets corresponding to each hierarchy;
the global training module 43 includes:
the global parameter calculation unit is used for carrying out weighted averaging on a group of corresponding local adaptation parameter sets aiming at any hierarchy, and determining the average value as the global adaptation parameter of the corresponding hierarchy;
and the global parameter sending unit is used for sending the global adaptation parameters of each hierarchy to the adaptation layer of the corresponding hierarchy of each client.
Optionally, the model parameter updating module 44 includes:
and the model parameter updating unit is used for carrying out fine adjustment processing on the initial model parameters by using a fine adjustment algorithm and determining the fine-adjusted model parameters as updated model parameters.
Optionally, the adaptation layer includes a down-projection layer, an activation layer and an up-projection layer, and the data processing between any two adjacent convolution layers includes:
the adaptation layer dimension-reduction processing module, configured to input the original data output by the lower convolution layer into the down-projection layer for dimension-reduction mapping, and to input the result of the dimension-reduction mapping into the activation layer for excitation to obtain an excitation result;
and the adaptation layer dimension-raising processing module, configured to input the excitation result into the up-projection layer for dimension-raising mapping to obtain updated data with the same dimension as the original data, and to input the updated data into the upper convolution layer.
It should be noted that, because the contents of information interaction, execution process, and the like between the modules are based on the same concept as that of the embodiment of the method of the present application, specific functions and technical effects thereof may be specifically referred to a part of the embodiment of the method, and details are not described here.
Fig. 5 is a schematic structural diagram of a computer device according to a fifth embodiment of the present application. As shown in fig. 5, the computer apparatus of this embodiment includes: at least one processor (only one shown in fig. 5), a memory, and a computer program stored in the memory and executable on the at least one processor, the processor when executing the computer program implementing the steps in any of the various federal migration learning method embodiments described above.
The computer device may include, but is not limited to, a processor, a memory. It will be appreciated by those skilled in the art that fig. 5 is merely an example of a computer device and is not intended to be limiting, and that a computer device may include more or fewer components than those shown, or some components may be combined, or different components may be included, such as a network interface, a display screen, and input devices, etc.
The processor may be a CPU, or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory includes readable storage media, internal memory, etc., wherein the internal memory may be the internal memory of the computer device, and the internal memory provides an environment for the operating system and the execution of the computer-readable instructions in the readable storage media. The readable storage medium may be a hard disk of the computer device, and in other embodiments may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the computer device. Further, the memory may also include both internal storage units and external storage devices of the computer device. The memory is used for storing an operating system, application programs, a BootLoader (BootLoader), data, and other programs, such as program codes of a computer program, and the like. The memory may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the division of the above functional units and modules is illustrated as an example. In practical applications, the above functions may be allocated to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for convenience of distinguishing them from each other and are not intended to limit the protection scope of the present application. For the specific working processes of the units and modules in the above apparatus, reference may be made to the corresponding processes in the foregoing method embodiments, which are not described again here.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like.

The computer-readable medium may include at least: any entity or device capable of carrying the computer program code, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, the computer-readable medium may not be an electrical carrier signal or a telecommunications signal.
An embodiment of the present application further provides a computer program product which, when run on a computer device, causes the computer device to implement the steps in the above method embodiments.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/computer device and method may be implemented in other ways. For example, the above-described apparatus/computer device embodiments are merely illustrative; for example, the division of modules or units is merely a division of logical functions, and there may be other division manners in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present application, and they should be construed as being included in the present application.

Claims (10)

1. A federated transfer learning method, applied to a distributed training system, wherein the distributed training system comprises at least one client and a server, a local training model of each client comprises at least two convolution layers, and an adaptation layer is arranged between two adjacent convolution layers, the federated transfer learning method comprising the following steps:
for any client, fixing the initial model parameters of the corresponding local training model, and calculating the loss function of the local training model with the fixed model parameters by using the corresponding local training set;
updating local adaptation parameters of each adaptation layer through an optimizer algorithm according to the loss function, and sending the updated local adaptation parameters to the server;
the server calculates and obtains global adaptation parameters of each adaptation layer according to the obtained updated local adaptation parameters sent by all the clients, and sends the global adaptation parameters to all the clients;
for any client, updating the local adaptation parameters of the corresponding adaptation layers by using the global adaptation parameters, and updating the initial model parameters by using a preset adjustment algorithm to obtain updated model parameters;
and taking the updated model parameters as the initial model parameters, returning to the step of fixing the initial model parameters of the corresponding local training model and calculating the loss function of the local training model with fixed model parameters by using the corresponding local training set, until convergence, to obtain the trained local training model.
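For illustration, the client-side procedure described in claim 1 might be sketched as follows. This is an assumption-laden sketch: it assumes PyTorch, a hypothetical `server` object with `upload`/`download` methods, adaptation-layer parameters identified by the name prefix "adapter", and plain SGD standing in for both the optimizer algorithm and the preset adjustment algorithm.

```python
import torch

def client_round(model, train_loader, loss_fn, server,
                 adapter_lr=1e-3, finetune_lr=1e-5):
    # Split parameters: adaptation-layer parameters (trainable in this phase)
    # vs. the fixed model parameters of the convolution layers.
    adapter_params, backbone_params = [], []
    for name, p in model.named_parameters():
        is_adapter = "adapter" in name          # naming convention is an assumption
        p.requires_grad_(is_adapter)
        (adapter_params if is_adapter else backbone_params).append(p)

    # Step 1: with model parameters fixed, compute the loss on the local
    # training set and update only the local adaptation parameters.
    optimizer = torch.optim.SGD(adapter_params, lr=adapter_lr)
    for x, y in train_loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

    # Step 2: send the updated local adaptation parameters to the server and
    # receive the global adaptation parameters it computed.
    server.upload([p.detach().clone() for p in adapter_params])
    global_adapters = server.download()
    with torch.no_grad():
        for p, g in zip(adapter_params, global_adapters):
            p.copy_(g)

    # Step 3: stand-in for the "preset adjustment algorithm": briefly fine-tune
    # the previously fixed model parameters with a small learning rate.
    for p in backbone_params:
        p.requires_grad_(True)
    finetune_opt = torch.optim.SGD(backbone_params, lr=finetune_lr)
    for x, y in train_loader:
        finetune_opt.zero_grad()
        loss_fn(model(x), y).backward()
        finetune_opt.step()
```

In this sketch only the adaptation parameters are exchanged with the server; the convolution-layer parameters stay local and are only adjusted afterwards, which matches the communication-saving intent of the method.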
2. The federated transfer learning method as claimed in claim 1, further comprising, before fixing the initial model parameters of the corresponding local training model:
loading the local training model at the client, and initializing model parameters of the local training model to obtain the initial model parameters of the local training model;
wherein the calculating the loss function of the local training model with fixed model parameters by using the corresponding local training set comprises:
inputting the local training set of the client into the local training model with fixed model parameters, and outputting, for each piece of training data in the local training set, a real value and the predicted value obtained after passing through each convolution layer and each adaptation layer;
and calculating a corresponding loss function according to the real values and the predicted values of all the training data, in combination with a preset loss function formula.
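A minimal sketch of the loss computation in claim 2, assuming a classification task and cross-entropy as the "preset loss function formula" (the claim itself leaves the formula unspecified):

```python
import torch
import torch.nn.functional as F

def compute_loss(model, train_loader):
    losses = []
    for x, y_true in train_loader:       # y_true: the real value of each batch of training data
        y_pred = model(x)                 # prediction after all convolution and adaptation layers
        losses.append(F.cross_entropy(y_pred, y_true))
    return torch.stack(losses).mean()
```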
3. The federated transfer learning method as claimed in claim 1, wherein the optimizer algorithm comprises a gradient descent algorithm, and the updating the local adaptation parameters of each adaptation layer through the optimizer algorithm according to the loss function and sending the updated local adaptation parameters to the server comprises:
acquiring the levels and the number of the adaptation layers in the local training model, wherein an adaptation layer closer to the model output has a higher level;
performing gradient descent on the loss function by using the gradient descent algorithm to obtain update losses corresponding in number to the adaptation layers;
updating the local adaptation parameters in sequence from the adaptation layer at the highest level to the adaptation layer at the lowest level, according to the update losses in descending order;
and sending the updated local adaptation parameters of each adaptation layer to the server.
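One possible reading of claim 3, sketched under strong assumptions: the "update loss" of each adaptation layer is approximated here by the magnitude of its gradients, and the layers are then updated in descending order of that value, starting from the level closest to the model output. The list `adapter_layers` (ordered from lowest to highest level) and the learning rate are hypothetical.

```python
import torch

def update_adapters_by_level(adapter_layers, loss, lr=1e-3):
    params = [list(layer.parameters()) for layer in adapter_layers]
    flat = [p for ps in params for p in ps]
    grads = torch.autograd.grad(loss, flat)

    # Regroup gradients per adaptation layer and measure an "update loss" per level
    # (here: the summed gradient norm, an assumption of this sketch).
    it = iter(grads)
    per_layer = [[next(it) for _ in ps] for ps in params]
    update_loss = [float(sum(g.norm() for g in gs)) for gs in per_layer]

    # Apply the updates in descending order of update loss, which under this sketch's
    # assumption starts from the highest-level adaptation layer.
    order = sorted(range(len(adapter_layers)), key=lambda i: update_loss[i], reverse=True)
    with torch.no_grad():
        for i in order:
            for p, g in zip(params[i], per_layer[i]):
                p.add_(g, alpha=-lr)
    return update_loss
```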
4. The federated transfer learning method as claimed in claim 1, wherein the calculating, by the server, according to the obtained updated local adaptation parameters sent by all the clients further comprises:
classifying the updated local adaptation parameters of the adaptation layers at the same level into one class to obtain a group of local adaptation parameter sets corresponding to each level;
and the calculating the global adaptation parameters of each adaptation layer and sending the global adaptation parameters to all the clients comprises:
for any level, carrying out weighted averaging on the corresponding group of local adaptation parameter sets, and determining the weighted average as the global adaptation parameter of the corresponding level;
and sending the global adaptation parameters of each level to the adaptation layer of the corresponding level of each client.
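The server-side aggregation of claim 4 can be illustrated with the following sketch. It assumes each client uploads one adaptation parameter tensor per level, and that the weights of the weighted average are supplied externally, for example proportional to local training set sizes (an assumption; the claim does not specify the weights).

```python
import torch

def aggregate_adaptation_parameters(client_uploads, client_weights=None):
    # client_uploads: list over clients, each a list of per-level adaptation tensors.
    num_clients = len(client_uploads)
    num_levels = len(client_uploads[0])
    if client_weights is None:
        client_weights = [1.0 / num_clients] * num_clients

    global_params = []
    for level in range(num_levels):
        # Weighted average of the same-level local adaptation parameters.
        acc = torch.zeros_like(client_uploads[0][level])
        for upload, w in zip(client_uploads, client_weights):
            acc += w * upload[level]
        global_params.append(acc)
    return global_params   # one global adaptation parameter per level, sent to all clients
```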
5. The federated transfer learning method as claimed in claim 1, wherein the updating the initial model parameters by using the preset adjustment algorithm to obtain the updated model parameters comprises:
carrying out fine-tuning processing on the initial model parameters by using a fine-tuning algorithm, and determining the fine-tuned model parameters as the updated model parameters.
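Isolating the fine-tuning step of claim 5, a minimal sketch: a single small-learning-rate gradient step on the previously fixed model parameters, assuming `requires_grad` was re-enabled on them before the loss was computed on the local training set.

```python
import torch

def finetune_step(backbone_params, loss, lr=1e-5):
    # Gradients of the loss with respect to the (previously fixed) model parameters.
    grads = torch.autograd.grad(loss, backbone_params, allow_unused=True)
    with torch.no_grad():
        for p, g in zip(backbone_params, grads):
            if g is not None:
                p.add_(g, alpha=-lr)   # the fine-tuned values become the updated model parameters
```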
6. The federated transfer learning method as claimed in any one of claims 1 to 5, wherein the adaptation layer comprises a lower projection layer, an activation layer and an upper projection layer, and the data processing between any two adjacent convolution layers comprises:
inputting, at the lower convolution layer, original data into the lower projection layer for dimension-reduction mapping, and inputting the result of the dimension-reduction mapping into the activation layer for excitation to obtain an excitation result;
and inputting the excitation result into the upper projection layer for dimension-raising mapping to obtain updated data with the same dimension as the original data, and inputting the updated data into the upper convolution layer.
7. A federated transfer learning device, applied to a distributed training system, wherein the distributed training system comprises at least one client and a server, a local training model of each client comprises at least two convolution layers, and an adaptation layer is arranged between two adjacent convolution layers, the federated transfer learning device comprising:
the fixed training module is used for fixing the initial model parameters of the corresponding local training model aiming at any client and calculating the loss function of the local training model with the fixed model parameters by using the corresponding local training set;
the adaptive parameter updating module is used for updating the local adaptive parameters of each adaptive layer through an optimizer algorithm according to the loss function and sending the updated local adaptive parameters to the server;
the global training module is used for the server to calculate and obtain the global adaptation parameters of each adaptation layer according to the obtained updated local adaptation parameters sent by all the clients and send the global adaptation parameters to all the clients;
the model parameter updating module is used for updating, for any client, the local adaptation parameters of the corresponding adaptation layers by using the global adaptation parameters, and updating the initial model parameters by using a preset adjustment algorithm to obtain updated model parameters;
and the return execution training module is used for taking the updated model parameters as the initial model parameters, returning to execute the step of fixing the initial model parameters of the corresponding local training model and calculating the loss function of the local training model with fixed model parameters by using the corresponding local training set, until convergence, to obtain the trained local training model.
8. The federated transfer learning device of claim 7, further comprising:
the model initialization module is used for loading the local training model at the client before fixing the initial model parameters of the corresponding local training model, and initializing the model parameters of the local training model to obtain the initial model parameters of the local training model;
the fixed training module comprises:
the model operation unit is used for inputting the local training set of the client into a local training model with fixed model parameters, and outputting a true value and a predicted value of each training data in the local training set after passing through each convolution layer and each adaptation layer;
and the loss calculation unit is used for calculating to obtain a corresponding loss function according to the real values and the predicted values of all the training data and by combining a preset loss function formula.
9. A computer device, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the federated transfer learning method as claimed in any one of claims 1 to 6.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the federated transfer learning method as claimed in any one of claims 1 to 6.
CN202210702928.9A 2022-06-21 2022-06-21 Federal transfer learning method, device, computer equipment and medium Pending CN114781654A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210702928.9A CN114781654A (en) 2022-06-21 2022-06-21 Federal transfer learning method, device, computer equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210702928.9A CN114781654A (en) 2022-06-21 2022-06-21 Federal transfer learning method, device, computer equipment and medium

Publications (1)

Publication Number Publication Date
CN114781654A true CN114781654A (en) 2022-07-22

Family

ID=82421090

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210702928.9A Pending CN114781654A (en) 2022-06-21 2022-06-21 Federal transfer learning method, device, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN114781654A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115277264A (en) * 2022-09-28 2022-11-01 季华实验室 Subtitle generating method based on federal learning, electronic equipment and storage medium
CN117076090A (en) * 2023-10-10 2023-11-17 腾讯科技(深圳)有限公司 Task model construction method, device, equipment and computer readable storage medium
CN117076090B (en) * 2023-10-10 2024-03-19 腾讯科技(深圳)有限公司 Task model construction method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220722