CN112348199B - Model training method based on federated learning and multi-task learning - Google Patents
- Publication number
- CN112348199B (application CN202011194414.4A)
- Authority
- CN
- China
- Prior art keywords
- working node
- node terminal
- target neural
- neural network
- parameter server
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5066—Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention relates to a model training method based on federated learning and multi-task learning. All fully connected layers in a target neural network are assigned to a rear model of the target neural network, and the remaining layers are assigned to a front model. The parameter server is responsible for updating the front model of each target neural network, while the working node terminals in the switching network are jointly responsible for the rear models. A federated learning framework is thereby applied to train the target neural networks, and a common or related network layer is trained for several tasks at the same time, so that the tasks mutually promote one another's training accuracy and the convergence rate and generalization ability of the models improve; efficient training of the target neural networks is obtained while the privacy and security of the underlying data are ensured.
Description
Technical Field
The invention relates to a model training method based on federated learning and multi-task learning, and belongs to the technical field of data processing.
Background
In the field of machine learning, collecting and processing data is a major difficulty. As mobile devices become ever more closely tied to people's lives, a large amount of valuable and private data resides on them. In the traditional approach, a service provider collects user data on a centralized server and cleans and processes it there, but as the relevant laws are continually strengthened, this approach may carry legal risk.
Federated learning was proposed to utilize user data effectively and safely. Under a federated learning model, users need only upload the gradients produced by training locally, rather than their raw data, so the data can still contribute to jointly training a unified model while user privacy is protected to a certain extent. For example, the framework can let Android handset users update a model locally; its design goal is efficient machine learning across multiple parties or computing nodes while guaranteeing information security during big-data exchange, protecting terminal data and personal privacy, and ensuring legal compliance.
However, existing federated learning must compromise with the current reality of unbalanced bandwidth, and it has three shortcomings. First, the framework consists of a parameter server and several working nodes that communicate synchronously, with only a few synchronous parameter-averaging update iterations per day; the update frequency is low, slow nodes become the bottleneck of the whole system's training efficiency, part of the communication resources is wasted because node bandwidth is not the limiting factor, and this in turn causes more training nodes to fall behind. Second, traditional federated learning trains a single model to fit the data of all nodes, which is not the optimal solution for an individual user. Third, some tasks have too little data to yield an accurate model with high generalization performance, and traditional federated learning does not exploit multi-task learning to mitigate this.
Disclosure of Invention
The technical problem the invention aims to solve is to provide a model training method based on federated learning and multi-task learning, which divides a target neural network into front and rear parts and applies a federated learning framework so as to obtain efficient model training while ensuring the privacy and security of the underlying data.
The invention adopts the following technical scheme to solve this problem. The invention designs a model training method based on federated learning and multi-task learning, used to synchronously carry out parameterized training for at least one target neural network, where all target neural networks have fully connected layers of the same structure; based on the parameter server and the working node terminals, the parameterized training of each target neural network is synchronously implemented according to the following steps A to C.
Step A: for each target neural network, assign every fully connected layer to a rear model of the target neural network and the remaining layers to a front model of the target neural network, then proceed to step B.
Step B: according to the parameter attributes of the working node terminals, the parameter server constructs a switching network consisting of the working node terminals that meet the preset parameter requirements; the parameter server is responsible for the front model of each target neural network, while the working node terminals in the switching network are jointly responsible for the rear model of each target neural network; then proceed to step C.
Step C: applying a multi-task learning mode, the parameter server and the working node terminals in the switching network carry out parameterized training on each target neural network according to the sample training data corresponding to each target model, obtaining each trained target neural network.
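The split in step A can be sketched in a few lines of Python; the dictionary-based layer records below are illustrative stand-ins assumed for the example, not the patent's data structures.

```python
# Sketch of step A: assign every fully connected ("fc") layer to the
# rear model and all remaining layers to the front model.
# The layer records are illustrative stand-ins, not real network layers.

def split_model(layers):
    """Return (front_model, rear_model) for an ordered list of layers."""
    front = [layer for layer in layers if layer["type"] != "fc"]
    rear = [layer for layer in layers if layer["type"] == "fc"]
    return front, rear

layers = [
    {"type": "conv", "name": "conv1"},
    {"type": "conv", "name": "conv2"},
    {"type": "fc", "name": "fc1"},
    {"type": "fc", "name": "fc2"},
]
front, rear = split_model(layers)
```

Under this split, the parameter server only ever handles `front`, while `rear` stays on the working node terminals.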
As a preferred technical scheme of the invention: in step B, the parameter server implements the construction of the switching network, its own responsibility for the front model of each target neural network, and the responsibility of the working node terminals in the switching network for the rear model of each target neural network, by executing a parameter averaging process and a network list management process.
The parameter server executes the parameter averaging process according to the following steps I1 to I10.
Step I1: the parameter server receives the parameter list sent by each working node terminal; the list includes the working node terminal's IP address together with its computing power, battery level, and bandwidth. At the same time, the server starts a listening thread for applications to join the switching network and a listening thread for requests for the switching-network list; then proceed to step I2.
Step I2: the parameter server selects a preset number n of working node terminals by a probability-weighting method and sends confirmation information to each of them; then proceed to step I3.
Step I3: the parameter server receives the confirmation information from each working node terminal it interacted with in step I2; if a timeout occurs, the parameter server retransmits the confirmation information. A switching network consisting of the working node terminals that meet the preset parameter requirements is thus constructed; then proceed to step I4.
Step I4: the parameter server distributes the parameters of the rear model of each target neural network to each working node terminal in the switching network; then proceed to step I5.
Step I5: the parameter server starts listening for the front-model parameters of each target neural network received from each working node terminal in the switching network; then proceed to step I6.
Step I6: the parameter server initializes the receiving list to empty; then proceed to step I7.
Step I7: the parameter server receives the front-model parameters of each target neural network sent by each working node terminal in the switching network and records each received working node terminal in the receiving list; then proceed to step I8.
Step I8: the parameter server judges whether the number of working node terminals recorded in the receiving list is greater than or equal to a preset threshold; if yes, go to step I10; otherwise, go to step I9.
Step I9: judge whether the time the parameter server has waited for the front-model parameters sent by the working node terminals exceeds the preset timeout; if yes, return to step I2; otherwise, return to step I7.
Step I10: for the front-model parameters of each target neural network received from the working node terminals in the switching network, the parameter server computes their average and distributes the averaged front-model parameters to each working node terminal in the switching network.
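A minimal sketch of the averaging in step I10, assuming the front-model parameters arrive flattened into plain float lists keyed by an assumed worker id (the worker-count threshold and timeout handling of steps I8 and I9 are omitted here):

```python
# Sketch of step I10: element-wise averaging of the front-model
# parameters received from the workers in the switching network.

def average_front_models(received):
    """received maps worker id -> flattened front-model parameters.
    Returns the element-wise mean, to be redistributed to all workers."""
    n = len(received)
    length = len(next(iter(received.values())))
    return [sum(params[i] for params in received.values()) / n
            for i in range(length)]

received = {"worker-a": [1.0, 2.0], "worker-b": [3.0, 4.0]}
avg = average_front_models(received)  # sent back to every worker
```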
As a preferred technical scheme of the invention: the parameter server performs a network list management process as follows from step II1 to step II 7;
Step II2: the parameter server starts the listening thread for applications to join the switching network; then proceed to step II3.
Step II3: the parameter server judges whether a message applying to join the switching network has been received; if yes, go to step II4; otherwise, continue executing step II3.
Step II4: for each received application-to-join message, the parameter server adds the IP address of the corresponding working node terminal to the node exchange dictionary as a key and initializes the value corresponding to that key; then proceed to step II5.
Step II5: the parameter server returns the node exchange dictionary to each working node terminal corresponding to a received application-to-join message and determines the number n of working node terminals corresponding to the received messages; then proceed to step II6.
Step II6: the parameter server updates each value in the node exchange dictionary by subtracting 1/n from it; then proceed to step II7.
Step II7: the parameter server deletes from the node exchange dictionary every key-value pair whose value is 0; then return to step II3.
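Steps II4 to II7 amount to a decaying membership dictionary. The sketch below assumes an initial value of 1.0 per key (the patent leaves the initialization value unstated) and treats any non-positive value as expired to sidestep float comparison against exactly 0.

```python
# Sketch of steps II4-II7: the node exchange dictionary maps a worker's
# IP address (key) to a freshness value that is decremented by 1/n each
# round; entries that decay to zero are deleted.
# The initial value 1.0 is an assumption for illustration.

def update_exchange_dict(exchange, new_ips, n):
    for ip in new_ips:            # step II4: register each applicant
        exchange[ip] = 1.0
    for ip in list(exchange):     # step II6: subtract 1/n from each value
        exchange[ip] -= 1.0 / n
        if exchange[ip] <= 0:     # step II7: delete expired entries
            del exchange[ip]
    return exchange
```

With n = 2, a key survives two rounds without renewal before being evicted.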
As a preferred technical scheme of the invention: in the execution process from step B to step C, each working node terminal in the switching network executes a process of sending the model parameters to other working node terminals and a process of receiving the model parameters sent by other working node terminals, respectively.
As a preferred technical scheme of the invention: each working node terminal in the switching network executes the process of sending the model parameters to other working node terminals according to the following steps III1 to III 23;
Step III1: the working node terminal sends its computing power, battery level, and bandwidth to the parameter server; then proceed to step III2.
Step III2: the working node terminal polls, waiting for confirmation information from the parameter server; then proceed to step III3.
Step III3: the working node terminal initializes its rejection-node dictionary to empty; then proceed to step III4.
Step III4: the working node terminal starts listening on the parameter server side so as to receive the front-model parameters of each target neural network from the parameter server; then proceed to step III5.
Step III5: the working node terminal receives the front-model parameters of each target neural network sent by the parameter server and updates its local front-model parameters accordingly; then proceed to step III6.
Step III6: the working node terminal trains each received target neural network for a preset number Cn of rounds using its own data; then proceed to step III7.
Step III7: the working node terminal sends the front-model parameters of each target neural network to the parameter server; then proceed to step III8.
Step III8: the working node terminal judges whether its battery level and communication resources are sufficient, i.e., whether the battery level and the bandwidth each exceed their preset thresholds; if yes, go to step III9; otherwise, go to step III17.
Step III9: the working node terminal sends an application to join the switching network to the parameter server; then proceed to step III10.
Step III10: the working node terminal sends a request for the network list to the parameter server; then proceed to step III11.
Step III11: the working node terminal judges whether the network list sent by the parameter server has been received; if yes, it starts the process of receiving model parameters sent by other working node terminals (including refusing unwanted ones) and goes to step III12; otherwise, return to step III10.
Step III12: the working node terminal randomly selects one other working node terminal from the received network list; then proceed to step III13.
Step III13: the working node terminal judges whether the selected working node terminal is present in its rejection-node dictionary; if yes, return to step III12; otherwise, go to step III14.
Step III14: the working node terminal sends the rear-model parameters of each of its target neural networks to the selected working node terminal; then proceed to step III15.
Step III15: the working node terminal judges whether a rejection message has been received from the selected working node terminal; if yes, it adds that terminal's IP address to its rejection-node dictionary, sets the corresponding value to the rejection count, and proceeds to step III16; otherwise, it proceeds directly to step III16.
Step III16: the working node terminal judges whether the number of other working node terminals its data has reached is greater than a preset number threshold; if yes, go to step III17; otherwise, return to step III12.
Step III17: every value in the working node terminal's rejection-node dictionary is decremented by 1; then proceed to step III18.
Step III18: the working node terminal deletes from the rejection-node dictionary every key-value pair whose value equals 0; then proceed to step III19.
Step III19: the working node terminal performs Cm rounds of training using its own data; then proceed to step III20.
Step III20: the working node terminal tests the accuracy and the loss of each target neural network it holds; then proceed to step III21.
Step III21: the working node terminal closes all listening; then proceed to step III22.
Step III22: the working node terminal judges whether, for each target neural network it holds, the accuracy is greater than a preset accuracy threshold while the loss is less than a preset loss threshold; if so, the working node terminal has completed training the rear model of each target neural network it holds; otherwise, go to step III23.
Step III23: the working node terminal judges whether an update of the front-model parameters of each target neural network has arrived from the parameter server; if yes, return to step III4; otherwise, return to step III8.
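Peer selection and the aging of the rejection-node dictionary in steps III12, III13, III17, and III18 can be sketched as follows; the function names are illustrative, and the integer rejection counts stand in for the preset value the patent leaves symbolic.

```python
# Sketch of steps III12-III13 and III17-III18: pick a random peer that
# is not currently rejecting us, and age the rejection-node dictionary
# so that rejections eventually expire.

import random

def pick_peer(network_list, reject_dict, rng=random):
    """Steps III12-III13: return a peer IP absent from reject_dict,
    or None if every listed peer is currently rejecting us."""
    candidates = [ip for ip in network_list if ip not in reject_dict]
    return rng.choice(candidates) if candidates else None

def age_reject_dict(reject_dict):
    """Steps III17-III18: decrement each rejection count by 1 and
    delete entries that reach zero."""
    for ip in list(reject_dict):
        reject_dict[ip] -= 1
        if reject_dict[ip] <= 0:
            del reject_dict[ip]
    return reject_dict
```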
As a preferred technical scheme of the invention: each working node terminal in the switching network executes a process of receiving model parameters sent by other working node terminals according to the following steps IV 1-IV 10;
Step IV1: the working node terminal starts receiving the rear-model parameters of each target neural network sent by other working node terminals; then proceed to step IV2.
Step IV2: the working node terminal judges whether it has received rear-model parameters of each target neural network sent by another working node terminal; if yes, go to step IV3; otherwise, return to step IV1.
Step IV3: the working node terminal evaluates the rear-model parameters of each target neural network sent by the other working node terminal, computing an acceptance value for them with reinforcement-learning DQN; then proceed to step IV4.
Step IV4: the working node terminal judges whether the acceptance value of the rear-model parameters of each target neural network is greater than a preset lower-bound acceptance threshold; if yes, it accepts the rear-model parameters and proceeds to step IV5; otherwise, it sends a rejection notice to the working node terminal that sent them and proceeds to step IV5.
Step IV5: the working node terminal updates each target neural network it holds and returns to step IV1.
As a preferred technical scheme of the invention: the target neural network is a neural network for prediction, or a neural network for classification.
Compared with the prior art, the model training method based on federated learning and multi-task learning designed by the invention has the following technical effects:
(1) The invention designs a model training method based on federated learning and multi-task learning in which all fully connected layers of a target neural network form its rear model and the remaining layers form its front model. The parameter server is responsible for the front model of each target neural network, while the working node terminals in the switching network are responsible for the rear models. A federated learning framework is thus applied to train the target neural networks, and a common or related network layer is trained for several tasks at the same time, so that the tasks mutually promote one another's training accuracy, the convergence rate and generalization ability of the models improve, and efficient training of the target neural networks is obtained while the privacy and security of the underlying data are ensured.
(2) In the model training method based on federated learning and multi-task learning, all tasks jointly train the same front part of the model, so tasks with less training data can also achieve a better effect.
Drawings
FIG. 1 is a schematic diagram of the hardware and network architecture environment in which the invention is applied;
FIG. 2 is a flow chart of the parameter server according to an embodiment of the present invention;
FIG. 3 is a flow chart of the server listening for applications to join the switching network according to an embodiment of the present invention;
FIG. 4 is a client flow chart of an embodiment of the present invention;
FIG. 5 is a flow chart of a client accepting parameters from other working nodes according to an embodiment of the present invention;
FIG. 6 is a schematic overall flow chart of an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are explained in further detail below with reference to the accompanying drawings.
The invention designs a model training method based on federated learning and multi-task learning, used to synchronously carry out parameterized training for at least one target neural network, where all target neural networks have fully connected layers of the same structure; in practical applications, the target neural network is a neural network for prediction or a neural network for classification.
As shown in fig. 1, the system consists of a parameter server and working node terminals. Assume there are n working node terminals in the system and m different tasks (n > m) running across the whole network. A working node terminal has no global node information and does not know which other working node terminals run the same task as it does; the parameter server has no authority to view the training data of the working node terminals; and the amounts of data stored on different working node terminals are not necessarily equal. Certain correlations exist among the tasks, so training through a multi-task learning framework is more efficient overall, or converges better, than training each task alone.
In actual implementation, as shown in fig. 6, the parameterized training of each target neural network is synchronously implemented according to the following steps A to C.
Step A: for each target neural network, assign every fully connected layer to a rear model of the target neural network and the remaining layers to a front model of the target neural network, then proceed to step B.
Step B: according to the parameter attributes of the working node terminals, the parameter server constructs a switching network consisting of the working node terminals that meet the preset parameter requirements; the parameter server is responsible for the front model of each target neural network, while the working node terminals in the switching network are jointly responsible for the rear model of each target neural network; then proceed to step C.
In the invention, once the target neural network is divided into front and rear models, the parameter server is responsible only for the front model. The front model uses hard parameter sharing, i.e., the front model of the target neural network is hard-parameter-shared in the multi-task learning sense. The parameter server maintains the list of working nodes that have joined the exchange; nodes with the same training purpose and sufficient communication and battery resources adaptively form a switching network and share the rear model of the target neural network, i.e., all rear-model parameters, through soft parameter sharing, connecting or disconnecting links according to whether a link produces a gain.
The advantages of this are mainly fourfold. First, since the parameter server is responsible only for synchronizing the front model of the target neural network, user privacy is protected to a certain degree. Second, the architecture can train several different image-classification tasks simultaneously, so tasks with only a small amount of data can achieve a better effect. Third, nodes with sufficient communication resources participate in asynchronous soft parameter sharing, improving the convergence speed of the whole target neural network. Fourth, because the parameter server maintains the directory of communicating nodes, the heavy communication load that nodes would generate by broadcasting to discover one another is avoided, and the server can exercise a certain degree of security control over the network.
Unlike traditional multi-task learning, the method applies different modes of parameter sharing to different parts of the target neural network: the front model realizes hard parameter sharing with low-frequency synchronous communication through the parameter server, while the rear model realizes soft parameter sharing with high-frequency asynchronous communication. The two schemes are used together because the first few network layers extract broad, basic features, whereas applying soft parameter sharing to the later layers makes each target neural network more personalized and therefore better suited to its user's requirements.
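The two sharing modes can be sketched side by side; the pull rate in the soft-sharing update is an assumed illustrative value, not one given by the patent.

```python
# Sketch of the two sharing modes combined by the method: the front
# model is hard-shared (local values are replaced by the server
# average), while the rear model is soft-shared (local values are only
# nudged toward a peer's copy, preserving personalization).

def hard_share(local_front, server_front):
    """Hard parameter sharing: the averaged server copy replaces
    the local front-model parameters outright."""
    return list(server_front)

def soft_share(local_rear, peer_rear, pull=0.1):
    """Soft parameter sharing: move each local rear-model value a
    fraction `pull` of the way toward the peer's value."""
    return [a + pull * (b - a) for a, b in zip(local_rear, peer_rear)]
```

With `pull` well below 1, repeated soft-sharing rounds draw related nodes' rear models together without ever forcing them to coincide.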
In the practical application of step B, the parameter server implements the construction of the switching network, its own responsibility for the front model of each target neural network, and the responsibility of the working node terminals in the switching network for the rear model of each target neural network, by executing a parameter averaging process and a network list management process; the two processes are independent and do not interfere with each other.
Specifically, as shown in fig. 2, the parameter server executes the parameter averaging process according to the following steps I1 to I10.
Step I1: the parameter server receives the parameter list sent by each working node terminal; the list includes the working node terminal's IP address together with its computing power, battery level, and bandwidth. Meanwhile, a listening thread for applications to join the switching network and a listening thread for requests for the switching-network list are started; then proceed to step I2.
Step I2: the parameter server selects a preset number n of working node terminals by a weighted probability method and sends confirmation information to each of them; then proceed to step I3.
Step I3: the parameter server receives the confirmation information from each working node terminal it interacted with in step I2; if a timeout occurs, the parameter server retransmits the confirmation information. A switching network consisting of the working node terminals that meet the preset parameter requirements is thereby constructed; then proceed to step I4.
Step I4. the parameter server distributes the parameters of the back model of each target neural network to each working node terminal in the switching network, and then proceeds to step I5.
Step I5. the parameter server initiates listening for the front model parameters in each target neural network received by each worker node terminal in the switching network and then proceeds to step I6.
In practical application, a working node terminal uses both the front and rear parts of the target neural network at the same time, but only the front model of the target neural network receives averaged model updates from the parameter server.
Step I6. the parameter server initializes the received list to an empty list and then proceeds to step I7.
Step I7. the parameter server receives the parameters of the front model in each target neural network sent by each working node terminal in the switching network, and records the received each working node terminal by using the receiving list, and then proceeds to step I8.
Step I8: the parameter server judges whether the number of working node terminals recorded in the receiving list is greater than or equal to a preset threshold; if yes, go to step I10; otherwise, go to step I9.
Step I9: judge whether the time the parameter server has waited for the front-model parameters sent by the working node terminals exceeds the preset timeout; if yes, return to step I2; otherwise, return to step I7.
Step I10: for the front-model parameters of each target neural network received from the working node terminals in the switching network, the parameter server computes their average and distributes the averaged front-model parameters to each working node terminal in the switching network.
And as shown in fig. 3, the parameter server performs a network list management process as follows in step II1 through step II7.
And step II2, the parameter server starts a monitoring thread applying for joining the switching network, and then the step II3 is carried out.
Step II3, the parameter server judges whether a message for applying for joining the switching network is received, if yes, the step II4 is carried out; otherwise, step II3 is continued.
Step II4. For each received message applying to join the switching network, the parameter server adds the IP address of the corresponding working node terminal to the node exchange dictionary as a key, and initializes the value corresponding to that key, which is used to index the corresponding node; it then proceeds to step II5.
The dictionary is a key-value data structure: a value can be looked up through its key, and keys and values are stored together as pairs.
Step II5. The parameter server returns the node exchange dictionary to each working node terminal whose join message was received, determines the number n of working node terminals corresponding to the received join messages, and then proceeds to step II6.
Step II6. The parameter server decrements each value in the node exchange dictionary by 1/n, and then proceeds to step II7.
Step II7. The parameter server deletes from the node exchange dictionary every key-value pair whose value is 0, and then returns to step II3.
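Steps II4 to II7 can be sketched as follows. The initial credit value is an assumption: the patent says the value is initialized but leaves the exact number unspecified.

```python
# Illustrative sketch of steps II4-II7. Keys of the node exchange
# dictionary are worker IP addresses; each value acts as a countdown
# credit. INITIAL_CREDIT is an assumed constant.
INITIAL_CREDIT = 1.0

def handle_join(exchange_dict, ip):
    exchange_dict[ip] = INITIAL_CREDIT       # step II4: insert key with value

def age_entries(exchange_dict, n):
    for ip in list(exchange_dict):
        exchange_dict[ip] -= 1.0 / n         # step II6: self-subtract 1/n
        if exchange_dict[ip] <= 0:
            del exchange_dict[ip]            # step II7: drop exhausted keys

exchange = {}
handle_join(exchange, "10.0.0.5")
age_entries(exchange, n=2)                   # credit drops to 0.5
print(exchange)  # {'10.0.0.5': 0.5}
age_entries(exchange, n=2)                   # credit hits 0, entry removed
print(exchange)  # {}
```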
Step C. The parameter server and each working node terminal in the switching network apply multi-task learning to perform parametric training on each target neural network according to the sample training data corresponding to each target model, obtaining the trained target neural networks.
During the actual execution of steps B to C, each working node terminal mainly runs two processes: a process that sends model parameters to other working node terminals, and a process that receives model parameters sent by other working node terminals. The sending process, shown in fig. 4, avoids working node terminals in the rejection list, sends parameters to the other working node terminals, and handles rejection messages that other working node terminals may send back. The receiving process, shown in fig. 5, receives parameters sent by other working node terminals and decides whether to accept them; if a received gradient is useless for the local task, it sends a rejection message to the sender so that the sender stops transmitting parameters to this terminal. From the viewpoint of a single working node terminal, the two processes are independent of each other: one sends data to peers, the other receives data from peers.
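A minimal sketch of the two independent per-worker processes described above, modeled as threads; queues stand in for the network sockets the real system would use, and all names are illustrative.

```python
# Two independent per-worker processes: one pushes back-model parameters
# to peers, the other receives parameters from peers. Queues are stand-ins
# for the actual network transport.
import queue
import threading

outbox = queue.Queue()   # parameters this worker pushes to peers
inbox = queue.Queue()    # parameters arriving from peers

def send_loop():
    # the real loop would skip peers in the rejection dictionary
    outbox.put(("peer-ip", {"fc1": [0.1, 0.2]}))

def receive_loop():
    # the real loop would score incoming parameters and maybe reject them
    inbox.put(("peer-ip", {"fc2": [0.3]}))

sender = threading.Thread(target=send_loop)
receiver = threading.Thread(target=receive_loop)
sender.start(); receiver.start()
sender.join(); receiver.join()
print(outbox.qsize(), inbox.qsize())  # 1 1
```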
As shown in fig. 4, each worker node terminal in the switching network executes a process of sending model parameters to other worker node terminals in steps III1 to III23, respectively, as follows.
Step III1. The working node terminal sends its own computing power, battery level, and bandwidth to the parameter server, and then enters step III2.
Step III2. the working node terminal polls for confirmation information from the parameter server and then proceeds to step III3.
Step III3. The working node terminal initializes its rejection node dictionary to empty, and then proceeds to step III4.
And step III4, the working node terminal starts monitoring on the parameter server side, receives the parameters of the front model of each target neural network from the parameter server, and then enters step III5.
And step III5, the working node terminal receives the parameters of the front models of the target neural networks sent by the parameter server, updates the parameters of the front models of the target neural networks in the working node terminal, and then enters step III6.
Step III6. The working node terminal trains each received target neural network for a preset number Cn of rounds using its own data, and then proceeds to step III7.
Here, the working node terminal's own data is the private data it holds under federated learning; for example, if the application scenario is image classification, each sample is an image stored on the working node terminal.
And step III7, the working node terminal sends the parameters of the front model of each target neural network to the parameter server, and then the step III8 is carried out.
Step III8. The working node terminal judges whether its battery level and communication resources are sufficient, namely whether the battery level exceeds a preset battery threshold and the bandwidth exceeds a preset bandwidth threshold; if yes, go to step III9; otherwise, go to step III17.
And step III9, the working node terminal sends a request for joining the switching network to the parameter server, and then the step III10 is carried out.
And III10, the working node terminal sends an application for requesting to join the network list to the parameter server, and then the step III11 is carried out.
In practical application, the parameter server does not know the state of each working node terminal; therefore, a working node terminal whose state is good must itself apply to the parameter server to join the network list.
Step III11, the working node terminal judges whether a network list sent by the parameter server is received, if so, the working node terminal starts a process of receiving model parameters sent by other working node terminals and starts a process of refusing to receive the parameters sent by other working node terminals, and the step III12 is carried out; otherwise, the step III10 is returned.
And step III12, the working node terminal randomly selects one other working node terminal from the received network list and enters step III13.
Step III13. The working node terminal judges whether the selected working node terminal exists in its rejection node dictionary; if yes, return to step III12; otherwise, go to step III14.
And step III14, the working node terminal sends the parameters of the posterior model in each target neural network received by the working node terminal to other selected working node terminals, and then the working node terminal enters the step III15.
Step III15. The working node terminal judges whether a rejection message from the selected working node terminal has been received; if yes, it adds the IP address of that working node terminal to its rejection node dictionary and sets the value corresponding to that IP address to a preset rejection count, then goes to step III16; otherwise, it proceeds directly to step III16.
Step III16. The working node terminal judges whether the number of other working node terminals it has sent data to exceeds a preset number threshold; if yes, go to step III17; otherwise, return to step III12.
Step III17. The working node terminal decrements each value in its rejection node dictionary by 1, and then proceeds to step III18.
Step III18. The working node terminal deletes from its rejection node dictionary every key-value pair whose value equals 0, and then proceeds to step III19.
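The peer-selection and rejection-dictionary logic of steps III12 to III18 can be sketched as follows. The rejection count is an assumed constant; the patent only specifies that rejected peers are skipped until their count decays to zero.

```python
# Sketch of steps III12-III18: pick a random peer not in the rejection
# dictionary, record rejections, and age entries out. REJECT_COUNT is an
# assumed value of the preset rejection number.
import random

REJECT_COUNT = 2

def pick_peer(network_list, reject_dict, rng=random):
    candidates = [ip for ip in network_list if ip not in reject_dict]
    return rng.choice(candidates) if candidates else None   # steps III12-III13

def record_rejection(reject_dict, ip):
    reject_dict[ip] = REJECT_COUNT                          # step III15

def age_rejections(reject_dict):
    for ip in list(reject_dict):
        reject_dict[ip] -= 1                                # step III17
        if reject_dict[ip] == 0:
            del reject_dict[ip]                             # step III18

rejects = {}
record_rejection(rejects, "10.0.0.7")
print(pick_peer(["10.0.0.7", "10.0.0.9"], rejects))  # 10.0.0.9 (only candidate)
age_rejections(rejects); age_rejections(rejects)
print(rejects)  # {} -- the rejected peer may now be selected again
```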
Step III19. The working node terminal trains for a preset number Cm of rounds using its own data, and then enters step III20.
Step III20. The working node terminal tests the accuracy and loss of each target neural network it has received, and then proceeds to step III21.
And step III21, the working node terminal closes all monitoring and then enters step III22.
Step III22. The working node terminal judges whether, for each received target neural network, the accuracy exceeds a preset accuracy threshold and the loss is below a preset loss threshold; if so, the working node terminal has completed training the back models of the received target neural networks; otherwise, proceed to step III23.
Step III23. The working node terminal judges whether an update of the front-model parameters of each target neural network has been received from the parameter server; if yes, return to step III4; otherwise, return to step III8.
Further, as shown in fig. 5, each of the working node terminals in the switching network executes a process of receiving the model parameters transmitted from the other working node terminals in accordance with the following steps IV1 to IV10, respectively.
And step IV1, the working node terminals start to receive the rear model parameters of each target neural network sent by other working node terminals, and the step IV2 is carried out.
Step IV2, the working node terminal judges whether the working node terminal receives the rear model parameters of each target neural network sent by other working node terminals, if yes, the step IV3 is carried out; otherwise, the step IV1 is returned.
Step IV3. The working node terminal evaluates the back-model parameters of each target neural network sent by other working node terminals, calculates an acceptance value for these back-model parameters using reinforcement learning (DQN), and then enters step IV4.
Step IV4. The working node terminal judges whether the acceptance value of the back-model parameters of each target neural network exceeds a preset lower-limit threshold; if yes, it accepts the back-model parameters and then enters step IV5; otherwise, it sends a rejection notice to the working node terminal that sent the back-model parameters, and then enters step IV5.
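Step IV4 reduces to a threshold test on an acceptance value produced by a learned evaluator (the patent uses a reinforcement-learning DQN, which is out of scope here). The stand-in below scores an update by how much it would lower the local validation loss; both the scoring rule and the threshold value are assumptions for illustration.

```python
# Hedged sketch of the step IV4 decision rule. ACCEPT_THRESHOLD and the
# loss-delta scoring are assumed stand-ins for the DQN evaluator.
ACCEPT_THRESHOLD = 0.0

def acceptance_value(loss_before, loss_after):
    return loss_before - loss_after  # positive if the update helps locally

def should_accept(loss_before, loss_after, threshold=ACCEPT_THRESHOLD):
    return acceptance_value(loss_before, loss_after) > threshold

print(should_accept(0.9, 0.7))  # True: merge the received back model
print(should_accept(0.7, 0.9))  # False: send a rejection notice instead
```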
And step IV5, the terminal of the working node updates each target neural network received by the terminal of the working node and returns to the step IV1.
The invention designs a model training method based on federated learning and multi-task learning. For each target neural network, all fully connected layers form the back model, and the remaining layers form the front model. The parameter server is responsible for the front model of each target neural network, and the working node terminals in the switching network are jointly responsible for the back models. A federated learning framework is thus applied to train the target neural networks, and a common or related network layer is trained for several tasks at the same time, so that the tasks mutually promote training accuracy, the convergence rate and generalization capability of the models are improved, and the privacy and safety of the underlying data are guaranteed while the target neural networks are trained efficiently.
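The front/back split described above can be sketched as follows; layers are represented as (name, kind) pairs for illustration, while a real implementation would partition actual framework modules the same way.

```python
# Sketch of the step-A split: fully connected layers form the back
# (worker-side, personalized) model and all remaining layers form the
# front (server-side, shared) model.
def split_model(layers):
    front = [(name, kind) for name, kind in layers if kind != "fc"]
    back = [(name, kind) for name, kind in layers if kind == "fc"]
    return front, back

layers = [("conv1", "conv"), ("conv2", "conv"), ("fc1", "fc"), ("fc2", "fc")]
front, back = split_model(layers)
print([name for name, _ in front])  # ['conv1', 'conv2'] -> parameter server
print([name for name, _ in back])   # ['fc1', 'fc2']     -> working nodes
```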
Multiple tasks share a structure whose parameters are shared by all tasks during optimization. Model parameter sharing is mainly divided into hard parameter sharing and soft parameter sharing: hard parameter sharing forces the shared layers to use identical parameters, while soft parameter sharing encourages the shared layers to become similar through a constraint without enforcing equality.
In this application, all tasks jointly train the same front part of the model, so that tasks with less training data can still achieve good results. The target neural network model is split, and its updates are transmitted separately, which improves user privacy protection compared with traditional federated learning. The back model of each target neural network is updated adaptively according to the relevance of the tasks. To preserve model personalization, hard parameter sharing is not used; soft parameter sharing is adopted instead, so that the model fits each user's data better and the whole model is better personalized. Asynchronous sharing of the parameters of the back half of the model is also added, so that bandwidth-rich nodes are utilized more effectively.
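One common realization of soft parameter sharing, sketched below, is to penalize the squared L2 distance of each task's parameters from the across-task mean, pulling the tasks together without forcing identical parameters as hard sharing would. The exact constraint and the coefficient are assumptions; the patent does not fix them.

```python
# Hedged sketch of a soft parameter sharing penalty: squared L2 distance
# of each task's parameter vector from the across-task mean. The form of
# the constraint and coeff=0.1 are illustrative assumptions.
def soft_sharing_penalty(task_params, coeff=0.1):
    """task_params: one flat parameter vector per task."""
    k = len(task_params)
    dim = len(task_params[0])
    mean = [sum(p[i] for p in task_params) / k for i in range(dim)]
    return coeff * sum(
        (p[i] - mean[i]) ** 2 for p in task_params for i in range(dim)
    )

print(soft_sharing_penalty([[1.0, 2.0], [1.0, 2.0]]))  # 0.0: identical tasks
print(soft_sharing_penalty([[0.0, 0.0], [2.0, 2.0]]))  # 0.4: diverging tasks
```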
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.
Claims (6)
1. A model training method based on federated learning and multi-task learning, used for synchronously realizing parametric training of at least one target neural network, all target neural networks having fully connected layers of the same structure; the method is characterized in that: based on the parameter server and each working node terminal, the parametric training of each target neural network is synchronously realized according to the following steps A to C;
step A, for each target neural network, dividing the fully connected layers into a back model of the target neural network and the remaining layers into a front model of the target neural network, and then entering step B;
b, the parameter server constructs a switching network consisting of all working node terminals meeting the preset parameter requirements according to the parameter attributes of all the working node terminals, the parameter server is responsible for the front model of each target neural network, all the working node terminals in the switching network are jointly responsible for the rear model of each target neural network, and then the step C is carried out;
step C, carrying out parametric training on each target neural network by the parameter server and each working node terminal in the switching network according to sample training data respectively corresponding to each target model by applying a multi-task learning mode to obtain each trained target neural network;
in the execution process from step B to step C, each working node terminal in the switching network respectively executes the process of sending the model parameters to other working node terminals and the process of receiving the model parameters sent by other working node terminals; wherein, each working node terminal in the switching network executes the process of sending the model parameters to other working node terminals according to the following steps III1 to III 23;
step III1, the working node terminal sends its own computing power, battery level, and bandwidth to the parameter server, and then goes to step III2;
step III2, the working node terminal polls and waits for confirmation information from the parameter server, and then the step III3 is carried out;
step III3, the working node terminal initializes its rejection node dictionary to empty, and then proceeds to step III4;
step III4, the working node terminal starts monitoring of the working node terminal on the parameter server side, receives parameters of front models of the target neural networks from the parameter server, and then enters step III 5;
step III5, the working node terminal receives the parameters of the front models of the target neural networks sent by the parameter server, updates the parameters of the front models of the target neural networks in the working node terminal, and then enters step III 6;
step III6, the working node terminal trains preset Cn wheels by applying self data aiming at each received target neural network, and then the step III7 is carried out;
step III7, the working node terminal sends the parameters of the front model of each target neural network to a parameter server, and then the step III8 is carried out;
step III8, the working node terminal judges whether its battery level and communication resources are sufficient, namely whether the battery level exceeds a preset battery threshold and the bandwidth exceeds a preset bandwidth threshold; if yes, go to step III9; otherwise, go to step III17;
step III9, the working node terminal sends an application for joining the switching network to the parameter server, and then the step III10 is carried out;
step III10, the working node terminal sends an application for requesting to join the network list to the parameter server, and then the step III11 is carried out;
step III11, the working node terminal judges whether a network list sent by the parameter server is received, if so, the working node terminal starts a process for receiving the model parameters sent by other working node terminals and starts a process for refusing to receive the parameters sent by other working node terminals, and the step III12 is carried out; otherwise, returning to the step III 10;
step III12, the working node terminal randomly selects one other working node terminal from the received network list, and the step III13 is entered;
step III13, the working node terminal judges whether the selected working node terminal exists in its rejection node dictionary; if yes, return to step III12; otherwise, go to step III14;
step III14, the working node terminal sends the received parameters of the rear model in each target neural network to other selected working node terminals, and then the step III15 is carried out;
step III15, the working node terminal judges whether a rejection message from the selected working node terminal has been received; if yes, it adds the IP address of that working node terminal to its rejection node dictionary and sets the value corresponding to that IP address to a preset rejection count, then goes to step III16; otherwise, it proceeds directly to step III16;
step III16, the working node terminal judges whether the number of other working node terminals it has sent data to exceeds a preset number threshold; if yes, go to step III17; otherwise, return to step III12;
step III17, the working node terminal decrements each value in its rejection node dictionary by 1, and then proceeds to step III18;
step III18, the working node terminal deletes from its rejection node dictionary every key-value pair whose value equals 0, and then goes to step III19;
step III19, performing Cm training by using the data of the working node terminal, and then entering step III 20;
step III20, the working node terminal tests the accuracy and loss of each target neural network it has received, and then goes to step III21;
step III21, the working node terminal closes all monitoring and then enters step III 22;
step III22, the working node terminal judges whether, for each received target neural network, the accuracy exceeds a preset accuracy threshold and the loss is below a preset loss threshold; if so, the working node terminal has completed training the back models of the received target neural networks; otherwise, go to step III23;
step III23, the working node terminal judges whether the received update of the model parameter of the front part of each target neural network from the parameter server terminal exists, if yes, the working node terminal returns to the step III 4; otherwise, the step III8 is returned.
2. The method of claim 1, wherein the method comprises the following steps: in the step B, the parameter server realizes the construction of the switching network, the charge of the parameter server to the front model of each target neural network and the charge of each working node terminal in the switching network to the rear model of each target neural network by executing a parameter averaging process and a network list management process.
3. The method of claim 2, wherein the model training method based on federated learning and multitask learning comprises the following steps: the parameter server executes a parameter averaging process according to the following steps I1 to I10;
step I1, the parameter server receives a parameter list from each working node terminal, the parameter list including each working node terminal's computing power, battery level, and bandwidth; it simultaneously starts a monitoring thread for applications to join the switching network and a monitoring thread for requests for the switching network list, and then enters step I2;
step I2, the parameter server selects a preset number n of working node terminals by probability-weighted sampling, sends confirmation information to each selected working node terminal, and then enters step I3;
step I3, the parameter server receives confirmation information from each working node terminal it interacted with in step I2, retransmitting the confirmation information on timeout; a switching network composed of the working node terminals meeting the preset parameter requirements is thereby constructed, and then step I4 is entered;
i4., the parameter server distributes the parameters of the back model of each target neural network to each working node terminal in the switching network, and then the step I5 is carried out;
i5., the parameter server starts the monitoring of the front model parameters in each target neural network received by each working node terminal in the switching network, and then the step I6 is entered;
step I6., the parameter server initializes the receiving list to be an empty list, and then enters step I7;
i7., the parameter server receives the parameters of the front model in each target neural network sent by each working node terminal in the switching network, records the received working node terminals by using a receiving list, and then enters the step I8;
step I8, the parameter server judges whether the number of working node terminals recorded in the receiving list exceeds a preset threshold; if yes, go to step I10; otherwise, enter step I9;
step I9, the parameter server judges whether the time spent waiting for the front-model parameters sent by the working node terminals exceeds a preset timeout; if yes, return to step I2; otherwise, return to step I7;
and step I10, the parameter server calculates the front model parameters by using averaging aiming at the front model parameters in each target neural network sent by each working node terminal in the received switching network, and distributes the averaged front model parameters to each working node terminal in the switching network.
4. The method of claim 2, wherein the model training method based on federated learning and multitask learning comprises the following steps: the parameter server performs a network list management process as follows from step II1 to step II 7;
step II2, the parameter server starts a monitoring thread applying for joining the switching network, and then the step II3 is carried out;
step II3, the parameter server judges whether a message for applying for joining the switching network is received, if yes, the step II4 is carried out; otherwise, continuing to execute step II 3;
step II4, for each received message applying to join the switching network, the parameter server adds the IP address of the corresponding working node terminal to the node exchange dictionary as a key, initializes the value corresponding to that key, and then enters step II5;
step II5, the parameter server returns the node exchange dictionary to each working node terminal whose join message was received, determines the number n of working node terminals corresponding to the received join messages, and then enters step II6;
step II6, the parameter server decrements each value in the node exchange dictionary by 1/n, and then enters step II7;
step II7, the parameter server deletes from the node exchange dictionary every key-value pair whose value is 0, and then returns to step II3.
5. The method for model training based on federated learning and multi-task learning according to claim 1, characterized in that: each working node terminal in the switching network executes the process of receiving model parameters sent by other working node terminals according to the following steps IV1 to IV10;
step IV1, the working node terminal starts to receive the rear model parameters of each target neural network sent by other working node terminals, and the step IV2 is carried out;
step IV2, the working node terminal judges whether the working node terminal receives the rear model parameters of each target neural network sent by other working node terminals, if yes, the step IV3 is carried out; otherwise, returning to the step IV 1;
step IV3, the working node terminal evaluates the back-model parameters of each target neural network sent by other working node terminals, calculates an acceptance value for these back-model parameters using reinforcement learning (DQN), and then goes to step IV4;
step IV4, the working node terminal judges whether the acceptance value of the back-model parameters of each target neural network exceeds a preset lower-limit threshold; if yes, it accepts the back-model parameters and then enters step IV5; otherwise, it sends a rejection notice to the working node terminal that sent the back-model parameters, and enters step IV5;
and step IV5, the terminal of the working node updates each received target neural network and returns to the step IV1.
6. The method for model training based on federated learning and multitask learning according to any one of claims 1-5, characterized in that: the target neural network is a neural network for regression or a neural network for classification.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011194414.4A CN112348199B (en) | 2020-10-30 | 2020-10-30 | Model training method based on federal learning and multi-task learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112348199A CN112348199A (en) | 2021-02-09 |
CN112348199B true CN112348199B (en) | 2022-08-30 |
Family
ID=74356840
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011194414.4A Active CN112348199B (en) | 2020-10-30 | 2020-10-30 | Model training method based on federal learning and multi-task learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112348199B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113051608A (en) * | 2021-03-11 | 2021-06-29 | 佳讯飞鸿(北京)智能科技研究院有限公司 | Method for transmitting virtualized sharing model for federated learning |
CN113032835B (en) * | 2021-04-21 | 2024-02-23 | 支付宝(杭州)信息技术有限公司 | Model training method, system and device for privacy protection |
CN113516249B (en) * | 2021-06-18 | 2023-04-07 | 重庆大学 | Federal learning method, system, server and medium based on semi-asynchronization |
US11797611B2 (en) | 2021-07-07 | 2023-10-24 | International Business Machines Corporation | Non-factoid question answering across tasks and domains |
CN113516250B (en) | 2021-07-13 | 2023-11-03 | 北京百度网讯科技有限公司 | Federal learning method, device, equipment and storage medium |
CN115481752B (en) * | 2022-09-23 | 2024-03-19 | 中国电信股份有限公司 | Model training method, device, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109165725A (en) * | 2018-08-10 | 2019-01-08 | 深圳前海微众银行股份有限公司 | Neural network federation modeling method, equipment and storage medium based on transfer learning |
CN110874484A (en) * | 2019-10-16 | 2020-03-10 | 众安信息技术服务有限公司 | Data processing method and system based on neural network and federal learning |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||