CN112348199B - Model training method based on federated learning and multi-task learning - Google Patents
- Publication number
- CN112348199B (application CN202011194414.4A)
- Authority
- CN
- China
- Prior art keywords
- working node
- node terminal
- target neural
- neural network
- parameter server
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5066—Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention relates to a model training method based on federated learning and multi-task learning. All fully connected layers in a target neural network are assigned to a rear model of the target neural network, and the remaining layers are assigned to a front model. The parameter server is responsible for updating the front model of each target neural network, while the working node terminals in the switching network are jointly responsible for the rear models. A federated learning framework is thereby applied to train the target neural networks, and a common or related network layer is trained for several tasks at the same time, so that the tasks mutually promote one another's training accuracy and the convergence rate and generalization ability of the models improve; efficient training of the target neural networks is obtained while the privacy and security of the underlying data are ensured.
Description
Technical Field
The invention relates to a model training method based on federated learning and multi-task learning, and belongs to the technical field of data processing.
Background
In the field of machine learning, collecting and processing data is a major difficulty. As mobile devices become ever more closely tied to people's lives, a large amount of valuable and private data resides on them. In the traditional approach, a service provider collects user data on a centralized server and cleans and processes it there, but as the relevant laws are continually strengthened, this approach may carry legal risk.
Federated learning was proposed to utilize user data effectively and safely. Under a federated learning model, users need only upload the gradients produced by training locally, rather than their raw data, so the data can still contribute to jointly training a unified model while user privacy is protected to a certain extent. For example, the framework can let Android handset users update a model locally; its design goal is efficient machine learning across multiple parties or computing nodes while guaranteeing information security during big-data exchange, protecting terminal data and personal privacy, and ensuring legal compliance.
However, existing federated learning must compromise with the current reality of unbalanced bandwidth, and it has three shortcomings. First, the framework consists of a parameter server and several working nodes that communicate synchronously, with only a few synchronous parameter-averaging update iterations per day; the update frequency is low, slow nodes become the bottleneck of the whole system's training efficiency, part of the communication resources is wasted because node bandwidth is not the limiting factor, and this in turn causes more training nodes to fall behind. Second, traditional federated learning trains a single model to fit the data of all nodes, which is not the optimal solution for an individual user. Third, some tasks have too little data to yield an accurate model with high generalization performance, and traditional federated learning does not exploit multi-task learning to mitigate this.
Disclosure of Invention
The technical problem the invention aims to solve is to provide a model training method based on federated learning and multi-task learning, which divides a target neural network into front and rear parts and applies a federated learning framework so as to obtain efficient model training while ensuring the privacy and security of the underlying data.
The invention adopts the following technical scheme to solve this problem. The invention designs a model training method based on federated learning and multi-task learning, used to synchronously carry out parameterized training for at least one target neural network, where all target neural networks have fully connected layers of the same structure; based on the parameter server and the working node terminals, the parameterized training of each target neural network is synchronously implemented according to the following steps A to C.
Step A: for each target neural network, assign every fully connected layer to a rear model of the target neural network and the remaining layers to a front model of the target neural network, then proceed to step B.
Step B: according to the parameter attributes of the working node terminals, the parameter server constructs a switching network consisting of the working node terminals that meet the preset parameter requirements; the parameter server is responsible for the front model of each target neural network, while the working node terminals in the switching network are jointly responsible for the rear model of each target neural network; then proceed to step C.
Step C: applying a multi-task learning mode, the parameter server and the working node terminals in the switching network carry out parameterized training on each target neural network according to the sample training data corresponding to each target model, obtaining each trained target neural network.
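The split in step A can be sketched in a few lines of Python; the dictionary-based layer records below are illustrative stand-ins assumed for the example, not the patent's data structures.

```python
# Sketch of step A: assign every fully connected ("fc") layer to the
# rear model and all remaining layers to the front model.
# The layer records are illustrative stand-ins, not real network layers.

def split_model(layers):
    """Return (front_model, rear_model) for an ordered list of layers."""
    front = [layer for layer in layers if layer["type"] != "fc"]
    rear = [layer for layer in layers if layer["type"] == "fc"]
    return front, rear

layers = [
    {"type": "conv", "name": "conv1"},
    {"type": "conv", "name": "conv2"},
    {"type": "fc", "name": "fc1"},
    {"type": "fc", "name": "fc2"},
]
front, rear = split_model(layers)
```

Under this split, the parameter server only ever handles `front`, while `rear` stays on the working node terminals.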
As a preferred technical scheme of the invention: in step B, the parameter server implements the construction of the switching network, its own responsibility for the front model of each target neural network, and the responsibility of the working node terminals in the switching network for the rear model of each target neural network, by executing a parameter averaging process and a network list management process.
The parameter server executes the parameter averaging process according to the following steps I1 to I10.
Step I1: the parameter server receives the parameter list sent by each working node terminal; the list includes the working node terminal's IP address together with its computing power, battery level, and bandwidth. At the same time, the server starts a listening thread for applications to join the switching network and a listening thread for requests for the switching-network list; then proceed to step I2.
Step I2: the parameter server selects a preset number n of working node terminals by a probability-weighting method and sends confirmation information to each of them; then proceed to step I3.
Step I3: the parameter server receives the confirmation information from each working node terminal it interacted with in step I2; if a timeout occurs, the parameter server retransmits the confirmation information. A switching network consisting of the working node terminals that meet the preset parameter requirements is thus constructed; then proceed to step I4.
Step I4: the parameter server distributes the parameters of the rear model of each target neural network to each working node terminal in the switching network; then proceed to step I5.
Step I5: the parameter server starts listening for the front-model parameters of each target neural network received from each working node terminal in the switching network; then proceed to step I6.
Step I6: the parameter server initializes the receiving list to empty; then proceed to step I7.
Step I7: the parameter server receives the front-model parameters of each target neural network sent by each working node terminal in the switching network and records each received working node terminal in the receiving list; then proceed to step I8.
Step I8: the parameter server judges whether the number of working node terminals recorded in the receiving list is greater than or equal to a preset threshold; if yes, go to step I10; otherwise, go to step I9.
Step I9: judge whether the time the parameter server has waited for the front-model parameters sent by the working node terminals exceeds the preset timeout; if yes, return to step I2; otherwise, return to step I7.
Step I10: for the front-model parameters of each target neural network received from the working node terminals in the switching network, the parameter server computes their average and distributes the averaged front-model parameters to each working node terminal in the switching network.
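A minimal sketch of the averaging in step I10, assuming the front-model parameters arrive flattened into plain float lists keyed by an assumed worker id (the worker-count threshold and timeout handling of steps I8 and I9 are omitted here):

```python
# Sketch of step I10: element-wise averaging of the front-model
# parameters received from the workers in the switching network.

def average_front_models(received):
    """received maps worker id -> flattened front-model parameters.
    Returns the element-wise mean, to be redistributed to all workers."""
    n = len(received)
    length = len(next(iter(received.values())))
    return [sum(params[i] for params in received.values()) / n
            for i in range(length)]

received = {"worker-a": [1.0, 2.0], "worker-b": [3.0, 4.0]}
avg = average_front_models(received)  # sent back to every worker
```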
As a preferred technical scheme of the invention: the parameter server performs a network list management process as follows from step II1 to step II 7;
Step II2: the parameter server starts the listening thread for applications to join the switching network; then proceed to step II3.
Step II3: the parameter server judges whether a message applying to join the switching network has been received; if yes, go to step II4; otherwise, continue executing step II3.
Step II4: for each received application-to-join message, the parameter server adds the IP address of the corresponding working node terminal to the node exchange dictionary as a key and initializes the value corresponding to that key; then proceed to step II5.
Step II5: the parameter server returns the node exchange dictionary to each working node terminal corresponding to a received application-to-join message and determines the number n of working node terminals corresponding to the received messages; then proceed to step II6.
Step II6: the parameter server updates each value in the node exchange dictionary by subtracting 1/n from it; then proceed to step II7.
Step II7: the parameter server deletes from the node exchange dictionary every key-value pair whose value is 0; then return to step II3.
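Steps II4 to II7 amount to a decaying membership dictionary. The sketch below assumes an initial value of 1.0 per key (the patent leaves the initialization value unstated) and treats any non-positive value as expired to sidestep float comparison against exactly 0.

```python
# Sketch of steps II4-II7: the node exchange dictionary maps a worker's
# IP address (key) to a freshness value that is decremented by 1/n each
# round; entries that decay to zero are deleted.
# The initial value 1.0 is an assumption for illustration.

def update_exchange_dict(exchange, new_ips, n):
    for ip in new_ips:            # step II4: register each applicant
        exchange[ip] = 1.0
    for ip in list(exchange):     # step II6: subtract 1/n from each value
        exchange[ip] -= 1.0 / n
        if exchange[ip] <= 0:     # step II7: delete expired entries
            del exchange[ip]
    return exchange
```

With n = 2, a key survives two rounds without renewal before being evicted.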
As a preferred technical scheme of the invention: in the execution process from step B to step C, each working node terminal in the switching network executes a process of sending the model parameters to other working node terminals and a process of receiving the model parameters sent by other working node terminals, respectively.
As a preferred technical scheme of the invention: each working node terminal in the switching network executes the process of sending the model parameters to other working node terminals according to the following steps III1 to III 23;
Step III1: the working node terminal sends its computing power, battery level, and bandwidth to the parameter server; then proceed to step III2.
Step III2: the working node terminal polls, waiting for confirmation information from the parameter server; then proceed to step III3.
Step III3: the working node terminal initializes its rejection-node dictionary to empty; then proceed to step III4.
Step III4: the working node terminal starts listening on the parameter server side so as to receive the front-model parameters of each target neural network from the parameter server; then proceed to step III5.
Step III5: the working node terminal receives the front-model parameters of each target neural network sent by the parameter server and updates its local front-model parameters accordingly; then proceed to step III6.
Step III6: the working node terminal trains each received target neural network for a preset number Cn of rounds using its own data; then proceed to step III7.
Step III7: the working node terminal sends the front-model parameters of each target neural network to the parameter server; then proceed to step III8.
Step III8: the working node terminal judges whether its battery level and communication resources are sufficient, i.e., whether the battery level and the bandwidth each exceed their preset thresholds; if yes, go to step III9; otherwise, go to step III17.
Step III9: the working node terminal sends an application to join the switching network to the parameter server; then proceed to step III10.
Step III10: the working node terminal sends a request for the network list to the parameter server; then proceed to step III11.
Step III11: the working node terminal judges whether the network list sent by the parameter server has been received; if yes, it starts the process of receiving model parameters sent by other working node terminals (including refusing unwanted ones) and goes to step III12; otherwise, return to step III10.
Step III12: the working node terminal randomly selects one other working node terminal from the received network list; then proceed to step III13.
Step III13: the working node terminal judges whether the selected working node terminal is present in its rejection-node dictionary; if yes, return to step III12; otherwise, go to step III14.
Step III14: the working node terminal sends the rear-model parameters of each of its target neural networks to the selected working node terminal; then proceed to step III15.
Step III15: the working node terminal judges whether a rejection message has been received from the selected working node terminal; if yes, it adds that terminal's IP address to its rejection-node dictionary, sets the corresponding value to the rejection count, and proceeds to step III16; otherwise, it proceeds directly to step III16.
Step III16: the working node terminal judges whether the number of other working node terminals its data has reached is greater than a preset number threshold; if yes, go to step III17; otherwise, return to step III12.
Step III17: every value in the working node terminal's rejection-node dictionary is decremented by 1; then proceed to step III18.
Step III18: the working node terminal deletes from the rejection-node dictionary every key-value pair whose value equals 0; then proceed to step III19.
Step III19: the working node terminal performs Cm rounds of training using its own data; then proceed to step III20.
Step III20: the working node terminal tests the accuracy and the loss of each target neural network it holds; then proceed to step III21.
Step III21: the working node terminal closes all listening; then proceed to step III22.
Step III22: the working node terminal judges whether, for each target neural network it holds, the accuracy is greater than a preset accuracy threshold while the loss is less than a preset loss threshold; if so, the working node terminal has completed training the rear model of each target neural network it holds; otherwise, go to step III23.
Step III23: the working node terminal judges whether an update of the front-model parameters of each target neural network has arrived from the parameter server; if yes, return to step III4; otherwise, return to step III8.
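Peer selection and the aging of the rejection-node dictionary in steps III12, III13, III17, and III18 can be sketched as follows; the function names are illustrative, and the integer rejection counts stand in for the preset value the patent leaves symbolic.

```python
# Sketch of steps III12-III13 and III17-III18: pick a random peer that
# is not currently rejecting us, and age the rejection-node dictionary
# so that rejections eventually expire.

import random

def pick_peer(network_list, reject_dict, rng=random):
    """Steps III12-III13: return a peer IP absent from reject_dict,
    or None if every listed peer is currently rejecting us."""
    candidates = [ip for ip in network_list if ip not in reject_dict]
    return rng.choice(candidates) if candidates else None

def age_reject_dict(reject_dict):
    """Steps III17-III18: decrement each rejection count by 1 and
    delete entries that reach zero."""
    for ip in list(reject_dict):
        reject_dict[ip] -= 1
        if reject_dict[ip] <= 0:
            del reject_dict[ip]
    return reject_dict
```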
As a preferred technical scheme of the invention: each working node terminal in the switching network executes a process of receiving model parameters sent by other working node terminals according to the following steps IV 1-IV 10;
Step IV1: the working node terminal starts receiving the rear-model parameters of each target neural network sent by other working node terminals; then proceed to step IV2.
Step IV2: the working node terminal judges whether it has received rear-model parameters of each target neural network sent by another working node terminal; if yes, go to step IV3; otherwise, return to step IV1.
Step IV3: the working node terminal evaluates the rear-model parameters of each target neural network sent by the other working node terminal, computing an acceptance value for them with reinforcement-learning DQN; then proceed to step IV4.
Step IV4: the working node terminal judges whether the acceptance value of the rear-model parameters of each target neural network is greater than a preset lower-bound acceptance threshold; if yes, it accepts the rear-model parameters and proceeds to step IV5; otherwise, it sends a rejection notice to the working node terminal that sent them and proceeds to step IV5.
Step IV5: the working node terminal updates each target neural network it holds and returns to step IV1.
As a preferred technical scheme of the invention: the target neural network is a neural network for prediction, or a neural network for classification.
Compared with the prior art, the model training method based on federated learning and multi-task learning designed by the invention has the following technical effects:
(1) The invention designs a model training method based on federated learning and multi-task learning in which all fully connected layers of a target neural network form its rear model and the remaining layers form its front model. The parameter server is responsible for the front model of each target neural network, while the working node terminals in the switching network are responsible for the rear models. A federated learning framework is thus applied to train the target neural networks, and a common or related network layer is trained for several tasks at the same time, so that the tasks mutually promote one another's training accuracy, the convergence rate and generalization ability of the models improve, and efficient training of the target neural networks is obtained while the privacy and security of the underlying data are ensured.
(2) In the model training method based on federated learning and multi-task learning, all tasks jointly train the same front part of the model, so tasks with less training data can also achieve a better effect.
Drawings
FIG. 1 is a schematic diagram of the hardware and network architecture environment in which the invention is applied;
FIG. 2 is a flow chart of the parameter server according to an embodiment of the present invention;
FIG. 3 is a flow chart of the server listening for applications to join the switching network according to an embodiment of the present invention;
FIG. 4 is a client flow chart of an embodiment of the present invention;
FIG. 5 is a flow chart of a client accepting parameters from other working nodes according to an embodiment of the present invention;
FIG. 6 is a schematic overall flow chart of an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are explained in further detail below with reference to the accompanying drawings.
The invention designs a model training method based on federated learning and multi-task learning, used to synchronously carry out parameterized training for at least one target neural network, where all target neural networks have fully connected layers of the same structure; in practical applications, the target neural network is a neural network for prediction or a neural network for classification.
As shown in fig. 1, the system consists of a parameter server and working node terminals. Assume there are n working node terminals in the system and m different tasks (n > m) running across the whole network. A working node terminal has no global node information and does not know which other working node terminals run the same task as it does; the parameter server has no authority to view the training data of the working node terminals; and the amounts of data stored on different working node terminals are not necessarily equal. Certain correlations exist among the tasks, so training through a multi-task learning framework is more efficient overall, or converges better, than training each task alone.
In actual implementation, as shown in fig. 6, the parameterized training of each target neural network is synchronously implemented according to the following steps A to C.
Step A: for each target neural network, assign every fully connected layer to a rear model of the target neural network and the remaining layers to a front model of the target neural network, then proceed to step B.
Step B: according to the parameter attributes of the working node terminals, the parameter server constructs a switching network consisting of the working node terminals that meet the preset parameter requirements; the parameter server is responsible for the front model of each target neural network, while the working node terminals in the switching network are jointly responsible for the rear model of each target neural network; then proceed to step C.
In the invention, once the target neural network is divided into front and rear models, the parameter server is responsible only for the front model. The front model uses hard parameter sharing, i.e., the front model of the target neural network is hard-parameter-shared in the multi-task learning sense. The parameter server maintains the list of working nodes that have joined the exchange; nodes with the same training purpose and sufficient communication and battery resources adaptively form a switching network and share the rear model of the target neural network, i.e., all rear-model parameters, through soft parameter sharing, connecting or disconnecting links according to whether a link produces a gain.
The advantages of this are mainly fourfold. First, since the parameter server is responsible only for synchronizing the front model of the target neural network, user privacy is protected to a certain degree. Second, the architecture can train several different image-classification tasks simultaneously, so tasks with only a small amount of data can achieve a better effect. Third, nodes with sufficient communication resources participate in asynchronous soft parameter sharing, improving the convergence speed of the whole target neural network. Fourth, because the parameter server maintains the directory of communicating nodes, the heavy communication load that nodes would generate by broadcasting to discover one another is avoided, and the server can exercise a certain degree of security control over the network.
Unlike traditional multi-task learning, the method applies different modes of parameter sharing to different parts of the target neural network: the front model realizes hard parameter sharing with low-frequency synchronous communication through the parameter server, while the rear model realizes soft parameter sharing with high-frequency asynchronous communication. The two schemes are used together because the first few network layers extract broad, basic features, whereas applying soft parameter sharing to the later layers makes each target neural network more personalized and therefore better suited to its user's requirements.
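The two sharing modes can be sketched side by side; the pull rate in the soft-sharing update is an assumed illustrative value, not one given by the patent.

```python
# Sketch of the two sharing modes combined by the method: the front
# model is hard-shared (local values are replaced by the server
# average), while the rear model is soft-shared (local values are only
# nudged toward a peer's copy, preserving personalization).

def hard_share(local_front, server_front):
    """Hard parameter sharing: the averaged server copy replaces
    the local front-model parameters outright."""
    return list(server_front)

def soft_share(local_rear, peer_rear, pull=0.1):
    """Soft parameter sharing: move each local rear-model value a
    fraction `pull` of the way toward the peer's value."""
    return [a + pull * (b - a) for a, b in zip(local_rear, peer_rear)]
```

With `pull` well below 1, repeated soft-sharing rounds draw related nodes' rear models together without ever forcing them to coincide.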
In the practical application of step B, the parameter server implements the construction of the switching network, its own responsibility for the front model of each target neural network, and the responsibility of the working node terminals in the switching network for the rear model of each target neural network, by executing a parameter averaging process and a network list management process; the two processes are independent and do not interfere with each other.
Specifically, as shown in fig. 2, the parameter server executes the parameter averaging process according to the following steps I1 to I10.
Step I1: the parameter server receives the parameter list sent by each working node terminal; the list includes the working node terminal's IP address together with its computing power, battery level, and bandwidth. Meanwhile, a listening thread for applications to join the switching network and a listening thread for requests for the switching-network list are started; then proceed to step I2.
Step I2: the parameter server selects a preset number n of working node terminals by a weighted probability method and sends confirmation information to each of them; then proceed to step I3.
Step I3: the parameter server receives the confirmation information from each working node terminal it interacted with in step I2; if a timeout occurs, the parameter server retransmits the confirmation information. A switching network consisting of the working node terminals that meet the preset parameter requirements is thereby constructed; then proceed to step I4.
Step I4. the parameter server distributes the parameters of the back model of each target neural network to each working node terminal in the switching network, and then proceeds to step I5.
Step I5. the parameter server initiates listening for the front model parameters in each target neural network received by each worker node terminal in the switching network and then proceeds to step I6.
In practical application, a working node terminal uses both the front and rear parts of the target neural network at the same time, but only the front model of the target neural network receives averaged model updates from the parameter server.
Step I6. the parameter server initializes the received list to an empty list and then proceeds to step I7.
Step I7. the parameter server receives the parameters of the front model in each target neural network sent by each working node terminal in the switching network, and records the received each working node terminal by using the receiving list, and then proceeds to step I8.
Step I8: the parameter server judges whether the number of working node terminals recorded in the receiving list is greater than or equal to a preset threshold; if yes, go to step I10; otherwise, go to step I9.
Step I9: judge whether the time the parameter server has waited for the front-model parameters sent by the working node terminals exceeds the preset timeout; if yes, return to step I2; otherwise, return to step I7.
Step I10: for the front-model parameters of each target neural network received from the working node terminals in the switching network, the parameter server computes their average and distributes the averaged front-model parameters to each working node terminal in the switching network.
And as shown in fig. 3, the parameter server performs a network list management process as follows in step II1 through step II7.
And step II2, the parameter server starts a monitoring thread applying for joining the switching network, and then the step II3 is carried out.
Step II3, the parameter server judges whether a message for applying for joining the switching network is received, if yes, the step II4 is carried out; otherwise, step II3 is continued.
Step II4. For each received message applying to join the switching network, the parameter server adds the IP address of the corresponding working node terminal to the node exchange dictionary as a key, and initializes the value corresponding to that key, which is used to index the corresponding node; it then proceeds to step II5.
The dictionary is a key-value data structure: a value can be looked up through its key, and keys and values are stored together as pairs.
Step II5. The parameter server returns the node exchange dictionary to each working node terminal whose join message was received, determines the number n of working node terminals corresponding to the received join messages, and then proceeds to step II6.
Step II6. The parameter server decrements each value in the node exchange dictionary by 1/n, and then proceeds to step II7.
Step II7. The parameter server deletes from the node exchange dictionary every key-value pair whose value is 0, and then returns to step II3.
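Steps II4 to II7 can be sketched as follows. The initial credit value is an assumption: the patent says the value is initialized but leaves the exact number unspecified.

```python
# Illustrative sketch of steps II4-II7. Keys of the node exchange
# dictionary are worker IP addresses; each value acts as a countdown
# credit. INITIAL_CREDIT is an assumed constant.
INITIAL_CREDIT = 1.0

def handle_join(exchange_dict, ip):
    exchange_dict[ip] = INITIAL_CREDIT       # step II4: insert key with value

def age_entries(exchange_dict, n):
    for ip in list(exchange_dict):
        exchange_dict[ip] -= 1.0 / n         # step II6: self-subtract 1/n
        if exchange_dict[ip] <= 0:
            del exchange_dict[ip]            # step II7: drop exhausted keys

exchange = {}
handle_join(exchange, "10.0.0.5")
age_entries(exchange, n=2)                   # credit drops to 0.5
print(exchange)  # {'10.0.0.5': 0.5}
age_entries(exchange, n=2)                   # credit hits 0, entry removed
print(exchange)  # {}
```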
Step C. The parameter server and each working node terminal in the switching network apply multi-task learning to perform parametric training on each target neural network according to the sample training data corresponding to each target model, obtaining the trained target neural networks.
During the actual execution of steps B to C, each working node terminal mainly runs two processes: a process that sends model parameters to other working node terminals, and a process that receives model parameters sent by other working node terminals. The sending process, shown in fig. 4, avoids working node terminals in the rejection list, sends parameters to the other working node terminals, and handles rejection messages that other working node terminals may send back. The receiving process, shown in fig. 5, receives parameters sent by other working node terminals and decides whether to accept them; if a received gradient is useless for the local task, it sends a rejection message to the sender so that the sender stops transmitting parameters to this terminal. From the viewpoint of a single working node terminal, the two processes are independent of each other: one sends data to peers, the other receives data from peers.
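A minimal sketch of the two independent per-worker processes described above, modeled as threads; queues stand in for the network sockets the real system would use, and all names are illustrative.

```python
# Two independent per-worker processes: one pushes back-model parameters
# to peers, the other receives parameters from peers. Queues are stand-ins
# for the actual network transport.
import queue
import threading

outbox = queue.Queue()   # parameters this worker pushes to peers
inbox = queue.Queue()    # parameters arriving from peers

def send_loop():
    # the real loop would skip peers in the rejection dictionary
    outbox.put(("peer-ip", {"fc1": [0.1, 0.2]}))

def receive_loop():
    # the real loop would score incoming parameters and maybe reject them
    inbox.put(("peer-ip", {"fc2": [0.3]}))

sender = threading.Thread(target=send_loop)
receiver = threading.Thread(target=receive_loop)
sender.start(); receiver.start()
sender.join(); receiver.join()
print(outbox.qsize(), inbox.qsize())  # 1 1
```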
As shown in fig. 4, each worker node terminal in the switching network executes a process of sending model parameters to other worker node terminals in steps III1 to III23, respectively, as follows.
Step III1. The working node terminal sends its own computing power, battery level, and bandwidth to the parameter server, and then enters step III2.
Step III2. the working node terminal polls for confirmation information from the parameter server and then proceeds to step III3.
Step III3. The working node terminal initializes its rejection node dictionary to empty, and then proceeds to step III4.
And step III4, the working node terminal starts monitoring on the parameter server side, receives the parameters of the front model of each target neural network from the parameter server, and then enters step III5.
And step III5, the working node terminal receives the parameters of the front models of the target neural networks sent by the parameter server, updates the parameters of the front models of the target neural networks in the working node terminal, and then enters step III6.
Step III6. The working node terminal trains each received target neural network for a preset number Cn of rounds using its own data, and then proceeds to step III7.
Here, the working node terminal's own data is the private data it holds under federated learning; for example, if the application scenario is image classification, each sample is an image stored on the working node terminal.
And step III7, the working node terminal sends the parameters of the front model of each target neural network to the parameter server, and then the step III8 is carried out.
Step III8. The working node terminal judges whether its battery level and communication resources are sufficient, namely whether the battery level exceeds a preset battery threshold and the bandwidth exceeds a preset bandwidth threshold; if yes, go to step III9; otherwise, go to step III17.
And step III9, the working node terminal sends a request for joining the switching network to the parameter server, and then the step III10 is carried out.
And III10, the working node terminal sends an application for requesting to join the network list to the parameter server, and then the step III11 is carried out.
In practical application, the parameter server does not know the state of each working node terminal; therefore, a working node terminal whose state is good must itself apply to the parameter server to join the network list.
Step III11, the working node terminal judges whether a network list sent by the parameter server is received, if so, the working node terminal starts a process of receiving model parameters sent by other working node terminals and starts a process of refusing to receive the parameters sent by other working node terminals, and the step III12 is carried out; otherwise, the step III10 is returned.
And step III12, the working node terminal randomly selects one other working node terminal from the received network list and enters step III13.
Step III13. The working node terminal judges whether the selected working node terminal exists in its rejection node dictionary; if yes, return to step III12; otherwise, go to step III14.
And step III14, the working node terminal sends the parameters of the posterior model in each target neural network received by the working node terminal to other selected working node terminals, and then the working node terminal enters the step III15.
Step III15. The working node terminal judges whether a rejection message from the selected working node terminal has been received; if yes, it adds the IP address of that working node terminal to its rejection node dictionary and sets the value corresponding to that IP address to a preset rejection count, then goes to step III16; otherwise, it proceeds directly to step III16.
Step III16. The working node terminal judges whether the number of other working node terminals it has sent data to exceeds a preset number threshold; if yes, go to step III17; otherwise, return to step III12.
Step III17. The working node terminal decrements each value in its rejection node dictionary by 1, and then proceeds to step III18.
Step III18. The working node terminal deletes from its rejection node dictionary every key-value pair whose value equals 0, and then proceeds to step III19.
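The peer-selection and rejection-dictionary logic of steps III12 to III18 can be sketched as follows. The rejection count is an assumed constant; the patent only specifies that rejected peers are skipped until their count decays to zero.

```python
# Sketch of steps III12-III18: pick a random peer not in the rejection
# dictionary, record rejections, and age entries out. REJECT_COUNT is an
# assumed value of the preset rejection number.
import random

REJECT_COUNT = 2

def pick_peer(network_list, reject_dict, rng=random):
    candidates = [ip for ip in network_list if ip not in reject_dict]
    return rng.choice(candidates) if candidates else None   # steps III12-III13

def record_rejection(reject_dict, ip):
    reject_dict[ip] = REJECT_COUNT                          # step III15

def age_rejections(reject_dict):
    for ip in list(reject_dict):
        reject_dict[ip] -= 1                                # step III17
        if reject_dict[ip] == 0:
            del reject_dict[ip]                             # step III18

rejects = {}
record_rejection(rejects, "10.0.0.7")
print(pick_peer(["10.0.0.7", "10.0.0.9"], rejects))  # 10.0.0.9 (only candidate)
age_rejections(rejects); age_rejections(rejects)
print(rejects)  # {} -- the rejected peer may now be selected again
```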
Step III19. The working node terminal trains for a preset number Cm of rounds using its own data, and then enters step III20.
Step III20. The working node terminal tests the accuracy and loss of each target neural network it has received, and then proceeds to step III21.
And step III21, the working node terminal closes all monitoring and then enters step III22.
Step III22. The working node terminal judges whether, for each received target neural network, the accuracy exceeds a preset accuracy threshold and the loss is below a preset loss threshold; if so, the working node terminal has completed training the back models of the received target neural networks; otherwise, proceed to step III23.
Step III23. The working node terminal judges whether an update of the front-model parameters of each target neural network has been received from the parameter server; if yes, return to step III4; otherwise, return to step III8.
Further, as shown in fig. 5, each of the working node terminals in the switching network executes a process of receiving the model parameters transmitted from the other working node terminals in accordance with the following steps IV1 to IV10, respectively.
And step IV1, the working node terminals start to receive the rear model parameters of each target neural network sent by other working node terminals, and the step IV2 is carried out.
Step IV2, the working node terminal judges whether the working node terminal receives the rear model parameters of each target neural network sent by other working node terminals, if yes, the step IV3 is carried out; otherwise, the step IV1 is returned.
Step IV3. The working node terminal evaluates the back-model parameters of each target neural network sent by other working node terminals, calculates an acceptance value for these back-model parameters using reinforcement learning (DQN), and then enters step IV4.
Step IV4. The working node terminal judges whether the acceptance value of the back-model parameters of each target neural network exceeds a preset lower-limit threshold; if yes, it accepts the back-model parameters and then enters step IV5; otherwise, it sends a rejection notice to the working node terminal that sent the back-model parameters, and then enters step IV5.
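Step IV4 reduces to a threshold test on an acceptance value produced by a learned evaluator (the patent uses a reinforcement-learning DQN, which is out of scope here). The stand-in below scores an update by how much it would lower the local validation loss; both the scoring rule and the threshold value are assumptions for illustration.

```python
# Hedged sketch of the step IV4 decision rule. ACCEPT_THRESHOLD and the
# loss-delta scoring are assumed stand-ins for the DQN evaluator.
ACCEPT_THRESHOLD = 0.0

def acceptance_value(loss_before, loss_after):
    return loss_before - loss_after  # positive if the update helps locally

def should_accept(loss_before, loss_after, threshold=ACCEPT_THRESHOLD):
    return acceptance_value(loss_before, loss_after) > threshold

print(should_accept(0.9, 0.7))  # True: merge the received back model
print(should_accept(0.7, 0.9))  # False: send a rejection notice instead
```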
And step IV5, the terminal of the working node updates each target neural network received by the terminal of the working node and returns to the step IV1.
The invention designs a model training method based on federated learning and multi-task learning. For each target neural network, all fully connected layers form the back model, and the remaining layers form the front model. The parameter server is responsible for the front model of each target neural network, and the working node terminals in the switching network are jointly responsible for the back models. A federated learning framework is thus applied to train the target neural networks, and a common or related network layer is trained for several tasks at the same time, so that the tasks mutually promote training accuracy, the convergence rate and generalization capability of the models are improved, and the privacy and safety of the underlying data are guaranteed while the target neural networks are trained efficiently.
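The front/back split described above can be sketched as follows; layers are represented as (name, kind) pairs for illustration, while a real implementation would partition actual framework modules the same way.

```python
# Sketch of the step-A split: fully connected layers form the back
# (worker-side, personalized) model and all remaining layers form the
# front (server-side, shared) model.
def split_model(layers):
    front = [(name, kind) for name, kind in layers if kind != "fc"]
    back = [(name, kind) for name, kind in layers if kind == "fc"]
    return front, back

layers = [("conv1", "conv"), ("conv2", "conv"), ("fc1", "fc"), ("fc2", "fc")]
front, back = split_model(layers)
print([name for name, _ in front])  # ['conv1', 'conv2'] -> parameter server
print([name for name, _ in back])   # ['fc1', 'fc2']     -> working nodes
```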
Multiple tasks share a structure whose parameters are shared by all tasks during optimization. Model parameter sharing is mainly divided into hard parameter sharing and soft parameter sharing: hard parameter sharing forces the shared layers to use identical parameters, while soft parameter sharing encourages the shared layers to become similar through a constraint without enforcing equality.
In this application, all tasks jointly train the same front part of the model, so that tasks with less training data can still achieve good results. The target neural network model is split, and its updates are transmitted separately, which improves user privacy protection compared with traditional federated learning. The back model of each target neural network is updated adaptively according to the relevance of the tasks. To preserve model personalization, hard parameter sharing is not used; soft parameter sharing is adopted instead, so that the model fits each user's data better and the whole model is better personalized. Asynchronous sharing of the parameters of the back half of the model is also added, so that bandwidth-rich nodes are utilized more effectively.
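One common realization of soft parameter sharing, sketched below, is to penalize the squared L2 distance of each task's parameters from the across-task mean, pulling the tasks together without forcing identical parameters as hard sharing would. The exact constraint and the coefficient are assumptions; the patent does not fix them.

```python
# Hedged sketch of a soft parameter sharing penalty: squared L2 distance
# of each task's parameter vector from the across-task mean. The form of
# the constraint and coeff=0.1 are illustrative assumptions.
def soft_sharing_penalty(task_params, coeff=0.1):
    """task_params: one flat parameter vector per task."""
    k = len(task_params)
    dim = len(task_params[0])
    mean = [sum(p[i] for p in task_params) / k for i in range(dim)]
    return coeff * sum(
        (p[i] - mean[i]) ** 2 for p in task_params for i in range(dim)
    )

print(soft_sharing_penalty([[1.0, 2.0], [1.0, 2.0]]))  # 0.0: identical tasks
print(soft_sharing_penalty([[0.0, 0.0], [2.0, 2.0]]))  # 0.4: diverging tasks
```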
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.
Claims (6)
1. A model training method based on federated learning and multi-task learning, used for synchronously realizing parametric training of at least one target neural network, all target neural networks having fully connected layers of the same structure; the method is characterized in that: based on the parameter server and each working node terminal, the parametric training of each target neural network is synchronously realized according to the following steps A to C;
step A, for each target neural network, dividing the fully connected layers into a back model of the target neural network and the remaining layers into a front model of the target neural network, and then entering step B;
b, the parameter server constructs a switching network consisting of all working node terminals meeting the preset parameter requirements according to the parameter attributes of all the working node terminals, the parameter server is responsible for the front model of each target neural network, all the working node terminals in the switching network are jointly responsible for the rear model of each target neural network, and then the step C is carried out;
step C, carrying out parametric training on each target neural network by the parameter server and each working node terminal in the switching network according to sample training data respectively corresponding to each target model by applying a multi-task learning mode to obtain each trained target neural network;
in the execution process from step B to step C, each working node terminal in the switching network respectively executes the process of sending the model parameters to other working node terminals and the process of receiving the model parameters sent by other working node terminals; wherein, each working node terminal in the switching network executes the process of sending the model parameters to other working node terminals according to the following steps III1 to III 23;
step III1, the working node terminal sends its own computing power, battery level, and bandwidth to the parameter server, and then goes to step III2;
step III2, the working node terminal polls and waits for confirmation information from the parameter server, and then the step III3 is carried out;
step III3, the working node terminal initializes its rejection node dictionary to empty, and then proceeds to step III4;
step III4, the working node terminal starts monitoring of the working node terminal on the parameter server side, receives parameters of front models of the target neural networks from the parameter server, and then enters step III 5;
step III5, the working node terminal receives the parameters of the front models of the target neural networks sent by the parameter server, updates the parameters of the front models of the target neural networks in the working node terminal, and then enters step III 6;
step III6, the working node terminal trains preset Cn wheels by applying self data aiming at each received target neural network, and then the step III7 is carried out;
step III7, the working node terminal sends the parameters of the front model of each target neural network to a parameter server, and then the step III8 is carried out;
step III8, the working node terminal judges whether its battery level and communication resources are sufficient, namely whether the battery level exceeds a preset battery threshold and the bandwidth exceeds a preset bandwidth threshold; if yes, go to step III9; otherwise, go to step III17;
step III9, the working node terminal sends an application for joining the switching network to the parameter server, and then the step III10 is carried out;
step III10, the working node terminal sends an application for requesting to join the network list to the parameter server, and then the step III11 is carried out;
step III11, the working node terminal judges whether a network list sent by the parameter server is received, if so, the working node terminal starts a process for receiving the model parameters sent by other working node terminals and starts a process for refusing to receive the parameters sent by other working node terminals, and the step III12 is carried out; otherwise, returning to the step III 10;
step III12, the working node terminal randomly selects one other working node terminal from the received network list, and the step III13 is entered;
step III13, the working node terminal judges whether the selected working node terminal exists in its rejection node dictionary; if yes, return to step III12; otherwise, go to step III14;
step III14, the working node terminal sends the received parameters of the rear model in each target neural network to other selected working node terminals, and then the step III15 is carried out;
step III15, the working node terminal judges whether a rejection message from the selected working node terminal has been received; if yes, it adds the IP address of that working node terminal to its rejection node dictionary and sets the value corresponding to that IP address to a preset rejection count, then goes to step III16; otherwise, it proceeds directly to step III16;
step III16, the working node terminal judges whether the number of other working node terminals it has sent data to exceeds a preset number threshold; if yes, go to step III17; otherwise, return to step III12;
step III17, the working node terminal decrements each value in its rejection node dictionary by 1, and then proceeds to step III18;
step III18, the working node terminal deletes from its rejection node dictionary every key-value pair whose value equals 0, and then goes to step III19;
step III19, performing Cm training by using the data of the working node terminal, and then entering step III 20;
step III20, the working node terminal tests the accuracy and loss of each target neural network it has received, and then goes to step III21;
step III21, the working node terminal closes all monitoring and then enters step III 22;
step III22, the working node terminal judges whether, for each received target neural network, the accuracy exceeds a preset accuracy threshold and the loss is below a preset loss threshold; if so, the working node terminal has completed training the back models of the received target neural networks; otherwise, go to step III23;
step III23, the working node terminal judges whether the received update of the model parameter of the front part of each target neural network from the parameter server terminal exists, if yes, the working node terminal returns to the step III 4; otherwise, the step III8 is returned.
2. The method of claim 1, wherein the method comprises the following steps: in the step B, the parameter server realizes the construction of the switching network, the charge of the parameter server to the front model of each target neural network and the charge of each working node terminal in the switching network to the rear model of each target neural network by executing a parameter averaging process and a network list management process.
3. The method of claim 2, wherein the model training method based on federated learning and multitask learning comprises the following steps: the parameter server executes a parameter averaging process according to the following steps I1 to I10;
step I1, the parameter server receives a parameter list from each working node terminal, the parameter list including each working node terminal's computing power, battery level, and bandwidth; it simultaneously starts a monitoring thread for applications to join the switching network and a monitoring thread for requests for the switching network list, and then enters step I2;
step I2, the parameter server selects a preset number n of working node terminals by probability-weighted sampling, sends confirmation information to each selected working node terminal, and then enters step I3;
step I3, the parameter server receives confirmation information from each working node terminal it interacted with in step I2, retransmitting the confirmation information on timeout; a switching network composed of the working node terminals meeting the preset parameter requirements is thereby constructed, and then step I4 is entered;
i4., the parameter server distributes the parameters of the back model of each target neural network to each working node terminal in the switching network, and then the step I5 is carried out;
i5., the parameter server starts the monitoring of the front model parameters in each target neural network received by each working node terminal in the switching network, and then the step I6 is entered;
step I6., the parameter server initializes the receiving list to be an empty list, and then enters step I7;
i7., the parameter server receives the parameters of the front model in each target neural network sent by each working node terminal in the switching network, records the received working node terminals by using a receiving list, and then enters the step I8;
step I8, the parameter server judges whether the number of working node terminals recorded in the receiving list exceeds a preset threshold; if yes, go to step I10; otherwise, enter step I9;
step I9, the parameter server judges whether the time spent waiting for the front-model parameters sent by the working node terminals exceeds a preset timeout; if yes, return to step I2; otherwise, return to step I7;
and step I10, the parameter server calculates the front model parameters by using averaging aiming at the front model parameters in each target neural network sent by each working node terminal in the received switching network, and distributes the averaged front model parameters to each working node terminal in the switching network.
4. The method of claim 2, wherein the model training method based on federated learning and multitask learning comprises the following steps: the parameter server performs a network list management process as follows from step II1 to step II 7;
step II2, the parameter server starts a monitoring thread applying for joining the switching network, and then the step II3 is carried out;
step II3, the parameter server judges whether a message for applying for joining the switching network is received, if yes, the step II4 is carried out; otherwise, continuing to execute step II 3;
step II4, for each received message applying to join the switching network, the parameter server adds the IP address of the corresponding working node terminal to the node exchange dictionary as a key, initializes the value corresponding to that key, and then enters step II5;
step II5, the parameter server returns the node exchange dictionary to each working node terminal whose join message was received, determines the number n of working node terminals corresponding to the received join messages, and then enters step II6;
step II6, the parameter server decrements each value in the node exchange dictionary by 1/n, and then enters step II7;
step II7, the parameter server deletes from the node exchange dictionary every key-value pair whose value is 0, and then returns to step II3.
5. The method for model training based on federated learning and multi-task learning according to claim 1, characterized in that: each working node terminal in the switching network executes the process of receiving model parameters sent by other working node terminals according to the following steps IV1 to IV10;
step IV1, the working node terminal starts to receive the rear model parameters of each target neural network sent by other working node terminals, and the step IV2 is carried out;
step IV2, the working node terminal judges whether the working node terminal receives the rear model parameters of each target neural network sent by other working node terminals, if yes, the step IV3 is carried out; otherwise, returning to the step IV 1;
step IV3, the working node terminal evaluates the back-model parameters of each target neural network sent by other working node terminals, calculates an acceptance value for these back-model parameters using reinforcement learning (DQN), and then goes to step IV4;
step IV4, the working node terminal judges whether the acceptance value of the back-model parameters of each target neural network exceeds a preset lower-limit threshold; if yes, it accepts the back-model parameters and then enters step IV5; otherwise, it sends a rejection notice to the working node terminal that sent the back-model parameters, and enters step IV5;
and step IV5, the terminal of the working node updates each received target neural network and returns to the step IV1.
6. The method for model training based on federated learning and multitask learning according to any one of claims 1-5, characterized in that: the target neural network is a neural network for regression or a neural network for classification.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011194414.4A CN112348199B (en) | 2020-10-30 | 2020-10-30 | Model training method based on federal learning and multi-task learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112348199A CN112348199A (en) | 2021-02-09 |
CN112348199B true CN112348199B (en) | 2022-08-30 |
Family
ID=74356840
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011194414.4A Active CN112348199B (en) | 2020-10-30 | 2020-10-30 | Model training method based on federal learning and multi-task learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112348199B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113051608A (en) * | 2021-03-11 | 2021-06-29 | 佳讯飞鸿(北京)智能科技研究院有限公司 | Method for transmitting virtualized sharing model for federated learning |
CN113032835B (en) * | 2021-04-21 | 2024-02-23 | 支付宝(杭州)信息技术有限公司 | Model training method, system and device for privacy protection |
CN113516249B (en) * | 2021-06-18 | 2023-04-07 | 重庆大学 | Federal learning method, system, server and medium based on semi-asynchronization |
US11797611B2 (en) | 2021-07-07 | 2023-10-24 | International Business Machines Corporation | Non-factoid question answering across tasks and domains |
CN113516250B (en) | 2021-07-13 | 2023-11-03 | 北京百度网讯科技有限公司 | Federal learning method, device, equipment and storage medium |
CN115481752B (en) * | 2022-09-23 | 2024-03-19 | 中国电信股份有限公司 | Model training method, device, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109165725A (en) * | 2018-08-10 | 2019-01-08 | 深圳前海微众银行股份有限公司 | Neural network federation modeling method, equipment and storage medium based on transfer learning |
CN110874484A (en) * | 2019-10-16 | 2020-03-10 | 众安信息技术服务有限公司 | Data processing method and system based on neural network and federal learning |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||