CN114298326A - Model training method and device and model training system

Model training method and device and model training system

Info

Publication number
CN114298326A
Authority
CN
China
Prior art keywords
node
model
model parameters
machine learning
training
Prior art date
Legal status
Pending
Application number
CN202111641151.1A
Other languages
Chinese (zh)
Inventor
王鹏
沈海珍
王讯
浦世亮
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202111641151.1A
Publication of CN114298326A
Legal status: Pending

Landscapes

  • Feedback Control In General (AREA)

Abstract

Embodiments of the invention provide a model training method, a model training device, and a model training system, applied in the technical field of machine learning. The method is applied to a first node in a model training system and comprises the following steps: receiving the model parameters sent by at least one second node; fusing the received model parameters with the model parameters of the machine learning model deployed in the first node to obtain a machine learning model containing the fused model parameters; training the obtained machine learning model based on the local data set of the first node; and returning to the step of receiving the training information sent by the at least one second node. With this scheme, the training effect of the machine learning model in each node of the federated learning system can be improved.

Description

Model training method and device and model training system
Technical Field
The invention relates to the technical field of machine learning, and in particular to a model training method, a model training device, and a model training system.
Background
Federated Learning refers to a distributed machine learning paradigm in which data remains distributed across multiple nodes (e.g., edge devices, mobile terminals, servers, etc.) without being shared, and a machine learning model is built jointly across these distributed nodes.
In the related art, a federated learning system generally comprises a server and a plurality of nodes, where each node is deployed with a machine learning model and holds its own data set. In each round of model training, every node trains its machine learning model on its own data set and then uploads the trained model parameters to the server; after receiving the model parameters sent by the nodes, the server fuses the received parameters and sends the fused model parameters to every node. Each node then updates the model parameters of its machine learning model with the fused parameters and performs the next round of training.
In the related art, the model parameters of every node are identical after each round of training, so the machine learning models in the nodes perform differently in different application scenarios; in other words, in such a federated learning system, the training effect of the machine learning model in some nodes is poor.
Disclosure of Invention
The embodiment of the invention aims to provide a model training method, a model training device, and a model training system, so as to improve the training effect of the machine learning model in each node of a federated learning system. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a model training method, which is applied to a first node, where the first node is any node in a federated learning system, and the method includes:
receiving training information sent by at least one second node; wherein the training information of each second node comprises: the latest model parameters of the second node, the at least one second node is: other nodes in the federated learning system other than the first node;
fusing the received model parameters with model parameters of a machine learning model deployed in the first node to obtain fused model parameters;
updating the model parameters of the machine learning model by using the fused model parameters to obtain a machine learning model containing the fused model parameters;
and training the obtained machine learning model based on the data set of the first node, and returning to the step of receiving the training information sent by at least one second node.
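For illustration only, the following Python sketch walks through one training round of this method at the first node under the fully equivalent fusion described further below; the helper names (fuse_equal, training_round) and the stand-in receive and train functions are assumptions of this sketch, not part of the disclosure.

```python
import numpy as np

def fuse_equal(local_params, received_params_list):
    # Fully equivalent fusion: a plain mean over the local parameters and
    # every set of received parameters.
    return np.mean([local_params] + received_params_list, axis=0)

def training_round(local_params, receive_fn, train_fn):
    received = receive_fn()                      # 1. receive second nodes' parameters
    fused = fuse_equal(local_params, received)   # 2./3. fuse and update the model
    return train_fn(fused)                       # 4. train on the first node's data set

if __name__ == "__main__":
    params = np.zeros(4)
    receive = lambda: [np.ones(4), 2.0 * np.ones(4)]  # stand-in broadcast channel
    train = lambda p: p - 0.1 * p                     # stand-in local training step
    for _ in range(3):                                # repeated training rounds
        params = training_round(params, receive, train)
    print(params)
```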
Optionally, the fusing the received model parameters with the model parameters of the machine learning model deployed in the first node to obtain fused model parameters includes:
determining a parameter fusion mode to be utilized;
and according to a parameter fusion mode to be utilized, fusing the received model parameters and the model parameters of the machine learning model deployed in the first node to obtain fused model parameters.
Optionally, the fusing the received model parameters and the model parameters of the machine learning model deployed in the first node according to the parameter fusion mode to be utilized to obtain fused model parameters includes:
and if the parameter fusion mode to be utilized is the fully equivalent fusion mode, calculating the mean value of the received model parameters and the model parameters of the machine learning model deployed in the first node, and taking the mean value as the fused model parameters.
Optionally, the fusing the received model parameters and the model parameters of the machine learning model deployed in the first node according to the parameter fusion mode to be utilized to obtain fused model parameters includes:
if the parameter fusion mode to be utilized is a node control mode, determining the fusion weight of each node in the federated learning system aiming at the first node;
and calculating the average value of the received model parameters and the weighted values of the model parameters of the machine learning model deployed in the first node as the fused model parameters based on the fusion weight of each node.
Optionally, the determining the fusion weight of each node in the federated learning system for the first node includes:
acquiring the recent history weight of the first node, where the recent history weight of each node includes the fusion weight of each node in the federated learning system for that node at the time the node performed parameter fusion;
for each second node, determining the training contribution degree of the model parameters of that second node to the machine learning model in the first node, as the training contribution degree of that second node;
and updating, based on the training contribution degree of each second node, the fusion weight of each node for the first node in the recent history weight of the first node.
Optionally, the training information of each second node further includes: the fusion weight of the first node for that second node at the time the second node performed parameter fusion;
the updating, based on the training contribution degree of each second node, of the fusion weight of each node for the first node in the recent history weight of the first node includes:
updating the fusion weight of each node for the first node in the recent history weight of the first node based on the training contribution degree of the model parameters of each second node and the fusion weight of the first node for each second node.
Optionally, the determining, for each second node, the training contribution degree of the model parameters of that second node to the machine learning model in the first node includes:
calculating the mean value of the model parameters of the machine learning model and the model parameters of every second node to obtain a first parameter, and testing the accuracy of the machine learning model when its model parameters are the first parameter, as a first accuracy;
for each second node, calculating the mean value of the model parameters of the machine learning model and the model parameters of the third nodes to obtain a second parameter, and testing the accuracy of the machine learning model when its model parameters are the second parameter, as a second accuracy corresponding to that second node, where the third nodes are the second nodes other than that second node;
and for each second node, calculating the difference between the first accuracy and the second accuracy corresponding to that second node as the training contribution degree of the model parameters of that second node to the machine learning model.
Optionally, the determining a parameter fusion mode to be utilized includes:
randomly selecting a parameter fusion mode from a plurality of parameter fusion modes as a parameter fusion mode to be utilized;
or,
and taking the parameter fusion mode selected the most times among the parameter fusion modes selected in the last N times as the parameter fusion mode to be utilized.
Optionally, the fusing the received model parameters and the model parameters of the machine learning model deployed in the first node according to the parameter fusion mode to be utilized to obtain fused model parameters includes:
if the parameter fusion mode to be utilized is a data weight mode, determining the data weight of each node based on the data volume of each node;
and calculating the average value of the received model parameters and the weighted values of the model parameters of the machine learning model deployed in the first node as the fused model parameters based on the data weight of each node.
In a second aspect, an embodiment of the present invention provides a model training system, where the model training system includes a plurality of nodes; wherein:
the first node is configured to: receive training information sent by at least one second node, where the training information of each second node includes the latest model parameters of that second node; fuse the received model parameters with the model parameters of the machine learning model deployed in the first node to obtain fused model parameters; update the model parameters of the machine learning model with the fused model parameters to obtain a machine learning model containing the fused model parameters; train the obtained machine learning model based on the data set of the first node, send the training information of the first node to the at least one second node, and return to the step of receiving the training information sent by the at least one second node; the first node is any one of the nodes in the federated learning system, and the at least one second node is: the nodes in the federated learning system other than the first node;
each second node is configured to send the training information to the first node.
In a third aspect, an embodiment of the present invention provides a model training apparatus, which is applied to a first node, where the first node is any node in a federated learning system, and the apparatus includes:
the information receiving module is used for receiving training information sent by at least one second node; wherein the training information of each second node comprises: the latest model parameters of the second node, the at least one second node is: other nodes in the federated learning system other than the first node;
the parameter fusion module is used for fusing the received model parameters with model parameters of a machine learning model deployed in the first node to obtain fused model parameters;
the parameter updating module is used for updating the model parameters of the machine learning model by using the fused model parameters to obtain the machine learning model containing the fused model parameters;
and the model training module is used for training the obtained machine learning model based on the data set of the first node and triggering the information receiving module again.
Optionally, the parameter fusion module includes:
the mode determination submodule is used for determining a parameter fusion mode to be utilized;
and the parameter fusion sub-module is used for fusing the received model parameters and the model parameters of the machine learning model deployed in the first node according to a parameter fusion mode to be utilized to obtain fused model parameters.
Optionally, the parameter fusion submodule is specifically configured to, if the parameter fusion mode to be utilized is the fully equivalent fusion mode, calculate the mean value of the received model parameters and the model parameters of the machine learning model deployed in the first node, and use the mean value as the fused model parameters.
Optionally, the parameter fusion sub-module is specifically configured to determine a fusion weight of each node in the federated learning system for the first node if the parameter fusion mode to be utilized is a node control mode; and calculating the average value of the received model parameters and the weighted values of the model parameters of the machine learning model deployed in the first node as the fused model parameters based on the fusion weight of each node.
Optionally, the parameter fusion submodule is specifically configured to obtain the recent history weight of the first node, where the recent history weight of each node includes the fusion weight of each node in the federated learning system for that node at the time the node performed parameter fusion; for each second node, determine the training contribution degree of the model parameters of that second node to the machine learning model in the first node, as the training contribution degree of that second node; and update, based on the training contribution degree of each second node, the fusion weight of each node for the first node in the recent history weight of the first node.
Optionally, the training information of each second node further includes: the fusion weight of the first node for that second node at the time the second node performed parameter fusion;
the parameter fusion submodule is specifically configured to update the fusion weight of each node for the first node in the recent history weight of the first node based on the training contribution degree of the model parameters of each second node and the fusion weight of the first node for each second node.
Optionally, the parameter fusion submodule is specifically configured to calculate the mean value of the model parameters of the machine learning model and the model parameters of every second node to obtain a first parameter, and test the accuracy of the machine learning model when its model parameters are the first parameter, as a first accuracy; for each second node, calculate the mean value of the model parameters of the machine learning model and the model parameters of the third nodes to obtain a second parameter, and test the accuracy of the machine learning model when its model parameters are the second parameter, as a second accuracy corresponding to that second node, where the third nodes are the second nodes other than that second node; and for each second node, calculate the difference between the first accuracy and the second accuracy corresponding to that second node as the training contribution degree of the model parameters of that second node to the machine learning model.
Optionally, the mode determining submodule is specifically configured to randomly select one parameter fusion mode from multiple parameter fusion modes and use the selected parameter fusion mode as the parameter fusion mode to be utilized; or to take the parameter fusion mode selected the most times among the parameter fusion modes selected in the last N times as the parameter fusion mode to be utilized.
Optionally, the parameter fusion submodule is specifically configured to determine the data weight of each node based on the data amount of each node if the parameter fusion mode to be utilized is the data weight mode; and calculating the average value of the received model parameters and the weighted values of the model parameters of the machine learning model deployed in the first node as the fused model parameters based on the data weight of each node.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of the first aspect when executing a program stored in the memory.
In a fifth aspect, the embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the method steps of any one of the first aspect.
The embodiment of the invention has the following beneficial effects:
in the model training method provided by the embodiment of the present invention, a first node in a federated learning system may receive the model parameters sent by at least one second node, fuse the received model parameters with the model parameters of the machine learning model deployed in the first node to obtain fused model parameters, update the model parameters of the machine learning model with the fused parameters to obtain a machine learning model containing the fused model parameters, train the obtained machine learning model based on the local data set of the first node, and return to the step of receiving the training information sent by the at least one second node. For each node in the federated learning system, the machine learning model deployed in that node first fuses the model parameters of the second nodes in every round, and the resulting model is then trained on the node's own data set, so that the model parameters of the trained machine learning model are better suited to that node. Therefore, in a federated learning system adopting the scheme of the invention, each node obtains model parameters better suited to itself in every round of training, which improves the training effect of the machine learning model in each node of the federated learning system.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of a model training system according to the related art;
FIG. 2 is a schematic structural diagram of a model training system according to an embodiment of the present invention;
FIG. 3 is a flowchart of a model training method according to an embodiment of the present invention;
FIG. 4 is another flow chart of a model training method according to an embodiment of the present invention;
FIG. 5 is another flow chart of a model training method according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a model training system according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a model training apparatus according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Federated learning refers to a distributed machine learning paradigm in which data remains distributed across multiple nodes (e.g., edge devices, mobile terminals, servers, etc.) without being shared, and a machine learning model is built jointly across these distributed nodes.
Fig. 1 is a schematic structural diagram of a federated learning system in the related art, which includes a server and a plurality of nodes, where each node is deployed with a machine learning model and holds its own data set. In each round of model training, every node trains its machine learning model on its own data set and then uploads the trained model parameters to the server; after receiving the model parameters sent by the nodes, the server fuses the received parameters and sends the fused model parameters to every node. Each node then updates the model parameters of its machine learning model with the fused parameters and performs the next round of training.
In the related art, the model parameters of every node are identical after each round of training. Moreover, when the server performs parameter fusion, the fused parameters tend toward those of some of the nodes: if the model parameters of certain nodes carry a high weight while those of others carry a low weight, the fused model parameters suit only part of the nodes, and on the other nodes the accuracy of the machine learning model is low after the fused parameters are applied. The federated learning system in the related art therefore suffers from model deviation, which causes the machine learning models in the nodes to perform differently in different application scenarios; that is, the training effect of the machine learning model in some nodes is poor.
Furthermore, since a server needs to be deployed separately, the deployment cost of the whole federated learning system is high, which is unfavorable for large-scale use.
To solve the technical problems in the related art, embodiments of the present invention provide a model training method, apparatus, and system.
As shown in fig. 2, a schematic structural diagram of the federated learning system provided in an embodiment of the present invention, the system includes a plurality of nodes, and the nodes may communicate with each other in a broadcast manner. Unlike the federated learning system in the related art, the federated learning system in the embodiment of the present invention eliminates the server.
In the embodiment of the present invention, the nodes of the federated learning system may be various electronic devices with data processing capability, such as personal computers, servers, and mobile phones. Moreover, the model training method provided by the embodiment of the invention can be implemented in software, hardware, or a combination of both.
The model training method provided by the embodiment of the invention can comprise the following steps:
receiving training information sent by at least one second node; wherein the training information of each second node comprises: the latest model parameters of the second node, and the at least one second node is: the nodes in the federated learning system other than the first node;
fusing the received model parameters with model parameters of a machine learning model deployed in the first node to obtain fused model parameters;
updating the model parameters of the machine learning model by using the fused model parameters to obtain a machine learning model containing the fused model parameters;
and training the obtained machine learning model based on the data set of the first node, and returning to the step of receiving the training information sent by at least one second node.
In the model training method provided by the embodiment of the invention, for each node in the federated learning system, the machine learning model deployed in that node first fuses the model parameters of the second nodes in every round, and the resulting model is then trained on the node's own data set, so that the model parameters of the trained machine learning model are better suited to that node. Therefore, in a federated learning system adopting the scheme of the invention, each node obtains model parameters better suited to itself in every round of training, which improves the training effect of the machine learning model in each node. Furthermore, since this federated learning system needs no dedicated server, its deployment cost can be reduced.
The following describes the model training method, apparatus, and model training system provided in the embodiments of the present invention in detail with reference to the accompanying drawings.
As shown in fig. 3, a model training method provided in an embodiment of the present invention is applied to a first node, where the first node is any node in a federated learning system, and may include the following steps:
S301, receiving training information sent by at least one second node;
wherein the training information of each second node comprises: the latest model parameters of the second node, and the at least one second node is: the nodes in the federated learning system other than the first node.
Illustratively, suppose the federated learning system comprises three nodes: node 1, node 2, and node 3. If node 1 serves as the first node, node 2 and node 3 serve as the second nodes; if node 2 is the first node, node 1 and node 3 are the second nodes; and if node 3 is the first node, node 1 and node 2 are the second nodes.
In the federated learning system of the embodiment of the present invention, after each round of training ends, every node broadcasts its training information, which includes the node's latest model parameters. Thus, each first node (any node in the federated learning system) can receive the training information sent by at least one second node.
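As an illustration of this broadcast step, the sketch below models each node's outgoing training information as a small dictionary and the transport as simple in-memory inboxes; the field names and the inbox transport are assumptions of this sketch, not part of the disclosure.

```python
# A hypothetical layout for the broadcast training information; real nodes
# would use an actual network transport instead of in-memory inboxes.
def make_training_info(node_id, model_params, fusion_weights=None):
    info = {"node": node_id, "params": model_params}
    if fusion_weights is not None:
        # Optional field: the sender's fusion weights for its peers, used by
        # the node control mode described later.
        info["weights"] = fusion_weights
    return info

def broadcast(info, inboxes):
    # Deliver the training information to every other node's inbox.
    for node_id, inbox in inboxes.items():
        if node_id != info["node"]:
            inbox.append(info)

inboxes = {"node1": [], "node2": [], "node3": []}
broadcast(make_training_info("node1", [0.1, 0.2]), inboxes)
print(inboxes["node2"])   # [{'node': 'node1', 'params': [0.1, 0.2]}]
```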
S302, fusing the received model parameters with model parameters of a machine learning model deployed in the first node to obtain fused model parameters;
after receiving the training information of at least one second node, the received model parameters may be fused with the model parameters of the machine learning model deployed in the first node, so as to obtain fused model parameters.
Generally, a machine learning model includes a plurality of different types of model parameters, and for the embodiment of the present invention, for each type of model parameter, the received model parameter of the type is fused with the model parameter of the type in the machine learning model deployed in the first node, so as to obtain a fused model parameter of the type.
For example, the federal learning system includes three nodes, node 1, node 2, and node 3. Wherein node 1 is a first node and nodes 2 and 3 are second nodes. The machine learning model deployed in the node 1 includes a first model parameter 1 of the type 1 and a second model parameter 1 of the type 2, the training information of the node 2 includes a first model parameter 2 of the type 1 and a second model parameter 2 of the type 2, and the training information of the node 3 includes a first model parameter 3 of the type 1 and a second model parameter 3 of the type 2.
When model parameter fusion is performed, the first model parameter 1, the first model parameter 2 and the first model parameter 3 may be fused to obtain a fused first model parameter; fusing the second model parameter 1, the second model parameter 2 and the second model parameter 3 to obtain a fused second model parameter; and fusing the third model parameter 1, the third model parameter 2 and the third model parameter 3 to obtain the fused third model parameter.
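A minimal sketch of this type-by-type fusion, mirroring the example above; dictionary keys stand in for the parameter types, and the plain mean stands in for the fusion operation (both are assumptions of the sketch):

```python
import numpy as np

def fuse_per_type(local, received):
    # 'local' and each entry of 'received' map a parameter type to its value;
    # parameters of the same type are fused together, as in the example above.
    return {name: np.mean([local[name]] + [r[name] for r in received], axis=0)
            for name in local}

node1 = {"first": np.array([1.0]), "second": np.array([2.0]), "third": np.array([3.0])}
node2 = {"first": np.array([3.0]), "second": np.array([4.0]), "third": np.array([5.0])}
node3 = {"first": np.array([5.0]), "second": np.array([6.0]), "third": np.array([7.0])}
print(fuse_per_type(node1, [node2, node3]))
# {'first': array([3.]), 'second': array([4.]), 'third': array([5.])}
```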
S303, updating the model parameters of the machine learning model by using the fused model parameters to obtain a machine learning model containing the fused model parameters;
in this step, after the fused model parameters are obtained, the model parameters of the machine learning model may be updated to the fused model parameters, yielding a machine learning model whose model parameters are the fused model parameters.
S304, training the obtained machine learning model based on the data set of the first node, and returning to the step of receiving the training information sent by at least one second node.
In this step, having obtained the machine learning model whose model parameters are the fused model parameters, the first node further uses its own data set to train that machine learning model.
Through training on the data set of the first node, the model parameters of the machine learning model can be further updated on the basis of the fused model parameters, so that the model parameters of the trained machine learning model are better suited to the node.
Optionally, in one implementation, after the machine learning model whose model parameters are the fused model parameters is obtained, Fine-Tuning training may be performed on the machine learning model. Fine-tuning is a model training method in which an existing model is further trained on another data set. In the embodiment of the invention, the machine learning model whose model parameters are the fused model parameters serves as the existing model and is trained on the data set of the first node.
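A minimal sketch of such fine-tuning, assuming a toy least-squares linear model in place of the node's real machine learning model (the model form, learning rate, and epoch count are all illustrative assumptions):

```python
import numpy as np

def fine_tune(params, data, labels, lr=0.01, epochs=5):
    # Continue training the fused parameters on the first node's own data set.
    for _ in range(epochs):
        for x, y in zip(data, labels):
            pred = params @ x
            params = params - lr * (pred - y) * x  # gradient step for squared error
    return params

fused_params = np.array([0.5, -0.2])               # parameters after fusion
local_x = np.array([[1.0, 0.0], [0.0, 1.0]])       # the first node's data set
local_y = np.array([1.0, 2.0])
print(fine_tune(fused_params, local_x, local_y))   # parameters adapted to the node
```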
The machine learning model can be used for functions of face recognition, action recognition, vehicle recognition, voice recognition, target classification and the like.
In the model training method provided by the embodiment of the invention, for each node in the federated learning system, the machine learning model deployed in that node first fuses the model parameters of the second nodes in every round, and the resulting model is then trained on the node's own data set, so that the model parameters of the trained machine learning model are better suited to that node. Therefore, in a federated learning system adopting the scheme of the invention, each node obtains model parameters better suited to itself in every round of training, which improves the training effect of the machine learning model in each node. Furthermore, since this federated learning system needs no dedicated server, its deployment cost can be reduced.
Based on the foregoing embodiment, as shown in fig. 4, in another model training method provided in an embodiment of the present invention, a process of obtaining a fused model parameter by fusing a received model parameter with a model parameter of a machine learning model deployed in a first node may include S401 to S402:
S401, determining a parameter fusion mode to be utilized;
the parameter fusion mode includes at least one of a fully equivalent fusion mode, a data weight mode, and a node control mode. The fully equivalent fusion mode is a mode in which the fusion weights of all nodes are the same. The data weight mode is a mode in which the fusion weight is determined according to the amount of data each node uses to generate the model gradient, i.e., the number of samples contained in the node's data set; for example, if the data set of node A contains 100,000 sample images, the amount of data node A uses to generate the model gradient is 100,000. The node control mode is a mode in which the fusion weight of each node is determined according to that node's training contribution degree to the first node. Each of these parameter fusion modes is described in detail in the following embodiments and is not repeated here.
There are many possible ways of determining the parameter fusion mode to be utilized, including, for example, at least one of the following two manners:
in the first mode determination manner, one parameter fusion mode is randomly selected from the multiple parameter fusion modes as the parameter fusion mode to be utilized;
in this manner, one parameter fusion mode can be selected with equal probability from the multiple parameter fusion modes and used as the parameter fusion mode to be utilized. For example, if the randomly selected parameter fusion mode is the node control mode, the node control mode is the parameter fusion mode to be utilized.
In the second mode determination manner, the parameter fusion mode selected the most times among the parameter fusion modes selected in the last N times is used as the parameter fusion mode to be utilized.
Wherein N is an integer, such as 1, 2, 3, etc.
When N is 1, the parameter fusion mode selected last time is used as the parameter fusion mode to be utilized. For example, if the fully equivalent fusion mode was selected last time, the fully equivalent fusion mode is also selected this time.
When N is an integer greater than 1, the parameter fusion mode selected the most times in the last N selections may be determined and used as the parameter fusion mode to be utilized. For example, if N is 3 and, in the last 3 selections, the fully equivalent fusion mode was selected once and the node control mode twice, the node control mode has been selected the most times and is therefore used as the parameter fusion mode to be utilized.
Of course, if every selected parameter fusion mode was selected the same number of times among the last N selections, one parameter fusion mode may be chosen from them at random.
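The two mode-determination manners can be sketched as follows; the mode names, the value N = 5, and the random tie-breaking detail are illustrative assumptions of this sketch:

```python
import random
from collections import Counter, deque

MODES = ["fully_equivalent", "data_weight", "node_control"]
history = deque(maxlen=5)              # the last N = 5 selections (N assumed)

def pick_mode(strategy):
    if strategy == "random" or not history:
        # First manner: choose one mode with equal probability.
        mode = random.choice(MODES)
    else:
        # Second manner: the mode selected the most times in the last N
        # selections; ties are broken at random, as the text suggests.
        counts = Counter(history)
        top = max(counts.values())
        mode = random.choice([m for m, c in counts.items() if c == top])
    history.append(mode)
    return mode

for _ in range(5):
    pick_mode("random")                # populate the selection history
print(pick_mode("most_frequent"))      # most frequent of the last 5 selections
```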
S402, fusing the received model parameters and the model parameters of the machine learning model deployed in the first node according to a parameter fusion mode to be utilized to obtain fused model parameters.
In one implementation, if the parameter fusion mode to be utilized is a congruent fusion mode, a mean value of the received model parameters and model parameters of the machine learning model deployed in the first node is calculated as the fused model parameters.
The fully equivalent fusion mode means that the weights of the nodes are the same, that is, the first node and each second node carry the same weight. Therefore, after receiving the model parameters sent by the at least one second node, the mean of the model parameters of the machine learning model deployed in the first node and the model parameters received from each second node can be calculated as the fused model parameters.
In one implementation, if the parameter fusion mode to be utilized is a node control mode, determining a fusion weight of each node in the federated learning system for the first node, and further calculating an average value of the received model parameters and weighted values of the model parameters of the machine learning model deployed in the first node based on the fusion weight of each node, as the fused model parameters.
In the node control mode, the fusion weight of each node (including the first node itself) for the first node is continuously updated as training proceeds. Therefore, before parameter fusion, the fusion weight of each node in the federated learning system for the first node needs to be determined, and then the weighted average of the received model parameters and the model parameters of the machine learning model deployed in the first node is calculated as the fused model parameters. The specific weight-updating process is described in detail later and is not repeated here.
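A sketch of the weighted fusion performed in the node control mode, assuming the fusion weights for the current round have already been determined (the weight-update rule itself is illustrated after the contribution-degree discussion below):

```python
import numpy as np

def weighted_fuse(params_by_node, weights_by_node):
    # Weighted average of model parameters: weights_by_node holds the first
    # node's current fusion weight for every node, itself included.
    total = sum(weights_by_node.values())
    return sum((weights_by_node[n] / total) * np.asarray(p)
               for n, p in params_by_node.items())

params = {"node1": [1.0, 1.0], "node2": [3.0, 3.0], "node3": [5.0, 5.0]}
weights = {"node1": 0.5, "node2": 0.25, "node3": 0.25}
print(weighted_fuse(params, weights))   # [2.5 2.5]
```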
In one implementation, if the parameter fusion mode to be utilized is a data weight mode, determining the data weight of each node based on the data volume of each node; and calculating the average value of the received model parameters and the weighted values of the model parameters of the machine learning model deployed in the first node as the fused model parameters based on the data weight of each node.
In the data weight mode, the data weight of each node is determined by the number of samples in that node's data set. For example, if node 1 trains on 100,000 samples, node 2 on 50,000 samples, and node 3 on 150,000 samples, the data weight of node 1 is 1/3, that of node 2 is 1/6, and that of node 3 is 1/2. After the data weight of each node is determined, the weighted average of the received model parameters and the model parameters of the machine learning model deployed in the first node is calculated, based on the data weights, as the fused model parameters.
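The data weight mode then amounts to deriving the weights from the sample counts and applying the same weighted average; the sketch below reuses the weighted_fuse helper from the previous sketch and reproduces the counts of the example:

```python
counts = {"node1": 100_000, "node2": 50_000, "node3": 150_000}
total = sum(counts.values())
data_weights = {n: c / total for n, c in counts.items()}
# data_weights == {'node1': 1/3, 'node2': 1/6, 'node3': 1/2}, as in the example

params = {"node1": [1.0], "node2": [2.0], "node3": [3.0]}
print(weighted_fuse(params, data_weights))   # weighted_fuse from the sketch above
```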
In the model training method provided by the embodiment of the invention, for each node in the federated learning system, the machine learning model deployed in that node first fuses the model parameters of the second nodes in every round, and the resulting model is then trained on the node's own data set, so that the model parameters of the trained machine learning model are better suited to that node. Therefore, in a federated learning system adopting the scheme of the invention, each node obtains model parameters better suited to itself in every round of training, which improves the training effect of the machine learning model in each node. Furthermore, since this federated learning system needs no dedicated server, its deployment cost can be reduced.
Based on the foregoing embodiment, as shown in fig. 5, in another model training method provided in an embodiment of the present invention, the process of determining the fusion weight of each node in the federated learning system with respect to the first node may include S501 to S503:
S501, obtaining the recent history weight of the first node; the recent history weight of each node includes: the fusion weight of each node in the federated learning system for that node at the time the node performed parameter fusion;
That is, the recent history weight of the first node is the fusion weight of each node for the first node at the time the first node last performed parameter fusion.
S502, for each second node, determining the training contribution degree of the model parameters of that second node to the machine learning model in the first node, as the training contribution degree of that second node;
optionally, in an implementation manner, an average value of the model parameters of the machine learning model and the model parameters of each second node may be calculated to obtain a first parameter, and the accuracy of the machine learning model when the model parameters are the first parameter is tested to serve as the first accuracy.
Then, for each second node, the mean value of the model parameters of the machine learning model and the model parameters of the third nodes is calculated to obtain a second parameter, and the accuracy of the machine learning model when its model parameters are the second parameter is tested and used as the second accuracy corresponding to that second node.
The third nodes are, among the second nodes, the nodes other than the given second node. In this way, for each second node, the second accuracy of the machine learning model in the absence of that second node's model parameters can be computed; comparing it with the first accuracy gives the change in accuracy caused by removing those model parameters, which reflects the contribution degree of that second node's model parameters to the training of the machine learning model in the first node.
Optionally, for each second node, the difference between the first accuracy and the second accuracy corresponding to that second node may be calculated and used as the training contribution degree of the model parameters of that second node to the machine learning model.
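The first accuracy, the per-node second accuracies, and the resulting contribution degrees can be sketched as follows; accuracy_fn stands in for whatever test procedure the first node uses to measure accuracy and is an assumption of this sketch:

```python
import numpy as np

def contribution_degrees(local_params, second_params, accuracy_fn):
    # First accuracy: fuse the first node's parameters with ALL second nodes'.
    first_acc = accuracy_fn(np.mean([local_params] + list(second_params.values()),
                                    axis=0))
    degrees = {}
    for node in second_params:
        # Second accuracy for this node: fuse everything EXCEPT its parameters
        # (the remaining second nodes play the role of the "third nodes").
        others = [p for n, p in second_params.items() if n != node]
        second_acc = accuracy_fn(np.mean([local_params] + others, axis=0))
        degrees[node] = first_acc - second_acc  # first minus second accuracy
    return degrees

# Dummy accuracy function for demonstration: closer to all-ones is "better".
acc = lambda p: 1.0 - float(np.abs(p - 1.0).mean())
local = np.array([1.0, 1.0])
seconds = {"node2": np.array([1.0, 1.0]), "node3": np.array([0.0, 0.0])}
print(contribution_degrees(local, seconds, acc))
```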
S503, updating the fusion weight of each node for the first node in the recent history weight of the first node based on the training contribution degree of each second node.
Optionally, the training information of each second node further includes the fusion weight of the first node for that second node at the time the second node performed parameter fusion. In that case, the fusion weight of each node for the first node in the recent history weight of the first node may be updated based on the training contribution degree of each second node's model parameters and on the fusion weight of the first node for each second node.
Illustratively, take node A as the first node and nodes B and C as the second nodes. Let the original accuracy of node A be X; let the first accuracy obtained by fusing A, B, and C under equal weights be Y; let the second accuracy corresponding to node C (fusing A and B only) be Z; and let the second accuracy corresponding to node B (fusing A and C only) be W. If Z or W is larger than Y, the training contribution degree of node C or node B, respectively, is 0, i.e., its fusion weight is zero; if Y > Z > X, the training contribution degree of node B is Z - X and that of node C is Y - Z, and the fusion weights can be adjusted according to these training contribution degrees.
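The example can be reproduced directly; the concrete accuracy values, and reading a non-positive contribution as weight zero, are assumptions used only to illustrate the adjustment:

```python
X, Y = 0.80, 0.90   # accuracy of A alone; first accuracy of fusing A, B and C
Z, W = 0.85, 0.88   # second accuracies: without C (fuse A+B) and without B (A+C)

# Per the example: a second accuracy above Y means the excluded node adds
# nothing (contribution 0); with Y > Z > X, node B contributes Z - X and
# node C contributes Y - Z.
contrib_B = 0.0 if W > Y else Z - X
contrib_C = 0.0 if Z > Y else Y - Z
print({"B": round(contrib_B, 2), "C": round(contrib_C, 2)})  # {'B': 0.05, 'C': 0.05}
```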
In the model training method provided by the embodiment of the invention, for each node in the federated learning system, the machine learning model deployed in that node first fuses the model parameters of the second nodes in every round, and the resulting model is then trained on the node's own data set, so that the model parameters of the trained machine learning model are better suited to that node. Therefore, in a federated learning system adopting the scheme of the invention, each node obtains model parameters better suited to itself in every round of training, which improves the training effect of the machine learning model in each node. Furthermore, since this federated learning system needs no dedicated server, its deployment cost can be reduced.
Based on the above method, an embodiment of the present invention further provides a model training system. As shown in fig. 6, the model training system provided in the embodiment of the present invention includes a plurality of nodes, wherein:
the first node 601 is configured to: receive training information sent by at least one second node, where the training information of each second node includes the latest model parameters of that second node; fuse the received model parameters with the model parameters of the machine learning model deployed in the first node to obtain fused model parameters; update the model parameters of the machine learning model with the fused model parameters to obtain a machine learning model containing the fused model parameters; train the obtained machine learning model based on the data set of the first node, send the training information of the first node to the at least one second node, and return to the step of receiving the training information sent by the at least one second node; the first node is any one of the nodes in the federated learning system, and the at least one second node is: the nodes in the federated learning system other than the first node;
each second node 602 is configured to send training information to the first node.
In the model training system provided by the embodiment of the invention, for each node in the federated learning system, the machine learning model deployed in that node first fuses the model parameters of the second nodes in every round, and the resulting model is then trained on the node's own data set, so that the model parameters of the trained machine learning model are better suited to that node. Therefore, in a federated learning system adopting the scheme of the invention, each node obtains model parameters better suited to itself in every round of training, which improves the training effect of the machine learning model in each node. Furthermore, since this federated learning system needs no dedicated server, its deployment cost can be reduced.
Corresponding to the method provided from the perspective of the first node, as shown in fig. 7, an embodiment of the present invention further provides a model training apparatus applied to a first node, where the first node is any node in a federated learning system, and the apparatus includes:
an information receiving module 701, configured to receive training information sent by at least one second node; wherein the training information of each second node comprises: the latest model parameters of the second node, the at least one second node is: other nodes in the federated learning system other than the first node;
a parameter fusion module 702, configured to fuse the received model parameters with the model parameters of the machine learning model deployed in the first node to obtain fused model parameters;
a parameter updating module 703, configured to update the model parameters of the machine learning model by using the fused model parameters, so as to obtain a machine learning model including the fused model parameters;
and a model training module 704, configured to train the obtained machine learning model based on the data set of the first node and trigger the information receiving module 701 again.
Optionally, the parameter fusion module includes:
the mode determination submodule is used for determining a parameter fusion mode to be utilized;
and the parameter fusion sub-module is used for fusing the received model parameters and the model parameters of the machine learning model deployed in the first node according to a parameter fusion mode to be utilized to obtain fused model parameters.
Optionally, the parameter fusion submodule is specifically configured to, if the parameter fusion mode to be utilized is the fully equivalent fusion mode, calculate the mean value of the received model parameters and the model parameters of the machine learning model deployed in the first node, and use the mean value as the fused model parameters.
Optionally, the parameter fusion sub-module is specifically configured to determine a fusion weight of each node in the federated learning system for the first node if the parameter fusion mode to be utilized is a node control mode; and calculating the average value of the received model parameters and the weighted values of the model parameters of the machine learning model deployed in the first node as the fused model parameters based on the fusion weight of each node.
Optionally, the parameter fusion submodule is specifically configured to obtain the recent history weight of the first node, where the recent history weight of each node includes the fusion weight of each node in the federated learning system for that node at the time the node performed parameter fusion; for each second node, determine the training contribution degree of the model parameters of that second node to the machine learning model in the first node, as the training contribution degree of that second node; and update, based on the training contribution degree of each second node, the fusion weight of each node for the first node in the recent history weight of the first node.
Optionally, the training information of each second node further includes: the fusion weight of the first node for that second node at the time the second node performed parameter fusion;
the parameter fusion submodule is specifically configured to update the fusion weight of each node for the first node in the recent history weight of the first node based on the training contribution degree of the model parameters of each second node and the fusion weight of the first node for each second node.
Optionally, the parameter fusion submodule is specifically configured to calculate the mean value of the model parameters of the machine learning model and the model parameters of every second node to obtain a first parameter, and test the accuracy of the machine learning model when its model parameters are the first parameter, as a first accuracy; for each second node, calculate the mean value of the model parameters of the machine learning model and the model parameters of the third nodes to obtain a second parameter, and test the accuracy of the machine learning model when its model parameters are the second parameter, as a second accuracy corresponding to that second node, where the third nodes are the second nodes other than that second node; and for each second node, calculate the difference between the first accuracy and the second accuracy corresponding to that second node as the training contribution degree of the model parameters of that second node to the machine learning model.
Optionally, the mode determining submodule is specifically configured to randomly select one parameter fusion mode from multiple parameter fusion modes and use the selected parameter fusion mode as the parameter fusion mode to be utilized; or to take the parameter fusion mode selected the most times among the parameter fusion modes selected in the last N times as the parameter fusion mode to be utilized.
Optionally, the parameter fusion submodule is specifically configured to determine the data weight of each node based on the data amount of each node if the parameter fusion mode to be utilized is the data weight mode; and calculating the average value of the received model parameters and the weighted values of the model parameters of the machine learning model deployed in the first node as the fused model parameters based on the data weight of each node.
In the model training apparatus provided by the embodiment of the invention, for each node in the federated learning system, the machine learning model deployed in that node first fuses the model parameters of the second nodes in every round, and the resulting model is then trained on the node's own data set, so that the model parameters of the trained machine learning model are better suited to that node. Therefore, in a federated learning system adopting the scheme of the invention, each node obtains model parameters better suited to itself in every round of training, which improves the training effect of the machine learning model in each node. Furthermore, since this federated learning system needs no dedicated server, its deployment cost can be reduced.
An embodiment of the present invention further provides an electronic device, as shown in fig. 8, including a processor 801, a communication interface 802, a memory 803, and a communication bus 804, where the processor 801, the communication interface 802, and the memory 803 communicate with one another through the communication bus 804.
a memory 803 for storing a computer program;
the processor 801 is configured to implement the method steps provided from the perspective of the first node when executing the program stored in the memory 803.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the above-mentioned model training methods.
In a further embodiment provided by the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the model training methods of the above embodiments.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one web site, computer, server, or data center to another web site, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) connection. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in a related manner; identical or similar parts among the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the apparatus, device, and system embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (11)

1. A model training method, applied to a first node, the first node being any node in a federated learning system, the method comprising:
receiving training information sent by at least one second node, wherein the training information of each second node comprises the latest model parameters of that second node, and the at least one second node is the nodes in the federated learning system other than the first node;
fusing the received model parameters with model parameters of a machine learning model deployed in the first node to obtain fused model parameters;
updating the model parameters of the machine learning model by using the fused model parameters to obtain a machine learning model containing the fused model parameters;
and training the obtained machine learning model based on the data set of the first node, and returning to the step of receiving the training information sent by at least one second node.
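For illustration only, a minimal sketch of one round of this loop, assuming the model parameters form a single flat numpy array and a toy least-squares model; the `fuse` callback and the helper names are assumptions, not terms from the claims:

```python
import numpy as np

def train_locally(params, X, y, lr=0.1, steps=10):
    # Toy stand-in for local training: gradient descent on least squares
    # over the node's local data set (X, y).
    for _ in range(steps):
        grad = X.T @ (X @ params - y) / len(y)
        params = params - lr * grad
    return params

def training_round(own_params, peer_params, fuse, X, y):
    # One round per claim 1: fuse the received parameters with this node's
    # own parameters, train the fused model on the local data set, and
    # return the new parameters; the caller broadcasts them and loops back
    # to receiving training information from the other nodes.
    fused = fuse(own_params, peer_params)
    return train_locally(fused, X, y)
```

Concrete choices of `fuse` corresponding to the fusion modes of claims 3, 4, and 9 are sketched under those claims.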
2. The method according to claim 1, wherein the fusing the received model parameters with model parameters of a machine learning model deployed in the first node to obtain fused model parameters comprises:
determining a parameter fusion mode to be utilized;
and according to a parameter fusion mode to be utilized, fusing the received model parameters and the model parameters of the machine learning model deployed in the first node to obtain fused model parameters.
3. The method according to claim 2, wherein the fusing the received model parameters and the model parameters of the machine learning model deployed in the first node according to the parameter fusion mode to be utilized to obtain fused model parameters comprises:
and if the parameter fusion mode to be utilized is the congruent fusion mode, calculating the mean value of the received model parameters and the model parameters of the machine learning model deployed in the first node, and taking the mean value as the fused model parameters.
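A minimal numpy sketch of this fusion mode, reading "congruent" as an equal-weight mean and assuming every parameter set is a flat array of the same shape:

```python
import numpy as np

def fuse_congruent(own_params, peer_params_list):
    # Element-wise mean of this node's own parameters and every received
    # parameter set; the mean is taken directly as the fused parameters.
    return np.mean([own_params] + list(peer_params_list), axis=0)
```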
4. The method according to claim 2, wherein the fusing the received model parameters and the model parameters of the machine learning model deployed in the first node according to the parameter fusion mode to be utilized to obtain fused model parameters comprises:
if the parameter fusion mode to be utilized is a node control mode, determining the fusion weight of each node in the federated learning system for the first node;
and calculating, based on the fusion weight of each node, the weighted average of the received model parameters and the model parameters of the machine learning model deployed in the first node as the fused model parameters.
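A sketch of the node control mode under the assumption that `weights[0]` is the fusion weight for the first node itself and `weights[1:]` align, in order, with the received parameter sets:

```python
import numpy as np

def fuse_node_control(own_params, peer_params_list, weights):
    # Weighted mean of all parameter sets; np.average divides by the sum
    # of the weights, so they need not be pre-normalized.
    all_params = np.stack([own_params] + list(peer_params_list))
    return np.average(all_params, axis=0, weights=np.asarray(weights, dtype=float))
```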
5. The method according to claim 4, wherein the determining a fusion weight for each node in the federated learning system for the first node comprises:
acquiring the recent history weights of the first node, wherein the recent history weights of a node comprise: the fusion weight of each node in the federated learning system for that node when that node last performed parameter fusion;
for each second node, determining the training contribution degree of the model parameters of that second node to the machine learning model in the first node;
and updating, in the recent history weights of the first node, the fusion weight of each node for the first node based on the training contribution degree of each second node.
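Claims 5 and 6 leave the exact update rule open; purely as an assumption, one plausible rule blends each stored fusion weight with the measured contribution degrees and renormalizes (the arrays are assumed to align one-to-one over the nodes being updated):

```python
import numpy as np

def update_fusion_weights(recent_weights, contributions, step=0.5):
    # Hypothetical rule, not fixed by the claims: nudge the recent history
    # weights toward the normalized, non-negative contribution degrees.
    w = np.asarray(recent_weights, dtype=float)
    c = np.clip(np.asarray(contributions, dtype=float), 0.0, None)
    if c.sum() > 0:
        c = c / c.sum()
    w = (1.0 - step) * w + step * c
    return w / w.sum()
```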
6. The method of claim 5, wherein the training information of each second node further comprises: the fusion weight of the first node for that second node when that second node performed parameter fusion;
and the updating, in the recent history weights of the first node, the fusion weight of each node for the first node based on the training contribution degrees of the second nodes comprises:
updating, in the recent history weights of the first node, the fusion weight of each node for the first node based on the training contribution degree of the model parameters of each second node and the fusion weight of the first node for each second node.
7. The method of claim 5, wherein the determining, for each second node, the training contribution degree of the model parameters of that second node to the machine learning model in the first node comprises:
calculating the mean of the model parameters of the machine learning model and the model parameters of every second node to obtain first parameters, and testing the accuracy of the machine learning model when its model parameters are the first parameters, as a first accuracy;
for each second node, calculating the mean of the model parameters of the machine learning model and the model parameters of the third nodes to obtain second parameters, and testing the accuracy of the machine learning model when its model parameters are the second parameters, as the second accuracy corresponding to that second node, wherein the third nodes are the second nodes other than that second node;
and calculating, for each second node, the difference between the first accuracy and the second accuracy corresponding to that second node as the training contribution degree of the model parameters of that second node to the machine learning model.
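A sketch of this leave-one-out measurement, assuming a hypothetical `evaluate(params)` helper that returns the model's accuracy on the first node's test data, and that `peer_params_list` is a Python list of flat arrays:

```python
import numpy as np

def contribution_degrees(own_params, peer_params_list, evaluate):
    # First accuracy: the model fused from the first node's parameters and
    # the parameters of every second node.
    first_params = np.mean([own_params] + list(peer_params_list), axis=0)
    first_acc = evaluate(first_params)

    contributions = []
    for j in range(len(peer_params_list)):
        # "Third nodes": every second node except second node j.
        others = peer_params_list[:j] + peer_params_list[j + 1:]
        second_params = np.mean([own_params] + others, axis=0)
        # Contribution of node j = the accuracy lost when its parameters
        # are left out of the fusion.
        contributions.append(first_acc - evaluate(second_params))
    return contributions
```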
8. The method of claim 2, wherein determining the parameter fusion mode to be utilized comprises:
randomly selecting a parameter fusion mode from a plurality of parameter fusion modes as a parameter fusion mode to be utilized;
or,
and taking, from the parameter fusion modes selected in the last N selections, the mode selected the most times as the parameter fusion mode to be utilized.
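Both selection strategies, sketched with Python's standard library; `modes` and `history` are illustrative names only:

```python
import random
from collections import Counter

def pick_mode_random(modes):
    # Strategy 1: select uniformly at random among the available fusion modes.
    return random.choice(modes)

def pick_mode_majority(history, n):
    # Strategy 2: among the last N selections, take the mode chosen most often.
    return Counter(history[-n:]).most_common(1)[0][0]
```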
9. The method according to claim 5, wherein the fusing the received model parameters and the model parameters of the machine learning model deployed in the first node according to the parameter fusion mode to be utilized to obtain fused model parameters comprises:
if the parameter fusion mode to be utilized is a data weight mode, determining the data weight of each node based on the data volume of each node;
and calculating, based on the data weight of each node, the weighted average of the received model parameters and the model parameters of the machine learning model deployed in the first node as the fused model parameters.
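A sketch of the data weight mode, assuming each node reports its local sample count alongside its parameters:

```python
import numpy as np

def fuse_data_weight(own_params, peer_params_list, own_count, peer_counts):
    # Each node's parameters are weighted by its local data volume;
    # np.average divides by the total count, yielding a weighted mean.
    all_params = np.stack([own_params] + list(peer_params_list))
    counts = np.asarray([own_count] + list(peer_counts), dtype=float)
    return np.average(all_params, axis=0, weights=counts)
```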
10. A model training system, characterized in that the model training system comprises a plurality of nodes, wherein:
the first node is configured to: receive training information sent by at least one second node, wherein the training information of each second node comprises the latest model parameters of that second node; fuse the received model parameters with model parameters of a machine learning model deployed in the first node to obtain fused model parameters; update the model parameters of the machine learning model by using the fused model parameters to obtain a machine learning model containing the fused model parameters; train the obtained machine learning model based on the data set of the first node, send the training information of the first node to the at least one second node, and return to the step of receiving the training information sent by the at least one second node; the first node is any node in the federated learning system, and the at least one second node is the nodes in the federated learning system other than the first node;
each second node is configured to send the training information to the first node.
11. A model training device, applied to a first node, the first node being any node in a federated learning system, the device comprising:
the information receiving module is used for receiving training information sent by at least one second node, wherein the training information of each second node comprises the latest model parameters of that second node, and the at least one second node is the nodes in the federated learning system other than the first node;
the parameter fusion module is used for fusing the received model parameters with model parameters of a machine learning model deployed in the first node to obtain fused model parameters;
the parameter updating module is used for updating the model parameters of the machine learning model by using the fused model parameters to obtain the machine learning model containing the fused model parameters;
and the model training module is used for training the obtained machine learning model based on the data set of the first node, and then triggering the information receiving module to execute again.
CN202111641151.1A 2021-12-29 2021-12-29 Model training method and device and model training system Pending CN114298326A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111641151.1A CN114298326A (en) 2021-12-29 2021-12-29 Model training method and device and model training system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111641151.1A CN114298326A (en) 2021-12-29 2021-12-29 Model training method and device and model training system

Publications (1)

Publication Number Publication Date
CN114298326A true CN114298326A (en) 2022-04-08

Family

ID=80971212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111641151.1A Pending CN114298326A (en) 2021-12-29 2021-12-29 Model training method and device and model training system

Country Status (1)

Country Link
CN (1) CN114298326A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115297008A (en) * 2022-07-07 2022-11-04 鹏城实验室 Intelligent computing network-based collaborative training method and device, terminal and storage medium
CN115297008B (en) * 2022-07-07 2023-08-22 鹏城实验室 Collaborative training method, device, terminal and storage medium based on intelligent computing network
WO2024036526A1 (en) * 2022-08-17 2024-02-22 华为技术有限公司 Model scheduling method and apparatus

Similar Documents

Publication Publication Date Title
CN113282960B (en) Privacy calculation method, device, system and equipment based on federal learning
CN109872242B (en) Information pushing method and device
CN114298326A (en) Model training method and device and model training system
CN109598414B (en) Risk assessment model training, risk assessment method and device and electronic equipment
CN109768879B (en) Method and device for determining target service server and server
CN108965951B (en) Advertisement playing method and device
CN113591068B (en) Online login device management method and device and electronic device
CN112183627A (en) Method for generating predicted density map network and vehicle annual inspection mark number detection method
CN110069997B (en) Scene classification method and device and electronic equipment
CN111125240B (en) Distributed transaction realization method and device, electronic equipment and storage medium
CN108805332B (en) Feature evaluation method and device
CN111078773B (en) Data processing method and device
CN111008873B (en) User determination method, device, electronic equipment and storage medium
CN112836128A (en) Information recommendation method, device, equipment and storage medium
CN111582456B (en) Method, apparatus, device and medium for generating network model information
CN112235723B (en) Positioning method, positioning device, electronic equipment and computer readable storage medium
CN111414921B (en) Sample image processing method, device, electronic equipment and computer storage medium
CN111754984B (en) Text selection method, apparatus, device and computer readable medium
CN113076451B (en) Abnormal behavior identification and risk model library establishment method and device and electronic equipment
CN114679680A (en) Positioning method and device based on IP address, readable medium and electronic equipment
CN111080349A (en) Method, apparatus, server and medium for identifying multiple devices of same user
CN110399803B (en) Vehicle detection method and device
CN112926608A (en) Image classification method and device, electronic equipment and storage medium
CN111400678A (en) User detection method and device
CN111741526A (en) Positioning method, positioning device, electronic equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination